

By default, the LSCWP built-in crawler adds a URI to its blacklist if either of the following conditions is met:

1. The page is not cached, by design or by default. In other words, any page that sends the response header `x-litespeed-cache-control: no-cache` is added to the blacklist after the initial crawl.

2. The page does not respond with one of the following status lines:

HTTP/1.1 200 OK
HTTP/1.1 201 Created
HTTP/2 200
HTTP/2 201
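The two rules can be expressed as a small check. The following is a minimal PHP sketch, not the plugin's actual code: the function name `would_blacklist()` is hypothetical, and it assumes the crawler sees the response headers as one raw string whose first line is the status line.

```php
<?php
// Hypothetical sketch of the two blacklist rules described above.
// This is NOT LSCWP's implementation; it only mirrors the documented conditions.
function would_blacklist(string $raw_headers): bool
{
    // Assumption: the first line of the raw header block is the status line.
    $lines = preg_split('/\r\n|\n/', trim($raw_headers));
    $status = trim($lines[0]);

    // Rule 2: the status line must be one of the accepted ones.
    $accepted = ['HTTP/1.1 200 OK', 'HTTP/1.1 201 Created', 'HTTP/2 200', 'HTTP/2 201'];
    if (!in_array($status, $accepted, true)) {
        return true;
    }

    // Rule 1: a response marked no-cache by LiteSpeed Cache is blacklisted.
    foreach (array_slice($lines, 1) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2
            && strtolower(trim($parts[0])) === 'x-litespeed-cache-control'
            && stripos($parts[1], 'no-cache') !== false) {
            return true;
        }
    }
    return false;
}

var_dump(would_blacklist("HTTP/1.1 200 OK\r\nx-litespeed-cache: hit\r\n"));
```

Because this sketch compares the status line exactly, a redirect such as `HTTP/1.1 301 Moved Permanently` fails rule 2 even if the page it points to returns `200 OK`.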

A real debugging case:

Problem:

A user reported that some pages were always added to the blacklist after the first crawl. When the same pages were requested manually with curl or the Chrome browser, the response always showed the `x-litespeed-cache` header and a `200 OK` status code, yet every crawl still added dozens of URIs to the blacklist.

Analysis:

As described above, we already know the conditions under which a URI is blacklisted, so we only need to find out what the crawler received that triggered one of them.

Investigation:

The debug log, however, does not record the response headers the crawler receives, so a small modification is needed.

To log them, insert the following line into `litespeed-cache/lib/litespeed/litespeed-crawler.class.php` at line 273:

LiteSpeed_Cache_Log::debug( 'crawler logs headers', $headers ) ;

With this line in place, the debug log records `$headers` each time the crawler handles a page.
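Once the crawler's `$headers` appear in the debug log, they can be compared with the headers a manual curl request returns for the same URI. Below is a minimal PHP sketch of such a comparison; `parse_header_block()` and `diff_headers()` are hypothetical helpers, and the sketch assumes both header sets have been copied out as raw text blocks that start with the status line.

```php
<?php
// Hypothetical helper: split a raw header block into its status line
// and a map of lower-cased header name => value.
function parse_header_block(string $raw): array
{
    $lines = preg_split('/\r\n|\n/', trim($raw));
    $status = trim(array_shift($lines));
    $fields = [];
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // Header names are case-insensitive, so normalise them.
            // Repeated headers (e.g. set-cookie) keep only the last value here.
            $fields[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return [$status, $fields];
}

// Hypothetical helper: report every difference between what the crawler
// saw and what a manual curl request saw, as name => [crawler, curl].
function diff_headers(string $crawler_raw, string $curl_raw): array
{
    [$crawler_status, $crawler] = parse_header_block($crawler_raw);
    [$curl_status, $curl] = parse_header_block($curl_raw);

    $diff = [];
    if ($crawler_status !== $curl_status) {
        $diff['status-line'] = [$crawler_status, $curl_status];
    }
    $names = array_unique(array_merge(array_keys($crawler), array_keys($curl)));
    foreach ($names as $name) {
        $a = $crawler[$name] ?? null;
        $b = $curl[$name] ?? null;
        if ($a !== $b) {
            $diff[$name] = [$a, $b];
        }
    }
    return $diff;
}

print_r(diff_headers(
    "HTTP/1.1 200 OK\r\nx-litespeed-cache-control: no-cache",
    "HTTP/1.1 200 OK\r\nx-litespeed-cache: hit"
));
```

Headers such as `date` normally differ between two requests, so expect some noise in the result; the entries that matter for the blacklist are the status line and `x-litespeed-cache-control`.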

  • Admin
  • Last modified: 2019/07/11 19:04
  • by qtwrk