====== LiteSpeed Cache for WordPress Settings: Crawler ======

**Please Note**: This wiki is valid for v2.9.x and below of the LiteSpeed Cache Plugin for WordPress. If you are using v3.0 or above, please see [[https://docs.litespeedtech.com/lscache/lscwp/overview/|the new documentation]].
  
The crawler must be enabled at the server-level or the virtual host level by a site admin. Please see: [[litespeed_wiki:cache:lscwp:configuration:enabling_the_crawler|Enabling the Crawler at the Server or Virtual Host Level]]
[[https://blog.litespeedtech.com/2017/06/14/wpw-crawl-your-site-make-it-fly/|Learn more about crawling on our blog.]]
  
{{:litespeed_wiki:cache:lscwp:lscwp-settings-crawler.png?nolink|}}
  
===== Delay =====
//500//
  
Set the Delay in microseconds to let LSCache know how often to send a new request to the server. You can increase this amount to lessen the load on the server; just be aware that this will make the entire crawling process take longer.

This setting may be limited at the server level. Learn more about [[litespeed_wiki:cache:lscwp:configuration:enabling_the_crawler#limiting_the_crawler|limiting the crawler's impact on the server]].
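In effect, the delay is simply a pause between consecutive requests. A minimal sketch in Python (illustrative only; the plugin itself is PHP, and ''crawl'' and ''fetch'' are hypothetical names):

```python
import time

# The plugin's default Delay value, in microseconds.
DELAY_MICROSECONDS = 500

def crawl(urls, fetch):
    """Fetch each URL, pausing between requests.

    A larger delay lightens the load on the server, but the
    whole crawl takes proportionally longer to finish.
    """
    for url in urls:
        fetch(url)
        time.sleep(DELAY_MICROSECONDS / 1_000_000)  # microseconds -> seconds
```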
  
===== Run Duration =====
  
This setting is a way to keep the crawler from monopolizing system resources. Once it reaches this limit, the crawler will be terminated rather than allowing it to compromise server performance. This setting is based on Linux server load. (A completely idle computer has a load average of 0. Each running process either using or waiting for CPU resources adds 1 to the load average.)

This setting may be limited at the server level. Learn more about [[litespeed_wiki:cache:lscwp:configuration:enabling_the_crawler#limiting_the_crawler|limiting the crawler's impact on the server]].
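The check described above can be pictured with a short Python sketch (illustrative only; ''under_load_limit'' is a hypothetical helper, not part of the plugin):

```python
import os

def under_load_limit(load_limit):
    """Return True if the 1-minute load average is below load_limit.

    A completely idle machine reads close to 0; each process running
    or waiting for CPU adds 1, matching the description above.
    """
    one_minute, _, _ = os.getloadavg()  # 1-, 5-, and 15-minute averages
    return one_minute < load_limit
```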
  
===== Site IP =====
    
The middlemen are eliminated, along with all of their overhead.

===== Role Simulation =====
//Empty list//

By default, the crawler runs as a non-logged-in "guest" on your site. As such, the pages that are cached by the crawler are all for non-logged-in users. If you would like to also pre-cache logged-in views, you may do so here.

The crawler simulates a user account when it runs, so you need to specify user ID numbers that correspond to the roles you'd like to cache. For example, to cache pages for users with the "Subscriber" role, choose one user that has the "Subscriber" role, and enter that user's ID in the box.

You may crawl multiple points-of-view by entering multiple user IDs in the box, one per line.

**NOTE**: Only one crawler may run at a time, so if you have specified one or more user IDs in the **Role Simulation** box, first the "Guest" crawler will run, and then the role-based crawlers will run, one after the other.

===== Cookie Simulation =====
To crawl for a particular cookie, enter the cookie name, and the values you wish to crawl for. Values should be one per line, and can include a blank line. There will be one crawler created per cookie value, per simulated role. Press the + button to add additional cookies, but be aware that the number of crawlers grows quickly with each new cookie, and can be a drain on system resources.

For example, if you crawl for ''Guest'' and ''Administrator'' roles, and you add ''testcookie1'' with the values ''A'' and ''B'', you have 4 crawlers:

  - Guest, testcookie1=A
  - Guest, testcookie1=B
  - Administrator, testcookie1=A
  - Administrator, testcookie1=B

Add ''testcookie2'' with the values ''C'', ''D'', and <blank> and you suddenly have 12 crawlers:

  - Guest, testcookie1=A, testcookie2=C
  - Guest, testcookie1=B, testcookie2=C
  - Administrator, testcookie1=A, testcookie2=C
  - Administrator, testcookie1=B, testcookie2=C
  - Guest, testcookie1=A, testcookie2=D
  - Guest, testcookie1=B, testcookie2=D
  - Administrator, testcookie1=A, testcookie2=D
  - Administrator, testcookie1=B, testcookie2=D
  - Guest, testcookie1=A, testcookie2=
  - Guest, testcookie1=B, testcookie2=
  - Administrator, testcookie1=A, testcookie2=
  - Administrator, testcookie1=B, testcookie2=

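The crawler count is simply the cross product of roles and cookie values, which can be verified with a quick Python sketch (names taken from the example above; this is not plugin code):

```python
from itertools import product

roles = ["Guest", "Administrator"]
testcookie1 = ["A", "B"]
testcookie2 = ["C", "D", ""]  # the blank line counts as a value

# One crawler per (role, cookie value, cookie value) combination.
crawlers = list(product(roles, testcookie1, testcookie2))
print(len(crawlers))  # 2 roles * 2 values * 3 values = 12
```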
There aren't many situations where you would need to simulate a cookie crawler, but it can be useful for sites that use a cookie to control multiple languages or currencies.

For example, WPML uses the ''_icl_current_language='' cookie to differentiate between visitor languages. An English speaker's cookie would look like ''_icl_current_language=EN'', while a Thai speaker's cookie would look like ''_icl_current_language=TH''. To crawl your site for a particular language, use a ''Guest'' user, and the appropriate cookie value.

===== Custom SiteMap =====
//Empty string//
**Note**: the sitemap must be in Google XML Sitemap format.
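For reference, a minimal sitemap in that format looks like the following (''example.com'' and the date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2020-05-04</lastmod>
  </url>
</urlset>
```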
  
===== Sitemap Generation =====
Use these fields if you don't already have a custom sitemap to use.

==== Include Posts / Include Pages / Include Categories / Include Tags ====
//on//
  
These four settings determine which taxonomies will be crawled. By default, all of them are.
  
==== Exclude Custom Post Types ====
//Empty string//
  
By default, all custom post types are crawled. If you have some that should not be crawled, list them in this field, one per line.
  
==== Order Links By ====
//Date, descending//
  
This field determines the order in which the crawler will parse the sitemap. By default, priority is given to the newest content on your site. Set this value so that your most important content is crawled first, in the event the crawler is terminated before it completes the entire sitemap.
Last modified: 2020/11/14 15:22 by Lisa Clarke