====== LSCWP Configuration Settings: Crawler ======

**Note**: As of 2020/11/14, this page redirects to the new documentation site: [[https://docs.litespeedtech.com/lscache/lscwp/crawler/]]

The crawler must be enabled at the server level or the virtual host level by a site admin. Please see: [[litespeed_wiki:cache:lscwp:configuration:enabling_the_crawler|Enabling the Crawler at the Server or Virtual Host Level]]

[[https://blog.litespeedtech.com/2017/06/14/wpw-crawl-your-site-make-it-fly/|Learn more about crawling on our blog.]]

{{:litespeed_wiki:cache:lscwp:lscwp-settings-crawler.png?direct&800|}}

===== Delay =====
//10000//

Set the Delay to tell LSCache how often to send a new request to the server. You can increase this value to lessen the load on the server; just be aware that doing so will make the entire crawling process take longer.

===== Run Duration =====
//200//

This is how long the crawler runs before taking a break. The default of ''200'' has the crawler run for 200 seconds, then stop temporarily. When the break is over, the crawler starts back up exactly where it left off and runs for another 200 seconds. This continues until the entire site has been crawled.

===== Interval Between Runs =====
//28800//

This setting determines the length of the break mentioned above. By default, the crawler rests for 28800 seconds between every 200-second run.

===== Crawl Interval =====
//604800//

This value determines how long to wait before re-initiating the entire crawling process. To keep your site regularly crawled, determine how long a full crawl usually takes, and set this value to slightly longer than that.

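Taken together, the timing settings above define the crawler's schedule. A minimal sketch of the arithmetic, using the default values (illustrative Python only, not LSCWP's actual implementation; the number of runs a full crawl needs depends on your site's size):

```python
# Default timing values from the settings above.
RUN_DURATION = 200             # seconds of crawling per run
INTERVAL_BETWEEN_RUNS = 28800  # seconds of rest between runs
CRAWL_INTERVAL = 604800        # seconds before the whole crawl is re-initiated

def full_crawl_seconds(runs_needed):
    """Wall-clock time for a complete crawl that takes `runs_needed` runs,
    counting the rests between runs. `runs_needed` is site-dependent."""
    return runs_needed * RUN_DURATION + (runs_needed - 1) * INTERVAL_BETWEEN_RUNS

# A hypothetical site needing 10 runs finishes in 261200 seconds
# (about 72.6 hours), comfortably inside the 7-day (604800 s) Crawl Interval.
total = full_crawl_seconds(10)
```

If `full_crawl_seconds` for your site comes out longer than the Crawl Interval, increasing the interval (as the paragraph above advises) keeps crawls from overlapping.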
===== Threads =====
//3//

This is the number of separate crawling processes running concurrently. The higher the number, the faster your site is crawled, but the more load is placed on your server.

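The idea behind the Threads setting can be sketched with a worker pool (illustrative Python with hypothetical URLs; LSCWP itself is PHP, this just shows the concept of concurrent crawl workers):

```python
from concurrent.futures import ThreadPoolExecutor

THREADS = 3  # the Threads setting: number of concurrent crawl workers

def warm_cache(url):
    # The real crawler would request the page here so LSCache stores a copy;
    # returning the URL keeps this sketch self-contained and offline.
    return url

urls = ["http://yourserver.com/page/%d" % i for i in range(6)]
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    crawled = list(pool.map(warm_cache, urls))
```

With 3 workers, the 6 pages are fetched roughly two at a time per worker instead of one after another, which is why more threads crawl faster but load the server harder.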
===== Server Load Limit =====
//1//

This setting keeps the crawler from monopolizing system resources. Once the server reaches this limit, the crawler is terminated rather than being allowed to compromise server performance. This setting is based on Linux server load: a completely idle machine has a load average of 0, and each running process that is using or waiting for CPU resources adds 1 to the load average.

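Conceptually, the check works like this (a minimal sketch of the idea, not LSCWP's actual implementation):

```python
import os

SERVER_LOAD_LIMIT = 1.0  # the Server Load Limit setting

def should_terminate(one_minute_load, limit=SERVER_LOAD_LIMIT):
    """Return True when the 1-minute load average exceeds the limit,
    i.e. the crawler should stop rather than degrade server performance."""
    return one_minute_load > limit

# On Linux, the live load averages come from os.getloadavg(), e.g.:
# load1, _, _ = os.getloadavg()
# if should_terminate(load1): stop the crawler
```

With the default limit of ''1'', the crawler backs off as soon as the server has, on average, more than one process competing for the CPU.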
===== Site IP =====
//Empty string//

As of v1.1.1, you can enter your site's IP address to simplify the crawling process and eliminate the overhead involved in DNS and Content Delivery Network (CDN) lookups. To understand why, let's look at a few scenarios.

This is how it works if you're using a CDN:
  - The crawler gets ''<nowiki>http://yourserver.com/path</nowiki>'' from the sitemap
  - The crawler checks with the DNS to find ''yourserver.com'''s IP address
  - The DNS returns the CDN's IP address to the crawler
  - The crawler goes to the CDN to ask for the page
  - The CDN grabs the page from ''yourserver.com''
  - The CDN returns the page to the crawler

This is how it works if you're not using a CDN:
  - The crawler gets ''<nowiki>http://yourserver.com/path</nowiki>'' from the sitemap
  - The crawler checks with the DNS to find ''yourserver.com'''s IP address
  - The crawler grabs the page from ''yourserver.com''

In both scenarios, lookups occur, expending time and resources. These lookups can be eliminated by entering your site's IP in this field.

When the crawler knows your IP, this is how it works:
  - The crawler gets ''<nowiki>http://yourserver.com/path</nowiki>'' from the sitemap
  - The crawler grabs the page directly from ''yourserver.com'' because it already knows the IP address

The middlemen are eliminated, along with all of their overhead.
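The effect of the Site IP setting can be sketched as follows (a hypothetical helper with an example IP address; this illustrates the concept, not LSCWP's internals):

```python
from urllib.parse import urlparse

def resolve_target(url, site_ip=None):
    """Decide where to open the connection for a crawl request.

    Without site_ip, the crawler must ask DNS for the hostname's address
    (and may be handed a CDN's IP). With site_ip set, that lookup is
    skipped: connect to the IP directly and send the hostname in the
    Host header so the server knows which site is being requested."""
    parts = urlparse(url)
    return {
        "connect_to": site_ip if site_ip else parts.hostname,
        "host_header": parts.hostname,
        "path": parts.path or "/",
    }

# With Site IP configured (hypothetical address), no DNS lookup is needed:
target = resolve_target("http://yourserver.com/path", site_ip="203.0.113.10")
```

When ''site_ip'' is left empty, ''connect_to'' falls back to the hostname and the usual DNS (and possibly CDN) round trip happens.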
===== Custom SiteMap =====
//Empty string//

A sitemap tells the crawler which pages on your site should be crawled. By default, LSCache for WordPress generates its own sitemap. If, however, you already have a sitemap that you'd like to use, that is an option as of v1.1.1.

Enter the full URL to the sitemap in this field.

**Note**: the sitemap must be in Google XML Sitemap format.

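For reference, a minimal sitemap in that XML format looks like this (hypothetical URLs):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yourserver.com/</loc>
    <lastmod>2017-11-01</lastmod>
  </url>
  <url>
    <loc>http://yourserver.com/sample-page/</loc>
  </url>
</urlset>
```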
===== Include Posts / Include Pages / Include Categories / Include Tags =====
//on//

These four settings determine which taxonomies will be crawled. By default, all of them are.

===== Exclude Custom Post Types =====
//Empty string//

By default, all custom post types are crawled. If you have some that should not be crawled, list them in this field, one per line.

===== Order Links By =====
//Date, descending//

This field determines the order in which the crawler parses the sitemap. By default, priority is given to the newest content on your site. Set this value so that your most important content is crawled first, in the event the crawler is terminated before it completes the entire sitemap.
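The default "Date, descending" ordering amounts to a newest-first sort of the sitemap entries (a sketch with hypothetical data):

```python
# Hypothetical sitemap entries with ISO-8601 dates, which sort correctly
# as plain strings.
entries = [
    {"url": "/old-post", "date": "2016-01-05"},
    {"url": "/new-post", "date": "2017-11-01"},
    {"url": "/mid-post", "date": "2017-03-20"},
]

# "Date, descending": newest content first, so it is crawled before older
# pages if the crawler is terminated early.
crawl_order = sorted(entries, key=lambda e: e["date"], reverse=True)
```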
  • Admin
  • Last modified: 2017/11/17 20:15
  • by Lisa Clarke