Differences

This shows you the differences between two versions of the page.

litespeed_wiki:cache:lscps:crawler [2018/07/24 15:22]
Eric Leu [How to Use Crawl script]
litespeed_wiki:cache:lscps:crawler [2019/10/23 17:41]
Eric Leu [More Options]
Line 8: Line 8:
   - SiteMap: Prepare your site's sitemap, e.g. ''<​nowiki>​http://​prestashop-123/​456_sitemap.xml</​nowiki>''​   - SiteMap: Prepare your site's sitemap, e.g. ''<​nowiki>​http://​prestashop-123/​456_sitemap.xml</​nowiki>''​
  
-===== How to Use Crawl script===== +===== How to Use the Crawler Script===== 
-[[https://​www.litespeedtech.com/​packages/​prestashop/​cachecrawler.sh | DownLoad ​from here]]+[[https://​www.litespeedtech.com/​packages/​prestashop/​cachecrawler.sh | Download ​from here]]
  
-==== Crawl Desktop View==== +Change the permissions so that the file is executable: ​''​chmod +x cachecrawler.sh''​
-''​sh cachecrawler.sh ​SITE-MAP-URL''​+
  
-==== Crawl Desktop and Mobile Views ==== +Crawl when desktop & mobile share the same theme: ​''​bash cachecrawler.sh SITE-MAP-URL''​ 
-''​sh cachecrawler.sh SITE-MAP-URL -m ''​+ 
 +Crawl when desktop & mobile have different themes: ''​bash ​cachecrawler.sh SITE-MAP-URL -m ''​
  
 By default, in the Prestashop cache plugin Mobile View is DISABLED. To enable mobile view, navigate to **PrestaShop Admin -> LiteSpeed Cache -> Configuration** and set **Separate Mobile View** to ''​Yes''​ By default, in the Prestashop cache plugin Mobile View is DISABLED. To enable mobile view, navigate to **PrestaShop Admin -> LiteSpeed Cache -> Configuration** and set **Separate Mobile View** to ''​Yes''​
Line 21: Line 21:
  
 ==== More Options==== ==== More Options====
-  * To get help: ''sh cachecrawler.sh -h'' +  * ''-h, --help'': Show the help message and exit. 
-  * To change default interval request from 0.1s to custom NUM value: ''sh cachecrawler.sh SITE-MAP-URL -i NUM'' +  * ''-m, --with-mobile'': Crawl mobile view in addition to default view.
 +  * ''-c, --with-cookie'': Crawl with the site's cookies.
 +  * ''-b, --black-list'': A page is added to the blacklist if it has an HTML status error and no cache. The next run bypasses that page.
 +  * ''-g, --general-ua'': Use a general user-agent instead of lscache_runner for desktop view.
 +  * ''-i, --interval'': Change the request interval. ''-i 0.2'' changes it from the default 0.1 seconds to 0.2 seconds.
 +  * ''-v, --verbose'': Show the complete response header under ''/tmp/crawler.log''.
 +  * ''-d, --debug-url'': Test one URL directly, as in ''bash cachecrawler.sh -d http://example.com/test.html''.
 +  * ''-qs, --crawl-qs'': Crawl the sitemap, including URLs with query strings.
 +  * ''-r, --report'': Display the total count of crawl results.
  
 +Example commands:
 +  * To get help: ''bash cachecrawler.sh -h''
 +  * To change the request interval from the default 0.1s to a custom NUM value: ''bash cachecrawler.sh SITE-MAP-URL -i NUM''
 +  * To crawl with cookies set: ''bash cachecrawler.sh -c SITE-MAP-URL''
 +  * To store the log in ''/tmp/crawler.log'': ''bash cachecrawler.sh -v SITE-MAP-URL''
 +  * To debug one URL and print the output on screen: ''bash cachecrawler.sh -d SITE-URL''
 +  * To display the total count of crawl results: ''bash cachecrawler.sh -r SITE-MAP-URL''
 +
 +NOTE: Multiple parameters can be used at the same time.
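
Since several options can be combined in one run, a small wrapper can keep the combinations readable. The following is only a sketch: ''build_crawl_cmd'' and its keyword names (''mobile'', ''cookie'', ''verbose'', ''report'', ''interval=NUM'') are hypothetical and not part of ''cachecrawler.sh''. It prints the command line instead of running it, and it assumes the options may follow the sitemap URL, as in the ''-m'' and ''-i'' examples.

```shell
#!/bin/sh
# Hypothetical helper (not part of cachecrawler.sh): maps readable keywords
# to the documented short options and prints the resulting command line.
build_crawl_cmd() {
    sitemap_url=$1
    shift
    cmd="bash cachecrawler.sh $sitemap_url"
    for opt in "$@"; do
        case $opt in
            mobile)     cmd="$cmd -m" ;;                    # crawl mobile view too
            cookie)     cmd="$cmd -c" ;;                    # crawl with site cookies
            verbose)    cmd="$cmd -v" ;;                    # log to /tmp/crawler.log
            report)     cmd="$cmd -r" ;;                    # show total crawl count
            interval=*) cmd="$cmd -i ${opt#interval=}" ;;   # custom request interval
            *)
                echo "unknown option keyword: $opt" >&2
                return 1
                ;;
        esac
    done
    printf '%s\n' "$cmd"
}

# Example: mobile crawl, 0.2 second interval, with a final report.
build_crawl_cmd "http://example.com/1_index_sitemap.xml" mobile interval=0.2 report
```

Printing the command first makes it easy to check the combination before pasting it into a terminal or a cron entry.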
 ===== How to Generate a Sitemap===== ===== How to Generate a Sitemap=====
 The Google Sitemap module is quite popular for generating a sitemap in Prestashop, and it's much faster than online generation. ​ The Google Sitemap module is quite popular for generating a sitemap in Prestashop, and it's much faster than online generation. ​
Line 33: Line 50:
 Download [[https://​github.com/​PrestaShop/​gsitemap/​archive/​master.zip | gsitemap]]; then change the file name to ''​gsitemap.zip''​. Download [[https://​github.com/​PrestaShop/​gsitemap/​archive/​master.zip | gsitemap]]; then change the file name to ''​gsitemap.zip''​.
  
-Click the **Configure** button, then click ''xxx.sitemap.xml''(This is your SITE-MAP-URL).  +Click the **Configure** button; you will see, e.g., ''xxx/1_index_sitemap.xml'' (this is your main SITE-MAP-URL). 
-{{:​litespeed_wiki:​cache:​lscps:​prestashop-9.png?600|}}+{{:​litespeed_wiki:​cache:​lscps:​ps-10.png?600|}}
  
 ==== SiteMap Online Generator ==== ==== SiteMap Online Generator ====
Line 41: Line 58:
  
 {{:​litespeed_wiki:​cache:​lscps:​prestashop-6.png?​600|}} {{:​litespeed_wiki:​cache:​lscps:​prestashop-6.png?​600|}}
 +
 +===== Crawl Interval =====
 +How often should the crawl be re-run? This depends on how long it takes to crawl your site and on the value you set for Public Cache TTL. \\
 +The default TTL is one day (24 hours), so consider running the script from a cron job every 12 hours.\\
 +E.g., this runs twice a day, at 03:30 and 15:30: ''30 3,15 * * * path_to_script/cachecrawler.sh SITE-MAP-URL -m -i 0.2''
 +
 +Note: You can also use an [[https://crontab.guru/|online crontab tool]] to help verify your time settings.
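
In standard cron syntax, several hours are listed in the hour field separated by commas, so a run at 03:30 and 15:30 uses ''3,15'' as the hour value. The sketch below prints such a crontab line; ''make_cron_line'' is a hypothetical helper, and ''path_to_script'' and ''SITE-MAP-URL'' are the same placeholders used on this page.

```shell
#!/bin/sh
# Hypothetical helper: prints a crontab line that runs the crawler at the
# given minute of each listed hour (hours are a comma-separated list).
make_cron_line() {
    minute=$1
    hours=$2
    script_path=$3
    sitemap_url=$4
    # The -m and -i 0.2 options mirror the example on this page.
    printf '%s %s * * * %s %s -m -i 0.2\n' \
        "$minute" "$hours" "$script_path" "$sitemap_url"
}

# Twice a day, at 03:30 and 15:30.
make_cron_line 30 3,15 path_to_script/cachecrawler.sh SITE-MAP-URL
```

The printed line can be reviewed and then added with ''crontab -e''.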
  
 ===== How to Verify ===== ===== How to Verify =====
  • Last modified: 2020/08/11 19:17
  • by Lisa Clarke