Differences

This shows you the differences between two versions of the page.

litespeed_wiki:cache:litemage2:crawler [2019/10/23 16:02]
Eric Leu [More Options]
litespeed_wiki:cache:litemage2:crawler [2020/06/01 15:34]
Jackson Zhang [Crawl Interval]
Line 14: Line 14:
  
 ==== More Options==== ==== More Options====
-  * ''-h, --help''          Show this message and exit +  * ''-h, --help'' Show this message and exit.
-  * ''-m, --with-mobile''   Crawl mobile view in addition to default view +  * ''-m, --with-mobile'' Crawl mobile view in addition to default view.
-  * ''-c, --with-cookie''   Crawl with site's cookies +  * ''-c, --with-cookie'' Crawl with site's cookies.
-  * ''-b, --black-list''    Page will be added to black list if html status error and no cache. Next run will bypas page +  * ''-b, --black-list'' Page will be added to blacklist if HTML status error and no cache. Next run will bypass page.
-  * ''-g, --general-ua''    Use general user-agent instead of lscache_runner for desktop view +  * ''-g, --general-ua'' Use general user-agent instead of lscache_runner for desktop view.
-  * ''-i, --interval''      Change request interval. "-i 0.2" changes from default 0.1s to 0.2s +  * ''-i, --interval'' Change request interval. ''-i 0.2'' changes from default 0.1 second to 0.2 seconds.
-  * ''-v, --verbose''       Show complete response header under /tmp/crawler.log +  * ''-v, --verbose'' Show complete response header under ''/tmp/crawler.log''.
-  * ''-d, --debug-url''     Test one URL directly. "sh M2-crawler.sh -v -d http://example.com/test.html" +  * ''-d, --debug-url'' Test one URL directly, as in ''sh M2-crawler.sh -v -d http://example.com/test.html''.
-  * ''-qs,--crawl-qs''      Crawl sitemap, including URLS with query strings +  * ''-qs,--crawl-qs'' Crawl sitemap, including URLs with query strings.
-  * ''-r, --report''        Display total count of crawl result +  * ''-r, --report'' Display total count of crawl result
-Example ​command+ 
 +Example ​commands
   * To get help: ''​bash M2-crawler.sh -h''​   * To get help: ''​bash M2-crawler.sh -h''​
   * To change default interval request from 0.1s to custom NUM value: ''​bash M2-crawler.sh SITE-MAP-URL -i NUM''​   * To change default interval request from 0.1s to custom NUM value: ''​bash M2-crawler.sh SITE-MAP-URL -i NUM''​
   * To crawl with cookie set: ''​bash M2-crawler.sh -c SITE-MAP-URL''​   * To crawl with cookie set: ''​bash M2-crawler.sh -c SITE-MAP-URL''​
-  * To store log in ''/​tmp/​M2-crawler.log'':​ ''​bash M2-crawler.sh -v SITE-MAP-URL''​+  * To store log in ''/​tmp/​crawler.log'':​ ''​bash M2-crawler.sh -v SITE-MAP-URL''​
   * To debug one URL and output on screen: ''​bash M2-crawler.sh -d SITE-URL''​   * To debug one URL and output on screen: ''​bash M2-crawler.sh -d SITE-URL''​
   * To display total count of crawl result: ''​bash M2-crawler.sh -r SITE-MAP-URL''​   * To display total count of crawl result: ''​bash M2-crawler.sh -r SITE-MAP-URL''​
-  * Use multiple parameters at the same time is allowed ​ 
  
 +NOTE: Using multiple parameters at the same time is allowed.
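As a rough illustration of combining options, the sketch below builds one command from several of the flags listed above and prints it for review before anything runs. The sitemap address is a placeholder, and the argument order (value-less flags before the sitemap URL, ''-i'' after it) simply mirrors the single-option examples above; whether the script accepts other orderings is an assumption that has not been verified here.

```shell
#!/bin/sh
# Sketch only: combine several documented M2-crawler.sh options in one run.
# SITEMAP_URL is a placeholder; replace it with your own sitemap address.
SITEMAP_URL="http://example.com/sitemap.xml"

# -c  crawl with the site's cookies
# -m  crawl the mobile view as well as the default view
# -r  display the total count of crawl results
# -i  change the request interval from the default 0.1s to 0.2s
crawl_cmd="bash M2-crawler.sh -c -m -r $SITEMAP_URL -i 0.2"

# Print the assembled command so it can be checked first.
echo "$crawl_cmd"

# Set RUN=1 in the environment to actually execute it.
# The unquoted expansion is intentional so the string splits into arguments.
if [ "${RUN:-0}" = "1" ]; then
  $crawl_cmd
fi
```

Printing the command first is a deliberate choice: a crawl can generate a lot of requests, so it is worth confirming the options before starting one.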
 ===== How to Generate a Sitemap===== ===== How to Generate a Sitemap=====
 Magento 2 has a builtin module for generating a sitemap and it's fast. Magento 2 has a builtin module for generating a sitemap and it's fast.
Line 63: Line 64:
 Note: You can also use [[https://​crontab.guru/​|online crontab tool]] to help you to verify the time settings. Note: You can also use [[https://​crontab.guru/​|online crontab tool]] to help you to verify the time settings.
  
 +===== Run crawler after any product update on Magento 2 =====
 +By design, any product update in Magento 2 purges all caches, and LiteMage has no control over this behavior. As a result, you may find pages uncached even if you run the crawler at an interval shorter than the TTL. This does not mean that LiteMage 2 or the crawler is malfunctioning; it is simply how Magento 2 is designed.
 +
 +To avoid this situation, we recommend scheduling a specific window for making product changes through the Magento admin, for example, the two off-peak hours from 6:00pm to 8:00pm. Then run the crawler immediately after the changes are made.
 ===== How to Verify the Crawler is Working ===== ===== How to Verify the Crawler is Working =====
 When using [[https://​developers.google.com/​web/​tools/​chrome-devtools/​|the browser developer tool]], load a previously uncached page. You should see ''​X-LiteSpeed-Cache:​ hit,​litemage''​ on the first view. When using [[https://​developers.google.com/​web/​tools/​chrome-devtools/​|the browser developer tool]], load a previously uncached page. You should see ''​X-LiteSpeed-Cache:​ hit,​litemage''​ on the first view.
  
 {{:​litespeed_wiki:​cache:​litemage2:​m2-3.png?​600|}} {{:​litespeed_wiki:​cache:​litemage2:​m2-3.png?​600|}}
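The same check can be made from the command line instead of the browser. The following is a minimal sketch, not part of the crawler itself: ''is_litemage_hit'' is a hypothetical helper name, and ''PAGE_URL'' is a placeholder for a page the crawler should already have warmed. It looks for the ''X-LiteSpeed-Cache: hit,litemage'' header described above in a block of response headers read from standard input.

```shell
#!/bin/sh
# Sketch only: is_litemage_hit is a hypothetical helper, not a LiteMage tool.
# It reads HTTP response headers on stdin and succeeds (exit status 0) when
# the X-LiteSpeed-Cache header reports "hit,litemage".
is_litemage_hit() {
  grep -i '^x-litespeed-cache:' | grep -qi 'hit,litemage'
}

# Usage (PAGE_URL is a placeholder):
#   curl -s -o /dev/null -D - "$PAGE_URL" | is_litemage_hit \
#     && echo "served from cache" || echo "not cached"
# curl's "-D -" writes the response headers to stdout and "-o /dev/null"
# discards the page body, so only the headers reach the helper.
```

A normal GET request is used here rather than a HEAD request so that the response resembles what a browser would receive on its first view.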
  • Last modified: 2020/07/08 19:35
  • by Lisa Clarke