Differences

This shows you the differences between two versions of the page.

litespeed_wiki:cache:litemage2:crawler [2018/07/27 17:58]
Lisa Clarke [How to Use Crawl script]
litespeed_wiki:cache:litemage2:crawler [2019/10/23 17:42]
Eric Leu [More Options]
Line 9:

 ===== How to Use the Crawler Script=====
-[[https://www.litespeedtech.com/packages/litemage2.0/M2-crawler.sh | Download from here]]
-
-Change the permissions so that the file is executable:
-''chmod +x M2_crawler.sh''
-
-==== Crawl Desktop&mobile share same theme====
-''sh M2-crawler.sh SITE-MAP-URL''
+  - [[https://www.litespeedtech.com/packages/litemage2.0/M2-crawler.sh | Download from here]]
+  - Change the permissions so that the file is executable: ''chmod +x M2-crawler.sh''
+  - Run the script: ''bash M2-crawler.sh SITE-MAP-URL''
  
 ==== More Options====
-  * To get help: ''sh M2-crawler.sh -h''
-  * To change default interval request from 0.1s to custom NUM value: ''sh M2-crawler.sh SITE-MAP-URL -i NUM''
+  * ''-h, --help'': Show this message and exit.
+  * ''-m, --with-mobile'': Crawl the mobile view in addition to the default view.
+  * ''-c, --with-cookie'': Crawl with the site's cookies.
+  * ''-b, --black-list'': Add a page to the blacklist if it returns an error status and is not cached. The next run will bypass that page.
+  * ''-g, --general-ua'': Use a general user-agent instead of lscache_runner for the desktop view.
+  * ''-i, --interval'': Change the request interval. ''-i 0.2'' changes it from the default 0.1 second to 0.2 seconds.
+  * ''-v, --verbose'': Show the complete response header under ''/tmp/crawler.log''.
+  * ''-d, --debug-url'': Test one URL directly, as in ''bash M2-crawler.sh -d http://example.com/test.html''.
+  * ''-qs, --crawl-qs'': Crawl the sitemap, including URLs with query strings.
+  * ''-r, --report'': Display the total count of crawl results.
  
+Example commands:
+  * To get help: ''bash M2-crawler.sh -h''
+  * To change the request interval from the default 0.1s to a custom NUM value: ''bash M2-crawler.sh SITE-MAP-URL -i NUM''
+  * To crawl with cookies set: ''bash M2-crawler.sh -c SITE-MAP-URL''
+  * To store the log in ''/tmp/crawler.log'': ''bash M2-crawler.sh -v SITE-MAP-URL''
+  * To debug one URL and print the output on screen: ''bash M2-crawler.sh -d SITE-URL''
+  * To display the total count of crawl results: ''bash M2-crawler.sh -r SITE-MAP-URL''
+
+NOTE: Multiple parameters can be used at the same time.
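
Since several options can be combined, a small wrapper can make the chosen combination explicit. The following is only a sketch: ''build_crawl_cmd'', ''CRAWLER'' and ''SITEMAP_URL'' are hypothetical names that are not part of M2-crawler.sh, and the function prints the command as a dry run rather than executing it. Only options listed above are used.

```shell
#!/bin/sh
# Hypothetical helper: assembles an M2-crawler.sh command line from two settings.
# CRAWLER and SITEMAP_URL are placeholders; replace them with your own values.
CRAWLER="./M2-crawler.sh"
SITEMAP_URL="SITE-MAP-URL"

build_crawl_cmd() {
  # $1 = request interval in seconds (passed to -i)
  # $2 = "yes" to also crawl the mobile view (adds -m)
  interval="$1"
  mobile="$2"
  cmd="bash $CRAWLER $SITEMAP_URL -i $interval -r"
  if [ "$mobile" = "yes" ]; then
    cmd="$cmd -m"
  fi
  # Dry run: print the command instead of running it.
  printf '%s\n' "$cmd"
}

build_crawl_cmd 0.2 yes
```

Printing the command first makes it easy to check the option set before pointing the crawler at a live site.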
 ===== How to Generate a Sitemap=====
-The Sitemap module is build-in for generating a sitemap in Magento 2, and it's fast.
+Magento 2 has a built-in module for generating a sitemap, and it's fast.
  
 ==== Enable sitemap ====
-Navigate to Magento admin page -> Stores -> Settings -> Configuration -> Catalog -> XML Sitemap \\
-{{:litespeed_wiki:cache:litemage2:m2-4.png?600|}} \\
+Navigate to **Magento Admin > Stores > Settings > Configuration > Catalog > XML Sitemap**
+{{:litespeed_wiki:cache:litemage2:m2-4.png?600|}}

-Set Generation Settings Enabled to ''Yes'' \\
+Set **Generation Settings Enabled** to ''Yes''
 {{:litespeed_wiki:cache:litemage2:m2-5.png?600|}}
  
-==== Configuring a single sitemap for all storefronts ====
-Navigate to Magento admin page -> Marketing -> Seo & Search -> Sitemap
-  - Click **Add Sitemap** button
-  - Enter value
-    * Filename: ''sitemap.xml''
-    * Path: ''/''
-  - Click **Save & Generate** button
+==== Configuring a Single Sitemap for All Storefronts ====
+Navigate to **Magento Admin > Marketing > SEO & Search > Sitemap**
+  - Click the **Add Sitemap** button
+  - Enter values
+    * **Filename**: ''sitemap.xml''
+    * **Path**: ''/''
+  - Click the **Save & Generate** button

 {{:litespeed_wiki:cache:litemage2:m2-2.png?600|}} \\
-If all went well, a sitemap.xml file will generated in your magento 2 document root.
+If all went well, a ''sitemap.xml'' file will have been generated in your Magento 2 document root.
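
To confirm that the file was generated and to see roughly how many URLs the crawler will visit, the ''<loc>'' entries can be counted. This is a minimal sketch, assuming a standard sitemap in which each URL has one ''<loc>'' element; ''count_sitemap_urls'' is a hypothetical helper name, and the path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: counts <loc> entries in a sitemap file.
# grep -o is used because the whole sitemap may sit on a single line,
# in which case counting matching lines would under-report.
count_sitemap_urls() {
  grep -o '<loc>' "$1" | wc -l | tr -d ' '
}

# Usage (path is a placeholder for your Magento 2 document root):
#   count_sitemap_urls /path/to/docroot/sitemap.xml
```

Multiplying the count by the request interval gives a rough lower bound on how long one crawl takes.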
  
 ===== Crawl Interval =====
-How often do we want to re-initiate the crawling process? This depends on how long it takes to crawl your site and what did you set for Public Cache TTL. \\
-Default TTL is one day(24hr). Maybe you can consider to run the script by cronjob every 12 hours.\\
+How often do you want to re-initiate the crawling process? This depends on how long it takes to crawl your site and what you set for Public Cache TTL.
+
+The default TTL is one day (24 hours). You might, for example, run the script as a cron job every 12 hours.
 E.g. This will run twice a day, at 3:30 and 15:30: ''30 3,15 * * * path_to_script/M2-crawler.sh SITE-MAP-URL -m -i 0.2''

-Note: You can also use [[https://crontab.guru/|online crontab tool]] help you to verify time settings.
+Note: You can also use an [[https://crontab.guru/|online crontab tool]] to help you verify the time settings.
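
Because the right interval depends on how long a crawl takes, a scheduled run may start while the previous one is still going. One possible guard is sketched below, assuming a wrapper script is what cron calls; ''run_crawl_once'' and the lock directory name are hypothetical and not part of M2-crawler.sh.

```shell
#!/bin/sh
# Hypothetical cron wrapper: skips a run if the previous crawl has not finished.
# mkdir is atomic, so the directory doubles as a simple lock.
LOCK_DIR="${TMPDIR:-/tmp}/m2-crawler.lock"

run_crawl_once() {
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "previous crawl still running, skipping" >&2
    return 1
  fi
  # Run the command given as arguments, then release the lock.
  "$@"
  status=$?
  rmdir "$LOCK_DIR"
  return "$status"
}

# Usage from the wrapper script (paths are placeholders):
#   run_crawl_once bash path_to_script/M2-crawler.sh SITE-MAP-URL -m -i 0.2
```

If a crawl is killed midway, the lock directory has to be removed by hand before the next run will start.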
  
-===== How to Verify =====
-By using [[https://developers.google.com/web/tools/chrome-devtools/ | the browser developer tool]], you should see ''X-LiteSpeed-Cache: hit,litemage'' at the first view \\
+===== How to Verify the Crawler is Working =====
+Using [[https://developers.google.com/web/tools/chrome-devtools/|the browser developer tool]], load a page that was not cached before the crawl ran. You should see ''X-LiteSpeed-Cache: hit,litemage'' on the first view.

 {{:litespeed_wiki:cache:litemage2:m2-3.png?600|}}
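
The same check can be scripted from the command line. This is a sketch that assumes ''curl'' is installed; ''is_cache_hit'' is a hypothetical helper that reads response headers on standard input and succeeds when the ''X-LiteSpeed-Cache'' header reports a hit. The URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: reads HTTP response headers on stdin and
# succeeds if the X-LiteSpeed-Cache header value starts with "hit".
is_cache_hit() {
  grep -qi '^x-litespeed-cache:[[:space:]]*hit'
}

# Usage (needs network access; -D - prints headers, -o /dev/null drops the body):
#   curl -s -o /dev/null -D - http://example.com/test.html | is_cache_hit && echo "cached"
```

A response may differ by user-agent or cookies, so a hit from ''curl'' does not guarantee that every browser sees a hit on its first view.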
  • Last modified: 2020/07/08 19:35
  • by Lisa Clarke