litespeed_wiki:cache:litemage2:crawler [2018/07/27 15:41]
Eric Leu [Configuring a single sitemap for all storefronts]
litespeed_wiki:cache:litemage2:crawler [2019/10/23 17:42] (current)
Eric Leu [More Options]
  - SiteMap: Prepare your site's sitemap, e.g. ''<nowiki>http://magento2.com/sitemap.xml</nowiki>''
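Before running the crawler, it can help to confirm that the sitemap URL really returns a sitemap. The sketch below assumes ''curl'' and ''grep'' are available; ''looks_like_sitemap'' is a hypothetical helper name, and it only checks for a ''<urlset>'' or ''<sitemapindex>'' root element rather than fully validating the XML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the text on stdin looks like an XML sitemap,
# i.e. it contains a <urlset> or <sitemapindex> root element.
# This is a quick sanity check, not XML or schema validation.
looks_like_sitemap() {
  grep -qE '<(urlset|sitemapindex)[[:space:]>]'
}

# Example (needs network access; the URL is a placeholder for your own sitemap):
#   curl -fsS http://magento2.com/sitemap.xml | looks_like_sitemap && echo "sitemap OK"
```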
===== How to Use the Crawler Script =====
  - [[https://www.litespeedtech.com/packages/litemage2.0/M2-crawler.sh|Download the crawler script]] (''M2-crawler.sh'').
  - Change the permissions so that the file is executable: ''chmod +x M2-crawler.sh''
  - Run the script: ''bash M2-crawler.sh SITE-MAP-URL''
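The three steps above can be wrapped in a small helper. This is a minimal sketch, assuming ''curl'' is installed and the script is kept as ''./M2-crawler.sh'' in the current directory; ''run_crawler'' is a hypothetical name, and any extra arguments are passed through to the crawler unchanged.

```shell
#!/usr/bin/env bash
# Download URL taken from the steps above; local file name is an assumption.
CRAWLER_URL="https://www.litespeedtech.com/packages/litemage2.0/M2-crawler.sh"
CRAWLER="./M2-crawler.sh"

# Hypothetical helper: fetch the crawler if missing, make it executable,
# then run it against the given sitemap URL plus any extra options.
run_crawler() {
  local sitemap_url="$1"
  shift
  # Download only when there is no local copy yet.
  if [ ! -f "$CRAWLER" ]; then
    curl -fsSL -o "$CRAWLER" "$CRAWLER_URL" || return 1
  fi
  chmod +x "$CRAWLER" || return 1
  # Remaining arguments (e.g. -m, -i 0.2) are forwarded as-is.
  bash "$CRAWLER" "$sitemap_url" "$@"
}
```

For example, ''run_crawler http://magento2.com/sitemap.xml -m -i 0.2'' crawls the default and mobile views with a 0.2-second request interval.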
==== More Options ====
  * ''-h, --help'': Show the help message and exit.
  * ''-m, --with-mobile'': Crawl the mobile view in addition to the default view.
  * ''-c, --with-cookie'': Crawl with the site's cookies.
  * ''-b, --black-list'': Add a page to the blacklist if it returns an HTTP error status and is not cached. The next run will skip that page.
  * ''-g, --general-ua'': Use a general user agent instead of ''lscache_runner'' for the desktop view.
  * ''-i, --interval'': Change the request interval. ''-i 0.2'' changes it from the default 0.1 seconds to 0.2 seconds.
  * ''-v, --verbose'': Save the complete response headers to ''/tmp/crawler.log''.
  * ''-d, --debug-url'': Test one URL directly, as in ''bash M2-crawler.sh -d http://example.com/test.html''.
  * ''-qs, --crawl-qs'': Crawl the sitemap, including URLs with query strings.
  * ''-r, --report'': Display the total count of crawl results.
Example commands:
  * To get help: ''bash M2-crawler.sh -h''
  * To change the request interval from the default 0.1 s to NUM seconds: ''bash M2-crawler.sh SITE-MAP-URL -i NUM''
  * To crawl with the site's cookies: ''bash M2-crawler.sh -c SITE-MAP-URL''
  * To store the log in ''/tmp/crawler.log'': ''bash M2-crawler.sh -v SITE-MAP-URL''
  * To debug one URL and print the result on screen: ''bash M2-crawler.sh -d SITE-URL''
  * To display the total count of crawl results: ''bash M2-crawler.sh -r SITE-MAP-URL''
NOTE: Multiple parameters can be used in the same command.
===== How to Generate a Sitemap =====
Magento 2 has a built-in module for generating a sitemap, and it is fast.
==== Enable Sitemap ====
Navigate to **Magento Admin > Stores > Settings > Configuration > Catalog > XML Sitemap**
{{:litespeed_wiki:cache:litemage2:m2-4.png?600|}}
Under **Generation Settings**, set **Enabled** to ''Yes''.
{{:litespeed_wiki:cache:litemage2:m2-5.png?600|}}
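If you prefer the command line, Magento 2.2 and later provide ''bin/magento config:set''. The sketch below assumes that the **Generation Settings > Enabled** option is stored under the config path ''sitemap/generate/enabled'' (our reading of the Magento Sitemap module's configuration; please confirm it for your version) and that it is run as the Magento file-system owner. ''enable_sitemap_generation'' is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable sitemap generation through the Magento CLI.
# Assumption: the admin option maps to the config path sitemap/generate/enabled.
enable_sitemap_generation() {
  local magento_root="$1"   # your Magento 2 installation directory
  "$magento_root/bin/magento" config:set sitemap/generate/enabled 1
}

# Example (path is a placeholder):
#   enable_sitemap_generation /var/www/magento2
```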
==== Configuring a Single Sitemap for All Storefronts ====
Navigate to **Magento Admin > Marketing > SEO & Search > Sitemap**
  - Click the **Add Sitemap** button
  - Enter these values:
    * **Filename**: ''sitemap.xml''
    * **Path**: ''/''
  - Click the **Save & Generate** button
{{:litespeed_wiki:cache:litemage2:m2-2.png?600|}} \\
If all went well, a ''sitemap.xml'' file will have been generated in your Magento 2 document root.
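To see how many pages the crawler will request, you can count the ''<loc>'' entries in the generated file. This is a minimal sketch with a hypothetical helper name; ''grep -o'' prints every match on its own line, so it also works when the whole sitemap sits on a single line, and product image entries written as ''<image:loc>'' are not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count page URLs (<loc> tags) in a sitemap file.
# "<image:loc>" does not match the pattern, so image entries are excluded.
count_sitemap_urls() {
  grep -o '<loc>' "$1" | wc -l | tr -d '[:space:]'
}

# Example (path is a placeholder for your Magento 2 document root):
#   count_sitemap_urls /path/to/magento2/sitemap.xml
```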
===== Crawl Interval =====
How often should you re-initiate the crawling process? That depends on how long it takes to crawl your site and on the value you set for Public Cache TTL.
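A rough lower bound for one crawl run is the number of URLs multiplied by the request interval (''-i'', default 0.1 s), assuming the script pauses for that interval between consecutive requests. We also assume ''-m'' roughly doubles the number of requests, since it crawls the mobile view in addition to the default view. Actual server response time comes on top of this. ''estimate_crawl_seconds'' is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lower-bound estimate, in seconds, of one crawl run.
#   $1 = number of URLs in the sitemap
#   $2 = request interval in seconds (default 0.1)
#   $3 = number of views crawled (1 = default view, 2 = with -m; assumption)
estimate_crawl_seconds() {
  local urls="$1" interval="${2:-0.1}" views="${3:-1}"
  # awk is used because bash arithmetic is integer-only.
  awk -v n="$urls" -v i="$interval" -v v="$views" \
    'BEGIN { printf "%.0f\n", n * i * v }'
}
```

For example, ''estimate_crawl_seconds 5000 0.1 2'' prints ''1000'': a 5,000-URL sitemap crawled with ''-m'' needs at least about 17 minutes, which you can compare against your TTL.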

The default TTL is one day (24 hours), so you may, for example, want to run the script from a cron job every 12 hours.
E.g. this entry runs twice a day, at 3:30 and 15:30: ''30 3,15 * * * path_to_script/M2-crawler.sh SITE-MAP-URL -m -i 0.2''
Note: You can also use an [[https://crontab.guru/|online crontab tool]] to verify the time settings.
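In the hour field, ''3,15'' lists two separate hours. By contrast, ''3/15'' is step syntax ("every 15 hours starting at 3"), which fires at 3:00 and 18:00 rather than 15:00. The sketch below builds the crontab entry and appends it to the current user's crontab; ''crawler_cron_line'' is a hypothetical helper, and the script path and sitemap URL are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one crontab entry for the crawler.
#   $1 = cron schedule, $2 = script path, $3 = sitemap URL, rest = crawler options
crawler_cron_line() {
  local schedule="$1" script="$2" sitemap="$3"
  shift 3
  printf '%s %s %s %s\n' "$schedule" "$script" "$sitemap" "$*"
}

# Append the entry to the existing crontab (placeholders shown):
#   (crontab -l 2>/dev/null; crawler_cron_line '30 3,15 * * *' \
#       /path/to/M2-crawler.sh http://magento2.com/sitemap.xml -m -i 0.2) | crontab -
```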
===== How to Verify the Crawler Is Working =====
Using [[https://developers.google.com/web/tools/chrome-devtools/|the browser developer tools]], load a page you have not visited since the crawler ran. You should see ''X-LiteSpeed-Cache: hit,litemage'' on the first view, which means the crawler had already cached the page.
{{:litespeed_wiki:cache:litemage2:m2-3.png?600|}}
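The same check can be scripted with ''curl''. A minimal sketch: ''is_litemage_hit'' is a hypothetical helper that reads response headers on stdin and succeeds when ''X-LiteSpeed-Cache'' contains ''hit''. Note that a ''curl'' request differs from a browser's (user agent, cookies), so depending on your cache vary settings it may map to a different cache entry than the one the crawler warmed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the headers on stdin report a LiteSpeed cache hit.
# Header names are case-insensitive, so match with grep -i.
is_litemage_hit() {
  grep -i '^x-litespeed-cache:' | grep -qi 'hit'
}

# Example: GET the page, discard the body, dump headers to stdout (URL is a placeholder).
#   curl -s -o /dev/null -D - http://magento2.com/some-page.html | is_litemage_hit \
#     && echo "served from LiteMage cache"
```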
Last modified: 2018/07/27 15:41 by Eric Leu