Problem with sitemap

#1
Hi all,

I am having problems with a website (prestashop1767). Today I downloaded your module and the crawler. I run the crawler from the console, but it only crawls the last language listed in the sitemap index XML. I am not sure what is happening, because I use it on other stores that also have several languages, and there it works correctly.

Command:
Code:
bash cachecrawler.sh -c -m https://web.site.com/1_index_sitemap.xml
My xml sitemap:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap><loc>https://site.web.com/1_es_0_sitemap.xml</loc><lastmod>2020-07-15T09:23:41+02:00</lastmod></sitemap>
<sitemap><loc>https://site.web.com/1_ca_0_sitemap.xml</loc><lastmod>2020-07-15T09:23:41+02:00</lastmod></sitemap>
<sitemap><loc>https://site.web.com/1_gl_0_sitemap.xml</loc><lastmod>2020-07-15T09:23:41+02:00</lastmod></sitemap>
<sitemap><loc>https://site.web.com/1_eu_0_sitemap.xml</loc><lastmod>2020-07-15T09:23:41+02:00</lastmod></sitemap>
</sitemapindex>
Output:
Code:
SiteMap connection success

Prepare to crawl https://site.web.com/1_eu_0_sitemap.xml XML file
There are 33 urls in this sitemap
Starting to view with desktop agent...
https://site.web.com/es/blog -> Cache page
https://site.web.com/eu/ -> Already cached
https://site.web.com/eu/3-clothes -> Already cached
https://site.web.com/eu/4-men -> Already cached
https://site.web.com/eu/5-women -> Already cached
https://site.web.com/eu/6-accessories -> Already cached
https://site.web.com/eu/7-stationery -> Already cached
https://site.web.com/eu/8-home-accessories -> Already cached
https://site.web.com/eu/9-art -> Already cached
https://site.web.com/eu/art/12-mountain-fox-vector-graphics.html -> Already cached
https://site.web.com/eu/art/13-brown-bear-vector-graphics.html -> Already cached
https://site.web.com/eu/art/14-hummingbird-vector-graphics.html -> Already cached
https://site.web.com/eu/art/3-the-best-is-yet-to-come-framed-poster.html -> Already cached
https://site.web.com/eu/art/4-the-adventure-begins-framed-poster.html -> Already cached
https://site.web.com/eu/art/5-today-is-a-good-day-framed-poster.html -> Already cached
https://site.web.com/eu/blog/1-home -> No Cache page
https://site.web.com/eu/blog/2-blog-category-sample -> No Cache page
https://site.web.com/eu/home-accessories/10-brown-bear-cushion.html -> Already cached
https://site.web.com/eu/home-accessories/11-hummingbird-cushion.html -> Already cached
https://site.web.com/eu/home-accessories/15-pack-mug-framed-poster.html -> Already cached
https://site.web.com/eu/home-accessories/19-customizable-mug.html -> Already cached
https://site.web.com/eu/home-accessories/6-mug-the-best-is-yet-to-come.html -> Already cached
https://site.web.com/eu/home-accessories/7-mug-the-adventure-begins.html -> Already cached
https://site.web.com/eu/home-accessories/8-mug-today-is-a-good-day.html -> Already cached
https://site.web.com/eu/home-accessories/9-mountain-fox-cushion.html -> Already cached
https://site.web.com/eu/men/1-hummingbird-printed-t-shirt.html -> Already cached
https://site.web.com/eu/men/20-hummingbird-printed-t-shirt.html -> Already cached
https://site.web.com/eu/men/21-hummingbird-printed-t-shirt.html -> Already cached
https://site.web.com/eu/men/22-copy-of-hummingbird-printed-t-shirt.html -> Already cached
https://site.web.com/eu/stationery/16-mountain-fox-notebook.html -> Already cached
https://site.web.com/eu/stationery/17-brown-bear-notebook.html -> Already cached
https://site.web.com/eu/stationery/18-hummingbird-notebook.html -> Already cached
https://site.web.com/eu/women/2-brown-bear-printed-sweater.html -> Already cached
***Total of 20 seconds to finish process***
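
One way to narrow this down is to list the child sitemaps that the index actually exposes and then crawl each one on its own. The sketch below is only a diagnostic idea, not part of the LiteSpeed crawler: `list_child_sitemaps` and `crawl_each_child` are hypothetical helper names, and the crawler call simply reuses the options from the command above, assuming the script also accepts a child sitemap URL in that position.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cachecrawler.sh): print every <loc> URL
# found in a sitemap index read from standard input, one URL per line.
list_child_sitemaps() {
    # grep -o emits each <loc>...</loc> match on its own line, so this also
    # works when several <sitemap> entries share a single line.
    grep -o '<loc>[^<]*</loc>' | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# Hypothetical helper: download the index given as $1 and run the crawler
# once per child sitemap, with the same options as in the original command.
crawl_each_child() {
    local index_url="$1" child
    curl -s "$index_url" | list_child_sitemaps | while read -r child; do
        echo "== $child"
        bash cachecrawler.sh -c -m "$child"
    done
}
```

For example, `crawl_each_child https://web.site.com/1_index_sitemap.xml` would run the crawler four times for the index shown above. If each child sitemap crawls fine individually while the index run only reaches the last one, that would point at how the index is read rather than at the pages themselves.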
 

Pong

Administrator
Staff member
#4
First of all, you should confirm whether the cache is working for all language URLs.
For example, assuming you have a different URL for each language, like the following:
https://site.web.com/eu/5-wome
https://site.web.com/en/5-wome
https://site.web.com/jp/5-wome

You should manually check whether all of them return a cache hit header. If they do, they are all cacheable; if not, some URLs are not cacheable.

Second, if all of them are cacheable, check whether your sitemap includes all the correct URLs. If it does, the crawler should work. If something is missing, you will need to check the sitemap itself.

It would be better to provide a real URL example for us to look at.
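
The manual header check described above could be scripted roughly like this. It is a sketch under assumptions: it looks for the `X-LiteSpeed-Cache` response header, and `cache_status` and `check_url` are hypothetical helper names rather than anything shipped with the module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers from standard input and
# print the value of the X-LiteSpeed-Cache header (for example "hit" or
# "miss"), or "none" if the header is absent.
cache_status() {
    awk -F': *' '
        tolower($1) == "x-litespeed-cache" {
            gsub(/\r/, "", $2)      # strip the CR that ends each HTTP header line
            print tolower($2)
            found = 1
            exit
        }
        END { if (!found) print "none" }
    '
}

# Hypothetical helper: request the URL twice and report the cache status of
# the second response, because the first request may itself fill the cache.
check_url() {
    curl -s -o /dev/null "$1"
    printf '%s -> %s\n' "$1" "$(curl -s -o /dev/null -D - "$1" | cache_status)"
}
```

For example, `for lang in es ca gl eu; do check_url "https://site.web.com/$lang/"; done` would print one line per language home page. A language that keeps reporting `miss` or `none` on the second request is probably not cacheable, which would be a separate problem from the crawler.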
 
#5
Do you have other PrestaShop sites where the crawler works?
Yes, I have two more sites with the same config. One of them is caching fine and the other has a different issue; you can check my thread next to this one.

If one of the languages is cacheable, what would cause the rest not to be? I have been using PS1.6 and TB for more than 5 years and I think I understand the fundamentals pretty well, but I can't figure out the reason for that.
 

Pong

Administrator
Staff member
#6
In that case, it is not a crawler issue: some URLs are cacheable and some are not. You can log a ticket with real URLs, your PrestaShop login and your root SSH log for us to check.
 
#7
It only works for the 'eu' language, which, as I said, is the last one in my sitemap file. All the URLs in the other languages are fine, but the cache does not work for them: I always see 'miss'.

Where can I open a ticket?
 

serpent_driver

Well-Known Member
#8
Tell me if you want an alternative to the LSCache crawler for warming up the cache. This method is up to 10 times faster than LScrawler, but you must be familiar with phpMyAdmin, MS PowerShell and cURL. LScrawler has some deficits, so it will take some time to get a better one.
 
#9
OK, I'm familiar with phpMyAdmin and cURL, but I can't run MS PowerShell on my server. It has to be Bash, so that I can set up a daily cron job.
 

serpent_driver

Well-Known Member
#10
For this method you need PowerShell (already installed on every Windows-based computer) and you must run it from your local computer, because the method requires curl (for Windows) newer than version 7.33. curl is part of the OS, and most servers only have version 7.29 installed. With this method the cache warm-up is several times faster, cached pages are compressed, you save a lot of disk space on your server, and the protocol used is HTTP/2. LScrawler can't do this.
 
#13
Check the curl version on your server by running:
Code:
curl --version
curl 7.52.1 (x86_64-pc-linux-gnu) libcurl/7.52.1 OpenSSL/1.0.2u zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) libssh2/1.7.0 nghttp2/1.18.1 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL
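
The version output can be compared against the requirement mentioned earlier in the thread (curl newer than 7.33, with HTTP/2 support) using a small helper. This is only a sketch: `check_curl` is a hypothetical name, and it assumes the usual `curl --version` layout with the version on the first line and a `Features:` line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `curl --version` output from standard input and
# report whether the version is newer than 7.33 (the figure quoted in the
# thread) and whether HTTP2 appears in the Features line.
check_curl() {
    awk '
        NR == 1 && $1 == "curl" {
            split($2, v, ".")
            newer = (v[1] + 0 > 7) || (v[1] + 0 == 7 && v[2] + 0 > 33)
            version = $2
        }
        /^Features:/ {
            for (i = 2; i <= NF; i++) if ($i == "HTTP2") http2 = 1
        }
        END {
            printf "version=%s newer_than_7.33=%s http2=%s\n",
                   version, (newer ? "yes" : "no"), (http2 ? "yes" : "no")
        }
    '
}
```

Run it as `curl --version | check_curl`. For the output pasted above it prints `version=7.52.1 newer_than_7.33=yes http2=yes`, so this particular server appears to meet the stated curl requirement without needing a local Windows machine.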
 