Issue with cache crawler for PS

#1
Hello,

Recently I talked with our hosting provider, and they decided to test the crawler on one of their servers, so I was moved there.

I have 3 websites, all running on Thirtybees, a PS derivative.

The module is working fine and I'm using Warehouse as theme.

Here comes the issue:

On one of the sites I receive the following in the log:

Caching

On the next, this one:

Already cached

And on the last, this one:

No Cache page

Obviously the cache is warmed only on the second website.

A strange thing is that the cache says 'missed' even on the second or third reload on the two websites that show the other messages.

Any hints?
 

serpent_driver

Well-Known Member
#4
That is what it is meant for. It accepts cookies set by the shop. Also take care with the -m parameter if you have enabled mobile view in the Presta settings. See the documentation of the crawler script.
 

serpent_driver

Well-Known Member
#6
What does the cookie do in this case?
A cookie can have a wide range of effects, depending on the application, the cache plugin and its settings.

To find the cause of your issue, set up a sitemap file with a single URL. Purge the cache, request the URL with a browser and run the crawler. Check whether the cache header is miss or hit, then come back.
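
As a rough illustration of the test described above, here is a minimal Python sketch. It builds a one-URL sitemap and classifies a response by its cache headers. The function names and the example URL in the note below are made up for illustration and are not part of the crawler script. The header names `x-litespeed-cache` and `x-litespeed-cache-control` are the ones discussed in this thread; the exact values you see may differ with your setup.

```python
import urllib.request
from xml.sax.saxutils import escape


def single_url_sitemap(url: str) -> str:
    """Return a minimal sitemap document containing exactly one URL."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"  <url><loc>{escape(url)}</loc></url>\n"
        "</urlset>\n"
    )


def cache_status(headers: dict) -> str:
    """Classify a response from its LiteSpeed cache headers.

    Returns "no-cache" if the page was marked as not cacheable,
    otherwise the raw x-litespeed-cache value (for example "hit" or "miss"),
    or "unknown" if neither header is present.
    """
    lowered = {k.lower(): v.strip().lower() for k, v in headers.items()}
    if "no-cache" in lowered.get("x-litespeed-cache-control", ""):
        return "no-cache"
    return lowered.get("x-litespeed-cache", "unknown")


def fetch_headers(url: str) -> dict:
    """Request the URL and return its response headers (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return dict(response.headers.items())
```

For example, `cache_status(fetch_headers("https://shop.example/en/"))` (a hypothetical URL) can be run once after the purge and once after the crawler run; if the second call does not report a hit, the crawler did not warm that page.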
 
#7
I added the cookie option and also turned Result ON. It reports:


=====================Crawl result:=======================
Total URLs : 152
Added : 0
Existing : 0
Skipped : 151 (Page with 'no cache', please check cache debug log for the reason)
Failed : 1 (Pages with status code '400'|'401'|'403'|'404'|'407|'500'|'502' may add into blacklist)

No improvement.

Also, only one language is scanned; the second one is skipped.

Where is this log located? I can't find it in my root folder (should the tmp folder be at the same level as crawler.sh?)
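
When the crawler output is saved or pasted as plain text, the terminal color codes tend to come along and make the summary hard to read. The following is a small Python sketch, not part of the crawler itself, that strips those codes and pulls out the counters from the "Crawl result" summary. The function names are hypothetical, and the pattern assumes the counter lines look like the ones in the result posted above.

```python
import re

# Matches color sequences such as "\x1b[38;5;148m". The ESC byte is optional
# because it is often lost when a log is pasted as plain text. Note that this
# also removes any literal "[<digits>m" that happens to appear in the text.
ANSI_SGR = re.compile(r"\x1b?\[[0-9;]*m")

# Matches summary lines of the form "Name : 123 ...".
COUNTER = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(\d+)")


def strip_ansi(text: str) -> str:
    """Remove terminal color codes from the text."""
    return ANSI_SGR.sub("", text)


def parse_crawl_result(log: str) -> dict:
    """Return the summary counters as a dict, e.g. {"Skipped": 151}."""
    counts = {}
    for line in strip_ansi(log).splitlines():
        match = COUNTER.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts
```

A result where "Skipped" is almost equal to "Total URLs", as in the post above, means nearly every page was answered with 'no cache', so the crawler itself is running but the pages are not cacheable for the request it sends.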
 

serpent_driver

Well-Known Member
#10
Something is wrong with your server configuration. On first load I get a blank page. The page must be requested twice to complete the request. Sorry, without access to your server I can't help. Open a support ticket to get more qualified support.
 
#13
Same for me, dude: the first time, I receive a blank page... And if I clear the browser's cookies, I receive a blank page again.
 
#14
I was able to reproduce it with Incognito mode. Very strange, never stumbled upon that.

Hopefully now it's working. I changed 'No default guest view' to "Yes, has" and it appears to work for me (no blank page on first load).

Now I've noticed another thing:


x-litespeed-cache-control: no-cache

is set. What could be causing it?

Also, what exactly does "Enable Guest Mode" do with each of its settings? The description is not very descriptive (I think I understand English, but none of the info makes sense to me: what is expected, what is fastest, etc.)
 
#15
Here is my robots file:


Code:
# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
# http://www.prestashop.com - http://www.prestashop.com/forums
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/robotstxt.html
User-agent: *
# Allow Directives
Allow: */modules/*.css
Allow: */modules/*.js
# Private pages
Disallow: /*?orderby=
Disallow: /*?orderway=
Disallow: /*?tag=
Disallow: /*?id_currency=
Disallow: /*?search_query=
Disallow: /*?back=
Disallow: /*?n=
Disallow: /*&orderby=
Disallow: /*&orderway=
Disallow: /*&tag=
Disallow: /*&id_currency=
Disallow: /*&search_query=
Disallow: /*&back=
Disallow: /*&n=
Disallow: /*controller=addresses
Disallow: /*controller=address
Disallow: /*controller=authentication
Disallow: /*controller=cart
Disallow: /*controller=discount
Disallow: /*controller=footer
Disallow: /*controller=get-file
Disallow: /*controller=header
Disallow: /*controller=history
Disallow: /*controller=identity
Disallow: /*controller=images.inc
Disallow: /*controller=init
Disallow: /*controller=my-account
Disallow: /*controller=order
Disallow: /*controller=order-opc
Disallow: /*controller=order-slip
Disallow: /*controller=order-detail
Disallow: /*controller=order-follow
Disallow: /*controller=order-return
Disallow: /*controller=order-confirmation
Disallow: /*controller=pagination
Disallow: /*controller=password
Disallow: /*controller=pdf-invoice
Disallow: /*controller=pdf-order-return
Disallow: /*controller=pdf-order-slip
Disallow: /*controller=product-sort
Disallow: /*controller=search
Disallow: /*controller=statistics
Disallow: /*controller=attachment
Disallow: /*controller=guest-tracking
# Directories
Disallow: */classes/
Disallow: */config/
Disallow: */download/
Disallow: */mails/
Disallow: */modules/
Disallow: */translations/
Disallow: */tools/
# Files
Disallow: /*en/password-recovery
Disallow: /*en/address
Disallow: /*en/addresses
Disallow: /*en/authentication
Disallow: /*en/cart
Disallow: /*en/discount
Disallow: /*en/order-history
Disallow: /*en/identity
Disallow: /*en/my-account
Disallow: /*en/order-follow
Disallow: /*en/order-slip
Disallow: /*en/order
Disallow: /*en/search
Disallow: /*en/quick-order
Disallow: /*en/guest-tracking
Disallow: /*en/order-confirmation
Disallow: /*bg/password-recovery
Disallow: /*bg/address
Disallow: /*bg/addresses
Disallow: /*bg/authentication
Disallow: /*bg/cart
Disallow: /*bg/discount
Disallow: /*bg/order-history
Disallow: /*bg/identity
Disallow: /*bg/my-account
Disallow: /*bg/order-follow
Disallow: /*bg/order-slip
Disallow: /*bg/order
Disallow: /*bg/search
Disallow: /*bg/quick-order
Disallow: /*bg/guest-tracking
Disallow: /*bg/order-confirmation
# Sitemap
Sitemap: https://www.rampagesport.eu/1_index_sitemap.xml
 

serpent_driver

Well-Known Member
#17
...means a customer doesn't need to create an account, but must enter all personal data again each time they buy an item as a guest.

robots.txt has nothing to do with caching.

RewriteRule .* - [E=Cache-Control:vary=guest]
This is only a cache-control setting for guests and should be used together with Guest Mode. If Guest Mode is enabled but this cache-control rule for guests is missing, caching will not work correctly.
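
To see quickly whether a given .htaccess still carries that rule, something like the following Python sketch could be used. It is only an illustration: the function name is hypothetical, and it checks just for an active (not commented out) `RewriteRule` with the `E=Cache-Control:vary=guest` flag. It says nothing about whether Guest Mode is enabled in the cache module.

```python
import re

# An active RewriteRule line whose flag list contains the guest vary setting.
# Lines starting with "#" are comments and do not match.
GUEST_VARY = re.compile(
    r"^[ \t]*RewriteRule[ \t]+\S+[ \t]+\S+[ \t]+"
    r"\[[^\]]*E=Cache-Control:vary=guest[^\]]*\]",
    re.IGNORECASE | re.MULTILINE,
)


def has_guest_vary(htaccess_text: str) -> bool:
    """Return True if the text contains an active guest vary rewrite rule."""
    return GUEST_VARY.search(htaccess_text) is not None
```

Reading the file with `open(".htaccess").read()` and passing the text to `has_guest_vary` is enough; the path is whatever your shop root uses.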
 
#18
I know what the Guest mode does in PS/TB.

Do I need Guest Mode enabled in the LiteSpeed cache module? That was my question. And what does each of its settings do?

I deleted this rewrite rule and now I get a hit on the second load, but I still have issues with the crawler: same result. I'm out of ideas other than completely removing the .htaccess file and starting from scratch.
 

serpent_driver

Well-Known Member
#20
Do I need Guest Mode enabled in the LiteSpeed cache module? That was my question.
Already answered.....

I can't tell you what is causing your issue. For testing and development I have had a PrestaShop installation with the LScache plugin for 2 1/2 years, without any problems with the cache, the crawler or anything else. As already suggested, open a support ticket.
 