Bots and some user agents do not accept cookies. How can these be detected?

AndreyPopov

Well-Known Member
#1
Google and other bots do not accept cookies when crawling a site.
Some devices also do not accept cookies.

For mobile devices, lscache uses a separate view and the _lscache_vary cookie.
Google's mobile bot does not accept the cookie, so it cannot use the cache copy that lscache provides.

Is there a way to detect that a UA does not accept cookies and direct it to a cache copy without the _lscache_vary cookie?
 

serpent_driver

Well-Known Member
#2
for mobile devices, lscache uses a separate view and the _lscache_vary cookie
lscache itself does not use cookies; your plugin uses the vary cookie. In my view, using a cookie to store which device is in use is absolutely the wrong method. Ask the LiteSpeed staff whether they will redesign this plugin.
 

AndreyPopov

Well-Known Member
#3
your plugin uses the vary cookie.
Yes, the lscache OpenCart plugin uses the vary cookie.
For most variants now (mobile view, Apple Safari view), everything works with the vary cookie and vary control in .htaccess.


I added code that detects bots and does not set the cookie (it clears the vary):
Code:
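        // Bots do not accept cookies, so clear the vary keys and skip the vary cookie.
        // stripos() is case-insensitive, so it covers both "Bot" and "bot".
        if (isset($_SERVER['HTTP_USER_AGENT']) && stripos($_SERVER['HTTP_USER_AGENT'], 'bot') !== FALSE) {
            $vary = array();
        }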
With this, bots can use lscache.

But I see in the log that some devices/UAs also do not accept cookies.
Is there a way to detect that a device/UA does not accept cookies?
 

serpent_driver

Well-Known Member
#4
is exist way to detect that devices/UA not accept cookie?
Of course there is a way, but any other way brings new issues. The consequence of this dilemma could be to disable the cache for any user that doesn't accept cookies.

Apache config:
RewriteCond %{HTTP_COOKIE} !name_of_vary_cookie [NC]
RewriteRule .* - [E=Cache-Control:no-cache]
 

AndreyPopov

Well-Known Member
#5
Of course there is a way, but any other way brings new issues. The consequence of this dilemma could be to disable the cache for any user that doesn't accept cookies.

Apache config:
RewriteCond %{HTTP_COOKIE} !name_of_vary_cookie [NC]
RewriteRule .* - [E=Cache-Control:no-cache]
Not this way.

OK, let me describe it step by step, using Google's mobile bot as the example:
1. UA
Code:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.137 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2. In .htaccess:
Code:
RewriteCond %{HTTP_USER_AGENT} Android [NC]
RewriteCond %{HTTP_USER_AGENT} Chrome [NC]
RewriteCond %{HTTP_USER_AGENT} Bot [NC]
RewriteRule .* - [E=Cache-Control:vary=ismobilebot]
3. The Mobile Detect algorithm sets _lscache_vary to device:mobile,
and the plugin calls setcookie("_lscache_vary", "device:mobile", 0, "/")

4.
a)
Code:
$_SERVER['LSCACHE_VARY_VALUE']=ismobilebot
b)
Code:
$_SERVER['LSCACHE_VARY_VALUE']=ismobilebot
$_COOKIE['_lscache_vary']=device:mobile
a) and b) are different copies of the cache.

5. The lscache plugin serves Google's mobile bot the b) variant of the cache, and the bot cannot use this copy because it requires a).

6. I added this code to the algorithm that sets _lscache_vary:
Code:
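        // Bots do not accept cookies, so clear the vary keys and skip the vary cookie.
        // stripos() is case-insensitive, so it covers both "Bot" and "bot".
        if (isset($_SERVER['HTTP_USER_AGENT']) && stripos($_SERVER['HTTP_USER_AGENT'], 'bot') !== FALSE) {
            $vary = array();
        }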
7. lscache now always serves copy a) of the cache to Google's mobile bot, and the bot can use it.

I solved the problem for bots.
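
The difference between copies a) and b) in the steps above can be sketched in PHP. This is only an illustrative model: `cacheVariant()` is a hypothetical helper, and how LiteSpeed really combines the rewrite-rule vary value with the `_lscache_vary` cookie internally is not shown in this thread.

```php
<?php
// Hypothetical model of a cache variant: the vary value from .htaccess
// plus the value of the _lscache_vary cookie sent by the client (if any).
function cacheVariant(string $envVary, array $cookies): string
{
    $cookieVary = isset($cookies['_lscache_vary']) ? $cookies['_lscache_vary'] : '';
    return $envVary . '|' . $cookieVary;
}

// a) Google's mobile bot never sends cookies back, so its requests look like this:
$a = cacheVariant('ismobilebot', array());

// b) the variant that exists once the plugin has set device:mobile:
$b = cacheVariant('ismobilebot', array('_lscache_vary' => 'device:mobile'));

var_dump($a === $b); // bool(false): a request without the cookie never matches b)
```

Clearing `$vary` for bots means no cookie value is set for them, so under this model the stored copy and the bot's later requests both fall under a).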

To be continued later...
 

AndreyPopov

Well-Known Member
#6
Is it the right approach to add this function:


Code:
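    protected function checkCookiesEnabled() {

        // $_COOKIE holds only what the client sent with this request,
        // so the first visit always reports "disabled".
        if (isset($_COOKIE['checkCookies'])) {
            // Delete the probe cookie by expiring it in the past.
            // Note: the next request will then report "disabled" again.
            setcookie("checkCookies", "", time() - 3600, "/");
            error_log('cookies enabled in browser');
            return TRUE;
        }

        // Send the probe cookie (expires = 0 makes it a session cookie).
        setcookie("checkCookies", "Enabled", 0, "/");
        error_log('cookies disabled in browser');
        return FALSE;

    }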
and call it in the code as $this->checkCookiesEnabled()?
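
One caveat with a probe like this: `setcookie()` only adds a `Set-Cookie` header to the response, while `$_COOKIE` holds what the client sent with the current request, so cookie support can only be judged on a later request. A minimal sketch of such a two-step check follows; `cookieSupport()` and the `cookie_probe` query marker are hypothetical names for illustration and are not part of the lscache plugin.

```php
<?php
// Hypothetical helper: classify a request by what it carried.
// $cookies stands in for $_COOKIE, $query for $_GET.
function cookieSupport(array $cookies, array $query): string
{
    if (isset($cookies['checkCookies'])) {
        return 'enabled';   // the probe cookie came back
    }
    if (isset($query['cookie_probe'])) {
        return 'disabled';  // the probe was already sent, but was not returned
    }
    return 'unknown';       // first request: nothing can be concluded yet
}

echo cookieSupport(array(), array()), "\n";                         // unknown
echo cookieSupport(array(), array('cookie_probe' => '1')), "\n";    // disabled
echo cookieSupport(array('checkCookies' => 'Enabled'), array()), "\n"; // enabled
```

On an "unknown" result the caller would set the probe cookie and redirect to the same URL with `?cookie_probe=1`. This only works while PHP actually runs for the request, that is, when the page is not served from cache.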
 

AndreyPopov

Well-Known Member
#8
None of that will solve anything. If a page is cached, any PHP code is worthless! A cached page is pure HTML, and PHP can't be executed!
catalog/controller/extension/module/lscache.php
contains the checkVary() function:
Code:
    protected function checkVary() {
     
        $vary = array();
     
        if ($this->customer->isLogged() && isset($this->lscache->setting['module_lscache_vary_login']) && ($this->lscache->setting['module_lscache_vary_login']=='1'))  {
            $vary['session'] = 'loggedIn';
        }
     
        if (($device=$this->checkMobile()) && isset($this->lscache->setting['module_lscache_vary_mobile']) && ($this->lscache->setting['module_lscache_vary_mobile']=='1'))  {
            $vary['device'] = $device;
        }
     
        if($this->session->data['currency']!=$this->config->get('config_currency')){
            $vary['currency'] = $this->session->data['currency'];
        }
     
        if((isset($this->session->data['language'])) && ($this->session->data['language']!=$this->config->get('config_language'))){
            $vary['language'] = $this->session->data['language'];
        }
     
        if ((count($vary) == 0) && (isset($_COOKIE['lsc_private']) || defined('LSC_PRIVATE'))) {
            $vary['session'] = 'loggedOut';
        }

        ksort($vary);

        $varyKey = $this->implode2($vary, ',', ':');
         
        //$this->log('vary:' . $varyKey, 0);
        $this->lscache->lscInstance->checkVary($varyKey);
    }
This function:
1. detects which keys to set in _lscache_vary: loggedOut, loggedIn, language, currency, device, etc.
2. calls $this->lscache->lscInstance->checkVary() to set the cookie
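
For illustration, the key that ends up in `_lscache_vary` can be reproduced outside the plugin. In this sketch `joinVary()` is a hypothetical stand-in for the plugin's `implode2()` helper, whose source is not shown in the thread; the output format is assumed from the `device:mobile` and `language:ua-ua` values quoted elsewhere in this thread.

```php
<?php
// Hypothetical stand-in for implode2($vary, ',', ':'):
// joins the pairs as key:value, separated by commas.
function joinVary(array $vary, string $pairGlue, string $kvGlue): string
{
    $parts = array();
    foreach ($vary as $key => $value) {
        $parts[] = $key . $kvGlue . $value;
    }
    return implode($pairGlue, $parts);
}

$vary = array('language' => 'ua-ua', 'device' => 'mobile');
ksort($vary); // same ordering step as in checkVary()
echo joinVary($vary, ',', ':'), "\n"; // device:mobile,language:ua-ua

// An empty array gives an empty key, i.e. no vary value for the cookie.
var_dump(joinVary(array(), ',', ':')); // string(0) ""
```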



Question:

If a device/browser/UA does not accept cookies,
is it necessary to detect all the vary keys and then send varyKey to set the _lscache_vary cookie?
 

AndreyPopov

Well-Known Member
#10
Knock! Knock! Why do you always ask for a solution where PHP is needed if a cached page can't execute PHP?!
We are talking about different things!

You keep saying "PHP is needed if a cached page can't execute PHP", both in this thread and in the "no Webp in Safari" thread.

But a solution for "no Webp in Safari" was found, and it works now through the plugin's PHP code and correctly placed .htaccess rules!

I do not need the cached page to execute anything. I only need the device's request to use the right copy of the cache.
I solved the problem of using the right copy of the cache for bots!
Now bots use the cache, and Rebuild/Recache works.

Now I want to solve the problem for other devices/browsers that do not accept cookies and therefore cannot use lscache.
 

serpent_driver

Well-Known Member
#11
Again, if a page is cached, PHP can't be executed, and you need to solve it with .htaccess rewrite rules. .htaccess has no logic to detect whether a client accepts cookies or not, so with .htaccess you can only check whether a specific cookie exists or not. If not -> no-cache. Otherwise clients such as bots will cache pages the wrong way.

RewriteCond %{HTTP_COOKIE} !name_of_vary_cookie [NC]
RewriteRule .* - [E=Cache-Control:no-cache]
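
To make the condition above concrete, here is a rough PHP equivalent of what the `RewriteCond` tests. It is only a sketch: `shouldSkipCache()` is a hypothetical name, and `_lscache_vary` is used as the cookie name because that is the vary cookie discussed in this thread.

```php
<?php
// Mirrors: RewriteCond %{HTTP_COOKIE} !_lscache_vary [NC]
// Returns true when the Cookie request header does not contain the name.
function shouldSkipCache(?string $cookieHeader, string $varyCookieName): bool
{
    if ($cookieHeader === null || $cookieHeader === '') {
        return true; // no cookies were sent at all
    }
    // Unanchored, case-insensitive substring test, like the [NC] pattern.
    return stripos($cookieHeader, $varyCookieName) === false;
}

var_dump(shouldSkipCache(null, '_lscache_vary'));            // bool(true)
var_dump(shouldSkipCache('PHPSESSID=abc', '_lscache_vary')); // bool(true)
var_dump(shouldSkipCache('PHPSESSID=abc; _lscache_vary=device%3Amobile', '_lscache_vary')); // bool(false)
```

Note that a first-time visitor who does accept cookies also arrives without the cookie, so a check like this shows only whether the cookie was sent, not whether the client would accept one.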
 

serpent_driver

Well-Known Member
#13
Penetration test for your "code"

What happens if these UAs visit your page? ;) They are very popular all over the world, and they are all crawler-like.

okhttp/2.5.0
Safari/14609.1.20.111.8 CFNetwork/978.2 Darwin/18.7.0 (x86_64)
Dalvik/2.1.0 (Linux; U; Android 10; ART-L29 Build/HUAWEIART-L29)
WhatsApp/2.20.200.22 A
curl/7.68.0
AdsTxtCrawler/1.0.2
Apache-HttpClient/4.5.6 (Java/11.0.2)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Java/1.8.0_262
libwww-perl/6.49
Mozilla/5.0 <<<<<---- Google(bot)
Python-urllib/3.5
......
 

AndreyPopov

Well-Known Member
#14
I have this in .htaccess:
Code:
### LITESPEED_CACHE_START - Do not remove this line
<IfModule LiteSpeed>
CacheLookup on
## Uncomment the following directives if you has a separate mobile view
RewriteEngine On
## Uncomment the following directives if you has a separate Safari browser view
RewriteCond %{HTTP_USER_AGENT} Macintosh [NC]
RewriteRule .* - [E=Cache-Control:vary=isMac]
RewriteCond %{HTTP_USER_AGENT} "iPhone|iPad|Petal" [NC]
RewriteRule .* - [E=Cache-Control:vary=ismobileapple]
RewriteCond %{HTTP_USER_AGENT} "bot|yandeximages|cfnetwork|favicon|facebook" [NC]
RewriteCond %{HTTP_USER_AGENT} !Chrome [NC]
RewriteCond %{HTTP_USER_AGENT} !Mobile [NC]
RewriteCond %{HTTP_USER_AGENT} !Macintosh [NC]
RewriteRule .* - [E=Cache-Control:vary=isBot]
RewriteCond %{HTTP_USER_AGENT} Android [NC]
RewriteCond %{HTTP_USER_AGENT} "Chrome|Firefox|Opera|OPR" [NC]
RewriteCond %{HTTP_USER_AGENT} !Bot [NC]
RewriteRule .* - [E=Cache-Control:vary=ismobile]
RewriteCond %{HTTP_USER_AGENT} Android [NC]
RewriteCond %{HTTP_USER_AGENT} Chrome [NC]
RewriteCond %{HTTP_USER_AGENT} Bot [NC]
RewriteRule .* - [E=Cache-Control:vary=ismobilebot]
</IfModule>
### LITESPEED_CACHE_END


Safari/14609.1.20.111.8 CFNetwork/978.2 Darwin/18.7.0 (x86_64)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Mozilla/5.0 <<<<<---- Google(bot)
try to use the isBot copy of the cache.

The others try to use the main copy without any vary :(
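
To check which copy a given UA would be sent to, the rewrite conditions quoted above can be replayed in PHP. This is a sketch under one assumption that the thread does not confirm: when several rules match, the later rule overrides the earlier one. `varyForUserAgent()` is a hypothetical helper, not part of the plugin.

```php
<?php
// Returns the vary value the quoted .htaccess rules would assign to a UA,
// or '' for the main copy. Assumption: a later matching rule wins.
function varyForUserAgent(string $ua): string
{
    // [NC] in the rules means a case-insensitive match.
    $has = function (string $pattern) use ($ua): bool {
        return preg_match('/' . $pattern . '/i', $ua) === 1;
    };

    $vary = '';
    if ($has('Macintosh')) {
        $vary = 'isMac';
    }
    if ($has('iPhone|iPad|Petal')) {
        $vary = 'ismobileapple';
    }
    if ($has('bot|yandeximages|cfnetwork|favicon|facebook')
        && !$has('Chrome') && !$has('Mobile') && !$has('Macintosh')) {
        $vary = 'isBot';
    }
    if ($has('Android') && $has('Chrome|Firefox|Opera|OPR') && !$has('Bot')) {
        $vary = 'ismobile';
    }
    if ($has('Android') && $has('Chrome') && $has('Bot')) {
        $vary = 'ismobilebot';
    }
    return $vary;
}

$samples = array(
    'curl/7.68.0',
    'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)',
);
foreach ($samples as $ua) {
    echo $ua, ' => ', varyForUserAgent($ua) ?: '(main copy)', "\n";
}
```

UAs such as `curl/7.68.0` or `okhttp/2.5.0` match none of the conditions, which is why they land on the main copy.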

A real Googlebot visit:
Code:
07-Oct-2020 18:02:43 Europe/Kiev    lscache variables: vary value: isBot cache control: vary=isBot
07-Oct-2020 18:02:43 Europe/Kiev    lscache defined vary:
07-Oct-2020 18:02:43 Europe/Kiev    desktop,googlebot,oc30,is-guest,route-product-product,product-588,store-0,skin-1,desktop-header-active,mobile-sticky,layout-2
07-Oct-2020 18:02:43 Europe/Kiev    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
07-Oct-2020 18:02:43 Europe/Kiev    /gazzal-baby-cotton-3411
07-Oct-2020 18:02:43 Europe/Kiev    </image/cache/catalog/products/GazzalBabyCotton/gazzal-baby-cotton-3411-550x550h.jpg>;rel=preload;as=image
07-Oct-2020 18:02:43 Europe/Kiev    </image/cache/catalog/main/priazha-main1a-1812x468.png>;rel=preload;as=image,</image/cache/catalog/main/priazha-main1a-150x38fill.png>;rel=preload;as=image



For example, another UA:
Code:
07-Oct-2020 18:03:40 Europe/Kiev    mobile,phone,touchevents,android,chrome,chrome85,webkit,oc30,is-guest,route-product-product,product-2965,store-0,skin-1,mobile-header-active,mobile-sticky,layout-2,has-bottom-menu
07-Oct-2020 18:03:40 Europe/Kiev    lscache variables: _lscache_vary: device:mobile vary value: ismobile cache control: vary=ismobile
07-Oct-2020 18:03:40 Europe/Kiev    lscache defined vary: device:mobile
07-Oct-2020 18:03:40 Europe/Kiev    Mozilla/5.0 (Linux; Android 10; SM-G770F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.127 Mobile Safari/537.36
07-Oct-2020 18:03:40 Europe/Kiev    /yarnart-macrame-cotton-751
07-Oct-2020 18:03:40 Europe/Kiev    </image/cache/catalog/products/YarnArtMacrameCotton/yarnart-macrame-cotton-751-1100x1100w.jpg.webp>;rel=preload;as=image
07-Oct-2020 18:03:40 Europe/Kiev    </image/cache/catalog/main/priazha-main1a-mobile-906x234.png.webp>;rel=preload;as=image,</image/cache/catalog/main/priazha-main1a-300x77fill.png.webp>;rel=preload;as=image
 

serpent_driver

Well-Known Member
#16
now testing another way:

isset( $_SERVER['HTTP_COOKIE'])

Again, if a page is cached, PHP can't be executed, and you need to solve it with .htaccess rewrite rules. .htaccess has no logic to detect whether a client accepts cookies or not, so with .htaccess you can only check whether a specific cookie exists or not. If not -> no-cache. Otherwise clients such as bots will cache pages the wrong way.
 

AndreyPopov

Well-Known Member
#17
1. Why no-cache? Bots can use the cache, just without cookies.
2. I already use a rewrite rule in .htaccess to direct bots to their "own cache copy".
3. My task is to make PHP not set cookies for this "own cache copy".
 

AndreyPopov

Well-Known Member
#19
If a page is already cached, there is no PHP left to run for UA or cookie detection. When will you understand that?!
You don't want to understand this:
the lscache PHP code tries to set a cookie for UAs (bots) that do not accept cookies.

Copies of the cache:
a) vary=isBot without the _lscache_vary cookie
b) vary=isBot with _lscache_vary=language:ua-ua
are different.
 

serpent_driver

Well-Known Member
#20
the lscache PHP code tries to set a cookie for UAs (bots) that do not accept cookies.
Please explain how to set a cookie with PHP if no PHP runs once the page is already cached?

PHP and .htaccess can't detect whether a client accepts cookies or not. You can only check whether the client sends a request cookie or not, and only with .htaccess, not with PHP, if the page is already cached.
 