
AndreyPopov

Well-Known Member
#23
Is there a way for the crawler to take a sitemap URL abc.com and tell it to index the paging?
It all depends on which CMS you use.

A crawler from a plugin for some CMSes can build the sitemap itself and recache it.
Some CMS plugins require an already generated sitemap and a link to it.
Third-party crawlers likewise require a generated sitemap and a link to it.

Some crawlers can recache &page=... URLs, but pagination usually has to be recached manually.
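A rough sketch of what that manual recache can look like, assuming you already know the base URL and how many pages exist (warm_pagination, the User-Agent string, and the example URL are all made up for illustration, not LScache settings): each &page=N variant is requested once so the server renders it and the cache stores it.

```python
# Warm the cache for paginated URLs by requesting every page variant.
# base_url and page_count are illustrative assumptions, not LScache settings.
import urllib.request

def warm_pagination(base_url: str, page_count: int) -> None:
    sep = "&" if "?" in base_url else "?"
    for page in range(1, page_count + 1):
        url = f"{base_url}{sep}page={page}"
        req = urllib.request.Request(url, headers={"User-Agent": "cache-warmup-sketch"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            resp.read()  # consume the body so the page is fully rendered and cached
            print(url, resp.status)

warm_pagination("https://abc.com/category?id=1", 15)
```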


My example:
https://www.litespeedtech.com/suppo...rawler-for-recache-some-ideas-and-code.19763/

I modified the internal crawler.

Some of those ideas the developers implemented in the crawler; some remain unimplemented :(
 

AndreyPopov

Well-Known Member
#24
An HTML page is a text file that only becomes an HTML page because of the file extension and is rendered as such by the browser.
Yes, basically HTML is a text file, but NOT plain text!

It is rendered by the browser not only because of the file extension!
If you change the extension of a plain text file to .html, it is NOT rendered as HTML, because it does NOT contain HTML formatting tags!
And most SEO-friendly page URLs contain no extension at all, so HOW does the browser render them? ;)
 
#25
In WordPress, if you build a page with dynamic data, the content is generated automatically and pagination is added to a single page. You don't have separate pages in WordPress like abc.com, abc.com/p1, abc.com/p2; it's a single page. That single page could hold 3 pages, 15 pages, or 40 pages, all depending on the dynamic data. How do you tell LiteSpeed Cache to follow the paging so it can fetch all of those pages and cache them?
 
#26
Another way to do it: when the main page is hit, you have a way to trigger the other pages on the fly. In other words, you have abc.com in the sitemap; when a user hits that page, it triggers something that caches the rest of the pages in the background. That would work as well.
 

serpent_driver

Well-Known Member
#27
How do you tell LiteSpeed Cache to follow the paging so it can fetch all of those pages and cache them?
There is nothing you can tell LScache here. LScache is not a bot that follows every link; it is just a cache engine.

But I understand what you are talking about, and there are solutions for that; they are just not part of LScache, a plugin, or WordPress. Either you have such a solution programmed for you, or you use ready-made software that is not actually intended for this but can be repurposed. What I mean by that is a crawler script that works like a search engine bot: it automatically follows every link and generates a sitemap from the crawl result, which you can then use in the LiteSpeed Cache plugin.
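As a hedged illustration of such a script (assuming Python with the requests and beautifulsoup4 packages installed; the start URL and the limit are placeholders), the crawler follows internal links breadth-first and emits the result as a sitemap:

```python
# Minimal same-site crawler: follow internal links breadth-first and
# write the discovered URLs as a sitemap a cache plugin can consume.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl_to_sitemap(start_url: str, limit: int = 500) -> str:
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    while queue and len(seen) < limit:
        url = queue.popleft()
        try:
            resp = requests.get(url, timeout=15)
        except requests.RequestException:
            continue  # skip unreachable pages
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue  # only HTML pages contain links worth following
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in sorted(seen))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>")

print(crawl_to_sitemap("https://abc.com/"))
```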
 

serpent_driver

Well-Known Member
#28
That would work as well.
But only in theory.

There is another and much better solution. As a website operator, you mistakenly assume that all of your pages are requested by users. If you use tracking software and deal extensively with the evaluation of the tracking analysis, you will find that a very high proportion of your pages are either very rarely or never requested by users. So the question inevitably arises, why spend resources on the cache warmup if no one is requesting these pages?

And that is against the background that the cache warmup can take a very long time, consume a lot of resources, and generate a high load. This is particularly critical for shop pages, because most cache plugins purge the cache after a product is changed or purchased. It often happens that the warmup crawler has not yet finished crawling while, in the meantime, the cache of pages it already crawled has been purged again. This way of working is therefore not very economical.

The solution is to track the URLs that users request, so you know which URLs to warm up the cache for. In my case and on my website, that's less than 10% of all available URLs. That's why the cache process takes me just 10 minutes and not hours or even a whole day. However, I don't use the LiteSpeed crawler, which unfortunately wasn't programmed very carefully. I have my own custom solution that is x times faster and generates only half the load.
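My solution isn't public, but the tracking idea can be sketched roughly like this (assuming a common-format access log; the regex, paths, and threshold are invented and this is not the custom tool described above): parse the log, skip bot traffic, and keep only the URLs real users actually hit.

```python
# Build a warmup list from the URLs users actually request.
import re
from collections import Counter

LOG_LINE = re.compile(r'"GET (\S+) HTTP/[\d.]+" 200')   # successful GETs only
BOT_HINTS = ("bot", "crawler", "spider")                 # exclude search engines

def hot_urls(log_path: str, min_hits: int = 5) -> list[str]:
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if any(hint in line.lower() for hint in BOT_HINTS):
                continue
            m = LOG_LINE.search(line)
            if m:
                hits[m.group(1)] += 1
    # Keep only URLs with enough real traffic to justify warming.
    return [url for url, n in hits.most_common() if n >= min_hits]

for path in hot_urls("/var/log/access.log"):
    print("https://abc.com" + path)
```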

 

serpent_driver

Well-Known Member
#29
If you change the extension of a plain text file to .html, it is NOT rendered as HTML, because it does NOT contain HTML formatting tags!
How did you come up with this idea? If I write HTML code in a .txt file and change the file extension to .html, what happens when I request that file in the browser?

A plain text file is not defined by its content but by the HTTP header, and with this header I tell the browser how to handle the file.
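That is easy to demonstrate. A minimal sketch with Python's standard library (host, port, and body are made up): the very same bytes are served under two paths, and only the Content-Type header decides whether the browser renders them as a page or shows the raw source.

```python
# Serve identical bytes with two different Content-Type headers:
# /html renders as a page, anything else is shown as plain text.
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<h1>Hello</h1><p>Same bytes, different header.</p>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ctype = "text/html" if self.path == "/html" else "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

HTTPServer(("localhost", 8000), Handler).serve_forever()
```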
 
#31
How do I download it? I would like to check it out.

But only in theory.

There is another and much better solution. [...] The solution is to track the URLs that users request, so you know which URLs to warm up the cache for. [...] I have my own custom solution that is x times faster and generates only half the load.
 

serpent_driver

Well-Known Member
#35
Of course you need a corresponding PHP function that generates the CSS or JS files. In my case, no CSS or JS files are generated; they are combined. Only then is it possible to cache static sources. To be fair, though, there is no real advantage in doing so, because once the respective source has been loaded by the browser it sits in the browser cache. The static-source caching example was only meant to show you that it's not just about HTML.
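For illustration only, a toy sketch of that combining step (the file names and output path are invented, and the real implementation described above is PHP, not this): several CSS sources are concatenated into one static file that can then be cached under a single URL.

```python
# Combine several CSS files into one static, cacheable file.
from pathlib import Path

def combine_static(sources: list[str], out_path: str) -> None:
    parts = [f"/* --- {src} --- */\n" + Path(src).read_text(encoding="utf-8")
             for src in sources]
    Path(out_path).write_text("\n".join(parts), encoding="utf-8")

combine_static(["base.css", "theme.css", "print.css"], "combined.css")
```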

That's why you're still someone who likes to tell fairy tales. ;)
 

AndreyPopov

Well-Known Member
#36
How did you come up with this idea? If I write HTML code in a .txt file and change the file extension to .html, what happens when I request that file in the browser?

A plain text file is not defined by its content but by the HTTP header, and with this header I tell the browser how to handle the file.
With that, you have answered your own stupid "plain text" words yourself.
 

serpent_driver

Well-Known Member
#37
but these pages are already cached by frequent user requests. Why do they need to be crawled again?
It's not about crawling the URLs again. You have to read it properly and try to understand it. The first step is to record (track) the URLs requested by users and generate a sitemap from them, so that the crawler only crawls URLs that are actually requested. Search engine bots, by the way, are excluded from this tracking.
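One hedged way to picture that recording step (a WSGI middleware is only an illustration; the actual tracking could live anywhere in the stack, and all names here are invented): every non-bot request path is appended to a file from which the warmup sitemap is later generated.

```python
# WSGI middleware sketch: record each user-requested URL, skipping
# search-engine bots, so a warmup sitemap can be built from the file.
class UrlTracker:
    BOT_HINTS = ("googlebot", "bingbot", "crawler", "spider")

    def __init__(self, app, track_file="/tmp/requested_urls.txt"):
        self.app = app
        self.track_file = track_file

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if not any(hint in agent for hint in self.BOT_HINTS):
            with open(self.track_file, "a", encoding="utf-8") as fh:
                fh.write(environ.get("PATH_INFO", "/") + "\n")
        return self.app(environ, start_response)
```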
 

AndreyPopov

Well-Known Member
#39
In WordPress, if you build a page with dynamic data, the content is generated automatically and pagination is added to a single page. You don't have separate pages in WordPress like abc.com, abc.com/p1, abc.com/p2; it's a single page. That single page could hold 3 pages, 15 pages, or 40 pages, all depending on the dynamic data. How do you tell LiteSpeed Cache to follow the paging so it can fetch all of those pages and cache them?
Infinite scroll and techniques like "lazy load" cannot be cached by the crawler by default.
You need to build the links for recaching manually or with third-party code.
 

serpent_driver

Well-Known Member
#40
Infinite scroll and techniques like "lazy load" cannot be cached by the crawler by default.
You need to build the links for recaching manually or with third-party code.
You don't need anything at all for lazy load, because the loading="lazy" attribute has been a standard feature of almost every modern browser for quite some time. A so-called one-pager can of course be cached. What you probably mean is the on-demand loading of content once it comes into the viewport, which serves to make pagination superfluous.
 