Enabling and Limiting the Crawler

These instructions apply to the WordPress LSCache crawler and other CMS LSCache crawlers where available.

Because the crawler can consume considerable resources, we have put the on/off switch in the hands of server administrators. In a control panel environment such as cPanel, the crawler is disabled by default and can only be enabled by an administrator through Apache configuration. In an LSWS native environment, the crawler is enabled by default and, starting with the LSWS 5.3.5 release, can be disabled at the server or virtual host level.

NOTE: We do not recommend turning on the crawler for shared hosting setups unless the server has enough capacity to handle it!

Enabling the Crawler on a shared hosting/control panel environment

As of LSWS v5.1.16*, there are three approaches you can take to crawling on your server:

  • You can disable it for the entire server
  • You can enable it for the entire server
  • You can selectively enable it for particular clients, while leaving it disabled for everyone else

To enable the crawler in either of the last two scenarios, you need to add this “Crawler Snippet” to the appropriate configuration or include file:

<IfModule LiteSpeed>
 CacheEngine on crawler
</IfModule>

The exact location of the relevant configuration or include file depends on the control panel you use (or whether you use one at all) and on which of the above options you want to enact. See below for instructions relevant to your setup.

After you've added the Crawler Snippet in the appropriate location, you should gracefully restart the server.

*If you are on v5.1.16 and having difficulty getting this to work, please force-reinstall the latest build.
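If you script this step, a minimal POSIX shell sketch like the following appends the Crawler Snippet only when it is not already present, so running it twice does not duplicate the block. The function name add_crawler_snippet is hypothetical; pass it whichever file applies to your setup, then gracefully restart the server as described above.

```shell
# Hypothetical helper: append the Crawler Snippet to a config/include file,
# creating the parent directory if needed. Idempotent: if the file already
# contains "CacheEngine on crawler", nothing is written.
add_crawler_snippet() {
    target=$1
    mkdir -p "$(dirname "$target")" || return 1
    if [ -f "$target" ] && grep -q 'CacheEngine on crawler' "$target"; then
        return 0  # already present; leave the file untouched
    fi
    printf '%s\n' '<IfModule LiteSpeed>' ' CacheEngine on crawler' '</IfModule>' >> "$target"
}
```

For example, add_crawler_snippet /etc/apache2/conf.d/includes/pre_main_global.conf targets the cPanel EA4 server-level file listed further below.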

Limiting the Crawler

Currently, the following variables are available for use with the Crawler function:

  • CRAWLER_USLEEP puts a minimum allowed value on the Delay field.
  • CRAWLER_LOAD_LIMIT sets a default for the Server Load Limit field.
  • CRAWLER_LOAD_LIMIT_ENFORCE sets a maximum allowed value on the Server Load Limit field.

To use these variables, add them, one per line, to the appropriate configuration file along with the Crawler Snippet. For example:

<IfModule LiteSpeed>
CacheEngine on crawler
SetEnv CRAWLER_USLEEP 1000
SetEnv CRAWLER_LOAD_LIMIT 5.2
</IfModule>
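When the same limits go into several include files, generating the block keeps them consistent. The hypothetical crawler_snippet function below (POSIX shell) prints the snippet with whichever limiting variables you supply; it only formats text, and the values in the usage note are the example values above, not recommendations.

```shell
# Hypothetical helper: print the Crawler Snippet plus optional limits.
#   $1  CRAWLER_USLEEP             (minimum allowed Delay)
#   $2  CRAWLER_LOAD_LIMIT         (default Server Load Limit)
#   $3  CRAWLER_LOAD_LIMIT_ENFORCE (maximum allowed Server Load Limit)
# Pass "" to leave a variable out.
crawler_snippet() {
    echo '<IfModule LiteSpeed>'
    echo 'CacheEngine on crawler'
    if [ -n "${1:-}" ]; then echo "SetEnv CRAWLER_USLEEP $1"; fi
    if [ -n "${2:-}" ]; then echo "SetEnv CRAWLER_LOAD_LIMIT $2"; fi
    if [ -n "${3:-}" ]; then echo "SetEnv CRAWLER_LOAD_LIMIT_ENFORCE $3"; fi
    echo '</IfModule>'
}
```

Running crawler_snippet 1000 5.2 reproduces the example block above.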

cPanel/WHM

Server level

Change your working directory to: /usr/local/apache/conf/includes/ for EA3 or /etc/apache2/conf.d/includes/ for EA4.

Add the Crawler Snippet and optional server variables to the pre_main_global.conf file.

Global virtual host level

Change your working directory to: /usr/local/apache/conf/userdata/ for EA3 or /etc/apache2/conf.d/userdata/ for EA4.

If these directories do not exist, create them.

Add the Crawler Snippet and optional server variables to the lscache_vhosts.conf file.

Apply these changes to all Virtual Hosts by running the following command:

/scripts/ensure_vhost_includes --all-users

Note: You only need to run this command once and it will activate for all users, including new users created by WHM later. There is no need to edit the cPanel skeleton file.
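To script the server-level and global virtual host steps, a small lookup can return the target file for a given EasyApache version and level, using only the paths listed above. The function name cpanel_crawler_conf and its ea3/ea4 and server/global arguments are hypothetical conventions for this sketch.

```shell
# Hypothetical lookup: which file receives the Crawler Snippet on cPanel.
#   $1  ea3 | ea4        (EasyApache version)
#   $2  server | global  (server level or global virtual host level)
cpanel_crawler_conf() {
    case "$1:$2" in
        ea3:server) echo /usr/local/apache/conf/includes/pre_main_global.conf ;;
        ea4:server) echo /etc/apache2/conf.d/includes/pre_main_global.conf ;;
        ea3:global) echo /usr/local/apache/conf/userdata/lscache_vhosts.conf ;;
        ea4:global) echo /etc/apache2/conf.d/userdata/lscache_vhosts.conf ;;
        *) echo "usage: cpanel_crawler_conf ea3|ea4 server|global" >&2; return 1 ;;
    esac
}
```

For the global level, write the snippet to the printed file (creating its directory first if needed) and then run /scripts/ensure_vhost_includes --all-users once.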

Individual virtual host level

Change your working directory to:

  1. For EA3: /usr/local/apache/conf/userdata/std/2_4/<user>/<domain>/
  2. For EA4: /etc/apache2/conf.d/userdata/std/2_4/<user>/<domain>/

If your site supports HTTPS (SSL), also change your working directory to:

  1. For EA3: /usr/local/apache/conf/userdata/ssl/2_4/<user>/<domain>/
  2. For EA4: /etc/apache2/conf.d/userdata/ssl/2_4/<user>/<domain>/

* The 2_4 in the example paths above depends on your Apache version; it may instead be, e.g., 2 or 2_2.

If these directories do not exist, create them.

Add the Crawler Snippet and optional server variables to the lscache_vhosts.conf file. This will enable the crawler for this Virtual Host only.

Apply these changes by running the following command:

/scripts/ensure_vhost_includes --user=$user
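The per-domain directories follow a fixed pattern, so they can be generated rather than typed. This hypothetical cpanel_vhost_dirs sketch prints the std directory and, when asked, the ssl directory; the base directory, user, domain and Apache version directory are all supplied by you.

```shell
# Hypothetical helper: print per-domain userdata directories on cPanel.
#   $1  base: /usr/local/apache/conf/userdata (EA3) or /etc/apache2/conf.d/userdata (EA4)
#   $2  cPanel user
#   $3  domain
#   $4  Apache version directory (default 2_4; may be 2 or 2_2)
#   $5  "ssl" to also print the HTTPS directory
cpanel_vhost_dirs() {
    ver=${4:-2_4}
    echo "$1/std/$ver/$2/$3"
    if [ "${5:-}" = ssl ]; then
        echo "$1/ssl/$ver/$2/$3"
    fi
}
```

For example, cpanel_vhost_dirs /etc/apache2/conf.d/userdata bob example.com 2_4 ssl prints both EA4 paths for a placeholder user bob; create each directory, add lscache_vhosts.conf to it, and then run /scripts/ensure_vhost_includes --user=bob.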

Plesk

Server level

Change your working directory to: /etc/httpd/conf.d/ for CentOS, /etc/apache2/conf.d/ for Debian, or /etc/apache2/conf-enabled for Ubuntu.

Add the Crawler Snippet and optional server variables to lscache.conf. If it doesn’t exist, create it.
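Choosing the directory can be scripted with a simple lookup. This sketch assumes the distribution is identified by the ID value from /etc/os-release (centos, debian, ubuntu); other distributions are deliberately rejected rather than guessed, and the function name is hypothetical.

```shell
# Hypothetical lookup: Plesk server-level directory for lscache.conf.
#   $1  distribution ID (assumed to match ID= in /etc/os-release)
plesk_lscache_conf_dir() {
    case "$1" in
        centos) echo /etc/httpd/conf.d ;;
        debian) echo /etc/apache2/conf.d ;;
        ubuntu) echo /etc/apache2/conf-enabled ;;
        *) echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}
```

For example, . /etc/os-release && plesk_lscache_conf_dir "$ID" prints the directory in which to create lscache.conf.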

Global virtual host level

Change your working directory to /usr/local/psa/admin/conf/templates/custom/domain (create it if it doesn’t exist). Copy /usr/local/psa/admin/conf/templates/default/domain/domainVirtualHost.php to this location.

Edit the file and add the Crawler Snippet and optional server variables after the mod_suexec.c block.

Reconfigure all virtual hosts (this will regenerate new configuration files for all vhosts):

/usr/local/psa/admin/bin/httpdmng --reconfigure-all
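The copy step can be made safe to repeat. This hypothetical sketch copies the default template only when no custom copy exists, so earlier edits are never overwritten; the optional root prefix is an assumption added so the function can be tried against a scratch directory. Adding the snippet after the mod_suexec.c block remains a manual edit.

```shell
# Hypothetical helper: copy Plesk's default domainVirtualHost.php into the
# custom template directory unless a custom copy already exists.
#   $1  optional root prefix (default: empty, i.e. the real filesystem)
# Prints the path of the custom template on success.
plesk_copy_vhost_template() {
    root=${1:-}
    src="$root/usr/local/psa/admin/conf/templates/default/domain/domainVirtualHost.php"
    dst_dir="$root/usr/local/psa/admin/conf/templates/custom/domain"
    mkdir -p "$dst_dir" || return 1
    # Never overwrite a custom template that may already contain edits.
    if [ ! -e "$dst_dir/domainVirtualHost.php" ]; then
        cp "$src" "$dst_dir/" || return 1
    fi
    echo "$dst_dir/domainVirtualHost.php"
}
```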

Individual virtual host level

Change your working directory to /var/www/vhosts/system/<domain_name>/conf/ and create a file called vhost.conf if it does not already exist (or vhost_ssl.conf for HTTPS sites). Add the Crawler Snippet and optional server variables to this file.

Reconfigure this Virtual Host (this will regenerate new configuration files for this vhost):

/usr/local/psa/admin/bin/httpdmng --reconfigure-domain <domain_name>

DirectAdmin

Server level

Add the Crawler Snippet and optional server variables to the /etc/httpd/conf/extra/httpd-includes.conf file.

Global virtual host level

Create a /usr/local/directadmin/data/templates/custom/cust_httpd.CUSTOM.2.pre file and add the Crawler Snippet and optional server variables to it.

Apply these changes to all Virtual Hosts by running the following commands:

 
cd /usr/local/directadmin/custombuild
./build rewrite_confs

CacheEngine -crawler

Starting with LSWS 5.3.5, if you simply want to make sure the crawler is disabled for an Apache virtual host, you can add CacheEngine -crawler to that virtual host's configuration, in any environment:

<IfModule LiteSpeed>
CacheEngine -crawler
</IfModule>
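When auditing many virtual hosts, a quick check of each configuration file can help. This hypothetical sketch reports whether a single file enables the crawler, disables it with CacheEngine -crawler, or does neither. It looks only at that one file, not at inherited or server-level settings, and checking for the disable directive first is a choice of this sketch, not documented LSWS precedence.

```shell
# Hypothetical helper: classify one Apache config file.
# Prints "disabled", "enabled" or "unset"; returns 1 if the file is unreadable.
crawler_config_state() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    # Checked first: this sketch treats an explicit disable as decisive.
    if grep -Eq '^[[:space:]]*CacheEngine[[:space:]]+-crawler' "$1"; then
        echo disabled
    elif grep -Eq '^[[:space:]]*CacheEngine[[:space:]]+on[[:space:]]+crawler' "$1"; then
        echo enabled
    else
        echo unset
    fi
}
```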

LSWS native environment

The cache crawler is enabled by default in an LSWS native environment.

To disable it at the server level, you need LSWS 5.4 or later, which adds a Cache Features setting to control this.

In the LSWS WebAdmin interface, navigate to LSWS Admin > Configuration > Server > Cache. In Cache Features, check On, uncheck Crawler, check ESI, and uncheck Not Set.

If Not Set is checked, the other three values will be ignored and the default values will be used. (By default, all three are checked.)

To disable the cache crawler at the LSWS native Virtual Host level, go to LSWS Admin > Configuration > Virtual Host > VH Name > Cache and set Cache Features in the same manner as above. If Not Set is checked, the other three values will be ignored and the server-level configuration will be inherited.

Please note: Do not set Enable LiteMage to On, as this setting will also enable the crawler, even if Crawler is unchecked.

To add any of the optional server variables, navigate to Server > External App and add the variable(s) to the Environment setting, one per line. For example:

CRAWLER_USLEEP=1000
CRAWLER_LOAD_LIMIT=5.2
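If you already keep the variables as SetEnv lines for Apache, they can be converted to this NAME=value form. This awk-based sketch reads configuration on standard input and prints only CRAWLER_* variables; the function name setenv_to_env is hypothetical.

```shell
# Hypothetical helper: turn "SetEnv CRAWLER_X value" lines (stdin) into
# "CRAWLER_X=value" lines for the External App Environment setting.
# awk's default field splitting ignores leading whitespace.
setenv_to_env() {
    awk '$1 == "SetEnv" && $2 ~ /^CRAWLER_/ { print $2 "=" $3 }'
}
```

For example, setenv_to_env < lscache_vhosts.conf prints the two lines above when given the Apache example from the Limiting the Crawler section.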

The LiteSpeed Web Server cache engine sets the X-LSCACHE environment variable. You can check it under Environment Variables on a phpinfo page to see whether the crawler is on: if crawler does not appear in the value, the crawler has been disabled successfully. LSWS can only disable the LiteSpeed Cache plugin crawler and other LiteSpeed crawlers, because those crawlers check the X-LSCACHE environment variable. LSWS cannot stop third-party crawlers, since they don't check X-LSCACHE and act accordingly. For example, with the crawler disabled, phpinfo shows:

$_SERVER['X-LSCACHE']	on,esi

In the LiteSpeed Cache for WordPress plugin, under Settings > Crawler, Crawler Cron should be set to Disable, and the following warning should appear:

Warning: The crawler feature is not enabled on the LiteSpeed server. Please consult your server admin.
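The phpinfo check can also be scripted once you have the X-LSCACHE value. This hypothetical sketch assumes that, when the crawler is enabled, crawler appears as one of the comma-separated tokens (for example on,crawler,esi), while a disabled crawler leaves it out, as in the on,esi value above.

```shell
# Hypothetical helper: succeed (exit 0) if the X-LSCACHE value contains the
# token "crawler". Wrapping the value in commas avoids partial-word matches.
xlscache_has_crawler() {
    case ",$1," in
        *,crawler,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, xlscache_has_crawler "on,esi" || echo "crawler disabled" prints the message for the value shown above.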

  • Last modified: 2019/08/05 15:07
  • by Jackson Zhang