Enabling and Limiting the Crawler

These instructions apply to the WordPress LSCache crawler and other CMS LSCache crawlers where available.

Due to the potential of the crawler to consume considerable resources, we have put the on/off switch in the hands of the server administrators. In a control panel environment, such as cPanel, the crawler is disabled by default and can only be enabled by an admin through Apache configuration. In the LSWS Native environment, the crawler is enabled by default and can be disabled at the server level or virtual host level in LSWS v5.3.5 and above.

NOTE: we do not recommend enabling the crawler for shared hosting setups unless the server has enough capacity to handle it!

As of LSWS v5.1.16*, there are four different approaches you can take to crawling on your server:

  • You can disable it for the entire server
  • You can disable it for the entire server, and selectively enable it for specific vHosts
  • You can enable it for the entire server
  • You can enable it for the entire server, and selectively disable it for specific vHosts

Enabling the Crawler

To enable the crawler, either for the entire server or for specific vHosts, add this “Crawler Snippet” to the appropriate configuration or include file:

<IfModule LiteSpeed>
CacheEngine on crawler
</IfModule>

The exact location of the relevant configuration or include file varies, depending on the control panel you use (or if you use no control panel at all), and which of the above options you are looking to enact. See below for instructions relevant to your setup.

After you've added the Crawler Snippet in the appropriate location, you should gracefully restart the server.

NOTE: If you are on v5.1.16 and are having difficulty getting this to work, please force reinstall to the latest build.

Limiting the Crawler

Currently, the following variables are available for use with the Crawler function:

  • CRAWLER_USLEEP puts a minimum allowed value on the Delay field.
  • CRAWLER_LOAD_LIMIT sets a default for the Server Load Limit field.
  • CRAWLER_LOAD_LIMIT_ENFORCE sets a maximum allowed value on the Server Load Limit field.

To use these variables, add them one-per-line to the appropriate configuration file. For example:

<IfModule LiteSpeed>
CacheEngine on crawler
SetEnv CRAWLER_USLEEP 1000
SetEnv CRAWLER_LOAD_LIMIT 5.2
</IfModule>
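If the same snippet has to go into several include files, generating it from one place can help keep the limits consistent. The following is a minimal shell sketch, not an official LiteSpeed tool. The function name print_crawler_snippet and the value 8.0 for CRAWLER_LOAD_LIMIT_ENFORCE are assumptions chosen for illustration; only the directive and variable names come from the text above.

```shell
# print_crawler_snippet DELAY LOAD_LIMIT [LOAD_LIMIT_ENFORCE]
# Hypothetical helper: prints the Crawler Snippet with optional limits.
print_crawler_snippet() (
  usleep="$1"; load="$2"; enforce="${3:-}"
  echo '<IfModule LiteSpeed>'
  echo 'CacheEngine on crawler'
  echo "SetEnv CRAWLER_USLEEP $usleep"
  echo "SetEnv CRAWLER_LOAD_LIMIT $load"
  # Only emit the enforced maximum when a third argument is given.
  if [ -n "$enforce" ]; then
    echo "SetEnv CRAWLER_LOAD_LIMIT_ENFORCE $enforce"
  fi
  echo '</IfModule>'
)

# Example: the values used above, plus an illustrative enforced maximum.
print_crawler_snippet 1000 5.2 8.0
```

Redirect the output to the include file that matches your control panel, as described in the sections below.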

Disabling the Crawler

As of LSWS v5.3.5, you can disable the crawler for an Apache virtual host in any situation. Add CacheEngine -crawler to the Apache virtual host configuration, like so:

  
<IfModule LiteSpeed>
CacheEngine -crawler
</IfModule>

cPanel/WHM

Server Level

Change your working directory to:

  • /usr/local/apache/conf/includes/ for EA3 or
  • /etc/apache2/conf.d/includes/ for EA4.

Add the Crawler Snippet and optional server variables to the pre_main_global.conf file.

Global Virtual Host Level

Change your working directory to:

  • /usr/local/apache/conf/userdata/ for EA3 or
  • /etc/apache2/conf.d/userdata/ for EA4

If these directories do not exist, create them.

Add the Crawler Snippet and optional server variables to the lscache_vhosts.conf file.

Apply these changes to all Virtual Hosts by running the following command:

/scripts/ensure_vhost_includes --all-users

Note: You only need to run this command once and it will activate for all users, including new users created by WHM later. There is no need to edit the cPanel skeleton file.
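The global virtual host steps above can be sketched as a small shell script. This is an illustration under stated assumptions rather than a supported tool: the function name install_global_crawler_snippet is hypothetical, and it overwrites any existing lscache_vhosts.conf in the target directory.

```shell
# Hypothetical helper: write the Crawler Snippet for all cPanel vhosts.
# $1 is the userdata directory (the EA3 or EA4 path listed above).
# Note: this overwrites an existing lscache_vhosts.conf.
install_global_crawler_snippet() (
  userdata_dir="$1"
  # Create the directory if it does not exist yet.
  mkdir -p "$userdata_dir" || exit 1
  cat > "$userdata_dir/lscache_vhosts.conf" <<'EOF'
<IfModule LiteSpeed>
CacheEngine on crawler
</IfModule>
EOF
)

# On an EA4 server (run as root), the steps would then be:
#   install_global_crawler_snippet /etc/apache2/conf.d/userdata
#   /scripts/ensure_vhost_includes --all-users
```

The cPanel command is left commented out so that the sketch can be tried against a scratch directory first.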

Individual Virtual Host Level

Change your working directory to:

  • For EA3: /usr/local/apache/conf/userdata/std/2_4/<user>/<domain>/
  • For EA4: /etc/apache2/conf.d/userdata/std/2_4/<user>/<domain>/

If your site supports HTTPS (SSL), repeat the change in the corresponding SSL directory:

  • For EA3: /usr/local/apache/conf/userdata/ssl/2_4/<user>/<domain>/
  • For EA4: /etc/apache2/conf.d/userdata/ssl/2_4/<user>/<domain>/

NOTE: The 2_4 in the path is an example. You can replace it with your appropriate version, such as 2 or 2_2.

If these directories do not exist, create them.

Add the Crawler Snippet and optional server variables to the lscache_vhosts.conf file. This will enable the crawler for this Virtual Host only.

Apply these changes by running the following command:

/scripts/ensure_vhost_includes --user=$user
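For a single virtual host, the same file has to exist under both the std and ssl trees. The sketch below shows one way to do that in shell. The function name enable_crawler_for_vhost and the placeholders alice and example.com are hypothetical, and the version directory defaults to 2_4 as in the example paths above.

```shell
# Hypothetical helper: enable the crawler for one cPanel virtual host.
# Usage: enable_crawler_for_vhost USERDATA_DIR USER DOMAIN [APACHE_VER]
enable_crawler_for_vhost() (
  base="$1"; user="$2"; domain="$3"; ver="${4:-2_4}"
  # Write the snippet for both the plain HTTP (std) and HTTPS (ssl) vhosts.
  for kind in std ssl; do
    dir="$base/$kind/$ver/$user/$domain"
    mkdir -p "$dir" || exit 1
    printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' \
      > "$dir/lscache_vhosts.conf" || exit 1
  done
)

# Example on EA4 (run as root; "alice" and "example.com" are placeholders):
#   enable_crawler_for_vhost /etc/apache2/conf.d/userdata alice example.com
#   /scripts/ensure_vhost_includes --user=alice
```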

Plesk

Server Level

Change your working directory to:

  • /etc/httpd/conf.d/ for CentOS
  • /etc/apache2/conf.d/ for Debian
  • /etc/apache2/conf-enabled for Ubuntu

Add the Crawler Snippet and optional server variables to lscache.conf. If it doesn’t exist, create it.

Global Virtual Host Level

Change your working directory to /usr/local/psa/admin/conf/templates/custom/domain

Create it if it doesn’t exist.

Copy /usr/local/psa/admin/conf/templates/default/domain/domainVirtualHost.php to this location.

Edit the file and add the Crawler Snippet and optional server variables after the mod_suexec.c block.

Reconfigure all virtual hosts (this will regenerate new configuration files for all vhosts), like so:

/usr/local/psa/admin/bin/httpdmng --reconfigure-all

Individual Virtual Host Level

Change your working directory to /var/www/vhosts/system/<domain_name>/conf/

Create a file called vhost.conf if it does not already exist (or vhost_ssl.conf for HTTPS sites).

Add the Crawler Snippet and optional server variables to this file.

Reconfigure this Virtual Host (this will regenerate new configuration files for this vhost), like so:

/usr/local/psa/admin/bin/httpdmng --reconfigure-domain <domain_name>
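Because vhost.conf may already hold other directives, appending the snippet only when it is missing avoids duplicates on repeated runs. The following shell sketch illustrates that idea. The function name append_crawler_snippet and the placeholder domain example.com are assumptions, not part of Plesk.

```shell
# Hypothetical helper: append the Crawler Snippet to a Plesk vhost config
# file unless the file already contains a "CacheEngine on crawler" line.
append_crawler_snippet() (
  conf="$1"
  mkdir -p "$(dirname "$conf")" || exit 1
  touch "$conf" || exit 1
  if grep -q '^[[:space:]]*CacheEngine on crawler' "$conf"; then
    exit 0   # already present, leave the file untouched
  fi
  printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' >> "$conf"
)

# Example (run as root; example.com is a placeholder domain):
#   append_crawler_snippet /var/www/vhosts/system/example.com/conf/vhost.conf
#   append_crawler_snippet /var/www/vhosts/system/example.com/conf/vhost_ssl.conf
#   /usr/local/psa/admin/bin/httpdmng --reconfigure-domain example.com
```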

DirectAdmin

Server Level

Add the Crawler Snippet and optional server variables to the /etc/httpd/conf/extra/httpd-includes.conf file.

Global Virtual Host Level

Create a /usr/local/directadmin/data/templates/custom/cust_httpd.CUSTOM.2.pre file and add the Crawler Snippet and optional server variables to it.

Apply these changes to all Virtual Hosts by running the following commands:

 
cd /usr/local/directadmin/custombuild
./build rewrite_confs

LSWS Native

The cache crawler is enabled by default in an LSWS Native environment.

To disable it at the Server Level, you need LSWS 5.4 or above, which added a Cache Features setting to control this.

In the LSWS WebAdmin interface, navigate to LSWS Admin > Configuration > Server > Cache. In Cache Features, check On, uncheck Crawler, check ESI, and uncheck Not Set.

If Not Set is checked, the other three values will be ignored and the default values will be used. (By default, all three are checked.)

To disable the cache crawler at the LSWS Native Virtual Host level, navigate to LSWS Admin > Configuration > Virtual Host > VH Name > Cache, and set Cache Features in the same manner as above. If Not Set is checked, the other three values will be ignored and the server-level configuration will be inherited.

Please note: Do not set Enable LiteMage to On, as this setting will also enable the crawler, even if Crawler is unchecked.

To add any of the optional server variables, navigate to Server > External App and add the variable(s) to the Environment setting, one per line. For example:

CRAWLER_USLEEP=1000
CRAWLER_LOAD_LIMIT=5.2

The LiteSpeed Web Server cache engine sets the X-LSCACHE environment variable. You can check Environment Variables on a phpinfo page to see whether the crawler is on: if crawler does not appear in the value, it has been disabled successfully. This setting only affects the LiteSpeed Cache plugin crawler and other LiteSpeed crawlers, because they check the X-LSCACHE environment variable. LSWS cannot stop third-party crawlers from working, since they do not check X-LSCACHE.

$_SERVER['X-LSCACHE']	on,esi

In the LiteSpeed Cache for WordPress plugin, under Settings > Crawler, Crawler Cron should be set to Disable, and the following warning should appear:

Warning: The crawler feature is not enabled on the LiteSpeed server. Please consult your server admin.
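The phpinfo check described above can also be scripted once the X-LSCACHE value has been copied from the page. This is a minimal sketch: the function name crawler_listed is hypothetical, and it assumes that an enabled crawler appears as a crawler entry in the comma-separated value (for example on,crawler,esi), which the text above implies but does not show.

```shell
# Hypothetical helper: succeeds (exit status 0) if the comma-separated
# X-LSCACHE value contains a "crawler" entry.
crawler_listed() {
  # Wrap the value in commas so the first and last entries match too.
  case ",$1," in
    *,crawler,*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Value copied from a phpinfo page, as in the example above.
if crawler_listed "on,esi"; then
  echo "crawler enabled"
else
  echo "crawler disabled"
fi
```

With the value on,esi shown earlier, the script reports that the crawler is disabled.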

  • Last modified: 2019/12/09 19:19
  • by Lisa Clarke