
Enabling and Limiting the Crawler

These instructions apply to the WordPress LSCache crawler and other CMS LSCache crawlers where available.

Because the crawler can consume considerable resources, we have put the on/off switch in the hands of server administrators. In a control panel environment such as cPanel, the crawler is disabled by default and can only be enabled by an administrator through the Apache configuration. In an LSWS native environment, the crawler is enabled by default; starting with LSWS 5.3.5, it can be disabled at the server level or the virtual host level.

NOTE: We do not recommend turning on the crawler in shared hosting setups unless the server has enough capacity to handle it!

Enabling the Crawler on a shared hosting/control panel environment

As of LSWS v5.1.16*, you can take one of three approaches to crawling on your server:

  • You can disable it for the entire server
  • You can enable it for the entire server
  • You can selectively enable it for particular clients, while leaving it disabled for everyone else

To enable the crawler in either of the last two scenarios, add this “Crawler Snippet” to the appropriate configuration or include file:

<IfModule LiteSpeed>
 CacheEngine on crawler
</IfModule>

The exact location of the relevant configuration or include file varies, depending on the control panel you use (or if you use no control panel at all), and which of the above options you are looking to enact. See below for instructions relevant to your setup.

After you've added the Crawler Snippet in the appropriate location, you should gracefully restart the server.

*If you are on v5.1.16 and have difficulty getting this to work, please force-reinstall the latest build.
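A minimal sketch of adding the Crawler Snippet so that rerunning it does no harm, assuming a Bash shell. The function name add_crawler_snippet is hypothetical (it is not a LiteSpeed or control panel tool); it takes the target file followed by optional VAR=VALUE pairs, which it writes as SetEnv lines. Which file to pass depends on your control panel, as described below.

```shell
#!/usr/bin/env bash
# Minimal sketch: append the Crawler Snippet to a configuration/include file.
# add_crawler_snippet is a hypothetical helper, not part of LSWS or cPanel.
# Usage: add_crawler_snippet FILE [VAR=VALUE ...]

add_crawler_snippet() {
  local conf="$1"
  shift                                   # remaining args are VAR=VALUE pairs
  # If the file already enables the crawler, change nothing (no SetEnv lines
  # are added either), so the function is safe to run more than once.
  if [ -f "$conf" ] && grep -q 'CacheEngine on crawler' "$conf"; then
    return 0
  fi
  mkdir -p "$(dirname "$conf")"           # create the include directory if needed
  {
    echo '<IfModule LiteSpeed>'
    echo 'CacheEngine on crawler'
    local pair
    for pair in "$@"; do
      # CRAWLER_USLEEP=1000 becomes "SetEnv CRAWLER_USLEEP 1000"
      echo "SetEnv ${pair%%=*} ${pair#*=}"
    done
    echo '</IfModule>'
  } >> "$conf"
}
```

For example, on cPanel with EA4 you might run add_crawler_snippet /etc/apache2/conf.d/includes/pre_main_global.conf CRAWLER_USLEEP=1000 and then restart gracefully; on a default installation, /usr/local/lsws/bin/lswsctrl restart performs a graceful restart.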

Limiting the Crawler

Currently, the following variables are available for use with the Crawler function:

  • CRAWLER_USLEEP puts a minimum allowed value on the Delay field.
  • CRAWLER_LOAD_LIMIT sets a default for the Server Load Limit field.
  • CRAWLER_LOAD_LIMIT_ENFORCE sets a maximum allowed value on the Server Load Limit field.

To use these variables, add them as SetEnv directives, one per line, inside the <IfModule LiteSpeed> block of the appropriate configuration file. For example:

<IfModule LiteSpeed>
CacheEngine on crawler
SetEnv CRAWLER_USLEEP 1000
SetEnv CRAWLER_LOAD_LIMIT 5.2
</IfModule>
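To make the three variables concrete, here is an illustrative Bash model of how they constrain the values a site owner enters in the plugin. This is a sketch of the behavior described above under stated assumptions (in particular, that an empty Server Load Limit field falls back to CRAWLER_LOAD_LIMIT), not the LSCache plugin's actual code; the function names are hypothetical, and awk is used because the load limit can be a decimal such as 5.2.

```shell
#!/usr/bin/env bash
# Illustrative model only (not LSCache plugin code) of how the server
# variables constrain what a site owner enters in the crawler settings.

# Delay field: CRAWLER_USLEEP is a minimum, so the larger value wins.
effective_delay() {
  local requested="$1"
  local floor="${CRAWLER_USLEEP:-0}"
  if [ "$requested" -lt "$floor" ]; then
    echo "$floor"
  else
    echo "$requested"
  fi
}

# Server Load Limit field: an empty value falls back to the CRAWLER_LOAD_LIMIT
# default (assumption), then CRAWLER_LOAD_LIMIT_ENFORCE, if set, caps it.
# awk compares numerically because load limits can be decimals.
effective_load_limit() {
  local requested="${1:-${CRAWLER_LOAD_LIMIT:-}}"
  local cap="${CRAWLER_LOAD_LIMIT_ENFORCE:-}"
  if [ -z "$cap" ]; then
    echo "$requested"
    return 0
  fi
  awk -v r="$requested" -v c="$cap" 'BEGIN { v = (r + 0 > c + 0) ? c : r; print v }'
}
```

Under this model, with CRAWLER_USLEEP=1000 a requested delay of 500 is raised to 1000, while a requested delay of 2000 is left alone.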

cPanel/WHM

Server level

Change your working directory to: /usr/local/apache/conf/includes/ for EA3 or /etc/apache2/conf.d/includes/ for EA4.

Add the Crawler Snippet and optional server variables to the pre_main_global.conf file.
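A sketch of the server-level step, assuming Bash. The EA3/EA4 check (looking for /etc/apache2/conf.d) is a heuristic assumption, ROOT is a hypothetical prefix that lets you try it against a scratch directory, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: enable the crawler server-wide on cPanel/WHM.
# ROOT is a hypothetical prefix (empty by default) for trying this on a
# scratch directory; function names are illustrative.

cpanel_includes_dir() {
  local root="${ROOT:-}"
  # Heuristic assumption: /etc/apache2/conf.d exists on EasyApache 4 systems.
  if [ -d "$root/etc/apache2/conf.d" ]; then
    echo "$root/etc/apache2/conf.d/includes"       # EA4
  else
    echo "$root/usr/local/apache/conf/includes"    # EA3
  fi
}

enable_crawler_server_level() {
  local conf
  conf="$(cpanel_includes_dir)/pre_main_global.conf"
  mkdir -p "$(dirname "$conf")"
  # Skip if the snippet is already there; grep -s hides "no such file".
  if grep -qs 'CacheEngine on crawler' "$conf"; then
    return 0
  fi
  printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' >> "$conf"
}
```

Run enable_crawler_server_level as root, add any optional SetEnv lines inside the block, then restart the server gracefully.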

Global virtual host level

Change your working directory to: /usr/local/apache/conf/userdata/ for EA3 or /etc/apache2/conf.d/userdata/ for EA4.

If these directories do not exist, create them.

Add the Crawler Snippet and optional server variables to the lscache_vhosts.conf file.

Apply these changes to all Virtual Hosts by running the following command:

/scripts/ensure_vhost_includes --all-users

Note: You only need to run this command once and it will activate for all users, including new users created by WHM later. There is no need to edit the cPanel skeleton file.
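A sketch of the global virtual host steps, assuming Bash and the EA4 path by default. USERDATA and ENSURE are hypothetical overrides: set USERDATA=/usr/local/apache/conf/userdata for EA3, and ENSURE replaces the ensure_vhost_includes command (for example ENSURE=true for a dry run).

```shell
#!/usr/bin/env bash
# Sketch: enable the crawler for all cPanel virtual hosts.
# USERDATA defaults to the EA4 path (EA3: /usr/local/apache/conf/userdata).
# ENSURE is a hypothetical override for the rebuild command (e.g. ENSURE=true).

enable_crawler_all_vhosts() {
  local userdata="${USERDATA:-/etc/apache2/conf.d/userdata}"
  local ensure="${ENSURE:-/scripts/ensure_vhost_includes}"
  local conf="$userdata/lscache_vhosts.conf"
  mkdir -p "$userdata"                  # create the directory if it does not exist
  if ! grep -qs 'CacheEngine on crawler' "$conf"; then
    printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' >> "$conf"
  fi
  # One run covers existing users and users WHM creates later.
  "$ensure" --all-users
}
```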

Individual virtual host level

Change your working directory to:

  1. For EA3: /usr/local/apache/conf/userdata/std/2_4/<user>/<domain>/
  2. For EA4: /etc/apache2/conf.d/userdata/std/2_4/<user>/<domain>/

If your site supports HTTPS (SSL), please also change your working directory to:

  1. For EA3: /usr/local/apache/conf/userdata/ssl/2_4/<user>/<domain>/
  2. For EA4: /etc/apache2/conf.d/userdata/ssl/2_4/<user>/<domain>/

* The 2_4 in the example paths above depends on your Apache version; it may instead be, e.g., 2 or 2_2.

If these directories do not exist, create them.

Add the Crawler Snippet and optional server variables to the lscache_vhosts.conf file. This will enable the crawler for this Virtual Host only.

Apply these changes by running the following command:

/scripts/ensure_vhost_includes --user=$user
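A sketch for a single account, assuming Bash and the EA4 userdata path by default. The arguments are the cPanel user, the domain, the Apache version directory (2_4 by default) and an optional ssl flag that also writes the ssl/ copy; BASE and ENSURE are hypothetical overrides for the userdata path and the ensure_vhost_includes command.

```shell
#!/usr/bin/env bash
# Sketch: enable the crawler for a single cPanel virtual host.
# Usage: enable_crawler_one_vhost USER DOMAIN [APACHE_DIR] [ssl]
# BASE defaults to the EA4 userdata path (EA3: /usr/local/apache/conf/userdata);
# BASE and ENSURE are hypothetical overrides for dry runs.

enable_crawler_one_vhost() {
  local user="$1" domain="$2" apver="${3:-2_4}" with_ssl="${4:-}"
  local base="${BASE:-/etc/apache2/conf.d/userdata}"
  local ensure="${ENSURE:-/scripts/ensure_vhost_includes}"
  local kinds="std" kind dir
  if [ "$with_ssl" = ssl ]; then
    kinds="std ssl"                     # HTTPS sites also need the ssl/ copy
  fi
  for kind in $kinds; do
    dir="$base/$kind/$apver/$user/$domain"
    mkdir -p "$dir"
    if ! grep -qs 'CacheEngine on crawler' "$dir/lscache_vhosts.conf"; then
      printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' \
        >> "$dir/lscache_vhosts.conf"
    fi
  done
  "$ensure" --user="$user"              # apply the changes for this user only
}
```

For example: enable_crawler_one_vhost bob example.com 2_4 ssl (the user and domain here are placeholders).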

Plesk

Server level

Change your working directory to:

  • /etc/httpd/conf.d/ for CentOS
  • /etc/apache2/conf.d/ for Debian
  • /etc/apache2/conf-enabled for Ubuntu

Add the Crawler Snippet and optional server variables to lscache.conf. If it doesn’t exist, create it.
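A sketch that picks the directory from the ID field of /etc/os-release, assuming Bash. Only the three distributions named above are mapped; OS_RELEASE and ROOT are hypothetical overrides for trying it on a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: enable the crawler server-wide on Plesk.
# OS_RELEASE and ROOT are hypothetical overrides used only for dry runs.

plesk_conf_dir() {
  local id
  # /etc/os-release is a shell-style file; read its ID field in a subshell.
  id="$(. "${OS_RELEASE:-/etc/os-release}" && echo "$ID")"
  case "$id" in
    centos) echo /etc/httpd/conf.d ;;
    debian) echo /etc/apache2/conf.d ;;
    ubuntu) echo /etc/apache2/conf-enabled ;;
    *) echo "unmapped distribution: '$id'" >&2; return 1 ;;
  esac
}

enable_crawler_plesk_server() {
  local dir conf
  dir="$(plesk_conf_dir)" || return 1
  dir="${ROOT:-}$dir"
  mkdir -p "$dir"
  conf="$dir/lscache.conf"              # created on first write if missing
  if grep -qs 'CacheEngine on crawler' "$conf"; then
    return 0
  fi
  printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' >> "$conf"
}
```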

Global virtual host level

Change your working directory to /usr/local/psa/admin/conf/templates/custom/domain, creating it if it doesn’t exist. Copy /usr/local/psa/admin/conf/templates/default/domain/domainVirtualHost.php to this location.

Edit the file and add the Crawler Snippet and optional server variables after the mod_suexec.c block.

Reconfigure all virtual hosts (this will regenerate new configuration files for all vhosts):

/usr/local/psa/admin/bin/httpdmng --reconfigure-all
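A sketch of the template copy and edit, assuming Bash and awk. It assumes the mod_suexec.c block ends at the first </IfModule> after the line that mentions mod_suexec.c, which may not hold for every Plesk version, so review the generated file before reconfiguring. PSA is a hypothetical override for /usr/local/psa, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: create a custom Plesk domain template containing the Crawler Snippet.
# Assumption: the mod_suexec.c block ends at the first </IfModule> after the
# line mentioning mod_suexec.c. PSA is a hypothetical override for the Plesk root.

make_custom_plesk_template() {
  local psa="${PSA:-/usr/local/psa}"
  local src="$psa/admin/conf/templates/default/domain/domainVirtualHost.php"
  local dst_dir="$psa/admin/conf/templates/custom/domain"
  local dst="$dst_dir/domainVirtualHost.php"
  mkdir -p "$dst_dir"
  # Never overwrite an existing custom template.
  if [ -e "$dst" ]; then
    echo "custom template already exists: $dst" >&2
    return 1
  fi
  # Copy the default template line by line and print the snippet right after
  # the line that closes the mod_suexec.c block. Exit non-zero if none found.
  awk '
    { print }
    !done && /mod_suexec\.c/ { in_block = 1 }
    in_block && /<\/IfModule>/ {
      print "<IfModule LiteSpeed>"
      print "CacheEngine on crawler"
      print "</IfModule>"
      in_block = 0; done = 1
    }
    END { exit !done }
  ' "$src" > "$dst" || { rm -f "$dst"; echo "could not process $src" >&2; return 1; }
}
```

After checking the generated template, run /usr/local/psa/admin/bin/httpdmng --reconfigure-all as described above.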

Individual virtual host level

Change your working directory to /var/www/vhosts/system/<domain_name>/conf/. Create a file called vhost.conf if it does not already exist (or vhost_ssl.conf for HTTPS sites), and add the Crawler Snippet and optional server variables to this file.

Reconfigure this Virtual Host (this will regenerate new configuration files for this vhost):

/usr/local/psa/admin/bin/httpdmng --reconfigure-domain <domain_name>
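A sketch for a single Plesk domain, assuming Bash. Passing ssl as the second argument also writes vhost_ssl.conf, for sites served over HTTPS; VHOSTS and HTTPDMNG are hypothetical overrides for the vhosts root and the httpdmng path, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: enable the crawler for one Plesk domain.
# Usage: enable_crawler_plesk_domain DOMAIN [ssl]
# VHOSTS and HTTPDMNG are hypothetical overrides for dry runs.

enable_crawler_plesk_domain() {
  local domain="$1" with_ssl="${2:-}"
  local conf_dir="${VHOSTS:-/var/www/vhosts}/system/$domain/conf"
  local httpdmng="${HTTPDMNG:-/usr/local/psa/admin/bin/httpdmng}"
  # This directory is normally present for each hosted domain; if it is
  # missing, the domain name is probably wrong, so stop instead of creating it.
  if [ ! -d "$conf_dir" ]; then
    echo "no such directory: $conf_dir" >&2
    return 1
  fi
  local files="vhost.conf" f
  if [ "$with_ssl" = ssl ]; then
    files="vhost.conf vhost_ssl.conf"   # vhost_ssl.conf applies to HTTPS
  fi
  for f in $files; do
    if ! grep -qs 'CacheEngine on crawler' "$conf_dir/$f"; then
      printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' >> "$conf_dir/$f"
    fi
  done
  # Regenerate this domain's web server configuration.
  "$httpdmng" --reconfigure-domain "$domain"
}
```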

DirectAdmin

Server level

Add the Crawler Snippet and optional server variables to the /etc/httpd/conf/extra/httpd-includes.conf file.

Global virtual host level

Create a /usr/local/directadmin/data/templates/custom/cust_httpd.CUSTOM.2.pre file and add the Crawler Snippet and optional server variables to it.

Apply these changes to all Virtual Hosts by running the following commands:

 
cd /usr/local/directadmin/custombuild
./build rewrite_confs
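A sketch covering both DirectAdmin options, assuming Bash. The level argument selects server (httpd-includes.conf) or vhosts (the custom template, followed by ./build rewrite_confs); INCLUDES, DA_CUSTOM and CUSTOMBUILD are hypothetical path overrides, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: enable the crawler on DirectAdmin.
# Usage: enable_crawler_directadmin server|vhosts
# INCLUDES, DA_CUSTOM and CUSTOMBUILD are hypothetical path overrides.

enable_crawler_directadmin() {
  local level="$1" target
  case "$level" in
    server) target="${INCLUDES:-/etc/httpd/conf/extra/httpd-includes.conf}" ;;
    vhosts) target="${DA_CUSTOM:-/usr/local/directadmin/data/templates/custom}/cust_httpd.CUSTOM.2.pre" ;;
    *) echo "usage: enable_crawler_directadmin server|vhosts" >&2; return 2 ;;
  esac
  mkdir -p "$(dirname "$target")"
  if ! grep -qs 'CacheEngine on crawler' "$target"; then
    printf '%s\n' '<IfModule LiteSpeed>' 'CacheEngine on crawler' '</IfModule>' >> "$target"
  fi
  # Only the template change needs the virtual host configs rewritten.
  if [ "$level" = vhosts ]; then
    (cd "${CUSTOMBUILD:-/usr/local/directadmin/custombuild}" && ./build rewrite_confs)
  fi
}
```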

Disabling the Crawler on an LSWS native environment

The crawler is enabled by default in an LSWS native environment.

To disable it at the server level, open the LSWS WebAdmin interface, navigate to LSWS Admin > Server > General > Apache Style Configurations, click Edit, and make sure it contains CacheEngine -crawler (supported as of LSWS 5.3.5).

To disable it at the virtual host level, add CacheEngine -crawler to that virtual host's Apache-style configuration.

To add any of the optional server variables, navigate to Server > External App and add the variable(s) to the Environment setting, one per line. For example:

CRAWLER_USLEEP=1000
CRAWLER_LOAD_LIMIT=5.2

  • Last modified: 2018/12/13 21:56
  • by Jackson Zhang