

How to drop Query Strings when using LSCache?

To make some requests more cache-friendly, LiteSpeed Web Server Enterprise v5.2.3 and later can drop certain query string parameters from the cache key.

A separate copy of the page is cached for each distinct query string attached to a URL. Often this is the intended behavior. However, when “junk” query string parameters do not change the content of the page, caching separate copies is redundant.

The Apache-style configuration directive is CacheKeyModify …, and it modifies the query string that is used to build the cache key for a URL.

The directive can be added at the Apache server, virtual host, and .htaccess levels.

Upper-level configurations are inherited by lower levels. If a lower level adds more rules, they are applied in addition to those of the upper level. (This additive behavior may not be fully implemented yet, but it will be.) If a lower level should not use the upper level's configuration, use the “clear” parameter before adding the new rules.

The CacheKeyModify directive can be used multiple times. Each line may contain only one modification; multiple lines are combined.

This feature suits sites that receive junk query strings, such as UTM codes or Google AdWords auto-tags, and therefore see many different URLs that should be stored in and served from the same cache entry but in practice are not.

What is a UTM code?

A UTM code is a simple code that you can attach to a custom URL in order to track a source, medium, and campaign name. This enables Google Analytics to tell you where searchers came from as well as what campaign directed them to you.

While such query strings may be useful for tracking purposes, they have no effect on the content of the page, and therefore should not be considered when storing the page in cache.

What is a Google AdWords auto tag?

Google AdWords can be configured to add a tracking parameter to your URLs that passes information about the click. Like UTM codes, this tag has no bearing on the content of the page, so it may be ignored when caching. These tags appear in the format &gclid=XXXXXXX.

Examples

  • CacheKeyModify -qs:utm* drops every query string parameter whose name starts with “utm”
  • CacheKeyModify -qs:utm drops the query string parameter whose name exactly matches “utm”
  • CacheKeyModify -qs:gclid drops the query string parameter whose name exactly matches “gclid”
  • CacheKeyModify clear discards all previous configurations.
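The difference between a prefix pattern (trailing *) and an exact pattern can be modeled in a few lines of Python. This is a minimal sketch for illustration only, not LiteSpeed's implementation; the function names `name_matches` and `drop_qs` are hypothetical, and the sketch assumes parameters are separated by & and that a parameter's name is everything before its first =.

```python
from typing import Iterable


def name_matches(name: str, pattern: str) -> bool:
    """A trailing '*' means prefix match; otherwise the name must match exactly."""
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def drop_qs(query: str, patterns: Iterable[str]) -> str:
    """Remove every name=value pair whose name matches one of the patterns."""
    patterns = list(patterns)
    kept = []
    for pair in query.split("&"):
        if not pair:
            continue  # ignore empty segments, e.g. from a trailing '&'
        name = pair.split("=", 1)[0]
        if not any(name_matches(name, p) for p in patterns):
            kept.append(pair)
    return "&".join(kept)


qs = "utm=1&utm_source=google&utm_medium=email&gclid=XXXXXXX&page=2"
print(drop_qs(qs, ["utm*"]))   # gclid=XXXXXXX&page=2
print(drop_qs(qs, ["utm"]))    # utm_source=google&utm_medium=email&gclid=XXXXXXX&page=2
print(drop_qs(qs, ["gclid"]))  # utm=1&utm_source=google&utm_medium=email&page=2
```

Note that the prefix pattern utm* also removes a parameter named exactly utm, while the exact pattern utm leaves utm_source and utm_medium alone.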

As long as the LSWS version is 5.2.3 or above, this feature is enabled by default and does not need any further configuration in the LSWS WebAdmin GUI or in Apache configurations. You may wish to override the default settings at the server level, virtual-host level or even the .htaccess level.

Examples for a WHM/cPanel EA4 environment

After you run the following, the drop query string feature will be automatically enabled globally (replace 5.2.3 with the appropriate version):

/usr/local/lsws/admin/misc/lsup.sh -f -v 5.2.3 

To set a rule for all virtual hosts, add it at the server level of the Apache configuration. Edit this file:

vi /etc/apache2/conf.d/includes/pre_main_global.conf

and add:

<IfModule Litespeed>
CacheKeyModify -qs:utm*
</IfModule>

This will drop all query strings where the name part starts with “utm” for all virtual hosts.

You can also drop query strings where the name exactly matches “utm”:

<IfModule Litespeed>
CacheKeyModify -qs:utm
</IfModule>

Regardless of the server-level settings, an end user can discard the inherited rules in .htaccess by adding the following:

<IfModule Litespeed>
CacheKeyModify clear
</IfModule>

To verify the server- and virtual-host-level settings, you may run the following command:

cd /etc/apache2/
grep -i -r CacheKeyModify

The design logic is as follows, where A, B, and C refer to defined rules:

Server level | VHost level | .htaccess | Result
A            | not set     | not set   | A
A            | not set     | C         | A+C
A            | B           | not set   | A+B
A            | B           | C         | A+B+C

* Combining rule sets across levels may not be fully implemented in v5.2.3, but will be fully implemented in the next release.
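The inheritance rules in the table can be sketched as a small Python function. It is a hypothetical model of the documented design, not LiteSpeed code: `effective_rules` is an illustrative name, and the sketch assumes that `clear` discards every rule collected before it, including earlier lines at the same level. As the note on v5.2.3 says, the server itself may not yet combine levels this way.

```python
def effective_rules(server, vhost, htaccess):
    """Combine CacheKeyModify arguments from server, vhost and .htaccess levels.

    Lower levels add to the inherited rules; 'clear' discards every rule
    collected so far (an assumption of this sketch).
    """
    rules = []
    for level in (server, vhost, htaccess):
        for arg in level:
            if arg == "clear":
                rules = []
            else:
                rules.append(arg)
    return rules


# The four rows of the table: (server, vhost, .htaccess)
rows = [(["A"], [], []), (["A"], [], ["C"]), (["A"], ["B"], []), (["A"], ["B"], ["C"])]
for row in rows:
    print("+".join(effective_rules(*row)))  # A, A+C, A+B, A+B+C (one per line)

# An .htaccess that opts out of the server rule and sets its own:
print(effective_rules(["-qs:utm*"], [], ["clear", "-qs:gclid"]))  # ['-qs:gclid']
```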

You can also use rewrite rules. This method allows multiple modifications in a single rule and gives you more flexibility.

For this example, we remove utm_source with an exact match, and utm_medium with a prefix match.

RewriteCond %{QUERY_STRING} 'utm_source=google&utm_medium=email1&utm_campaign=promo%20code'
RewriteRule .* - [E=cache-key-mod:-qs:utm_source,E=cache-key-mod:-qs:utm_medium*]

The log shows that only utm_campaign is left in the query string:

Remove exact matched QS key [utm_source],
Remove prefix matched QS key [utm_medium],
CacheKey data: URI [/wordpress/?], QS [utm_campaign=promo%20code],
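To see why only utm_campaign survives, the sketch below collects the pattern from each `E=cache-key-mod:-qs:...` flag in the rewrite rule and applies it to the query string from the RewriteCond. It is an illustrative Python model, not how LiteSpeed parses rewrite flags; `rules_from_flags` and `apply_rules` are hypothetical names, and a trailing * is treated as a prefix match, as in the CacheKeyModify examples.

```python
import re

FLAGS = "[E=cache-key-mod:-qs:utm_source,E=cache-key-mod:-qs:utm_medium*]"
QUERY = "utm_source=google&utm_medium=email1&utm_campaign=promo%20code"


def rules_from_flags(flags: str) -> list:
    """Collect <pattern> from every E=cache-key-mod:-qs:<pattern> flag."""
    return re.findall(r"E=cache-key-mod:-qs:([^,\]]+)", flags)


def apply_rules(query: str, patterns) -> str:
    """Drop pairs whose name matches a pattern (trailing '*' = prefix match)."""
    kept = []
    for pair in query.split("&"):
        name = pair.split("=", 1)[0]
        dropped = any(
            name.startswith(p[:-1]) if p.endswith("*") else name == p
            for p in patterns
        )
        if not dropped:
            kept.append(pair)
    return "&".join(kept)


rules = rules_from_flags(FLAGS)   # ['utm_source', 'utm_medium*']
print(apply_rules(QUERY, rules))  # utm_campaign=promo%20code
```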

Prepare a URL with a Junk Query String

Assume we have a public WordPress site at testquerystring.com with LSCache enabled. Use the Campaign URL Builder or another UTM plugin to generate a tagged URL, such as the second URL below.

Access the site with both of the URLs:

  • https://testquerystring.com
  • https://testquerystring.com/?utm_source=google&utm_medium=email&utm_campaign=promo%20code
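If you prefer to build the tagged test URL in a script rather than with the Campaign URL Builder, Python's standard `urllib.parse.urlencode` can produce the same string. This is a convenience sketch; passing `quote_via=quote` is our choice so that the space in the campaign name is encoded as %20, matching the URL above, instead of urlencode's default +.

```python
from urllib.parse import quote, urlencode

base = "https://testquerystring.com/"
params = {
    "utm_source": "google",
    "utm_medium": "email",
    "utm_campaign": "promo code",  # the space becomes %20 with quote_via=quote
}
url = base + "?" + urlencode(params, quote_via=quote)
print(url)
# https://testquerystring.com/?utm_source=google&utm_medium=email&utm_campaign=promo%20code
```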

Set up Rules to Drop the Query String

For testing purposes, add the following to the .htaccess file:

CacheKeyModify -qs:utm_medium

Verify From Developer Tool

Before the rules are created

Each distinct query string on the same domain produces a different cache key, so the two URLs above are stored in two separate cache files.

After the rules are created

The utm_medium query string parameter is stripped. Because other query string parameters are still attached to the URL, there will still be two separate cache files:

  1. https://testquerystring.com/
  2. https://testquerystring.com/?utm_source=google&utm_campaign=promo%20code (notice &utm_medium=email has been stripped)

If you visit https://testquerystring.com/?utm_source=google&utm_campaign=promo%20code and then access the following URLs, you will see that they are all stored in a single cache file and are served as cache hits from the first visit:

  1. https://testquerystring.com/?utm_source=google&utm_medium=text&utm_campaign=promo%20code
  2. https://testquerystring.com/?utm_source=google&utm_medium=instant-message&utm_campaign=promo%20code
  3. https://testquerystring.com/?utm_source=google&utm_medium=browser&utm_campaign=promo%20code
  4. https://testquerystring.com/?utm_source=google&utm_medium=direct&utm_campaign=promo%20code
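The effect of dropping utm_medium can be checked offline with a short Python sketch that reduces each URL to a simplified cache key (path plus the remaining query string). The key format and the `cache_key` function are hypothetical simplifications; the real LSCache key contains more data, such as the vary and private cookies seen in the debug log.

```python
from urllib.parse import urlsplit

DROPPED = {"utm_medium"}  # mirrors: CacheKeyModify -qs:utm_medium (exact match)


def cache_key(url: str) -> str:
    """Simplified, hypothetical cache key: path + query minus dropped names."""
    parts = urlsplit(url)
    kept = [pair for pair in parts.query.split("&")
            if pair and pair.split("=", 1)[0] not in DROPPED]
    return parts.path + "?" + "&".join(kept)


urls = [
    "https://testquerystring.com/?utm_source=google&utm_campaign=promo%20code",
    "https://testquerystring.com/?utm_source=google&utm_medium=text&utm_campaign=promo%20code",
    "https://testquerystring.com/?utm_source=google&utm_medium=instant-message&utm_campaign=promo%20code",
    "https://testquerystring.com/?utm_source=google&utm_medium=browser&utm_campaign=promo%20code",
    "https://testquerystring.com/?utm_source=google&utm_medium=direct&utm_campaign=promo%20code",
]
keys = {cache_key(u) for u in urls}
print(keys)  # {'/?utm_source=google&utm_campaign=promo%20code'}
print(cache_key("https://testquerystring.com/"))  # /?
```

All five URLs collapse to one key, while the bare home page keeps its own key, which matches the two cache files described above.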

Verify From Debug Log

Search the debug log for lines that contain CACHE together with either QS or KEY:

tail -f /etc/apache2/logs/* | grep 'CACHE' | grep 'QS\|KEY'

Then verify that utm_medium has been removed from the QS field of the CacheKey data:

[CACHE] Remove exact matched QS key [utm_medium],
[CACHE] modified QS in cache key is [],
[CACHE] CacheKey data: URI [/testquerystring.com/?], QS [utm_source=google&utm_campaign=promo%20code], Vary Cookie [_lscache_vary=xxxxx], Private Cookie [wp_woocommerce_session_xxxxx], IP [x.x.x.x]
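If you save debug log output to a file, the QS field can also be checked programmatically. The following Python sketch assumes the `CacheKey data: ... QS [...]` line format shown above; `cache_key_qs` and `qs_names` are hypothetical helper names, and the sample line is copied from the log excerpt.

```python
import re

# Sample line copied from the debug log excerpt.
LINE = ("[CACHE] CacheKey data: URI [/testquerystring.com/?], "
        "QS [utm_source=google&utm_campaign=promo%20code], "
        "Vary Cookie [_lscache_vary=xxxxx], "
        "Private Cookie [wp_woocommerce_session_xxxxx], IP [x.x.x.x]")

# Capture the bracketed value that follows "QS " on a "CacheKey data:" line.
QS_FIELD = re.compile(r"CacheKey data:.*?\bQS \[([^\]]*)\]")


def cache_key_qs(line: str):
    """Return the QS value from a CacheKey data line, or None if absent."""
    match = QS_FIELD.search(line)
    return match.group(1) if match else None


def qs_names(qs: str) -> list:
    """Parameter names in a query string, e.g. ['utm_source', 'utm_campaign']."""
    return [pair.split("=", 1)[0] for pair in qs.split("&") if pair]


qs = cache_key_qs(LINE)
print(qs)                            # utm_source=google&utm_campaign=promo%20code
print("utm_medium" in qs_names(qs))  # False
```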
Last modified: 2018/07/26 13:14 by Jackson Zhang