How to Avoid a Faked Google Bot

LiteSpeed's reCAPTCHA feature helps defend against attacks, but it can be bypassed when bad actors fake user agents and IPs and hit random URIs (such as /kategori/wewuj-qunuf-litiv/, as seen in the screenshot). In particular, an attacker may pretend to be Googlebot, appearing to originate from an IP within the real Google IP range.

There are a few ways to deal with this problem:

  1. Update robots.txt to tell Google to crawl the site less aggressively.
  2. Customize rewrite rules to test against the user agent, and redirect matching requests to the home page.

Googlebot is usually not this aggressive, so when you see traffic like this it is reasonable to assume you are under attack. If someone fakes the Googlebot IP through the X-Forwarded-For header, you will need to detect and stop it.

For example, you might see these headers:

CF-IPCountry: XX
X-Forwarded-For: 39.33.93.175,66.249.93.199
CF-RAY: 5473956aa9d2cde3-CDG
X-Forwarded-Proto: http
CF-Visitor: {"scheme":"http"}
Accept: image/webp,image/apng,image/*,*/*;q=0.8
Accept-Language: en-NZ,en-GB;q=0.9,en-US;q=0.8,en;q=0.7
Forwarded: for=39.33.93.175

The X-Forwarded-For header contains two IP addresses: 39.33.93.175 (the faked Google IP) and 66.249.93.199 (the real Google IP).

Configure LiteSpeed Web Server to update the client IP only if the original IP is trusted. Navigate to LiteSpeed WebAdmin Console > Configuration > General Settings, set Use Client IP in Header to Trusted IP Only (never set it to Yes), and add the Google IPs/subnets to the trusted list.
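
The idea behind trusting the header only from known addresses can be sketched in Python. This is a minimal illustration of the general technique, not LiteSpeed's actual implementation: the TRUSTED list, the function names, and the choice to walk X-Forwarded-For from right to left are all assumptions made for the example.

```python
import ipaddress

# Illustrative placeholder (a documentation range), not a real proxy subnet.
# Replace it with the addresses or subnets you have verified yourself.
TRUSTED = [ipaddress.ip_network("203.0.113.0/24")]


def is_trusted(ip: str) -> bool:
    """Return True if ip falls inside any trusted network (ValueError if malformed)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED)


def effective_client_ip(peer_ip: str, forwarded_for: str) -> str:
    """Pick the address to treat as the client.

    The header is honored only when the connecting peer is trusted;
    otherwise anyone could claim any address through X-Forwarded-For.
    """
    if not is_trusted(peer_ip):
        return peer_ip  # untrusted sender: ignore the header entirely
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    # Walk from the hop nearest to us and stop at the first untrusted one.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            return peer_ip  # malformed entry: fall back to the peer address
    return peer_ip
```

With this logic, a request from an untrusted address keeps its own connection IP no matter what X-Forwarded-For claims, which is the behavior the Trusted IP Only setting is meant to provide.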

Google doesn't post a public list of IP addresses for webmasters to whitelist, but you can verify Googlebot IPs before adding them to the allowed list.
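
One common way to verify an address is a reverse DNS lookup followed by a forward lookup that must return the same address. The sketch below assumes that genuine Googlebot hostnames end in googlebot.com or google.com; the function name and the injectable resolver parameters are hypothetical and exist only to make the example testable. It handles IPv4 only, since socket.gethostbyname_ex does not resolve IPv6.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot reverse DNS records.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True only if ip passes both reverse and forward DNS checks."""
    try:
        # Step 1: reverse lookup, IP -> hostname.
        hostname = reverse(ip)[0].lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: forward lookup, hostname -> IPs. It must include the
        # original IP, because the owner of an address controls its
        # reverse record and can make it claim any hostname.
        return ip in forward(hostname)[2]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

Only addresses for which this check returns True would be candidates for the trusted list.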

  • Last modified: 2020/01/07 17:18
  • by Lisa Clarke