

How to block fake Googlebot requests?

A user was hit by what looked like an attack: requests coming from real Google IP ranges with a Googlebot user-agent bypassed the LSWS reCAPTCHA feature and hit random URLs such as GET /kategori/wewuj-qunuf-litiv/.

If those requests come from the real Googlebot, just update robots.txt to tell Google to crawl the site more gently.

Another possible solution is to add rewrite rules that test the user-agent and redirect matching requests to the home page.

Usually, Googlebot is not that aggressive.

If someone fakes a Googlebot IP through the X-Forwarded-For header, you need to detect and stop that. How?

For example, you might see headers like these:

CF-IPCountry: XX
X-Forwarded-For: 39.33.93.175,66.249.93.199
CF-RAY: 5473956aa9d2cde3-CDG
X-Forwarded-Proto: http
CF-Visitor: {"scheme":"http"}
Accept: image/webp,image/apng,image/*,*/*;q=0.8
Accept-Language: en-NZ,en-GB;q=0.9,en-US;q=0.8,en;q=0.7
Forwarded: for=39.33.93.175

The X-Forwarded-For header contains two IP addresses: 39.33.93.175 (the faked Google address) and 66.249.93.199 (a real Google IP).
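To make the two entries easier to tell apart, here is a minimal Python sketch that splits an X-Forwarded-For value and flags which addresses fall inside a Google-owned range. The range 66.249.64.0/19 and the helper names (split_forwarded_for, classify) are assumptions chosen for illustration; this is not how LiteSpeed itself evaluates the header.

```python
import ipaddress

# Assumption: a single example Google-owned range, used only for illustration.
# It is not a complete or authoritative list.
ASSUMED_GOOGLE_NETS = [ipaddress.ip_network("66.249.64.0/19")]


def split_forwarded_for(header_value):
    """Return the addresses in an X-Forwarded-For value, left to right."""
    return [part.strip() for part in header_value.split(",") if part.strip()]


def in_assumed_google_range(ip):
    """True if ip falls inside one of the assumed Google networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ASSUMED_GOOGLE_NETS)


def classify(header_value):
    """Pair each address in the header with its range-check result."""
    return [(ip, in_assumed_google_range(ip))
            for ip in split_forwarded_for(header_value)]


if __name__ == "__main__":
    for ip, is_google in classify("39.33.93.175,66.249.93.199"):
        print(ip, "in assumed Google range" if is_google else "not in assumed Google range")
```

A range match only says where an address is registered. Anyone can type any address into the header, so a match alone should not be treated as proof that a request is from Googlebot.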

LiteSpeed Web Server needs to be configured to update the client IP only when the original IP is trusted. In the LiteSpeed WebAdmin Console, go to Configuration > General Settings, set Use Client IP in Header to Trusted IP Only (never set it to Yes), and add Google IPs/subnets to the trusted list. Note that Google doesn't post a public list of IP addresses for webmasters to whitelist, so verify Googlebot IPs before adding them to the allowed list.
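One way to verify a Googlebot IP is a reverse DNS lookup followed by a forward lookup of the returned hostname, checking that the forward lookup includes the original IP. The Python sketch below assumes the hostname should end in googlebot.com or google.com; the function name is_verified_googlebot and the injectable reverse_lookup/forward_lookup parameters are hypothetical, added so the logic can be exercised without network access.

```python
import socket

# Assumption: hostname suffixes accepted as belonging to Google crawlers.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the hostname suffix, then confirm that the
    hostname resolves back to the same ip."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(ALLOWED_SUFFIXES):
            return False
        # A PTR record alone can be forged, so the forward lookup must agree.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False


if __name__ == "__main__":
    # Performs real DNS lookups; the result depends on the network.
    print(is_verified_googlebot("66.249.93.199"))
```

Only addresses that pass a check like this are reasonable candidates for the trusted list. The default lookups handle IPv4 only, and DNS results can change, so the check would need to be repeated from time to time.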

  • Last modified: 2020/01/06 21:38
  • by Jackson Zhang