What do you do with Unauthenticated Search Engine Bots?
Over at Search Engine Journal, Ann Smarty explains how to switch your user agent (UA) to Googlebot's and browse the web. The technique uses a Firefox extension to change the browser's user agent string to that of Googlebot. Simple, and it works a treat. Except for...
The problem here is that it is very easy to authenticate Googlebot, Slurp, or MSNBot. The three major search engines give us a double DNS lookup (a reverse lookup on the requesting IP, then a forward lookup on the hostname it returns) to check whether a request claiming to come from one of their crawlers is genuine. This authentication helps us webmasters fight fake crawlers (not to mention other things). So the SEJ article is useful, but the technique isn't 100% foolproof: people pretending to be GBot/Slurp/MSNBot will probably get caught in snares laid by clever webmasters.
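The double DNS check can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this blog: the function name `is_genuine_crawler` and the `CRAWLER_DOMAINS` table are hypothetical, and the hostname suffixes are assumptions based on what the engines published at the time, so check their current documentation before relying on them. The resolvers are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket

# Hypothetical table: hostname suffixes each crawler's reverse DNS should end with.
# These suffixes are assumptions; verify them against each engine's current docs.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "slurp": (".crawl.yahoo.net",),
    "msnbot": (".search.msn.com",),
}


def is_genuine_crawler(ip, claimed_bot,
                       reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Return True only if `ip` passes the reverse-then-forward DNS check
    for the crawler named in `claimed_bot`."""
    suffixes = CRAWLER_DOMAINS.get(claimed_bot.lower())
    if suffixes is None:
        return False  # not a crawler we know how to verify

    # Step 1: reverse lookup (IP -> hostname).
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Step 2: the hostname must belong to the search engine's domain.
    hostname = hostname.lower().rstrip(".")
    if not hostname.endswith(suffixes):
        return False

    # Step 3: forward lookup (hostname -> IPs) must lead back to the same IP,
    # otherwise anyone controlling their own PTR record could fake step 2.
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses
```

In a live request handler you would call something like `is_genuine_crawler(remote_ip, "googlebot")` and cache the result per IP, since two DNS lookups on every request add noticeable latency. Note that `socket.gethostbyname_ex` only returns IPv4 addresses.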
This raises an interesting question: If you do authenticate SE bot requests, what do you do with unauthenticated ones?
Personally, I just block all unauthenticated bots: the request is served a blank page with no content. I've found that this stopped *all* (yes, all) unauthenticated bots, but with a proportional rise in stealthier bots (i.e. scrapers pretending to be a browser). No matter; this is an arms race and I'm in it for the long run.
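The blank-page approach could be sketched as a small piece of WSGI middleware. Again, this is an illustration under stated assumptions rather than the setup used here: the class name `BlankPageForFakeBots`, the user-agent regex, and the choice of an empty `200 OK` response are all hypothetical, and `verify` stands for any callable `verify(ip, bot_name) -> bool` that performs the double DNS check.

```python
import re

# Hypothetical pattern: user agents claiming to be one of the big three crawlers.
CLAIMED_BOT = re.compile(r"(googlebot|slurp|msnbot)", re.IGNORECASE)


class BlankPageForFakeBots:
    """WSGI middleware: if the User-Agent claims to be a major search engine
    crawler but the client IP fails verification, send an empty page instead
    of the real content. All other requests pass through untouched."""

    def __init__(self, app, verify):
        self.app = app
        self.verify = verify  # assumed signature: verify(ip, bot_name) -> bool

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        match = CLAIMED_BOT.search(user_agent)
        if match:
            ip = environ.get("REMOTE_ADDR", "")
            if not self.verify(ip, match.group(1).lower()):
                # Unauthenticated "bot": answer with a blank page.
                start_response("200 OK", [("Content-Type", "text/html"),
                                          ("Content-Length", "0")])
                return [b""]
        return self.app(environ, start_response)
```

Returning an empty page rather than an error code gives a scraper nothing to react to; whether that beats a `403` is a judgment call. If the site sits behind a reverse proxy, `REMOTE_ADDR` will be the proxy's address, so the real client IP would have to come from elsewhere.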
Other people suggest feeding unauthenticated requests content that AdSense frowns upon, like guns or porn. The idea is that these crawlers are harvesting your content for made-for-AdSense (MFA) sites, so it's best to get those sites banned the quick and dirty way.
Others suggest just ignoring them; after all, they'll come back with a different UA anyway, so what's the point? This attitude bothers me because it just means giving up and letting your content get scraped far and wide without any control.
So what do you do with unauthenticated bots and more generally, what do you do with bots?
Subscribe to Things of Sorts
If you liked this post, please subscribe to the Things of Sorts RSS feed.

July 11th, 2008 at 5:39 am
I do nothing, but I was considering banning unrecognized accesses to save bandwidth/reduce spam.
August 21st, 2008 at 10:35 am
[…] Pierre Far, the Search Blogger of the Day. Today I’d like to highlight a post called What Do You Do With Unauthenticated Search Engine Bots?. Pierre brings up some interesting thoughts about what to do with those pesky bots. He just blocks […]
August 22nd, 2008 at 5:30 am
How do you block the scraper sites? Via IPs and .htaccess?