New Stealth Crawler from Yahoo!

For the past few months, I’ve been tracking a Yahoo! crawler on my science blog that does not identify itself. Here is a typical request from the bot:

Requested page: /science/converting-blood-groups
  • At: 06 May 2008 10:21:05 AM GMT
  • Routed to: /index.php
  • Referred from: http://blogsci.com/science/converting-blood-groups
  • Remote: crawl1.image.srch.kr1.yahoo.com (203.212.174.181)
  • Request: HTTP/1.1 GET
  • Accepting:
    • HTTP: */*
    • Charset:
    • Encoding:
    • Languages:
  • UA:
  • Cookies:

Notice a few interesting details: there is no user-agent string; the HTTP_REFERER header is the same page that is being requested; the host is under *.yahoo.com rather than the usual yahoo.net used by Slurp; and the hostname contains both "image" and "srch".
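
Those four signals are easy to check mechanically. Here is a minimal sketch in Python, assuming each request has already been parsed into a remote hostname, a user-agent string, a referrer and a requested path. The function name `stealth_signals` and the signal labels are made up for illustration, and the referrer check compares paths only, which is a simplification.

```python
from urllib.parse import urlsplit


def stealth_signals(remote_host, user_agent, referer, requested_path):
    """Return the signals described in the post that this request matches.

    Hypothetical helper: the name and the labels are illustrative only.
    """
    signals = []
    host = remote_host.lower().rstrip(".")

    # 1. No user-agent string at all.
    if not user_agent.strip():
        signals.append("empty-user-agent")

    # 2. The referrer points at the very page being requested.
    #    Simplification: only the path is compared, not the host.
    if referer and urlsplit(referer).path.rstrip("/") == requested_path.rstrip("/"):
        signals.append("self-referer")

    # 3. Host is under yahoo.com.
    if host.endswith(".yahoo.com"):
        signals.append("yahoo.com-host")

    # 4. Hostname contains both an "image" and a "srch" label.
    labels = host.split(".")
    if "image" in labels and "srch" in labels:
        signals.append("image-srch-host")

    return signals


# The request shown in the log entry above.
print(stealth_signals(
    "crawl1.image.srch.kr1.yahoo.com",
    "",
    "http://blogsci.com/science/converting-blood-groups",
    "/science/converting-blood-groups",
))
# ['empty-user-agent', 'self-referer', 'yahoo.com-host', 'image-srch-host']
```

A reverse-DNS hostname on its own is not proof of origin, so treat a match as a hint worth logging rather than a verdict.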

The traffic is very low-volume: a few hits a day at most, and on many days just a single hit.

What’s really interesting is how narrowly targeted it is: since May it has requested only the same two pages, over and over. They are the specific blog post linked to above and the archive page that contains that post, so something about that post is likely what interests the bot. And yes, the post contains an image, and it is the only image in the main content of the archive page.
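
To put numbers on that pattern, the bot's hits can be tallied by day and by page. This is a rough sketch, assuming the hits from one remote host have already been extracted as (day, path) pairs. The function name `summarize_hits` and the returned keys are hypothetical.

```python
from collections import Counter


def summarize_hits(hits):
    """Summarize (day, path) pairs for a single remote host.

    Hypothetical helper: the input shape and the key names are assumptions.
    """
    per_day = Counter()
    per_path = Counter()
    for day, path in hits:
        per_day[day] += 1
        per_path[path] += 1

    return {
        # Number of distinct days on which the bot showed up.
        "days": len(per_day),
        # Days with exactly one request (the "one-hit-a-day visits").
        "single_hit_days": sum(1 for n in per_day.values() if n == 1),
        # How often each page was requested.
        "paths": dict(per_path),
    }
```

A host whose `paths` dictionary never grows beyond two entries over several months, with most days counted in `single_hit_days`, matches the behaviour described above.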

I’ll dig deeper when I have a chance. Please let me know in the comments below if you’re seeing something similar.
