What is YahooCacheSystem?
I recently started noticing hits from a few IP addresses that resolve to *.yahoo.net hosts, with a user agent of just "YahooCacheSystem". So far they have requested only the raw RSS XML feed. All requests are HTTP/1.0 GETs that set HTTP_ACCEPT to */*. No other headers are set.
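If you want to check your own logs for the same bot, here is a minimal sketch in Python. It assumes the server writes the Apache-style "combined" log format, which may not match your setup. The function name `yahoo_cache_hits`, the sample log lines, and the `/feed/` path are made up for illustration.

```python
import re

# Assumption: Apache-style "combined" log format. Adjust the pattern
# if your server logs in a different layout.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def yahoo_cache_hits(lines):
    """Return parsed entries whose user agent is exactly 'YahooCacheSystem'."""
    hits = []
    for line in lines:
        match = COMBINED.match(line)
        if match and match.group("agent") == "YahooCacheSystem":
            hits.append(match.groupdict())
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '209.131.41.48 - - [08/Jun/2008:03:10:00 -0700] "GET /feed/ HTTP/1.0" '
    '200 5120 "-" "YahooCacheSystem"',
    '10.0.0.1 - - [08/Jun/2008:03:11:00 -0700] "GET / HTTP/1.1" '
    '200 900 "-" "Mozilla/5.0"',
]

for hit in yahoo_cache_hits(sample):
    print(hit["ip"], hit["time"], hit["path"], hit["proto"])
```

In practice you would pass an open log file instead of `sample`, since iterating over a file yields one line at a time.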
The first hit I saw was on April 27th, from the IP address 216.39.58.78. Back then, that resolved to htproxy3.ops.re4.yahoo.net. Since then, however, all hits have come from a different C-block, 209.131.41.*, which resolves variously to htproxyX.ops.sp1.yahoo.net (where X is a number such as 1 or 2, giving htproxy1.ops.sp1.yahoo.net or htproxy2.ops.sp1.yahoo.net). Even more recently, the IP addresses have remained the same, but the hosts they resolve to have changed to htproxyX.ops.re4.yahoo.net (again, X is a number, giving htproxy1.ops.re4.yahoo.net or htproxy2.ops.re4.yahoo.net).
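Since a user agent is trivial to fake, it can help to compare a hit against the addresses and host names listed above. The following is a rough sketch, not an official way to verify Yahoo! traffic. It assumes "C-block" means the /24 network 209.131.41.0/24, and the host name pattern is inferred only from the names I have seen so far. The function name `matches_observed_source` is hypothetical.

```python
import ipaddress
import re

# Assumption: the "C-block" 209.131.41.* is treated as a /24 network.
OBSERVED_NETWORK = ipaddress.ip_network("209.131.41.0/24")
# The single address seen on the first hit.
FIRST_SEEN_IP = ipaddress.ip_address("216.39.58.78")

# Inferred from the observed names: htproxyN.ops.sp1 or htproxyN.ops.re4.
HOST_PATTERN = re.compile(r"^htproxy\d+\.ops\.(sp1|re4)\.yahoo\.net$")


def matches_observed_source(ip, hostname=None):
    """Check an IP (and optionally its reverse-DNS name) against what was observed."""
    addr = ipaddress.ip_address(ip)
    ip_ok = addr in OBSERVED_NETWORK or addr == FIRST_SEEN_IP
    if hostname is None:
        return ip_ok
    name = hostname.rstrip(".").lower()
    return ip_ok and bool(HOST_PATTERN.match(name))


print(matches_observed_source("209.131.41.10", "htproxy1.ops.sp1.yahoo.net"))
print(matches_observed_source("10.0.0.1"))
```

The host name itself can be looked up with `socket.gethostbyaddr(ip)[0]` from the standard library. That call needs network access and can raise an exception when no reverse record exists, so it is left out of the sketch.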
I'm posting about this bot for a simple reason: the UA is intriguing, and so is the fact that it requests only RSS XML feeds. Are we going to see a Yahoo! service, or a set of services, that deals with just blogs?
TechCrunch reported back in 2005 on the launch of Yahoo! Blog Search, which both then and now has pointed to what Yahoo! calls News Search. According to the web page, News Search lets you "Search real-time news stories from Yahoo! News and across the web." That's fine and dandy, but it's not a blog search per se.
So the YahooCacheSystem bot could represent one of two things:
- Yahoo! is consolidating its backend infrastructure to deal with RSS-based sites better. So they are building a centralized RSS cache for all their services to use. For webmasters, this means we now have a new analytics data point we can look at.
- Or... (wait, I need to peer at my crystal ball...) Yahoo! is moving towards building a serious set of services centred around XML feeds. This could mean we see a true blog search product soon, or something else we can only guess at.
So which one is it? I can only guess. Given the utter lack of evidence and, more importantly, of rumors, I'm leaning towards the infrastructure explanation. Then again, a good infrastructure is a prerequisite for any major strategic shift or product launch. Time will tell.

June 9th, 2008 at 3:24 am
Y Pipes?
June 9th, 2008 at 6:59 am
That’s a very good guess Yura. I didn’t think of it. The only observation that it’s probably not Y Pipes is that the bot hits once every 24hrs or so, although I’ve seen the odd faster-than-normal request rate (could it be throttled?).
Interesting. We’ll see how that goes…