Yell if Microsoft’s Live.com Spammed You Too - Updated
Update 2: Yuri explains more background and asks "What happens next?" Reuben Yau and Kichus have both blocked the IP addresses. Boy, are people angry.
Update 1: DazzlinDonna from SEO Scoop has written an excellent background to this fiasco, and Michael VanDeMar is reporting that Microsoft is interfering with AdSense. Ouch.
The bot analysis continues, and this post presents evidence indicating that Microsoft is spamming websites. A big claim, I know, but I can't find a better explanation. You'll have to decide.
The summary: IP addresses belonging to Microsoft are requesting pages from eKstreme.com and blogSci.com (my science blog) with HTTP referer headers suggesting that the hits came from live.com searches. These referer headers appear to be spoofed: the keywords in the supposed searches are sometimes in no way related to the requested page, and for most of the other supposed searches the requested page does not rank in the top 10 (the first page of results), so the search could not plausibly have sent this traffic.
For some odd reason, the webmaster community has known about this for a couple of months. In September, SE Roundtable posted about other webmasters complaining about this spam. Surprisingly, we also got official confirmation (via a WMW thread) from msndude that this is indeed happening and that it's (and I'm quoting) "part of a quality check we run on selected pages". This is an unacceptable explanation, as you'll see from the data below: it has none of the hallmarks of a quality check and all the marks of referral spam.
The hits discussed below are extracted from the blogSci.com data to keep things simple, but a similar data set exists for eKstreme.com.
The Hits
The whole list of hits is way too long to quote in full here, so here is a sampling of my favorite requests:
- At: 17 August 2007 05:53:27 PM GMT
- Routed to: /index.php
- Referred from: http://search.live.com/result.aspx?q=make+money+online&mrt=en-us&FORM=LVSP
- Remote: bl2sch1082213.phx.gbl [] (65.55.165.119)
- Request: HTTP/1.0 GET
- Accepting:
- HTTP: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
- Charset:
- Enconding:
- Languages: en-us
- UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
- Cookies:
- At: 18 August 2007 03:05:43 PM GMT
- Routed to: /index.php
- Referred from: http://search.live.com/result.aspx?q=make+money+online&mrt=en-us&FORM=LVSP
- Remote: bl2sch1082008.phx.gbl [] (65.55.165.66)
- Request: HTTP/1.0 GET
- Accepting:
- HTTP: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
- Charset:
- Enconding:
- Languages: en-us
- UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
- Cookies:
These two hits above are the first I have in my records. What's amusing about them is that both supposedly came from a search for [make money online].
- At: 19 August 2007 03:55:48 AM GMT
- Routed to: /index.php
- Referred from: http://search.live.com/result.aspx?q=ticket&mrt=en-us&FORM=LVSP
- Remote: bl2sch1081815.phx.gbl [] (65.55.165.25)
- Request: HTTP/1.0 GET
- Accepting:
- HTTP: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
- Charset:
- Enconding:
- Languages: en-us
- UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
- Cookies:
This one is also very random: a blog post about a cool new magnet-based technology to create colors is ranking in the top 10 for the query [ticket]? Not even Live.com generates such irrelevant results.
Anything more recent? Sure:
- At: 11 November 2007 03:26:43 PM GMT
- Routed to: /index.php
- Referred from: http://search.live.com/results.aspx?q=osteoporosis&mrt=en-us&FORM=LIVSOP
- Remote: bl2sch1081815.phx.gbl [] (65.55.165.25)
- Request: HTTP/1.0 GET
- Accepting:
- HTTP: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
- Charset:
- Enconding:
- Languages: en-us
- UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
- Cookies:
- At: 11 November 2007 03:29:24 PM GMT
- Routed to: /index.php
- Referred from: http://search.live.com/results.aspx?q=amazon&mrt=en-us&FORM=LIVSOP
- Remote: bl2sch1081909.phx.gbl [] (65.55.165.43)
- Request: HTTP/1.0 GET
- Accepting:
- HTTP: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
- Charset:
- Enconding:
- Languages: en-us
- UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
- Cookies:
At the time of writing, there are 245 such hits in my records since August 2007.
Not convinced? There is more. Some of these hits came within seconds of the same page being requested by MSNBot. The pattern is like this: the page is requested by MSNBot (which is authenticated, so it's genuine) and within a few seconds, the very same page is requested as described above, with a live.com search as the referer. An example:
- At: 10 November 2007 12:05:14 PM GMT
- Routed to: /index.php
- Referred from: (No referer.)
- Remote: livebot-65-55-209-143.search.live.com [] (65.55.209.143)
- Request: HTTP/1.0 GET
- Accepting:
- HTTP: text/html, text/plain, text/xml, application/*, Model/vnd.dwf, drawing/x-dwf
- Charset:
- Enconding: identity;q=1.0
- Languages:
- UA: msnbot/1.0 (+http://search.msn.com/msnbot.htm)
- Cookies:
- At: 10 November 2007 12:05:36 PM GMT
- Routed to: /index.php
- Referred from: http://search.live.com/results.aspx?q=problem&mrt=en-us&FORM=LIVSOP
- Remote: bl2sch1081810.phx.gbl [] (65.55.165.20)
- Request: HTTP/1.0 GET
- Accepting:
- HTTP: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
- Charset:
- Enconding:
- Languages: en-us
- UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
- Cookies:
The typical delay between the indexing request and the spoofed search hit request is 5-20 seconds.
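The crawl-then-hit pattern is easy to check for mechanically. Here is a minimal sketch in Python, not the script behind the numbers above: it assumes the log has already been parsed into records, and the `Hit` tuple, the `crawl_followups` name and the 30-second window are all hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One parsed log record (hypothetical layout, not the real log schema)."""
    at: datetime
    path: str
    user_agent: str
    referer: Optional[str]


def crawl_followups(hits, window=timedelta(seconds=30)):
    """Pair each msnbot request with later live.com-referred hits on the same path."""
    ordered = sorted(hits, key=lambda h: h.at)
    pairs = []
    for i, crawl in enumerate(ordered):
        if not crawl.user_agent.startswith("msnbot/"):
            continue
        for later in ordered[i + 1:]:
            if later.at - crawl.at > window:
                break  # records are sorted, so nothing further can be in range
            if (later.path == crawl.path
                    and later.referer
                    and "search.live.com" in later.referer):
                pairs.append((crawl, later, later.at - crawl.at))
    return pairs


# The two 10 November 2007 records quoted above.
sample = [
    Hit(datetime(2007, 11, 10, 12, 5, 14), "/index.php",
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)", None),
    Hit(datetime(2007, 11, 10, 12, 5, 36), "/index.php",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)",
        "http://search.live.com/results.aspx?q=problem&mrt=en-us&FORM=LIVSOP"),
]

for crawl, hit, delay in crawl_followups(sample):
    print(hit.path, delay.total_seconds())
```

Run on the two 10 November records quoted above, it reports one pair with a 22-second gap between the MSNBot request and the hit carrying the live.com referer.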
How to Recognize the Fake Hits
Anyone staring at these hits long enough will see some signatures to detect them:
- Note how all of them have identical user agents (UA field) and pretty much everything else is identical (bar the requested page and the referer).
- The IP addresses all belong to the same C-block, namely 65.55.165.*.
- All of the query strings in the live.com referrers have &mrt=en-us in them. Here in the UK, I get &mkt=en-gb when I really use Live.com for a search.
Needless to say, this smells like bot behavior.
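The three signatures above translate directly into a filter. A rough sketch in Python, assuming the IP, user agent and referer of each request are available as strings; the function and constant names are hypothetical, and the checks only encode what was observed in these logs, so treat a match as a hint rather than proof.

```python
import ipaddress
from urllib.parse import parse_qs, urlparse

# Values taken from the hits quoted above; the names are made up for this sketch.
SUSPECT_NET = ipaddress.ip_network("65.55.165.0/24")
SUSPECT_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"


def fake_hit_signals(ip, user_agent, referer):
    """Return the names of the signatures that a single request matches."""
    signals = []
    if ipaddress.ip_address(ip) in SUSPECT_NET:
        signals.append("ip")
    if user_agent == SUSPECT_UA:
        signals.append("ua")
    url = urlparse(referer or "")
    # The suspicious referers carry an "mrt" parameter; a real search showed "mkt".
    if url.hostname == "search.live.com" and "mrt" in parse_qs(url.query):
        signals.append("mrt")
    return signals


print(fake_hit_signals(
    "65.55.165.20",
    SUSPECT_UA,
    "http://search.live.com/results.aspx?q=problem&mrt=en-us&FORM=LIVSOP",
))
```

For the 10 November hit from 65.55.165.20 quoted earlier, this returns all three signals; a search referer carrying &mkt=en-gb from an unrelated address and browser returns none. An unparseable IP string raises ValueError, so malformed log lines need handling before they reach this function.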
An Analysis
Let's think about this for a minute: What on Earth is going on? Why are these hits happening? I can think of two explanations:
- The tinfoil/sinister explanation: pure spam from MS. Why? So that webmasters see Live.com referrals coming in increasing numbers. This is not hard to hide: if you only get like 10 referrals from live.com a month, another 10 is a doubling but which sad webmaster would check those out (apart from me)?
- The "surely not" explanation: this is an automated way to check the search results to see where pages rank for keywords the page could potentially rank for. This is what msndude confirmed in the WMW thread, but as you can see above, it doesn't really look like a quality check. Also, if this is indeed a quality check, why not run it on the cached pages and not alert (and annoy) the webmasters? Microsoft have full access to their index and they should use it!
I subscribe firmly to the first explanation: the search keywords are spammy in some cases and always too general, the requested pages never rank in the top 10 as the referring URLs would suggest, the hits have identical user agents (i.e. not the variation you would expect from different people using normal browsers on different operating systems within the same company), and the actual referring URL does not match what a human being searching on live.com generates.
In short: it's spam and not a quality control check. What do you think?

