Google Web Accelerator: Please identify yourself

A while back, I noticed some fishy bot-like behavior coming from a Google-owned IP address. After I asked around, a friend suggested it could be Google Web Accelerator. So I emailed Google support and, to cut a long story short, it was indeed Google Web Accelerator (GWA for short).

The IP address back then was 64.233.172.34, which Google confirmed to be a public-facing GWA IP address. The hits were very bot-like: they had no referer, requested pages blocked by robots.txt, and identified themselves using the default user agents for IE or Firefox. However, the hits also showed signs that are atypical of bots. Looking through the log files, I noticed that after a page was requested, all of its associated files were requested too: the JavaScript files, the image files, and the CSS files. That is interesting in its own right because, remember, the hits come from a Google IP address but are really requests from real users: the GWA was acting as a proxy. I hope the implications of this are clear.
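The checks described above can be sketched in a few lines of Python. This is only a rough illustration, assuming the server writes the Apache/Nginx "combined" log format; the helper names `load_robots` and `bot_signals`, the sample robots.txt rule, and the sample log line (apart from the IP address) are made up for the example.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed layout: Apache/Nginx "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def load_robots(robots_txt):
    """Build a robots.txt parser from the file's text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def bot_signals(line, robots):
    """Return the bot-like signals for one log line, or None if it does not parse."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return {
        "ip": m["ip"],
        "path": m["path"],
        "no_referer": m["referer"] in ("", "-"),
        "blocked_by_robots": not robots.can_fetch("*", m["path"]),
        "agent": m["agent"],  # left for manual inspection
    }

# Made-up robots.txt rule and log line, for illustration only.
robots = load_robots("User-agent: *\nDisallow: /private/\n")
sample = ('64.233.172.34 - - [10/Oct/2006:13:55:36 -0700] '
          '"GET /private/page.html HTTP/1.1" 200 2326 "-" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
print(bot_signals(sample, robots))
```

A hit that is flagged on both `no_referer` and `blocked_by_robots` matches the pattern I saw; whether the associated JS, image, and CSS files follow it in the log is the part that still needs a human look.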

Regardless, I dropped it - my question was answered. But now it's back...

Over the past 10 days or so, a new IP address has started to show the same pattern. This time, the IP address is 66.249.85.133, and it certainly belongs to Google. It resolves to ff-in-f133.google.com, and the requests use HTTP/1.1 and ask for gzipped pages. The requested pages are still ones blocked by robots.txt, and the hits still identify themselves as IE 6.0 (default user agent) and come in without any referer. However, this time the associated JS files are not requested, putting the new behavior firmly in botland.
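For anyone who wants to repeat the "does it belong to Google" check, here is a minimal sketch of a forward-confirmed reverse DNS lookup using Python's standard `socket` module. The function name `is_google_host` and the `.google.com` suffix test are my own assumptions, not anything Google documents for GWA; the resolver arguments exist only so the function can be tried without network access.

```python
import socket

def is_google_host(ip, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname, then confirm it resolves back to ip.

    The ".google.com" suffix is an assumption made for this sketch.
    """
    try:
        hostname = reverse(ip)[0]            # (hostname, aliases, addresses)
        if not hostname.endswith(".google.com"):
            return False
        return ip in forward(hostname)[2]    # forward lookup must include ip
    except OSError:                          # covers socket.herror / socket.gaierror
        return False
```

Calling `is_google_host("66.249.85.133")` performs live DNS lookups, so the answer depends on what the DNS returns at the time you run it.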

So far, I've noticed only a few hits, none of which identified themselves as Firefox. Given the history, my best guess at the moment is that it is GWA again on a new IP address, but the lack of JS requests makes me wonder if they also updated the code, maybe for analytics purposes? Regardless, GWA is still acting as a proxy, so I expect it to identify itself as such. It could easily modify the user agent to hint that it's there. At the very least, that would be useful for analytics. Some questions that identification would help answer:

  • How many GWA requests does your site get?
  • Are GWA requests labelled as bots and discounted?
  • Should GWA requests be labelled as bots? This is more philosophical than technical.
  • Can GWA be used to scrape websites?
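Until GWA identifies itself, the first question can only be approximated by IP address. A rough sketch, assuming each log line starts with the client IP (as in the common and combined log formats); `count_gwa_hits` is a made-up name, and only the first address below was confirmed by Google, the second is my suspicion.

```python
from collections import Counter

# 64.233.172.34 was confirmed by Google as GWA; 66.249.85.133 is only suspected.
GWA_IPS = {"64.233.172.34", "66.249.85.133"}

def count_gwa_hits(log_lines, gwa_ips=GWA_IPS):
    """Count log lines whose first field is one of the given IP addresses."""
    hits = Counter()
    for line in log_lines:
        ip = line.split(" ", 1)[0]  # first whitespace-separated field
        if ip in gwa_ips:
            hits[ip] += 1
    return hits
```

Passing an open log file works, since iterating over a file yields its lines: `count_gwa_hits(open("access.log"))`, where the file name is whatever your server uses. A user-agent hint from GWA would make this kind of IP list unnecessary.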

And of course, there are many more questions. So if anyone reading this works for Google, maybe you can spare a minute for this? :D

2 Responses to “Google Web Accelerator: Please identify yourself”

  1. jeff Says:

    I just got one of these requests. I just googled it and found your site. It doesn’t look like bot activity because it only selected certain pages, especially ones pertaining to AdSense stuff. It was only on my site for about 2 minutes. I am guessing it is some sort of proxy and there's a human behind it. But it's also fairly late, 8:20pm in California.

  2. Pierre Says:

    Hi Jeff

    Welcome to the site :)

    It shows signs of bot and not-bot behavior. Regardless, I still think it should identify itself properly.

    Pierre
