GoogleBot Requested a CSS File! - Updated

A recent SEO Refugee thread brought up the subject of whether Google (and the SE bots in general) check CSS files. Testing that is easy: download your log files and search for all requests to the CSS file(s) coming from Googlebot. I usually get a few hits every time I do this (every couple of months or so), but on closer inspection, the hits have always been from a spoofed user agent string, where some clown browses the web pretending to be Googlebot. This is easy enough to accomplish, for example, using Firefox and the user agent switcher extension.

Just now, I decided to check again, and one hit actually did come from a Googlebot IP address. The exact line from the log file is:

66.249.72.52 - - [24/Oct/2006:17:17:35 -0500] "GET /global/x.css HTTP/1.1" 200 8382 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Sure enough, the requesting IP address really does belong to the Google IP block. Also notice that there is no HTTP referer header set, as you'd expect. As far as I am aware, this is the first time anyone has spotted Googlebot requesting a CSS file.
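For readers less used to raw logs, here is a minimal Python sketch that splits the line above into named fields. It assumes the server writes the common Apache "combined" log format; the pattern and the `parse_line` helper are my own illustrative names, not part of any tool mentioned in this post. A referer field of "-" means no referer header was sent.

```python
import re

# ASSUMPTION: the log uses the Apache "combined" format
# (ip, identity, user, [time], "request", status, size, "referer", "agent").
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# The log line quoted in the post.
line = (
    '66.249.72.52 - - [24/Oct/2006:17:17:35 -0500] '
    '"GET /global/x.css HTTP/1.1" 200 8382 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

hit = parse_line(line)
for field in ("ip", "time", "path", "status", "referer", "agent"):
    print(f"{field:>8}: {hit[field]}")
```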

Digging deeper, I tried to find more requests to CSS files. The one requested (x.css) is the main CSS file for ekstreme.com. There is another stylesheet for the Socializer. I couldn't find any Googlebot requests for that. I also checked my other sites, and I couldn't find any requests there either. In short, this is the only CSS file request by Googlebot I could find.

Incidentally, that same Googlebot requested just over 3000 pages from eKstreme.com that day.

What does this mean? If Google is now interested in CSS files, could it also be interested in discovering hidden text? That would be interesting ;)

So can you please check your logs? I'm sure eKstreme.com is not special enough to be the only site whose CSS file got crawled. This is how I check my logs: I use grep (Windows users get Unix Utils) with the following command:

grep -F "x.css" logfile.txt > css-hits.txt

This picks up all requests to the CSS file. Make sure you replace 'x.css' with your CSS file's name! Next, we fish out all hits that mention Google. To avoid any case sensitivity issues, we simply search for 'oogle':

grep -F "oogle" css-hits.txt > oogle-css.txt

Now open up oogle-css.txt and look at each hit individually. You'll usually have a few dozen so it won't be that hard. For any hits claiming to be Googlebot, check the IP address to see if it is part of Google's IP block or not.
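If you'd rather not eyeball every hit, the two grep steps and the IP check can be roughly combined in a short Python sketch. Treat it as a starting point only: the function name, the sample lines other than the first, and especially the address block `66.249.64.0/19` are assumptions of mine for illustration. Check the block against Google's own published information (or a reverse DNS lookup) before trusting the result.

```python
import ipaddress
import re

# ASSUMPTION: an illustrative address block only. Replace it with ranges
# you have verified yourself as belonging to Google.
ASSUMED_GOOGLE_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

# Pulls out the client IP, the requested path and the last quoted field
# (the user agent in the "combined" log format).
LINE_RE = re.compile(
    r'^(?P<ip>\S+) .*?"(?:GET|HEAD) (?P<path>\S+) [^"]*" .*"(?P<agent>[^"]*)"\s*$'
)


def googlebot_css_hits(lines, filename="x.css"):
    """Return (ip, path, in_assumed_block) for hits claiming to be Googlebot."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or not m.group("path").endswith(filename):
            continue
        # Case-insensitive match, the same idea as grepping for 'oogle'.
        if "googlebot" not in m.group("agent").lower():
            continue
        try:
            ip = ipaddress.ip_address(m.group("ip"))
        except ValueError:
            continue  # first field is a hostname, not an IP address
        in_block = any(ip in net for net in ASSUMED_GOOGLE_NETWORKS)
        hits.append((m.group("ip"), m.group("path"), in_block))
    return hits


UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    # The real line from this post.
    f'66.249.72.52 - - [24/Oct/2006:17:17:35 -0500] "GET /global/x.css HTTP/1.1" 200 8382 "-" "{UA}"',
    # Made-up spoofed hit from a documentation-only address (RFC 5737).
    f'192.0.2.10 - - [24/Oct/2006:18:00:00 -0500] "GET /global/x.css HTTP/1.1" 200 8382 "-" "{UA}"',
    # Made-up page request that should be ignored (not the CSS file).
    f'66.249.72.52 - - [24/Oct/2006:17:18:00 -0500] "GET /index.html HTTP/1.1" 200 1024 "-" "{UA}"',
]

results = googlebot_css_hits(sample)
for ip, path, in_block in results:
    verdict = "inside assumed Google block" if in_block else "NOT in assumed block (likely spoofed)"
    print(ip, path, verdict)
```

To run it on a real log, pass an open file instead of `sample`, for example `googlebot_css_hits(open("logfile.txt"), "x.css")`.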

If you find something, please comment below or contact me.

UPDATE: A few updates and reactions from around the web:

  • First off, SEO Scoop's Donna gave some background to the story about how this was alluded to at Pubcon Vegas. Thanks for the mention Donna!
  • Barry over at SE Roundtable mentioned this story too (thanks!) and also linked to more references about other reports like this one. Read them all, especially the ones from last month.
  • A question asked by many people was whether the directory where the CSS file resides is blocked by robots.txt. No, it's not.
  • Two lively forum threads are debating this story, one at Cre8 and one at SEO Refugee. Come join the fun!
  • Interestingly, Michael Martinez mentioned in the SEO Refugee thread and in the SE Roundtable comments that he's seen this kind of behavior before - i.e., Google requesting CSS and JS files. So I decided to check...
  • External Javascript files were downloaded too! In total, I saw 71 requests to the various JS files used on eKstreme.com from Google IP addresses identifying themselves as GoogleBot. Twenty of the 71 hits came from 66.249.72.72. The first hit was on 5 September 2006 and the last on 7 November 2006. An excerpt from the log file:

    66.249.72.100 - - [05/Sep/2006:13:31:50 -0500] "GET /socializer/socializer.js HTTP/1.1" 200 228 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    66.249.72.34 - - [06/Sep/2006:00:01:22 -0500] "GET /tracker/ajaxlinktracker.js HTTP/1.1" 200 14212 "-" "'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'"

    66.249.72.234 - - [07/Sep/2006:00:47:29 -0500] "GET /socializer/socializer.js HTTP/1.1" 200 228 "-" "'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'"

    66.249.72.234 - - [07/Sep/2006:00:48:05 -0500] "GET /tracker/ajaxlinktracker.js HTTP/1.1" 200 14212 "-" "'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'"

    66.249.72.234 - - [07/Sep/2006:01:01:33 -0500] "GET /socializer/socializer.js HTTP/1.1" 200 228 "-" "'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'"

    66.249.72.234 - - [07/Sep/2006:01:10:55 -0500] "GET /tracker/ajaxlinktracker.js HTTP/1.1" 200 14212 "-" "'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'"

    66.249.72.234 - - [07/Sep/2006:18:11:49 -0500] "GET /socializer/socializer.js HTTP/1.1" 200 228 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    66.249.72.234 - - [08/Sep/2006:12:31:13 -0500] "GET /tracker/ajaxlinktracker.js HTTP/1.1" 200 14212 "-" "'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'"

  • I've talked before about Google indexing more types of files, including ZIP and EXE files (see Binary Data in SERPs), and I've even spotted what appears to be a Javascript link recognized as a backlink. All in all, this could be a trend of Googlebot getting 'smarter'.
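A per-IP tally like the one in the Javascript update above can be produced with a few lines of Python. This is only a sketch of one way to do it, not the method used for the numbers in this post: the function name is illustrative, the sample holds just three lines from the excerpt, and paths with query strings would need extra handling.

```python
from collections import Counter


def js_hits_per_ip(lines):
    """Count requests for .js files per client IP among lines mentioning Googlebot."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request field on this line
        request = parts[1].split()  # e.g. ['GET', '/path/file.js', 'HTTP/1.1']
        if len(request) < 2:
            continue
        path = request[1]
        if path.lower().endswith(".js") and "googlebot" in line.lower():
            counts[line.split()[0]] += 1  # first field is the client IP
    return counts


UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Three lines taken from the excerpt above.
sample = [
    f'66.249.72.100 - - [05/Sep/2006:13:31:50 -0500] "GET /socializer/socializer.js HTTP/1.1" 200 228 "-" "{UA}"',
    f'66.249.72.234 - - [07/Sep/2006:18:11:49 -0500] "GET /socializer/socializer.js HTTP/1.1" 200 228 "-" "{UA}"',
    f'66.249.72.234 - - [08/Sep/2006:12:31:13 -0500] "GET /tracker/ajaxlinktracker.js HTTP/1.1" 200 14212 "-" "{UA}"',
]

counts = js_hits_per_ip(sample)
for ip, n in counts.most_common():
    print(ip, n)
```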

24 Responses to “GoogleBot Requested a CSS File! - Updated”

  1. leadegroot Says:

    I have seen this a couple of times a long time ago, but I ended up deciding that it was a cache view pulling the file.
    Did you look at the context of page loads around it?
    Lack of referer is a contra-indicator though… :(

    I guess it was only a matter of time before it happened.

  2. Pierre Says:

    I looked at the context as you say, part of me figuring out the 3000 requests of that day. As far as I can tell, it seems to be just a random request as part of a queue.

    Pierre

  3. Googlebot richiede anche i CSS? » Posizionamento Google | SEO e Posizionamento motori di ricerca Says:

    […] It is a recurring question that many SEM practitioners ask, and until now the most common answer has been a flat NO. Now, however, here is some news from the eKstreme blog. From an analysis of his log files, the author says he found a specific request from Googlebot for the CSS file. Indeed, what the author reports is technically sound. The request, as reported on the site, appears to be exactly what it seems, but turning an exception into a rule would, I think, be going too far. […]

  4. Ramblings About SEO » Blog Archive » GoogleBot Crawling CSS Files Says:

    […] Barry Schwartz spotted a thread in the Cre8site Forums discussing a report by Ekstreme.com that Googlebot requested a CSS file. This is a very interesting, but not surprising development. […]

  5. miketheinternetguy Says:

    Just a shot in the dark, but could this be indexing for Google Code Search?

  6. tkmoney Says:

    Word to miketheinternetguy. I was just going to say that.

  7. DygitiScape » Blog Archive » Is GoogleBot Getting Smarter or Discriminatory? Says:

    […] It appears Google may be starting to index non-HTML content such as Javascript and CSS. On the blog titled, “things of sorts,” suspects this is a move by Google to go after Black Hat SEOs. The blog provides a reference to the header requests from Google against his CSS and JS files. […]

  8. Googlebot Requests CSS? at Epsilon’s Blog Says:

    […] Sure enough, the requesting IP address really does belong to the Google IP block. # […]

  9. » Savaitgalio skaitiniai #8 Archyvas » Pixel.lt Says:

    […] Hello, you have probably all noticed that time is flying unbelievably fast. A whole week has already passed since the New Year festivities, and heads are only just finishing sobering up ) This week justjust shared his thoughts with us by writing the article “Internet Explorer conditional statements - comments (if…else)”. Thanks to him. Readings for this weekend: Humble Little Ruby Book (the whole book is here) Debugging PHP scripts GoogleBot Requested a CSS File! HTML Cheat Sheet (HTML crib sheet) D Programming Language […]

  10. Google индексирует CSS? - SEO блог - инструменты вебмастера Says:

    […] One Western blog presents data showing that Google really does index CSS. I dug through the log files of Searchengines.ru and also found one request from Googlebot for a CSS file. Here it is: 64.233.172.2 - - [19/Nov/2006:18:06:18 +0300] "GET /forum/styles.css HTTP/1.1" 200 5351 "http://forum.searchengines.ru/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html; SV1)" […]

  11. k2blogman Says:

    Pierre,

    I have seen this happen on one site I manage and it was rather interesting to see, because I just didn’t expect it.

    Comments made by some of the search engine reps at SES Chicago 2006 leave you wondering what they are up to. They suggested not blocking the CSS or JS from them. I have never been too concerned about it and really I’m still not concerned, but you have to wonder what they are up to.

    Are they looking to penalize folks that block, or are they just trying to find out if folks are using them for black hat purposes?

    I have heard a lot of folks using their CSS to hide text which I wouldn’t do, but perhaps they are wanting to find out who may be doing this sort of thing and if you don’t allow them to see it will they penalize you for hiding it? Hard to say what they are up to.

  12. Googlebot reads external CSS and javascript files: good news for reputable sites Says:

    […] Multiple reports from the SEO blogosphere (Ekstreme, Cre8asite) report that the Google spider is starting to slurp external CSS and javascript files to cut down on “black hat SEO tricks”. Hurrah! […]

  13. …time is what you make of it… » Archivio del blog » Notizie dal mondo: web e non solo. Says:

    […] The second piece of news comes from here: it seems that Google's bots are improving their ability to crawl web pages. That article says that some bots are starting to request CSS files as well while crawling a site, and this will mean two things: […]

  14. All in a days work… Says:

    […] GoogleBot Now Requesting CSS and JavaScript Files! (tags: Googlebot CSS JavaScript) […]

  15. Google controlla i fogli di stile » Il tuttlog di Tassoman Says:

    […] The author of the eKstreme blog noticed that GoogleBot requested his site's stylesheet several times before starting a massive indexing run. […]

  16. Ituloy AngSulong News » Googlebot Reading CSS? Says:

    […] Matt Cutts has mentioned already shared his thoughts blog about hidden content using CSS and pointed out several sites in the past, and CSS hiding of text to increasing ranking should not be done. With various techniques of doing that, it looks like Googlebot has been recently reported to be reading CSS files as first reported by Pierre Far, could it be searching for hidden text using display:none? negative letter spacing? negative absolute page positioning? overflow:hidden? We really don’t know. I looked further into the story just to have a closer inspection. I had some questions and has a few answers found from his site. Here is the line in the raw access log he had: 66.249.72.52 - - [24/Oct/2006:17:17:35 -0500] “GET /global/x.css HTTP/1.1″ 200 8382 “-” “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)” […]

  17. Google tutkii myös CSS- ja Javascript-tiedostoja | Nettibisnes.Info Says:

    […] This news has sparked discussion among search engine optimizers. Apparently Google also indexes style and Javascript files. The news and the discussion it has stirred up raise several questions. […]

  18. What Digg users love about SEO | Cornwallseo.com Says:

    […] Google Takes on Hidden Text Diggs 657 Pr 0 Alexa 14k Cached pages 961 Technorati Rank N/A Submitted by Pops aka Jimbobcook ekstreme.com […]

  19. Cloaking your CSS files - VinceVinceVince Says:

    […] Is Google Sending Googlebot CSS hunting [SEroundtable] […]

  20. » Googlebot crawling CSS files now? » SEO News - All The SEO Scoop Says:

    […] eKstreme posted about seeing Googlebot fetching a CSS file (via his logfile). He’s only seen it the one time, however, and is asking others to check their logfiles as well. I vaguely remember someone at Pubcon Vegas (I think it was Matt Cutts) mentioning that it would be a good idea for webmasters to not block javascript and css files from the bots. I remember raising my eyebrows at that point, but my memory fails me beyond that. Anyone else who attended Pubcon Vegas remember that moment more specifically? […]

  21. Google indexeert CSS en Javascript | EdWords.nl Zoekmachine Marketing Weblog Says:

    […] over the past few days a discussion has arisen in search-engine-optimization land about a post on Ekstreme.com showing that Google also indexes CSS and Javascript files. Why would Google […]

  22. Verborgen teksten en zoekmachine optimalisatie (2) Says:

    […] SEO - a funny website that, under the guise of warnings, explains exactly what it is. An article about Google requesting .css files Do you find this interesting? Then subscribe to my RSS feed! What is an RSS feed? Read […]

  23. Googlebot Requested Another JS File - eKstreme.com Says:

    […] in January, I blogged about how Googlebot requested CSS and JS files. Ever since the news broke (to much fanfare), this front of SEO has been awfully […]

  24. Googlebot Can See Your CSS - SEOlogs.com Says:

    […] the full article here. […]
