Google Alerts Now Spell Checks the Queries

Lately I’ve been noticing a lot of weird hits coming in via my Google Alerts emails. I’ve dug into it and I think I’ve figured out what’s going on: Google Alerts is spell checking the queries and matching the corrected queries as it would in a search. This is in addition to matching the Alert queries exactly, as it did previously. This new behavior kicked in about a week or 10 days ago.

For example: I keep an alert for [blogsci] because I have a website at blogsci.com. Until recently, I used to get alerts only when the word "blogsci" was matched in a page. Now, I’m getting Alerts for pages that never mention the word "blogsci" but do match the spell-checked "blog sci". So I get matches for "…blog: sci-fi…". See what happened there?

Another example: I run a website with a domain name of XY.com where X is a word and Y is another word. My Alert is set to match it exactly as [XY]. This was going well until recently when I started getting alerts that match [X Y].

Another example: I have an alert for [cli.gs], my latest web app. I get a lot of spurious alerts for this because it matches [cli gs] which is a very popular combination apparently.
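Until this is sorted out, one way to cope is to filter the hits yourself. The sketch below is purely my own illustration, not anything Google offers: it assumes you have already pulled the snippet text out of each alert email, and the function name `mentions_exactly` is made up. It keeps a hit only when the query appears as one unbroken token.

```python
import re


def mentions_exactly(query: str, text: str) -> bool:
    """Return True only if `query` appears in `text` as one unbroken token.

    Hypothetical helper: the look-arounds reject matches glued to other
    letters or digits, and a split form such as "blog sci" never matches
    because the query itself contains no space.
    """
    pattern = r"(?<![A-Za-z0-9])" + re.escape(query) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    for snippet in ["Visit blogsci.com today", "my blog: sci-fi reviews"]:
        print(snippet, "->", mentions_exactly("blogsci", snippet))
```

The same check works for [cli.gs], since the dot is escaped and treated as a literal character rather than a wildcard.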

Anyone else seeing this weirdness? Any other interpretations? Thoughts in the comments please!

Hey YouTube: UK = GB, and both are English

Sometimes I see help messages that just leave me speechless. This message from YouTube about my automatically-set language preferences goes above and beyond anything I’ve seen in a long time because it has two big "WTF moments":

The problems?

  • The red circles: The suggestion that English (UK) is different from English (GB). Psst. They’re the same thing. GB is the official ISO 3166 country code; UK is an exceptional reservation in the same standard.
  • The black circle: The whole message is apparently not in English because the link at the bottom right corner gives me the option to view it in English. When I click it, I get the same message, but instead of suggesting English (UK), it suggests just plain old English. And oh, it gives me the option to change my language to the real English of English (US).
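For what it’s worth, treating the two as one takes only a few lines. A minimal sketch, assuming language tags of the form `en-GB`; the alias table and the `normalize_tag` name are mine, not anything YouTube uses.

```python
# "UK" is exceptionally reserved in ISO 3166-1; "GB" is the assigned code.
# This alias table is illustrative and holds only that one pair.
REGION_ALIASES = {"UK": "GB"}


def normalize_tag(tag: str) -> str:
    """Normalize a tag such as 'en_uk' to 'en-GB' (hypothetical helper).

    Only the language and region parts are handled; anything after the
    region is dropped in this sketch.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    if len(parts) == 1:
        return language
    region = parts[1].upper()
    region = REGION_ALIASES.get(region, region)
    return f"{language}-{region}"


if __name__ == "__main__":
    for tag in ["en-UK", "en-GB", "en-US", "en"]:
        print(tag, "->", normalize_tag(tag))
```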

Hey, I have news for you YouTube: English, English (UK), English (GB) and English (US) are all freakin’ English.

Yahoo! Search Doing a SERPs Usability Survey


I was just searching with Yahoo! and I saw a survey request from "Yahoo! Surveys". It was a big purple box to the immediate right of the results list, and it was anchored to the bottom of the screen (so even if I scrolled down, it went down too). I clicked on it before I realized I should have taken a screenshot, but I did take a screenshot of the single question in the survey. The question opens in a new window:

Photobucket

Click for full size, and no, I’m not going to tell you what my answer was :p

The SEOmoz Linkscape Ghost

If you’re part of the SEO industry, unless you’ve been living under a rock for the past couple of days, you will know that SEOmoz launched a new tool called Linkscape, to much fanfare. First things first, congrats and kudos are due to the SEOmoz team for building such a complex beast. It’s not easy, at the very least on the technical level.


But there is a problem: SEOmoz has not disclosed the user agent (UA) of its crawler. Here I will talk about why this is a bad thing, and also go out on a limb and say: there is no SEOmoz crawler, at least not in the traditional sense. For the latter, I will offer a viable technical alternative, which may or may not be correct, but the fact that the alternative exists gives a sensible explanation as to why SEOmoz is not offering a straight answer to the UA question.

Why Disclosing the UA is Essential

Let’s not mince words: we as an SEO community like a little mud fight once in a while. We debate and discuss and yes fight. But one thing we all know how to recognize is malicious activity and differentiate it from aggressive activity.

Example: a bot scraping our content for an MFA site is a tolerated nuisance. We take steps to negate the effects of scrapers but at the end of the day we don’t fight them hard. On the other hand, a bot probing for security holes is treated like a witch in 1209 AD.

Which is why Linkscape’s lack of disclosure hurts: We as a community work hard at identifying bots. SEOmoz is supposed to be a good citizen of the SEO world, and yet the lack of transparency goes against the spirit and the image of SEOmoz. On the one hand we have a company with a strong community doing good deeds (SEO trademark fight anyone?) and yet it behaves in a way we expect out of the shady side of the net we deal with every day.

Not just that: the data collected from us, about us, will be used against us. It’s called competitive intelligence.

And not just that: SEOmoz is using the data to make money. The free version is pathetic and the Pro version needs a monthly subscription.

To me, this kind of behavior (stealth, harmful, and to make money) puts Linkscape squarely in the naughty corner. I certainly didn’t expect this out of SEOmoz. Tough luck Rand and co: you have a great brand and I for one expect better!

But I won’t ask for a UA because I think there isn’t one.

How To Build Linkscape

It’s actually quite easy on a conceptual level. However, just like cooking, having a recipe doesn’t make you a great chef – there are lots of details that SEOmoz must have tackled successfully to build Linkscape. I am not trying to belittle their achievement, and all I can show you is one recipe. This recipe is completely my guess and could very well be wrong. I have not talked to anyone at SEOmoz.

So come on Pierre, what is it? The answer is the Yahoo! Search API. It’s an API giving programmers complete access to the Yahoo! index without crawling a single page. For example, the following URL:

http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=site%3Aseomoz.org%2F&results=2

fetches the first two hits from a Yahoo! [site:seomoz.org] search. Interestingly, it tells you where the cache URLs are, and they reside on Yahoo! servers (unsurprisingly). So you fetch the cache from Yahoo!, do the analysis, save what you care about (links, titles, etc), and you’re done.

You’ll need to kick start this somehow with a seed set of sites. DMOZ and Wikipedia are usually good sources that are freely available. Wikipedia can even be downloaded so no one needs to know. Yahoo!’s very own Delicious, Digg, reddit, etc are also good starting points because they tell you what’s hot right now. The seed is basically a huge set of URLs from which you extract the domain names and do [site:domain] queries. Lather, rinse, repeat.

Notice that you won’t need to crawl a single page yourself. You let Yahoo! do the work for you. Neat, no?
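To make the recipe a bit more concrete, here is a rough sketch of the first two steps: turning seed URLs into domains and building one [site:domain] request per domain. It only builds the request URLs and does not fetch anything. The endpoint and the `appid`, `query` and `results` parameters are copied from the example URL above; the function names and the seed list are made up for illustration, and none of this is SEOmoz’s actual code.

```python
from urllib.parse import urlencode, urlparse

# Endpoint and parameter names are taken from the example URL in this post.
API_ENDPOINT = "http://search.yahooapis.com/WebSearchService/V1/webSearch"


def seed_domains(seed_urls):
    """Extract the unique host names from a list of seed URLs, in order."""
    domains = []
    for url in seed_urls:
        host = urlparse(url).hostname
        if host and host not in domains:
            domains.append(host)
    return domains


def site_query_url(domain, results=2, appid="YahooDemo"):
    """Build the request URL for a [site:domain/] query (builds only, no fetch)."""
    params = {"appid": appid, "query": f"site:{domain}/", "results": results}
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical seed list; a real one would come from DMOZ, Wikipedia, etc.
    seeds = ["http://seomoz.org/blog", "http://seomoz.org/tools"]
    for domain in seed_domains(seeds):
        print(site_query_url(domain))
```

Fetching each URL, following the cache links and extracting the links from the cached pages would be the "lather, rinse, repeat" part, and that is where the hard details live.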

So What Should SEOmoz Disclose?

Above I said two potentially conflicting things: that SEOmoz should disclose the Linkscape user agent, and that Linkscape doesn’t need to have a user agent at all. So what exactly am I asking from SEOmoz?

Easy: complete disclosure. If SEOmoz is using a traditional crawler, we must have its UA and the IP addresses. It’s only a matter of time for us to find them. If not, SEOmoz needs to explain clearly why not.

Announcing Cligs: Short URLs with Analytics and SEO Friendliness

That’s right folks, the short URL market is broken and I’m fixing it. The new service is called Cligs (like Clicks but with a G). It’s a short URL service on steroids. The key feature is that it tracks the clicks of the short URLs.

What kind of analytics do you get? At launch right now:

  • Cligs gives you tons of traffic data and analytics about the traffic your short URLs get. This includes:
    • Number of hits
    • Referral stats
    • Mentions on twitter, blogs, and the web
    • Mentions of the destination URL on twitter, blogs, the web, and delicious

    And lots more! And if you want more data, just let me know!

  • Cligs forwards with a 301 Permanent Redirect so your destination URL gets full SEO benefits of the link. If you are an affiliate marketer, this means you can hide your backlinks, get traffic, get statistics, and get the SEO benefits.
  • With Cligs, you can create an unlimited number of short URLs for the same destination URL. This is great because you can promote the same destination at different sites like twitter or facebook by using different cligs and watch how each source sends you traffic.
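For the curious, the core idea fits in a few lines. This is a simplified sketch and not the actual Cligs code: the `ShortLinks` class, its in-memory storage and the short codes are all made up for illustration. It shows two short codes pointing at the same destination, each answered with a 301 and counted separately.

```python
from collections import Counter


class ShortLinks:
    """Toy in-memory short URL table (hypothetical, for illustration only)."""

    def __init__(self):
        self.destinations = {}   # short code -> destination URL
        self.hits = Counter()    # short code -> number of clicks

    def add(self, code, destination):
        self.destinations[code] = destination

    def resolve(self, code):
        """Return (status, headers) for a request to a short code."""
        destination = self.destinations.get(code)
        if destination is None:
            return 404, {}
        self.hits[code] += 1
        # 301 = Moved Permanently, the permanent redirect described above.
        return 301, {"Location": destination}


if __name__ == "__main__":
    links = ShortLinks()
    # Two short codes for the same destination, one per promotion channel.
    links.add("tw1", "http://example.com/post")
    links.add("fb1", "http://example.com/post")
    print(links.resolve("tw1"))
    print(links.resolve("fb1"))
    print(dict(links.hits))
```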

That’s just the start. There are a ton of new features that are going to be added in the coming few days and weeks, including some SEO-useful analytics.

And, of course, there is a bookmarklet:

Shorten Link @ Cli.gs

So what are you waiting for? Stop using plain-vanilla short URL services and start using Cligs.

Comments and feedback most welcome.

Opt Out of Behavioral Ad Targeting by Google/Doubleclick and Yahoo!

Oh yes, finally a way to tell the algo-borgs at Google/Doubleclick and Yahoo! that they should not track your behavior to deliver "more relevant" ads. You do that by visiting a page on each of their websites and clicking a button, which sets a cookie that tells the system not to track your behavior.

Google also links to another page from the Network Advertising Initiative which lists quite a few ad systems you can opt out of.

The pages are:

While I’m at it, does anyone else find Yahoo!’s page to be much better than Google’s? Think about the usability: it tells you if you’ve opted in or out and explains that it’s per computer rather than per user (very important!!!). I’m just saying that as a landing page supposedly to help consumers, Google’s is a mess compared to Yahoo!’s clean and to the point page. The NAI’s is very good too.

Chatting with a Google Street View Driver




Note: some details in this post have been skipped or generalized to be a bit vague to protect the identity of the Google Streetview driver.

Google Street View Car

Sometime in the past few weeks, I was walking with a friend when we spotted a very funny looking car. We both immediately knew what it was and as the car drove closer, our suspicions were confirmed: it was a Google Streetview car outside London. Feeling naughty, I shouted something along the lines of "there are privacy laws" at the car as it drove by, and to my surprise an old man across the street did the same! It was very funny how both of us knew what a Streetview car looked like!

Then it hit me: the road we were on, which the car was driving into, was a dead end. Picture time! So I dropped my stuff and asked my friend to watch it while I set up my phone and found a good spot to take some photos as the car drove back out again. I watched as the car reached the end, did a U-turn and drove back out. However, as it got close to me, the car pulled up into an empty parking spot and the driver came out. He shouted at me saying "I know you want to take pictures but I don’t want to be in them." I obliged.

While taking the photos, I talked to the driver a little bit. Here are some details from the notes I scribbled afterwards:

  • Google has a centre in Milton Keynes where this operation was based. The drivers just showed up for "a driving job" (his words) and didn’t know it was for Google until they arrived to pick up the cars.
  • The drivers were given training to use the computers inside the car. It’s not hard: it’s a large-ish touch screen (I guessed about 17in or maybe a 19in when I saw it) with a record and a pause button.
  • The screen is to the left of the driver in the passenger seat with a large server at the back in the trunk. The back seats of the car were removed – it was just a big space. The connections into the server were just power and ethernet. The ethernet seemed to be going up to the camera but I’m not sure if it ran to something else.
  • The camera is rain sensitive. It collapses in a very funky way and has to be covered. The drivers are under strict instructions to do so.
  • This particular driver was very sensitive to the privacy issues. He was having a personal conflict about the whole thing and was stopped by (his words) "10 people" that very day. Why? Because only recently the BBC had published an article about Google Streetview starting with "Google’s plans to launch a mapping tool in the UK could be referred to the Information Commissioner". No wonder the driver didn’t want to be in the photo!

Now some photos of the car with notes:

Google StreetView car, front view

The car from the front.

Google StreetView camera

The car’s camera. The octagon at the top is, I think, the camera set itself (so 8 cameras in total). The yellow box seems to be the communication/processing circuitry; the yellow box is on the back side of the car and so the white box thing at the right hand side of the image points towards the right of the car. This white box thing seems to swivel up and down but this is just a wild guess.

Google StreetView car, back view

The car’s camera kit as seen from the rear of the car. Just guessing what each bit is: Yellow box at the top, as above. White boxes to the left and right are the (potentially) swiveling bits – could they be cameras? The yellow disk at the bottom: a wireless communications dish? It could be a GPS receiver.

Update: After someone dropped me a hint on GTalk, I looked through some of the other images I had: the white boxes under the octagon of cameras are laser range finders. Sure enough, I have a photo that has a warning that it’s a "Class 1 Laser".

Update 2: Thanks for all the comments. Yes I couldn’t count: there are 8 cameras not 6; that’s fixed now. Also, a lot of people wrote about the type of laser range finder and why you’d need it – see the comments below. Finally, lots of people noted a certain irony in the driver not wanting to be photographed. Point taken, but the guy was very conflicted about it. The BBC article was still in memory and clearly some people like me caused him some fuss on that day. He was talking a lot about wanting to quit this job. Deep down I think he did but of course I cannot know.

Update 3: Yes, some rain droplets are visible in a photo. It wasn’t raining while we were talking but it had rained earlier that day. When the driver parked, the camera hit some trees (you can see that in the photos) and the droplets are from the tree. It’s hard rain that gets the equipment as I understand it, and that’s when the drivers are supposed to cover up.

Twitter Bug: View Friend-Only Private Updates

On twitter, I’m following someone who I cannot un-follow due to a bug in Twitter. Why? Because said person changed their settings to give updates only to friends – I see the message "I’m only giving updates to friends.". Visiting the person’s home page, I cannot see the Follow/Unfollow button because the interface only lets me ask the person to allow me to see his updates.

But I can easily see his updates.

Here is how: browse twitter using a mobile phone. Yes the mobile interface shows you these "private" updates but the web interface shows me the message "I’m only giving updates to friends.". I discovered this bug by accident while browsing using my mobile phone, but using a couple of extensions, you can pull off this trick in Firefox.

The screenshot below illustrates the bug. It’s basically the mobile version and the full normal version of twitter side by side. The lines map corresponding updates, with the yellow/orange one highlighting the bug.

Twitter bug showing private updates

Download full sizes of the screenshots used to make the image above:

I’ve filed a bug report with twitter.

New Word for Spam: Linkosphere

Yes, that’s right folks. Step right up. We have a new buzzword to hide the fact that we’re scraping content and sending trackbacks to the original content. The new word is… Linkosphere.

So, pray do tell us Pierre, where would you come up with such a silly name? Why I’m glad you asked. It’s the service that’s been spamming me blog for the past few months, hosted at the one and only ectio dot us. See, them scrapers have a serious claim: "Find something to read, guaranteed!" I believe them given all the scraping they’re doing.

And thus because I am in the mood to return them the favo(u)r, I hereby declare them the prototypical scraposphere service. Beat that!

Houston, We Have a Twitter

That’s right folks. Today I decided to actually do something about my Twitter account. Follow me at pierrefar.

The question is *what* will I do with the account? It may be a few days before I dive in properly :) See you @twitter

Irony

Support Wikipedia!

Hint: Look at the source code…


Free Lunch 404

Time Magazine published a nice article by Bill Tancer (of Hitwise fame) talking about the top 10000 keywords searched for that contain the word ‘free’ like [free games] or [free myspace layouts]. The analysis is very interesting for anyone into keyword research but amusingly, he found that no one is searching for [free lunch].

It’s not long, and well worth a read.


Yahoo! and Google have Strongest Brands

A press release was just made public covering research by Penn State’s College of Information Sciences and Technology. It’s a very succinct write-up, so I’ll just quote bits of it:

Researchers in the College of Information Sciences and Technology (IST) copied Google results pages from four different e-commerce queries, ascribing them to four different search engines — Google, MSN Live Search, Yahoo! and an in-house engine created for the study. Then the researchers showed the pages to 32 study participants who were asked to evaluate the engines’ performance in returning relevant results.

Despite the results pages being identical in content and presentation, participants indicated that Yahoo! and Google outperformed MSN Live Search and the in-house search engine.

Participants ranked results from Yahoo! more relevant across the four queries.

The whole premise of the press release is that this observation is the result of brand power for both Google and Yahoo!. It’s an interesting observation, and certainly makes sense, but I’m still not 100% convinced. The sample size is too small and as the researchers noted, "many of the participants said they used Google to search". The very next thing they need to try is to recruit MSN/Live users to do the experiment. If their hypothesis is true, the MSN/Live users would rate MSN’s results top.

Regardless, an interesting note that could explain a lot of the momentum behind the top SEs.


Online Survey Proves I’m Nerdy

I saw it on the Internet. It must be true.

So Michael tagged me with the latest meme going around: taking a nerd test to get a score (of course). And my score? Let me use the test’s words: "I am nerdier than 92% of all people." That’s right folks, I know my periodic table inside out and can recognize photos of great scientists who passed away hundreds of years ago. I even get a badge:


I am nerdier than 92% of all people. Are you a nerd? Click here to find out!

Apparently, this beats the highest score Michael knows about. If you’ll excuse me, I need to go have a walk or something.

So now I get to choose some people to have a go at beating me. Let’s hear it for:

  • Joe
  • Sophie (wonder if switching to a mac has affected her)
  • Kim

Yahoo! 404

Terrible ad placement: here.

Stand on the Shoulders of… Patent Lawyers?

A colleague tipped me off about this: Google Scholar is now showing patent results. An example: Result 9 for [video compression].

I’m of two minds about this. Sure it’s not true to the ‘scholarly’ spirit of their "Stand on the shoulders of giants" search they developed, but it may be kind of useful to have both databases searchable in one interface (just don’t believe it’s scholarly). On the other hand, I see this as yet another attempt by G to force upon us a half-baked service that is very inferior to everything else out there. I already had my rant in this Cre8 thread, and I think it’s still fairly accurate.

To follow on: is this the web equivalent of ‘bundling’ that got Microsoft into so much trouble in the 1990s? Any lawyers out there? Pinging Bill!


Moderating at Cre8asite Forums

I was very excited when, a couple of days ago, Kim, one of the founders of Cre8asite Forums, invited me to join the moderating team. I love Cre8 (a lot) and without a blink I blurted out “yesssss”. I’ll be moderating the PPC forums, the Google forums, and the Yahoo forums.

Now it’s official: the announcement thread and blog post. A big thank you to Kim and everyone for extending this invitation and for making Cre8 a great place to hang out and talk web stuff.

So come on – join in the fun!

Microsoft Patents Blackhat CMS

This is too funny: US Patent Application 20060288329 was published a few days ago. The abstract starts with this:

A content syndication platform, such as a web content syndication platform, manages, organizes and makes available for consumption content that is acquired from the Internet.

So let me get this straight: the patent is about a system (a syndication platform) that manages content acquired from the internet and makes it available. Hmm… surely not? What’s the very first claim?

1. A system comprising: one or more computer-readable media; computer-readable instructions on the one or more computer-readable media which, when executed, implement: an RSS platform that is configured to receive and process RSS data in one or more formats; and code means configured to enable different types of applications to access RSS data that has been received and processed by the RSS platform.

So it’s a content management system that gets RSS data and reformats it for other applications. Right.

And who was it assigned to? Microsoft Co.


Five Things You Didn’t Know About Me

I’ve been tagged… twice! Once by Randall and once by John. Thanks guys! Now I have to post 5 things about me, Pierre…

  • I’m not French… nor French Canadian.
  • SEO and web design are just hobbies of mine. I actually have a PhD in microbial genetics and in real life, I’m working at an ‘innovation consultancy’ helping big firms work out new technologies. It’s really cool!
  • I can’t drink coffee as it makes me ill. So how do I program? I started out on apple juice and Pringles but now it’s just apple juice. Word on the street is that it doesn’t give quite the same kick…
  • I’ve lived in 4 different countries (the shortest stay was 3 months) and visited over 10 countries. My favorite city by far is Frankfurt as it is a great mix of old and new (you can easily be looking at a very old building with a skyscraper in the background!)
  • The first computer I ever played with was an XT (i.e. older than a 286). It ran DOS 3.1, and the first thing I typed at the C:\> command prompt was "Give me all your files". Very promptly (apologies for the pun) it replied with ‘Bad command or file name’. Bring back those days please!!!!

Now I get to tag 5 other people. MWAHAHAHA! Peter Bowyer (sorry for not keeping in touch), G-Man, James Cook (please do it in a cartoon), Nadir Garouche, and KICHUS. Come on folks, let’s hear ‘em!


Web Wibble – 1

Some links for your enjoyment.

And that’s it for now. Web Wibble will be a regular feature on things of sorts from now on. If you have something cool you’d like featured (even if it’s your own ;) ), drop me a line.
