Google Down in the UK

Various reports of Google being down (including for me) here in the UK. It seems to be a few datacenters so it works sometimes but mostly not. Reports also talk about G Docs, Mail, and YouTube. Sometimes a redirect to google.co.uk works, but mostly even that fails. Some people can get to google.co.uk if they browse to that directly. I've been logged into GMail since last night and it works OK.

Seems to me that there is a DNS issue at play here in that if your browser requests a fresh IP resolution, it works, but the IP addresses fail. If your browser has an IP address cached it seems you're fine.
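One way to poke at this hunch is to resolve the hostname and then try connecting to each returned IP address separately. Below is a minimal sketch in Python, assuming plain TCP on port 80 is a good enough health check; the function names (resolve_ipv4, can_connect, probe_host) are made up for illustration and are not part of any tool mentioned in the reports.

```python
import socket


def resolve_ipv4(host):
    """Return the distinct IPv4 addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, 80, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def can_connect(ip, port=80, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_host(host, resolve=resolve_ipv4, connect=can_connect):
    """Map each resolved address to whether it accepted a connection.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    return {ip: connect(ip) for ip in resolve(host)}
```

Calling probe_host("www.google.com") a few times, a minute or so apart, should show whether the set of addresses changes and whether particular addresses are the ones that fail.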

Reports:

Anyone else seeing this?

Broken GMail Login

This is becoming more frequent so I thought I'd mention it: Once in a while, I can't log into GMail. It started on Gecko-based browsers on OSX; switching to Safari invariably worked. Then it started happening with Safari, and now it's broken on my Ubuntu machine using Seamonkey and Opera.

There are two ways it breaks:

  • Eternally looping on the loading progress bar. What happens is that it keeps refreshing the loading page and never makes it to the email list.
  • Mostly on Opera, it just returns to the login page although the username and password are correct; it doesn't show any error messages.

I've learned to be very quick at clicking the simple HTML view link, but even with all the practice, sometimes even that doesn't work.

Anyone else seeing this?

A Review of Plurk: Bad

A little context where this review is coming from: Given the recent, ummm, "uptime challenges" at Twitter (My profile: pierrefar), there was a massive flurry about another service called Plurk. So much was Twitter down and so many cool people moved over to Plurk that I joined the fun and set up an account.

A few days into this exodus of sorts, Brian asked who was still around on Twitter, and I replied that I didn't like Plurk and so was still around. Surprised, Brian suggested I give Plurk a second chance, and I agreed. Plurk got a second look, and I promised Brian a review. This is the review.

In short: it's bad, and not only that, I don't think they will make it as a company with the product's current interface. Here is why.

What is Plurk?

Plurk is a service for engaging in conversations with other people. It is centered around a timeline, a very cool looking scrolling interface that maps conversations as points on the timeline. The conversations start with someone posting some text and the replies come in attached to the original post. A screenshot is below:

Screen shot of Plurk

The timeline's 'now' is at the left, and the past is going to the right. When you're browsing past conversations, you scroll left to go to later ('future') posts. At first pass, this is very counter-intuitive: for some reason I expect the future to be to the right and you go back in time by scrolling left - i.e., the reverse of the Plurk arrangement. I've seen other comments along those lines but I can't for the life of me find them. However, this is the best arrangement for a timeline written in English, and the Plurk team are either genius designers or extremely lucky people. Why? English is written left to right, so as you scroll, the first thing you see of a new conversation is the starter's name and the first few words of the conversation's opening sentence. If you were scrolling right to left, you would actually see the last few words of the conversation's opening sentence. The Plurk arrangement works much better.

The conversations themselves are shown as little rectangles. You click on the rectangle and it expands to show the full conversation and replies. The layout is a forum thread layout with avatars, usernames, time stamps, and icons. The rectangles are placed along the timeline according to when they started, and plotted at random vertical positions.

Plurk also has karma, that eternal currency of Web 2.0. The more you use Plurk (start conversations) and the more people you invite, the more karma you get. And the more karma you have the more icons you get. However, if you don't use Plurk for a day, the karma starts dropping. I peaked at 20+ karma about 2 weeks ago and now I'm under 8 karma. It's an interesting twist to an age-old way to foster user engagement.

Why is Plurk Bad?

Plurk is a bad service because the timeline arrangement is the worst way to show conversations. There is absolutely no need to have conversations plotted in a timeline. A simpler listing (the extreme of which is a forum-index type of listing) would do much better. Right now, the timeline mars usability, given that Plurk is about conversations and not microblogging like Twitter is. It's a gimmick, and an annoying one at that.

To know just how annoying this is, try not going to Plurk for a few days and come back. Heck, go to sleep and check it in the morning: you'll have dozens of conversations that have either been started or updated and you can't just see them and quickly browse them. No, you have to scroll, click each one to expand it, and then read. And there isn't any obvious way to see which conversations were ones you engaged in previously to see if there are any new replies. Nope, they're all lumped together. Raise your hand if you simply just gave up and marked all conversations as read because you just can't be bothered.

And what's with the karma loss? Listen, I try to have a life outside the internet and certainly won't center my life around Plurk. If I don't visit for a day or two, I should not feel like I'm being punished. This is the first time I've seen anyone implement a karma-loss-over-time idea. Karma should only be deducted if other members of the community feel that way, and even then, it should be implemented carefully - there is no easy answer for this question, but Plurk's implementation is definitely wrong.

Finally, a pet peeve from a marketing point of view: I get an email every time someone wants to follow me so that I can authorize them. Ummm, I love it when people follow me, and they shouldn't have to ask permission. They should come and go as they please. Twitter called it spot on: people follow you and stop following without intervention, but there are two options: you can lock your conversations or you can use direct messages, which are private. That covers pretty much all shades from an open to a private conversation. Having a single blanket permission system by default is weird.

So all in all, a very crappy implementation of a potentially good idea. I've written it off for now but I'm sure my network of friends on Twitter and elsewhere will let me know if things improve or not.

New Word for Spam: Linkosphere

Yes, that's right folks. Step right up. We have a new buzzword to hide the fact that we're scraping content and sending trackbacks to the original content. The new word is... Linkosphere.

So, pray do tell us Pierre, where would you come up with such a silly name? Why I'm glad you asked. It's the service that's been spamming me blog for the past few months, hosted at the one and only ectio dot us. See, them scrapers have a serious claim: "Find something to read, guaranteed!" I believe them given all the scraping they're doing.

And thus because I am in the mood to return them the favo(u)r, I hereby declare them the prototypical scraposphere service. Beat that!

Now Also on Plurk and friendfeed

With Twitter's inconsistent downtime, I had to find somewhere else to hang my hat. I'm trying two places:

  • Plurk, which is Twitter on steroids, jam packed with fun conversations. A full review is coming soon. Please note that I blame Brian Wallace from NowSourcing for my enjoyment of Plurk.
  • friendfeed. I still need to dive into ff deeper but it seems very cool so far. In particular, the Social Media room is interesting...

So if you're a user of any of those services, join me :)

What is YahooCacheSystem?

I just started noticing some hits coming from a few *.yahoo.net IP addresses with a user agent of just "YahooCacheSystem" and requesting only the raw RSS XML feed so far. All requests are HTTP/1.0 GET, setting the HTTP_ACCEPT to */*. No other headers are set.

The first hit I've seen was on April 27th, which came from the IP address 216.39.58.78. Back then, that resolved to htproxy3.ops.re4.yahoo.net. However, ever since, the hits are all from a different C-block, 209.131.41.*, which resolves variously to htproxyX.ops.sp1.yahoo.net (X is a number like 1 or 2 to give htproxy1.ops.sp1.yahoo.net or htproxy2.ops.sp1.yahoo.net). Even more recently, the IP addresses remained the same, but the hosts they resolve to changed to htproxyX.ops.re4.yahoo.net (again, X is a number to give htproxy1.ops.re4.yahoo.net or htproxy2.ops.re4.yahoo.net).
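For anyone wanting to track the same bot in their own logs, here is a rough sketch in Python of grouping hits by C-block. It assumes the log has already been parsed into (IP address, user agent) pairs; the function name and the input format are my own invention, and the last octets in the usage example are placeholders, not addresses I actually logged.

```python
import ipaddress
from collections import defaultdict


def group_hits_by_block(hits, user_agent="YahooCacheSystem"):
    """Group the IPs of hits with the given user agent by their /24 block.

    hits is an iterable of (ip, user_agent) pairs; hits with any other
    user agent are ignored.
    """
    blocks = defaultdict(list)
    for ip, ua in hits:
        if ua != user_agent:
            continue
        # strict=False lets us pass a host address and get its /24 network
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        blocks[str(block)].append(ip)
    return dict(blocks)


hits = [
    ("216.39.58.78", "YahooCacheSystem"),
    ("209.131.41.10", "YahooCacheSystem"),   # placeholder last octet
    ("209.131.41.11", "YahooCacheSystem"),   # placeholder last octet
    ("198.51.100.7", "Mozilla/5.0"),         # unrelated visitor
]
blocks = group_hits_by_block(hits)
```

Running this periodically makes it easy to spot when the bot moves to a new block, as it did after the first hit.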

I post about this bot for one simple reason: the UA is very intriguing and the fact that it's requesting just RSS XML feeds is also interesting. Are we going to see a Yahoo! service or a set of services that deal with just blogs?

TechCrunch reported way back in 2005 about the launch of Yahoo! blog Search, which back then and today has pointed to what Yahoo! calls the News Search, which according to the web page is to "Search real-time news stories from Yahoo! News and across the web." That's fine and dandy, but it's no blog search per se.

So the YahooCacheSystem bot could represent one of two things:

  • Yahoo! is consolidating its backend infrastructure to deal with RSS-based sites better. So they are building a centralized RSS cache for all their services to use. For webmasters, this means we now have a new analytics data point we can look at.
  • Or... (wait, I need to peer at my crystal ball...) Yahoo! is moving towards building a serious set of services centred around XML feeds. This could mean we could see a true blog search product soon, or something else we can only guess at.

So which one is it? I can only provide guesses. Given the utter lack of evidence and, more importantly, rumors, I'm leaning towards the infrastructure explanation. However, a good infrastructure is necessary for a major strategic shift or product launch. Time will tell.

A Little Bump

Is Matt McGee the last person on Twitter? Seems so.

So come on, Googs, help him out.

Live.com Spambot Ignores robots.txt

Oh, MSNbot, when will you ever learn? I won't rehash the story that led me to blocking MSN's referral-spamming bot, and that seems to have worked a bit. The problem is that the referral spam is still coming in! Yes, MSNbot is blocked but the spammy hits are still coming in.

Case in point, this hit from today over at Social Alerter:

/tips/how-not-get-dugg
  • At: 19 April 2008 11:04:39 AM GMT
  • Referred from: http://search.live.com/results.aspx?q=alerts&mrt=en-us&FORM=LIVSOP
  • Remote: livebot-65-55-165-107.search.live.com (65.55.165.107)
  • Request: HTTP/1.0 GET
  • Accepting:
    • HTTP: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    • Charset:
    • Encoding:
    • Languages: en-us
  • UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
  • Cookies:

Is it just me or is this beyond comical now?

I’ve Left AdSense Speechless

The screenshot below is from my AdSense account. It seems I have reached the pinnacle of optimization as no new optimization suggestions have been recommended since February.

AdSense screenshot

Is this a bug or account specific? Each of the reports I see is different.

The Real Strategy Behind Google App Engine

I just had an "OMG this will change the world!" kind of moment while playing for just 5 minutes with Google's App Engine. Let me explain.

A bit of background first: The Google App Engine is a newly-launched service from Google, that for a change, seems to be well thought out. The service offers a Python-only environment (for now) to build applications locally and host them on Google's vast infrastructure. The idea here is that you don't have to worry about scaling your application to handle massive traffic and let the App Engine running on Google's servers deal with it. The Engine comes with lots of goodies like handling database stuff, user logins (and what a boon that will be for Google accounts), and others. All in all, a nice comfy environment for rapid application development and reliable hosting.

But from all the buzz on the net, I think there is something missing that I just hinted at above:

to build applications locally and host them on Google's vast infrastructure

App Engine comes with its own development setup that runs off your computer (available for Windows, OSX, and Linux). You develop the application on your computer, run it, test it, add features, and then upload it to Google's computers. My question is this: What's stopping Google from turning the local development code into a full desktop-based runtime for web applications? Why keep it as a development-only environment?

Let's look at this from another angle: the desktop-webapp integration market. Adobe recently released their oddly-named AIR (Adobe Integrated Runtime). In the AIR-world, you can write applications in HTML/CSS/JS or Actionscript and package them into desktop applications that run within AIR or within the Flash player in the browser. The AIR environment is available for Windows and Macs, and Linux support is on the way. Brilliant move: one code base, both browser and desktop functionality.

Microsoft also has a similar play in the form of .Net, and more specifically Silverlight. The .Net runtime is available for many devices and platforms (mobile, desktop, and I think even the XBox). With Silverlight, Microsoft's play is to give developers a platform to use .Net in the browser; this is coming in Silverlight 2.0 this summer. So with this, again, one code base can be used on the web and on the desktop to give true multi-platform programming.

There are other entries in this market, Mozilla Prism being a prominent example. They all promise the same thing: one code base, many places to run it, with varying details.

Now back to App Engine and to the question I posed: imagine Google comes out with a desktop runtime/environment that turns App Engine webapps into desktop-based apps. This will be directly parallel to Adobe's AIR but with a big difference: the same code will also be easily deployable on a reliable and scalable infrastructure - Adobe doesn't have that.

There is another difference: because of the way App Engine works, you could easily imagine it talking to Google Apps like Google Docs etc. A desktop App Engine will bring Google's applications onto the desktop and open up a market-disrupting war: direct office productivity competition with Microsoft. To rephrase, App Engine could be Google's way to enter Microsoft's turf on the desktop.

So any evidence for this? Nothing solid, so it's all speculation, but I'll point to three hints:

  • The name. It's not App Server or App Service but App Engine. Google understands branding well enough (it's arguably the main source of their traffic) so their choice of words here is intriguing. And I can't help but think that Google's App Engine will drive some sort of Google Gears. Nudge, nudge, wink, wink.
  • When creating an application, you can specify that only users of a certain Google Apps domain can use the app. This integration with Google Apps is perhaps hinting at bigger things to come.
  • The APIs available in App Engine: already App Engine supports dealing with mail, and given the point above, you can imagine an API for the other Google Apps. This would enable a go for the desktop market.

What do you think? I think this is the best move out of Google yet and as disruptive as AdWords was.

Killing Live.com Bot

I've had it. The live.com spambot, aka msnbot, is officially not welcome either here or at Social Alerter. Why? The bot is still referral spamming. How much? 100% of my live.com referrals at Social Alerter are actually the bot's spam. Granted the absolute number of hits is only in the low tens, but it is not right and such behavior is no longer welcome. And no, the constant lies that this behavior has stopped do not help.

For a background on this, start here, then read this post, and close off with the follow up.

Bye, bye. I hope to see you never.

Houston, We Have a Twitter

That's right folks. Today I decided to actually do something about my Twitter account. Follow me at pierrefar.

The question is *what* will I do with the account? It may be a few days before I dive in properly :) See you @twitter

How to *REALLY* Deal with Hackers

Donna over at SEO Scoop asks an excellent question: more and more we're seeing website attacks for SEO purposes, not more malicious intents (like stealing credit card details). Donna asks, how should we deal with this kind of attack? I'm going to hazard some suggestions.

First things first. We're not dealing with hackers. Nosiree, we're dealing with crackers. A hacker is a well-seasoned coder. A cracker is a hacker who exploits security holes for nefarious purposes.

With semantics out of the way, here are some suggestions:

  • Googlebomb yourself: If you get attacked with, for example, the Slash One Wordpress exploit, essentially you're going to get a lot of spammy "content" pages and lots of links to them. So what happens if you use .htaccess or otherwise to redirect all requests to wp-content/1/* to, say, your site's home page? Or why not to your newly minted, specially created, [Texas holdem play online] site? Hey, you're probably going to get a lot of traffic, so use it! Here is the code:
    # Goes in the .htaccess at the site root; redirects anything under wp-content/1/
    RewriteEngine On
    RewriteRule ^wp-content/1/ http://my-new-spammy-aff-site.com/ [R=302,L]
    Essentially, you'll googlebomb yourself with their links and use their traffic.
  • Use robots.txt as a defensive tool: A search engine doesn't need to see wp-content anyway, so block it:
    User-agent: *
    Disallow: /wp-content
  • It's the keywords, stupid: you just had someone dump a load of keyword-laden pages with targeted keyword links back to them. Hello? Anyone care to turn this into a keyword research tool? Here is the pseudocode for the tool:
    Do a Google search for [inurl:wp-content/1]
    Scrape the URLs from the SERPs
    Scrape the spammy URLs
    For each spammy URL, do a [link:] search
    Scrape the backlinks and extract the anchor texts
    Save the keywords along with the spammy HTML
    Write a front-end to search the database
  • Report them! Figure out the IP address of the person who uploaded the spammy pages and report them. If you get trackback spam to the spammy pages, find the IP address of the trackback spammers and report them. Most SEO spammers will be using hosting services and their own computers. It is possible (although I'm guessing unlikely) they'll be using a proper botnet.
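The robots.txt rule from the list above can be sanity-checked before you deploy it. This is a small sketch using Python's standard urllib.robotparser; the is_allowed helper and the example.com URLs are made up for illustration. Keep in mind it only tells you what a well-behaved crawler would do: a bot that ignores robots.txt will ignore this too.

```python
from urllib.robotparser import RobotFileParser

# The same two-line rule as in the list above
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content
"""


def is_allowed(url, user_agent="*", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


spam_blocked = not is_allowed("http://example.com/wp-content/1/spam.html")
home_allowed = is_allowed("http://example.com/")
```

Both spam_blocked and home_allowed should come out True: the spammy directory is off limits while the rest of the site stays crawlable.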

So, like pretty much everything in SEO, perhaps even this can be dealt with using some creativity... I'm sure there are better ways to deal with such spam, and the idea is to think about the opportunities here. Good luck!

MS Live Still Referral Spamming

That's right folks, after the initial fuss, the backtracking (with its very own official statement!), Microsoft's Live search engine is still doing these referral spamming requests.

I'm seeing this on my new service Social Alerter. The request details:

  • Remote: livebot-65-55-165-77.search.live.com (65.55.165.77)
  • UA: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
  • Referring URL: http://search.live.com/results.aspx?q=social&mrt=en-us&FORM=LIVSOP

Full list of IP addresses doing this:

  1. 65.55.165.90
  2. 65.55.165.43
  3. 65.55.165.96
  4. 65.55.165.120
  5. 65.55.165.100
  6. 65.55.165.76
  7. 65.55.165.16

The fake search queries are all either [social] or [alerts].
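If you want to filter these hits out of your own stats, the pattern above is regular enough to match on. Here is a rough Python sketch; the function name is hypothetical, and treating the whole 65.55.165.0/24 range as the bot's home is my assumption based on the addresses listed, not something Microsoft has published.

```python
import ipaddress
from urllib.parse import urlsplit, parse_qs

# Assumption: every address seen so far sits in this /24
LIVEBOT_NET = ipaddress.ip_network("65.55.165.0/24")


def looks_like_live_referral_spam(remote_ip, referrer, user_agent):
    """Return True if a hit matches the referral-spam pattern described above."""
    ref = urlsplit(referrer)
    form_values = parse_qs(ref.query).get("FORM", [])
    return (
        ipaddress.ip_address(remote_ip) in LIVEBOT_NET
        and ref.hostname == "search.live.com"
        and "LIVSOP" in form_values      # the FORM value on every fake referrer
        and "MSIE" in user_agent         # the bot poses as Internet Explorer
    )


is_spam = looks_like_live_referral_spam(
    "65.55.165.77",
    "http://search.live.com/results.aspx?q=social&mrt=en-us&FORM=LIVSOP",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)",
)
```

A real visitor arriving from a Live search would fail the IP check, so genuine referrals should pass through untouched.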

Anyone else seeing this? It's clearly not fixed as they claimed and is starting to get annoying.

Announcing Social Alerter

Doesn't it suck when you discover your site is down because a page went popular on Digg? Wouldn't it be nice if you somehow knew that your site is slowly inching its way up the upcoming list? And what about delicious? That could be a serious hit of traffic too.

Well now you can get a warning. Over the past few months, I've been slowly building a service called Social Alerter. Social Alerter is a free service that alerts you when your websites are about to go popular on Digg and delicious. You can monitor as many sites as you want and once it finds one, it sends you an email. You can use it to monitor your own sites, your competitors' sites (ha ;) ), and your favorite sites. You simply sign up and know that there is an eye out doing all the leg work.

This is the service in a nutshell. I've written a huge help section and if you read just one page, read the Social Alerter crash course.

Review of 2007, Predictions of 2008

This kind of post is something a few bloggers do. I enjoy reading them, so I thought I'd try my hand. It's a bit of the final score-card for the year and hopefully inspiration to do better (whatever that actually means) for next year. So what happened with me in 2007?

January

January is probably the best month of 2007. It kicked off the year with my first ever digg home pager, me doing a live podcast/talkshow, and the first rant of the year which set the pace for the months to come :D

It wasn't all fun and joy though: in January, eKstreme.com suffered a DoS attack.

February

February brought lots of developments: I started moderating at Cre8 a Site Forums, easily the friendliest place on the net. The second Digg home pager arrived too, and a major statistical analysis of the Socializer data got a lot of people interested.

March-July

Very quiet period. In March, I was busy thinking about my online strategy for eKstreme.com, blogSci.com, and the other major property I owned back then, fontfox.com. The outcome of that was a major change (for the better!) in the monetization of eKstreme.com, a decision to keep blogSci.com ad-free, and the realization that I wasn't doing much with fontfox. In the end, fontfox got sold in May.

In July, this blog got its first ever guest post. It was a great piece. However, this effort to bring fresh blood into this site was a dud: a lot of other people agreed to guest post but none actually sent me stuff :( Waaah.

Of course, lots of ranty anti-Google posts were written in this period. Back then, Google thought it was OK to abuse user data in many ways. To this day I still think they are abusing our data and it will probably get worse in 2008.

June-August

While the blogging was quiet, a lot was happening in the background. The CMS of eKstreme.com had been showing its age and slowing things down. The strategic review in February concluded that this had to be fixed. So the whole site was moved to use Wordpress as the CMS, which involved a lot of hacking to get WP to like my SEO tools and not break them. I also moved hosts.

July onwards

I started taking a very close look at the bots/crawlers hitting eKstreme.com and blogSci.com. This research resulted in a lot of bot-related posts and insights. I'm still collecting data to learn more about what bots look like. By bots, I mean the more malicious scraper spammy types, not the nice ones like Googlebot and Slurp!.

Out of this also came the realization that msnbot was misbehaving. First, the authentication was broken; second, it was not obeying the robots.txt file; and third, a very strange pattern of bot activity from live.com was detected. This resulted in the third Digg home pager. A few weeks later, MS backtracked. I don't know if it had anything to do with my post or not - I doubt it.

All in all, a great year. Stay tuned for 2008 because there is a lot of great stuff coming. It'll be announced here as always.

Predictions for 2008

Now the really fun part :) What will happen in 2008? Here are some of my predictions:

  • Online office: Microsoft will release Silverlight 2.0 in early '08 (we already know that). Shortly afterwards, they'll release an online version of Office based on that. This will disrupt the market, making Google's Apps look like toys and Zoho very very vulnerable. Zoho will get acquired.

  • At least one major privacy scare on the web. Top contenders are Google and Facebook, but Microsoft cannot be discounted. My prediction is that it will be related to user profiling for ad-targeting purposes.
  • Rich Internet Applications (RIA) will arrive in full force. Everyone will look at each other and go 'eh' until a killer app is released. That app will probably be the online MS Office. Top contenders are Silverlight and Flex from Adobe. Flex has no chance against Silverlight because Adobe doesn't know how to write web-friendly software (like Acrobat Reader plugin for browsers, which sucks) and certainly is no match for the developer-friendly MS. Flex will live through 2008 though because end-consumers will think it's the Flash player.
  • In Search: Google will continue to dominate, but slow down its growth. Semantic search engines like Powerset (I'm one of its public beta testers) will rock. Hakia will figure out that its biggest obstacle to world domination is its index: full of spam and very stale. Their technology is great though.
  • Yahoo will chug along. A few gem products will come out of their R&D efforts along with the continuous stream of half-baked ideas. The new delicious service, which finally loses the an.nno.ying dots from its name, will be a great hit.
  • Generally: more memes and more bloggers working in synchrony for a common cause.

So... will I eat my words in December 2008? Stick around and you'll find out :)

GTalk Translator Bot is Mediocre but Useful

By now you must have heard that Google Talk now includes translation bots you can invite into a conversation. When you invite any of these bots, they translate whatever you type from your language into the target language. A very brilliant idea with a perfect implementation mechanism, but does it work? Let's find out.

I've mentioned before that I am an Arabic speaker. Given that Arabic sports one of the most convoluted grammars on Earth, I thought what better way to test the bots than by having a solo chat with the en2ar bot. That is, I write in English and watch its Arabic responses. The results are below:

Google Talk translation bot conversation translating English to Arabic

Arabic speakers among you will spot many mistakes but the ideas are still mostly translated well. With basic phrases, the translation is flawless in most cases. With more convoluted writing, the translation breaks down. You can see two comments relating to a bad translation. The first one said "This translation sucks", which colloquially in English means it's bad. The translation used the meaning of "suck" literally, i.e., something you'd do to a straw and some juice. The next phrase, "This is a bad translation", was translated, well, badly, but the idea was still conveyed. The translation in Arabic actually says "This is the bad of translation". This grammatical structure is used in Arabic to emphasize the pinnacle of something (i.e. exemplary in its class), so in this case, the Arabic actually means "This is the worst of translation".

So all in all a useful feature but I don't see it being used for anything important like a business chat: the mistakes are simply too frequent for this to be used to convey complex ideas. It is machine translation after all and the state of the art is still bad.

Query String Collapsing

One of the problems that search engines and analytics packages have is dealing with URLs with query strings. For example, the following two URLs will return the same content from any given content management system but they are two different URLs in the eyes of search engines and analytics packages:

http://example.com/page.php?id=1&title=hello&from=homepage

http://example.com/page.php?title=hello&from=homepage&id=1

So how can we figure out that they are actually the same URL really? The solution I came up with is a simple multi-step processing algo. It goes like this:

  • Take the query string variables and save them in an array. So in the case of our first URL, the array would contain the following key=>value pairs:

    $vars = array('id'=>'1', 'title'=>'hello', 'from'=>'homepage');
  • Next, sort the array by alphabetical order based on the keys names, like:

    $vars = array('from'=>'homepage', 'id'=>'1', 'title'=>'hello');
  • Now rebuild the URL based on the new order of the variables:

    http://example.com/page.php?from=homepage&id=1&title=hello
  • By now the trick should be clear: if you do that to all the URLs, you would always reach the same final re-composed URL as long as the variables are the same (i.e. the same names and one URL doesn't have extra or missing variables).
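The steps above can be sketched in a few lines. This is a minimal Python version using the standard urllib.parse module rather than the PHP arrays shown in the list; the function name collapse_query_string is just my label for it, and re-encoding the pairs may normalize percent-escapes, which I'm assuming is acceptable here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def collapse_query_string(url):
    """Return url with its query variables sorted alphabetically by name."""
    parts = urlsplit(url)
    # Step 1: pull the variables out as (name, value) pairs
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Step 2: sort by name; the sort is stable, so repeated names keep their order
    pairs.sort(key=lambda pair: pair[0])
    # Step 3: rebuild the URL with the sorted query string
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(pairs), parts.fragment))


first = collapse_query_string(
    "http://example.com/page.php?id=1&title=hello&from=homepage")
second = collapse_query_string(
    "http://example.com/page.php?title=hello&from=homepage&id=1")
```

Both first and second end up as http://example.com/page.php?from=homepage&id=1&title=hello, so comparing collapsed URLs is enough to tell that the two requests are for the same page.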

I call this Query String Collapsing. Why "collapsing" instead of normalization or decomposition? No real reason apart from thinking about this as collapsing a whole slew of URLs into a single representative entity. And I just like that name more that way :)

With this, what can we do with analytics? Save both the original URL as requested and the collapsed URL. This opens up a nice set of funky things you can do, but that's another post...

Irony

Support Wikipedia!

Hint: Look at the source code...

MS Admits to Referral Spamming as a Cloaking Check

Hot off the press: after the fuss raised by a bunch of us a few weeks ago, Donna now reports that Live ponies up about the referrer spam. They've issued a statement where they acknowledge:

  • A bug that caused issues with AdSense/Overture reporting.
  • Distorting site statistics with unfilterable bot traffic (except we know how to filter them!)
  • Polluting HTTP logs with inappropriate terms (true).

Microsoft also states that "Hopefully webmasters have also noticed these issues disappearing. If you are still experiencing any issues, please contact us before you block MSNBot, to see if we can address the issue."

Let me be the first to say a big thank you to Microsoft for making a very solid public response to the issue and answering our questions. This kind of transparency is exactly what fosters a good relationship between a search engine and webmasters.

And yes, Live.com team, I do default to your search engine for my searches. Works a treat (most of the time ;) ).
