Query String Collapsing
One of the problems that search engines and analytics packages have is dealing with URLs with query strings. For example, the following two URLs will return the same content from any given content management system, but they are two different URLs in the eyes of search engines and analytics packages:
http://example.com/page.php?id=1&title=hello&from=homepage
http://example.com/page.php?title=hello&from=homepage&id=1
So how can we figure out that they are really the same URL? The solution I came up with is a simple multi-step algorithm. It goes like this:
Take the query string variables and save them in an array. So in the case of our first URL, the array would contain the following key=>value pairs:
$vars = array('id'=>'1', 'title'=>'hello', 'from'=>'homepage');
Next, sort the array alphabetically by key name:
$vars = array('from'=>'homepage', 'id'=>'1', 'title'=>'hello');
Now rebuild the URL based on the new order of the variables:
http://example.com/page.php?from=homepage&id=1&title=hello
By now the trick should be clear: if you do that to all the URLs, you will always reach the same final recomposed URL as long as the variables are the same (i.e. the names and values match and neither URL has extra or missing variables).
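The three steps can be sketched in a few lines of PHP. This is only a minimal sketch, not a finished implementation: the function name collapse_query_string() is made up for illustration, and it assumes the URL has at most one "?" and that any "#fragment" can be thrown away (browsers do not send it to the server anyway).

```php
<?php
// Hypothetical helper; the name is an assumption, not part of any library.
function collapse_query_string(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // no query string, nothing to collapse
    }

    // Step 1: save the query string variables in a key => value array.
    parse_str($parts['query'], $vars);

    // Step 2: sort the array alphabetically by key name.
    ksort($vars, SORT_STRING);

    // Step 3: rebuild the URL with the variables in their new order.
    // Everything before the "?" is kept as-is; a "#fragment" is dropped.
    $base = substr($url, 0, strpos($url, '?'));
    return $base . '?' . http_build_query($vars, '', '&');
}

$a = collapse_query_string('http://example.com/page.php?id=1&title=hello&from=homepage');
$b = collapse_query_string('http://example.com/page.php?title=hello&from=homepage&id=1');

echo $a, "\n";        // http://example.com/page.php?from=homepage&id=1&title=hello
var_dump($a === $b);  // bool(true)
```

One thing to watch for: parse_str() turns dots and spaces in variable names into underscores, and http_build_query() re-encodes the values, so the collapsed URL can differ in encoding from what was originally requested. That is fine if the collapsed form is only used as a grouping key.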
I call this Query String Collapsing. Why "collapsing" instead of normalization or decomposition? No real reason, apart from thinking of it as collapsing a whole slew of URLs into a single representative entity. And I just like the name better that way.
With this, what can we do with analytics? Save both the original URL as requested and the collapsed URL. This opens up a nice set of funky things you can do, but that's another post...
