
1,000,000,000,000 Unique URLs but Who’s Counting!

by Sachin Balagopalan on July 26, 2008 · 2 comments

If you’re a person who pays attention to milestones then this one is a keeper! Google just announced on their official blog that they have indexed their one trillionth unique URL. I’d be curious to find out what that URL is - can you imagine the “curiosity” traffic!

In all seriousness, however, this only confirms how huge the web is, and it’s still growing.

To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google’s index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it’d be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.

An interesting history lesson in how far search engine technology has come since the early days: going from computing PageRank on 26 million pages in a couple of hours to re-processing a link graph of a trillion unique URLs several times a day is no small feat, especially when you figure in the processing power.
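
To put rough numbers on that jump, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted from Google's post; the variable names are mine, and the "implied" count of U.S. intersections is simply what the 50,000x analogy works out to, not a number Google states.

```python
# Back-of-the-envelope scale comparison using only the figures quoted above.
EARLY_INDEX_PAGES = 26_000_000      # pages in Google's early batch-built index
UNIQUE_URLS = 1_000_000_000_000     # one trillion unique URLs
MAP_SCALE_FACTOR = 50_000           # "about 50,000 times as big as the U.S."

# How many times larger is today's link graph than the early index?
growth_factor = UNIQUE_URLS / EARLY_INDEX_PAGES

# What the road-map analogy implies for the U.S. (an inference, not a quoted figure).
implied_us_intersections = UNIQUE_URLS // MAP_SCALE_FACTOR

print(f"Growth factor: about {growth_factor:,.0f}x")
print(f"Implied U.S. intersections: {implied_us_intersections:,}")
```

So the link graph is roughly 38,000 times the size of that first index, and the analogy implies something like 20 million intersections in the U.S. road network.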

As you can see, our distributed infrastructure allows applications to efficiently traverse a link graph with many trillions of connections, or quickly sort petabytes of data, just to prepare to answer the most important question: your next Google search.

Well, on the flip side of the coin, one has to wonder if search engine technology has reached the top of the bell curve. A couple of days ago in this post I said…

Just search on any topic today and you’ll be deluged with links to literally hundreds if not thousands of sites. That in turn results in information overload and decision making as to how best to consume the information because more often than not you’ll come across conflicting views on the topic you’re searching for.

I guess the question then is: as the web gets bigger, how relevant are the search results for “your next Google search” going to be? Should search results be vetted for some semblance of accuracy before being presented to the user? Perhaps the social networking graphs and the associated data streams could be utilized somehow to present users with relevant data?

It will be interesting to see how search evolves (or dies) as Google marches towards the next milestone - indexing a quadrillion unique URLs!


{ 2 comments… read them below or add one }

Vardaan 07.27.08 at 7:41 am

You have taken my words, I was also curious to know what would be that 1 trillionth URL..but its deceiving stats from Google anyways

Sachin Balagopalan 07.27.08 at 8:52 am

Vardaan - maybe the stats are deceiving (we’ll give them the benefit of the doubt :) ) However the bigger question is the web is no doubt getting larger (in spite of the Google stats) so the question then is how relevant are the search results ? - Read my post on this topic .. http://tinyurl.com/6f9rtk


Copyright © 2007–2009, Republic of Internets. All rights reserved.
