Google Now On Full Caffeine


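Google have announced on their Blogspot blogs that their latest major search update, Caffeine, is now fully live on all servers. Caffeine is a new web indexing system, and Google says it delivers 50 percent fresher results and covers more of the web than their previous index. From now on, Google search should be even better than the competition: faster, fresher and indexing more of the web.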

Google are so excited about this news that they have posted it on both their Official Blog and the Webmaster Central Blog.

Here is an explanation straight from the horse’s mouth (the horse being Google, no offence meant).

“Today, we’re announcing the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than our last index, and it’s the largest collection of web content we’ve offered. Whether it’s a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before.

Some background for those of you who don’t build search engines for a living like us: when you search Google, you’re not searching the live web. Instead you’re searching Google’s index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here’s a good explanation of how it all works.)

So why did we build a new search indexing system? Content on the web is blossoming. It’s growing not just in size and numbers but with the advent of video, images, news and real-time updates, the average web page is richer and more complex. In addition, people’s expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.

To keep up with the evolution of the web and to meet rising user expectations, we’ve built Caffeine. The image below illustrates how our old indexing system worked compared to Caffeine: [image not reproduced here]

Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.

We’ve built Caffeine with the future in mind. Not only is it fresher, it’s a robust foundation that makes it possible for us to build an even faster and comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.”

Source: Webmaster Central Blog and the Official Google Blog
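Google’s explanation above boils down to two ideas: you search an index of the web rather than the live web (like the list at the back of a book), and Caffeine updates that index continuously, a page at a time, instead of re-analysing the whole web in one go. For the curious, here is a toy Python sketch of both ideas using a simple inverted index. It is an illustration under obvious assumptions, not how Caffeine actually works; the class and method names (`ToyIndex`, `add_page`, `rebuild`, `search`) and the example URLs are all made up.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical toy inverted index: maps each word to the set of page URLs containing it.

    Illustration only; this is not Google's implementation.
    """

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.pages: dict[str, str] = {}

    def _tokenize(self, text: str) -> set[str]:
        # Lowercase and split on whitespace; real tokenisers do far more.
        return set(text.lower().split())

    def add_page(self, url: str, text: str) -> None:
        """Incremental update (Caffeine-style): index one new or changed page straight away."""
        if url in self.pages:
            # Drop the old version's words so stale entries don't linger.
            for word in self._tokenize(self.pages[url]):
                self.postings[word].discard(url)
        self.pages[url] = text
        for word in self._tokenize(text):
            self.postings[word].add(url)

    def rebuild(self, crawl: dict[str, str]) -> None:
        """Batch update (old-index style): throw everything away and re-index the whole crawl."""
        self.postings = defaultdict(set)
        self.pages = {}
        for url, text in crawl.items():
            self.add_page(url, text)

    def search(self, query: str) -> set[str]:
        """Return pages containing every query word. We search the index, not the live web."""
        words = self._tokenize(query)
        if not words:
            return set()
        matches = [self.postings.get(word, set()) for word in words]
        # set.intersection returns a new set, so callers can't mutate the index.
        return set.intersection(*matches)


index = ToyIndex()
index.rebuild({"example.com/old": "caffeine search index"})   # the slow, whole-web refresh
index.add_page("example.com/new", "google caffeine is live")  # a fresh page, indexed immediately
print(sorted(index.search("caffeine")))  # ['example.com/new', 'example.com/old']
```

The `rebuild` method mimics the old batch approach (nothing new shows up until everything has been re-indexed), while `add_page` mimics Caffeine’s approach of adding each new or changed page straight to the index.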

So what does this all mean for web users and webmasters (those who own websites)? For users, it should mean that the latest information is more readily available. New pages should be listed and ranked much faster, almost in real time, so when a major news story breaks you should see it in the search results as it happens. This will revolutionise the way news is searched for online. People will only need to search Google for the latest news rather than root around on their preferred news websites (handy if Murdoch makes them all subscription-based anyway).

So what about webmasters? People involved in marketing and promoting websites always end up delving into SEO (search engine optimisation) to work out how best to get their new content featured in the search index. In the early days of search it was all very simple, based on META tags. Then Google came up with the “votes” idea, whereby pages were ranked by how much other websites liked them, measured by counting the links pointing to each site. In time, SEOs caught on and gamed the system by generating links of their own.
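The “votes” idea is essentially PageRank. Below is a simplified, textbook-style version in Python, assuming a damping factor of 0.85 and a tiny made-up link graph (all site names are hypothetical); Google’s real ranking uses far more signals than this. It also shows why generated links worked as a trick: a handful of spam pages all linking to one site is enough to lift its score.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Simplified textbook PageRank: each page splits its score as 'votes' across the pages it links to."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += damping * share
            else:
                # Dangling page (no outgoing links): spread its vote evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical "honest" web: news.example gets the most votes.
honest = {
    "blog.example": ["news.example"],
    "forum.example": ["news.example", "blog.example"],
    "news.example": ["blog.example"],
    "new-site.example": ["news.example"],
}
before = pagerank(honest)

# A hypothetical link farm: five generated pages that all "vote" for new-site.example.
farmed = dict(honest)
for i in range(5):
    farmed[f"spam{i}.example"] = ["new-site.example"]
after = pagerank(farmed)

print(before["new-site.example"] < after["new-site.example"])  # True
```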

So the future? Well, content is king. It is likely that the most important factors going forward will be relevancy of content, authority and accuracy. What do I mean? Well, if an event occurs and 1,000 people write about it, then rather than the website with the most existing links topping the search pile, the site that actually breaks the story first should perform better than before. If later sites say the same thing, this will help to promote the first one, even without a link. So from now on, the most popular news sites will not be the biggest and most established, but those that are really on the ball.

Maybe this is all in response to Rupert Murdoch’s plans to make his news websites subscription-based – Google has created a way to deliver the latest information to its users without needing to rely on old-school reporters.

Let’s see how quickly this post appears in Google. The current time is 18.51 hrs BST. Publishing now...

  • Update: 18.58hrs, nothing. But then again I am a day late in posting this “news”. Maybe this page will never be indexed?
  • Update: 19.22hrs, still nothing. It seems that Google has popped to Starbucks for some more coffee. Of course, this post has some poor SEO: the voice of Google above is duplicated content copied from their blogs. So maybe it will not be listed at all?