Why Google Dances and the reasoning behind it


I found a great article the other day explaining what the Google Dance is and why it happens. I'll quote the important parts below, and at the end there's a link to the article if you want to read the whole thing.

The name "Google Dance" is often used to describe the index update of the Google search engine. Google's index update occurs on average once per month. It can be identified by significant movement in search results and especially by Google's cache of all indexed pages reflecting the status of Google's last spidering. But the update does not proceed as a switch from one index to another at a single point in time. In fact, it takes several days to complete the index update. During this period, the old and the new index alternate on http://www.google.com. At an early stage, results from the new index appear sporadically. Later on, they appear more frequently. Google dances.

The Google search engine pulls its results from more than 10,000 servers, which are simple Linux PCs that Google uses for reasons of cost. Naturally, an index update cannot be carried out on all those servers at the same time. One server after the other has to be updated with the new index.

Not only is Google's index spread over more than 10,000 servers, but these servers are also, as of now, placed in eight different data centers. These data centers are mainly located in the US (e.g. Santa Clara, California and Herndon, Virginia), but in June 2002 Google's first European data center went online in Zurich, Switzerland. Very likely there are more data centers to come, which will perhaps be spread over the whole world. However, the data center that Google brought online in January 2003 is again located in the US.

The records for a domain at the responsible name server specify how long the record may be cached by a caching name server. This is the Time To Live (TTL) of a domain.
As soon as the TTL expires, the caching name server has to fetch the record for a domain again from the responsible name server. Quite often, the TTL is set to one or more days. In contrast, the Time To Live of the domain http://www.google.com is only five minutes. So a name server may only cache Google's IP address for five minutes and then has to look up the IP address again. Each time Google's name server is contacted, it sends back the IP address of only one data center. In this way, Google queries are always directed to different data centers by changing DNS records. For one thing, the DNS records may be based on the load of the individual data centers. In this way, Google would conduct a simple form of load balancing through its use of the DNS.

How data centers, DNS and the Google Dance are related is easily answered. During the Google Dance, the data centers do not receive the new index at the same time. In fact, the new index is transferred to one data center after the other. When a user queries Google during the Google Dance, he may get the results from a data center which still has the old index at one point in time and from a data center which has the new index a few minutes later. From the user's perspective, the index update took place within a few minutes. But of course, this procedure may reverse, so that Google seemingly switches back and forth between the old and the new index.

The progression of a Google Dance could basically be watched by querying the IP addresses of Google's data centers. But queries on the IP addresses are normally redirected to http://www.google.com. However, Google has domains which resolve to the individual data centers' IP addresses. These domains are shown in the following list.

Domain
www-ex.google.com
www-sj.google.com
www-va.google.com
www-dc.google.com
www-ab.google.com
www-in.google.com
www-zu.google.com
www-cw.google.com

For every domain www-xx.google.com, there is an additional domain www-xx2.google.com.
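To make the mechanism described so far a bit more concrete, here is a rough Python sketch of why results seem to flip between the old and the new index. It is only an illustration under stated assumptions: the data-center labels are taken from the www-xx host names in the list, the five-minute TTL is from the article, but the rollout pace of 12 hours per data center, the random choice of data center by the name server, and all function and class names are made up for this example and are not anything Google has documented.

```python
import random

# Labels taken from the www-xx host names quoted above.
DATA_CENTERS = ["ex", "sj", "va", "dc", "ab", "in", "zu", "cw"]
TTL_SECONDS = 300  # five-minute TTL, as stated in the article


def updated_centers(hour, hours_per_center=12):
    """Data centers that hold the new index `hour` hours into the update.

    Assumption: one data center is updated every `hours_per_center` hours,
    in list order. The real pace and order are not given in the article.
    """
    count = min(len(DATA_CENTERS), hour // hours_per_center)
    return set(DATA_CENTERS[:count])


class CachingResolver:
    """Keeps one answer until its TTL expires, like a caching name server."""

    def __init__(self, rng):
        self.rng = rng
        self.cached = None
        self.expires_at = -1.0

    def resolve(self, now_seconds):
        if self.cached is None or now_seconds >= self.expires_at:
            # Assumption: the responsible name server hands out one data
            # center at random (the article suggests load may play a role).
            self.cached = self.rng.choice(DATA_CENTERS)
            self.expires_at = now_seconds + TTL_SECONDS
        return self.cached


def simulate(hour, queries=12, interval_seconds=300, seed=0):
    """Which index ("old" or "new") a user sees on repeated queries."""
    resolver = CachingResolver(random.Random(seed))
    new = updated_centers(hour)
    start = hour * 3600
    seen = []
    for i in range(queries):
        center = resolver.resolve(start + i * interval_seconds)
        seen.append("new" if center in new else "old")
    return seen


if __name__ == "__main__":
    # Before, halfway through, and after the assumed rollout.
    for hour in (0, 48, 96):
        print(hour, simulate(hour))
```

Halfway through the assumed rollout, the same user gets a mix of "old" and "new" answers simply because each fresh DNS lookup may land on a different data center, which is the back-and-forth the article calls the dance.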
The IP address of each www-xx2 domain ends in .101 instead of .100. These pairs of domains and IP addresses belong to the same data center and, hence, the same index is searched by queries on them.

The beginning of a Google Dance can always be watched at the test domains www2.google.com and www3.google.com. The reason for having www2 and www3 is rather to show the new index to webmasters who are interested in their upcoming rankings. Many of these webmasters discuss the new index at the Google forums out on the web. These discussions can be observed by Google employees. At that time, the general public cannot see the new index yet, because the DNS records for http://www.google.com normally do not point to the IP address of the data center that is updated first when the update begins.

Once Google's test community of forum members has not found any severe malfunctions caused by the new index, Google's DNS records can be set to make http://www.google.com resolve to the data center that is updated first. This is the time when the Google Dance begins. But if severe malfunctions become obvious during this test phase, there is still the possibility to cancel the update at the other data centers. The domain http://www.google.com would not resolve to the data center which has the flawed index, and the general public would not notice it. In this case, the index could be rebuilt or the web could be spidered again.

If you wish to see the entire article please visit this link: http://dance.efactory.de/

sometimes, there is such a thing as 'too much information.' this....is definitely one of those times.

Nah I disagree, this is really useful information if you care anything about Google and how it works

it was a joke sir...sheesh

i had no idea about all the behind the scenes stuff with google until now...
I'll have to read it a couple more times to really absorb it though...

10000 servers?! Do you realize how many fricken servers that is!? I don't, because I am sure there is probably 100x that much on the globe right now, chances are there is probably 2 servers to every male adult (I'd laugh really hard if that's right).

Very nice info, too bad I'm too tired for any of it to absorb

it really is amazing and baffling at the same time. To think from what they started from.

Great info BWM, thanks for taking the time to do it.

I was wondering if Google is now updating its information on a day-to-day basis rather than once a month. Does anyone know if this is true?

Google certainly has changed the way they update things, and the Google Dance is a thing of the past. You'll see a lot of small changes every day now, as Google spiders so much now, and updates the cache for a lot of pages every day, or every second day.

Page Rank and backward links still don't update often, so you should see some big changes sometimes, but I don't think you'll actually see the dance anymore like we used to.

It sure has.

It should be noted that although this article would have been great a year ago, it's pretty out of date. Google completely changed the way it danced just after the original post.

It should be noted that the article was posted a year ago and it was great

b_heyer wrote:
Hmm, that looks like a huge amount of information, which makes a bad impression. My suggestion: instead of posting all the content, just post the link to where the information is.