I have a strange issue that is currently going on with Ozzu. Since about July 18th I noticed Googlebot had stopped going to the site. Normally Googlebot crawls about 3000-15000 pages per day. I also have the site in Google Sitemaps, so I went in there to see if there were any errors that might explain why Googlebot has stopped going to the site. There was indeed an error which says:

5xx error - robots.txt unreachable

There is a detailed description of this error which says:

Quote:
See RFC 2616 for a complete list of these status codes. Likely reasons for this error are an internal server error or a server busy error. If the server is busy, it may have returned an overloaded status to ask the Googlebot to crawl the site more slowly. In this case, we'll return again later to crawl additional pages.
http://www.google.com/support/webmaster ... swer=35149

I have no clue what the problem is, but at work I start with troubleshooting 101, and the first thing I would think to do is go back in time to July 17th or 18th and ask myself: what, if anything, did I change?

Hmmmm. This is interesting. From here: http://forums.searchenginewatch.com/sho ... eadid=2786

Quote:
Did you see this page: http://www.google.com/support/webmaster ... tx=related

Yes, it's not that. There are absolutely zero requests from Google in the logs since about the 18th of July.

Hitting a few other forums, it sounds like you are not alone, and many experienced this starting at the end of March to early April. No answers either. There were some guesses saying Googlebot was learning to change its crawl frequency for sites based on content. The two examples they guessed were sites that didn't update often no longer requiring regular scans, and sites with so much new content (like a forum) getting a reduced scan frequency but an increased scan amount. The final guess is that the Demon Seed has taken over the internet, locked all the doors, and is incubating a new child. Guesses or not, there are people out there just like you.

Hey guys, I just thought I would let you know that the problem got fixed today. Googlebot is now active again and all my Google Sitemap errors have vanished too. So everything is great in Ozzuland!

YAY! Happy times in Ozzuland.

Bigwebmaster wrote:
I have no idea, and I doubt I will ever know. I made lots of changes just in case it was something on my end. As far as I know, the problem could very well have been something on Google's end too. Regardless of whether it was something I did or not, the problem seems to be fixed. Googlebot has already crawled over 5000 pages today and counting.

In case you are curious exactly what changes I made, here is what I did (a rough config sketch for some of the Apache-related items follows the list):

- Changed Ozzu's IP address to a new one so that it was no longer shared with the name server IP address
- Created a reverse DNS entry on the IP address to point to http://www.ozzu.com
- Removed all of the banned IP addresses from my firewall, just in case something there was causing problems
- Fixed a glitch in the Ozzu sitemap files which was making the content type text/html when it should have been text/xml
- Modified ServerTokens in Apache so that the server reveals less about itself as far as versions and what modules are installed
- Removed the ETag entry on the headers for robots.txt, which gets added automatically. I removed it with the "FileETag None" directive
- Cleaned up the robots.txt file some, removing entries that were no longer needed
- Removed Ozzu from the old server, and removed all DNS entries on the old server
- Updated Apache to the latest version (it was slightly out of date)
- Updated other software to the latest versions
- Rebooted the server (you never know!)
- Posted this problem in a few other places besides Ozzu, such as WebmasterWorld and the Google Sitemaps Group
- Discussed this with a contact I have at Google, although his resources are limited
- Emailed the Google Sitemap department
- Made sure that robots.txt wasn't in DOS format
- Went through all of the Apache log files to see if I could find any evidence of what was going on
- Sniffed the network packets on the server to see if Google was even attempting to make a TCP connection at all
- Disabled the firewall for a while to see if that would do anything. I also rechecked all the settings to make sure everything was correct
- Normally when anybody tries to access something at ozzu.com without the www, it is permanently redirected (301) to the www domain. I kept this how it was, except for the robots.txt file, which I made available on the domain both with and without the www. There is no redirect for that file
- Before, I had 404 pages being redirected to the Ozzu homepage. I removed that redirect and now it just serves the standard 404 page. I might customize that a bit more in the future, but it was worth mentioning
- Removed all the banned IPs from the phpBB admin panel
- Tested another site on the same server as Ozzu to see if Googlebot was able to reach it. While the problem still existed on Ozzu, Googlebot was successfully crawling this other site, which ruled out the firewall as the culprit
- Stalked the logs 24 hours a day in hopes that I would see an entry by Googlebot. Finally saw an entry this morning where the Google Media bot requested the robots.txt file. Soon after that, Googlebot started doing its thing.
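For anyone curious what a few of those Apache-related changes might look like in a config file, here is a minimal sketch. This is not Ozzu's actual configuration - it assumes a typical Apache setup with mod_rewrite enabled, and the sitemap filename and 404 page path are only placeholders:

Code:
# Reveal less about the server in response headers
ServerTokens Prod

# Stop Apache from adding an ETag header to robots.txt ("FileETag None")
<Files "robots.txt">
    FileETag None
</Files>

# Serve the sitemap as XML instead of text/html (filename is a placeholder)
<Files "sitemap.xml">
    ForceType text/xml
</Files>

# Serve a standard 404 page rather than redirecting missing URLs to the homepage
ErrorDocument 404 /404.html

# 301-redirect ozzu.com to www.ozzu.com for everything except robots.txt,
# so that file stays reachable on both hostnames
RewriteEngine On
RewriteCond %{HTTP_HOST} ^ozzu\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^/?(.*)$ http://www.ozzu.com/$1 [R=301,L]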
The Mediabot has started indexing for Googlebot recently, so it might have been a problem on Google's end regarding that.

Yes, it very well could be related to that and to how they have a caching system now. The Google Media bot obviously retrieved the robots.txt file for the regular Googlebot today. Until today there hadn't been any Googlebot or Google Media bot entries for the last 10 days, so both bots were unable to reach Ozzu for some reason. Now, today, both bots are retrieving pages from the server. I really wish I could ultimately know what caused the problem, but at least it's fixed.

Have you checked the CHMOD on the robots.txt file?

- Jacob Kerr
WI Works, Inc. - Web Development Engineer
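On the CHMOD question and the other checks mentioned in this thread (file permissions, DOS line endings, reachability, log stalking, packet sniffing), here is roughly how those can be done from a shell. The file and log paths are just examples from a typical Apache layout, not Ozzu's actual paths, and the Googlebot address range is only a commonly cited one:

Code:
# Permissions: robots.txt should be world-readable
ls -l /var/www/html/robots.txt
chmod 644 /var/www/html/robots.txt

# Check for DOS (CRLF) line endings and convert if necessary
file /var/www/html/robots.txt        # reports "with CRLF line terminators" if in DOS format
dos2unix /var/www/html/robots.txt

# Is the file reachable, both with and without the www?
curl -I http://www.ozzu.com/robots.txt
curl -I http://ozzu.com/robots.txt

# Any recent Googlebot or Mediapartners requests in the access log?
grep -iE "googlebot|mediapartners" /var/log/httpd/access_log | tail -n 20

# Is Google even opening TCP connections? (66.249.64.0/19 is a commonly
# cited Googlebot range; adjust as needed)
tcpdump -n 'tcp port 80 and net 66.249.64.0/19'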