Google and robots.txt

Thread starter Givigier
Start date Apr 26, 2012

Givigier

New Member

Apr 26, 2012

#1

Will a page that is disallowed in robots.txt still have a Google PageRank after a database update? Does robots.txt prevent the page from being accessed or cached by Google? How does Google treat '*' in robots.txt files?

Yes, here's more info about that (from the big men themselves...): http://www.google.com/bot.html

FYI: ever wondered what Google's robots.txt was? http://www.google.com/robots.txt

Robots.txt is a standard document that can tell search engine bots not to download some or all information from your web server. For information on how to create a robots.txt file, see The Robot Exclusion Standard.

==============

The robots.txt file prevents Googlebot from accessing the page, so caching is also prevented.

Google automatically takes a "snapshot" of each page it crawls and caches it. This enables us to show the search terms highlighted on text-heavy pages so users can find relevant information quickly, and to retrieve pages for users if the site's server temporarily fails. Users can access the cached version by choosing the "Cached" link on the search results page. If you do not want your content to be accessible through Google's cache, you can use the NOARCHIVE meta-tag. Place this in the <HEAD> section of your documents:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

This tag will tell robots not to archive the page. Google will continue to index and follow links from the page, but will not present cached material to users.

If you want to allow other robots to archive your content, but prevent Google's robots from caching, you can use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Note that the change will occur the next time Google crawls the page containing the NOARCHIVE tag (typically at least once per month). To control whether the page is indexed, use the NOINDEX tag; to control whether links are followed, use the NOFOLLOW tag.
See the Robots Exclusion page for more information.

http://www.google.co.in/webmasters/3.html

=========

Cheers

If I have a page in folder1 that I don't want crawled, and I put a disallow statement in my robots.txt file to deny access to folder1, and then I link to this page from my site's index page, will the spiders index the page? Or will the disallow prevail even though I linked to the page from a spiderable page?

If there is a link to it from another site, it will get PageRank anyway.

I'm not worried about PR. What I don't want to happen is for my login page to show up in search results. It's only for the site owner, not the general public.

phaugh wrote:
If I have this code...

There seems to be somewhat of a misunderstanding here...

NOARCHIVE tells Google not to cache the page.
NOINDEX tells Google not to include the page in search results.
NOFOLLOW tells Google to ignore links on the page.

If a page is NOINDEX'd, there is no way PageRank can be assigned, since it is not included in the database. This also prevents the page from being included in search results.

A page that is NOARCHIVE'd can be found in the search results and is assigned PageRank. However, no snapshot is kept on file.

With NOFOLLOW, the linked pages will not receive PageRank; Google will not even follow the link path.

I actually knew what they meant, but I was unsure of how strictly they (Google, whoever...) followed those definitions. For example, I was confused about whether a NOFOLLOW page would pass along PageRank to a page Google knew of via other means, because Google would not have to actually follow the link to know about and index that page. I was thinking too technically, as that is obviously not the case. Thanks for the clarification.
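
For anyone who wants to check the two mechanisms discussed in this thread, here is a rough Python sketch using only the standard library. It is not how Google itself evaluates anything. It only shows how a robots.txt rule under "User-agent: *" is matched for a crawler named Googlebot, and how the NOINDEX / NOARCHIVE / NOFOLLOW meta directives can be read from a page. The robots.txt content, the login page HTML, the example.com URLs, and the helper names (may_crawl, robots_meta, RobotsMetaParser) are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /folder1/ is disallowed for every crawler ("*").
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
"""

# Hypothetical login page asking Googlebot not to index or cache it.
LOGIN_PAGE = """\
<html><head>
<meta name="GOOGLEBOT" content="NOINDEX, NOARCHIVE">
<title>Owner login</title>
</head><body></body></html>
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"googlebot": {"noindex", "noarchive"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            tokens = {
                token.strip().lower()
                for token in attr.get("content", "").split(",")
                if token.strip()
            }
            self.directives.setdefault(name, set()).update(tokens)


def robots_meta(html):
    """Return the robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # The "*" group applies to Googlebot because no Googlebot-specific group exists.
    print(may_crawl(ROBOTS_TXT, "Googlebot", "http://example.com/folder1/login.html"))  # False
    print(may_crawl(ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))  # True
    print(robots_meta(LOGIN_PAGE))
```

One thing the sketch makes visible: a crawler can only read a meta tag from a page it is allowed to fetch, so a page blocked in robots.txt never gets its NOINDEX tag read by a crawler that obeys the block.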
