I'm curious to see some example uses and implementations of robots.txt, specifically implementations and the reasoning behind them for improving SEO.
The robots.txt I am using for WordPress 2.1 is based on the example at SEO with robots.txt (http://www.askapache.com/2007/seo/seo-with-robotstxt.html):
User-agent: *
# disallow files in /cgi-bin
Disallow: /cgi-bin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
# disallow all files ending in .php, .js, .inc, .css or .txt
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all URLs containing ?
Disallow: /*?
Disallow: /stats*
Disallow: /dh_
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*
Basically, this keeps crawlers away from duplicate content, low-quality content, CSS, JavaScript, PHP files, etc., but still lets search engines read the articles and find images, PDFs, etc.
Also, I know the wildcards in my robots.txt work for Google's bots, but do you know if they work for other bots?
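To double-check what the wildcard lines (`/*.php$`, `/*?`, `/wp-*/`) actually block, here's a minimal sketch of the matching rules Google documents: `*` matches any run of characters, a `$` at the very end anchors the pattern to the end of the path, and everything else is matched as a prefix. This is my own illustration, not any crawler's real code, and it says nothing about whether other bots honor wildcards. It also assumes there are no `Allow:` lines (the file above has none) and ignores percent-encoding. The function names are made up for this example. As far as I can tell, Python's built-in `urllib.robotparser` only does plain prefix matching, which is why I didn't use it here.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt Disallow path into a compiled regex.

    Assumed (Google-style) semantics: '*' matches any sequence of
    characters, a trailing '$' anchors the match at the end of the path,
    and all other characters are literal. Matching starts at the
    beginning of the path, so a pattern without '$' is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk and join the chunks with '.*' for every '*'.
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    # \Z anchors at the true end of the string (unlike '$', which also
    # matches before a trailing newline).
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any non-empty Disallow pattern matches the path.

    Hypothetical helper: it ignores Allow rules and rule precedence,
    which is fine only for a file that contains Disallow lines alone.
    """
    return any(
        pattern_to_regex(p).match(path)
        for p in disallow_patterns
        if p  # an empty "Disallow:" line blocks nothing
    )


if __name__ == "__main__":
    rules = ["/cgi-bin/", "/*.php$", "/*?", "/wp-*/", "/tag", "/stats*"]
    for url in ["/index.php", "/index.php?p=12", "/wp-content/uploads/a.jpg",
                "/tags/seo/", "/2007/seo/seo-with-robotstxt.html"]:
        print(url, "blocked" if is_disallowed(url, rules) else "allowed")
```

One thing this makes visible: `Disallow: /tag` has no trailing slash, so under prefix matching it also blocks `/tags/...` or any other path that merely starts with `/tag`. Likewise, `/wp-*/` needs a slash after the `/wp-` part, so `/wp-login.php` is only caught by the `/*.php$` line.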
Anyone else have improvements or other robots.txt examples?