Monday, April 4, 2011

Robots.txt Vs Robots Meta Tag

Robots.txt



A robots.txt file restricts access to your site by search engine robots that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages. (All respectable robots will respect the directives in a robots.txt file, although some may interpret them differently. However, a robots.txt is not enforceable, and some spammers and other troublemakers may ignore it. For this reason, we recommend password protecting confidential information.)
You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).


While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.


In order to use a robots.txt file, you'll need to have access to the root of your domain (if you're not sure, check with your web hoster). If you don't have access to the root of a domain, you can restrict access using the robots meta tag.


Robots Meta Tag


When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. Other search engines, however, may interpret this directive differently. As a result, a link to the page can still appear in their search results.


Note that because we have to crawl your page in order to see the noindex meta tag, there's a small chance that Googlebot won't see and respect the noindex meta tag. If your page is still appearing in results, it's probably because we haven't crawled your site since you added the tag. (Also, if you've used your robots.txt file to block this page, we won't be able to see the tag either.)


If the content is currently in our index, we will remove it after the next time we crawl it. To expedite removal, use the URL removal request tool in Google Webmaster Tools.


source link : Google webmaster central

1 comment: