Block URLs with robots.txt
Learn about robots.txt files
A robots.txt file is a text file that stops web crawler software, such as Googlebot, from crawling certain pages of your site. The file is essentially a list of commands, such as Allow and Disallow, that tell web crawlers which URLs they can or cannot retrieve. So, if a URL is disallowed in your robots.txt file, that URL and its contents won't appear in Google Search results.
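To make the Allow and Disallow commands concrete, here is a minimal sketch of a robots.txt file (the directory and file names are hypothetical):

```
# Apply these rules to all crawlers.
User-agent: *
# Block everything under /private/ ...
Disallow: /private/
# ...except this one page, which may still be crawled.
Allow: /private/public-page.html
```

The file must be placed at the root of the site (for example, https://example.com/robots.txt) for crawlers to find it.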
You need a robots.txt file only if your site includes content that you don't want Google or other search engines to index. To let Google index your entire site, don't make a robots.txt file (not even an empty one). To test which URLs Google can and cannot access on your website, try using the robots.txt Tester.
Understand the limits of robots.txt
Before you build your robots.txt file, you should know the risks of using only this URL blocking method. At times, you might want to consider other mechanisms to ensure your URLs are not findable on the web.
- Ensure private information is safe
The commands in robots.txt files are not rules that every crawler must follow; it is better to think of them as guidelines. Googlebot and other respectable web crawlers obey the instructions in a robots.txt file, but other crawlers might not. Therefore, it is very important to understand the consequences of sharing the information that you block in this way. To keep private information secure, we recommend using other blocking methods, such as password-protecting private files on your server.
- Use the right syntax for each crawler
Although respectable web crawlers follow the directives in a robots.txt file, some crawlers might interpret those directives differently. You should know the proper syntax for addressing different web crawlers, as some might not understand certain instructions.
- Block crawlers from references to your URLs on other sites
While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google Search results. You can stop your URL from appearing in Search results completely by using your robots.txt file in combination with other URL blocking methods, such as password-protecting the files on your server, or inserting meta tags into your HTML.
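As one example of the meta tag approach mentioned above, a noindex robots meta tag can be placed in a page's HTML head to keep it out of search results even when other sites link to it. Note that the page must remain crawlable for this to work: if the URL is disallowed in robots.txt, crawlers never fetch the page and never see the tag.

```html
<!-- Placed inside the <head> of the page you want excluded.
     The page must NOT be disallowed in robots.txt, or crawlers
     will never fetch it and never see this directive. -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the equivalent signal can be sent in the HTTP response with an X-Robots-Tag header instead.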