Tweaking robots.txt
Solid information on robots.txt here.
Originally shared by John Mueller
I noticed there’s a bit of confusion on how to tweak a complex robots.txt file (aka longer than two lines :)). We have awesome documentation (of course :)), but let me pick out some of the parts that are commonly asked about:
– Disallowing crawling doesn’t block indexing of the URLs. This is pretty widely known, but worth repeating (first example after this list).
– More-specific user-agent sections replace less-specific ones. If you have a section with “user-agent: *” and one with “user-agent: googlebot”, then Googlebot will only follow the Googlebot-specific section (second example below).
– More-specific directives trump less-specific ones. We look at the length of the “path-part”. For example, “allow: /javascript.js” will trump “disallow: /java”, but “allow: *.js” won’t (third example below).
– The paths / URLs in the robots.txt file are case-sensitive (fourth example below).
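To illustrate the first point, here’s a minimal sketch (the /secret/ path is hypothetical):

    # Blocks crawling of everything under /secret/, but NOT indexing:
    # if other pages link to a URL in there, the bare URL can still
    # show up in search results.
    user-agent: *
    disallow: /secret/

Because Googlebot never fetches the blocked pages, it would also never see a noindex on them; to keep a page out of the index, it has to remain crawlable so that a noindex robots meta tag or X-Robots-Tag header can be read.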
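For the user-agent rules, a sketch with made-up paths:

    user-agent: *
    disallow: /search

    user-agent: googlebot
    disallow: /private

Googlebot follows only the “user-agent: googlebot” section, so it may crawl /search but not /private; all other crawlers follow the “user-agent: *” section. The sections don’t combine, so if Googlebot should also skip /search, that rule has to be repeated in its own section.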
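The path-length rule from the third point, spelled out with the same directives:

    user-agent: *
    disallow: /java
    allow: /javascript.js

For the URL /javascript.js, “allow: /javascript.js” (a 15-character path) is more specific than “disallow: /java” (5 characters), so crawling is allowed. Replace the allow with “allow: *.js” (4 characters) and the disallow wins, leaving /javascript.js blocked.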
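And case-sensitivity, again with a hypothetical directory:

    user-agent: *
    disallow: /Private/

This blocks /Private/report.html but not /private/report.html; if both spellings exist on the server, each needs its own rule.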
For non-trivial files, tweaking can be a bit tricky. I strongly recommend using the robots.txt Tester in Search Console – it pinpoints the line that blocks any specific URL, lets you test changes directly, and is the fastest way to let Google know about a changed robots.txt file on your site. Find out more about the tool at https://support.google.com/webmasters/answer/6062598
Here’s the full documentation: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt