Robots.txt, does it need preceding directory structure?

Milian

Do you need the entire preceding path in robots.txt for it to match?

For example:

I know that if I add Disallow: /fish to robots.txt, it will block:

/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything

But would it also block:

/en/fish
/en/fish.html
/en/fish/salmon.html
/en/fishheads
/en/fishheads/yummy.html
/en/fish.php?id=anything

(The examples are taken from the Robots.txt Specifications.) I'm hoping it won't match, because that would make writing this particular robots.txt much easier!

Basically, I want to block many URLs that contain BTS-, such as:

http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob

But I also have other pages in subfolders that contain BTS- and that I do not want blocked, such as:

http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy

Thanks for listening

Milian

Yes, this is what I thought, but I wanted some second opinions.

Although I wouldn't actually need a wildcard after BTS-, since leaving the pattern open-ended is the same as using a trailing wildcard:

/fish*: Equivalent to "/fish" (the trailing wildcard is ignored).

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Thanks for the link, I'll take a look.
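The trailing-wildcard equivalence can be checked with a minimal Python sketch. It assumes Google-style matching (a * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a literal prefix match); rule_matches is a hypothetical helper written for illustration, not a library API, and real crawlers may differ in edge cases.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a robots.txt path pattern match this path?

    Assumes Google-style semantics: '*' matches any run of characters,
    a trailing '$' anchors the end, and all other characters are literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start of the string, i.e. a prefix match.
    return re.match(regex, path) is not None


paths = ["/fish", "/fish.html", "/fish/salmon.html", "/fishheads",
         "/fishheads/yummy.html", "/fish.php?id=anything",
         "/Fish.asp", "/catfish", "/en/fish"]

for path in paths:
    # "/fish*" and "/fish" should give the same answer for every path.
    assert rule_matches("/fish*", path) == rule_matches("/fish", path)
    print(f"{path:24} blocked={rule_matches('/fish', path)}")
```

Under these assumptions /Fish.asp, /catfish and /en/fish come out unblocked, because matching is case-sensitive and starts at the beginning of the path.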

PinpointDesigns

You're right that **Disallow: /fish** in the robots file blocks all those initial links. Rules are matched from the start of the URL path, though, so it won't block the /en/ versions; to block those, you would need Disallow: /en/fish.
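The prefix rule can be checked locally with Python's standard urllib.robotparser, which does plain prefix matching on the URL path (note that it does not understand * wildcards). The robots.txt content below is a hypothetical one-rule file built from the question, and www.example.com is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the rule from the question.
rules = """\
User-agent: *
Disallow: /fish
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

paths = ["/fish", "/fish.html", "/fish/salmon.html", "/fishheads",
         "/fishheads/yummy.html", "/fish.php?id=anything",
         "/en/fish", "/en/fish.html", "/en/fish/salmon.html",
         "/en/fishheads", "/en/fishheads/yummy.html",
         "/en/fish.php?id=anything"]

for path in paths:
    url = "http://www.example.com" + path
    # can_fetch() returns False when a Disallow prefix matches the path.
    print(f"{path:28} allowed={parser.can_fetch('*', url)}")
```

Every root-level /fish URL should report allowed=False, while every /en/ URL should report allowed=True.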

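You could use a wildcard in the robots.txt file, along the lines of Disallow: /BTS-*. Since matching starts at the beginning of the path, that rule would block /BTS-something but leave /somesubfolder/BTS-thingy crawlable.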
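A minimal sketch of how Disallow: /BTS-* would apply to the URLs from the question, assuming Google-style wildcard semantics and assuming the rule is matched against the URL's path plus query string. Both rule_matches and is_blocked are hypothetical helpers for illustration, not part of any library or crawler.

```python
import re
from urllib.parse import urlsplit


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper assuming Google-style '*' and '$' semantics."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Literal pieces are escaped; each '*' becomes '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None  # prefix match from the start


def is_blocked(url: str, pattern: str = "/BTS-*") -> bool:
    """Match the rule against the URL's path plus any query string."""
    parts = urlsplit(url)
    path = parts.path + ("?" + parts.query if parts.query else "")
    return rule_matches(pattern, path)


urls = [
    "http://www.example.com/BTS-something",
    "http://www.example.com/BTS-somethingelse",
    "http://www.example.com/BTS-thingybob",
    "http://www.example.com/somesubfolder/BTS-thingy",
    "http://www.example.com/anothersubfolder/BTS-otherthingy",
]

for url in urls:
    print(f"{url:56} blocked={is_blocked(url)}")
```

Passing pattern="/BTS-" instead gives the same results, which matches the point that the trailing * adds nothing.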

This 'should' work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled published a post a while back about a JavaScript bookmarklet that lets you test whether a page is blocked by robots.txt, which can be found here: http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/

In addition, you could use the 'Blocked URLs' tool in Google Webmaster Tools (GWT) to check that the pages are successfully blocked once you've implemented the rules.

Hope this helps!
