Robots.txt, does it need preceding directory structure?

Milian

Does a robots.txt rule need the entire preceding path in order to match?

For example:

I know that if I add Disallow: /fish to robots.txt, it will block:

/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything

But would it also block:

en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything

(taken from Robots.txt Specifications)

I'm hoping it actually won't match, as that would make writing this particular robots.txt much easier!

Basically, I want to block many URLs that contain BTS-, such as:

http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob

But I have other pages in subfolders that also contain BTS-, which I do not want blocked, such as:

http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy

Thanks for listening

Milian

Yes, this is what I thought, but I wanted a second opinion.

I wouldn't actually need a wildcard after BTS-, though, as leaving the rule open-ended is the same as using a wildcard:

/fish* : Equivalent to "/fish". The trailing wildcard is ignored.

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Thanks for the link, I'll take a look.
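To illustrate why the trailing wildcard makes no difference, here is a rough Python sketch of Google-style pattern matching. The helper names `pattern_to_regex` and `matches` are made up for this example, and the translation (`*` becomes "any characters", a final `$` anchors the end, everything matches from the start of the path) is a simplified reading of the documented behaviour, not Google's actual implementation.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex (simplified)."""
    # A trailing "$" means the pattern must match up to the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Each "*" stands for any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """True if the pattern matches the path, starting at the first character."""
    # re.match only matches at the beginning of the string, like a prefix rule.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    paths = ["/fish", "/fish.html", "/fishheads/yummy.html", "/en/fish", "/catfish"]
    for path in paths:
        print(path, matches("/fish", path), matches("/fish*", path))
```

Under these assumptions, `/fish` and `/fish*` give the same answer for every path, and the same holds for `/BTS-` versus `/BTS-*`.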

PinpointDesigns

You're right that **Disallow: /fish** in the robots.txt file blocks all of those initial links. It won't match the URLs under /en/, though, because a rule is compared against the path from its very first character. If you wanted to block those as well, you would need to add Disallow: /en/fish
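A quick way to picture this is as a plain "starts with" check. The sketch below is only an illustration in Python: `is_blocked` is a hypothetical helper, and it ignores wildcards, Allow rules and user-agent groups.

```python
def is_blocked(path: str, disallow_prefix: str) -> bool:
    """Simplified model: a Disallow rule blocks any path that starts with it."""
    return path.startswith(disallow_prefix)


if __name__ == "__main__":
    rule = "/fish"
    for path in ("/fish", "/fish.html", "/fishheads/yummy.html",
                 "/en/fish", "/en/fish.html"):
        print(path, "blocked" if is_blocked(path, rule) else "not blocked")
```

Since "/en/fish" starts with "/en", not "/fish", the rule never applies to it. The same reasoning means a rule starting with /BTS- would leave /somesubfolder/BTS-thingy alone.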

You could use a wildcard in the robots.txt file to do something along the lines of Disallow: /BTS-*

This _should_ work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS bookmarklet that lets you test whether a page is blocked by robots.txt, which can be found here: http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/

In addition to this, you could also use the 'blocked URLs' tool in Google Webmaster Tools (GWT) to see if the pages are successfully blocked once you've implemented the code.
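If you'd rather sanity-check the rules locally first, a minimal sketch with Python's standard-library `urllib.robotparser` might look like the following. The `is_allowed` helper and the `ROBOTS_TXT` contents are assumptions made up for this example. Note that this parser does simple prefix matching and does not implement Google's `*` wildcard, so the rule is written as `Disallow: /BTS-` without the trailing `*`. It only approximates how a real crawler behaves, so it doesn't replace the tools above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example.com URLs in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /BTS-
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = (
        "http://www.example.com/BTS-something",
        "http://www.example.com/BTS-thingybob",
        "http://www.example.com/somesubfolder/BTS-thingy",
        "http://www.example.com/anothersubfolder/BTS-otherthingy",
    )
    for url in urls:
        print(url, "allowed" if is_allowed(url) else "blocked")
```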

Hope this helps!
