Robots.txt: does it need the preceding directory structure?
-
Do you need the entire preceding path in robots.txt for a rule to match?
For example, I know that if I add Disallow: /fish to robots.txt it will block:
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything
But would it also block these?
en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything
(Examples taken from Google's Robots.txt Specifications.) I'm hoping it actually won't match; that way, writing this particular robots.txt will be much easier!
Basically, I'm wanting to block many URLs that have BTS- in them, such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob
But I have other pages that I do not want blocked, in subfolders that also have BTS- in them, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy
Thanks for listening!
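To sum up the two cases I'm asking about in one place (a sketch; the URLs are the examples above):

```
User-agent: *
Disallow: /fish

# Definitely blocked (the path starts with /fish):
#   /fish, /fish.html, /fish/salmon.html, /fishheads, /fishheads/yummy.html
# Unsure about these, where the path starts with a folder instead:
#   en/fish, en/fish.html, en/fish/salmon.html, en/fishheads
```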
-
Yes, this is what I thought, but I wanted some second opinions.
Although I wouldn't actually need a wildcard after BTS-, as just leaving it open is the same as using a wildcard:
/fish* .......... Equivalent to "/fish" -- the trailing wildcard is ignored.
(From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt)
Thanks for the link, I'll take a look.
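As a small illustration of that equivalence (per the spec quoted above, both rules are prefix matches):

```
User-agent: *
Disallow: /fish    # blocks /fish, /fish.html, /fishheads/yummy.html, ...
Disallow: /fish*   # identical effect; the trailing wildcard is ignored
```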
-
You're right that **Disallow: /fish** in the robots.txt file blocks all of those initial links, but rules only match from the start of the URL path, so if you wanted to block everything inside the /en/ folder as well, you would need a separate Disallow: /en/fish rule.
You could use a wildcard in the robots.txt file to do something along the lines of Disallow: /BTS-*
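For reference, a minimal sketch of what that could look like (the BTS- URLs are the hypothetical examples from the question, and since rules match from the start of the path, the trailing * is optional):

```
# Sketch of a robots.txt based on the example URLs in the question.
User-agent: *
Disallow: /BTS-

# Rules are matched against the start of the URL path, so:
#   http://www.example.com/BTS-something                     -> blocked
#   http://www.example.com/BTS-thingybob                     -> blocked
#   http://www.example.com/somesubfolder/BTS-thingy          -> not blocked
#   http://www.example.com/anothersubfolder/BTS-otherthingy  -> not blocked
```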
This _'should'_ work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS bookmarklet that lets you test whether a robots.txt file is working correctly, which can be found here: http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/
In addition to this, you could also use the 'Blocked URLs' tool in Google Webmaster Tools (GWT) to see whether the pages are successfully blocked once you've implemented the rules.
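If you'd rather sanity-check the logic locally as well, here's a minimal sketch using Python's built-in urllib.robotparser (the example.com URLs are the hypothetical ones from the question; note this parser only implements plain prefix matching, with no * wildcard support, which is enough for this case):

```python
# A quick local check of robots.txt prefix matching using Python's
# standard-library parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Feed the rules in directly instead of fetching a live robots.txt.
rp.parse([
    "User-agent: *",
    "Disallow: /BTS-",
])

tests = [
    "http://www.example.com/BTS-something",             # path starts with /BTS- -> blocked
    "http://www.example.com/somesubfolder/BTS-thingy",  # /BTS- is not at the start -> allowed
]
for url in tests:
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")
```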
Hope this helps!