Robots.txt, does it need preceding directory structure?
-
Do you need the entire preceding path in robots.txt for it to match?
For example, I know that adding Disallow: /fish to robots.txt will block:
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything
But would it also block these?
/en/fish
/en/fish.html
/en/fish/salmon.html
/en/fishheads
/en/fishheads/yummy.html
/en/fish.php?id=anything
(The examples are taken from the Robots.txt Specifications.)
I'm hoping it won't match, because that would make writing this particular robots.txt much easier!
Basically, I want to block many URLs that contain BTS-, such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob
But I have other pages in subfolders that also contain BTS-, which I do not want blocked, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy
Thanks for listening
-
Yes, this is what I thought, but I wanted some second opinions.
I wouldn't actually need a wildcard after BTS-, though, as leaving the rule open-ended is the same as using a trailing wildcard:
/fish* is equivalent to "/fish"; the trailing wildcard is ignored.
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
Thanks for the link, I'll take a look.
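To make the matching behaviour discussed here concrete, below is a minimal Python sketch of Google-style path matching. It is not Google's implementation; `rule_matches` is a hypothetical helper, and it assumes the semantics described in the linked specification: rules match from the start of the path, `*` matches any run of characters, and a trailing `$` anchors the end.

```python
import re


def rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt path rule matches a URL path.

    Assumed semantics (Google-style): the rule is matched from the
    start of the path, '*' matches any run of characters, and a
    trailing '$' anchors the rule to the end of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match only anchors at the start, which gives prefix matching.
    return re.match(pattern, path) is not None


# Paths from the question: root-level ones match, /en/ ones do not.
for path in ("/fish.html", "/fishheads/yummy.html", "/en/fish", "/en/fish.html"):
    print(path, rule_matches("/fish", path), rule_matches("/fish*", path))
```

Under these assumptions `/fish` and `/fish*` give the same result for every path, and neither matches anything under `/en/`, because the rule has to match from the first character of the path.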
-
You're right that **Disallow: /fish** in the robots.txt file blocks all of those initial links. It won't match the /en/ URLs, though, because rules are matched from the start of the path; to block those as well you would need Disallow: /en/fish
You could use a wildcard in the robots.txt file, along the lines of Disallow: /BTS-*
This should work, but it's always worth checking with a tool to make sure it's implemented correctly. Distilled published a post a while back about a JS bookmarklet that tests whether a page is blocked by robots.txt, which can be found here - http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/
In addition, you could use the 'Blocked URLs' tool in Google Webmaster Tools (GWT) to confirm that the pages are blocked once you've implemented the rule.
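If you want a quick local check before uploading the file, Python's standard `urllib.robotparser` can do a rough test. This is only a sketch: `is_blocked` is a hypothetical helper, the rules are an assumed minimal robots.txt for the BTS- case, and as far as I know the standard-library parser does plain prefix matching and does not interpret `*` inside paths the way Google does, so the rule is written without the trailing wildcard.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt for the BTS- case discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /BTS-
"""


def is_blocked(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules disallow the URL for any crawler."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", url)


for url in (
    "http://www.example.com/BTS-something",
    "http://www.example.com/somesubfolder/BTS-thingy",
):
    print(url, "blocked" if is_blocked(url) else "allowed")
```

Treat this as a sanity check only; the tools mentioned above are still the better guide to how Google itself reads the file.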
Hope this helps!