Block all but one URL in a directory using robots.txt?
-
Is it possible to block all but one URL with robots.txt?
For example, take domain.com/subfolder/example.html: if we block the /subfolder/ directory, we want every URL under it to be blocked except the exact-match URL domain.com/subfolder.
-
Robots.txt directives have an order of precedence, but it is not simply the order they appear in the file: Google and Bing follow the most specific matching rule, meaning the one with the longest path, wherever it sits in the file. (Some older parsers instead act on the first rule that matches, so putting the Allow line before the broader Disallow is the safest ordering for those.)
So the simple way to do this is to disallow everything, then allow the one path you want. It would look something like this:
User-agent: *
Disallow: /
Allow: /test
For the question's example, that would be Disallow: /subfolder/ plus Allow: /subfolder/example.html.
Caveat: This is NOT the way robots.txt is supposed to work. By design, robots.txt is meant for disallowing, and technically you shouldn't ever have to use it for allowing; Allow is a later extension that not every crawler supports. That said, this should work pretty well with the major search engines.
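To make the precedence rule concrete, here is a minimal Python sketch of Google-style longest-match evaluation; the rule paths are illustrative and wildcard handling is left out:

def is_allowed(rules, path):
    # rules: (directive, path_prefix) pairs, e.g. ("Disallow", "/subfolder/").
    # The matching rule with the longest path wins; no match means allowed.
    best = None
    for directive, prefix in rules:
        if path.startswith(prefix):
            if best is None or len(prefix) > len(best[1]):
                best = (directive, prefix)
    return best is None or best[0] == "Allow"

rules = [("Disallow", "/subfolder/"), ("Allow", "/subfolder/example.html")]
print(is_allowed(rules, "/subfolder/example.html"))  # True: Allow has the longer path
print(is_allowed(rules, "/subfolder/other.html"))    # False: only the Disallow matches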
You can check your work in Google Webmaster Tools, which has a robots.txt checker under Site Configuration > Crawler Access. Just type in your proposed robots.txt, then a test URL, and you should be good to go.
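You can also sanity-check a proposed file locally with Python's built-in parser. One hedged note: urllib.robotparser applies the original first-match semantics rather than Google's longest-match rule, which is exactly why the Allow line goes above the broad Disallow here:

from urllib import robotparser

# Proposed robots.txt under test; Allow is listed first because this
# parser acts on the first matching rule, unlike Google.
robots_txt = """\
User-agent: *
Allow: /subfolder/example.html
Disallow: /subfolder/
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "http://domain.com/subfolder/example.html"))  # True
print(parser.can_fetch("*", "http://domain.com/subfolder/other.html"))    # False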
Hope this helps!
-
As far as I know, this isn't directly possible. One fast way around it is to run a crawler over your URLs, so you can quickly copy every URL in the folder, paste them into robots.txt as Disallow lines, and leave out the one that you want in the index.
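That copy-and-paste step is easy to script. A minimal sketch, assuming you already have the crawled paths in a list (the paths below are illustrative):

crawled_paths = [
    "/subfolder/example.html",
    "/subfolder/page-a.html",
    "/subfolder/page-b.html",
]
keep = "/subfolder/example.html"  # the one URL to leave in the index

lines = ["User-agent: *"]
lines += ["Disallow: " + path for path in crawled_paths if path != keep]
print("\n".join(lines))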
Related Questions
-
URL structure for SEO
Hi Mozzers, I have a site which is a combination of product pages, and news and advice pages that relate to the products. How would you approach the URL structure for this, following SEO best practice?
Approach 1
Product pages:
www.website.com/product-category/product-page
News and advice pages:
www.website.com/product-category/product-page/news-and-advice-story-1
www.website.com/product-category/product-page/news-and-advice-story-2
etc.
Approach 2
Product pages:
www.website.com/product-category/product-page
News and advice pages:
www.website.com/news/product-category/news-and advice-story-1 (with internal linking to relevant product page)
www.website.com/news/product-category/news-and advice-story-2 (with internal linking to relevant product page)
etc.
Or would a different approach be better?
-
Can multiple geotargeting hreflang tags be set in one URL? International SEO question
Hi All, Thank you for this great post! I have a question please. If I target www.onedirect.co.nl/en/ in English for Holland, Belgium and Luxembourg, are the tags below correct?
English for Holland, Belgium and Luxembourg:
<link rel="alternate" href="http://www.example.co.nl/en/" hreflang="en-nl" />
<link rel="alternate" href="http://www.example.co.nl/en/" hreflang="en-be" />
<link rel="alternate" href="http://www.example.co.nl/en/" hreflang="en-lu" />
And targeting Holland and Belgium in Dutch, for the page www.onedirect.co.nl we can include these tags:
<link rel="alternate" href="http://www.example.co.nl" hreflang="nl-nl" />
<link rel="alternate" href="http://www.example.co.nl" hreflang="nl-be" />
Thanks a lot for your help!
-
SSL and robots.txt question - confused by Google guidelines
I noticed "Don’t block your HTTPS site from crawling using robots.txt" here: http://googlewebmastercentral.blogspot.co.uk/2014/08/https-as-ranking-signal.html Does this mean you can't use robots.txt anywhere on the site - even parts of a site you want to noindex, for example?
-
Bingpreview/1.0b User Agent Adding Trailing Slash to All URLs
The Bingpreview crawler, which I think exists in order to take snapshots of mobile-friendly pages, crawled my pages last night for the first time. However, it is adding a trailing slash to the end of each of my dynamic pages. The result is that my program serves the wrong page, since it is not expecting a trailing slash at the end of the URLs. It was 160 pages this time, but I have thousands of pages it could do this to. I could try doing a mod rewrite (see the sketch below), but that seems like it should be unnecessary. All the other crawlers are crawling the proper URLs, and none of my hyperlinks have the slash on the end. I have written to Bing to tell them of the problem. Is anyone else having this issue? Any other suggestions for what to do? The user agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 BingPreview/1.0b
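For reference, the mod rewrite idea mentioned in the question would look roughly like this on Apache; this is only a sketch of the concept, and an .aspx site would actually need the equivalent rule in the IIS URL Rewrite module instead:

RewriteEngine On
# Unless the request is a real directory, 301-redirect any URL that
# ends in a slash to the same URL without it.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]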
-
Robot.txt help
Hi, We have a blog that is killing our SEO. We need to disallow the following:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But we want to allow everything below /Blog/Post. The disallow list seems to keep growing as we find issues, so rather than adding every problem area to our robots.txt, is there a way to easily just say "Allow /Blog/Post" and ignore the rest? How do we do that in robots.txt? Thanks
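For what it's worth, the longest-match approach from the answer at the top of this page applies here too. A hedged sketch, assuming the posts live under /Blog/Post and that the crawlers you care about honor rule precedence the way Google and Bing do:

User-agent: *
# Note that robots.txt paths are case-sensitive: /Blog/Post and
# /Blog/post are different paths.
Allow: /Blog/Post
Disallow: /Blog/
-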
How to Disallow Tag Pages With Robot.txt
Hi, I have a site I'm dealing with that has tag pages, for instance http://www.domain.com/news/?tag=choice. How can I exclude these tag pages (about 20+ of them) from being crawled and indexed by the search engines with robots.txt? Also, they're sometimes created dynamically, so I want something which automatically excludes tag pages from being crawled and indexed. Any suggestions? Cheers, Mark
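A hedged sketch of the kind of rule the question is after; Google and Bing match robots.txt rules as prefixes and also support the * wildcard, and the path here assumes the tag pages always live under /news/:

User-agent: *
# Prefix matching means every tag value is covered automatically.
Disallow: /news/?tag=
# Or, to catch a tag parameter anywhere on the site:
# Disallow: /*?tag=
-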
Using the right Schema.org - & is there a penalty in using the wrong one?
Hi, We have a set of reviewed products (in this case restaurants) that total an average rating of 4.0/5.0 from 800-odd reviews. We know to use schema.org/Restaurant for the individual restaurants we promote, but what about a list page, say restaurants in Boston, for example? For the page containing all of the Boston restaurants, should we use schema.org/Restaurant (but it's not one physical restaurant) or schema.org/Product plus an aggregate review score? What do you do for your product listing pages? If we get it wrong, is there a penalty? Or is this simply up to us?
-
Multiple URLs for the same page
I am working with a client and recently discovered that they have several URLs that go to the same page.
http://www.maps.com/FunFacts.aspx
http://www.maps.com/funfacts.aspx
http://www.maps.com/FunFacts.aspx?nav=FF
http://www.maps.com/FunFacts.aspx?nav=FS
http://www.maps.com/funfacts.aspx?nav=FF
http://www.maps.com/funfacts.aspx?nav=ff
http://www.maps.com/FunFacts.aspx?nav=MS
http://www.maps.com/funfacts.aspx?nav=
http://www.maps.com/FunFacts.aspx?nav=FF#
http://www.maps.com/FunFacts
http://www.maps.com/funfacts.aspx?.nav=FF
I am afraid this is happening all over the site. So, my question is: is this hurting the SEO, and how? If so, what is the best way to go about fixing this problem? Thanks for your help!
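As a hedged aside (not from the original thread): the standard remedy for duplicates like these is to pick one preferred URL, 301-redirect the stray variants where practical, and add a rel=canonical tag so search engines consolidate signals from the parameter and capitalization variants onto one version, e.g.:

<link rel="canonical" href="http://www.maps.com/FunFacts.aspx" />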