Robots.txt: how to exclude sub-directories correctly?
-
Hello here,
I am trying to figure out the correct way to tell SEs to crawl this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But since I have thousands of sub-directories in almost infinite combinations, I can't list definitions like the following in any manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations.
So, is the following a correct, shorter way to define what I want above:
allow: /directory/$
disallow: /directory/*
Would the above work?
Any thoughts are very welcome! Thank you in advance.
Best,
Fab.
-
I mentioned both. You add a meta robots to noindex and remove from the sitemap.
-
But Google is still free to index a link/page even if it is not included in the XML sitemap.
-
Install the Yoast WordPress SEO plugin and use that to restrict what is indexed and what is allowed in a sitemap.
-
I am using WordPress with the Enfold theme (ThemeForest).
I want some files to be accessible to Google, but they should not be indexed.
Here is an example: http://prntscr.com/h8918o
I have currently blocked some JS directories/files using robots.txt (check screenshot).
But because of this, I am not able to pass the Mobile-Friendly Test on Google: http://prntscr.com/h8925z (check screenshot)
Is it possible to allow access but use a tag like noindex in the robots.txt file? Or is there any other way out?
-
Yes, everything looks good, Webmaster Tools gave me the expected results with the following directives:
allow: /directory/$
disallow: /directory/*
Which allows this URL:
http://www.mysite.com/directory/
But doesn't allow the following one:
http://www.mysite.com/directory/sub-directory2/...
This page also gives an update similar to mine:
https://support.google.com/webmasters/answer/156449?hl=en
I think I am good! Thanks
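For anyone who wants to sanity-check rules like these offline, here is a minimal Python sketch of the matching behaviour as I understand it from Google's robots.txt documentation: `*` matches any run of characters, a trailing `$` anchors the end of the URL path, the longest matching pattern wins, and a tie goes to allow. The names `pattern_to_regex`, `is_allowed` and `RULES` are made up for this illustration. It is a simplified model, not Google's actual parser, and other search engines may resolve conflicts differently.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex (simplified model)."""
    anchored = pattern.endswith("$")  # a trailing "$" anchors the end of the path
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    Assumption: the longest matching pattern wins, and a tie goes to allow.
    A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            length = len(pattern)
            if length > best_len or (length == best_len and is_allow):
                best_len, best_allow = length, is_allow
    return best_allow


# The two directives discussed in this thread.
RULES = [("allow", "/directory/$"), ("disallow", "/directory/*")]

print(is_allowed("/directory/", RULES))
print(is_allowed("/directory/sub-directory/", RULES))
print(is_allowed("/directory/sub-directory2/sub-directory/", RULES))
# Same check with only the disallow line, for comparison.
print(is_allowed("/directory/", [("disallow", "/directory/*")]))
```

Under these assumptions, the last check suggests that `disallow: /directory/*` on its own would also block the /directory/ index, which is why the `allow: /directory/$` line is needed. The robots.txt tester in Webmaster Tools remains the authoritative check.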
-
Thank you Michael, it is my understanding then that my idea of doing this:
allow: /directory/$
disallow: /directory/*
Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.
In the meantime, if anyone else has more ideas about all this and can confirm this for me, that would be great!
Thank you again.
-
I've always stuck to Disallow and followed -
"This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"
http://www.robotstxt.org/robotstxt.html
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory:
/* | equivalent to / | equivalent to / | Equivalent to "/" -- the trailing wildcard is ignored.
I think this post will be very useful for you - http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
-
Thank you Michael,
Google and other SEs actually recognize the "allow:" command:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The fact is: if I don't specify that, how can I be sure that the following single command:
disallow: /directory/*
Doesn't prevent SEs from spidering the /directory/ index, as I'd like them to?
-
As long as you don't have directories somewhere in /* that you want indexed, then I think that will work. There is no allow, so you don't need the first line, just:
disallow: /directory/*
You can test it out here: https://support.google.com/webmasters/answer/156449?rd=1