Robots.txt: how to exclude sub-directories correctly?
-
Hello there,
I am trying to figure out the correct way to tell SEs to crawl this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But since I have thousands of sub-directories with almost infinite combinations, I can't list definitions like the following in a manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations.
So, is the following a correct, better, and shorter way to define what I want above:
allow: /directory/$
disallow: /directory/*
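For completeness, I understand these two directives would need to sit under a user-agent group to be valid, so the whole file would simply be:
User-agent: *
Allow: /directory/$
Disallow: /directory/*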
Would the above work?
Any thoughts are very welcome! Thank you in advance.
Best,
Fab.
-
I mentioned both. You add a meta robots noindex tag and remove the page from the sitemap.
-
But Google is still free to index a link/page even if it is not included in the XML sitemap.
-
Install the Yoast WordPress SEO plugin and use it to restrict what is indexed and what is allowed in the sitemap.
-
I am using WordPress with the Enfold theme (ThemeForest).
I want some files to be accessible to Google, but they should not be indexed.
Here is an example: http://prntscr.com/h8918o
I have currently blocked some JS directories/files using robots.txt (check the screenshot).
But because of this I am not able to pass Google's Mobile-Friendly Test: http://prntscr.com/h8925z (check the screenshot).
Is it possible to allow access but use something like a noindex tag in the robots.txt file? Or is there any other way out?
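From what I have read so far, robots.txt itself has no supported noindex directive; the usual route for non-HTML files such as JS seems to be serving an X-Robots-Tag: noindex HTTP header, which lets Google fetch the file without indexing it. Here is a rough Python sketch I put together to check what a crawler actually gets for one URL (stdlib only; the URL is just a placeholder):

import re
import urllib.request

# Placeholder URL -- substitute one of the blocked JS files.
url = "http://www.example.com/wp-content/themes/enfold/js/avia.js"

with urllib.request.urlopen(url) as resp:
    # Non-HTML files can only be noindexed via this response header.
    print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag"))
    body = resp.read(200_000).decode("utf-8", errors="replace")

# HTML pages may also carry a meta robots tag in the markup.
meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', body, re.I)
print("meta robots:", meta.group(0) if meta else None)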
-
Yes, everything looks good; Webmaster Tools gave me the expected results with the following directives:
allow: /directory/$
disallow: /directory/*
Which allows this URL:
http://www.mysite.com/directory/
But doesn't allow the following one:
http://www.mysite.com/directory/sub-directory2/...
This page also covers a setup similar to mine:
https://support.google.com/webmasters/answer/156449?hl=en
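For anyone who wants to double-check this sort of logic offline, here is a rough Python sketch of the longest-match behaviour Google documents (my own approximation of the spec, not Google's code). On a tie in pattern length, the allow rule wins, which is why /directory/ itself stays crawlable:

import re

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_allowed(rules, path):
    # The longest matching pattern wins; on a tie, allow beats disallow.
    best = None
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("allow", "/directory/$"), ("disallow", "/directory/*")]
for path in ("/directory/", "/directory/sub-directory/",
             "/directory/sub-directory2/sub-directory/"):
    print(path, "->", "crawl" if is_allowed(rules, path) else "blocked")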
I think I am good! Thanks
-
Thank you Michael. It is my understanding, then, that my idea of doing this:
allow: /directory/$
disallow: /directory/*
Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.
In the meantime, if anyone else has more ideas about all this and can confirm it, that would be great!
Thank you again.
-
I've always stuck to Disallow and followed -
"This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"
http://www.robotstxt.org/robotstxt.html
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory; its pattern-matching table notes that /* is equivalent to / ("the trailing wildcard is ignored").
I think this post will be very useful for you - http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
-
Thank you Michael,
Google and other SEs actually recognize the "allow:" command:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The fact is: if I don't specify that, how can I be sure that the following single directive:
disallow: /directory/*
doesn't prevent SEs from spidering the /directory/ index as I'd like?
-
As long as you don't have directories somewhere in /* that you want indexed, then I think that will work. There is no allow, so you don't need the first line, just:
disallow: /directory/*
You can test it out here - https://support.google.com/webmasters/answer/156449?rd=1
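If you would rather sanity-check the rules locally as well, a parser library that implements Google's wildcard handling works too. A minimal sketch, assuming the third-party Protego package (pip install protego) and its parse/can_fetch API:

from protego import Protego

# The exact rules discussed above, wrapped in a user-agent group.
robots = """
User-agent: *
Allow: /directory/$
Disallow: /directory/*
"""

rp = Protego.parse(robots)
for url in ("http://www.mysite.com/directory/",
            "http://www.mysite.com/directory/sub-directory/"):
    print(url, "->", rp.can_fetch(url, "Googlebot"))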
Related Questions
-
Did We Implement Structured Data Correctly?
Our designer/developer recently implemented structured data on our pages. I'm trying to become more educated on how it works, since I'm the SEO marketing specialist on the team and the one who writes and publishes the majority of our content. I'm aware it's extremely important and needs to be done; I just don't know how to do it yet. The developer was on our team for over a year; we recently let him go. Now I'm going through all the pages to make sure it's done correctly. I'm using the Structured Data Testing Tool to look at the pages and have been playing with the Structured Data Markup Helper. I would REALLY appreciate it if one of my fellow Moz fans & family could help me determine if it's done correctly. We do not currently have any schema plugins installed that I know of, so I'm not sure how he implemented the schema code. I would like to know what I need to do moving forward for the additional content we publish, as well as how to correctly implement schema if it isn't already. When I manually look at one of our FAQ pages I see multiple schema data formats detected... I'm not sure if we're supposed to have multiple or just one ----> https://www.screencast.com/t/TjHphL7jsI I also noticed in the Question schema data for that same page that the accepted answer is empty. I would imagine that should have the short version of the answer to the question in it? ---> https://www.screencast.com/t/e6ppXkhXd7QS Here's a screenshot of our structured data info from Google Search Console ---> https://www.screencast.com/t/KHj4BGgdrZ4m HELP please!
Our website consists of 25-30 "product" pages: https://www.medicarefaq.com/medigap/ https://www.medicarefaq.com/medicare-supplement/ https://www.medicarefaq.com/medigap/plan-f/ https://www.medicarefaq.com/medicare-supplement/plan-f/
We currently have about 75 FAQ pages and are adding 4-6 per month; this is what brings in most of our traffic: https://www.medicarefaq.com/faqs/2018-top-medicare-supplement-insurance-plans/ https://www.medicarefaq.com/faqs/2018-medicare-high-deductible-plan-f-changes https://www.medicarefaq.com/faqs/medicare-guaranteed-issue-rights
We have 100 state-specific pages (two for each state): https://www.medicarefaq.com/medicare-supplement/florida/ https://www.medicarefaq.com/medigap/florida/ https://www.medicarefaq.com/medicare-supplement/California/ https://www.medicarefaq.com/medigap/California/
We have 20ish carrier-specific pages: https://www.medicarefaq.com/medicare-supplement/humana/ https://www.medicarefaq.com/medicare-supplement/mutual-of-omaha/
Then we have about 30 blog pages so far and are publishing new blog posts weekly: https://www.medicarefaq.com/blog/average-age-retirement-rising/ https://www.medicarefaq.com/blog/social-security-benefit-increase-announced-2018 https://www.medicarefaq.com/blog/new-california-bill-force-drugmakers-explain-price-hikes
Intermediate & Advanced SEO | LindsayE
-
What to try when Google excludes your URL only from high-traffic search terms and results?
We have a high-authority blog post (high PA) that used to rank for several high-traffic terms. Right now the post continues to rank high for variations of the high-traffic terms (e.g. keyword + " free", keyword + " discussion"), but the URL has been completely excluded from the money terms, with alternative URLs of the domain ranking in positions 50+. There is no manual penalty in place or a DMCA exclusion. What are some of the things people would try here? Some of the things I can think of:
- Remove keyword terms in the article
- Change the URL and do a 301 redirect
- Duplicate the post under a new URL, 302 redirect from the old blog post, and repoint links as much as you have control over
- Refresh content, including timestamps
- Remove potentially bad-neighborhood links
- etc.
Has anyone seen the behavior above for their articles? Are there any recommendations? /PP
Intermediate & Advanced SEO | ppseo80
-
SEO implications of moving from a sub-folder to a root domain
I am considering a restructure of my site, and was hoping for some input on the SEO implications, which I am having some issues getting clarity on. (I will be using sample domains/URLs for language reasons; it is not an English site.) I am thinking about moving a site (all content) from example.com/parenting -> parenting.com. This is to have a site fully devoted to this theme, and to more easily monitor and improve SEO performance on this content alone. Today all stats on external links, DA, etc. relate to the root domain, and not just this sub-section. Plus it would be a better brand experience for the content and site.
Other info/issues: the domain parenting.com (used as an example) is currently redirected to example.com/parenting, so I would have to reverse that redirect, and would also redirect all articles to the new site. The current domain example.com has a high DA (67), but the new domain parenting.com has a much lower DA (24).
Questions: Would the parenting.com domain improve its DA when it is no longer redirected and the sub-folder on the high-DA domain is redirected to it instead? Would it severely hurt SEO traffic to make this change, and if so, is there a strategy to make the move with as little loss in traffic as possible? How much value is there in having a stand-alone domain, which is also one of the most important keywords for this theme?
My doubt comes mostly from moving from a domain with high DA to a domain with much lower DA; I am not sure how removing the redirect would change that, or whether placing a new redirect from the sub-folder on the current site would help improve it. Would some DA flow over with a 301 redirect? Thanks for any advice or hints to other documentation that might be of interest for this scenario 🙂
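If the move does happen, one practical sanity check is that each old sub-folder URL answers with a single 301 straight to the new domain (a chain of hops or a 302 would be worse for passing value). A rough stdlib Python sketch, using the sample domains from the question (the article path is a placeholder):

import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None  # surface the redirect as an HTTPError instead of following it

opener = urllib.request.build_opener(NoRedirect)
for old_url in ("https://example.com/parenting/",
                "https://example.com/parenting/some-article/"):
    try:
        resp = opener.open(old_url)
        print(old_url, "->", resp.status, "(no redirect)")
    except urllib.error.HTTPError as err:
        print(old_url, "->", err.code, err.headers.get("Location"))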
Intermediate & Advanced SEO | Magne_Vidnes
-
Robots.txt issue for international websites
In Google.co.uk, our US-based site (abcd.com) is showing: "A description for this result is not available because of this site's robots.txt – learn more." But the UK website (uk.abcd.com) is working properly. We would like the .com result to disappear entirely, if possible. How do we fix it? Thanks in advance.
Intermediate & Advanced SEO | JinnatUlHasan
-
E-commerce worldwide: sub-domains or folders
Hi guys, we currently only sell to the UK, so it's pretty easy to manage our SEO etc. However, we are building a new site on Trespass.com and will be using Magento Enterprise. We will be serving the UK, the US, and the rest of the world. Does anyone here have experience with this? Is it best to have sub-domains, i.e. UK.trespass.com, US.trespass.com? Or folders: Trespass.com/uk, Trespass.com/de, Trespass.com/US? Thanks guys
Intermediate & Advanced SEO | Trespass
-
How to Disallow Tag Pages With Robots.txt
Hi, I have a site I'm dealing with that has tag pages, for instance: http://www.domain.com/news/?tag=choice How can I exclude these tag pages (about 20+) from being crawled and indexed by the search engines with robots.txt? Also, they're sometimes created dynamically, so I want something which automatically excludes tag pages from being crawled and indexed. Any suggestions? Cheers, Mark
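One wildcard rule should cover the dynamically created tags too. A sketch, assuming Google-style wildcard support and that every tag page shares the ?tag= parameter:
User-agent: *
Disallow: /*?tag=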
Intermediate & Advanced SEO | monster99
-
Multiple domain-level redirects to a unique sub-folder on one domain...
Hi, I have a restaurant menu directory listing website (for example www.menus.com). Restaurants can have their menu listed on this site along with other details such as opening hours, photos, etc. An example of a restaurant URL might be www.menus.com/london/bobs-pizza. A feature I would like to offer is the ability for Bob's Pizza to use the menus.com listing as his own website (let's assume he has no website currently). I would like to purchase www.bobspizza.com and 301 redirect it to www.menus.com/london/bobs-pizza. Why? So Bob can then list bobspizza.com on his advertising material (business cards etc.) rather than www.menus.com/london/bobs-pizza. I was considering using a 301 redirect for this, though I have been told that too many domain-level redirects to one single domain can be flagged as spam by Google. Is there any other way to achieve this outcome without being penalised? Rel canonical URL, URL masking? Other things to note: it is fine if www.bobspizza.com is NOT listed in search results. I would ideally like any link juice pointing to www.bobspizza.com to pass onto www.menus.com, though this is a nice-to-have; if it comes at the cost of being penalised I can live without the link juice from this. Thanks
Intermediate & Advanced SEO | blackrails
-
Robots.txt is blocking WordPress pages from Googlebot?
I have a robots.txt file on my server which I did not develop; it was done by the web designer at the company before me. There is also a WordPress plugin that generates a robots.txt file. How do I unblock all the WordPress pages from Googlebot?
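Note that WordPress (and any plugin) only serves its generated rules when no physical robots.txt file exists on the server, so the quickest fix is usually to edit or remove the physical file directly. As a sketch, the common WordPress default that blocks only the admin area looks like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php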
Intermediate & Advanced SEO | ENSO