Robots.txt: how to exclude sub-directories correctly?
-
Hello here,
I am trying to figure out the correct way to tell SEs to crawl this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But because I have thousands of sub-directories in almost infinite combinations, I can't list the following rules in a manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations.
So, is the following a correct, better, and shorter way to define what I want above:
allow: /directory/$
disallow: /directory/*
Would the above work?
Any thoughts are very welcome! Thank you in advance.
Best,
Fab.
-
I mentioned both. You add a meta robots noindex tag and remove the page from the sitemap.
-
But Google is still free to index a link/page even if it is not included in the XML sitemap.
-
Install the Yoast WordPress SEO plugin and use that to restrict what is indexed and what is allowed in a sitemap.
-
I am using WordPress with the Enfold theme (ThemeForest).
I want some files to be accessible to Google, but they should not be indexed.
Here is an example: http://prntscr.com/h8918o
I have currently blocked some JS directories/files using robots.txt (check screenshot).
But because of this I am not able to pass the Mobile-Friendly Test on Google: http://prntscr.com/h8925z (check screenshot)
Is it possible to allow access but use a tag like noindex in the robots.txt file? Or is there any other way out?
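One option for non-HTML files such as JS is to stop blocking them in robots.txt and send an `X-Robots-Tag: noindex` response header instead, since a meta robots tag cannot be placed inside a script file. The sketch below is only an illustration of that idea as WSGI middleware in Python; a WordPress site would normally set the header in the web server configuration, and the names `NOINDEX_SUFFIXES`, `robots_headers`, and `add_noindex` are hypothetical, not part of any plugin or theme.

```python
# Hypothetical sketch: add "X-Robots-Tag: noindex" to responses for script
# and style files so crawlers can fetch them without indexing them.

# Assumption: these are the file types to keep out of the index.
NOINDEX_SUFFIXES = (".js", ".css")


def robots_headers(path):
    """Return the extra response headers for a given URL path."""
    if path.lower().endswith(NOINDEX_SUFFIXES):
        return [("X-Robots-Tag", "noindex")]
    return []


def add_noindex(app):
    """Wrap a WSGI app so matching responses carry the noindex header."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            extra = robots_headers(environ.get("PATH_INFO", ""))
            return start_response(status, list(headers) + extra, exc_info)
        return app(environ, patched_start_response)
    return wrapped
```

With this approach the files stay crawlable, which is what the Mobile-Friendly Test needs, while the header asks search engines not to index them.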
-
Yes, everything looks good, Webmaster Tools gave me the expected results with the following directives:
allow: /directory/$
disallow: /directory/*
Which allows this URL:
http://www.mysite.com/directory/
But doesn't allow the following one:
http://www.mysite.com/directory/sub-directory2/...
This page also gives an update similar to mine:
https://support.google.com/webmasters/answer/156449?hl=en
I think I am good! Thanks
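For anyone who wants to check this kind of rule pair without Webmaster Tools, here is a minimal Python sketch of Google-style matching. It assumes `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and a tie goes to allow. The helper names `pattern_to_regex` and `is_allowed` are made up for this illustration, and other crawlers may resolve conflicts differently.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Decide whether `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs. The longest matching
    pattern wins; on a tie, allow beats disallow (assumed precedence).
    A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


RULES = [("allow", "/directory/$"), ("disallow", "/directory/*")]
DISALLOW_ONLY = [("disallow", "/directory/*")]

print(is_allowed(RULES, "/directory/"))                 # True
print(is_allowed(RULES, "/directory/sub-directory2/"))  # False
print(is_allowed(DISALLOW_ONLY, "/directory/"))         # False
```

The last line shows why the allow rule matters here: with `disallow: /directory/*` alone, the `*` can match nothing at all, so `/directory/` itself is blocked too.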
-
Thank you Michael, it is my understanding then that my idea of doing this:
allow: /directory/$
disallow: /directory/*
Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.
In the meantime, if anyone else has more ideas about all this and can confirm it, that would be great!
Thank you again.
-
I've always stuck to Disallow and followed -
"This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"
http://www.robotstxt.org/robotstxt.html
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory:
/* | equivalent to / | equivalent to / | Equivalent to "/" -- the trailing wildcard is ignored.
I think this post will be very useful for you - http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
-
Thank you Michael,
Google and other SEs actually recognize the "allow:" command:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The fact is: if I don't specify that, how can I be sure that the following single command:
disallow: /directory/*
doesn't prevent SEs from spidering the /directory/ index, as I'd like them to?
-
As long as you don't have directories somewhere in /* that you want indexed, then I think that will work. There is no allow, so you don't need the first line, just
disallow: /directory/*
You can test it out here - https://support.google.com/webmasters/answer/156449?rd=1