Robots.txt: how to exclude sub-directories correctly?
-
Hello,
I am trying to figure out the correct way to tell search engines (SEs) to crawl this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But since I have thousands of sub-directories in almost infinite combinations, I can't list rules like the following in a manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up with thousands of rules to disallow every possible sub-directory combination.
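Robots.txt rules match by path prefix, so a rule for a first-level sub-directory already covers every level below it; lines like "disallow: /directory/sub-directory/sub-directory/" are redundant. That still leaves one rule per first-level name, which is the real problem here. A minimal sketch using Python's standard-library urllib.robotparser (plain prefix matching, no wildcard support) with only the first two rules above; the host and paths are just the example URLs from this post.

```python
from urllib.robotparser import RobotFileParser

# Only the two first-level rules from the example; nothing for deeper levels.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/sub-directory/
Disallow: /directory/sub-directory2/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    "http://www.mysite.com/directory/",                              # no rule is a prefix -> allowed
    "http://www.mysite.com/directory/sub-directory/sub-directory/",  # prefix of rule 1 -> blocked
    "http://www.mysite.com/directory/sub-directory2/subdirectory/",  # prefix of rule 2 -> blocked
]
for url in checks:
    print(url, parser.can_fetch("*", url))
```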
So, is the following a correct, better and shorter way to express what I want above?
allow: /directory/$
disallow: /directory/*
Would the above work?
Any thoughts are very welcome! Thank you in advance.
Best,
Fab.
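Google documents a wildcard syntax for robots.txt: * matches any run of characters, a trailing $ anchors the end of the path, the most specific (longest) matching rule wins, and on a tie the less restrictive Allow wins. Under those rules the two proposed lines should allow /directory/ and block everything below it. The sketch below is a hypothetical Python re-implementation of those documented rules, not Google's own parser; rule_to_regex and is_allowed are illustrative names.

```python
import re


def rule_to_regex(path: str) -> re.Pattern:
    # '*' matches any run of characters (possibly none); a trailing '$' anchors the end.
    anchored = path.endswith("$")
    core = path[:-1] if anchored else path
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    # Keep the most specific match: longest rule path first, then Allow over Disallow on a tie.
    best = None
    for directive, path in rules:
        if rule_to_regex(path).match(url_path):  # match() anchors at the start of the path
            candidate = (len(path), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule at all means the URL may be crawled.
    return True if best is None else best[1]


# The proposed rules, written as (directive, path) pairs.
rules = [("allow", "/directory/$"), ("disallow", "/directory/*")]

for path in ["/directory/", "/directory/sub-directory/", "/directory/sub-directory2/sub-directory/"]:
    print(path, is_allowed(rules, path))
# /directory/ True
# /directory/sub-directory/ False
# /directory/sub-directory2/sub-directory/ False
```

Both rule paths are 12 characters long, so for /directory/ itself it is the tie-break in favour of Allow that lets the index through. If I read the documented rules correctly, the $ anchor allows only the exact path /directory/, so something like /directory/index.html or /directory/?page=2 would be blocked as well.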
-
I mentioned both: add a meta robots noindex tag to the page and remove it from the sitemap.
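What the meta robots noindex tag looks like, and how a page can be checked for it, can be sketched with Python's standard html.parser module. RobotsMetaFinder and has_noindex are hypothetical helper names, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attribute values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex(page))  # True
```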
-
But Google is still free to index a link or page even if it is not included in the XML sitemap.
-
Install the Yoast WordPress SEO plugin and use it to restrict what is indexed and what is allowed in the sitemap.
-
I am using WordPress with the Enfold theme (ThemeForest).
I want Google to be able to access some files, but they should not be indexed.
Here is an example: http://prntscr.com/h8918o
I have currently blocked some JS directories/files in robots.txt (see screenshot).
But because of this, my site does not pass Google's Mobile-Friendly Test: http://prntscr.com/h8925z (see screenshot)
Is it possible to allow access but use something like a noindex tag in the robots.txt file? Or is there another way out?
-
Yes, everything looks good: Webmaster Tools gave me the expected results with the following directives:
allow: /directory/$
disallow: /directory/*
Which allows this URL:
http://www.mysite.com/directory/
But doesn't allow the following one:
http://www.mysite.com/directory/sub-directory2/...
This page also gives information consistent with my results:
https://support.google.com/webmasters/answer/156449?hl=en
I think I am good! Thanks
-
Thank you Michael. So my understanding is that my idea of doing this:
allow: /directory/$
disallow: /directory/*
Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.
In the meantime, if anyone else has more ideas about all this and can confirm, that would be great!
Thank you again.
-
I've always stuck to Disallow and followed this advice:
"This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"
http://www.robotstxt.org/robotstxt.html
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory:
Path /*: equivalent to / (the trailing wildcard is ignored).
I think this post will be very useful for you: http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
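The apparent contradiction comes from two different specifications: the original robotstxt.org convention uses plain prefix matching, while Google documents Allow plus the * and $ wildcards. A parser that only does literal prefix matching reads * and $ as ordinary characters. As far as I know from the CPython source, Python's standard-library urllib.robotparser is one such parser (it understands Allow lines, but not wildcards), so with the two proposed lines it finds no rule that applies and every URL under /directory/ stays allowed. A minimal sketch, using the example URLs from this thread:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /directory/$
Disallow: /directory/*
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://www.mysite.com/directory/",
            "http://www.mysite.com/directory/sub-directory/"):
    # Neither rule path is a literal prefix of these URLs, so nothing is blocked.
    print(url, parser.can_fetch("*", url))
```

So how well the wildcard approach works depends on which crawler reads the file; it is worth testing with each search engine's own tools rather than assuming every bot implements Google's syntax.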
-
Thank you Michael,
Google and other SEs do recognize the "allow:" directive:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The thing is: if I don't specify it, how can I be sure that the following single rule:
disallow: /directory/*
doesn't prevent SEs from spidering the /directory/ index, which I do want crawled?
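Under the matching rules Google documents (its table shows a trailing * being ignored), the * can also match an empty string, so "disallow: /directory/*" on its own matches /directory/ itself. For Google, then, it is the Allow line that keeps the index crawlable. A minimal, hypothetical matcher shows this; google_style_match is an illustrative name, not a real library function.

```python
import re


def google_style_match(pattern: str, url_path: str) -> bool:
    # '*' matches any run of characters, including an empty one; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.match(regex + (r"\Z" if anchored else ""), url_path) is not None


# The single rule also matches the index page, since '*' may match nothing.
print(google_style_match("/directory/*", "/directory/"))                # True (blocked)
print(google_style_match("/directory/*", "/directory/sub-directory/"))  # True (blocked)

# The trailing '*' adds nothing: "/directory/" alone matches the same paths.
print(google_style_match("/directory/", "/directory/"))                 # True
print(google_style_match("/directory/", "/directory/sub-directory/"))   # True
```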
-
As long as you don't have directories somewhere under /directory/ that you want indexed, I think that will work. There is no Allow, so you don't need the first line, just:
disallow: /directory/*
You can test it out here: https://support.google.com/webmasters/answer/156449?rd=1
Related Questions
-
How do I know if I am correctly solving an uppercase url issue that may be affecting Googlebot?
We have a large e-commerce site (10k+ SKUs), https://www.flagandbanner.com. As I have begun analyzing how to improve it, I have discovered that we have thousands of URLs that contain uppercase characters, for instance: https://www.flagandbanner.com/Products/patriotic-paper-lanterns-string-lights.asp. This is inconsistently applied throughout the site. I directed our website vendor to fix the issue, and they placed 301 redirects via a rule in the web.config file. Any URL that contains an uppercase character now displays as lowercase. However, as I use Screaming Frog to monitor our site, I see all these 301 redirects: thousands of them. The XML sitemap still shows the uppercase versions. We have had indexing issues as well. So I'm wondering what is the most effective way to make sure that I'm not placing an extra burden on Googlebot when it indexes our site? Should I have just ignored the uppercase issue and left it alone?
Intermediate & Advanced SEO | | webrocket0 -
Set Robots.txt file to crawl my website at specific times
Our website provider has stated that they can only 'lift' their block on our website in order for it to be crawled at specific times. Is there any way to amend a robots.txt file so that our website is crawled at a specific time of day/night, to coincide with the block being lifted? Many Thanks, Charlene
Intermediate & Advanced SEO | | CharleneKennedy120 -
If I 301 redirect a sub-page that is #1, will I risk losing SERP?
I have a site that for some reason Google decided to rank one of our articles #1 for a fairly competitive term. The article is kind of a BS blog post and I want to 301 it to our page about the topic as that's designed for conversion. If I do this, will we risk losing the ranking? If so, what are other options? Can I change the content of the ranked page to something closer to our landing page? Any advice is welcome!
Intermediate & Advanced SEO | | dk80 -
Do I need to worry about sub-domains?
Hi Moz community, Our website ranking was good but dropped over the past couple of months. We have around 10 sub-domains, and I suspect they may be hurting us. It's said all over the SEO industry that sub-domains are treated as completely different websites; will they hurt us if they are not well optimised? We also have many links from our sub-domains to the website's top pages; is this a problem for Google? How should we maintain the sub-domains? Do I need to worry about them? Thanks
Intermediate & Advanced SEO | | vtmoz0 -
Robots.txt - blocking JavaScript and CSS, best practice for Magento
Hi Mozzers, I'm looking for some feedback regarding best practices for setting up Robots.txt file in Magento. I'm concerned we are blocking bots from crawling essential information for page rank. My main concern comes with blocking JavaScript and CSS, are you supposed to block JavaScript and CSS or not? You can view our robots.txt file here Thanks, Blake
Intermediate & Advanced SEO | | LeapOfBelief0 -
SEO connectivity between domains and sub-domains
Hi, My website georgerossphotography.com and my e-commerce site store.georgerossphotography.com each reside on different servers. georgerossphotography.com has a domain authority of 30; store.georgerossphotography.com has a domain authority of 30. Clearly, they are considered two individual sites, but is there any way that I can boost the performance of the primary domain by passing along some of that good SEO juice from the sub-domain? Any input would be gratefully received. Regards,
Intermediate & Advanced SEO | | sirgeorge0 -
Does Yahoo Directory Listing Pass Authority with PA:0 and 0 links from 0 Root Domains?
So we have had our brand listed in Yahoo Directory for a few years, but today I noticed it is not listed in OSE, and the pages we're listed on in Yahoo Dir are PA:0 / DA:100 with 0 links from 0 root domains! (or with a PA:1) Does this mean no juice is being passed at all for this listing? Does it mean it is not even spidered by Google, since how could it be found with no inbound links? Does any authority still get passed from Yahoo's domain with DA 100 despite the pages being PA 0? I ask because I'm considering adding another company to Yahoo Dir to get some authority rather than traffic.
Intermediate & Advanced SEO | | emerald0 -
Correcting an unnatural link profile
A site I work with ranked page 1 for a competitive keyphrase until recently. (Not Panda-related as far as we can tell.) We've done extensive on-site tweaking and the page is still parked at 27-32 in the SERPs. We believe the only viable explanation at this point is an unnatural link profile. Over the course of several years the site has racked up a large collection of footer links with anchor text due to business relationships with the sites in question. So the profile is now skewed, with the result as follows:
100,000 domain links (top 10 competitors range 1800-50k)
87% anchor text optimized (competitors 0-41%)
99% follow links (competitors 85-100%)
The vast majority of links are footer links.
We're working on creating more natural, high-value links, but this of course takes time. In the short term, two questions:
Should we aim to remove or change some of the footer links? If so, do we remove them, or just change the anchor text? How many?
How many new links should we pursue each month to make a meaningful impact on the profile without being too aggressive?
Any other thoughts on how to fix this are also appreciated. Thanks!
Intermediate & Advanced SEO | | kdcomms0