Google showing high volume of URLs blocked by robots.txt in in index-should we be concerned?
-
if we search site:domain.com vs www.domain.com, We see: 130,000 vs 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs showing are blocked by robots.txt. They are subdomains that we use as production environments (and contain similar content as the rest of our site).
And, we also find the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back
We were hit by Panda some time back--is this an issue we should address? Should we unblock the subdomains and add noindex, follow?
-
I think it's worth it. I'm not sure what CMS you're using, but it shouldn't take much time to add noindex,follow to the header of all your pages, and then remove the robots.txt directive that's preventing them from being crawled.
-
thanks--I am concerned about if we should go through the process of unblocking them--they are all showing in the SERPs with the "This URL is blocked by robots.txt"--is it worrisome that such a large % of our URLs in the SERPs are showing as blocked by robots.txt with the "omitted from search results" message?
-
If Google has already crawled/indexed the subdomains before, then adding noindex, follow is probably the best approach. This is because if you just block the sites with robots.txt, Google will still know that they pages exist, but won't be able to crawl them, resulting in it taking a long time for the pages to be de-indexed, if ever. Additionally, if those subdomains have any links, then that link value is lost because Google can't crawl the pages.
Adding noindex,follow will tell Google definitely to remove those subdomains from their index, as well as help preserve any link equity they've accumulated.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Displaying Vanity URL in Google Search Result
Hi Moz! Not sure if this has been asked before, but is there any way to tell Google to display a vanity URL (that has been 301d) instead of the actual URL in the SERP? Example: www.domainA.com is a vanity URL (bought specifically for Brand Identity reasons) that redirects to www.domainB.com. Is it possible to have the domainA Url show up in Google for a Branded search query? Thanks in advance! Arjun
Intermediate & Advanced SEO | | Lauriedechaseaux0 -
In the google index but search redirects to homepage
Hi everyone, thanks for reading i have a website "www.gardeners.scot" and have the following pages listed in google site: command http://www.gardeners.scot/garden-landscaping-Edinburgh.htm & http://www.gardeners.scot/garden-maintenance-Edinburgh.htm however when a user searches for "garden landscaping Edinburgh" or "garden maintenance Edinburgh" we are in the rankings but google search links these phrases to the home page not to their targeted pages. the site is about a year old have checked the robots.txt, sitemap.xml & .htaccess files but can see anything wrong there. any ideas out there?
Intermediate & Advanced SEO | | livingphilosophy0 -
Google Index Status Falling Fast - What should I be considering?
Hi Folks, Working on an ecommerce site. I have found a month on month fall in the Index Status continuing since late 2015. This has resulted in around 80% of pages indexed according to Webmaster. I do not seem to have any bad links or server issues. I am in the early stages of working through, updating content and tags but am yet to see a slowing of the fall. If anybody has tips on where to look for to issues or insight to resolve this I would really appreciate it. Thanks everybody! Tim
Intermediate & Advanced SEO | | Toby-Symec0 -
PDF Cached by Google, but not showing as link
The following pdf is cached by google: http://www.sba.gov/sites/default/files/files/REFERRAL%20LIST%20OF%20BOND%20AGENCIES_Florida.pdf However, OpenSiteExplorer is not listing any of the links as found in it. With such an authoritative site, I would think Google would value this, right? None of the sites listed rank well though and OpenSiteExplorer's inability to see the links makes me wonder if Google provides these sites any value at all. Is there any link juice or brand mention value here for Google?
Intermediate & Advanced SEO | | TheDude0 -
Google don't index .ee version of a website
Hello, We have a problem with our clients website .ee. This website was developed by another company and now we don't know what is wrong with it. If i do a Google search "site:.ee" it only finds konelux.ee homepage and nothing else. Also homepage title tag and meta dec is in Finnish language not in Estonian language. If i look at .ee/robots.txt it looks like robots.txt don't block Google access. Any ideas what can be wrong here? BR, T
Intermediate & Advanced SEO | | sfinance0 -
More Indexed Pages than URLs on site.
According to webmaster tools, the number of pages indexed by Google on my site doubled yesterday (gone from 150K to 450K). Usually I would be jumping for joy but now I have more indexed pages than actual pages on my site. I have checked for duplicate URLs pointing to the same product page but can't see any, pagination in category pages doesn't seem to be indexed nor does parameterisation in URLs from advanced filtration. Using the site: operator we get a different result on google.com (450K) to google.co.uk (150K). Anyone got any ideas?
Intermediate & Advanced SEO | | DavidLenehan0 -
Is our robots.txt file correct?
Could you please review our robots.txt file and let me know if this is correct. www.faithology.com/robots.txt Thank you!
Intermediate & Advanced SEO | | BMPIRE0 -
What content should I block in wodpress with robots.txt?
I need to know if anyone has tips on creating a good robots.txt. I have read a lot of info, but I am just not clear on what I should allow and not allow on wordpress. For example there are pages and posts, then attachments, wp-admin, wp-content and so on. Does anyone have a good robots.txt guideline?
Intermediate & Advanced SEO | | ENSO0