Google showing a high volume of URLs blocked by robots.txt in the index: should we be concerned?
-
If we search site:domain.com vs. www.domain.com, we see 130,000 vs. 15,000 results. When reviewing the site:domain.com results, we find that the majority of the URLs shown are blocked by robots.txt. They are subdomains that we use as production environments (and they contain content similar to the rest of our site).
We also see the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back
We were hit by Panda some time back. Is this an issue we should address? Should we unblock the subdomains and add noindex, follow?
-
I think it's worth it. I'm not sure what CMS you're using, but it shouldn't take much time to add a noindex,follow robots meta tag to the <head> of all your pages, and then remove the robots.txt directive that's preventing them from being crawled.
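To confirm the tag actually made it into the templates, something like the following could be run against a page's HTML. This is only a minimal sketch using Python's standard library; the sample markup, the class name `RobotsMetaParser`, and the function name `robots_directives` are made up for illustration and are not tied to any particular CMS.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page from one of the subdomains (illustrative markup only).
SAMPLE_PAGE = """<html><head>
<title>Subdomain copy</title>
<meta name="robots" content="noindex, follow">
</head><body>...</body></html>"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("noindex present:", "noindex" in found)
    print("follow present:", "follow" in found)
```

In practice you would feed it the HTML fetched from a sample of URLs on each subdomain rather than a hard-coded string.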
-
Thanks. I'm unsure whether we should go through the process of unblocking them. They are all showing in the SERPs with the "This URL is blocked by robots.txt" notice. Is it worrisome that such a large percentage of our URLs in the SERPs show as blocked by robots.txt, along with the "omitted from search results" message?
-
If Google has already crawled or indexed the subdomains, then adding noindex, follow is probably the best approach. If you just block the sites with robots.txt, Google still knows that the pages exist but can't crawl them, so it can take a long time for the pages to be de-indexed, if they ever are. Additionally, if those subdomains have any links, that link value is lost because Google can't crawl the pages.
Adding noindex,follow tells Google explicitly to remove those subdomains from its index, and it helps preserve any link equity they've accumulated.
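Because Google has to be able to crawl a page before it can see the noindex tag, it is worth checking that the robots.txt block is really gone on each subdomain. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt body against a URL. The hostname `staging.domain.com`, the two sample robots.txt bodies, and the helper name `is_crawlable` are assumptions for illustration. Python's parser is also not guaranteed to match Googlebot's rule handling in every edge case, so treat it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical "before" state: the whole subdomain is blocked.
BLOCKED = """User-agent: *
Disallow: /
"""

# Hypothetical "after" state: nothing is disallowed, so pages can be crawled
# and the noindex,follow tag on them can be read.
UNBLOCKED = """User-agent: *
Disallow:
"""

# Illustrative URL; robots.txt applies per host, so check each subdomain's own file.
URL = "http://staging.domain.com/some-page"

if __name__ == "__main__":
    print("before:", is_crawlable(BLOCKED, URL))
    print("after:", is_crawlable(UNBLOCKED, URL))
```

Once a subdomain reports as crawlable and its pages carry noindex, the blocked listings should drop out as Google recrawls them.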