Panda Updates - robots.txt or noindex?
-
Hi,
I have a site that I believe has been impacted by the recent Panda updates. Assuming that Google has crawled and indexed several thousand pages that are essentially the same and the site has now passed the threshold to be picked out by the Panda update, what is the best way to proceed?
Is it enough to block the pages from being crawled in the future using robots.txt, or would I need to remove the pages from the index using the meta noindex tag? Of course if I block the URLs with robots.txt then Googlebot won't be able to access the page in order to see the noindex tag.
Anyone have and previous experiences of doing something similar?
Thanks very much.
-
This is a good read. http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world I think you should be careful with robot.txt because blocking access to the bot will not cause them to remove the content from their index. They will simply include a message saying not quite sure what's on this page.. I would use noindex to clear out the index first before attempting robot.txt exclusion.
-
Yes, both because if a page is linked to on another site google with spider that other site and follow your link without hitting the robots.txt and the page could get indexed if there is not a noindex on it.
-
Indeed try both.
Irving +1
-
both. block the lowest quality lowest traffic pages with nodindex and block the folder in robots.txt
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What should I do after a failed request for validation (error with noindex, nofollow) in new Google Search Console?
Hi guys, We have the following situation: After an error message in new google search console for a large amount of pages with noindex, nofollow tag, a validation is requested before the problem is fixed. (it's incredibly stupid decision taken before asking the SEO team for advice) Google starts the validation, crawls 9 URLs and changes the status to "Failed". All other URLs are still in "pending" status. The problem has been fixed for more than 10 days, but apparently Google doesn't crawl the pages and none of the URLs is back in the index. We tried pinging several pages and html sitemaps, but there is no result. Do you think we should request for re-validation or wait more time? It there something more we could do to speed up the process?
Intermediate & Advanced SEO | | ParisChildress0 -
NOINDEX,NOFOLLOW Mistake
One of our top organic landing page was set as "NOINDEX,NOFOLLOW" by "mistake". I took me about a week to realize this after I saw a drop of traffic on that page. I looked on Google to see if it was indexed and my fear were confirmed! After finding our that it was switched to "NOINDEX,NOFOLLOW" I switched it back to "INDEX,FOLLOW" and did an index request in our Google Search Console. Anyone else has run into a similar issue? Did you ever got the page inxed again?
Intermediate & Advanced SEO | | FrankViolette2 -
Blog content and panda?
If we want to create a blog to keep in front of our customers (via email and posting on social) and the posts will be around 300 - 1000 words like this site http://www.solopress.com/blog/ are we going to be asking for a panda slap as the issue would be the very little shares and traction after the first day or two. Also would panda only affect the blogs that are crap if we mix in a couple of really good posts or would it affect theses as well and possibly even the site? Any help would be appreciated.
Intermediate & Advanced SEO | | BobAnderson0 -
Wildcarding Robots.txt for Particular Word in URL
Hey All, So I know that this isn't a standard robots.txt, I'm aware of how to block or wildcard certain folders but I'm wondering whether it's possible to block all URL's with a certain word in it? We have a client that was hacked a year ago and now they want us to help remove some of the pages that were being autogenerated with the word "viagra" in it. I saw this article and tried implementing it https://builtvisible.com/wildcards-in-robots-txt/ and it seems that I've been able to remove some of the URL's (although I can't confirm yet until I do a full pull of the SERPs on the domain). However, when I test certain URL's inside of WMT it still says that they are allowed which makes me think that it's not working fully or working at all. In this case these are the lines I've added to the robots.txt Disallow: /*&viagra Disallow: /*&Viagra I know I have the solution of individually requesting URL's to be removed from the index but I want to see if anybody has every had success with wildcarding URL's with a certain word in their robots.txt? The individual URL route could be very tedious. Thanks! Jon
Intermediate & Advanced SEO | | EvansHunt0 -
Do you know a case where product variations caused panda?
Would like to add 300 products to ecommerce site of which 150 products are just variations in different colors. In this particular case there are some reaons for: Not writing partially unique product descriptions for each product variation Not setting them up as variations on one product page So I would have several product pages with nearly identical product descriptions (proprietary description written by us), just name of color in title and description and EAN in description being different, as well as over time different user generated content showing up. Also different product images used. I would not mind if google would not index some product variations. Do you think I should be concerned about Panda? Do you know any website which had a Panda problem caused by product variations? Thanks
Intermediate & Advanced SEO | | lcourse0 -
Internal structure update
How often does google update the internal linking structure of a website ? Thank you,
Intermediate & Advanced SEO | | seoanalytics0 -
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: Can I put a wildcard in the middle of the string, ex. /en/*/images/, or do I need to list out every single country for every language in the robots file. Anyone know of any workarounds?
Intermediate & Advanced SEO | | IHSwebsite0 -
Internal linking using exact keywords Bad – Post Panda
Internal linking using exact keywords Bad – Post Panda Any tips for internal linking ,how can we target landing pages to rank well using internal linking. Use of Keywords in within site links, is it bad? Are footer links bad? Use of Siloing artichture bad or good? What a best linking model for Ecommerce Site.
Intermediate & Advanced SEO | | conversiontactics0