Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
"noindex, follow" or "robots.txt" for thin content pages
-
Does anyone have testing evidence on which is better to use for pages with thin content that are still important to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate, etc.). Imagine a website with 300 high-quality indexed pages and 5,000 thin product-type pages that would not generate relevant search traffic.
The question is: does the internal-linking value gained from "noindex, follow" outweigh the downside of Google having to crawl all those noindexed pages? With robots.txt, Google's crawling would focus only on the important, indexed pages, and that may give rankings a boost. Any experiments offering insight into this would be great.
I do get the story about "make the pages unique", "get customer reviews and comments", etc., but the question above is the important one here.
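For readers weighing the two options in the question, here is a minimal sketch of how each is expressed. The `/products/` prefix and the example URLs are hypothetical. The check uses Python's standard `urllib.robotparser`, which does simple prefix matching. Keep in mind that the meta tag only takes effect on pages a crawler is allowed to fetch, while a robots.txt block stops the fetch itself.

```python
from urllib.robotparser import RobotFileParser

# Option A: keep thin pages crawlable but out of the index.
# This tag would sit in the <head> of each thin page (hypothetical example).
NOINDEX_FOLLOW_TAG = '<meta name="robots" content="noindex, follow">'

# Option B: stop crawlers from fetching the thin pages at all.
# The /products/ prefix is an assumed URL layout for the thin pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/
"""

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt would disallow for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

urls = [
    "https://example.com/guides/buying-guide",   # a high-quality, indexed page
    "https://example.com/products/widget-123",   # a thin product page
]
print(blocked_urls(ROBOTS_TXT, urls))  # ['https://example.com/products/widget-123']
```

Under Option B, the thin page's internal links are never seen by the crawler, which is the trade-off the question asks about.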
-
trung.ngo - check out this article I posted http://www.blindfiveyearold.com/crawl-optimization
That's where I got the "inspiration" to consider using robots.txt instead.
-
I am thinking that excluding more thin pages from crawling (via robots.txt) may be better than my current "noindex, follow" setup; the thin pages are already "noindex, follow".
You say, "unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me." But how would I know this? Is it fair to assume that for a website with 5,000 pages this is probably not an issue?
My concern with "noindex, follow" is that Google may think: "Ah, we have seen all this stuff before. Thanks for keeping it out of our index, but we are still going to devalue your original indexed content because we crawl and see all this thin stuff." I think robots.txt could potentially send a stronger signal that helps my indexed pages. Or do you think this is a minor issue and probably not relevant?
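One way to approach "how would I know this?" is to look at the server's access logs. Below is a rough sketch, assuming Apache/Nginx "combined" log format and that the thin pages live under a hypothetical `/products/` prefix. It counts requests whose user agent claims to be Googlebot, grouped by top-level section, and reports each section's share. The sample log lines are made up for illustration. User agents can be spoofed, so a real analysis should verify Googlebot (for example via reverse DNS) before trusting the counts.

```python
import re
from collections import Counter

# Request line and user agent from an Apache/Nginx "combined" format log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits_by_section(lines):
    """Count requests claiming to be Googlebot, grouped by top-level path segment."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        path = match.group("path").split("?")[0]   # ignore query strings
        segment = path.strip("/").split("/")[0]
        counts[f"/{segment}/" if segment else "/"] += 1
    return counts

def crawl_share(counts):
    """Fraction of all counted Googlebot requests that went to each section."""
    total = sum(counts.values())
    return {section: n / total for section, n in counts.items()} if total else {}

# Hypothetical sample lines; a real run would read the site's access log.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Oct/2013:13:55:36 +0000] "GET /products/widget-1 HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Oct/2013:13:55:40 +0000] "GET /products/widget-2 HTTP/1.1" 200 4980 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Oct/2013:13:56:02 +0000] "GET /guides/buying-guide HTTP/1.1" 200 9100 "-" "{BOT}"',
    '203.0.113.7 - - [10/Oct/2013:13:57:11 +0000] "GET /products/widget-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
counts = googlebot_hits_by_section(sample_log)
print(crawl_share(counts))
```

If the indexed sections are crawled often enough to pick up updates, a large share going to thin pages is not by itself evidence of a problem.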
-
Hello there,
Have you had any duplicate content or crawling issues in the past or is this more of a preventative measure? If the pages, as you put it, "would not generate relevant search traffic", then I would argue that it'd make sense to "noindex, follow" based on the assumption that the pages are not currently driving search traffic, and have no real potential to contribute significantly to brand discovery via a search engine in the future.
I wouldn't necessarily say that Google crawling your page more frequently would automatically give you a boost in rankings; it's more associated with whether or not they're crawling pages frequently enough to index updates to the pages. So unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me.
All of this to say, take a look at the data to see whether a real problem exists (whether with crawl resources or duplicate content) before doing anything drastic. And, of course, also understand what you'll be losing by making the updates. If you do choose to prevent crawling via robots.txt and are at all concerned with the duplicate/thin content aspect, remember to implement a noindex first and confirm that the pages have been removed from search results before disallowing them in robots.txt; otherwise, they'll remain indexed.
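The ordering in the last point matters because a crawler can only obey a noindex tag on a page it is allowed to fetch. Here is a minimal sketch of that check, using Python's standard `urllib.robotparser` and `html.parser`. The URL and page markup are hypothetical, and the sketch only inspects the meta robots tag, not an `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> (or "googlebot") tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def noindex_will_be_seen(robots_txt, url, html, agent="Googlebot"):
    """True only if the page says noindex AND robots.txt lets the crawler fetch it."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in meta.directives and robots.can_fetch(agent, url)

page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
url = "https://example.com/products/widget-123"
print(noindex_will_be_seen("User-agent: *\nDisallow:\n", url, page))            # True
print(noindex_will_be_seen("User-agent: *\nDisallow: /products/\n", url, page))  # False
```

The second call shows why disallowing too early leaves pages indexed: the crawler never reads the noindex.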
-
Hi Keri, there are some good comments, but none really answers this question, which is why I am trying to approach it from different angles. Maybe you can shed some light on this:
AJ Kohn wrote this great article: http://www.blindfiveyearold.com/crawl-optimization. He talks about using robots.txt to exclude thin content in order to increase the frequency with which indexed content gets crawled, supposedly helping rankings. In this great Whiteboard Friday, Rand suggests using "noindex, follow": http://moz.com/blog/handling-duplicate-content-across-large-numbers-of-urls. I am trying to get more insight on this from people who have experience with it, but am struggling to get answers.
-
I noticed you had similar questions at http://moz.com/community/q/unique-content-below-fold-better-move-above-fold and http://moz.com/community/q/risk-using-nofollow-tag with several answers each, including some that were marked as Good Answer. Did any of those answers help to answer your question?
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of fewer than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change and have no ratings or product reviews, so there is little reason for a search engine to revisit a product page. The sales team is afraid that blocking a previously indexed product page will result in it being removed from the Google index, and would prefer to submit the categories by hand, 10 per day, via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
Does redirecting from a "bad" domain "infect" the new domain?
Hi all, this is a complicated question that requires a little background. I bought unseenjapan.com to serve as a legitimate news site about a year ago. Social media and content growth has been good. Unfortunately, one thing I didn't realize when I bought this domain was that it used to be a porn site. I've managed to muck out some of the damage already: primarily, I got major vendors like McAfee and OpenDNS to remove the "porn" categorization, which has unblocked the site at most schools and locations with public wifi. The sticky bit, however, is Google. Google has the domain filtered under SafeSearch, which means we're losing (and will continue to lose) a ton of organic traffic. I'm trying to figure out how to deal with this and appeal the decision. Unfortunately, Google's Reconsideration Request form currently doesn't work unless your site has an existing manual action against it (mine does not). I've also heard that such requests, even if I did figure out how to make them, often just get ignored for months on end. Now, I have a backup plan. I've registered unseen-japan.com, and I could just move my site over to the new domain if I can't get this issue resolved. It would allow me to be on a domain with a clean history while not having to change my brand. But if I do that and set up 301 redirects from the former domain, will it simply cause the new domain to be perceived as an "adult" domain by Google? That is, will the former URL's bad reputation carry over to the new one? I haven't made a decision one way or the other yet, so any insights are appreciated.
Intermediate & Advanced SEO | | gaiaslastlaugh0 -
Does having a lot of pages with noindex and nofollow tags affect rankings?
We are an e-commerce marketplace for alternative fashion and home decor, with over 1,000 stores on the marketplace. In March 2018, we switched the website from HTTP to HTTPS and also added noindex and nofollow tags to the stores' about pages and store policies (mostly boilerplate content). Our traffic dropped by 45%, and we have not recovered since. We have done ... I am wondering whether these tags could be affecting our rankings.
Intermediate & Advanced SEO | | JimJ1 -
Landing pages for paid traffic and the use of noindex vs canonical
A client of mine has a lot of differentiated landing pages with only a few changes on each, but with the same intent and goal as the generic version. The generic version of the landing page is included in navigation and the sitemap, and is indexed on Google. The purpose of the differentiated landing pages is to include the city and some minor changes in the text/imagery to best fit the AdWords text. Other than that, the intent and purpose of the pages are the same as the main/generic page. They are not meant to be indexed, nor am I trying to have hidden pages linking to the generic, indexed one (I'm not going the black-hat way). So I want to keep the duplicate landing pages from being indexed (obviously), but I'm not sure whether I should use noindex (and nofollow as well?) or rel=canonical, since these landing pages are localized campaign versions of the generic page with more or less only paid traffic going to them. I don't want to be accidentally penalized, but I still need the generic/main page to rank as high as possible. What would be your recommendation on this issue?
Intermediate & Advanced SEO | | ostesmorbrod0 -
Are HTML Sitemaps Still Effective With "Noindex, Follow"?
A site we're working on has hundreds of thousands of inventory pages that are generally "orphaned" pages. To reach them, you need to do a lot of faceting on the search results page. They appear in our XML sitemaps as well, but I'd still consider these orphan pages. To assist with crawling and indexation, we'd like to create HTML sitemaps to link to these pages. Due to the nature (and categorization) of these products, this would mean we'll be creating thousands of individual HTML sitemap pages, which we're hesitant to put into the index. Would the sitemaps still be effective if we add a noindex, follow meta tag? Does this indicate lower quality content in some way, or will it make no difference in how search engines will handle the links therein?
Intermediate & Advanced SEO | | mothner0 -
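To make the HTML-sitemap idea in the question above concrete, here is a sketch of what batched "noindex, follow" sitemap pages might look like if generated programmatically. The page size of 500 links, the title, and the URL pattern are arbitrary assumptions. This only shows the markup; it does not answer whether search engines weight links on noindexed pages differently.

```python
from html import escape

def build_html_sitemaps(urls, per_page=500):
    """Split `urls` into HTML sitemap pages, each marked noindex but with followable links."""
    pages = []
    for start in range(0, len(urls), per_page):
        chunk = urls[start:start + per_page]
        links = "\n".join(
            f'    <li><a href="{escape(u)}">{escape(u)}</a></li>' for u in chunk
        )
        pages.append(
            "<!DOCTYPE html>\n<html><head>\n"
            '  <meta name="robots" content="noindex, follow">\n'
            "  <title>Inventory sitemap</title>\n"
            "</head><body>\n  <ul>\n"
            f"{links}\n  </ul>\n</body></html>\n"
        )
    return pages

# Hypothetical inventory URLs.
inventory = [f"https://example.com/item/{i}" for i in range(1200)]
pages = build_html_sitemaps(inventory)
print(len(pages))  # 3
```

In practice each sitemap page would also link to the next one (or be linked from a hub page) so crawlers can reach all of them.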
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse0 -
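Major crawlers, including Googlebot, support `*` (any sequence of characters) and a trailing `$` (end of URL) in robots.txt paths, and RFC 9309 standardizes both. A single line such as `Disallow: /*/review$` may therefore cover every review page without touching the product pages. Python's `urllib.robotparser` does plain prefix matching and does not understand these wildcards, so the sketch below uses a small hypothetical matcher to test the pattern. It ignores `Allow` lines and the longest-match precedence rule that real crawlers apply.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and join them with '.*' wherever '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path, disallow_patterns):
    """True if `path` matches any pattern; re.match anchors at the start, like robots.txt prefix rules."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

rules = ["/*/review$"]  # the robots.txt line "Disallow: /*/review$"
for path in ["/category/product1/review", "/category/product2/review", "/category/product1"]:
    print(path, is_disallowed(path, rules))
```

Dropping the `$` would also block deeper URLs such as `/category/product1/review/page2`, which may or may not be wanted.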
Recovering from robots.txt error
Hello, a client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily, the SEOmoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated, but the client has found that the vast majority of their rankings have gone. It took a further 5 days for GWMT to register that the robots.txt file had been updated, and since then we have used "Fetch as Google" and "Submit URL and linked pages" in GWMT. GWMT is still showing the vast majority of pages as blocked in the "Blocked URLs" section, although the robots.txt file shown below it is now OK. I guess what I want to ask is: What else can we do to recover these rankings quickly? What timescales can we expect for recovery? More importantly, has anyone had any experience with this sort of situation, and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
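Before waiting on GWMT to catch up, one can at least confirm independently that the corrected robots.txt no longer blocks the important URLs. Here is a minimal sketch using Python's standard `urllib.robotparser`. The URLs are hypothetical, and in practice the robots.txt text would be the live file fetched from the site. This only checks the file's rules; it says nothing about how quickly rankings return.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URLs whose rankings were lost; replace with the site's key pages.
KEY_URLS = [
    "https://example.com/",
    "https://example.com/category/widgets",
]

def still_blocked(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that `robots_txt` would still disallow for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

broken = "User-agent: *\nDisallow: /\n"  # the accidental rule
fixed = "User-agent: *\nDisallow:\n"     # empty Disallow allows everything
print(still_blocked(broken, KEY_URLS))  # both URLs
print(still_blocked(fixed, KEY_URLS))   # []
```

The same check can be run against each new robots.txt before deploying it, so an accidental `Disallow: /` is caught before any crawler sees it.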
Removing Dynamic "noindex" URLs from Index
Six months ago, my client's site was overhauled, and the user-generated search pages had an index tag on them. I switched that to noindex but didn't catch it fast enough to prevent hundreds of pages from being indexed in Google. It's been months since switching to the noindex tag, and the pages are still indexed. What would you recommend? Google crawls my site daily, but never the pages that I want removed from the index. I am trying to avoid submitting hundreds of these dynamic URLs to the removal tool in Webmaster Tools. Suggestions?
Intermediate & Advanced SEO | | BeTheBoss0