Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies.
"noindex, follow" or "robots.txt" for thin content pages
-
Does anyone have test evidence on which is better for pages that have thin content but are important to keep on a website? I am referring to content shared across multiple websites (e-commerce, real estate, etc.). Imagine a website with 300 high-quality indexed pages and 5,000 thin product-type pages that would not generate relevant search traffic. The question is: does the interlinking value achieved by "noindex, follow" outweigh the cost of Google having to crawl all those "noindex" pages? With robots.txt, Google's crawling is focused on just the important indexed pages, and that may give rankings a boost. Any experiments with insight into this would be great.
I do get the advice about "make the pages unique", "get customer reviews and comments", etc., but the question above is the one I care about here.
-
trung.ngo - check out this article I posted http://www.blindfiveyearold.com/crawl-optimization
That's where I got my "inspiration" to consider using robots.txt instead.
-
I am thinking that excluding the thin pages from crawling (via robots.txt) may be better than my current setup, in which the thin pages are already "noindex, follow".
You say "unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me", but how would I know this? Is it fair to assume that for a website with 5,000 pages this is probably not an issue?
I am concerned that with "noindex, follow" Google may think: "Ahh, we have seen all this stuff before. Thanks for keeping it out of our index, but we are still going to devalue your original indexed pages because we crawl and see all this thin stuff." I am thinking that robots.txt would potentially be a stronger signal that could help my indexed pages. Or do you think the difference is minor and probably not relevant?
-
Hello there,
Have you had any duplicate content or crawling issues in the past or is this more of a preventative measure? If the pages, as you put it, "would not generate relevant search traffic", then I would argue that it'd make sense to "noindex, follow" based on the assumption that the pages are not currently driving search traffic, and have no real potential to contribute significantly to brand discovery via a search engine in the future.
I wouldn't necessarily say that Google crawling your page more frequently would automatically give you a boost in rankings; it's more associated with whether or not they're crawling pages frequently enough to index updates to the pages. So unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me.
All of this to say, take a look at the data to see if a real problem exists--whether crawl resources or duplicate content--before doing anything drastic. And, of course, also understand what you'll be losing by making the updates. If you do choose to prevent crawling via robots.txt and are at all concerned with the duplicate/thin content aspect, remember to implement a noindex and confirm that the pages are removed from search results before disallowing in robots.txt--otherwise, they'll remain indexed.
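The ordering advice above (noindex first, disallow later) follows from the fact that a crawler blocked by robots.txt does not fetch the page, so it never gets to read a noindex tag on it. Here is a minimal Python sketch of that check using the standard library's urllib.robotparser. The robots.txt rules, the example.com URLs, and the helper name noindex_is_visible are assumptions made up for illustration; they are not from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; /thin/ stands in for the thin product pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thin/
"""


def noindex_is_visible(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the crawler may fetch the URL, and so could read its meta robots tag."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A disallowed page is never fetched, so a noindex tag on it goes unseen.
print(noindex_is_visible(ROBOTS_TXT, "https://example.com/thin/product-1"))
# A crawlable page can have its noindex tag read and honoured.
print(noindex_is_visible(ROBOTS_TXT, "https://example.com/guides/buying"))
```

Note that urllib.robotparser does simple prefix matching only, so this sketch is not suitable for rules that rely on wildcards.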
-
Hi Keri, There are some good comments but none really answer this question, and that is why I am trying to approach it from different angles. Maybe you can shed some light on this:
AJ Kohn wrote this great article: http://www.blindfiveyearold.com/crawl-optimization - he talks about using robots.txt to exclude thin content in order to increase the frequency with which indexed content gets crawled, supposedly helping rankings. In this great Whiteboard Friday, Rand suggests using "noindex, follow" (http://moz.com/blog/handling-duplicate-content-across-large-numbers-of-urls). I am trying to get more light on this from people who have experience with it, but I struggle to get answers.
-
I noticed you had similar questions at http://moz.com/community/q/unique-content-below-fold-better-move-above-fold and http://moz.com/community/q/risk-using-nofollow-tag with several answers each, including some that were marked as Good Answer. Did any of those answers help to answer your question?
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of fewer than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change and have no ratings or product reviews, so there is little reason for a search engine to revisit a product page. The sales team is afraid that blocking a previously indexed product page will result in it being removed from the Google index, and would prefer to submit the categories by hand, 10 per day, via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse0 -
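On the question above about URLs ending in /review: Google's robots.txt handling supports "*" (any run of characters) and a trailing "$" (end of URL), so a single rule may be enough. Below is a minimal Python sketch of how such a pattern matches. The rule /category/*/review$ and the helper names are assumptions for illustration, and other crawlers may treat wildcards differently, so any real rule should be tested before deploying it.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the path matches the disallow pattern from its start."""
    return robots_pattern_to_regex(pattern).match(path) is not None


RULE = "/category/*/review$"  # hypothetical rule for the URLs in the question

for path in (
    "/category/product1/review",
    "/category/product2/review",
    "/category/product1",
    "/category/product1/reviews",
):
    print(path, is_blocked(RULE, path))
```

With the "$" anchor, the product pages themselves and paths such as /category/product1/reviews are left alone; without it, the rule would also catch anything that merely continues after /review.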
Putting "noindex" on a page that's in an iframe... what will that mean for the parent page?
If I've got a page that is being called in an iframe, on my homepage, and I don't want that called page to be indexed.... so I put a noindex tag on the called page (but not on the homepage) what might that mean for the homepage? Nothing? Will Google, Bing, Yahoo, or anyone else, potentially see that as a noindex tag on my homepage?
Intermediate & Advanced SEO | | Philip-DiPatrizio0 -
Why is "Noindex" better than a "Canonical" for Pagination?
"Noindex" is a suggested pagination technique here: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284, and everyone seems to agree that you shouldn't canonicalize all pages in a series to the first page, but I'd love if someone can explain why "noindex" is better than a canonical?
Intermediate & Advanced SEO | | nicole.healthline0 -
Should comments and feeds be disallowed in robots.txt?
Hi, my robots file is currently set up as listed below. From an SEO point of view is it good to disallow feeds, rss and comments? I feel allowing comments would be a good thing because it's new content that may rank in the search engines as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly. What's your take? I'm also concerned about the /page being blocked. Not sure how that benefits my blog from an SEO point of view as well. Look forward to your feedback. Thanks. Eddy
User-agent: Googlebot
Crawl-delay: 10
Allow: /*
User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
Intermediate & Advanced SEO | | workathomecareers0 -
Artist Bios on Multiple Pages: Duplicate Content or not?
I am currently working on an eComm site for a company that sells art prints. On each print's page, there is a bio about the artist followed by a couple of paragraphs about the print. My concern is that some artists have hundreds of prints on this site, and the bio is reprinted on every page, which makes sense from a usability standpoint, but I am concerned that it will trigger a duplicate content penalty from Google. Some people are trying to convince me that Google won't penalize for this content, since the intent is not to game the SERPs. However, I'm not confident that this isn't being penalized already, or that it won't be in the near future. Because it is just a section of text that is duplicated, but the rest of the text on each page is original, I can't use the rel=canonical tag. I've thought about putting each artist bio into a graphic, but that is a huge undertaking, and not the most elegant solution. Could I put the bio on a separate page with only the artist's info and then place that data on each print page using an <iframe> and then put a noindex,nofollow in the robots.txt file? Is there a better solution? Is this effort even necessary? Thoughts?
Intermediate & Advanced SEO | | sbaylor0 -
NOINDEX or NOINDEX,FOLLOW
Currently we employ this tag on pages we want to keep out of the index but want link juice to flow through them:
<META NAME="ROBOTS" CONTENT="NOINDEX">
Is the tag above the same as:
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
Or should we be specifying the "FOLLOW" in our tag?
Intermediate & Advanced SEO | | Peter2640 -
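On the NOINDEX versus NOINDEX,FOLLOW question above: "index" and "follow" are the documented defaults for the robots meta tag, so a directive that is not stated falls back to its default. Here is a minimal Python sketch, using only the standard library, that resolves both tags to their effective settings. The class and function names are made up for illustration, and this models the documented defaults rather than any search engine's actual behaviour.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def effective_directives(html: str) -> dict:
    """Resolve index/follow, treating unstated directives as their defaults."""
    parser = MetaRobotsParser()
    parser.feed(html)
    found = parser.directives
    return {
        "index": "noindex" not in found and "none" not in found,
        "follow": "nofollow" not in found and "none" not in found,
    }


print(effective_directives('<META NAME="ROBOTS" CONTENT="NOINDEX">'))
print(effective_directives('<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">'))
```

Under these assumptions both tags resolve to the same settings, which suggests that spelling out FOLLOW is optional.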
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it results in a lot of URL variations of the same page being crawled by Google. For example, a standard category page:
www.mysite.com/widgets.html
...which uses a "Price" layered navigation sidebar to filter products based on price, also produces the following URLs, which link to the same page:
http://www.mysite.com/widgets.html?price=1%2C250
http://www.mysite.com/widgets.html?price=2%2C250
http://www.mysite.com/widgets.html?price=3%2C250
There are literally thousands of these URL variations being indexed, so I'd like to use robots.txt to disallow them.
Question: Is this a wise thing to do? Or does Google take layered navigation links into account by default, so that I don't need to worry? To implement this, I was going to add the following to robots.txt:
User-agent: *
Disallow: /*?
Disallow: /*=
...which would prevent any dynamic URL with a "?" or "=" from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1
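For the layered-navigation question above, it may help to measure how many parameterised variants each page has before blocking anything. Here is a minimal Python sketch that groups URLs by path and counts the ones carrying a query string. The URL list reuses the examples from the question; in practice it would come from a crawl export or server log, and the helper name count_query_variants is made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Example URLs taken from the question above.
urls = [
    "http://www.mysite.com/widgets.html",
    "http://www.mysite.com/widgets.html?price=1%2C250",
    "http://www.mysite.com/widgets.html?price=2%2C250",
    "http://www.mysite.com/widgets.html?price=3%2C250",
]


def count_query_variants(urls):
    """Count, per path, how many URLs carry a query string (i.e. are variants)."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        if parts.query:
            counts[parts.path] += 1
    return counts


for path, n in count_query_variants(urls).most_common():
    print(path, n)
```

Paths with large counts are the ones where a disallow rule for query strings would affect the most URLs.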