Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
"noindex, follow" or "robots.txt" for thin content pages
-
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great.
I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
-
trung.ngo - check out this article I posted http://www.blindfiveyearold.com/crawl-optimization
that's where I got my "inspiration" from to consider using robots.txt instead...
-
I am thinking if I exclude more thin pages from being crawled (robots.txt) that may be better than my current "noindex, follow" - the thin pages are already "noindex, follow".
You are saying "unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me." - but how would I know this? Fair to assume for a website with 5,000 pages this is probably not an issue?
I am concerned with the "noindex, follow" Google may think "ahh, we have seen all this stuff before. Thanks for keeping out of our index, but we are still going to devalue your original content indexed pages because we crawl and see all this thin stuff." I am thinking with the robots.txt it would potentially be a stronger signal that could help my indexed pages. Or you think it is a minor and probably not relevant?
-
Hello there,
Have you had any duplicate content or crawling issues in the past or is this more of a preventative measure? If the pages, as you put it, "would not generate relevant search traffic", then I would argue that it'd make sense to "noindex, follow" based on the assumption that the pages are not currently driving search traffic, and have no real potential to contribute significantly to brand discovery via a search engine in the future.
I wouldn't necessarily say that Google crawling your page more frequently would automatically give you a boost in rankings; it's more associated with whether or not they're crawling pages frequently enough to index updates to the pages. So unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me.
All of this to say, take a look at the data to see if a real problem exists--whether crawl resources or duplicate content--before doing anything drastic. And, of course, also understand what you'll be losing by making the updates. If you do choose to prevent crawling via robots.txt and are at all concerned with the duplicate/thin content aspect, remember to implement a noindex and confirm that the pages are removed from search results before disallowing in robots.txt--otherwise, they'll remain indexed.
-
Hi Keri, There are some good comments but none really answer this question and that is why I am trying to approach from different angles. Maybe you can shed some light on this:
AJ Kohn wrote this great article: http://www.blindfiveyearold.com/crawl-optimization - he talks about using robots.txt to exclude thin content in order to increase frequency with qhich indexed content gets crawled, supposedly helping rankings. In this great whiteboard Friday, Rand suggests using "noindex, follow" - http://moz.com/blog/handling-duplicate-content-across-large-numbers-of-urls.I am trying to get more light on this (people who have experience with this), but struggle to get answers.
-
I noticed you had similar questions at http://moz.com/community/q/unique-content-below-fold-better-move-above-fold and http://moz.com/community/q/risk-using-nofollow-tag with several answers each, including some that were marked as Good Answer. Did any of those answers help to answer your question?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse0 -
Should I be using meta robots tags on thank you pages with little content?
I'm working on a website with hundreds of thank you pages, does it make sense to no follow, no index these pages since there's little content on them? I'm thinking this should save me some crawl budget overall but is there any risk in cutting out the internal links found on the thank you pages? (These are only standard site-wide footer and navigation links.) Thanks!
Intermediate & Advanced SEO | | GSO0 -
301 redirection pointing to noindexed pages
I have rather an unusual situation where a recently launched affiliate site does not have any unique content as its all syndicated content. For that reason we are currently using the noindex,nofollow meta tags to keep the pages out of the search engines index until we create unique content for the pages. The problem is that due to a very tight timeframe with rebranding, we are looking at 301 redirecting (on a page to page basis) another high authority legacy domain to this new site before we have had a chance to add unique content to it and remove the noindex,nofollow tags. I would assume that any link authority normally passed through the 301 would be lost in this scenario but Im uncertain of what the broader impact might be. Has anyone dealt with a similar scenario? I know this scenario is not ideal and I would rather wait until the unique content is up and noindex tags are removed before launching the 301 redirect of the legacy domain but there are a number of competing priorities at play outside of SEO.
Intermediate & Advanced SEO | | LosNomads0 -
Pipe ("|") in my website's title is being replaced with ":" in Google results
Hi , One of the websites I'm promoting and working on is www.pau-brasil.co.il.
Intermediate & Advanced SEO | | Kadel
It's wordpress-based website and as you can see the html's Title is "PauBrasil | some hebrew slogan".
(Screenshot: http://i.imgur.com/2f80EEY.gif)
When I'm searching for "PauBrasil" (Which is the brand's name) , one of the results google shows is "PauBrasil: Some Hebrew Slogan" (Screenshot: http://i.imgur.com/eJxNHrO.gif ) Why does the pipe is being replaced with ":" ?
And not just that , as you can see there's a "blank space" missing between the the ":" to the slogan.
(note: the websites has been indexed by google crawler at least 4 times so I find it hard to believe it can be the reason) I've keep on looking and found out that there's another page in that website with the exact same title
but when I'm looking for it in google , it shows the title as it really is , with pipe. ("|").
(Screenshot: http://i.imgur.com/dtsbZV2.gif) Have you ever encountered something like that?
Can it be that the duplicated title cause that weird "replacement"? Thanks in advance,
Kadel0 -
Is it a bad idea to have a "press" page and link to press mentions of our company?
We've recently been getting quite a bit of press. Would it be wise to create a "press" page and link to mentions of us or would this devalue the links on the press pages as Google may think they reciprocal?
Intermediate & Advanced SEO | | JenniferDacosta0 -
Removing Dynamic "noindex" URL's from Index
6 months ago my clients site was overhauled and the user generated searches had an index tag on them. I switched that to noindex but didn't get it fast enough to avoid being 100's of pages indexed in Google. It's been months since switching to the noindex tag and the pages are still indexed. What would you recommend? Google crawls my site daily - but never the pages that I want removed from the index. I am trying to avoid submitting hundreds of these dynamic URL's to the removal tool in webmaster tools. Suggestions?
Intermediate & Advanced SEO | | BeTheBoss0 -
Will having image lightbox with content on a web page SEO friendly?
This website is done in CMS. Will having lightbox pop up with content be SEO friendly? If you go to the web page and click on the images at the bottom of the page. There are lightbox that will display information. Will these lightbox content information be crawl by Google? Will it be consider as content for the url http://jennlee.com/portfolio/bran.. Thanks, John
Intermediate & Advanced SEO | | VizionSEO990 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1