"noindex, follow" or "robots.txt" for thin content pages
-
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great.
I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
-
trung.ngo - check out this article I posted http://www.blindfiveyearold.com/crawl-optimization
that's where I got my "inspiration" from to consider using robots.txt instead...
-
I am thinking if I exclude more thin pages from being crawled (robots.txt) that may be better than my current "noindex, follow" - the thin pages are already "noindex, follow".
You are saying "unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me." - but how would I know this? Fair to assume for a website with 5,000 pages this is probably not an issue?
I am concerned with the "noindex, follow" Google may think "ahh, we have seen all this stuff before. Thanks for keeping out of our index, but we are still going to devalue your original content indexed pages because we crawl and see all this thin stuff." I am thinking with the robots.txt it would potentially be a stronger signal that could help my indexed pages. Or you think it is a minor and probably not relevant?
-
Hello there,
Have you had any duplicate content or crawling issues in the past or is this more of a preventative measure? If the pages, as you put it, "would not generate relevant search traffic", then I would argue that it'd make sense to "noindex, follow" based on the assumption that the pages are not currently driving search traffic, and have no real potential to contribute significantly to brand discovery via a search engine in the future.
I wouldn't necessarily say that Google crawling your page more frequently would automatically give you a boost in rankings; it's more associated with whether or not they're crawling pages frequently enough to index updates to the pages. So unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me.
All of this to say, take a look at the data to see if a real problem exists--whether crawl resources or duplicate content--before doing anything drastic. And, of course, also understand what you'll be losing by making the updates. If you do choose to prevent crawling via robots.txt and are at all concerned with the duplicate/thin content aspect, remember to implement a noindex and confirm that the pages are removed from search results before disallowing in robots.txt--otherwise, they'll remain indexed.
-
Hi Keri, There are some good comments but none really answer this question and that is why I am trying to approach from different angles. Maybe you can shed some light on this:
AJ Kohn wrote this great article: http://www.blindfiveyearold.com/crawl-optimization - he talks about using robots.txt to exclude thin content in order to increase frequency with qhich indexed content gets crawled, supposedly helping rankings. In this great whiteboard Friday, Rand suggests using "noindex, follow" - http://moz.com/blog/handling-duplicate-content-across-large-numbers-of-urls.I am trying to get more light on this (people who have experience with this), but struggle to get answers.
-
I noticed you had similar questions at http://moz.com/community/q/unique-content-below-fold-better-move-above-fold and http://moz.com/community/q/risk-using-nofollow-tag with several answers each, including some that were marked as Good Answer. Did any of those answers help to answer your question?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
"Null" appearing as top keyword in "Content Keywords" under Google index in Google Search Console
Hi, "Null" is appearing as top keyword in Google search console > Google Index > Content Keywords for our site http://goo.gl/cKaQ4K . We do not use "null" as keyword on site. We are not able to find why Google is treating "null" as a keyword for our site. Is anyone facing such issue. Thanks & Regards
Intermediate & Advanced SEO | | vivekrathore0 -
Robots.txt, does it need preceding directory structure?
Do you need the entire preceding path in robots.txt for it to match? e.g: I know if i add Disallow: /fish to robots.txt it will block /fish
Intermediate & Advanced SEO | | Milian
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything But would it block?: en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
**en/fish.php?id=anything (taken from Robots.txt Specifications)** I'm hoping it actually wont match, that way writing this particular robots.txt will be much easier! As basically I'm wanting to block many URL that have BTS- in such as: http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob But have other pages that I do not want blocked, in subfolders that also have BTS- in, such as: http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy Thanks for listening0 -
NoIndexing Massive Pages all at once: Good or bad?
If you have a site with a few thousand high quality and authoritative pages, and tens of thousands with search results and tags pages with thin content, and noindex,follow the thin content pages all at once, will google see this is a good or bad thing? I am only trying to do what Google guidelines suggest, but since I have so many pages index on my site, will throwing the noindex tag on ~80% of thin content pages negatively impact my site?
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
Could large number of "not selected" pages cause a penalty?
My site was penalized for specific pages in the UK On July 28 (corresponding with a Panda update). I cleaned up my website and wrote to Google and they responded that "no manual spam actions had been taken". The only other thing I can think of is that we suffered an automatic penalty. I am having problems with my sitemap and it is indexing many error pages, empty pages, etc... According to our index status we have 2,679,794 not selected pages and 36,168 total indexed. Could this have been what caused the error? (If you have any articles to back up your answers that would be greatly appreciate) Thanks!
Intermediate & Advanced SEO | | theLotter0 -
Can I use a "no index, follow" command in a robot.txt file for a certain parameter on a domain?
I have a site that produces thousands of pages via file uploads. These pages are then linked to by users for others to download what they have uploaded. Naturally, the client has blocked the parameter which precedes these pages in an attempt to keep them from being indexed. What they did not consider, was they these pages are attracting hundreds of thousands of links that are not passing any authority to the main domain because they're being blocked in robots.txt Can I allow google to follow, but NOT index these pages via a robots.txt file --- or would this have to be done on a page by page basis?
Intermediate & Advanced SEO | | PapaRelevance0 -
Hidden Content with "clip"
Hi We're relaunching a site with a Drupal 7 CMS. Our web agency has hidden content on it and they say it's for Accessibility (I don't see the use myself, though). Since they ask for more cash in order to remove it, the management is unsure. So I wanted to check if anyone knows whether this could hurt us in search engines. There is a field in the HTML where you can skip to the main content: Skip to main content The corresponding CSS comes here: .element-invisible{position:absolute !important;clip:rect(1px 1px 1px 1px);clip:rect(1px,1px,1px,1px);} #skip-link a,#skip-link a:visited{position:absolute;display:block;left:0;top:-500px;width:1px;height:1px;overflow:hidden;text-align:center;background-color:#666;color:#fff;} The crucial point is that they're hiding the text "skip to main content", using clip:rect(1px 1px 1px 1px), which shrinks the text to one pixel. So IMO this is hiding content. How bad is it? PS: Hope the source code is sufficient. Ask me if you need more. Thx!
Intermediate & Advanced SEO | | zeepartner0 -
Why duplicate content for same page?
Hi, My SEOMOZ crawl diagnostic warn me about duplicate content. However, to me the content is not duplicated. For instance it would give me something like: (URLs/Internal Links/External Links/Page Authority/Linking Root Domains) http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110516 /1/1/31/2 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110711 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110811 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110911 0/0/1/0 Why is this seen as duplicate content when it is only URL with campaign tracking codes to the same content? Do I need to clean this?Thanks for answer
Intermediate & Advanced SEO | | nuxeo0 -
Block all but one URL in a directory using robots.txt?
Is it possible to block all but one URL with robots.txt? for example domain.com/subfolder/example.html, if we block the /subfolder/ directory we want all URLs except for the exact match url domain.com/subfolder to be blocked.
Intermediate & Advanced SEO | | nicole.healthline0