Duplicate Page content | What to do?
-
Hello Guys,
I have some duplicate pages detected by Moz. Most of the URLs come from the user registration process, so they all look like this:
www.exemple.com/user/login?destination=node/125%23comment-form
What should I do? Should I add this to robots.txt? If so, how? What's the directive to add in Google Webmaster Tools?
Thanks in advance!
Pedro Pereira
-
Hi Carly,
It needs to be done to each of the pages. In most cases, this is just a minor change to a single page template. Someone might tell you that you can add an entry to robots.txt to solve the problem, but that won't remove them from the index.
Looking at the links you provided, I'm not convinced you should deindex them all, as these are member profile pages which might have some value in terms of driving organic traffic and carrying unique content. That said, I'm not party to how your site works, so this is just an observation.
Hope that helps,
George
-
Hi George,
I am having a similar issue with my site, and was looking for a quick clarification.
We have several thousand "member" pages that were created as part of registration, and they are appearing as duplicate content. When you say add noindex and a canonical, is this something that needs to be done to every individual page, or is there something that can be applied to the thousands of pages at once?
Here are a couple of examples of what the pages look like:
http://loyalty360.org/me/members/8003
http://loyalty360.org/me/members/4641
Thank you!
-
1. If you add just noindex, Google will crawl the page, drop it from the index but it will also crawl the links on that page and potentially index them too. It basically passes equity to links on the page.
2. If you add nofollow, noindex, Google will crawl the page, drop it from the index but it will not crawl the links on that page. So no equity will be passed to them. As already established, Google may still put these links in the index, but it will display the standard "blocked" message for the page description.
If the links are internal, there's no harm in them being followed unless you're opening up the crawl to expose tons of duplicate content that isn't canonicalised.
noindex is often used with nofollow, but sometimes this is simply due to a misunderstanding of what impact they each have.
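To illustrate the two variants discussed above, here's a sketch of the meta robots tags a page template would carry in its <head> (which one you pick depends on whether you want equity to flow through the page's links):

```html
<!-- Option 1: drop the page from the index, but still let Google
     crawl the links on it and pass equity to them -->
<meta name="robots" content="noindex">

<!-- Option 2: drop the page from the index AND stop Google
     following any links on it (no equity passed) -->
<meta name="robots" content="noindex, nofollow">
```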
George
-
Hello,
Thanks for your response. I have learned more, which is great.
My question is: should I add noindex only to that page, or noindex, nofollow?
Thanks!
-
Yes, it's the worst possible scenario: they basically get trapped in the SERPs. Google won't crawl them again until you allow crawling, then set noindex (to remove them from the SERPs), and then add nofollow,noindex back on to keep them out of the SERPs and to stop Google following any links on them.
Configuring URL parameters again is just a directive regarding the crawl and doesn't affect indexing status to the best of my knowledge.
In my experience, noindex is bulletproof but nofollow / robots.txt is very often misunderstood and can lead to a lot of problems as a result. Some SEOs think they can be clever in crafting the flow of PageRank through a site. The unsurprising reality is that Google just does what it wants.
George
-
Hi George,
Thanks for this, it's very interesting... the URLs do appear in search results, but their descriptions are blocked(!)
Did you try configuring URL parameters in WMT as a solution?
-
Hi Rafal,
The key part of that statement is "we might still find and index information about disallowed URLs...". If you read the next sentence it says: "As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results".
If you look at moz.com/robots.txt you'll see an entry for:
Disallow: /pages/search_results*
But if you search this on Google:
site:moz.com/pages/search_results
You'll find there are 20 results in the index.
I used to agree with you, until I found out the hard way that if Google finds a link, regardless of whether it's blocked in robots.txt, it can put it in the index, and it will remain there until you lift the crawl restriction and noindex it, or remove it from the index using Webmaster Tools.
George
-
George,
I went to check with Google to make sure I am correct and I am!
"While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web." Source: https://support.google.com/webmasters/answer/6062608?hl=en
Yes, he can fix these problems on-page, but disallowing it in robots.txt will work fine too!
-
Just adding this to robots.txt will not stop the pages being indexed:
Disallow: /*login?
It just means Google won't crawl the links on that page.
I would do one of the following:
1. Add noindex to the page. PR will still be passed to the page but they will no longer appear in SERPs.
2. Add a canonical on the page to: "www.exemple.com/user/login"
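As a sketch, using the example domain from the question, the two options would look like this in the login page template's <head>:

```html
<!-- Option 1: keep the page out of the SERPs -->
<meta name="robots" content="noindex">

<!-- Option 2: consolidate all parameterised variants onto one URL -->
<link rel="canonical" href="http://www.exemple.com/user/login">
```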
You're never going to try to get these pages to rank, so although it's worth fixing, I wouldn't lose too much sleep over the impact of having duplicate content on registration pages (unless there are hundreds of them!).
Regards,
George
-
In GWT: Crawl=> URL Parameters => Configure URL Parameters => Add Parameter
Make sure you know what you are doing as it's easy to mess up and have BIG issues.
-
Add this line to your robots.txt to prevent Google from indexing these pages:
Disallow: /*login?
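For reference, a Disallow rule only takes effect inside a group headed by a User-agent line, so the full robots.txt entry would look something like:

```
User-agent: *
Disallow: /*login?
```

(Note the caveat elsewhere in this thread: this blocks crawling, not indexing.)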