Duplicate Page content | What to do?
-
Hello Guys,
I have some duplicate pages detected by MOZ. Most of the URL´s are from a registracion process for users, so the URL´s are all like this:
www.exemple.com/user/login?destination=node/125%23comment-form
What should I do? Add this to robot txt? If so how? Whats the command to add in Google Webmaster?
Thanks in advance!
Pedro Pereira
-
Hi Carly,
It needs to be done to each of the pages. In most cases, this is just a minor change to a single page template. Someone might tell you that you can add an entry to robots.txt to solve the problem, but that won't remove them from the index.
Looking at the links you provided, I'm not convinced you should deindex them all - as these are member profile pages which might have some value in terms of driving organic traffic and having unique content on them. That said I'm not party to how your site works, so this is just an observation.
Hope that helps,
George
-
Hi George,
I am having a similar issue with my site, and was looking for a quick clarification.
We have several "member" pages that have been created as a part of registration (thousands) and they are appearing as duplicate content. When you say add noindex and and a canonical, is this something that needs to be done to every individual page or is there something that can be done that would apply to the thousands of pages at once?
Here are a couple of examples of what the pages look like:
http://loyalty360.org/me/members/8003
http://loyalty360.org/me/members/4641
Thank you!
-
1. If you add just noindex, Google will crawl the page, drop it from the index but it will also crawl the links on that page and potentially index them too. It basically passes equity to links on the page.
2. If you add nofollow, noindex, Google will crawl the page, drop it from the index but it will not crawl the links on that page. So no equity will be passed to them. As already established, Google may still put these links in the index, but it will display the standard "blocked" message for the page description.
If the links are internal, there's no harm in them being followed unless you're opening up the crawl to expose tons of duplicate content that isn't canonicalised.
noindex is often used with nofollow, but sometimes this is simply due to a misunderstanding of what impact they each have.
George
-
Hello,
Thanks for your response. I have learn more which is great
My question is should I add a noindex only to that page or a noidex, nofolow?
Thanks!
-
Yes it's the worst possible scenario that they basically get trapped in SERPs. Google won't then crawl them until you allow the crawling, then set noindex (to remove from SERPS) and then add nofollow,noindex back on to keep them out of SERPs and to stop Google following any links on them.
Configuring URL parameters again is just a directive regarding the crawl and doesn't affect indexing status to the best of my knowledge.
In my experience, noindex is bulletproof but nofollow / robots.txt is very often misunderstood and can lead to a lot of problems as a result. Some SEOs think they can be clever in crafting the flow of PageRank through a site. The unsurprising reality is that Google just does what it wants.
George
-
Hi George,
Thanks for this, It's very interesting... the urls do appear in search results but their descriptions are blocked(!)
Did you try configuring URL parameters in WMT as a solution?
-
Hi Rafal,
The key part of that statement is "we might still find and index information about disallowed URLs...". If you read the next sentence it says: "As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results".
If you look at moz.com/robots.txt you'll see an entry for:
Disallow: /pages/search_results*
But if you search this on Google:
site:moz.com/pages/search_results
You'll find there are 20 results in the index.
I used to agree with you, until I found out the hard way that if Google finds a link, regardless of whether it's in robots.txt or not it can put it in the index and it will remain there until you remove the nofollow restriction and noindex it, or remove it from the index using webmaster tools.
George
-
George,
I went to check with Google to make sure I am correct and I am!
"While Google won't crawl or index the content blocked by
robots.txt
, we might still find and index information about disallowed URLs from other places on the web." Source: https://support.google.com/webmasters/answer/6062608?hl=enYes, he can fix these problems on page but disallowing it in robots will work fine too!
-
Just adding this to robots.txt will not stop the pages being indexed:
Disallow: /*login?
It just means Google won't crawl the links on that page.
I would do one of the following:
1. Add noindex to the page. PR will still be passed to the page but they will no longer appear in SERPs.
2. Add a canonical on the page to: "www.exemple.com/user/login"
You're never going to try and get these pages to rank, so although it's worth fixing I wouldn't lose too much sleep on the impact of having duplicate content on registration pages (unless there are hundreds of them!).
Regards,
George
-
In GWT: Crawl=> URL Parameters => Configure URL Parameters => Add Parameter
Make sure you know what you are doing as it's easy to mess up and have BIG issues.
-
Add this line to your robots.txt to prevent google from indexing these pages:
Disallow: /*login?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Will I have duplicate content on my own website?
Hello Moz community, We are an agency providing services to various industries, and among them the hair salon industry. On our website, we have our different service pages in the main menu, as usual. These service pages are general information and apply to any industry.We also have a page on the website that is only intended for the hair salon industry. On this page, we would like to link new service pages: they will be the same services as our “general” services, but specialized for hair salons. My questions relate to duplicate content: Do we have to make the new individual service pages for hair salons with completely different text, even though it’s the same service, in order to avoid having duplicate content? Can we just change a few words from the “general service” page to specifically target hair salons, and somehow avoid Google seeing it as duplicate content? Reminder that these pages will be internal links inside of the hair salon industry page. Thank you in advance for your answers, Gaël
On-Page Optimization | | Gael_Regnault0 -
Duplicate Content for Event Pages
Hi Folks, I have event pages for specific training courses running on certain dates, the problem I have is that MOZ indicates that I have 1040 duplicate content issues because I'm serving pages like this https://purplegriffon.com/event/2521/mop-practitioner I'm not sure how best to go about resolving this as, of course, although each event is unique in terms of it's start date, the courses and locations could be identical. Will Google penalise us for these types of pages, or will they even index them? Should I add a canonical link to the head of the document pointing to the related course page such as https://purplegriffon.com/courses/project-management/mop-management-of-portfolios/mop-practitioner. Will this solve the issue? I'm a little stuck on what to do for the best. Any advice would be much appreciated. Thanks. Kind Regards Gareth Daine
On-Page Optimization | | PurpleGriffon0 -
Duplication in landing page
This is driving me mad, I have a site that for some reason google and moz pick up the landing page as a duplicate. They see "mysite/" and "mysite/index.html" as two different pages and giving me warnings for duplication. I have no 301 included at this time and I am using foundation as the base. This is occurring both on a localhost test bed and live....... anyone got an idea how to correct.
On-Page Optimization | | AndyBirtles0 -
Should I consolidate pages to prevent "thin content"
I have a site based on downloadable images which tend to slow a site. In order to increase speed I divided certain pages up so that there will be less images on each page such as here: http://www.kindergartenteacherresources.com/2011/09/23/spongebob-alphabet-worksheets-uppercase-letters-a-h/ http://www.kindergartenteacherresources.com/2011/09/23/spongebob-alphabet-worksheets-uppercase-letters-i-q/ http://www.kindergartenteacherresources.com/2011/09/23/spongebob-alphabet-worksheets-uppercase-letters-r-z/ The problem is that I now have potential duplicate content and thin content. Should I consolidate them and put all of the content from the 3 pages on one page? or maybe keep them as they are but add a rel previous / next tag? or any other suggestion to prevent a duplicate/thin content penalty while not slowing down the site too much?
On-Page Optimization | | JillB20130 -
New Client Wants to Keep Duplicate Content Targeting Different Cities
We've got a new client who has about 300 pages on their website that are the same except the cities that are being targeted. Thus far the website has not been affected by penguin or panda updates, and the client wants to keep the pages because they are bringing in a lot of traffic for those cities. We are concerned about duplicate content penalties; do you think we should get rid of these pages or keep them?
On-Page Optimization | | waqid0 -
Removing syndicated duplicate content from website - what steps do I need to take to make sure Google knows?
Hey all, So I've made the decision to cancel the service that provides my blog with regular content / posts, since it seems that having duplicate content on my site isn't doing me any favors. So I'm on a Wordpress system - I'll be exporting the posts so I have them for reference, and then deleting the posts. There are like 150 or so - What steps should I take to ensure that Google learns of the changes I've made? Or do I not need to do anything at all in that department? Also - I guess I've assumed that the best decision would be to 'remove' the content from my blog. IS that the best way to go? Or should I leave it in place and start adding unique content? (my guess is that I need to remove it...) Thanks for your help, Kurt
On-Page Optimization | | KurtBullock0 -
Duplicated Page Content
I have encountered this weird problem about duplicate page content. My site got 3 duplicate content similar on the link structure below. If I'm going to use rel canonical does it help to resolve the duplication problem? Thanks http://www.sample.com http://www.sample.com/ http://www.sample.com/index.php
On-Page Optimization | | mattvectorbpo0 -
Duplicate content on homepage?
Hi I have just created a new campaign and it states that I have duplicate page content which would affect search rankings. Basically it is counting my site www.mydomain.com and www.mydomain.com/index.php as two seperate pages. How can I make it so that only www.mydomain.com is visible reducing the duplicate content issue? Many Thanks
On-Page Optimization | | idv0