Strange duplicate content issue

CreativeChoices

Hi there,

The SEOmoz crawler has identified a set of duplicate content issues that we are struggling to resolve.

For example, the crawler picked up that this page www.creative-choices.co.uk/industry-insight/article/Advice-for-a-freelance-career is a duplicate of this page www.creative-choices.co.uk/develop-your-career/article/Advice-for-a-freelance-career.

The latter page's content is the original and can be found in the CMS admin area whilst the former page is the duplicate and has no entry in the CMS. So we don't know where to begin if the "duplicate" page doesn't exist in the CMS.

The crawler states that this page www.creative-choices.co.uk/industry-insight/inside/creative-writing is the referrer page. Looking at it, only the original page's link is showing on the referrer page, so how did the crawler get to the duplicate page?
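
One way to check what a crawler actually sees on the referrer page is to pull every link out of its HTML and resolve it the way a crawler would. The sketch below is only an illustration: the `http://` scheme and the sample markup are assumptions (the real page source is not shown in this thread), and `LinkCollector` and `links_on_page` are hypothetical helper names. It shows that a relative `href` can resolve to a different path than the one displayed, which is one possible route to a URL like the duplicate, not a confirmed cause here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative hrefs are resolved against the page URL,
                # which is what a crawler follows.
                self.links.append(urljoin(self.base_url, value))


def links_on_page(base_url, html):
    """Return all absolute link targets found in the given HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Assumed scheme; the thread only gives the host and path.
REFERRER = "http://www.creative-choices.co.uk/industry-insight/inside/creative-writing"

# Hypothetical markup standing in for the real referrer page.
SAMPLE_HTML = """
<a href="/develop-your-career/article/Advice-for-a-freelance-career">Advice</a>
<a href="../article/Advice-for-a-freelance-career">Advice (relative link)</a>
"""

for link in links_on_page(REFERRER, SAMPLE_HTML):
    print(link)
```

Running the same function over the real page source (view source, not the rendered page) would show whether any link resolves to the duplicate path.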

ReneReinholdt

It could be any one of the following four scenarios.

1: The page in question was moved at some point, and since the CMS still accepts the old URL, Google still finds it when it revisits the old URL. In this scenario it finds both the old URL and the new URL and indexes both.

2: Google hasn't revisited the page for a long while, so it is still in its index, even though Google would get a 301 from the CMS if it visited the page. This is easily fixed by going to Webmaster Tools and asking Google to remove it from the index.

3: There are still links to the old URL, either on-site or off-site, and since the CMS doesn't 301 the old page, Google will index it again with a new URL.

4: The page still exists in the CMS because of some strange setting or equivalent in the CMS.
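
A quick way to narrow down which of the scenarios above applies is to request the duplicate URL without following redirects and look at the status code. This is a rough sketch, not a definitive diagnosis: `fetch_status` and `diagnose` are hypothetical helper names, plain `http://` is assumed, and the mapping from status code to scenario is my own reading of the list above.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url):
    """Return (status, Location header) for a URL without following redirects.

    Assumes a plain http:// URL; http.client never follows redirects itself.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def diagnose(status, location=None):
    """Map a status code to the scenarios from the list above (an assumption)."""
    if status == 200:
        # The CMS still serves the page on this URL.
        return "served directly: scenario 1, 3 or 4 (the CMS still answers on this URL)"
    if status == 301:
        # The CMS already redirects, so the index entry is probably stale.
        return f"301 to {location}: scenario 2 (stale entry in the index)"
    return f"status {status}: not covered by the four scenarios"


# Example use (makes a real network request, so it is left commented out):
# status, location = fetch_status(
#     "http://www.creative-choices.co.uk/industry-insight/article/Advice-for-a-freelance-career"
# )
# print(diagnose(status, location))
```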

As mentioned before, the easy fix is to use a robots.txt to deny access to the page and ask Google to remove it from its index. The better fix is to find the problem in the CMS and solve it. A midway fix could be to 301 it in the .htaccess, or the equivalent on an IIS server.

Hope it helped.
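
For the robots.txt route, it is worth checking that the rule blocks the duplicate path and nothing else before uploading it. Here is a minimal sketch using Python's standard `urllib.robotparser`; the rule text, the `http://` scheme, and the `is_blocked` helper are assumptions for illustration, not the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that denies only the duplicate path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /industry-insight/article/Advice-for-a-freelance-career
"""

# Assumed scheme; the thread only gives the host and path.
DUPLICATE = "http://www.creative-choices.co.uk/industry-insight/article/Advice-for-a-freelance-career"
ORIGINAL = "http://www.creative-choices.co.uk/develop-your-career/article/Advice-for-a-freelance-career"


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows the URL for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print("duplicate blocked:", is_blocked(ROBOTS_TXT, DUPLICATE))
print("original blocked:", is_blocked(ROBOTS_TXT, ORIGINAL))
```

The same check can be repeated for each duplicate URL on the crawler's list, so that a rule never catches one of the original pages by accident.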

CreativeChoices

Thanks René,

I updated my earlier reply with a question that I think you missed.

The list isn't growing, which is a good thing, but how is it possible for the crawler to pick up the duplicate page URLs when the referrer page has the correct URLs?

ReneReinholdt

I have come across this sort of issue a gazillion times + infinity. Almost all of our clients seem to have duplicate content problems of one kind or another,

but often it is different things that cause the problem. I'm afraid I can't point you in the right direction, since I have no experience with your CMS. To be able to do that I would need access to the site itself. My advice would be to get a developer on the issue or to get hold of the support for the CMS (if any).

CreativeChoices

Hi René,

Thanks for your reply and suggestions. It could well be the CMS remembering old URLs, as this list isn't growing. But is the crawler able to pick up the old URLs when the referrer page has the correct URLs?

We are on ExpressionEngine. Have you come across this sort of issue before?

ReneReinholdt

Well, it kinda has to be in the CMS, since it has 2 different paths. But you could fix it by going to the .htaccess (if you have access), redirecting it to the right URL, and making a robots.txt that disallows access to the page.

If the page has been moved to a new location, there's a good chance that the CMS is set up to remember the old URL and show the page. This is indeed a problem, but potentially a problem with the CMS.

Go to Webmaster Tools and ask Google to delete the duplicate from its index.

Your specific problem could originate from a ton of different causes, and it is kinda hard to fix without direct access to everything. What CMS are you using?
