Removing Duplicate Page Content

SteveMaguire

Since joining SEOMOZ four weeks ago I've been busy tweaking our site, a magento eCommerce store, and have successfully removed a significant portion of the errors.

Now I need to remove/hide duplicate pages from the search engines and I'm wondering what is the best way to attack this?

Can I solve this in one central location, or do I need to do something in the Google & Bing webmaster tools?

Here is a list of duplicate content

http://www.unitedbmwonline.com/?dir=asc&mode=grid&order=name http://www.unitedbmwonline.com/?dir=asc&mode=list&order=name
http://www.unitedbmwonline.com/?dir=asc&order=name http://www.unitedbmwonline.com/?dir=desc&mode=grid&order=name http://www.unitedbmwonline.com/?dir=desc&mode=list&order=name http://www.unitedbmwonline.com/?dir=desc&order=name http://www.unitedbmwonline.com/?mode=grid http://www.unitedbmwonline.com/?mode=list

Thanks in advance,

Steve

Flipmedia112

Thank you Cyrus I will certainly read the blog post and consider the noindex, nofollow on content with a canonical tag that differs from the current served page' uri.

I am still at little confused as to why the SEOMOZ crawl is highlighting duplicate pages when the canonical tag is present and pointing to the primary content.

Take the following example page for example:-

http://www.planksclothing.com/planks-classic-t-shirt-black-multi.html

Firstly the page has a canonical tag. There is no search on the site and product is viewed a root level without directory structure, which in a Magento instance is the common problem with duplicate content...

Currently at the time of writing SEOMOZ is updating my duplicate repor, so I can't find out what is the duplicate content. Maybe it is updating to say it is not

Thanks

Amendment: After reading the supplied blog post (http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world) I have learn't that the above page is just not different and probably is in the area of "Thin Content".

Cyrus-Shepard

There are many, many different types of duplicate content, and how you handle it depends on the specific type of duplicate content and your needs.

If you haven't already, I highly suggest you read Dr. Pete's excellent post on dupe content here: http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world

In your specific case it looks like you have multiple parameters serving the same basic content as your homepage. Is this correct?

In this case, you should set a canonical on every page pointing to the homepage. This also has the benefit of solving the errors in the SEOmoz PRO app.

It also sounds like you've addressed the issue in Google's Webmaster Tools. Unfortunately, Google doesn't let SEOmoz sync with Webmaster Tools, so anything you set there won't show up in the Web App.

Finally, don't forget about Bing Webmaster. They have similar parameter settings you can submit.

By the way, some SEOs would suggest putting meta robots "NOINDEX, FOLLOW" tags on those duplicate pages. While this may potentially send conflicting signals when coupled with the canonical tag, it is a potentially valid approach.

Hope this helps! Best of luck with your SEO.

Flipmedia112

This is exactly my current situation...

As a result of the SEOMOZ Duplicate content report I set about resolving these issues...

In the first instance I configured URL parameters via Google Webmaster Tools. It instantly occurred to me that whilst this fixes these potential duplicate content in Google this configuration does not affect other search engines and the work is unlikely to be reflected in future SEOMOZ crawls of the site.

I'm interested in creating a over arching method of removing the potential duplication caused via URL parameters required to paginate, sort and filter content. The majority of these URL parameters are standardized across web applications. But is it actually required?

In my case each Magento store uses the canonical tag correctly and has an updated robots.txt to restrict the crawling of areas of the site that should be excluded... In a sense this is the over arching method of removing potential duplicate content. So why is SEOMOZ reporting duplicate content?

I suppose the big question is... Is SEOMOZ crawling the site correctly, do these results reflect robots.txt and canonical tags?

SteveMaguire

Thank you for your thoughts.

As mentioned in my above response, canonical tags have already been configured for the site, it's just this home page that remains the issue.

SteveMaguire

Thanks for your response.

I looked in URL Parameters and see dir & mode are already defined.

Then I searched the http://www.unitedbmwonline.com page source for canonical links and none are defined, though I do have canonical tags setup for the rest of the site

Any other thoughts of how to remove these duplicates?

sprynewmedia

You can also tell Google to ignore certain query string variables through Webmaster Tools.

For instance, indicate that "dir" and "mode" have no impact on content.

Other SE's have simular controls.

john4math

This is why the canonical tag was invented, to solve duplicate content issues when URL parameters are involved. Set a canonical tag on all these pages to point towards the version of the page you want to appear in search results. As long as the pages are identical, or close to it, the search engines (most likely) will respect the canonical tag, and pass along the duplicate versions link juice to the page you're pointing to.

Here's some info: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html. If you Google "canonical tag", you'll find lots more!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Removing Duplicate Page Content

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

To remove or not remove a redirected page from index

Are feeds bad for duplicate content?

Http vs. https - duplicate content

Duplicate content issue

How does google treat dynamically generated content on a page?

Duplicate content in external domains

Duplicate content resulting from js redirect?

Removing a Page From Google index