How to remove Duplicate content due to url parameters from SEOMoz Crawl Diagnostics

dfeg

Hello all

I'm currently getting back over 8000 crawl errors for duplicate content pages. It's a Joomla site with VirtueMart, and 95% of the errors are for parameters in the URL that the customer can use to filter products.

Google is handling them fine under the Webmaster Tools parameter settings, but it's pretty hard to find the other duplicate content issues in SEOMoz with all of these in the way.

All of the problem parameters start with

?product_type_

Should I try to use robots.txt to stop them from being crawled, and if so, what would be the best way to include them in robots.txt?

Any help greatly appreciated.
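One way to see the remaining duplicate content issues is to split an exported list of crawl URLs into those that carry a product_type_ filter parameter and those that do not. The snippet below is only a rough sketch of that idea: the function names are made up for illustration, and it assumes you already have the crawl URLs as a plain list of strings (it does not call any SEOMoz API).

```python
from urllib.parse import urlsplit, parse_qsl

# Prefix shared by the VirtueMart filter parameters named in the question.
FILTER_PREFIX = "product_type_"


def has_filter_params(url):
    """Return True if any query-string key starts with FILTER_PREFIX."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    return any(key.startswith(FILTER_PREFIX) for key, _ in pairs)


def split_crawl_urls(urls):
    """Split URLs into (filter-parameter URLs, everything else)."""
    filtered, other = [], []
    for url in urls:
        if has_filter_params(url):
            filtered.append(url)
        else:
            other.append(url)
    return filtered, other


if __name__ == "__main__":
    crawl = [
        "http://www.funkybumpmaternity.com/Maternity-Dresses.html",
        "http://www.funkybumpmaternity.com/Maternity-Dresses/View-all-products.html"
        "?product_type_1_Colour[0]=Red&product_type_id=1",
    ]
    filtered, other = split_crawl_urls(crawl)
    print(len(filtered), "filter URLs;", len(other), "other URLs to review")
```

The "other" list is the one worth reading through by hand, since it holds the duplicates that are not caused by the filter parameters.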

dfeg

Hi Tom

It took a while but I got there in the end. I was using Joomla 1.5 and I downloaded a component called "tag meta", which allows you to insert tags, including the canonical tag, on specific URLs or, more importantly, on URLs which begin in a certain way. How you use it depends on how your SEF URLs are set up or what SEF component you are using, but you can put a canonical tag on every URL in a section that has view-all-products in it.

So in one of my examples I put a canonical tag pointing to /maternity-tops.html (my main category page for that section) on every URL that began with /maternity-tops/view-all-products.
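To make the rule concrete, here is a small sketch of the "URL begins with X, so canonical is Y" logic. It is not how the tag meta component is configured or implemented; the rule table, function names and the example.com base URL are all assumptions for illustration only.

```python
from html import escape

# Hypothetical rule table: URL path prefix -> canonical path.
# The single entry mirrors the maternity-tops example above.
CANONICAL_RULES = {
    "/maternity-tops/view-all-products": "/maternity-tops.html",
}


def canonical_for(path):
    """Return the canonical path for a URL path, or None if no rule matches."""
    lowered = path.lower()
    for prefix, target in CANONICAL_RULES.items():
        if lowered.startswith(prefix):
            return target
    return None


def canonical_tag(path, base="http://www.example.com"):
    """Build the <link rel="canonical"> tag for a path, or "" if no rule applies.

    `base` is a placeholder domain, not the real site.
    """
    target = canonical_for(path)
    if target is None:
        return ""
    href = escape(base + target, quote=True)
    return '<link rel="canonical" href="%s" />' % href


if __name__ == "__main__":
    print(canonical_tag("/maternity-tops/view-all-products.html?product_type_id=1"))
```

Because the match is on the start of the path, every filter combination under that section ends up with the same canonical target, which is the effect described above.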

I hope this is of help to you. It takes a bit of playing around with, but it worked for me. The component also has fairly good documentation.

Regards

Damien

tdorseyjr

Damien,

Are you able to explain how you were able to do this within virtuemart?

Thanks

Tom

dfeg

So leave the 5 pages of dresses as they are, because they are all original, but have the canonical tag on all of the filter parameter URLs pointing to page 1 of dresses.

Thank you for your help Alan

AlanMosley

It should be on all versions of the page, all pointing to the one version.

Search engines will then see them all as one page.

dfeg

Hi Alan

Thanks for getting back to me so fast. I'm slightly confused on this, so an example might help. One of the pages is http://www.funkybumpmaternity.com/Maternity-Dresses.html.

There are 5 pages of dresses, with options on the left allowing you to narrow that down by color, brand, occasion and style. Every time you select an option or combination of options on the left, for example red, it will generate a page with only red dresses and a URL of http://www.funkybumpmaternity.com/Maternity-Dresses/View-all-products.html?product_type_1_Colour[0]=Red&product_type_1_Colour_comp=find_in_set_any&product_type_id=1

The number of options available is huge, which I believe is why I'm getting so many duplicate content issues on SEOMoz Pro. Google is handling the parameters fine.

How should I implement the canonical tag? Should I have a tag on all filter pages referencing page 1 of the dresses? Should pages 2-5 have the tag on them? If so would this mean that the dresses on these pages would not be indexed?

AlanMosley

This sounds more like a case for a canonical tag.

Don't exclude them with robots.txt; this is akin to cutting off your arm because you have a splinter in your finger.

When you exclude pages using robots.txt, link juice passing through links to these pages is lost.
