How to remove duplicate content due to URL parameters from SEOMoz Crawl Diagnostics
-
Hello all
I'm currently getting back over 8,000 crawl errors for duplicate content pages. It's a Joomla site with VirtueMart, and 95% of the errors are for parameters in the URL that the customer can use to filter products.
Google is handling them fine under the URL Parameters settings in Webmaster Tools, but it's pretty hard to find the other duplicate content issues in SEOMoz with all of these in the way.
All of the problem parameters start with
?product_type_
Should I try to use robots.txt to stop them from being crawled, and if so, what would be the best way to include them in robots.txt?
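If it comes to that, I'm guessing the robots.txt entry would be something along these lines, using a wildcard to catch query strings starting with product_type_ (just my guess at the syntax, and I know not every crawler respects wildcards):
User-agent: *
Disallow: /*?product_type_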
Any help greatly appreciated.
-
Hi Tom
It took a while, but I got there in the end. I was using Joomla 1.5 and downloaded a component called "Tag Meta", which allows you to insert tags, including the canonical tag, on specific URLs or, more importantly, on URLs which begin in a certain way. How you use it depends on how your SEF URLs are set up or which SEF component you are using, but you can put a canonical tag on every URL in a section that has view-all-products in it.
So in one of my examples I put a canonical tag pointing to /maternity-tops.html (my main category page for that section) on every URL that began with /maternity-tops/view-all-products.
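The end result on those pages is just an ordinary canonical link in the head, something like this (that's my own domain and category page, so swap in yours):
<link rel="canonical" href="http://www.funkybumpmaternity.com/maternity-tops.html" />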
I hope this is of help to you. It takes a bit of playing around with, but it worked for me. The component also has fairly good documentation.
Regards
Damien
-
Damien,
Are you able to explain how you were able to do this within VirtueMart?
Thanks
Tom
-
So leave the 5 pages of dresses as they are, because they are all original, but put the canonical tag on all of the filter parameter pages, pointing to page 1 of dresses.
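So on a filtered URL like the red dresses example, the head would carry something like this, pointing back at the main category page (if I've understood you correctly):
<link rel="canonical" href="http://www.funkybumpmaternity.com/Maternity-Dresses.html" />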
Thank you for your help, Alan
-
It should be on all versions of the page, all pointing to the one version.
Search engines will then see them all as one page.
-
Hi Alan
Thanks for getting back to me so fast. I'm slightly confused on this, so an example might help. One of the pages is http://www.funkybumpmaternity.com/Maternity-Dresses.html.
There are 5 pages of dresses, with options on the left allowing you to narrow that down by colour, brand, occasion and style. Every time you select an option or combination of options on the left, for example red, it generates a page with only red dresses and a URL of http://www.funkybumpmaternity.com/Maternity-Dresses/View-all-products.html?product_type_1_Colour[0]=Red&product_type_1_Colour_comp=find_in_set_any&product_type_id=1
The number of possible combinations is huge, which I believe is why I'm getting so many duplicate content issues in SEOMoz Pro. Google is handling the parameters fine.
How should I implement the canonical tag? Should I have a tag on all filter pages referencing page 1 of the dresses? Should pages 2-5 have the tag on them? If so, would this mean that the dresses on those pages would not be indexed?
-
This sounds more like a case for a canonical tag.
Don't exclude them with robots.txt; that is akin to cutting off your arm because you have a splinter in your finger.
When you exclude pages with robots.txt, the link juice passing through links to those pages is lost.