Duplicate Content Report: Duplicate URLs being crawled with "++" at the end
-
Hi,
In our Moz report over the past few weeks I've noticed some duplicate URLs appearing like the following:
Original (valid) URL:
http://www.paperstone.co.uk/cat_553-616_Office-Pins-Clips-and-Bands.aspx?filter_colour=Green
Duplicate URL:
http://www.paperstone.co.uk/cat_553-616_Office-Pins-Clips-and-Bands.aspx?filter_colour=Green++
These aren't appearing in Webmaster Tools or in a Screaming Frog crawl of our site, so I'm wondering if this is a bug in the Moz crawler. I realise it could be resolved with a canonical reference, or by 301-redirecting the duplicate to the canonical URL, but I'd like to find out what's causing it and whether anyone else is experiencing the same problem.
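For what it's worth, the 301 fix mentioned above boils down to normalising the query string before (or instead of) serving the page. The site is ASP.NET, so a real rule would live in IIS/web.config; this Python sketch only illustrates the normalisation logic, with a hypothetical `canonicalise` helper:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalise(url: str):
    """Strip trailing '+' characters from the query string.

    Returns (301, cleaned_url) when a redirect would be needed,
    or (200, url) when the URL is already canonical. This is only
    a sketch of the logic, not the real IIS configuration.
    """
    parts = urlsplit(url)
    cleaned_query = parts.query.rstrip("+")
    cleaned = urlunsplit(parts._replace(query=cleaned_query))
    return (301, cleaned) if cleaned != url else (200, url)

status, target = canonicalise(
    "http://www.paperstone.co.uk/cat_553-616_Office-Pins-Clips-and-Bands.aspx"
    "?filter_colour=Green++"
)
print(status, target)  # 301 ...?filter_colour=Green
```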
Thanks,
George
-
So glad to help, George!
-
Hi Chiaryn,
Thanks - you've been really helpful! I had assumed that, because the referrer wasn't in the web UI (as in WMT), it wasn't available anywhere. I'd also assumed it was a copywriting issue rather than a product data issue.
I need to reassess my assumptions.
George
-
Hey George,
Thanks for writing in.
I looked into the pages with the ++ in the URL and it seems that they do actually exist on the site, so it isn't an issue with our crawler that is causing these in your crawl errors. For example, a link to the URL http://www.paperstone.co.uk/cat_553_Desktop-Essentials.aspx?filter_colour=Green++ can be found in the source code of the page http://www.paperstone.co.uk/cat_553_Desktop-Essentials.aspx here: http://screencast.com/t/HpHTlSs5gH8H
You can find the referral pages for the ++ URLs by downloading the Full Crawl Diagnostics CSV. In the first column, search for "++". When you find the relevant row, look in the column labeled "referrer" (spreadsheet column AM). This tells you the URL of the page where our crawlers first found the ++ URLs. You can then visit that page to find the links to them.
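If the CSV is large, the search-and-look-up step above is easy to script. A minimal sketch using Python's standard `csv` module, with a tiny inline CSV standing in for the real export (the actual file has many more columns, with "referrer" around column AM):

```python
import csv
import io

# Hypothetical rows mimicking the Full Crawl Diagnostics CSV export.
csv_text = """URL,referrer
http://www.example.com/page.aspx?filter_colour=Green,http://www.example.com/
http://www.example.com/page.aspx?filter_colour=Green++,http://www.example.com/page.aspx
"""

# Find every crawled URL containing "++" and report where it was first linked.
hits = [
    (row["URL"], row["referrer"])
    for row in csv.DictReader(io.StringIO(csv_text))
    if "++" in row["URL"]
]
for url, referrer in hits:
    print(f"{url} was first found on {referrer}")
```

Point the `DictReader` at the downloaded CSV file instead of `io.StringIO` to run it against a real export.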
Since these URLs with the ++ do resolve with a 200 HTTP status and serve the same code and content as the pages without the ++, our crawler counts them as duplicate content. I'm not certain why Screaming Frog and GWT aren't finding or reporting these pages; it may be that they parse the + signs in the URL differently than our crawler does.
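One plausible source of the parsing difference: in the application/x-www-form-urlencoded scheme, a "+" in a query string decodes to a space, so `filter_colour=Green++` is really "Green" followed by two spaces. A quick check with Python's standard `urllib.parse`:

```python
from urllib.parse import parse_qs, unquote_plus

# "+" in a query string is the urlencoded form of a space,
# so "Green++" decodes to "Green" plus two trailing spaces.
params = parse_qs("filter_colour=Green++")
print(params)                    # {'filter_colour': ['Green  ']}
print(unquote_plus("Green++"))   # 'Green  '
```

A crawler that decodes and re-normalises query values this way could end up treating the two URLs as one, while a crawler comparing raw URLs would see two distinct pages.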
As Keri and bishop23 mentioned, this is most likely not a major issue if GWT isn't reporting the errors, but we prefer to report the issues because we would rather be safe than sorry.
I hope this helps. Please let me know if you have any other questions.
Chiaryn
-
I'm not seeing an answer that jumps out at me for this one. For the immediate future, don't sweat it if you're not seeing it in GWT. This is assigned to our help desk, and we'll have someone from there investigate more and get back to you, though it might be a few days because of the Thanksgiving holiday (if you don't get an answer today, it may be Monday before we have a chance to respond).
-
If they're not appearing in WMT, then you can generally ignore them; but if a page is an exact content duplicate, remove it.