Duplicate Content Report: Duplicate URLs being crawled with "++" at the end
-
Hi,
In our Moz report over the past few weeks I've noticed some duplicate URLs appearing like the following:
Original (valid) URL:
http://www.paperstone.co.uk/cat_553-616_Office-Pins-Clips-and-Bands.aspx?filter_colour=Green
Duplicate URL:
http://www.paperstone.co.uk/cat_553-616_Office-Pins-Clips-and-Bands.aspx?filter_colour=Green++
These aren't appearing in Webmaster Tools or in a Screaming Frog crawl of our site, so I'm wondering whether this is a bug with the Moz crawler. I realise that it could be resolved using a canonical reference, or by performing a 301 redirect from the duplicate to the canonical URL, but I'd like to find out what's causing it and whether anyone else is experiencing the same problem.
Thanks,
George
-
So glad to help, George!
-
Hi Chiaryn,
Thanks - you've been really helpful! I had assumed that as the referrer wasn't in the Web UI (per WMT), it wasn't available anywhere. I'd also assumed it was a copywriting issue and not a product data issue.
I need to readdress my assumptions.
George
-
Hey George,
Thanks for writing in.
I looked into the pages with the ++ in the URL and it seems that they do actually exist on the site, so it isn't an issue with our crawler that is causing these in your crawl errors. For example, a link to the URL http://www.paperstone.co.uk/cat_553_Desktop-Essentials.aspx?filter_colour=Green++ can be found in the source code of the page http://www.paperstone.co.uk/cat_553_Desktop-Essentials.aspx here: http://screencast.com/t/HpHTlSs5gH8H
You can find the referring pages for the ++ URLs by downloading the Full Crawl Diagnostics CSV. In the first column, search for ++. When you find the correct row, look in column AM, labeled Referrer. This gives the URL of the page where our crawler first found a link to the URL that includes ++. You can then visit that page to find the links.
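For a large export, the same lookup can be scripted instead of searched by hand. The following is a minimal sketch, not Moz tooling: the column headers `URL` and `Referrer` and the sample rows are assumptions for illustration, so check them against the header row of the real CSV before running it on an export.

```python
import csv
import io


def find_referrers(csv_file, needle="++", url_col="URL", referrer_col="Referrer"):
    """Return (url, referrer) pairs for rows whose URL contains `needle`.

    `url_col` and `referrer_col` are assumed header names; adjust them to
    match the actual export.
    """
    reader = csv.DictReader(csv_file)
    return [
        (row[url_col], row[referrer_col])
        for row in reader
        if needle in (row.get(url_col) or "")
    ]


# Made-up rows standing in for the exported file.
sample = io.StringIO(
    "URL,Referrer\n"
    "http://example.com/a.aspx?filter_colour=Green,http://example.com/\n"
    "http://example.com/a.aspx?filter_colour=Green++,http://example.com/a.aspx\n"
)

matches = find_referrers(sample)
for url, referrer in matches:
    print(f"{url}  <- linked from {referrer}")
```

To run it against a real export, pass an open file instead of the in-memory sample, for example `open("crawl.csv", newline="", encoding="utf-8")`, where the file name is again hypothetical.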
Since these URLs with the ++ do resolve with a 200 HTTP status and have the same code and content as the pages without the ++, our crawler will count them as duplicate content. I'm not certain why Screaming Frog and GWT are not finding or reporting these pages; it may be that they parse the + signs in the URL differently than our crawler does.
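One possible reason the two URLs return the same page is that, under the common form-encoding convention, a + in a query string decodes to a space. The sketch below uses Python's standard `urllib.parse` to show that decoding. Whether the site then trims the trailing spaces is an assumption here, not something confirmed in this thread, and the helper name `colour_filter` is made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def colour_filter(url):
    """Return the decoded value of the filter_colour query parameter."""
    query = urlsplit(url).query
    values = parse_qs(query, keep_blank_values=True).get("filter_colour", [""])
    return values[0]


base = "http://www.paperstone.co.uk/cat_553-616_Office-Pins-Clips-and-Bands.aspx"
plain = colour_filter(base + "?filter_colour=Green")
plus = colour_filter(base + "?filter_colour=Green++")

print(repr(plain))            # 'Green'
print(repr(plus))             # 'Green  ' (each + decoded to a space)
print(plain == plus.strip())  # True: identical once whitespace is trimmed
```

If the server trims or ignores that trailing whitespace, both URLs select the same filter, which would explain identical content at two distinct addresses. A crawler that normalises the value the same way before recording the URL would then never list the ++ variant separately.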
As Keri and bishop23 mentioned, this is most likely not a major issue if GWT isn't reporting the errors, but we prefer to report the issues because we would rather be safe than sorry.
I hope this helps. Please let me know if you have any other questions.
Chiaryn
-
I'm not seeing an answer that jumps out at me for this one. For the immediate future, don't sweat it if you're not seeing it in GWT. This is assigned to our help desk, and we'll have someone from there investigate more and get back to you, though it might be a few days because of the Thanksgiving holiday (if you don't get an answer today, it may be Monday before we have a chance to respond).
-
If they're not appearing in WMT, then you should ignore them, unless they are exact duplicate content, in which case delete them.