What is Considered Duplicate Content by Crawlers?
-
I am asking this because I have a couple of site audit tools that I use to crawl a site I work on every week and they are showing duplicate content issues (which I know there is a lot on this site) but some of what is flagged as duplicate content makes no sense.
For example, the following URL's were grouped together as duplicate content:
|
https://www.firefold.com/contact-us
|
| https://www.firefold.com/sale |
|
|
How are these pages duplicate content? I am confused on what site audit tools are considering duplicate content.
Just FYI, this is data from Moz crawl diagnostics but SEMrush site auditor is giving me the same type of data.
Any help would be greatly appreciated.
Ryan
-
Yea I just started working on this site. I haven't used Moz Analytics much so just wanting to see how their crawler crawls pages.
And yes I agree, there are a lot of BIG BIG BIG issues with this site.
I got a large workload over the next few months haha.
-
I would add that there's is no text on any of those three pages - any "text" one would see there is actually just embedded in an image - which is a huge issue for a number of reasons:
- Search engines see that there's no text - a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much.
- It's heavier this way - which makes load times slower.
I want to clarify that there are many, bigger issues with these pages - but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize, Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc. and then some images in the body of your pages. Hence, duplicate content.
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler will flag pages that have at least 90% similarity in the entire source code of the site so not just the body.
The way you want to interpret the report is the contact-us page has 35 duplicates, so "gabe" and "sale" are not dupes of each other in this section but are only each a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.
While on the front end the pages do not appear to be similar. The issue is likely with the amount of javascript code on those pages.
Our crawler cannot read javascript so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing as it returns 79% similarity using this tool: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Content suggestion
Hello, What is the secret sauce of your content suggestion. How do you consider that a topic is covered. Please explain. Thank you, Flvien
Moz Bar | | seoanalytics0 -
Are there any tricks for checking duplicate content?
Hello all, My MOZ weekly scan keeps coming back with indication for duplicate content on pages that don't have that much alike. I feel like I must be missing something. Is there any place I can plug in two urls so that it would tell me what the similarities are and figure out how to make it less of a duplicate content? https://www.stage32.com/happy-writers/pitch-sessions/Pitch-Amar-Hansen-Saturday-October-29th-2016
Moz Bar | | stage32
and
https://www.stage32.com/happy-writers/pitch-sessions/Pitch-Will-Raynor-Wednesday-November-16th-2016 OR https://www.stage32.com/webinars/Film-Contracts-101-Everything-You-Need-To-Know
and
https://www.stage32.com/webinars/Breaking-Down-IP-Intellectual-Property-For-Development Any ideas? Thanks in advance.0 -
Error in Duplicate Content Being Reported - Pages Aren't Actually Duplicates
The recent crawl of one of our sites revealed a high number of duplicate content issues. However, when I viewed the report for pages with duplicate content I noticed almost all of them are not duplicates. For example, these two pages are marked as dupes:
Moz Bar | | M_D_Golden_Peak
https://www.writersstore.com/publishers/hollywood-creative-directory
https://www.writersstore.com/authors/g-miki-hayden These are thin as far as content goes but definitely not duplicates. Any recommendations or ways to adjust the settings so that these false positives aren't clogging up our site crawl report?0 -
Site Crawl report show strange duplicate pages
Beginning in early in Feb, we got a big bump in duplicate pages. The URLs of the pages are very odd: Example URL:
Moz Bar | | Neo4j
http://firstname.lastname@website.com/dir/page.php
is duplicate with http://website.com/dir/page.php I checked though the site, nginx conf files, and referral pages, and could not find what is prefixing the pages with 'http://firstname.lastname@'. Any ideas? The person whose name is 'Firstname Lastname' is stumped as well. Thanks.0 -
Crawl Diagnostics: How many pages (deep) will it crawl for dup content
Does anyone know how deep the crawl diagnostics will crawl when searching for dup content? Will it crawl the entire site, or will it only crawl "x" amount of pages? Thanks!
Moz Bar | | tdawson090 -
Perplexed by last MOZ crawling duplicate content errors
In the last crawler issues report from MOZ I can see many many pages listed as duplicate content with 0 duplicate urls. Like this: http://imgur.com/fbikRVq I am puzzled, what does it mean?
Moz Bar | | max.favilli0 -
Dupe content report showing in 'Errors' section when surely should be in 'Warnings' section ?
Why is the dupe content info showing in errors and not warnings ? Since if dupe content can get your site penalised (as per Panda) or worse banned, surely it should be in that section of reports ? Cheers
Moz Bar | | Dan-Lawrence
Dan0 -
Path of Moz Crawler Report?
Hi, I'm finding path of moz crawler report. Please tell me where I can find this tool?. Thanks
Moz Bar | | KLLC0