What is Considered Duplicate Content by Crawlers?
-
I am asking because I crawl a site I work on every week with a couple of site audit tools, and they are reporting duplicate content issues. I know this site has a lot of duplicate content, but some of what is flagged makes no sense.
For example, the following URLs were grouped together as duplicate content:
https://www.firefold.com/contact-us
https://www.firefold.com/sale
How are these pages duplicate content? I am confused about what site audit tools consider duplicate content.
Just FYI, this data is from Moz crawl diagnostics, but the SEMrush site auditor is giving me the same type of data.
Any help would be greatly appreciated.
Ryan
-
Yeah, I just started working on this site. I haven't used Moz Analytics much, so I just wanted to see how their crawler crawls pages.
And yes, I agree, there are a lot of BIG BIG BIG issues with this site.
I've got a large workload over the next few months, haha.
-
I would add that there is no text on any of those three pages. Any "text" one would see there is actually embedded in an image, which is a huge issue for a number of reasons:
- Search engines see that there's no text - a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much.
- It's heavier this way - which makes load times slower.
I want to clarify that there are many bigger issues with these pages, but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize: Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc., and then some images in the body of your pages. Hence, duplicate content.
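To make the point about image-only pages concrete, here is a minimal sketch of what a text-only reader is left with. It assumes a made-up page template (the `HEADER`, `FOOTER`, and image file names are hypothetical, not taken from the real site) and uses Python's standard `html.parser`. It is only an illustration, not how any particular search engine or audit tool actually extracts text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader that ignores images and scripts would see."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical shared template; the body of each page is a single image.
HEADER = "<html><body><div id='nav'>Home Shop Sale Contact</div>"
FOOTER = "<div id='footer'>Footer links</div></body></html>"
contact = HEADER + "<img src='contact.png' alt=''>" + FOOTER
sale = HEADER + "<img src='sale.png' alt=''>" + FOOTER

print(visible_text(contact))
print(visible_text(contact) == visible_text(sale))
```

In this toy case the only text left on either page is the shared navigation and footer, so the two pages look identical to anything that reads text only.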
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler will flag pages that have at least 90% similarity across the entire source code of the page, not just the body.
The way to interpret the report is that the contact-us page has 35 duplicates, so "gabe" and "sale" are not dupes of each other in this section; each is only a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.
While the pages do not appear similar on the front end, the issue is likely the amount of JavaScript code on those pages.
Our crawler cannot read JavaScript, so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing, since this tool returns 79% similarity: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
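For anyone who wants to experiment with the idea, the 90% threshold and the "duplicates of one reference page" grouping described above can be roughly sketched as follows. This is an assumption-heavy illustration: the thread does not say how the crawler measures similarity, so `difflib.SequenceMatcher` from the Python standard library is swapped in as a stand-in, and the page sources, URLs, and helper names (`similarity`, `duplicates_of`, `page`) are all hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.90  # the 90% figure quoted above


def similarity(a, b):
    """Ratio in [0, 1] of matching characters between two page sources."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def duplicates_of(reference, pages, threshold=THRESHOLD):
    """List the URLs whose raw source is at least `threshold` similar to
    the reference page. Pages are compared to the reference only, not to
    each other."""
    ref_source = pages[reference]
    return sorted(
        url
        for url, source in pages.items()
        if url != reference and similarity(ref_source, source) >= threshold
    )


# Hypothetical raw sources: a large shared template and a small body.
HEADER = (
    "<html><head><script src='/app.js'></script></head><body><nav>"
    + "<a href='/category'>Category</a>" * 20
    + "</nav>"
)
FOOTER = "<footer>Shared footer links</footer></body></html>"


def page(body):
    return HEADER + body + FOOTER


pages = {
    # Content filled in by JavaScript: the raw source is almost all template.
    "/contact-us": page("<div id='contact-app'></div>"),
    "/sale": page("<div id='sale-app'></div>"),
    # A page with plenty of unique text in the raw source.
    "/blog/post": page("<p>" + "A long, unique article paragraph. " * 40 + "</p>"),
}

print(duplicates_of("/contact-us", pages))  # ['/sale']
```

Because each group is built relative to one reference page, two URLs in the same group are not guaranteed to be within the threshold of each other, which matches how the report is meant to be read.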