What is Considered Duplicate Content by Crawlers?
-
I am asking this because I have a couple of site audit tools that I use to crawl a site I work on every week and they are showing duplicate content issues (which I know there is a lot on this site) but some of what is flagged as duplicate content makes no sense.
For example, the following URL's were grouped together as duplicate content:
|
https://www.firefold.com/contact-us
|
| https://www.firefold.com/sale |
|
|
How are these pages duplicate content? I am confused on what site audit tools are considering duplicate content.
Just FYI, this is data from Moz crawl diagnostics but SEMrush site auditor is giving me the same type of data.
Any help would be greatly appreciated.
Ryan
-
Yea I just started working on this site. I haven't used Moz Analytics much so just wanting to see how their crawler crawls pages.
And yes I agree, there are a lot of BIG BIG BIG issues with this site.
I got a large workload over the next few months haha.
-
I would add that there's is no text on any of those three pages - any "text" one would see there is actually just embedded in an image - which is a huge issue for a number of reasons:
- Search engines see that there's no text - a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much.
- It's heavier this way - which makes load times slower.
I want to clarify that there are many, bigger issues with these pages - but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize, Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc. and then some images in the body of your pages. Hence, duplicate content.
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler will flag pages that have at least 90% similarity in the entire source code of the site so not just the body.
The way you want to interpret the report is the contact-us page has 35 duplicates, so "gabe" and "sale" are not dupes of each other in this section but are only each a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.
While on the front end the pages do not appear to be similar. The issue is likely with the amount of javascript code on those pages.
Our crawler cannot read javascript so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing as it returns 79% similarity using this tool: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
The factors considered in the new domain authority algorithm? On-site factors we can use to compare with competitors? Is having an NA as a spam score a bad thing?
Does anyone know the factors considered in the new domain authority algorithm other than spam score and complex distributions of links based on quality and traffic? Does anyone know of on-site factors we can use to compare with competitors to try and improve DA? Is having an NA as a spam score a bad thing?
Moz Bar | | CQMarketing0 -
Why isn't the Moz crawler getting all of my item pages?
I am stumped and Moz is being terrible to work with. This site has about 40k pages 39,800 of them are item pages roughly. Moz is only finding about 2400 of my pages. So they are missing most but not all of my item pages. I do not know which item pages they are missing. The fact that they are finding about 2k but not the rest leads me to believe the crawler is struggling with pagination. The site is built on Magento 2 and uses the Amasty Layered Navigation extension. Does anyone have any ideas?
Moz Bar | | Tylerj0 -
How can I find duplicate pages from a Moz Crawl?
We have many duplicate pages that show up on the Moz Crawl, and we're trying to fix these but it's very difficult because I can't see a way to isolate the code where the duplicate is found. For instance, http://experiencemission.org/immersion/ is one of our main pages, and the crawl shows one duplicate of http://experiencemission.org/immersion. It appears that one of our staff manually edited the source code in one of our pages but forgot the trailing slash. This would be an easy fix but the problem is that this page is linked to internally on our website 2423 times, so it's next to impossible to find the code that is incorrect. We have many other pages with this same basic problem. We know we have duplicates, but it's next to impossible to isolate them. So my question is this: When viewing the Moz Crawl data is there any way to see where a specific duplicate page link is located on our website? Thanks for any and all help!
Moz Bar | | expmission0 -
Duplicating a Report to monthly from weekly
Is there a way to duplicate a weekly report and also run it monthly without redesigning it from scratch? I have a great report a client loves but they also want to see it monthly. Thanks! Melinda
Moz Bar | | timesharecmo0 -
Moz crawler
I have a site which is in a non production status. Crawlers are blocked vis robot txt. User-agent: *
Moz Bar | | Emanuele_Ricci
Disallow: / I WANT TO MAKE A CRAWLING TEST WITH MOZ CRAWLER (RogerBot) ,
how can I allow your crawler to get in and prevent other crawlers from indexing the site? Thanks memok0 -
Duplicate content reported for totally different pages
Hi, The Moz report is showing just over 21,500 duplicate page issues on our site. This is more or less every page we have. However when I look at the pages it says are duplicates they are totally different (it could for example report that a news page for 2009 is the same as a product page just added which has no relation when you read the content or view the page). What sort of thing could it be picking up as duplicate content? I assume it must be something in the HTML for the site rather than the actual page content as there is no cross over at all on the pages highlighted. The only issue I can currently identify is that the menu for the mobile version of the site has a huge number of internal links which I will cut down. If the tools purely look at HTML content this could be seen as duplicate but shouldn't it be clever enough to realise what is content and what is site structure? Thanks,
Moz Bar | | TW-Steve0 -
Crawl Diagnostics - nofollow - reducing duplicate pages
Hi I'm looking at a crawl diagnostic report, I can see I have many duplicate pages, the reason for this is that when a brand filter is applied to a page. IE
Moz Bar | | chameleondm
www.mysite.com/mycategory - lets say this is the product listing page
www.mysite.com/category/mybrand - and this is the same page but with a brand filter applied
www.mysite.com/category/myotherbrand - and this is the same page but with a different brand filter applied I had intially appendeded the meta title, description and keywords with some extra content if a brand filter was applied, because the page on the whole does have different content. IE I would have a custom meta information, H1 tag and products on that page just for that specific brand.
However I am wondering if these two pages are really just competing with each other as lots of the content will be the same. Should I scrap that approach and use either nofollow on the brand filter link, or simply use a canonical. Thanks, James1 -
Duplicate page content
Hi guys the feedback form my campaign suggests I have to much duplicate page content. I’ve had a look at the CSV file but it doesn’t seem to be abundantly clear as to which pages on my site have the duplicate content. Can anyone tell which columns I need to refer to on the sheet, to ascertain this information. Also if the content is only slightly different, will Google still consider it to be duplicate? I look forward to hearing from you
Moz Bar | | Hardley1110