What is Considered Duplicate Content by Crawlers?
-
I'm asking because I use a couple of site audit tools to crawl a site I work on every week, and they are flagging duplicate content issues (I know this site has a lot of genuine duplicate content), but some of what is flagged as duplicate content makes no sense.
For example, the following URLs were grouped together as duplicate content:
- https://www.firefold.com/contact-us
- https://www.firefold.com/sale
How are these pages duplicate content? I am confused about what site audit tools consider duplicate content.
Just FYI, this data is from Moz Crawl Diagnostics, but SEMrush's site auditor is giving me the same kind of data.
Any help would be greatly appreciated.
Ryan
-
Yeah, I just started working on this site. I haven't used Moz Analytics much, so I just wanted to see how its crawler crawls pages.
And yes, I agree, there are a lot of BIG BIG BIG issues with this site.
I've got a large workload over the next few months, haha.
-
I would add that there is no text on any of those three pages. Any "text" you see there is actually embedded in an image, which is a huge issue for a number of reasons:
- Search engines see that there's no text, which is a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much of it.
- It's heavier this way, which slows down load times.
I want to clarify that there are many bigger issues with these pages, but since your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize, Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc., and then some images in the body of your pages. Hence, duplicate content.
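To make the image-only point concrete, here is a minimal Python sketch that pulls the visible text out of two hypothetical pages. It assumes "page text" is roughly the text nodes that `html.parser` exposes, with script/style contents skipped and image alt text ignored for simplicity; the markup, file names, and the `visible_text` helper are made up for illustration and are not how any particular search engine or audit tool works.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents.

    Simplified model of "page text": <img> tags contribute nothing
    (alt text is ignored here), so words baked into a banner vanish.
    """

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical markup: same header/footer, body is a single image.
TEMPLATE = ("<html><body><header>Logo | Shop | Contact Us</header>"
            "<main>{}</main><footer>Footer links</footer></body></html>")
contact = TEMPLATE.format('<img src="contact-banner.jpg" alt="">')
sale = TEMPLATE.format('<img src="sale-banner.jpg" alt="">')

print(visible_text(contact))                        # Logo | Shop | Contact Us Footer links
print(visible_text(contact) == visible_text(sale))  # True
```

Once the images are set aside, the only text left on both pages is the shared header and footer, so the two pages read as identical.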
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler flags pages that are at least 90% similar across the entire source code of the page, not just the body.
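One rough way to picture a whole-source 90% rule is the sketch below, which compares two hypothetical pages line by line with Python's `difflib.SequenceMatcher`. The metric, the template, and the helper names are all assumptions for illustration; nothing in this thread says Moz uses difflib. The sketch shows how a large shared template can push the whole-source score over 0.90 even when the one line that differs (the body) has nothing in common.

```python
from difflib import SequenceMatcher

# Assumed cutoff, taken from the "at least 90% similar" figure above.
SIMILARITY_THRESHOLD = 0.90


def similarity(source_a: str, source_b: str) -> float:
    """Line-based similarity in [0, 1], where 1.0 means identical source.

    Stand-in metric only; the real crawler's measure is not documented here.
    autojunk=False disables difflib's "popular element" heuristic,
    which would otherwise kick in on long pages.
    """
    lines_a, lines_b = source_a.splitlines(), source_b.splitlines()
    return SequenceMatcher(None, lines_a, lines_b, autojunk=False).ratio()


def is_flagged_duplicate(source_a: str, source_b: str) -> bool:
    return similarity(source_a, source_b) >= SIMILARITY_THRESHOLD


def build_page(body: str) -> str:
    """Wrap a body in a hypothetical shared template (head, 40 nav links, footer)."""
    head = ["<html>", "<head>", '<link rel="stylesheet" href="/site.css">', "</head>", "<body>"]
    nav = [f'<a href="/category-{i}">Category {i}</a>' for i in range(40)]
    foot = ["<footer>Shipping | Returns | Contact</footer>", "</body>", "</html>"]
    return "\n".join(head + nav + [body] + foot)


contact_body = '<img src="contact-banner.jpg">'
sale_body = '<img src="sale-banner.jpg">'
contact, sale = build_page(contact_body), build_page(sale_body)

print(f"whole source: {similarity(contact, sale):.2f}")  # 48 of 49 lines shared
print(f"body only:    {similarity(contact_body, sale_body):.2f}")
print("flagged:", is_flagged_duplicate(contact, sale))
```

With 48 of 49 lines in common, the whole-source score comes out at about 0.98 while the body-only score is 0.00, so the pair is flagged even though the visible content differs completely.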
To interpret the report: the contact-us page has 35 duplicates, so "gabe" and "sale" are not duplicates of each other in this section; each is only a duplicate of "contact-us". Those URLs might appear further down in the report with their own lists of duplicates.
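The report layout described above (one row per page with its own list of matches, and no merging into clusters) can be sketched as follows. The pages are hypothetical sets of integer "shingle" IDs, and Jaccard similarity with a 0.90 cutoff stands in for whatever measure the crawler actually uses. Because "above a threshold" is not a transitive relation, "sale" and "gabe" can each match "contact-us" without matching each other.

```python
from itertools import combinations

# Assumed cutoff, echoing the 90% figure above; Jaccard is a stand-in metric.
THRESHOLD = 0.90


def jaccard(a: set[int], b: set[int]) -> float:
    """Share of items the two pages have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def duplicate_report(pages: dict[str, set[int]]) -> dict[str, list[str]]:
    """One entry per page listing every other page at or above THRESHOLD.

    Pairs are checked independently; matches are never merged into
    transitive clusters, so a row only says "similar to this page".
    """
    report: dict[str, list[str]] = {name: [] for name in pages}
    for first, second in combinations(pages, 2):
        if jaccard(pages[first], pages[second]) >= THRESHOLD:
            report[first].append(second)
            report[second].append(first)
    return report


# Hypothetical shingle IDs: all three pages share most of one template.
template = set(range(100))
pages = {
    "contact-us": template,
    "sale": (template - set(range(0, 5))) | set(range(100, 105)),
    "gabe": (template - set(range(95, 100))) | set(range(105, 110)),
}

for name, matches in duplicate_report(pages).items():
    print(name, "->", matches)
```

Here "contact-us" lists both "sale" and "gabe" (95/105, about 0.905 each), while "sale" and "gabe" each list only "contact-us" because their own overlap is 90/110, about 0.82.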
Although the pages do not appear similar on the front end, the issue is likely the amount of JavaScript on those pages.
Our crawler cannot read JavaScript, so we are likely only able to see the page template. Other search tools are probably seeing the same thing, as this tool returns 79% similarity for these pages: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
-
Related Questions
-
Duplicate Content & Title Tag Group Fields on Moz Report
Hello, on my exported Moz Site Crawl CSV report, I have columns for Duplicate Content Group and Duplicate Title Tag Group. The values in these columns are numbers (20, 5, 15, etc.). Can anyone explain what these values represent and how I can fix the issues I presume they indicate? Thank you,
-
Is Moz any good for analyzing an e-commerce site? How can a CMS page be seen as duplicate content of a category page?
Hi guys, I've been using Moz for quite a long time now for 2 of my shops. Now I am in the process of launching the second shop, and I just don't understand how it is possible for a static CMS page (About Us) to be flagged as duplicate content with 96 other pages, including product pages and completely different pages such as delivery information, category pages, returns, and so on. Really, Moz?? Is it me or you?? Your help would be much appreciated! Thank you!
-
Moz Crawler Causing Server Timeouts... Crawling thousands of non-existent pages with query parameters
The Moz crawler is crawling all pages like these:
http://www.xxxx.com/?product_count=100&product_order=desc&product_orderby=date
http://www.xxxx.com/?product_count=100&product_order=desc&paged=1
http://www.xxx.com/?product_count=100&product_order=desc&product_view=grid
Last month it crawled 80,000 pages on a site with fewer than 100 pages. Is there a way to select only certain pages to be crawled? It is still crawling this site right now; it started Monday morning and it's Tuesday midday. Every Monday it causes timeouts on our server from high bandwidth use. We're about to delete this client from the account unless someone can give us a solution. Thanks.
-
Moz Content --- for SEO or simply user engagement?
What is the primary function of Moz Content? It looks like it is most useful for managing content as a user engagement tool. Our content strategy is centered on boosting organic placement - with user engagement as a nice but unessential side product. Besides providing general descriptive details of a site's content / authorship - how can Moz Content help with SEO?
-
Duplicate content reported for totally different pages
Hi, The Moz report is showing just over 21,500 duplicate page issues on our site. This is more or less every page we have. However when I look at the pages it says are duplicates they are totally different (it could for example report that a news page for 2009 is the same as a product page just added which has no relation when you read the content or view the page). What sort of thing could it be picking up as duplicate content? I assume it must be something in the HTML for the site rather than the actual page content as there is no cross over at all on the pages highlighted. The only issue I can currently identify is that the menu for the mobile version of the site has a huge number of internal links which I will cut down. If the tools purely look at HTML content this could be seen as duplicate but shouldn't it be clever enough to realise what is content and what is site structure? Thanks,
-
Duplicate page content/page titles on tags
Hello everyone. New to the community and loving it already. Question: I am receiving an error for 6 pages with duplicate content and page titles. The majority of these are tag pages. Should I be worried about these? In the column listing duplicate URLs, it shows 0 (screenshot: http://screencast.com/t/azvuVk0ucWt). Are these tags a problem? Will SEO be hurt because of this? What are tag pages? Are they actual pages or categories? Should I eliminate these?
-
Duplicate page titles
Hi -- A crawl tells me I have 200 duplicate page titles. Unfortunately, it doesn't tell me what those pages are duplicating. What do I do with this information? How do I begin to respond? Thanks
-
Crawl Diagnostics - nofollow - reducing duplicate pages
Hi, I'm looking at a crawl diagnostics report and can see I have many duplicate pages. They occur when a brand filter is applied to a page, e.g.:
www.mysite.com/mycategory - let's say this is the product listing page
www.mysite.com/category/mybrand - and this is the same page but with a brand filter applied
www.mysite.com/category/myotherbrand - and this is the same page but with a different brand filter applied
I had initially appended some extra content to the meta title, description, and keywords when a brand filter was applied, because the page as a whole does have different content, i.e., I would have custom meta information, an H1 tag, and products on that page just for that specific brand.
However, I am wondering if these pages are really just competing with each other, as a lot of the content will be the same. Should I scrap that approach and either use nofollow on the brand filter link or simply use a canonical? Thanks, James