Crawl reports URLs with duplicate content, but it's not the case
-
Hi guys!
Some hours ago I received my crawl report. I noticed several records for URLs with duplicate content, so I opened those URLs one by one.
None of those URLs really had duplicate content, but I have a concern: the website is a product showcase, and many articles are just images with an href behind them. Many of those articles use the same images, so maybe that's why the SEOmoz crawler raises the duplicate content flag. I wonder if Google has a problem with that too. See for yourself what it looks like:
http://by.vg/NJ97y
http://by.vg/BQypE
Those two URLs are flagged as duplicates. Please mind the language (Greek) and try to focus on the URLs and content.
PS: My example is simplified just for the purpose of my question.
-
Disclaimer: I just answered a question just like this on another thread, so I literally copied and pasted my response from there, and edited where necessary.
The SEOmoz web app flags two pages as duplicates when 95% or more of their HTML code is the same. This takes everything on the page into account, both hidden and visible.
In this case, it's counting all of the navigation and the sidebar as well, which is significant. What's left, the unique content that actually matters, makes up less than 5% of the code. Here's a tool you can use to check the similarity: http://www.duplicatecontent.net/
I ran the pages through a couple of tools, which showed 98% similarity (but only 75% text similarity, which is good, but not great).
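If you want to see for yourself how shared navigation can push two pages over a 95% threshold, here's a rough sketch in Python. To be clear, this is not the SEOmoz crawler's actual algorithm; it just uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The names `compare_pages`, `text_only`, and `DUPLICATE_THRESHOLD`, and the sample pages, are made up for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Assumption: mirrors the 95% figure mentioned above; not an official constant.
DUPLICATE_THRESHOLD = 0.95


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def text_only(html):
    """Return the text nodes of the page joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 (a stand-in metric, not Moz's)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def compare_pages(html_a, html_b):
    """Return (html_score, text_score, flagged_as_duplicate)."""
    html_score = similarity(html_a, html_b)
    text_score = similarity(text_only(html_a), text_only(html_b))
    return html_score, text_score, html_score >= DUPLICATE_THRESHOLD


# Hypothetical pages: a large shared navigation block, a tiny unique body.
NAV = "<li><a href='/c'>Category</a></li>" * 40
page_a = "<html><body><ul>" + NAV + "</ul><p>Red chair</p></body></html>"
page_b = "<html><body><ul>" + NAV + "</ul><p>Blue table</p></body></html>"

html_score, text_score, flagged = compare_pages(page_a, page_b)
print(f"HTML similarity: {html_score:.2%}")
print(f"Text similarity: {text_score:.2%}")
print("Flagged as duplicate:", flagged)
```

The two sample pages get flagged even though their only real content is different, because the template markup dwarfs the unique part. Adding more unique text (or trimming repeated markup) is what brings the score down.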
SEOKeith is absolutely right that there's very little on those pages to help them rank. Without text, you're fighting an uphill battle.
Hope this helps! Best of luck with your SEO.
-
Yeah, that's what I'm going to do in my next meeting. Either way, I also feel such websites need more pictures than anything else. Maybe a blog page or separate pages with articles could link to those products one by one with related descriptions, acting as a side content site for the actual product pages.
-
Maybe explain to the client it's not going to rank as well without text and has less chance of getting found by searches (generally speaking...).
I get duplicate content flags as well sometimes. When it happens, I check the pages manually.
-
Thanks Keith. I've only been using SEOmoz for a few days, so I wasn't sure about this.
The client wants a website with as little text as possible, so I guess my only hopes are title and alt attributes.
-
Those pages are very similar, so that's probably what is tripping the duplicate content flag in SEOmoz. You might want to ignore it in this case.
Personally, I would add some more text to those pages to help with ranking. You can position the text over the images with CSS.