Internal Duplicate Content Question...
-
We are looking for an internal duplicate content checker that is capable of crawling a site that has over 300,000 pages. We have looked over Moz's duplicate content tool and it seems like it is somewhat limited in how deep it crawls. Are there any suggestions on the best "internal" duplicate content checker that crawls deep in a site?
-
If you want a free test crawl, use this:
https://www.deepcrawl.com/forms/free-crawl-report/
Please remember that URIs and URLs are different, so your site with 300,000 URLs might have 600,000 URIs. If you want to see how it works for free, you can sign up for a free crawl of your first 10,000 pages.
I am not affiliated with the company aside from being a very happy customer.
-
By far the best is going to be DeepCrawl. It automatically connects to Google Webmaster Tools and Google Analytics.
It can crawl continuously. The real advantage is setting it to five URLs per second; depending on the speed of your server, it will hold that rate consistently. I would not go over five pages per second. Make sure that you pick the dynamic IP option if you do not have a strong web application firewall. If you do pick a single static IP, you can crawl the entire site without issue by whitelisting it. This is my personal opinion, but I know that what you're asking for can be accomplished in literally no time compared to other systems by using DeepCrawl (deepcrawl.com).
It will show you what duplicate content is contained inside your website: duplicate URLs, duplicate title tags, you name it.
https://www.deepcrawl.com/knowledge/best-practice/seven-duplicate-content-issues/
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
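For anyone curious what an internal duplicate check amounts to once the pages have been crawled, here is a minimal Python sketch. It is not how DeepCrawl works internally. The `pages` mapping, the `find_duplicates` name, and the sample URLs are all made up for illustration, and it only catches pages that are identical after case and whitespace normalization, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return " ".join(text.lower().split())


def find_duplicates(pages):
    """Group URLs whose titles or body text are identical after normalization.

    `pages` is a hypothetical mapping of URL -> (title, body_text) that a
    crawler would have produced; nothing here fetches anything.
    """
    by_title = defaultdict(list)
    by_body = defaultdict(list)
    for url, (title, body) in pages.items():
        by_title[normalize(title)].append(url)
        # Hash the body so large pages are not kept in full as dictionary keys.
        digest = hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()
        by_body[digest].append(url)
    dup_titles = [sorted(urls) for urls in by_title.values() if len(urls) > 1]
    dup_bodies = [sorted(urls) for urls in by_body.values() if len(urls) > 1]
    return dup_titles, dup_bodies


# Made-up sample data standing in for crawl output.
sample_pages = {
    "https://example.com/a": ("Blue Widgets", "Our blue widgets are great."),
    "https://example.com/a?ref=nav": ("Blue Widgets", "Our  blue widgets are GREAT."),
    "https://example.com/b": ("Red Widgets", "Our red widgets are great."),
}
dup_titles, dup_bodies = find_duplicates(sample_pages)
```

In this sample, the two `/a` URLs end up in the same group for both titles and bodies, which is the kind of report a hosted crawler gives you at a much larger scale.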
You have a decent-sized website, and I would recommend adding the free edition of Robotto (Robotto.org). Robotto can detect whether a preferred www or non-www option has been configured correctly.
Many issues with web application firewalls, CDNs, you name it, can be detected using this tool, and the combination of the two is a real one-two punch. I honestly think that you will be happy with it. I have had issues with anything local, like Screaming Frog, when crawling such large websites; you do not want to depend on your desktop RAM. I hope you will let me know if this is a good solution for you. I know that it works very, very well, and it will not stop crawling until it finds everything. Your site will be finished before 24 hours are done.
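To make the www versus non-www point concrete, here is a rough Python sketch of the underlying idea: every entry-point variant should end up on one and the same host. This is an assumption about what such a check looks like, not Robotto's actual implementation. The `final_urls` mapping and the function name are hypothetical, and the redirect results would have to be collected separately with an HTTP client.

```python
from urllib.parse import urlsplit


def preferred_host_configured(final_urls):
    """Return True if every entry-point URL ended up on one and the same host.

    `final_urls` is a hypothetical mapping of requested URL -> URL reached
    after following redirects; this function does no network access itself.
    """
    hosts = {urlsplit(final).hostname for final in final_urls.values()}
    return len(hosts) == 1


# Made-up results: the non-www variant redirects to www, so one host wins.
good = {
    "http://example.com/": "https://www.example.com/",
    "http://www.example.com/": "https://www.example.com/",
}
# Made-up results: both variants answer directly, so the site is served twice.
bad = {
    "http://example.com/": "http://example.com/",
    "http://www.example.com/": "http://www.example.com/",
}
```

With these samples, `preferred_host_configured(good)` is true and `preferred_host_configured(bad)` is false; the second case is the one that tends to produce duplicate URLs for every page.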
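As a sanity check on the timing, the arithmetic is easy to run yourself. This is a back-of-the-envelope Python sketch that assumes a perfectly steady crawl rate with no pauses or retries; the function name is just for illustration.

```python
def crawl_hours(url_count, urls_per_second):
    """Hours needed to fetch url_count URLs at a steady urls_per_second."""
    return url_count / urls_per_second / 3600


hours_300k = crawl_hours(300_000, 5)  # about 16.7 hours
hours_600k = crawl_hours(600_000, 5)  # about 33.3 hours
```

So at five URLs per second, roughly 300,000 URLs fit inside a day, while a crawl that turns out to cover 600,000 URIs would take proportionally longer.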
-
Correct, Thomas. We are not looking to restructure the site at this time but we are looking for a program that will crawl 300,000 plus pages and let us know which internal pages are duplicated.
-
If the tool has to crawl beyond a crawl depth of 100, it is very common to find something that's able to do it. Like I said, DeepCrawl, Screaming Frog, and Moz are able to, but you're talking about finding content that shouldn't be restructured.
-
If you are looking for the most powerful tool for crawling websites, deepcrawl.com is the king. Screaming Frog is good, but it depends on the RAM on your desktop and does not have as many features as DeepCrawl.
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
-
Check out Siteliner. I've never tried it with a site that big, personally. But it's free, so worth a shot to see what you can get out of it.