Internal Duplicate Content Question...
-
We are looking for an internal duplicate content checker that is capable of crawling a site that has over 300,000 pages. We have looked over Moz's duplicate content tool and it seems like it is somewhat limited in how deep it crawls. Are there any suggestions on the best "internal" duplicate content checker that crawls deep in a site?
-
If you want to a free test to crawl use this
https://www.deepcrawl.com/forms/free-crawl-report/
Please remember that URIs & URLs are different so your site with 300,000 URLs might have 600,000 URIs if you want to see how it works for free you can sign up for a free crawl for your first 10,000 pages.
I am not affiliated with the company aside from being a very happy customer.
-
Far no way the Best is going to be deep Crawl it automatically connects to Google Webmaster tools and analytics.
it can crawl constantly for ever. The real advantage is setting it to five URLs per second and depending on the speed of your server it will do it consistently I would not go over five pages per second. Make sure that you pick a dynamic IP structuring if you do not have a strong web application firewall if you do pick a single static IP then you can crawl the entire tire site without issue by white listing it. Now this is my personal opinion and I know what you're asking to be accomplished in the literally no time compared to other systems using deep crawl deepcrawl.com
It will show you what duplicate content is contained inside your website duplicate URLs what duplicate title tags you name it.
https://www.deepcrawl.com/knowledge/best-practice/seven-duplicate-content-issues/
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
You have a decent sized website and I would recommend adding a free edition of Robotto.org Robotto, can detect whether a preferredwww or non-www option has been configured correctly.
A lot of issues with web application firewall and CDNs you name it can be detected using the school and the combination of them is a real one-two punch. I honestly think that you will be happy with this tool. I have had issues with anything local like screaming frog when crawling surcharge websites you do not want to depend on your desktop ram. I hope you will let me know if this is a good solution for you I know that it works very very well and it will not stop crawling until it finds everything. Your site will be finished before 24 hours are done.
-
Correct, Thomas. We are not looking to restructure the site at this time but we are looking for a program that will crawl 300,000 plus pages and let us know which internal pages are duplicated.
-
If the tool has to crawl more than a crawl depth of 100 it is very common to find something that's able to do it. Like a said deep crawl, screaming frog & Moz is but you're talking about finding content that shouldn't be restructured.
-
If you looking for the most powerful tool for crawling websites deepcrawl.com is the king. Screaming frog it Is good but is dependent on RAM on your desktop. And does not have as many features as deep crawl
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
-
Check out Siteliner. I've never tried it with a site that big, personally. But it's free, so worth a shot to see what you can get out of it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to answer questions when there no questions for my keyword
Hello, Let's say I want to rank on "Alsace bike tour" whatever tool I use Moz keyword explorer, google suggest , keyword.io, answer the public ... there are not questions... so... what do I need to answer ? I imagine that for google there are some questions more relevant than others ? Should I answer do I need to bring my own bike or where will I go... ? and will google give me "points " for answering those questions even though people don't have questions... For the keyword title tag, it is easy, people ask the character limit, title tag generator and so on but for may keywords like that ones I am targeting people have NO Questions ! Thank you,
Intermediate & Advanced SEO | | seoanalytics0 -
Will we be penalised for duplicate content on a sub-domain?
Hi there, I run a WordPress blog and I use [community platform] Discourse for commenting. When we publish a post to Wordpress, a duplicate of that post is pushed to a topic on Discourse, which is on a sub-domain. Eg: The original post and the duplicated post. Will we be penalised for duplicating our own content on a subdomain? If so, other than using an excerpt, what are our options? Thanks!
Intermediate & Advanced SEO | | ILOVETHEHAWK0 -
Can multiple geotargeting hreflang tags be set in one URL? International SEO question
Hi All, I have a question please. If i target www.onedirect.co.nl/en/ in English for Holland, Belgium and Luxembourg, are the tags below correct? English for Holland, Belgium and Luxembourg: http://www.example.co.nl/en/" hreflang="en-nl" /> http://www.example.co.nl/en/" hreflang="en-be" /> http://www.example.co.nl/en/" hreflang="en-lu" /> AND Targeting Holland and Belgium in Dutch: Pour la page www.onedirect.co.nl on peut inclure ce tag: http://www.example.co.nl" hreflang="nl-nl" /> http://www.example.co.nl" hreflang="nl-be" /> thanks a lot for your help!
Intermediate & Advanced SEO | | Onedirect_uk0 -
Duplicate Content: Is a product feed/page rolled out across subdomains deemed duplicate content?
A company has a TLD (top-level-domain) which every single product: company.com/product/name.html The company also has subdomains (tailored to a range of products) which lists a choosen selection of the products from the TLD - sort of like a feed: subdomain.company.com/product/name.html The content on the TLD & subdomain product page are exactly the same and cannot be changed - CSS and HTML is slightly differant but the content (text and images) is exactly the same! My concern (and rightly so) is that Google will deem this to be duplicate content, therfore I'm going to have to add a rel cannonical tag into the header of all subdomain pages, pointing to the original product page on the TLD. Does this sound like the correct thing to do? Or is there a better solution? Moving on, not only are products fed onto subdomain, there are a handfull of other domains which list the products - again, the content (text and images) is exactly the same: other.com/product/name.html Would I be best placed to add a rel cannonical tag into the header of the product pages on other domains, pointing to the original product page on the actual TLD? Does rel cannonical work across domains? Would the product pages with a rel cannonical tag in the header still rank? Let me know if there is a better solution all-round!
Intermediate & Advanced SEO | | iam-sold0 -
Duplicated Content with Index.php
Good Afternoon, My website uses Joomla CMS and has the htaccess rewrite code enabled to ensure the use of search engine friendly URLs (SEF's). While browsing the crawl diagnostics I have found that Moz considers the /index.php URL a duplicate to our root. I will always under the impression that the htaccess rewrite took care of that issue and obviously I would like to address it. I attempted to create a 301 redirect from the index.php URL to the root but ran into an issue when attempting to login to the admin portion of the website as the redirect sent me back to the homepage. I was curious if anyone had advice for handling the index.php duplication issue, specifically with Joomla. Additionally, I have confirmed that in Google Webmasters, under URL parameters, the index.php parameter is set as 'Representative URL'.
Intermediate & Advanced SEO | | BrandonEML0 -
Duplicate Content... Really?
Hi all, My site is www.actronics.eu Moz reports virtually every product page as duplicate content, flagged as HIGH PRIORITY!. I know why. Moz classes a page as duplicate if >95% content/code similar. There's very little I can do about this as although our products are different, the content is very similar, albeit a few part numbers and vehicle make/model. Here's an example:
Intermediate & Advanced SEO | | seowoody
http://www.actronics.eu/en/shop/audi-a4-8d-b5-1994-2000-abs-ecu-en/bosch-5-3
http://www.actronics.eu/en/shop/bmw-3-series-e36-1990-1998-abs-ecu-en/ate-34-51 Now, multiply this by ~2,000 products X 7 different languages and you'll see we have a big dupe content issue (according to Moz's Crawl Diagnostics report). I say "according to Moz..." as I do not know if this is actually an issue for Google? 90% of our products pages rank, albeit some much better than others? So what is the solution? We're not trying to deceive Google in any way so it would seem unfair to be hit with a dupe content penalty, this is a legit dilemma where our product differ by as little as a part number. One ugly solution would be to remove header / sidebar / footer on our product pages as I've demonstrated here - http://woodberry.me.uk/test-page2-minimal-v2.html since this removes A LOT of page bloat (code) and would bring the page difference down to 80% duplicate.
(This is the tool I'm using for checking http://www.webconfs.com/similar-page-checker.php) Other "prettier" solutions would greatly appreciated. I look forward to hearing your thoughts. Thanks,
Woody 🙂1 -
Finding Duplicate Content Spanning more than one Site?
Hi forum, SEOMoz's crawler identifies duplicate content within your own site, which is great. How can I compare my site to another site to see if they share "duplicate content?" Thanks!
Intermediate & Advanced SEO | | Travis-W0 -
Duplicate content ramifications for country TLDs
We have a .com site here in the US that is ranking well for targeted phrases. The client is expanding its sales force into India and South Africa. They want to duplicate the site entirely, twice. Once for each country. I'm not well-versed in international SEO. Will this cause a duplicate content filter? Would google.co.in and google.co.za look at google.com's index for duplication? Thanks. Long time lurker, first time question poster.
Intermediate & Advanced SEO | | Alter_Imaging0