Best tools for identifying internal duplicate content
-
Hello again Mozzers! Other than the Moz tool, are there any other tools out there for identifying internal duplicate content? Thanks, Luke
-
-
Great article link! Thank you!
-
Thanks Jorge - Not sure how I'd survive without Screaming Frog - haven't gotten around to Xenu Linksleuth yet, but must give it a go sometime soon! Although I use copyscape to check for external duplication, hadn't realised I could use it to check for duplicate text within a website, so I'm v grateful for that pointer Luke
-
Thanks James - good advice!
-
Huge thanks for the advice and that brilliant article Anthony :-)!
-
Luke
Apart from the tools mentioned above, I use copyscape premium to identify duplicate text (in the body of the page), I also find these tools very useful:
Xenu Linksleuth: very good for finding duplicate tags in your page's headers (title, description), and for many other tasks that require crawling your site. And the tool is free!
Screaming Frog: Another web crawler and very good tool for finding duplicate tags. It is a paid tool (about 77 GBP per year) but has a couple of features that Xenu does not have.
Cheers
Jorge
-
I use the Moz crawler to crawl my entire site and export it to an excel spreadsheet to navigate, its one of the first columns on your report
http://pro.moz.com/tools/crawl-test
Although i agree with Anthony and think its a very good idea to track any duplicate mentions from Google's perspective in webmaster tools
-
Duplicate content is going to be on your website. The key is to keep it out of Google's index. That is why using Google tools is very important. I find that using Google Webmaster Tools (duplicate page titles) and most importantly Google search as the best way to identify these problems.
This article is absolutely fantastic and there is a section titled "Tools for Finding & Diagnosing Duplicate Content" that explains exactly how to use Google and Google Webmaster Tools to find your duplicate content.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Upper and lower case URLS coming up as duplicate content
Hey guys and gals, I'm having a frustrating time with an issue. Our site has around 10 pages that are coming up as duplicate content/ duplicate title. I'm not sure what I can do to fix this. I was going to attempt to 301 direct the upper case to lower but I'm worried how this will affect our SEO. can anyone offer some insight on what I should be doing? Update: What I'm trying to figure out is what I should do for our URL's. For example, when I run an audit I'm getting two different pages: aaa.com/BusinessAgreement.com and also aaa.com/businessagreement.com. We don't have two pages but for some reason, Google thinks we do.
Intermediate & Advanced SEO | | davidmac1 -
Partial duplicate content and canonical tags
Hi - I am rebuilding a consumer website, and each product page will contain a unique product image, and a sentence or two about the product (and we tend to use a lot of the same words in different ways across products). I'd like to have a tabbed area below the product info that talks about the overall product line, and this content would be duplicate across all the product pages (a "Why use our products" type of thing). I'd have this duplicate content also living on its own URL's so they can be found alone in the SERP's. Question is, do I need to add the canonical tag to this page, since there's partial duplicate content on the product pages? And if I did that, would my product pages go un-indexed?? I understand how to handle completely duplicated content, it's the partial duplicate that I'm having difficulty figuring out.
Intermediate & Advanced SEO | | Jenny10 -
Artist Bios on Multiple Pages: Duplicate Content or not?
I am currently working on an eComm site for a company that sells art prints. On each print's page, there is a bio about the artist followed by a couple of paragraphs about the print. My concern is that some artists have hundreds of prints on this site, and the bio is reprinted on every page,which makes sense from a usability standpoint, but I am concerned that it will trigger a duplicate content penalty from Google. Some people are trying to convince me that Google won't penalize for this content, since the intent is not to game the SERPs. However, I'm not confident that this isn't being penalized already, or that it won't be in the near future. Because it is just a section of text that is duplicated, but the rest of the text on each page is original, I can't use the rel=canonical tag. I've thought about putting each artist bio into a graphic, but that is a huge undertaking, and not the most elegant solution. Could I put the bio on a separate page with only the artist's info and then place that data on each print page using an <iframe>and then put a noindex,nofollow in the robots.txt file?</p> <p>Is there a better solution? Is this effort even necessary?</p> <p>Thoughts?</p></iframe>
Intermediate & Advanced SEO | | sbaylor0 -
Best tools for exploring links?
and not just every single link, but ones you know that Google is actually indexing. I find seomoz to be super easy, but there is no way to distinguish links that are actually counting "juice", or am i missing something. What about majesticseo - any other similar tools you use when trying to find linking sites that pass juice?
Intermediate & Advanced SEO | | imageworks-2612900 -
Multiple cities/regions websites - duplicate content?
We're about to launch a second site for a different, neighbouring city in which we are going to setup a marketing campaign to target sales in that city (which will also have a separate office there as well). We are going to have it under the same company name, but different domain name and we're going to do our best to re-write the text content as much as possible. We want to avoid Google seeing this as a duplicate site in any way, but what about: the business name the toll free number (which we would like to have same on both sites) the graphics/image files (which we would like to have the same on both sites) site structure, coding styles, other "forensic" items anything I might not be thinking of... How are we best to proceed with this? What about cross-linking the sites?
Intermediate & Advanced SEO | | webdesignbarrie0 -
What is the best tool to crawl a site with millions of pages?
I want to crawl a site that has so many pages that Xenu and Screaming Frog keep crashing at some point after 200,000 pages. What tools will allow me to crawl a site with millions of pages without crashing?
Intermediate & Advanced SEO | | iCrossing_UK0 -
How to deal with category browsing and duplicate content
On an ecommerce site there are typically a lot of pages that may appear to be duplications due to category browse results where the only difference may be the sorting by price or number of products per page. How best to deal with this? Add nofollow to the sorting links? Set canonical values that ignore these variables? Set cononical values that match the category home page? Is this even a possible problem with Panda or spiders in general?
Intermediate & Advanced SEO | | IanTheScot0 -
Duplicate Content, Campaign Explorer & Rel Canonical
Google Advises to use Rel Canonical URL's to advise them which page with similiar information is more relevant. You are supposed to put a rel canonical on the non-preferred pages to point back to the desired page. How do you handle this with a product catalog using ajax, where the additional pages do not exist? An example would be: <colgroup><col width="470"></colgroup>
Intermediate & Advanced SEO | | eric_since1910.com
| .com/productcategory.aspx?page=1 /productcategory.aspx?page=2 /productcategory.aspx?page=3 /productcategory.aspx?page=4 The page=1,2,3 and 4 do not physically exist, they are simply referencing additional products I have rel canonical urls' on the main page www.examplesite.com/productcategory.aspx, but I am not 100% sure this is correct or how else it could be handled. Any Ideas Pro mozzers? |0