Duplicate Content - Bulk analysis tool?
-
Hi
I wondered if there's a tool to analyse duplicate content, within your own site or on external sites, where you can upload the URLs you want to check in bulk?
I used Copyscape a while ago, but don't remember it having a bulk feature.
Thank you!
-
Great, thank you!
I'll give both a go!
-
Great, thanks.
Yes, I use Screaming Frog for that, but this was to look at the actual page content: to see if other sites copy our content, but also to see whether we need to update our product content, as some products are very similar.
I'll check the batch process on Copyscape, thanks!
-
I have not used this tool in this way, but I have used it for other crawler projects related to content cleanup and it is rock solid. They have been very responsive to my questions about using the software. http://urlprofiler.com/
Duplicate content search is the next project on my list. Here is how they do it:
http://urlprofiler.com/blog/duplicate-content-checker/
You let URL Profiler crawl the section of your site that is most likely to be copied (say, your blog) and you tell URL Profiler which section of your HTML to compare against (i.e. the content section rather than the header or footer). URL Profiler then uses proxies (you have to buy the proxies) to perform Google searches on sentences from your content. It crawls those results to see if there is a site in the Google SERPs that has sentences from your content word for word (or pretty close).
I have played with Copyscape, but my markets are too niche for it to work for me. The logic behind URL Profiler's approach is that you are searching the database that matters most: Google.
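To make the "word for word" comparison step concrete, here is a rough Python sketch. It is not URL Profiler's code, just an illustration of the idea under some assumptions: you already have the text of your own page and the text of a page found in the search results, and the helper names (`normalize`, `split_sentences`, `copied_sentences`) and the `min_words` cut-off are made up for this example. It does no crawling or searching itself.

```python
import re

def normalize(text):
    """Lowercase and collapse anything that is not a letter or digit to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def copied_sentences(own_text, other_text, min_words=6):
    """Return sentences from own_text that appear word for word in other_text.

    Sentences shorter than min_words (a hypothetical cut-off) are skipped,
    since very short phrases match by coincidence.
    """
    haystack = " " + normalize(other_text) + " "
    hits = []
    for sentence in split_sentences(own_text):
        needle = normalize(sentence)
        if len(needle.split()) >= min_words and f" {needle} " in haystack:
            hits.append(sentence)
    return hits

own = ("Our trout report covers every lake in the state each week. "
       "Short one. "
       "We also list marinas and campgrounds near each launch ramp.")
other = "Welcome! our trout report covers every lake in the state, each week. Buy stuff."
print(copied_sentences(own, other))
```

Because punctuation and case are stripped before comparing, small formatting changes on the copying site do not hide a match. Catching "pretty close" rewrites would need a fuzzier comparison than this exact-match check.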
Good luck!
-
I believe you might be able to use List Mode in Screaming Frog to accomplish this, but it ultimately depends on what kind of duplicate content you want to check for. Do you simply want to find duplicate titles or duplicate descriptions? Or do you want to find pages with text similar enough to warrant concern?
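For the second case (pages with similar text), a small script can give a first pass. The following is only a sketch, not something Screaming Frog does for you: it assumes you have already exported each URL's body text into a dictionary, and the function names, the three-word shingle size and the 0.5 threshold are all arbitrary choices for illustration. It scores each pair of pages by the Jaccard overlap of their word shingles.

```python
import re
from itertools import combinations

def shingles(text, size=3):
    """Set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Shared shingles divided by all distinct shingles; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_pairs(pages, threshold=0.5, size=3):
    """Return (url1, url2, score) for every pair scoring at or above threshold."""
    sets = {url: shingles(text, size) for url, text in pages.items()}
    pairs = []
    for u1, u2 in combinations(sorted(sets), 2):
        score = jaccard(sets[u1], sets[u2])
        if score >= threshold:
            pairs.append((u1, u2, round(score, 2)))
    return pairs

pages = {
    "/a": "red cotton shirt with short sleeves and a round neck",
    "/b": "blue cotton shirt with short sleeves and a round neck",
    "/c": "stainless steel kettle with a two litre capacity",
}
print(similar_pairs(pages))
```

Comparing every pair is fine for a few hundred URLs. For a much larger list you would want something smarter than brute-force pairing.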
== Ooops! ==
It didn't occur to me that you were more interested in duplicate content caused by other sites copying your content rather than duplicate content among your list of URLs.
Copyscape does have a "Batch Process" tool, but it is only available to paid subscribers. It does work quite nicely, though.