Finding Duplicate Content Spanning more than one Site?
-
Hi forum, SEOMoz's crawler identifies duplicate content within your own site, which is great. How can I compare my site to another site to see if they share "duplicate content?" Thanks!
-
The Alert thing is great! I use it when we write new content (along with CopyScape after a week or so) just so I can make sure I'm outranking it. lol
-
Yes. I totally agree with Darin. There isn't a duplicate content penalty, per se, and the tools he listed are quite good suggestions as well.
-
IMHO, even if the HTML is different you could have duplicate content if the H1 or paragraph text is substantially similar. However, is this automatically penalized? No. Syndication of content can be quite prevalent on the Web. For example the AP breaks a news story and posts it online and it is subsequently picked up by the New York Times and Wall Street Journal. Wherever the content appeared first, particularly if it has a canonical tag in place, that source will be credited with having the original content. The other sites aren't going to be penalized, but they aren't going to benefit from it either.
Similar things happen on large e-commerce sites all the time. For example, 100's of e-commerce stores sell lightbulbs. Those descriptions are most certainly "substantially similar." It'd be kind of strange if they weren't. They aren't penalized for that.
I hope this is helpful! It is always good to set up a Google Alert for any great pieces of content you do write, just so you can be aware of who might be copying your stuff! (Tynt.com can also be very useful for this).
Good luck!
Dana
-
Just for the record there isn't any "Duplicate Content Penalty" so don't worry to much about this. Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.
However, to answer your question I use copyscape to do this but you have to insert a URL and not just lines at a time.
Here are some other ones I've heard good things about:
I agree with Dana on the Google thing too. Like she said, "Just be sure to put quotes around your snippet."
-
This helps, thanks Dana. Is the actual paragraph content the main source of a duplicate content penalty? For example, what if the pages share different metadata and the HTML is entirely different except for the H1 text and paragraph content?
-
Hi Zora,
This best way to do this is to grab a random section of text from the page and go to Google, then paste that section of text in the search bar inside "quotes." For example, from your question above, I could search:
"SEOMoz's crawler identifies duplicate content within your own site, which is great. How can I compare my site"
you will see that the result in Google is a result to this page (once it's been indexed, which hasn't happened quite yet) - Just be sure to put quotes around your snippet.
Hope that helps!
Dana
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Benefit of getting more then one link from a site?
Hi Guys, Is there any benefit (purely for SEO - link building) to getting more than one link from a domain if you already have 1? From my understanding, even if you had 100 links from a domain, that really counts as 1 for link authority. So from this, i don't see much point in getting more then 1 link. Is this a right assumption? Cheers.
Intermediate & Advanced SEO | | wozniak650 -
Penalties for duplicate content
Hello!We have a website with various city tours and activities listed on a single page (http://vaiduokliai.lt/). The list changes accordingly depending on filtering (birthday in Vilnius, bachelor party in Kaunas, etc.). The URL doesn't change. Content changes dynamically. We need to make URL visible for each category, then optimize it for different keywords (for example city tours in Vilnius for a list of tours and activities in Vilnius with appropriate URL /tours-in-Vilnius).The problem is that activities overlap very often in different categories, so there will be a lot of duplicate content on different pages. In such case, how severe penalty could be for duplicate content?
Intermediate & Advanced SEO | | jpuzakov0 -
Best practice with duplicate content. Cd
Our website has recently been updated, now it seems that all of our products pages look like this cdnorigin.companyname.com/catagory/product Google is showing these pages within the search. rather then companyname.com/catagory/product Each product page does have a canaonacal tag on that points to the cdnorigin page. Is this best practice? i dont think that cdnorigin.companyname etc looks very goon in the search. is there any reason why my designer would set the canonical tags up this way?
Intermediate & Advanced SEO | | Alexogilvie0 -
Duplicate content
I run about 10 sites and most of them seemed to fall foul of the penguin update and even though I have never sought inorganic links I have been frantically searching for a link based answer since April. However since asking a question here I have been pointed in another direction by one of your contributors. It seems At least 6 of my sites have duplicate content issues. If you search Google for "We have selected nearly 200 pictures of short haircuts and hair styles in 16 galleries" which is the first bit of text from the site short-hairstyles.com about 30000 results appear. I don't know where they're from nor why anyone would want to do this. I presume its automated since there is so much of it. I have decided to redo the content. So I guess (hope) at some point in the future the duplicate nature will be flushed from Google's index? But how do I prevent it happening again? It's impractical to redo the content every month or so. For example if you search for "This facility is written in Flash® to use it you need to have Flash® installed." from another of my sites that I coincidently uploaded a new page to a couple of days ago, only the duplicate content shows up not my original site. So whoever is doing this is finding new stuff on my site and getting it indexed on google before even google sees it on my site! Thanks, Ian
Intermediate & Advanced SEO | | jwdl0 -
How to Remove Joomla Canonical and Duplicate Page Content
I've attempted to follow advice from the Q&A section. Currently on the site www.cherrycreekspine.com, I've edited the .htaccess file to help with 301s - all pages redirect to www.cherrycreekspine.com. Secondly, I'd added the canonical statement in the header of the web pages. I have cut the Duplicate Page Content in half ... now I have a remaining 40 pages to fix up. This is my practice site to try and understand what SEOmoz can do for me. I've looked at some of your videos on Youtube ... I feel like I'm scrambling around to the Q&A and the internet to understand this product. I'm reading the beginners guide.... any other resources would be helpful.
Intermediate & Advanced SEO | | deskstudio0 -
Blog Duplicate Content
Hi, I have a blog, and like most blogs I have various search options (subject matter, author, archive, etc) which produce the same content via different URLs. Should I implement the rel-canonical tag AND the meta robots tag (noindex, follow) on every page of duplicate blog content, or simply choose one or the other? What's best practice? Thanks Mozzers! Luke
Intermediate & Advanced SEO | | McTaggart0 -
Multiple sitemaps for one site?
Excuse my sitemap ignorance here. I've got a site and it's got a blog in a sub-folder. The blog gets updated frequently, the main site does not. Is it best to; a) Have 2 sitemaps.. one in the root and one in the /blog folder. b) Have 1 sitemap that is regularly updated The reason being, I know there's various plugins that create blog sitemaps on the fly, so that would be much easier than updating the main sitemap every time a change was made. If the answer is 2 sitemaps; Would you stop the root sitemap from detailing the contents of the blog folder or just update it every so often with the contents of the blog folder?
Intermediate & Advanced SEO | | PeterAlexLeigh0 -
Removing Duplicate Content Issues in an Ecommerce Store
Hi All OK i have an ecommerce store and there is a load of duplicate content which is pretty much the norm with ecommerce store setups e.g. this is my problem http://www.mystoreexample.com/product1.html
Intermediate & Advanced SEO | | ChriSEOcouk
http://www.mystoreexample.com/brandname/product1.html
http://www.mystoreexample.com/appliancetype/product1.html
http://www.mystoreexample.com/brandname/appliancetype/product1.html
http://www.mystoreexample.com/appliancetype/brandname/product1.html so all the above lead to the same product
I also want to keep the breadcrumb path to the product Here's my plan Add a canonical URL to the product page
e.g. http://www.mystoreexample.com/product1.html
This way i have a short product URL Noindex all duplicate pages but do follow the internal links so the pages are spidered What are the other options available and recommended? Does that make sense?
Is this what most people are doing to remove duplicate content pages? thanks 🙂0