Finding Duplicate Content Spanning More Than One Site?
-
Hi forum, SEOMoz's crawler identifies duplicate content within your own site, which is great. How can I compare my site to another site to see if they share "duplicate content?" Thanks!
-
The Alert thing is great! I use it when we write new content (along with CopyScape after a week or so) just so I can make sure I'm outranking it. lol
-
Yes. I totally agree with Darin. There isn't a duplicate content penalty, per se, and the tools he listed are quite good suggestions as well.
-
IMHO, even if the HTML is different, you could have duplicate content if the H1 or paragraph text is substantially similar. However, is this automatically penalized? No. Syndication of content can be quite prevalent on the Web. For example, the AP breaks a news story and posts it online, and it is subsequently picked up by the New York Times and Wall Street Journal. Wherever the content appeared first, particularly if it has a canonical tag in place, that source will be credited with having the original content. The other sites aren't going to be penalized, but they aren't going to benefit from it either.
Similar things happen on large e-commerce sites all the time. For example, hundreds of e-commerce stores sell lightbulbs. Their product descriptions are most certainly "substantially similar." It'd be kind of strange if they weren't. They aren't penalized for that.
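If you want a rough number for "substantially similar," here is a minimal Python sketch that strips the markup from two pages and compares the remaining text using overlapping word shingles and Jaccard similarity. This is only an illustration: the function names, the five-word shingle size, and the sample pages are all assumptions made for the example, and nothing here reflects how any search engine actually scores duplicate content.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def visible_text(html):
    """Return the text of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(html_a, html_b, size=5):
    """Jaccard similarity of the two pages' shingle sets (0.0 to 1.0)."""
    a = shingles(visible_text(html_a), size)
    b = shingles(visible_text(html_b), size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical pages: same copy, entirely different markup.
page_a = "<html><body><h1>Energy saving lightbulb</h1><p>This bulb lasts for ten thousand hours.</p></body></html>"
page_b = "<div><h2>Energy saving lightbulb</h2><span>This bulb lasts for ten thousand hours.</span></div>"
score = similarity(page_a, page_b)
```

Because the tags are thrown away before comparing, the two sample pages score as identical even though their HTML has nothing in common. Any threshold you pick for "too similar" is a judgment call.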
I hope this is helpful! It is always good to set up a Google Alert for any great pieces of content you do write, just so you can be aware of who might be copying your stuff! (Tynt.com can also be very useful for this).
Good luck!
Dana
-
Just for the record, there isn't any "Duplicate Content Penalty," so don't worry too much about this. Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.
However, to answer your question, I use Copyscape to do this, but you have to enter a URL rather than just a few lines of text at a time.
Here are some other ones I've heard good things about:
I agree with Dana on the Google thing too. Like she said, "Just be sure to put quotes around your snippet."
-
This helps, thanks Dana. Is the actual paragraph content the main source of a duplicate content penalty? For example, what if the pages have different metadata and the HTML is entirely different except for the H1 text and paragraph content?
-
Hi Zora,
The best way to do this is to grab a random section of text from the page and go to Google, then paste that section of text in the search bar inside "quotes." For example, from your question above, I could search:
"SEOMoz's crawler identifies duplicate content within your own site, which is great. How can I compare my site"
You will see that the result in Google points to this page (once it's been indexed, which hasn't happened quite yet). Just be sure to put quotes around your snippet.
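If you check a lot of pages this way, the quoted-snippet search can be scripted. The following is a minimal Python sketch that picks a run of consecutive words from a block of text and builds an exact-match Google search URL for it. The function names and the twelve-word default are assumptions made for the example. It only builds the URL for you to open in a browser and does not fetch results.

```python
import random
from urllib.parse import quote_plus


def pick_snippet(text, num_words=12, rng=random):
    """Return `num_words` consecutive words chosen at random from `text`."""
    words = text.split()
    if len(words) <= num_words:
        return " ".join(words)
    start = rng.randrange(len(words) - num_words + 1)
    return " ".join(words[start:start + num_words])


def exact_match_search_url(snippet):
    """Wrap the snippet in double quotes and URL-encode it as a Google query."""
    # Inner double quotes would end the exact-match phrase early, so drop them.
    cleaned = snippet.replace('"', "").strip()
    return "https://www.google.com/search?q=" + quote_plus(f'"{cleaned}"')


page_text = (
    "SEOMoz's crawler identifies duplicate content within your own site, "
    "which is great. How can I compare my site to another site?"
)
url = exact_match_search_url(pick_snippet(page_text))
```

The quotes are the important part: without them the search matches the individual words rather than the exact phrase.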
Hope that helps!
Dana
Related Questions
-
Google Indexed Site A's Content On Site B, Site C etc
Hi All, I have an issue where the content (pages and images) of Site A (www.ericreynolds.photography) is showing up in Google under different domains: Site B (www.fastphonerepair.com), Site C (www.quarryhillvet.com), Site D (www.spacasey.com). I believe this happened because I installed an SSL cert on Site A but didn't have the default SSL domain set on the server. You were able to access Site B and any page from Site A and it would pull up properly. I have since fixed that SSL issue and am now doing a 301 redirect from Sites B, C and D to Site A for anything https, since Sites B, C, D are not using an SSL cert. My question is, how can I trigger Google to re-index all of the sites to remove the wrong listings in the index? I have a screenshot attached so you can see the issue more clearly. I have resubmitted my sitemap but I'm not seeing much of a change in the index for my site. Any help on what I could do would be great. Thanks, Eric
Intermediate & Advanced SEO | | cwscontent1 -
Duplicate URLs on eCommerce site caused by parameters
Hi there, We have a client with a large eCommerce site with about 1500 duplicate URLs caused by the parameters in the URLs (such as the sort parameter where the list of products are then sorted by price, age etc.) Example: www.example.com/cars/toyota First duplicate URL: www.example.com/cars/toyota?sort=price-ascending Second duplicate URL: www.example.com/cars/toyota?sort=price-descending Third duplicate URL: www.example.com/cars/toyota?sort=age-descending Originally we had advised to add a robots.txt file to block search engines from crawling the URLs with parameters but this hasn't been done. My question: If we add the robots.txt now and exclude all URLs with filters - how long will it take for Google to disregard the duplicate URLs? We could ask the developers to add canonical tags to all the duplicates but these are about 1500... Thanks in advance for any advice!
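On the worry about adding canonical tags to roughly 1500 URLs: if the duplicates differ only by a query parameter, the canonical URL can usually be derived in the page template rather than set by hand. Here is a minimal Python sketch of that idea. It assumes `sort` is the only parameter to ignore and that URLs include a scheme such as `https://`. The function names are made up for the example and this is not tied to any particular e-commerce platform.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for this example: only the sort parameter creates duplicates.
IGNORED_PARAMS = frozenset({"sort"})


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return `url` with the ignored query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element for a page's <head>."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


tag = canonical_link_tag("https://www.example.com/cars/toyota?sort=price-ascending")
```

With this approach, every sorted variant of the Toyota listing would emit the same canonical tag pointing at the unsorted URL.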
Intermediate & Advanced SEO | | Gabriele_Layoutweb0 -
Cross Domain duplicate content...
Does anyone have any experience with this situation? We have 2 ecommerce websites that carry 90% of the same products, with mostly duplicate product descriptions across domains. We will be running some tests shortly. Question 1: If we deindex a group of product pages on Site A, should we see an increase in ranking for the same products on Site B? I know nothing is certain, just curious to hear your input. The same 2 domains have different niche authorities. One is healthcare products, the other is general merchandise. We've seen this because different products rank higher on 1 domain or the other. Both sites have the same Moz Domain Authority (42, go figure). We are strongly considering cross domain canonicals. Question 2: Does niche authority transfer with a cross domain canonical? In other words, for a particular product, will it rank the same on both domains regardless of which direction we canonical? Ex: Site A: Healthcare Products, Site B: General Merchandise. I have a health product that ranks #15 on Site A, and #30 on Site B. If I use rel=canonical for this product on Site B pointing at the same product on Site A, will the ranking be the same as if I use rel=canonical from Site A to Site B? Again, best guess is fine. Question 3: These domains have similar category page structures, URLs, etc, but feature different products for a particular category. Since the pages are different, will cross domain canonicals be honored by Google?
Intermediate & Advanced SEO | | AMHC1 -
Duplicate blog content and NOINDEX
Suppose the "Home" page of your blog at www.example.com/domain/ displays your 10 most recent posts. Each post has its own permalink page (where you have comments/discussion, etc.). This obviously means that the last 10 posts show up as duplicates on your site. Is it good practice to use NOINDEX, FOLLOW on the blog root page (blog/) so that only one copy gets indexed? Thanks, Akira
Intermediate & Advanced SEO | | ahirai0 -
Coupon Website Has Tons of Duplicate Content, How do I fix it?
Ok, so I just got done running my campaign on SEOMOZ for a client of mine who owns a Coupon Magazine company. They upload thousands of ads into their website which gives similar looking duplicate content ... like http://coupon.com/mom-pop-shop/100 and http://coupon.com/mom-pop-shop/101. There's about 3200 duplicates right now on the website like this. The client wants the coupon pages to be indexed and followed by search engines so how would I fix the duplicate content but still maintain search-ability of these coupon landing pages?
Intermediate & Advanced SEO | | Keith-Eneix0 -
ECommerce syndication & duplicate content
We have an eCommerce website with original software products. We want to syndicate our content to partner and affiliate websites, but are worried about the effect of duplicate content all over the web. Note that this is a relatively high profile project, where thousands of sites will be listing hundreds of our products, with the exact same name, description, tags, etc. We read the wonderful and relevant post by Kate Morris on this topic (here: http://mz.cm/nXho02) and we realize the duplicate content is never the best option. Some concrete questions we're trying to figure out: 1. Are we risking penalties of any sort? 2. We can potentially get tens of thousands of links from this concept, all with duplicate content around them, but from PR3-6 sites, some with lots of authority. What will affect our site more - the quantity of mediocre links (good) or the duplicate content around them (bad)? 3. Should we sacrifice SEO for a good business idea?
Intermediate & Advanced SEO | | erangalp0 -
Cross-Domain Canonical and duplicate content
Hi Mozfans! I'm working on SEO for one of my new clients and it's a job site (I call the site: Site A). The thing is that the client has about 3 sites with the same jobs on them. I've pointed out a duplicate content problem, but the jobs on the other sites must stay there, so the client doesn't want to remove them. There is another (non-ranking) reason why. Can I solve the duplicate content problem with a cross-domain canonical? The client wants to rank well with the site I'm working on (Site A). Thanks! Rand did a Whiteboard Friday about the cross-domain canonical: http://www.seomoz.org/blog/cross-domain-canonical-the-new-301-whiteboard-friday
Intermediate & Advanced SEO | | MaartenvandenBos0 -
Accepting RSS feeds. Does it = duplicate content?
Hi everyone, for a few years now I've allowed school clients to pipe their news RSS feed to their public accounts on my site. The result is a daily display of the most recent news happening on their campuses that my site visitors can browse. We don't republish the entire news item; just the headline and the first 150 characters of their article, along with a Read more link for folks to click if they want the full story over on the school's site. Each item has its own permanent URL on my site. I'm wondering if this is a wise practice. Does this fall into the territory of duplicate content even though we're essentially providing a teaser for the school? What do you think?
Intermediate & Advanced SEO | | peterdbaron0