Is Google able to determine duplicate content every day, or only every month?
-
A while ago I talked to somebody who worked in MSN's engineering department a couple of years ago. We talked about a recent dip one of our sites took, and we argued it could be caused by the large amount of duplicate content we have on this particular website (over 80% of the site).
Then he said, quote: "Google only seems to be able to determine every couple of months, rather than every day, whether content is actually duplicate content." I don't doubt that duplicate content is a ranking factor, but I would like to hear your opinions on whether Google really can only determine this every couple of months instead of every day.
Have you seen or heard something similar?
-
Sorting out Google's timelines is tricky these days, because they aren't the same for every process and every site. In the early days, the "Google dance" happened about once a month, and that was the whole mess (index, algo updates, etc.). Over time, index updates have gotten a lot faster, and ranking and indexation are more real-time (especially since the "Caffeine" update), but that varies wildly across sites and pages.
I think you also have to separate a couple of impacts of duplicate content. When it comes to filtering - Google excluding a piece of duplicate content from rankings (but not necessarily penalizing the site) - I don't see any evidence that this takes a couple of months. It can take Google days or weeks to re-cache any given page, and to detect a duplicate they would have to re-cache both copies, so realistically that may take a month in some cases. I strongly suspect, though, that the filter itself happens in real time. There's no good way to store a filter for every scenario, and some filters are query-specific. Computationally, some filters almost have to happen on the fly.
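To make the "on the fly" point concrete, here's a rough sketch of how query-time near-duplicate filtering could work in principle, using word shingles and Jaccard similarity. This is purely illustrative - it's not Google's actual algorithm, and the shingle size and similarity threshold are assumptions I picked for the example.

```python
# Illustrative sketch only -- not Google's actual algorithm.
# Query-time near-duplicate filtering via word shingles + Jaccard similarity.

def shingles(text, k=5):
    """Return the set of k-word shingles for a piece of text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def filter_duplicates(results, threshold=0.8):
    """Keep each result unless it's a near-duplicate of one already kept.

    `results` is a list of (url, text) tuples in ranking order, so the
    first copy seen "wins" -- a crude stand-in for the idea that the
    filter can run over the candidate result set at query time.
    """
    kept, kept_shingles = [], []
    for url, text in results:
        s = shingles(text)
        if any(jaccard(s, other) >= threshold for other in kept_shingles):
            continue  # filtered out as a duplicate for this query
        kept.append(url)
        kept_shingles.append(s)
    return kept

if __name__ == "__main__":
    candidates = [
        ("http://example.com/original", "the quick brown fox jumps over the lazy dog down by the river"),
        ("http://example.org/copy", "the quick brown fox jumps over the lazy dog down by the river"),
        ("http://example.net/unique", "an entirely different article about something else altogether"),
    ]
    print(filter_duplicates(candidates))  # the /copy URL gets filtered
```

The point of the sketch is just that nothing here requires a months-long batch job - a filter like this can run over a handful of candidate results at query time.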
On the other hand, you have updates like Panda, where duplicate content can cause something close to a penalty. Panda data was originally updated outside of the main algorithm, to the best of our knowledge, and probably about once a month. In the year-plus since Panda 1.0 rolled out, though, that timeline seems to have accelerated. I don't think it's real-time, but it may be closer to every two weeks (that's speculation, I admit).
So, the short answer is "it's complicated." I don't have any evidence to suggest that filtering duplicates takes Google months (and, actually, I have anecdotal evidence that it can happen much faster). It is possible that it could take weeks or months to see the impact of duplicates on some sites and in some situations, though.
-
Hi Donnie,
Thanks for your reply, but I was already aware that Google had/has a sandbox; I should have mentioned that in my question. What I'm really looking for is an answer about whether, and on what basis, Google is able to determine that pages are duplicates.
I ask because I've seen dozens of cases where our content was indexed, both when we linked back to the 'original' source and when we didn't.
I also want to make clear that, in all of these cases, the duplicate content was published with the original sources' agreement.
-
In the past, Google had a sandbox period before any page (content) would rank. However, now everything is instant. (Just learned this today @seomoz.)
If you release something, Google will index it as fast as possible. If that info gets duplicated, Google will only count the first copy indexed. Everyone else loses brownie points unless they trackback/link back to the main article (the first one indexed).
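Since the original poster mentioned the duplication happens with the sources' agreement, one practical option (as I understand Google's guidance) is a cross-domain rel=canonical on the republished copy pointing at the original article, in addition to a visible link back. Below is a small, illustrative script for checking whether a republished page declares a canonical URL - the URL in the example is a placeholder, and the parsing is deliberately simple.

```python
# Illustrative sketch: check whether a page declares a rel=canonical URL.
# The example URL is a placeholder; real-world HTML may need a more robust parser.
from html.parser import HTMLParser
from urllib.request import urlopen

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            if self.canonical is None:
                self.canonical = attrs.get("href")

def find_canonical(url):
    """Fetch a page and return the canonical URL it declares, if any."""
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

if __name__ == "__main__":
    # Placeholder URL -- swap in the republished page you want to check.
    print(find_canonical("http://example.com/republished-article"))
```

If the republished copies point a canonical at the originals, the question of which copy Google "counts" becomes much less of a guessing game.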