Is Google able to determine duplicate content every day, or only every couple of months?
-
A while ago I talked to somebody who worked in MSN's engineering department a couple of years back. We discussed a recent dip one of our sites had taken, and we argued it could be caused by the large amount of duplicate content on that particular website (over 80% of the site).
Then he said, quote: "Google seems only to be able to determine every couple of months, instead of every day, whether content is actually duplicate content." I don't doubt that duplicate content is a ranking factor, but I'd like to hear your opinions on whether Google can really only make that determination every couple of months rather than every day.
Have you seen or heard something similar?
-
Sorting out Google's timelines is tricky these days, because they aren't the same for every process or every site. In the early days, the "Google dance" happened about once a month, and that covered everything (index updates, algorithm changes, etc.). Over time, index updates have gotten much faster, and ranking and indexation are closer to real-time (especially since the "Caffeine" update), but that varies wildly across sites and pages.
I think you also have to separate two different impacts of duplicate content. When it comes to filtering (Google excluding a piece of duplicate content from rankings, but not necessarily penalizing the site), I don't see any evidence that this takes a couple of months. It can take Google days or weeks to re-cache any given page, and to detect a duplicate it would have to re-cache both copies, so that may realistically take a month in some cases. I strongly suspect, though, that the filter itself happens in real time. There's no good way to store a filter for every scenario, and some filters are query-specific; computationally, some filters almost have to happen on the fly.
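To give a rough sense of what an on-the-fly check could look like: nobody outside Google knows how their duplicate detection is actually implemented, but shingling plus Jaccard similarity is a classic, well-published technique for near-duplicate detection. Here's a minimal illustrative sketch in Python (the function names and the 0.8 threshold are my own assumptions, not anything Google has confirmed):

```python
# A minimal sketch of shingle-based near-duplicate detection, a classic
# published technique (Broder-style shingling). Purely illustrative;
# Google has never published how its duplicate filter actually works.

def shingles(text: str, k: int = 5) -> set:
    """Break text into overlapping k-word windows ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_like_duplicate(page_a: str, page_b: str, threshold: float = 0.8) -> bool:
    """Flag two documents as near-duplicates if their shingle sets mostly
    overlap. The 0.8 threshold is an arbitrary assumption for illustration."""
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold
```

A comparison like this is cheap enough to run over a small candidate result set at query time, which is why a real-time, query-specific filter is computationally plausible.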
On the other hand, you have updates like Panda, where duplicate content can cause something close to a penalty. To the best of our knowledge, Panda data was originally updated outside of the main algorithm, probably about once a month. In the year-plus since Panda 1.0 rolled out, though, that timeline seems to have accelerated. I don't think it's real-time, but it may be closer to two weeks (that's speculation, I admit).
So, the short answer is "it's complicated." I don't have any evidence that filtering duplicates takes Google months (and, in fact, I have anecdotal evidence that it can happen much faster). It's possible, though, that it could take weeks or months to see the impact of duplicates on some sites and in some situations.
-
Hi Donnie,
Thanks for your reply, but I was already aware that Google had/has a sandbox; I should have mentioned that in my question. What I'm really looking for is an answer about how, and on what basis, Google is able to determine whether pages are duplicates.
I've seen dozens of cases where our content was indexed, both when we linked back to the 'original' source and when we didn't.
To be clear, in all of these cases the duplicated content was published with the agreement of the original sources.
-
In the past, Google had a sandbox period before any page (content) would rank. Now, however, everything is close to instant (I just learned this today @seomoz).
If you release something, Google will index it as fast as possible. If that info gets duplicated, Google will only count the first copy indexed; everyone else loses brownie points unless they link back to the main article (the first one indexed).
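As a rough mental model only (Google hasn't published its canonicalization logic, and every name and field below is invented for illustration), the "first indexed wins, unless you link back" heuristic could be sketched like this:

```python
# Hypothetical sketch of the "first copy indexed wins" heuristic described
# above. The Page class, its fields, and the rule itself are invented for
# illustration; this is not Google's published behavior.
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str
    first_indexed: float                          # Unix timestamp of first crawl
    links_to: set = field(default_factory=set)    # outbound link targets

def pick_original(duplicates: list) -> Page:
    """Among pages carrying the same content, treat the earliest-indexed
    copy as the original."""
    return min(duplicates, key=lambda p: p.first_indexed)

def is_attributed(copy: Page, original: Page) -> bool:
    """A duplicate that links back to the original keeps its 'brownie points'."""
    return original.url in copy.links_to
```

In this model, a syndicated copy that links to the source would pass is_attributed() and avoid competing with it; a copy that doesn't would simply lose out to the earlier-indexed page.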