Is Google able to detect duplicate content every day, or only every month?
-
A while ago I talked to somebody who, a couple of years back, worked in MSN's engineering department. We talked about a recent dip one of our sites took. We argued it could be caused by the large amount of duplicate content we have on this particular website (over 80% of the site).
He said, and I quote: "Google only seems to be able to determine every couple of months, instead of every day, whether content is actually duplicate content." I don't doubt that duplicate content is a ranking factor, but I'd like to hear your opinions on whether Google can really only make this determination every couple of months instead of every day.
Have you seen or heard something similar?
-
Sorting out Google's timelines is tricky these days, because they aren't the same for every process or every site. In the early days, the "Google dance" happened about once a month, and everything (indexing, algorithm updates, etc.) rolled out together. Over time, index updates have gotten much faster, and ranking and indexation are more real-time (especially since the "Caffeine" update), but that varies wildly across sites and pages.
I think you also have to separate a couple of different impacts of duplicate content. When it comes to filtering (Google excluding a piece of duplicate content from rankings, but not necessarily penalizing the site), I don't see any evidence that this takes a couple of months. It can take Google days or weeks to re-cache any given page, and to detect a duplicate it would have to re-cache both copies, so that may realistically take a month in some cases. I strongly suspect, though, that the filter itself happens in real time. There's no good way to store a filter for every scenario, and some filters are query-specific; computationally, some filters almost have to happen on the fly.
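To make the "detecting a duplicate" step concrete: near-duplicate detection is commonly described in terms of shingling and Jaccard similarity. This is only an illustrative sketch of that general technique, not Google's actual implementation, and the 0.8 threshold is an arbitrary choice for the example:

```python
# Toy near-duplicate detection via word shingling + Jaccard similarity.
# Illustrative only -- real search engines use far more scalable variants
# (e.g., hashing-based sketches), and the threshold here is arbitrary.

def shingles(text, k=3):
    """Return the set of overlapping k-word shingles in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_duplicate(doc1, doc2, threshold=0.8):
    """Flag two documents as near-duplicates if their shingle overlap is high."""
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold

page_a = "the quick brown fox jumps over the lazy dog near the river bank"
page_b = "the quick brown fox jumps over the lazy dog near the river bend"
page_c = "completely different article about search engine crawling behavior"

print(looks_duplicate(page_a, page_b))  # near-identical pages -> True
print(looks_duplicate(page_a, page_c))  # unrelated pages -> False
```

The point of the sketch is that the comparison itself is cheap once both copies have been crawled, which is consistent with the idea that the slow part is re-caching, not filtering.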
On the other hand, you have updates like Panda, where duplicate content can cause something close to a penalty. Panda data was originally updated outside of the main algorithm, to the best of our knowledge, probably about once a month. Over the year-plus since Panda 1.0 rolled out, though, that timeline seems to have accelerated. I don't think it's real-time, but it may be closer to every two weeks (that's speculation, I admit).
So, the short answer is "it's complicated." I don't have any evidence to suggest that filtering duplicates takes Google months (and, actually, I have anecdotal evidence that it can happen much faster). It is possible, though, that it could take weeks or months to see the impact of duplicates on some sites and in some situations.
-
Hi Donnie,
Thanks for your reply, but I was already aware that Google had/has a sandbox; I should have mentioned that in my question. I'm looking more for an answer about how, and on what basis, Google is able to determine whether pages are duplicates.
I've seen dozens of cases where our content was indexed, whether or not we linked back to the "original" source.
I also want to make clear that, in all of these cases, the duplicate content was published with the agreement of the original sources.
-
In the past, Google had a sandbox period before any page (content) would rank. However, now everything is instant (I just learned this today @seomoz).
If you release something, Google will index it as fast as possible. If that info gets duplicated, Google will only count the first copy indexed; everyone else loses brownie points unless they link back to the main article (the one indexed first).
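The "first one indexed wins" claim above can be pictured as a toy canonical-selection rule over a cluster of duplicate pages. The field names and data here are entirely hypothetical, and real canonicalization is believed to weigh many more signals than indexing date (links, sitemaps, redirects, etc.):

```python
# Toy illustration of "first indexed wins" canonical selection among
# duplicate copies of the same content. Field names are hypothetical;
# this is a sketch of the claim in the thread, not Google's algorithm.

def pick_canonical(duplicate_cluster):
    """Return the copy with the earliest indexing timestamp.

    ISO-8601 date strings compare correctly as plain strings.
    """
    return min(duplicate_cluster, key=lambda page: page["indexed_at"])

cluster = [
    {"url": "http://scraper.example/copy", "indexed_at": "2012-05-03"},
    {"url": "http://original.example/post", "indexed_at": "2012-05-01"},
    {"url": "http://syndicator.example/repost", "indexed_at": "2012-05-02"},
]

print(pick_canonical(cluster)["url"])  # the earliest-indexed copy
```

Under this rule, only the earliest-indexed copy would rank, which is why the answer suggests later copies should link back to the original.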