Is it possible that Google may have erroneous indexing dates?
-
I am consulting someone for a problem related to copied content. Both sites in question are WordPress (self hosted) sites. The "good" site publishes a post. The "bad" site copies the post (without even removing all internal links to the "good" site) a few days after.
On both websites the publishing date of each post is clearly visible, and it is clear that the "bad" site publishes the posts days later. The content thief doesn't even bother to fake the publishing date.
The owner of the "good" site wants to have all the proof needed before acting against the content thief. So I suggested that he also check in Google the dates the various pages were indexed, using Search Tools -> Custom Range so that the indexing date is displayed next to the search results.
For all of the copied pages the indexing dates also prove the "bad" site published the content days after the "good" site, but there are 2 exceptions: the first 2 posts copied.
First post:
On the "good" website it was published on 30 January 2013
On the "bad" website it was published on 26 February 2013
In Google search both show up indexed on 30 January 2013!
Second post:
On the "good" website it was published on 20 March 2013
On the "bad" website it was published on 10 May 2013
In Google search both show up indexed on 20 March 2013!
Is it possible that the date shown in Google search results is an error?
I also asked for help on the Google Webmaster forums, but there the discussion shifted to "who copied the content" and "file a DMCA complaint". So I want to be sure my question is better understood here.
It is not about who published the content first or how to take down the copied content. I am just asking if anybody else has noticed this strange thing with Google indexing dates.
How is it possible for Google search results to display an indexing date that is earlier than the date the article copy was published, and exactly the same as the date the original article was published and indexed?
-
Thanks Doug. Really an eye-opener.
-
Thanks Doug for your response. It really cleared up the questions I had about that date Google shows next to the search results.
I was not able to find official details about it; all I could find were various sources referring to it as the indexing date of a page.
But I knew that here in the Moz community there are people who really know things, and that's why I asked.
So that date is just Google's estimation of the publishing date, not the date Google indexed the content!
Thanks again for taking the time to answer me!
-
Hiya Sorina,
When you use the custom date range, Google isn't listing results based on the date they were indexed. Google is using an estimated publication date.
Google tries to estimate the publication date based on meta-data and other features of the page such as dates in the content, title and URL. The date Google first indexed the page is just one of the things that Google can use to estimate the publication date.
I also suspect that dates in any sitemap.xml files will also be taken into consideration.
But, given that even Google can't guarantee that it'll crawl and index articles on the day they've been published, the crawl date may not be an accurate estimate.
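To make the idea of "date hints" concrete, here is a minimal sketch of the kind of signals a crawler could read from a page. It is not Google's actual method, which isn't public. The `article:published_time` Open Graph property, the `<time datetime>` element and the WordPress-style date permalink are assumed examples of such hints, and `collect_date_hints` and the `bad-site.example` URL are hypothetical names used only for illustration.

```python
import re
from html.parser import HTMLParser


class DateHintParser(HTMLParser):
    """Collects date strings that a crawler could plausibly read from a page."""

    def __init__(self):
        super().__init__()
        self.hints = []  # list of (source, value) tuples, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "article:published_time":
            # Open Graph publication date, emitted by some themes and plugins
            self.hints.append(("meta", attrs.get("content") or ""))
        elif tag == "time" and attrs.get("datetime"):
            # Machine-readable date attached to a visible date in the content
            self.hints.append(("time", attrs["datetime"]))


def collect_date_hints(url, html):
    """Return every date hint found in the markup, plus one from the URL."""
    parser = DateHintParser()
    parser.feed(html)
    hints = list(parser.hints)
    # Date-based permalinks look like /2013/01/30/post-name/
    match = re.search(r"/(\d{4})/(\d{2})/(\d{2})/", url)
    if match:
        hints.append(("url", "-".join(match.groups())))
    return hints


# Hypothetical scraped copy that kept the original's markup dates
sample_html = """
<html><head>
<meta property="article:published_time" content="2013-01-30T09:00:00+00:00">
</head><body>
<time datetime="2013-01-30">30 January 2013</time>
</body></html>
"""
hints = collect_date_hints(
    "http://bad-site.example/2013/02/26/copied-post/", sample_html
)
print(hints)
```

If a copied post carries over the original's date markup like this, the hints on the copy disagree with each other, which is one plausible way an estimated date could end up matching the original's.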
Also, if the scraped content is being re-published with intact internal links (are these the full URLs, and do they resolve to your original website?) then it's pretty obvious where the content came from.
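One way to answer that question for each copied page is to list the links that still point at the original site. This is only a rough sketch: `links_to_domain`, `good-site.example` and `bad-site.example` are made-up names, and the HTML would have to be fetched or saved separately.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_domain(page_url, html, original_domain):
    """Return absolute links in `html` whose host is `original_domain`
    or one of its subdomains. Relative links are resolved against
    `page_url`, so they count as links to the copying site itself."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        host = urlparse(absolute).hostname or ""
        if host == original_domain or host.endswith("." + original_domain):
            matches.append(absolute)
    return matches


# Hypothetical copied page with two links back to the original site
copied_html = """
<p>Read <a href="http://good-site.example/other-post/">our other post</a>,
the <a href="/about/">about page</a> or
<a href="http://www.good-site.example/contact/">contact us</a>.</p>
"""
leftover = links_to_domain(
    "http://bad-site.example/copied-post/", copied_html, "good-site.example"
)
print(leftover)
```

Full URLs that resolve to the original domain survive the copy unchanged, while relative links end up pointing at the copying site, which is why the distinction matters as evidence.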
Hope this helps answer your question.
-
Hi Sorina,
I can tell you that the index dates shown by Google are accurate. That is sometimes not the case with the cache date, as the date shown in the cache and the copy shown in the cache often don't match. Send me a private message with the actual URLs under discussion and I will be able to comment with more clarity.
Best,
Devanur Rafi
-
Thank you for your response Devanur Rafi, but the "good" site doesn't have problems getting indexed.
Actually, all posts on the "good" site are indexed the very same day they are published.
My question was more about the indexing date shown in Google search results.
How come, for a post from the "bad" site, Google is displaying an indexing date earlier than the actual date the post was published on that site?!
And how come this date is exactly the same as the date Google says it indexed the post from the "good" site?
-
Hi Sorina,
This is a common thing and it all depends on a site's crawlability (how easy it is for the bot to crawl) and the crawl frequency for that site. Google would have picked up that post first on the bad site and then from the good site. However, just because one or two posts were picked up late does not mean that the good site is not crawler friendly. It also depends on how far the resource is from the root. Let us take an example:
A page on a good site: abc.com/folder1/folder2/folder3/page.html
Now a bad site copies that page: xyz.com/page.html
In this case, Google might first pick up the copied page from the bad site, as it is just a click away from the root, which is not the case with the good site, where the page is nested deep inside multiple folders.
You can also give the Wayback Machine (archive.org) a try to find which website published the post first. Sometimes this might work out pretty well. You can also try to look at the cache dates of the posts on both the sites in Google to get some info in this regard.
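The Wayback Machine lookup can also be scripted. The sketch below assumes the public availability endpoint at `https://archive.org/wayback/available` and the JSON shape shown in the sample; asking for the snapshot closest to a very early timestamp is used here as an approximation of the earliest capture. The function names and the `good-site.example` URL are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp="19960101"):
    """Build the lookup URL; an early timestamp asks for the capture
    closest to the beginning of the archive."""
    return API + "?" + urlencode({"url": page_url, "timestamp": timestamp})


def closest_snapshot(response):
    """Return (timestamp, snapshot_url) from a decoded response,
    or None when no capture is reported."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def fetch_closest_snapshot(page_url, timestamp="19960101"):
    """Perform the network request (not called in this example)."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))


# Assumed response shape, with made-up values
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130131000000/http://good-site.example/post/",
            "timestamp": "20130131000000",
            "status": "200",
        }
    }
}
print(availability_url("http://good-site.example/post/"))
print(closest_snapshot(sample_response))
```

A capture date only shows that the page existed by then. The archive does not crawl every site on the day a post goes live, so a missing or later capture proves nothing by itself.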
Hope those help. I wish you good luck.
Best,
Devanur Rafi.