Is it possible that Google may have erroneous indexing dates?
-
I am advising someone on a problem related to copied content. Both sites in question are self-hosted WordPress sites. The "good" site publishes a post. The "bad" site copies the post (without even removing all the internal links to the "good" site) a few days later.
The publishing date of each post is clearly shown on both websites, and it is clear that the "bad" site publishes the posts days later. The content thief doesn't even bother to fake the publishing date.
The owner of the "good" site wants to have all the proof needed before acting against the content thief. So I suggested that he also check in Google the dates the various pages were indexed, using Search Tools -> Custom Range to have the indexing date displayed next to the search results.
For almost all of the copied pages the indexing dates also prove that the "bad" site published the content days after the "good" site, but there are 2 exceptions: the first 2 posts copied.
First post:
On the "good" website it was published on 30 January 2013
On the "bad" website it was published on 26 February 2013
In Google search both show up indexed on 30 January 2013!
Second post:
On the "good" website it was published on 20 March 2013
On the "bad" website it was published on 10 May 2013
In Google search both show up indexed on 20 March 2013!
Is it possible that the date shown in Google search results is wrong?
I also asked for help on the Google Webmaster forums, but there the discussion shifted to "who copied the content" and "file a DMCA complaint". So I want to be sure my question is better understood here.
It is not about who published the content first or how to take down the copied content. I am just asking if anybody else has noticed this strange thing with Google indexing dates.
How is it possible for Google search results to display an indexing date that is earlier than the date the article copy was published, and exactly the same as the date the original article was published and indexed?
-
Thanks Doug. Really an eye-opener.
-
Thanks Doug for your response. It really cleared up the questions I had about that date Google shows next to the search results.
I was not able to find official details about it; all I could find were various sources referring to it as the indexing date of a page.
But I knew that here in the Moz community there are people who really know things, that's why I asked.
So that date is just Google's estimation of the publishing date, not the date Google indexed the content!
Thanks again for taking the time to answer me!
-
Hiya Sorina,
When you use the custom date range, Google isn't listing results based on the date they were indexed. Google is using an estimated publication date.
Google tries to estimate the publication date based on meta-data and other features of the page such as dates in the content, title and URL. The date Google first indexed the page is just one of the things that Google can use to estimate the publication date.
I also suspect that dates in any sitemap.xml files will be taken into consideration.
But, given that even Google can't guarantee that it'll crawl and index articles on the day they've been published, the crawl date may not be an accurate estimate.
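To give a feel for the kind of on-page signals I mean, here is a minimal Python sketch that collects date hints from a page. It is only an illustration and not how Google actually works: the names `DateSignalParser` and `collect_date_signals` are made up for this example, and I am assuming the page exposes an `article:published_time` meta tag, `<time datetime>` elements, or a `/YYYY/MM/DD/` permalink.

```python
import re
from html.parser import HTMLParser


class DateSignalParser(HTMLParser):
    """Collects date-like hints from meta tags and <time> elements."""

    def __init__(self):
        super().__init__()
        self.signals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "article:published_time":
            # Open Graph style publication timestamp, if the theme emits one.
            self.signals.append(("meta", attrs.get("content") or ""))
        elif tag == "time" and attrs.get("datetime"):
            # Machine-readable date attached to a visible date in the content.
            self.signals.append(("time", attrs["datetime"]))


# Matches /YYYY/MM/DD/ permalinks (an assumed URL structure for this example).
URL_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def collect_date_signals(url, html):
    """Return a list of (source, value) pairs hinting at a publication date."""
    parser = DateSignalParser()
    parser.feed(html)
    signals = list(parser.signals)
    match = URL_DATE.search(url)
    if match:
        signals.append(("url", "-".join(match.groups())))
    return signals
```

If a scraped copy carries over dates embedded in the original content, signals like these would still point at the original date, which could be one reason the copy shows the same date as the original.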
Also, if the scraped content is being re-published with intact internal links (are these the full URLs, and do they resolve to your original website?) then it's pretty obvious where the content came from.
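If you want to collect those links as evidence, a rough Python sketch along these lines would list every link in a copied page that still points at the original site. The function name `links_to_host` and the example hosts are hypothetical, and it assumes you have already saved the HTML of the copied page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_host(page_url, html, original_host):
    """Return absolute links in `html` whose host is `original_host`."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        # Relative links resolve against the copy's own URL, so only
        # full URLs left over from the original will match the host.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).hostname == original_host:
            found.append(absolute)
    return found
```

For example, calling `links_to_host("http://bad.example/copied-post/", html, "good.example")` returns only the links that resolve to the original domain.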
Hope this helps answer your question.
-
Hi Sorina,
I can tell you that the index dates shown by Google are accurate. That is not always the case with the cache date, though: the date shown in the cache and the copy shown in the cache often don't match. Send me a private message with the actual URLs under discussion and I will be able to comment with more clarity.
Best,
Devanur Rafi
-
Thank you for your response Devanur Rafi, but the "good" site doesn't have problems getting indexed.
Actually, all posts on the "good" site are indexed the very same day they are published.
My question was more about the indexing date shown in Google search results.
How come, for a post from the "bad" site, Google is displaying an indexing date earlier than the actual date the post was published on that site?!
And how come this date is exactly the same as the date Google says it indexed the post from the "good" site?
-
Hi Sorina,
This is a common thing, and it all depends on a site's crawlability (how easy it is for the bot to crawl) and the crawl frequency for that site. Google would have picked up that post first on the bad site and then on the good site. However, just because one or two posts were picked up late does not mean that the good site is not crawler friendly. It also depends on how far the resource is from the root. Let us take an example:
A page on a good site: abc.com/folder1/folder2/folder3/page.html
Now a bad site copies that page: xyz.com/page.html
In this case, Google might first pick up the copied page from the bad site, as it is just a click away from the root, which is not the case with the good site, where the page is nested deep inside multiple folders.
You can also give the Wayback Machine (archive.org) a try to find out which website published the post first. Sometimes this works out pretty well. You can also look at the cache dates of the posts on both sites in Google to get some information in this regard.
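If you have many URLs to check, the Wayback Machine lookup can be scripted. Below is a minimal Python sketch that uses the archive.org availability endpoint; the helper names (`build_query`, `closest_snapshot`, `lookup`) are my own, and the response shape is assumed from the public description of that endpoint. Note that it returns the snapshot closest to the timestamp you pass, which is not guaranteed to be the very first capture, and a page only appears if the archive happened to crawl it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp):
    """Build the availability query for `page_url` near `timestamp` (YYYYMMDD)."""
    return WAYBACK_API + "?" + urlencode({"url": page_url, "timestamp": timestamp})


def closest_snapshot(response):
    """Return (timestamp, snapshot_url) from a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timestamp):
    """Query the archive over the network and return the closest snapshot."""
    with urlopen(build_query(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Running `lookup` for the same post on both sites, with the original publishing date as the timestamp, lets you compare the capture dates side by side.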
Hope that helps. I wish you good luck.
Best,
Devanur Rafi.