Scraped content ranking above the original source content in Google.
-
I need insights on how “scraped” content (an exact copy-pasted version) can rank above the original content in Google.
Four original, in-depth articles published by my client (an online publisher) were republished by another company (which happens to be briefly mentioned in all four of those articles). We reckon the articles were republished at least a day or two after the originals were published (the exact gap is not known). All four of the “copied” articles rank at the top of Google search results, whereas the original content, i.e. my client's website, does not show up even in the top 50 or 60 results.
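To put a number on how close the "copied" pages are to the originals, a quick text comparison can help. This is only a minimal sketch using Python's standard `difflib`. The sample strings are hypothetical stand-ins for the two article bodies, which you would extract from the pages yourself.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def similarity(original: str, candidate: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    return SequenceMatcher(None, normalize(original), normalize(candidate)).ratio()


# Hypothetical sample strings standing in for the two article bodies.
original_text = "Scraped content can outrank the original source."
copied_text = "Scraped   content can outrank\nthe original source."
unrelated_text = "A completely different paragraph about something else."

print(similarity(original_text, copied_text))
print(similarity(original_text, unrelated_text))
```

A ratio at or near 1.0 supports the "exact copy" description. It says nothing about why Google ranks one version over the other.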
We have looked at numerous factors such as Domain Authority, Page Authority, inbound links to both the original source and the URLs of the copied pages, social metrics, etc. All of the metrics, as shown by tools like Moz, are better for the source website than for the re-publisher. We have also compared results in different geographies to see whether any geographical bias was affecting results, because our client’s website is hosted in the UK and the ‘re-publisher’ is from another country, but we found the same results. We are also not aware of any manual actions taken against our client's website (at least based on messages in Search Console).
Are there any other factors that can explain this serious anomaly, which seems to be a disincentive for anybody creating highly relevant original content?
We recognize that our client has the option to submit a ‘Scraper Content’ form to Google, but we are less keen to go down that route and more keen to understand why this problem could arise in the first place.
Please suggest.
-
**Everett Sizemore - Director, R&D and Special Projects at Inflow:** Use the Google Scraper Report form.
Thanks. I didn't know about this.
If that doesn't work, submit a DMCA complaint to Google.
This does work. We submit dozens of DMCAs to Google every month. We also send notices to sites that have used our content but might not understand copyright infringement.
**Everett Sizemore - Director, R&D and Special Projects at Inflow:** Until Manoj gives us the URLs so we can look into it ourselves, I'd have to say this is the best answer: Google sucks sometimes. Use the Google Scraper Report form. If that doesn't work, submit a DMCA complaint to Google.
-
Oh, that is a very good point. This is very bad for people who have clients.
-
Thanks, EGOL.
The other big challenge is to get clients to also buy into the idea that it is Google's problem!
-
**In this specific instance, the original source outscores the site where content is duplicated on almost all the common metrics that are deemed to be indicative of a site's relative authority/standing.**
Yes, this happens. It states the problem and Google's inabilities more strongly than I have stated it above.
**Any ideas/potential solutions that you could help with will be much appreciated.**
I have this identical problem myself. Actually, it's Google's problem. They have crap on their shoes but say that they can't smell it.
-
Hi,
Thanks for the response. I'd understand if the original source were indeed new, not so 'powerful', or not an established site in the niche that it serves.
In this specific instance, the original source outscores the site where content is duplicated on almost all the common metrics that are deemed to be indicative of a site's relative authority/standing.
Any ideas/potential solutions that you could help with will be much appreciated.
Thanks
-
Scraped content frequently outranks the original source, especially when the original source is a new site or a site that is not powerful.
Google says that they are good at attributing content to the original publisher. They are delusional. Lots of SEOs believe Google. I'll not comment on that.
If scraped content were not making money for people, this practice would have died a long time ago. I submit that as evidence. Scrapers know what Google does not know (or refuses to admit) and what many SEOs refuse to believe.
-
No, John - we don't use the 'Fetch as Googlebot' for every post. I am intrigued by the possibility you suggest.
Yes, there are lots of unknowns, and certain results seem inexplicable, as we feel this particular instance is. We have looked at and evaluated most of the obvious things to be considered, including the likelihood of the re-publisher having gotten more social traction. However, the actual results are the opposite of what we'd expect.
I'm hoping that you/ some of the others in this forum could shed some light on any other factors that could be influencing the results.
Thanks.
-
Thanks for the link, Umar.
Yes, we did fetch the cached versions of both pages, but that doesn't indicate when the respective pages were first indexed; it just shows when the pages were last cached.
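Since the cache date does not help, one thing that can at least be compared is the publication time each page declares about itself. The sketch below is a rough illustration, assuming both pages expose the Open Graph `article:published_time` meta tag (many publisher sites do, but not all). The sample HTML and dates are hypothetical. The value is self-reported by each site, so it is a hint about the publishing gap, not proof of when Google first indexed either page.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional


class PublishedTimeParser(HTMLParser):
    """Collect the content of <meta property="article:published_time">."""

    def __init__(self) -> None:
        super().__init__()
        self.published_time: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching meta tag is kept.
        if tag != "meta" or self.published_time is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "article:published_time":
            self.published_time = attr_map.get("content")


def published_time(html: str) -> Optional[str]:
    """Return the declared publication time, or None if the tag is absent."""
    parser = PublishedTimeParser()
    parser.feed(html)
    return parser.published_time


# Hypothetical page sources; in practice these would be the fetched HTML.
original_html = (
    '<html><head><meta property="article:published_time" '
    'content="2016-03-01T09:00:00+00:00"></head></html>'
)
copy_html = (
    '<html><head><meta property="article:published_time" '
    'content="2016-03-03T14:30:00+00:00"></head></html>'
)

original_dt = datetime.fromisoformat(published_time(original_html))
copy_dt = datetime.fromisoformat(published_time(copy_html))
print("Declared gap:", copy_dt - original_dt)
```

If the tag is missing on either page, `published_time` returns `None`, and the comparison has to fall back on other evidence such as sitemap or feed timestamps.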
-
No Martijn, the articles have excerpts from representatives of the republisher; there are no links to the re-publisher website.
-
When you say the re-publisher is briefly mentioned in the posts themselves, does that mean you're also linking to them?
-
Hey Manoj,
That's indeed very weird. There can be multiple reasons for this. For instance, did you try to fetch the cached versions of both sites to check when they got indexed? Online publication sites usually have a fast indexing rate, and it is possible that your client shared the articles on social media before they got indexed and the other site lifted them.
Do check out this brilliant Moz post; I'm sure it will give you an idea of what caused this:
https://moz.com/blog/postpanda-your-original-content-is-being-outranked-by-scrapers-amp-partners
Hope this helps!
-
Do you use Fetch as Google in Webmaster Tools (WMT) with every post?
If your competitors monitor the site, harvest the content, publish it, and then use Fetch as Google, that could explain why Google ranks them first, i.e. Google would likely have indexed their content first.
That said, there are so many unknown factors at play, e.g. how does social stack up? Are they using Google+, etc.?