Removing duplicated content using only NOINDEX at large scale (80% of the website)
-
Hi everyone,
I am taking care of the large "news" website (500k pages), which got massive hit from Panda because of the duplicated content (70% was syndicated content). I recommended that all syndicated content should be removed and the website should focus on original, high quallity content.
However, this was implemented only partially. All syndicated content is set to NOINDEX (they thing that it is good for user to see standard news + original HQ content). Of course it didn't help at all. No change after months. If I would be Google, I would definitely penalize website that has 80% of the content set to NOINDEX a it is duplicated. I would consider this site "cheating" and not worthy for the user.
What do you think about this "theory"? What would you do?
Thank you for your help!
-
-
It has been almost a year now since the massive hit. After that, there were also some smaller hits.
-
We are putting effort into improvements. That is quite frustrating for me, because I believe our effort is being demolished by old duplicated content (which makes up 80% of the website :-)).
Yeah, we will need to take care of the link mess...
Thank you! -
-
Yeah, this strategy will definitely be part of the guidelines for the editors.
One last question: do you know of any good resources I can use as inspiration?
Thank you so much!
-
We deleted thousands of pages every few months.
Before deleting anything we identified valuable pages that continued to receive traffic from other websites or from search. These were often updated and kept on the site. Everything else was 301 redirected to the "news homepage" of the site. This was not a news site, it was a very active news section on an industry portal site.
You set 410 for those pages, removed all internal links to them, and Google was OK with that?
Our goal was to avoid internal links to pages that were going to be deleted. Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time. Our periodic purges were done after that length of time.
We never used hard coded links in stories to pages that were subject to being abandoned. Instead we simply linked to category pages where something relevant would always be found.
Develop a strategy for internal linking that will reduce site maintenance and focus all internal links to pages that are permanently maintained.
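To make that concrete, here is a minimal sketch of the kind of age filter a recommendation widget could apply, so purges never break internal links. The field names, the 180-day window, and the category URL pattern are all assumptions for illustration, not anything from the original setup:

```python
from datetime import datetime, timedelta

# Assumed purge window: stories older than this are candidates for removal,
# so widgets must stop linking to them before they disappear.
PURGE_WINDOW_DAYS = 180

def recommendable(stories, today=None):
    """Return only stories young enough to be safely linked from widgets.

    `stories` is assumed to be a list of dicts with a 'published' datetime
    and a 'url' key -- adapt to whatever the CMS actually exposes.
    """
    today = today or datetime.utcnow()
    cutoff = today - timedelta(days=PURGE_WINDOW_DAYS)
    return [s for s in stories if s["published"] >= cutoff]

def related_link(category_slug):
    """In story templates, link to the permanently maintained category page
    instead of hard-coding URLs of individual stories that may be purged."""
    return f"/news/{category_slug}/"
```

The point of the design is that nothing hard-codes a link to a page that might be abandoned later, so each periodic purge requires no link cleanup.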
-
Yikes! Will you guys still pay for it if it's removed? If so, then combining the comments below with my thoughts, I'd delete it, since it's old and not time-relevant.
-
Yeah, paying ... we actually pay for this content (earlier management decisions :-))
-
EGOL your insights are very appreciated :-)!
I agree with you. Makes total sense.
So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? You have set 410 for those pages and remove all internal links to them and google was ok with that?
Redirecting useless content - you mean set 301 to the most relevant page that is bringing traffic?
Thank you sir
-
But I still don't see the point of paying for content that is not accessible from search engines.
- "paying"?
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERP?
- Correct.
-
Hi Dimitrii,
Thank you very much for your opinion. The idea of canonical links is very interesting. We may try that in the "first" phase. But I still don't see the point of paying for content that is not accessible from search engines.
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERP?
-
Just seeing the other responses. Agree with what EGOL mentions. A content audit would be even better, to see whether there was any value at all on those pages (GA traffic, links, etc). Odds are, though, that there wasn't any, and you already killed it all by putting the noindex tag in place.
-
Couple of things here.
-
If a second Panda update has not occurred since the changes were made, then you may not get credit for the noindexed content. I don't think this is "cheating"; with the noindex, you have simply told Google to take 350K of its pages out of its index. The noindex is one of the best ways to get your content out of Google's index.
-
If you have not spent time improving the non-syndicated content, then you are missing the more important part, which is to improve the quality of the content that you have.
A side point to consider here is your crawl budget. I am assuming that the site still internally links to these 350K pages, so users and bots will still visit and process them. This is mostly a waste of time. Since all of these pages are out of Google's index thanks to the noindex tag, why not remove all internal links to those pages (i.e. from sitemaps, paginated index pages, menus, internal content) so that users and Google can focus on the quality content that is left over? I would then also 404/410 all those low-quality pages, as they are now out of Google's index and not linked internally. Why maintain the content?
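To make that cleanup manageable at 350K pages, something like the following sketch could pull URLs from the sitemap and flag the ones that still carry a noindex tag, as candidates to drop from the sitemap, unlink, and eventually 410. The sitemap URL is a placeholder and the noindex check is deliberately crude:

```python
import re
import requests
from xml.etree import ElementTree

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder, not the real site
# Rough check for <meta name="robots" ... content="...noindex...">
NOINDEX_RE = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', re.I)

def sitemap_urls(sitemap_url):
    """Yield <loc> entries from a plain (non-index) XML sitemap."""
    xml = requests.get(sitemap_url, timeout=10).text
    root = ElementTree.fromstring(xml)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    for loc in root.findall(".//sm:loc", ns):
        yield loc.text.strip()

def is_noindexed(url):
    """True if the page currently carries a meta robots noindex tag."""
    resp = requests.get(url, timeout=10)
    return bool(NOINDEX_RE.search(resp.text))

if __name__ == "__main__":
    for url in sitemap_urls(SITEMAP_URL):
        if is_noindexed(url):
            # Candidate: drop from the sitemap, strip internal links, then 410.
            print(url)
```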
-
-
Good point! News gotta be new
-
If there are 500,000 pages of "news" then a lot of that content is "history" instead of "news". Visitors are probably not consuming it. People are probably not searching for it. And actively visited pages on the site are probably not linking to it.
So, I would use analytics to determine if these "history" pages are being viewed, are pulling in much traffic, have very many links, and I would delete and redirect them if they are not important to the site any longer. This decision is best made at the page level.
For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not. Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value. These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.
I used to manage a news section and every few months we would assess, delete and redirect, to keep the weight of the site as low as possible for maximum competitiveness.
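As a rough illustration of that page-level decision, a periodic audit could be scripted along these lines. The CSV columns and the thresholds are assumptions for the sketch, not figures from EGOL's process; plug in whatever your analytics and link tools actually export:

```python
import csv
from datetime import datetime, timedelta

# Assumed export columns: url, publish_date (YYYY-MM-DD),
# sessions_last_12m, inbound_links.
MIN_SESSIONS = 10   # below this, the page is pulling in no real traffic
MIN_LINKS = 1       # below this, nothing points at it
MAX_AGE = timedelta(days=365)

def audit(path, today=None):
    """Split pages into keep vs delete-and-301 candidates."""
    today = today or datetime.utcnow()
    keep, purge = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            age = today - datetime.strptime(row["publish_date"], "%Y-%m-%d")
            stale = age > MAX_AGE
            no_traffic = int(row["sessions_last_12m"]) < MIN_SESSIONS
            no_links = int(row["inbound_links"]) < MIN_LINKS
            # Old pages with no traffic and no links: delete and 301 them
            # to the news homepage or the closest category page.
            (purge if stale and no_traffic and no_links else keep).append(row["url"])
    return keep, purge
```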
-
Hi there.
NOINDEX !== no crawling, and it surely doesn't equal NOFOLLOW. What you should probably be looking at is canonical links.
My understanding is (and I can be completely wrong) that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content - it's still crawlable, all the links are still "followable", it's still scraped content, and you aren't telling crawlers that you took it from somewhere else (by canonicalizing); it's just not displayed in the SERPs. And yes, 80% of the content being noindexed probably doesn't help either.
So, I think that what you need to do is either remove that duplicate content altogether, use canonical links pointing to the originals, or (a bad idea, but it would work) block all those URLs in robots.txt (at least that way those pages become completely uncrawlable). All of these are still disreputable techniques though, kinda like polishing the dirt.
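If you go the canonical route, it is worth verifying that the tags actually end up pointing at the original publishers. Here is a minimal sketch of such a check, assuming you can map each syndicated copy to its source URL (the URLs below are placeholders):

```python
import requests
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Collects the href of <link rel="canonical"> if one is present."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attrs = dict(attrs)
            if attrs.get("rel", "").lower() == "canonical":
                self.canonical = attrs.get("href")

def canonical_of(url):
    parser = CanonicalParser()
    parser.feed(requests.get(url, timeout=10).text)
    return parser.canonical

# Hypothetical mapping of syndicated copies to their original sources.
pairs = {
    "https://www.example.com/news/story-123": "https://original-publisher.example/story-123",
}

for copy, original in pairs.items():
    if canonical_of(copy) != original:
        print(f"Missing or wrong canonical: {copy}")
```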
Hope this makes sense.