Removing duplicate content at large scale (80% of the website) using only NOINDEX
-
Hi everyone,
I look after a large "news" website (500k pages) that took a massive hit from Panda because of duplicate content (70% was syndicated content). I recommended removing all syndicated content and focusing the website on original, high-quality content.
However, this was implemented only partially. All syndicated content was set to NOINDEX (they think it is good for users to see standard news plus original HQ content). Of course it didn't help at all. No change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider this site to be "cheating" and not worthy of the user.
What do you think about this "theory"? What would you do?
Thank you for your help!
-
-
It has been almost a year now since the massive hit. After that, there were also some smaller hits.
-
We are putting effort into improvements. That is quite frustrating for me, because I believe our effort is being undermined by old duplicated content (which makes up 80% of the website :-))
Yeah, we will need to take care of the link mess...
Thank you!
-
Yeah, this strategy will definitely be part of the guidelines for the editors.
One last question: do you know of any good resources I can use as inspiration?
Thank you so much.
-
We deleted thousands of pages every few months.
Before deleting anything we identified valuable pages that continued to receive traffic from other websites or from search. These were often updated and kept on the site. Everything else was 301 redirected to the "news homepage" of the site. This was not a news site; it was a very active news section on an industry portal site.
"You set 410 for those pages and removed all internal links to them, and Google was OK with that?"
Our goal was to avoid internal links to pages that were going to be deleted. Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time. Our periodic purges were done after that length of time.
We never used hard-coded links in stories to pages that were subject to being abandoned. Instead we simply linked to category pages where something relevant would always be found.
Develop a strategy for internal linking that will reduce site maintenance and focus all internal links to pages that are permanently maintained.
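To make the widget rule above a bit more concrete, here is a minimal sketch of how the cut-off could work. Everything in it is an assumption for illustration: the `Story` class, the `permanent` flag and the 90-day `WIDGET_MAX_AGE` are hypothetical (the post only says "a certain length of time").

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical cut-off; the original post does not state the actual length of time.
WIDGET_MAX_AGE = timedelta(days=90)


@dataclass
class Story:
    url: str
    published: date
    permanent: bool = False  # True for pages that are permanently maintained


def widget_candidates(stories, today):
    """Stories the recommendation widget may still link to."""
    return [
        s for s in stories
        if s.permanent or today - s.published <= WIDGET_MAX_AGE
    ]


def purge_candidates(stories, today):
    """Stories past the cut-off, so the widget no longer links to them."""
    return [
        s for s in stories
        if not s.permanent and today - s.published > WIDGET_MAX_AGE
    ]
```

Because both functions use the same cut-off, a story only becomes a purge candidate once the widget has stopped linking to it, which is the ordering described above.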
-
Yikes! Will you guys still pay for it if it's removed? If so, then combining the comments below with my thoughts, I'd delete it, since it's old and no longer timely.
-
Yeah, paying ... we actually pay for this content (earlier management decisions :-))
-
EGOL, your insights are very much appreciated :-)!
I agree with you. Makes total sense.
So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? You set 410 for those pages and removed all internal links to them, and Google was OK with that?
Redirecting useless content: do you mean setting a 301 to the most relevant page that is bringing in traffic?
Thank you, sir
-
But I still miss the point of paying for the content that is not accessible from SE
- "paying"?
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERP?
- correct
-
Hi Dimitrii,
thank you very much for your opinion. The idea of canonical links is very interesting. We may try that in the "first" phase. But I still miss the point of paying for content that is not accessible from SE.
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERP?
-
Just seeing the other responses. Agree with what EGOL mentions. A content audit would be even better to see if there was any value at all on those pages (GA traffic, links, etc). Odds are though that there was not any and you already killed all of it with the noindex tag in place.
-
Couple of things here.
-
If a second Panda update has not occurred since the changes that were made then you may not get credit for the noindexed content. I don't think this is "cheating" as with the noindex, it just told Google to take 350K of its pages out of the index. The noindex is one of the best ways to get your content out of Google's index.
-
If you have not spent time improving the non-syndicated content then you are missing the more important part and that is to improve the quality of the content that you have.
A side point to consider here is your crawl budget. I am assuming that the site still links internally to these 350K pages, so users and bots will still go to them and have to process them. This is mostly a waste of time. As all of these pages are out of Google's index thanks to the noindex tag, why not take out all internal links to those pages (i.e. from sitemaps, paginated index pages, menus, internal content) so that users and Google can focus on the quality content that is left over? I would then also 404/410 all those low-quality pages, as they are now out of Google's index and not linked internally. Why maintain the content?
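As a rough illustration of the cleanup described above, the sketch below drops retired URLs from a sitemap and answers 410 for them. The function names `build_sitemap` and `status_for` and the example.com URLs are hypothetical; how the status code is actually returned depends on your CMS or web server, which this sketch does not cover.

```python
from xml.sax.saxutils import escape


def build_sitemap(urls, retired):
    """Return sitemap XML that lists only URLs that have not been retired."""
    kept = [u for u in urls if u not in retired]
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in kept)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )


def status_for(url, retired):
    """HTTP status to serve: 410 Gone for retired pages, 200 otherwise."""
    return 410 if url in retired else 200


# Hypothetical example data.
all_urls = [
    "https://example.com/news/original-1",
    "https://example.com/syndicated/story-1",
]
retired_urls = {"https://example.com/syndicated/story-1"}
sitemap_xml = build_sitemap(all_urls, retired_urls)
```

Keeping the retired set in one place means the sitemap, the menus and the status codes can all be driven from the same list.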
-
-
Good point! News gotta be new
-
If there are 500,000 pages of "news" then a lot of that content is "history" instead of "news". Visitors are probably not consuming it. People are probably not searching for it. And actively visited pages on the site are probably not linking to it.
So, I would use analytics to determine if these "history" pages are being viewed, are pulling in much traffic, have very many links, and I would delete and redirect them if they are not important to the site any longer. This decision is best made at the page level.
For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not. Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value. These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.
I used to manage a news section and every few months we would assess, delete and redirect, to keep the weight of the site as low as possible for maximum competitiveness.
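The page-level assessment described above could be sketched roughly like this. The `PageStats` fields and the thresholds are assumptions made up for the example, not numbers from our process; you would fill them from your own analytics and link data.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    pageviews_90d: int      # views in the last 90 days (hypothetical window)
    search_visits_90d: int  # visits arriving from search in the same window
    external_links: int     # links from other websites


def decide(page, min_views=50, min_search=10, min_links=1):
    """Keep a page if it still earns views, search traffic or links.

    The default thresholds are placeholders, not recommendations.
    """
    still_valuable = (
        page.pageviews_90d >= min_views
        or page.search_visits_90d >= min_search
        or page.external_links >= min_links
    )
    return "keep" if still_valuable else "delete_and_redirect"


def audit(pages):
    """Map each URL to a page-level decision."""
    return {p.url: decide(p) for p in pages}
```

Running `audit` over one date-based folder at a time matches the idea of inspecting old content by publication date.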
-
Hi there.
NOINDEX !== no crawling, and it certainly doesn't equal NOFOLLOW. What you should probably be looking at is canonical links.
My understanding is (and I can be completely wrong) that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content. It's still crawlable, all the links are still "followable", it's still scraped content, and you aren't telling crawlers that you took it from somewhere else (by canonicalizing); it's just not displayed in SERPs. And yes, 80% of content being noindex probably doesn't help either.
So, I think what you need to do is either remove that duplicate content altogether, or use canonical links to the originals, or (bad idea, but it would work) block all those URLs in robots.txt (at least that way those pages become uncrawlable). All of these are still disreputable techniques though, kinda like polishing the dirt.
Hope this makes sense.
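For reference, here is a small sketch of what the three options look like in practice. The `head_tag` and `is_crawlable` helpers, the `/syndicated/` path and the example URLs are hypothetical; the robots.txt check uses Python's standard `urllib.robotparser`, which only approximates how a real crawler reads the file.

```python
from html import escape
from urllib.robotparser import RobotFileParser


def head_tag(option, original_url=None):
    """Return the <head> tag for the chosen option on a syndicated page."""
    if option == "noindex":
        # Page can still be crawled and its links followed; it is just not indexed.
        return '<meta name="robots" content="noindex">'
    if option == "canonical":
        # Tells crawlers which URL is the original source of the content.
        return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'
    raise ValueError(f"unknown option: {option}")


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Check whether robots.txt rules allow the given agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that blocks a syndicated-content folder.
ROBOTS = """User-agent: *
Disallow: /syndicated/
"""
```

One interaction worth keeping in mind: a crawler that is blocked by robots.txt never fetches the page, so it cannot see a noindex or canonical tag placed on it.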