Removing duplicated content using only NOINDEX at large scale (80% of the website).
-
Hi everyone,
I am taking care of a large "news" website (500k pages) that took a massive hit from Panda because of duplicated content (70% of it was syndicated). I recommended that all syndicated content be removed and that the website focus on original, high-quality content.
However, this was only partially implemented. All syndicated content is now set to NOINDEX (they think it is good for users to see standard news alongside the original HQ content). Of course, it didn't help at all; no change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider this site to be "cheating" and not worthy for the user.
What do you think about this "theory"? What would you do?
Thank you for your help!
-
-
It has been almost a year now since the massive hit. After that, there were also some smaller hits.
-
We are putting effort into improvements. That is quite frustrating for me, because I believe our effort is being demolished by the old duplicated content (which makes up 80% of the website :-)).
Yeah, we will need to take care of the link mess...
Thank you!
-
Yeah, this strategy will definitely be part of the guidelines for the editors.
One last question: do you know of some good resources I can use as inspiration?
Thank you so much!
-
We deleted thousands of pages every few months.
Before deleting anything, we identified valuable pages that continued to receive traffic from other websites or from search. These were often updated and kept on the site. Everything else was 301 redirected to the "news homepage" of the site. This was not a news site; it was a very active news section on an industry portal site.
"You set 410 for those pages and removed all internal links to them, and Google was OK with that?"
Our goal was to avoid internal links to pages that were going to be deleted. Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time. Our periodic purges were done after that length of time.
We never used hard-coded links in stories to pages that were subject to being abandoned. Instead, we simply linked to category pages where something relevant would always be found.
Develop a strategy for internal linking that will reduce site maintenance and focus all internal links on pages that are permanently maintained.
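To make that idea concrete, here is a minimal sketch of a recommendation widget that ages links out before the pages are purged. The 90-day cutoff and the field names are my own illustrative assumptions, not EGOL's actual setup:

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: stories older than this become candidates for the
# next purge, so recommendation widgets stop linking to them beforehand.
LINK_CUTOFF = timedelta(days=90)

def recommendable(stories, now=None):
    """Return only the stories young enough to keep receiving internal links.

    `stories` is assumed to be a list of dicts like
    {"url": "/news/some-story", "published": datetime(2015, 3, 1)}
    with naive UTC datetimes.
    """
    now = now or datetime.utcnow()
    return [s for s in stories if now - s["published"] < LINK_CUTOFF]
```

Hard-coded links inside story bodies would bypass a filter like this, which is presumably why EGOL links to category pages instead.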
-
Yikes! Will you guys still pay for it if it's removed? If so, then combining the comments below with my thoughts: I'd delete it, since it's old and not time-relevant.
-
Yeah, paying... we actually pay for this content (earlier management decisions :-)).
-
EGOL, your insights are very much appreciated :-)!
I agree with you. Makes total sense.
So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? You set 410 for those pages and removed all internal links to them, and Google was OK with that?
Redirecting useless content: you mean setting a 301 to the most relevant page that is bringing traffic?
Thank you, sir!
-
"But I still don't see the point of paying for content that is not accessible from search engines."
- "Paying"?
"Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?"
- Correct.
-
Hi Dimitrii,
Thank you very much for your opinion. The idea of canonical links is very interesting; we may try that in the "first" phase. But I still don't see the point of paying for content that is not accessible from search engines.
Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?
-
Just seeing the other responses. Agree with what EGOL mentions. A content audit would be even better, to see if there was any value at all on those pages (GA traffic, links, etc.). Odds are, though, that there wasn't any, and you already killed all of it by putting the noindex tag in place.
-
Couple of things here.
-
If a second Panda update has not occurred since the changes were made, then you may not yet get credit for the noindexed content. I don't think this is "cheating"; the noindex simply told Google to take 350K of the site's pages out of the index, and noindex is one of the best ways to get content out of Google's index.
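For reference, "set to NOINDEX" normally means serving a robots meta tag in the page's head, or the equivalent X-Robots-Tag: noindex HTTP response header:

```html
<!-- In the <head> of each syndicated page: removes it from Google's index,
     but does not stop crawlers from fetching the page or following its links -->
<meta name="robots" content="noindex">
```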
-
If you have not spent time improving the non-syndicated content, then you are missing the more important part: improving the quality of the content that you have.
A side point to consider here is your crawl budget. I am assuming that the site still internally links to these 350K pages, so users and bots will keep visiting and processing them, which is mostly a waste of time. Since all of these pages are out of Google's index thanks to the noindex tag, why not remove all internal links to them (i.e., from sitemaps, paginated index pages, menus, and internal content) so that users and Google can focus on the quality content that is left? I would then also 404/410 all those low-quality pages, as they are now out of Google's index and not linked internally. Why maintain the content?
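If the low-quality pages share a predictable URL pattern, the 410s can be handled at the server level rather than page by page. A minimal nginx sketch, assuming a hypothetical /syndicated/ path prefix (not this site's actual URL structure):

```nginx
# Hypothetical: all noindexed syndicated stories live under /syndicated/.
# 410 ("Gone") signals a permanent removal, which crawlers treat as more
# definitive than a 404.
location ^~ /syndicated/ {
    return 410;
}
```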
-
-
Good point! News gotta be new.
-
If there are 500,000 pages of "news" then a lot of that content is "history" instead of "news". Visitors are probably not consuming it. People are probably not searching for it. And actively visited pages on the site are probably not linking to it.
So, I would use analytics to determine whether these "history" pages are being viewed, are pulling in much traffic, or have many links, and I would delete and redirect them if they are no longer important to the site. This decision is best made at the page level.
For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not. Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value. These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.
I used to manage a news section and every few months we would assess, delete and redirect, to keep the weight of the site as low as possible for maximum competitiveness.
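As a rough illustration of making that decision at the page level, a sketch like the following could sort old pages into "keep" and "redirect" buckets from an analytics export. The CSV columns and thresholds are assumptions for illustration, not EGOL's actual process:

```python
import csv

# Illustrative thresholds only; tune these to your own site.
MIN_MONTHLY_VISITS = 10   # below this, the page is not being consumed
MIN_EXTERNAL_LINKS = 1    # below this, the page has no link value worth keeping

def audit(analytics_csv):
    """Classify each page, given a CSV with the assumed columns:
    url, monthly_visits, external_links."""
    decisions = {}
    with open(analytics_csv, newline="") as f:
        for row in csv.DictReader(f):
            visits = int(row["monthly_visits"])
            links = int(row["external_links"])
            if visits >= MIN_MONTHLY_VISITS or links >= MIN_EXTERNAL_LINKS:
                decisions[row["url"]] = "keep and update"
            else:
                decisions[row["url"]] = "301 to the most relevant live page"
    return decisions
```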
-
Hi there.
NOINDEX !== no crawling, and it surely doesn't equal NOFOLLOW. What you should probably be looking at is canonical links.
My understanding is (and I could be completely wrong) that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content. It's still crawlable, all the links are still "followable", it's still scraped content, and you aren't telling crawlers that you took it from somewhere else (by canonicalizing); it's just not displayed in the SERPs. And yes, 80% of the content being noindexed probably doesn't help either.
So, I think what you need to do is either remove that duplicate content altogether, use canonical links to the originals, or (a bad idea, but it would work) block all those URLs in robots.txt (at least that way the pages become uncrawlable altogether). All of these are still disreputable techniques, though; kinda like polishing the dirt.
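For concreteness, the two mechanisms look like this (the URL and the /syndicated/ path are hypothetical examples):

```html
<!-- In the <head> of the duplicate page: credits the original source,
     so Google has no reason to rank this copy -->
<link rel="canonical" href="https://original-publisher.example.com/original-story">
```

```
# robots.txt (the /syndicated/ path is hypothetical)
# Blocks crawling entirely, but does NOT remove already-indexed pages.
User-agent: *
Disallow: /syndicated/
```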
Hope this makes sense.