Duplicate Content due to Panda update!
-
I can see that a lot of you are worrying about this new Panda update just as I am!
I have such a headache trying to figure this one out, can any of you help me?
I have thousands of pages flagged as "duplicate content", and I can't for the life of me see how... take these two for example:
http://www.eteach.com/Employer.aspx?EmpNo=18753
http://www.eteach.com/Employer.aspx?EmpNo=31241
My campaign crawler is telling me these are duplicate content pages because of the same title (which I can see) and because of the content (which I can't see).
Can anyone see how Google is interpreting these two pages as duplicate content??
Stupid Panda!
-
Hi Virginia
This is frustrating indeed as it certainly doesn't look like you've used duplicate content in a malicious way.
To understand why Google might be seeing these pages as duplicate content, let's take a look at them through Googlebot's eyes:
Google Crawl for page 1
Google Crawl for page 2
What you'll see is that Google reads the entirety of both pages, and the only differences are a logo it can't see and a name and postal address. The rest of each page is duplicate. The takeaway is that Google reads things like site navigation menus and footers and, for the purposes of Panda, treats them as "content".
This doesn't mean that you should have different navigation on every page (that wouldn't be feasible). But it does mean that each page needs enough unique content to show Google that the pages are distinct. I can't give you an exact percentage, but roughly 300-400 words of unique content per page should do the trick.
Now, this might be feasible for some of your pages, but for the two pages you've linked to above there simply isn't much you could write about. And because each employer page is generated from a URL query parameter, you could have hundreds or thousands of pages that need content added, which is a huge amount of work.
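To gauge how much of two such pages is really shared template, you can compare their extracted text directly. Here's a rough Python sketch using the standard library's difflib; the page text below is invented for illustration, not taken from the real eteach pages:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-1 ratio of how much text two pages share."""
    return SequenceMatcher(None, a, b).ratio()

# Shared template text (navigation, footer) dominates both pages,
# so even unrelated employers look near-identical to a crawler.
template = ("Home | Jobs | Employers | Register | Sign in | "
            "About us | Contact | Terms | Privacy")
page_a = template + " Acme Primary School, 1 High Street, Anytown"
page_b = template + " Beta Academy, 22 Station Road, Otherville"

print(f"{similarity(page_a, page_b):.0%} of the page text is shared")
```

Run the same comparison on just the unique sections (with the boilerplate stripped out) and the ratio drops sharply, which is exactly the distinction Panda appears to care about.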
So here's what I'd do. I'd get a list of each URL on your site that could be seen as "duplicate" content, like the ones above. Be as harsh in judging this as Google would be. I'd then decide whether you can add further content to these pages or not. For description pages or "about us" pages, you can perhaps add a bit more. For URLs like the ones above, you should do the following:
In the <head> of each of these URLs you've identified, add this code: <meta name="robots" content="noindex, nofollow">
This tells Googlebot not to index the page or follow the links on it. The page won't appear in the index, so it can't count against you as duplicate content. This would be perfect for the URLs you've given above, as I very much doubt you'd ever want these pages to rank, so you can safely noindex and nofollow them.
Furthermore, as these URLs are created from query parameters, I'm assuming you may have one "master" template page that they are all generated from. If so, you may only need to add the meta tag to that one template for it to apply to all of them. I'm not certain on this, so clarify with your developers and/or whoever runs your CMS. The important thing is to have the meta tag applied to all the duplicate-content URLs that you don't want to rank. For those that you do want to rank, you will need to add more unique content to stop them being flagged as duplicate.
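The triage described above (noindex pages you never want to rank; add content to thin pages you do want to rank) can be sketched as a simple decision rule. This is my own illustration, not part of the original answer, and the 300-word threshold is just the rough figure mentioned earlier, not an official Google number:

```python
THIN_THRESHOLD = 300  # rough unique-word target; an assumption, not a Google rule

def triage(unique_words: int, want_to_rank: bool) -> str:
    """Decide what to do with a potentially duplicate page."""
    if not want_to_rank:
        # Never meant to rank: keep it out of the index entirely.
        return '<meta name="robots" content="noindex, nofollow">'
    if unique_words < THIN_THRESHOLD:
        # We want this page to rank, so noindexing is the wrong fix:
        # it needs more unique content instead.
        return "add unique content"
    return "leave as-is"

print(triage(unique_words=40, want_to_rank=False))   # employer detail page
print(triage(unique_words=40, want_to_rank=True))    # thin but valuable page
```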
As always, there's a great Moz post on how to deal with duplication issues right here.
Hope this helps Virginia and if you have any more questions, feel free to ask me!
Related Questions
-
Are online tools considered thin content?
My website has a number of simple converters. For example, this one converts spaces to commas: https://convert.town/replace-spaces-with-commas
Now, obviously there are loads of different variations I could create of this:
Replace spaces with semicolons
Replace semicolons with tabs
Replace fullstops with commas
Similarly with files:
JSON to XML
XML to PDF
JPG to PNG
JPG to TIF
JPG to PDF
(and thousands more)
If someone types one of those into Google, they will be happy because they can immediately use the tool they were hunting for. It is obvious what these pages do, so I do not want to clutter the page up with unnecessary content. However, would these be considered doorway pages or thin content, or would it be acceptable (from an SEO perspective) to generate 1000s of pages based on all the permutations?
White Hat / Black Hat SEO | ConvertTown
-
Plugin to duplicate CMS pages, changing the location
Hi all, We have recently noticed a rise in local business websites using a plugin to duplicate hundreds of pages, changing only the location in the h1 tag and the page description. We're pretty sure this is a black-hat technique allowing them to rank for all locations (although the duplicate page content must not be doing them any favours). An example of this is http://www.essexcarrecovery.co.uk. We would like to know what plugin they are using, as we think there may be better ways to use this; we may be able to create original location pages faster than we do now. Also, why does this not seem to be too detrimental to these businesses' SEO, when surely this method should be damaging?
White Hat / Black Hat SEO | birdmarketing
-
How to re-rank an established website with new content
I can't help but feel this is a somewhat untapped resource with a distinct lack of information.
White Hat / Black Hat SEO | ChimplyWebGroup
There is a massive amount of information around on how to rank a new website, or techniques to increase SEO effectiveness, but ranking a whole new set of pages, or 're-building' a site that may have suffered an algorithmic penalty, is a harder nut to crack in terms of information and resources. To start, I'll provide my situation: SuperTED is an entertainment directory SEO project.
It seems likely we suffered an algorithmic penalty at some point around Penguin 2.0 (May 22nd), as traffic dropped steadily from then, though not too aggressively. Then, to coincide with the newest Panda 27 (according to Moz) in late September this year, we decided it was time to reassess tactics to keep in line with Google's guidelines. We've slowly built a natural link profile over the past two years, but it's likely thin content was also an issue. So from the beginning of September to the end of October we took these steps:
Contacted webmasters (unfortunately there was some 'paid' link-building before I arrived) to remove links.
'Disavowed' the rest of the unnatural links that we couldn't have removed manually.
Worked on page speed as per Google's guidelines until we received high scores in the majority of speed-testing tools (e.g. WebPageTest).
Redesigned the entire site with speed, simplicity and accessibility in mind.
Used .htaccess rewrites to remove file extensions from 'fancy' URLs and simplify the link structure.
Completely removed two or three pages that were quite clearly just trying to 'trick' Google - think a large page of links that simply said 'Entertainers in London', 'Entertainers in Scotland', etc. We 404'ed them and asked for URL removal via WMT; thinking of 410'ing?
Added new content and pages that seem to follow Google's guidelines as far as I can tell, e.g. main category pages and sub-category pages.
Started to build new links to our now 'content-driven' pages naturally by asking our members to link to us via their personal profiles. We offered an internal reward system for this, so we've seen a fairly good turnout.
We also covered many other possible ranking factors: adding Schema data, optimising for mobile devices as best we can, adding a blog and blogging original content, utilising and expanding our social media reach, custom 404 pages, removing duplicate content, utilising Moz, and much more. It's been a fairly exhaustive process, but we were happy to do it to be within Google's guidelines.
Unfortunately, some of those link-wheel pages mentioned previously were the only pages driving organic traffic, so once we were rid of them, traffic dropped to not even 10% of what it was previously. Equally, with the .htaccess changes to the link structure and the creation of brand-new pages, we've lost many of the pages that previously held Page Authority.
We've 301'ed those pages that have been 'replaced' with much better content and a different URL structure - http://www.superted.com/profiles.php/bands-musicians/wedding-bands to simply http://www.superted.com/profiles.php/wedding-bands, for example. Therefore, with the loss of the 'spammy' pages and the creation of brand new 'content-driven' pages, we've probably lost up to 75% of the old website, including those that were driving any traffic at all (even with potential thin-content algorithmic penalties). Because of the loss of entire pages, the changes of URLs and the rest discussed above, it's likely the site looks very new and probably very updated in a short period of time. What I need to work out is a campaign to drive traffic to the 'new' site.
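The 301s described above amount to an old-to-new URL mapping. A minimal Python sketch of that idea follows; only the first URL pair comes from the post, and the function name and return shape are illustrative, not how SuperTED actually implemented it:

```python
# Old URL path -> new, flatter path. The first pair is the example
# from the post; a real map would list every replaced page.
REDIRECTS = {
    "/profiles.php/bands-musicians/wedding-bands": "/profiles.php/wedding-bands",
}

def respond(path):
    """Return an (HTTP status, redirect target) pair for a requested path."""
    new_path = REDIRECTS.get(path)
    if new_path is not None:
        return 301, new_path   # permanent redirect passes link equity to the new URL
    return 200, None           # no mapping: serve the page normally

print(respond("/profiles.php/bands-musicians/wedding-bands"))
```

In practice the same table would live in .htaccess RewriteRules or the CMS routing layer, but the logic is the same: every old URL with authority should resolve with a single 301 hop to exactly one new URL.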
We're naturally building links through our own customerbase, so they will likely be seen as quality, natural link-building.
Perhaps the sudden occurrence of a large amount of 404's and 'lost' pages are affecting us?
Perhaps we're yet to really be indexed properly, but it has been almost a month since most of the changes are made and we'd often be re-indexed 3 or 4 times a week previous to the changes.
Our events page is the only one without the new design left to update, could this be affecting us? It potentially may look like two sites in one.
Perhaps we need to wait until the next Google 'link' update to feel the benefits of our link audit.
Perhaps simply getting rid of many of the 'spammy' links has done us no favours - I should point out we've never been issued with a manual penalty. Was I perhaps too hasty in following the rules? I'd appreciate a professional opinion, or input from anyone who has been through a similar process before. It does seem fairly odd that following guidelines and general white-hat SEO advice could cripple a domain, especially an established one (the domain is 10+ years old) with relatively good domain authority within the industry. Many, many thanks in advance. Ryan.
-
Can I use content from an existing site that is not up anymore?
I want to take down a current website and create a new site or two (with new url, ip, server). Can I use the content from the deleted site on the new sites since I own it? How will Google see that?
White Hat / Black Hat SEO | RoxBrock
-
Have just submitted Disavow file to Google: Shall I wait until after they have removed bad links to start new content lead SEO campaign?
Hi guys, I am currently conducting some SEO work for a client. Their previous SEO company had built a lot of low-quality/spam links to their site, and as a result their rankings and traffic have dropped dramatically. I have analysed their current link profile and submitted the spammiest domains to Google via the Disavow tool. The question I had was: do I wait until Google has processed the spam links I submitted and then start the new content-based SEO campaign, or would it be okay to start the content-based campaign now, even though the current spam links haven't been dealt with yet? Look forward to your replies on this...
White Hat / Black Hat SEO | sanj5050
-
Publishing the same article content on Yahoo? Worth It? Penalties? Urgent
Hey all, I am currently working for a company, and they are publishing exactly the same content on their website and on Yahoo. In addition, when I search for the article's title, the Yahoo copy outranks ours. Isn't this against Google's guidelines? I think Yahoo also gets more traffic from it than we do, since they are in the first position. How do you think the company should stop this practice? I need urgent responses to these questions, please. Also look at the attachment and look at the snippets. We have a snippet (description) like the first paragraph, but Yahoo somehow scans the content and creates meta descriptions based on the search queries. How do they do that?
White Hat / Black Hat SEO | moneywise_test
-
Panda Recovery: Is a reconsideration request necessary?
Hi everyone, I run a 12-year old travel site that primarily publishes hotel reviews and blog posts about ways to save when traveling in Europe. We have a domain authority of 65 and lots of high quality links from major news websites (NYT, USA Today, NPR, etc.). We always ranked well for competitive searches like "cheap hotels in Paris," etc., for many, many years (like 10 years).
Things started falling two years ago (April 2011) - I thought it was just normal algorithmic changes, and that our pages were being devalued (and perhaps, it was). So, we continued to bulk up our reviews and other key pages, only to see things continue to slide.
About a month ago I lined up all of our inbound search traffic from Google Analytics and compared it to SEOmoz's timeline of Google updates. Turns out every time there was a Panda roll-out (from the second one in April 2011) our traffic tumbled. Other updates (Penguin, etc.) didn't seem to make a difference.
But why should our content that we invest so much in take a hit from Panda? It wasn't "thin." But thin content existed elsewhere on our site: we had a flights section with 40,000 pages of thin content, cranked out of our database with virtually no unique content. We had launched that section in 2008, and it had never been an issue (and had mostly been ignored), but now, I believed, it was working against us. My understanding is that any thin content can work against the entire site's rankings.
In summary: we had 40,000 thin flights pages, 2,500 blog posts (rich content), and about 2,500 hotel-related pages (rich and well researched "expert" content).
So, two weeks ago we dropped almost the entire flights section. We kept about 400 pages (of the 40,000) with researched, unique and well-written information, and we 410'd the rest. Following the advice of so many others on these boards, we put the "thin" flights pages in their own sitemap so we could watch their index number fall in Webmaster Tools.
And we watched (with some eagerness and trepidation) as the error count shot up. Google has found about half of them at this point.
Last week I submitted a "reconsideration request" to Google's spam team. I wasn't sure if this was necessary (as the whole point of dropping the pages, 410'ing and so forth was to fix it on our end, which would hopefully filter down through the SERPs eventually). However, I thought it was worth sending them a note explaining the actions we had taken, just in case.
Today I received a response from them. It includes: "We reviewed your site and found no manual actions by the webspam team that might affect your site's ranking in Google. There's no need to file a reconsideration request for your site, because any ranking issues you may be experiencing are not related to a manual action taken by the webspam team. Of course, there may be other issues with your site that affect your site's ranking. Google's computers determine the order of our search results using a series of formulas known as algorithms. We make hundreds of changes to our search algorithms each year, and we employ more than 200 different signals when ranking pages. As our algorithms change and as the web (including your site) changes, some fluctuation in ranking can happen as we make updates to present the best results to our users. If you've experienced a change in ranking which you suspect may be more than a simple algorithm change, there are other things you may want to investigate as possible causes, such as a major change to your site's content, content management system, or server architecture. For example, a site may not rank well if your server stops serving pages to Googlebot, or if you've changed the URLs for a large portion of your site's pages..."
And thus, I'm a bit confused. If they say that there wasn't any manual action taken, is that a bad thing for my site?
Or is it just saying that my site wasn't experiencing a manual penalty, however Panda perhaps still penalized us (through a drop in rankings) -- and Panda isn't considered "manual." Could the 410'ing of 40,000 thin pages actually raise some red flags? And finally, how long do these issues usually take to clear up? Pardon the very long question and thanks for any insights. I really appreciate the advice offered in these forums.
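The separate "thin page" sitemap technique described above is straightforward to generate. A minimal Python sketch using the standard library; the URLs are invented placeholders, since the post doesn't list the real flight-page URLs:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a sitemap XML string for a list of URLs, so their
    deindexing can be tracked separately in Webmaster Tools."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Placeholder URLs standing in for the 410'd thin flight pages.
thin_urls = [
    "https://example.com/flights/page-1",
    "https://example.com/flights/page-2",
]
print(build_sitemap(thin_urls))
```

Submitting this file as its own sitemap in Webmaster Tools gives an indexed-URL count for just the removed pages, so you can watch it fall toward zero without the healthy pages muddying the numbers.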
White Hat / Black Hat SEO | TomNYC
-
We seem to have been hit by the Penguin update - can someone please help?
Hi,
Our website www.wholesaleclearance.co.uk has been hit by the Penguin update. I'm not an SEO expert, and when I first started, my SEO got caught up in buying blog links. That was about 2 years ago, and since then I've worked really hard to get good manual links.
Does anyone know of a way to dig out any bad links so I can get them removed, or any software that will give me a list? Do any of you guys want to take a look for me? I'm willing to pay for the work.
Kind regards,
Karl.
White Hat / Black Hat SEO | wcuk