Duplicate Content due to Panda update!

Eteach_Marketing

I can see that a lot of you are worrying about this new Panda update just as I am!

I have such a headache trying to figure this one out, can any of you help me?

I have thousands of pages that are "duplicate content" which I just can't for the life of me see how... take these two for example:

http://www.eteach.com/Employer.aspx?EmpNo=18753

http://www.eteach.com/Employer.aspx?EmpNo=31241

My campaign crawler is telling me these are duplicate content pages because of the same title (which that I can see) and because of the content (which I can't see).

Can anyone see how Google is interpreting these two pages as duplicate content??

Stupid Panda!

TomRayner

Hi Virginia

This is frustrating indeed as it certainly doesn't look like you've used duplicate content in a malicious way.

To understand why Google might be seeing these pages as duplicate content, let's take a look at the pages through the Google bot's eyes:

Google Crawl for page 1
Google Crawl for page 2

What you'll see here is that Google is reading the entirety of both pages, with the only difference being a logo that it can't see and a name + postal address. The rest of the page is duplicate. This should point out that Google reads things like site navigation menus and footers and interprets them, for the purpose of Panda, as "content".

This doesn't mean that you should have a different navigation on every page (that wouldn't be feasible). But it does mean that you need to have enough unique content on each page to show Google that the pages are not duplicate and contain content. I can't give you a % on this, but let's say roughly content that is 300-400 words long would do the trick.

Now, this might be feasible for some of your pages, but for the two pages you've linked to above, there simply isn't enough you could write about. Similarly, because the URL generates a random query for each employer, you could potentially have hundreds or thousands of pages you'd need to add content to, which is a hell of a lot of work.

So here's what I'd do. I'd get a list of each URL on your site that could be seen as "duplicate" content, like the ones above. Be as harsh in judging this as Google would be. I'd then decide whether you can add further content to these pages or not. For description pages or "about us" pages, you can perhaps add a bit more. For URLs like the ones above, you should do the following:

In the header of each of these URLs you've identified, add this code:

This tells the Googlebot not to crawl or index the URLs. In doing that, it won't rank it in the index and it won't see it as duplicate content. This would be perfect for the URLs you've given above as I very much doubt you'd ever want to rank these pages, so you can safely noindex and nofollow them. Furthermore, as these URLs are created from queries, I am assuming that you may have one "master" page that the URLs are generated from. This may mean that you would only need to add the meta code to this one page for it to apply to all of them. I'm not certain on this and you should clarify with your developers and/or whoever runs your CMS. The important thing, however, is to have the meta tags applied to all those duplicate content URLs that you don't want to rank for. For those that you do want to rank for, you will need to add more unique content to those pages in order to stop it being flagged as duplicate.

As always, there's a great Moz post on how to deal with duplication issues right here.

Hope this helps Virginia and if you have any more questions, feel free to ask me!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Duplicate Content due to Panda update!

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Fred Update & Ecommerce

Duplicate keywords in URL?

On-site duplication working - not penalised - any ideas?

Content optimized for old keywords and G Updates

Separate Servers for Humans vs. Bots with Same Content Considered Cloaking?

Noindexing Thin Content Pages: Good or Bad?

Duplicate content showing on local pages

IP-Based Content on Homepage?