Huge number of indexed pages with no content

Dilbak

Hi,

We have accidentally had Google index lots of our pages that have no useful content on them at all.

The site in question is a directory site, where we have tags and we have cities. Some cities have suppliers for almost all the tags, but there are lots of cities where we have suppliers for only a handful of tags.

The problem occurred when we created a page for each city, where we list the tags as links.

Unfortunately, our programmer listed all the tags: not only the ones where we have businesses offering their services, but all of them!

We have 3,142 cities and 542 tags. I guess you can imagine the problem this caused!
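For scale, here is a quick check of the worst case, assuming one page per city/tag combination (the variable names are just for illustration):

```python
cities = 3142
tags = 542

# Worst case: every tag is linked from every city page.
total_pages = cities * tags
print(total_pages)
```

That is roughly 1.7 million possible city/tag pages.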

Now, I know that Google might simply ignore these empty pages and not crawl them again, but when I check a city (city site:domain) with only 40 providers, I still have 1,050 pages indexed. (Yes, we have some issues between the 550 and the 1,050 as well, but first things first. :))

These pages might not be crawled again, but they will still be clicked, and the resulting bounces and the overall user experience will be terrible.

My idea is to use meta noindex for all of these empty pages, and perhaps also add a 301 redirect from every empty category page directly to the main page of the given city.

Can this work the way I imagine? Any better solution to cut this really bad nightmare short?

Thank you in advance.

Andras

Dilbak

Thank you again, John. I will fix this, based on our discussion.

JoshPugh

I think noindex is slightly superfluous, as the 301 will take care of it, point people to a proper result, and give Google a redirected result.

However, SEOMoz's Robots information page suggests:

"In most cases, meta robots with parameters "noindex, follow" should be employed as a way to to restrict crawling or indexation."

So maybe consider that...

As for robots.txt, you can check out SEOMoz's Robots information page, which has information on wildcards. You could use those, which I THINK would work (e.g. http://domain.com/*/tags)?

Not quite sure on that last bit though...

Dilbak

Thank you for your reply, Josh.

I will then use the 301, but should I also use the noindex tag for these pages to be removed from the index?

Does it reinforce my intention, or does it add nothing extra to the process? Perhaps they should not be used together at all, as they are basically meant for different tasks.

(Unfortunately, robots.txt is not really a solution, as we have the following URL structure:

www.example.com/city/tag

Since all the cities have at least a couple of valid tags, I can't specify a single path to be excluded from indexing. I would also rather not add 2,000+ cities individually.

As for GWT, URL removal for this number of pages might also not be an option, as I have at least 100,000 no-value pages to remove (the limit is 500 per month).)
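To illustrate why a single path rule does not help here: the empty pages are scattered across every city, so they have to be found from the supplier data rather than from the URL pattern. A minimal sketch, assuming the supplier data can be looked up by (city, tag); the function name and the sample data are made up for illustration:

```python
def empty_tag_paths(cities, tags, suppliers):
    """Return the /city/tag paths that have no suppliers at all."""
    return [
        f"/{city}/{tag}"
        for city in cities
        for tag in tags
        if not suppliers.get((city, tag))
    ]


# Hypothetical sample data: every city has at least one valid tag.
cities = ["springfield", "shelbyville"]
tags = ["plumbers", "florists", "locksmiths"]
suppliers = {
    ("springfield", "plumbers"): ["Acme Plumbing"],
    ("springfield", "florists"): ["Rose & Co"],
    ("shelbyville", "plumbers"): ["Pipe Pros"],
}

paths = empty_tag_paths(cities, tags, suppliers)
print(paths)
```

Each city contributes a different set of empty paths, so excluding them would take one rule per empty page rather than one rule per city.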

JoshPugh

I would agree. Just set up a 301 redirect so that users don't bounce and actually get directed to something remotely useful, even just a listing of all the tags around the site, or a home page, or something (even if you do the below, this ensures users who stumble on these pages are still happy).

You could also use a robots.txt file to show which ones you don't want to be indexed, and finally you may also use Google's Webmaster Tools to manually remove particular pages!

A combo of all of those will work a treat!
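As a rough sketch of the 301 part, assuming the /city/tag URL structure mentioned in this thread: the server checks whether the requested page has any suppliers and, if not, redirects to the city's main page. The function name, the data layout, and the sample entries are hypothetical, and this is framework-independent logic rather than a real web handler.

```python
def respond(path, suppliers_by_page):
    """Decide what to send for a request path.

    Returns (status, location): 200 with no location when the page should
    be served normally, or 301 with the city's main page as the target
    when a /city/tag page has no suppliers.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        # Not a /city/tag page (for example the city page itself).
        return 200, None
    city, tag = parts
    if suppliers_by_page.get((city, tag)):
        return 200, None
    return 301, f"/{city}"


# Hypothetical sample data.
suppliers_by_page = {
    ("springfield", "plumbers"): ["Acme Plumbing"],
}

print(respond("/springfield/plumbers", suppliers_by_page))
print(respond("/springfield/florists", suppliers_by_page))
```

Because the empty page is never served once it redirects, a meta noindex tag on it would not be seen by a crawler, which matches the point above that noindex becomes superfluous with the 301 in place.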
