More Indexed Pages than URLs on site.
-
According to Webmaster Tools, the number of pages Google has indexed on my site tripled yesterday (from 150K to 450K). Usually I would be jumping for joy, but now I have more indexed pages than actual pages on my site.
I have checked for duplicate URLs pointing to the same product page but can't see any; pagination in category pages doesn't seem to be indexed, nor does parameterisation in URLs from advanced filtering.
Using the site: operator, we get different results on google.com (450K) and google.co.uk (150K).
Anyone got any ideas?
-
Hi David,
It's tough to say without more digging and information; it certainly looks like you have most of the common problem areas covered from what I can see. I'll throw out one idea: I see you have a few 301 redirects in place switching from .html to non-.html versions of URLs. If this was done on a massive scale, it's possible Google's index still contains both versions of those pages. If so, it might not really be a big issue: over the next weeks/months the old .html versions will fall out of the index and your numbers will begin to look more normal again. Just a thought.
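One quick way to gauge the scale of that kind of double-indexing is to take a list of indexed URLs (scraped from site: results or exported from Webmaster Tools) and count how many .html URLs have an extensionless twin also in the index. A minimal sketch, assuming the site-wide .html-to-extensionless 301 described above; the example.com URLs are placeholders:

```python
def strip_html(url):
    """Drop a trailing .html, mirroring the assumed 301 target."""
    return url[:-len(".html")] if url.endswith(".html") else url

def count_double_indexed(indexed_urls):
    """Count .html URLs whose extensionless version is also indexed."""
    urls = set(indexed_urls)
    return sum(1 for u in urls
               if u.endswith(".html") and strip_html(u) in urls)

sample = [
    "https://example.com/widget.html",   # old version, still indexed
    "https://example.com/widget",        # new 301 target
    "https://example.com/gadget",        # only one version indexed
]
print(count_double_indexed(sample))  # 1
```

If that count is a large fraction of the surplus, the 301 theory probably explains most of the gap.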
-
Thanks Lynn. The 31,000 was a bit of a legacy issue and something we have solved; the robots file was changed a couple of weeks ago, so fingers crossed Google will deindex them soon. We get the same result when using the inurl: operator there.
Any idea where the rest have come from?
-
Hi Irving
We checked everything obvious and cannot explain what is going on. I cannot see any major duplicate content issues, and we do not have any active subdomains. The Moz crawler doesn't highlight any major duplicate content issues either.
-
Hi David,
Not sure why they started showing up now (some recent changes to the site?), but I suspect your problem is URLs that you are trying to block with robots.txt but that are finding their way into the index anyway.
If you do a search for site:nicontrols.com inurl:/manufacturer/ and then click "show omitted results", you will see a whole bunch (31,000!) of "content blocked by robots.txt" notices, but the URLs are still in the index. A couple more similar searches on other likely URL paths will probably turn up more.
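If you want to check those paths against the robots.txt rules programmatically rather than eyeballing search results, Python's standard library can test them directly. A rough sketch; the Disallow rule below is a hypothetical stand-in for whatever the real file contains:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modeled on the /manufacturer/ block described
# above -- the actual file on the site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /manufacturer/
"""

def is_blocked(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if robots.txt disallows crawling of `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)

print(is_blocked("/manufacturer/acme"))  # True: covered by the Disallow rule
print(is_blocked("/products/widget"))    # False: no rule applies
```

Running a list of indexed paths through a check like this would tell you which of them Google can't crawl at all, and therefore which ones can only sit in the index as bare URLs.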
If you can get a noindex meta tag onto these pages, it will be more effective at keeping them out of the index. Note that Google has to be able to crawl a page to see the noindex tag, so the robots.txt block on those paths would need to be lifted for that to work. If you have in mind some recent changes to the site that might have introduced internal links to these pages, it would also be worth seeing whether you can get those links removed or replaced with the "proper" link format.
Hope that helps!
-
Can you look through the indexed pages in the search results for duplicates or technical issues causing improper indexing? Do you have other properties, such as subdomains, that Google might be counting as pages?