Why are so many pages indexed?

MichaelWeisbaum

We recently launched a new website, and it doesn't consist of that many pages. Yet when you do a "site:" search on Google, it shows 1,950 results. Obviously we don't want this to be happening, and I have a feeling it's affecting our rankings. Is this just a straight-up robots.txt problem? We addressed that a while ago and the number of results isn't going down. It's very possible that we still have it implemented incorrectly. What are we doing wrong, and how do we start getting pages "un-indexed"?

DougRoberts

What's to stop Google from finding them? They're out there and available on the internet!

Block or remove pages using a robots.txt file

You can do this by putting:

User-agent: *
Disallow: /

in the robots.txt file.
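
One way to sanity-check a rule like this before relying on it is to run it through the robots.txt parser in Python's standard library. This is only a rough sketch: the helper name blocks_url, the lab.example.com host and the "block some" rule are made up for illustration, and it shows how a well-behaved crawler would read the rules, not what Google currently has in its index.

```python
from urllib.robotparser import RobotFileParser


def blocks_url(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# The blanket rule quoted above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A hypothetical partial rule that leaves most of the host crawlable.
BLOCK_SOME = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for name, rules in (("block all", BLOCK_ALL), ("block some", BLOCK_SOME)):
        for path in ("/", "/private/notes.html", "/demo/index.html"):
            url = "http://lab.example.com" + path  # placeholder host
            state = "blocked" if blocks_url(rules, url) else "allowed"
            print(f"{name}: {path} -> {state}")
```

Keep in mind that a robots.txt file only applies to the host it is served from, so each subdomain needs its own copy.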

You might also want to stop humans from accessing the content too. Can you put this content behind a password using .htaccess, or block access based on network address?

KeriMorgret

Sounds like you need to put a robots.txt on those subdomains (and maybe consider some type of login too).

Quick fix: put a robots.txt on the subdomains to block them from being indexed. Go into Google Webmaster Tools and verify each subdomain as its own site, then request removal of each of those subdomains (which should be approved, since you've already blocked it in robots.txt).

I took a quick look at lab.capacitr.com/robots.txt and it isn't blocking the entire subdomain, though the robots.txt at fb.capacitr.com is.

MichaelWeisbaum

I most certainly do not want those pages indexed; they're used for internal purposes only. That's exactly what I'm trying to figure out here. Why are those subdomains being indexed? They should obviously be private. Any insights would be great.

Thanks!

DougRoberts

What are you searching for? I notice that if you do a site:.capacitr.com you get the 1,950 results you mention above.

If you do a search for site:www.capacitr.com then you only get 29 results.

It looks like there's a whole load of pages being indexed on other subdomains: fb.capacitr.com and lab.capacitr.com (which has 1,860 pages!).

What are these used for? Do you really want them in the index?
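
If you can get hold of a list of the indexed URLs, the same per-subdomain breakdown can be had by tallying hostnames. A minimal sketch, assuming the URLs have already been collected somehow; the count_by_host helper and the example.com URLs are invented for illustration and are not real search results.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_host(urls):
    """Tally URLs by hostname so stray subdomains stand out."""
    return Counter(urlsplit(u).hostname for u in urls)


# Hypothetical sample; substitute the URLs you actually see indexed.
sample = [
    "http://www.example.com/",
    "http://www.example.com/about",
    "http://fb.example.com/app/1",
    "http://lab.example.com/test/a",
    "http://lab.example.com/test/b",
    "http://lab.example.com/test/c",
]

if __name__ == "__main__":
    # Print hosts from most to least indexed URLs.
    for host, n in count_by_host(sample).most_common():
        print(f"{host}: {n}")
```

Any hostname near the top of that list that isn't the public site is a candidate for its own robots.txt and removal request.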
