Why are so many pages indexed?

MichaelWeisbaum

We recently launched a new website and it doesn't consist of that many pages. When you do a "site:" search on Google, it shows 1,950 results. Obviously we don't want this to be happening. I have a feeling it's effecting our rankings. Is this just a straight up robots.txt problem? We addressed that a while ago and the number of results aren't going down. It's very possible that we still have it implemented incorrectly. What are we doing wrong and how do we start getting pages "un-indexed"?

DougRoberts

What's to stop google from finding them? They're out there and available on the internet!

Block or remove pages using a robots.txt file

You can do this by putting:

User-agent: *
Disallow: /

in the robots.txt file.

You might also want to stop humans from accessing the content too - can you put this content behind a password using htaccess or block access based on network address?

KeriMorgret

Sounds like you need to put a robots.txt on those subdomains (and maybe consider some type of login too).

Quick fix: put a robots.txt on the subdomains to block them from being indexed. Go into Google Webmaster Tools and verify each subdomain as its own site, then request removal of each of those subdomains (which should be approved, since you've already blocked it in robots.txt).

I took a quick look at lab.capacity.com/robots.txt and it isn't blocking the entire subdomain, though the robots.txt at fb.capacitr.com is.

MichaelWeisbaum

I most certainly do not want those pages indexed, they're used for internal purposes only. That's exactly what I'm trying to figure out here. Why are those subdomains being indexed? They should obviously be private. Any insights would be great.

Thanks!

DougRoberts

What are are you searching for? I notice that if you do a site:.capacitr.com you get the 1,950 results you mention above.

If you do a search for site:www.capacitr.com then you only get 29 results.

Its looks like there's a whole load of pages being indexed on other subdomains - fb.capacitr.com and lab.capacity.com. (Which has 1,860 pages!)

What are these used for, do you really want these in the index!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Why are so many pages indexed?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Will google be able to crawl all of the pages given that the pages displayed or the info on a page varies according to the city of a user?

How will canonicalizing an https page affect the SERP-ranked http version of that page?

How long to re-index a page after being blocked

PR Dilution and Number of Pages Indexed

Cleaning up /index.html on home page

Why are new pages not being indexed, and old pages (now in robots.txt) remain in the index?

Should the sitemap include just menu pages or all pages site wide?

Google replacing subpages in index with home page?