How to get a large number of urls out of Google's Index when there are no pages to noindex tag?


Hi,

I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down.

The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. GoogleBot would crawl a link that looks like it points to the client's own domain and wind up on Amazon or somewhere else with some affiliate code. GoogleBot would then grab the original link on the client's domain and index it... even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag.

I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler.

Any ideas? Thanks! Best... Michael

P.S.,

All 150K urls seem to share the same url pattern... exmpledomain.com/item/... so /item/ is common to all of them, if that helps.

effectdigital

If no pages with editable HTML actually exist for the URLs you want to remove from Google's index, I'd probably use an HTTP header instead. Maybe use the X-Robots-Tag header with a noindex directive.

Even if you have no page with web code, you can always send an HTTP header. An HTTP header simply lets a client or server pass additional information along with a request or response (GET, POST, etc.).
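To make the header idea concrete, here is a minimal sketch in Python, assuming the site can be fronted by WSGI middleware (the original thread doesn't say what stack the site runs on). The names `add_noindex_header`, `stand_in_app`, and `ITEM_PREFIX` are made up for illustration; the `/item/` prefix comes from the P.S. in the question. It only shows how the header gets attached to a response, not how Google will react to it.

```python
ITEM_PREFIX = "/item/"  # assumption: the shared URL pattern from the question's P.S.


def add_noindex_header(app):
    """Wrap a WSGI app so every response under ITEM_PREFIX carries X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(ITEM_PREFIX):
                # Drop any existing X-Robots-Tag so the response carries exactly one.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def stand_in_app(environ, start_response):
    """Hypothetical placeholder for whatever currently answers these URLs."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"placeholder"]


application = add_noindex_header(stand_in_app)
```

The same header could just as well be set in the web server configuration; the middleware form is used here only because it is easy to read and test.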

This is the only thing I can think of which would really help. Some people might suggest robots.txt wildcards, but robots.txt handles crawling and not indexation (so those answers wouldn't really be worth anything to you).
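The crawling-versus-indexing point can be illustrated with Python's standard `urllib.robotparser`. This is only a sketch: the domain `example.com` and the rule shown are hypothetical, and the parser here does plain prefix matching rather than Google's wildcard handling. What it shows is that a robots.txt rule answers one question, "may this URL be fetched?", and says nothing about whether an already-known URL stays in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking the /item/ pattern from the question.
robots_lines = [
    "User-agent: *",
    "Disallow: /item/",
]

rules = RobotFileParser()
rules.parse(robots_lines)

# The only thing the rules can express is permission to fetch.
item_allowed = rules.can_fetch("Googlebot", "https://example.com/item/123")
other_allowed = rules.can_fetch("Googlebot", "https://example.com/about")

print("may fetch /item/123:", item_allowed)
print("may fetch /about:", other_allowed)
```

A crawler that is told not to fetch `/item/` URLs also never sees any response headers those URLs would send, which is why blocking them in robots.txt works against a header-based noindex.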

The other thing you could do (maybe combine this with the X-Robots stuff) is to get all of those URLs to serve status code 410 (Gone) instead of 404 (Not Found, which doesn't say whether the URL might come back).
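A rough sketch of the 410 approach, again in Python and again assuming a WSGI setup that the thread doesn't confirm. The app name `gone_app` and the `ITEM_PREFIX` constant are illustrative only. Note the built-in assumption that the site is willing to stop forwarding these URLs to the affiliate, since a 410 response replaces the forward.

```python
from http import HTTPStatus

ITEM_PREFIX = "/item/"  # assumption: the shared URL pattern from the question's P.S.


def gone_app(environ, start_response):
    """Answer 410 Gone (plus a noindex header) for /item/ URLs, 404 for anything else."""
    path = environ.get("PATH_INFO", "")
    headers = [("Content-Type", "text/plain")]

    if path.startswith(ITEM_PREFIX):
        status = HTTPStatus.GONE
        # Combined with the header approach, as suggested above.
        headers.append(("X-Robots-Tag", "noindex"))
    else:
        status = HTTPStatus.NOT_FOUND

    body = status.phrase.encode("ascii")
    headers.append(("Content-Length", str(len(body))))
    start_response(f"{status.value} {status.phrase}", headers)
    return [body]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, gone_app).serve_forever()` and inspected with any tool that shows response status and headers.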
