How to get a large number of urls out of Google's Index when there are no pages to noindex tag?


Hi,

I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down.

The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. GoogleBot would crawl a link that looks like it's to the client's own domain and wind up on Amazon or somewhere else with some affiliate code. GoogleBot would then grab the original link on the client's domain and index it... even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag.

I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler.

Any ideas? Thanks! Best... Michael

P.S.,

All 150K urls seem to share the same url pattern... exampledomain.com/item/... so /item/ is common to all of them, if that helps.

effectdigital

If no actual HTML pages exist for the URLs you want to remove from Google's index, I'd probably use the HTTP response header instead. Maybe use the X-Robots-Tag header with a noindex directive.

Even if you have no page with web-code, you can always have an HTTP header. HTTP headers simply let a client and/or server pass additional information along with requests and responses (GET, POST etc.).
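To make the header idea concrete, here's a rough sketch in Python of a WSGI wrapper that adds X-Robots-Tag: noindex to every response under /item/. The names noindex_items and extra_headers are made up for illustration, and the thread doesn't say what the client's server actually runs, so treat this as an assumption-laden outline rather than a drop-in fix.

```python
ITEM_PREFIX = "/item/"  # assumption: the shared URL pattern mentioned in the question


def extra_headers(path):
    """Return the extra response headers for a given request path."""
    if path.startswith(ITEM_PREFIX):
        # Ask search engines not to keep this URL in their index.
        return [("X-Robots-Tag", "noindex")]
    return []


def noindex_items(app):
    """Wrap a WSGI app so responses under /item/ carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Append our header to whatever the wrapped app already sends.
            merged = list(headers) + extra_headers(environ.get("PATH_INFO", ""))
            return start_response(status, merged, exc_info)
        return app(environ, patched_start)
    return wrapped
```

The same thing can usually be done in the web server config instead of application code. The point is only that the header rides on the response itself, so no HTML page is needed.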

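This is the only thing I can think of which would really help. Some people might suggest robots.txt wildcards, but robots.txt handles crawling and not indexation, so those answers wouldn't really be worth anything to you. Blocking the URLs in robots.txt would also stop GoogleBot from requesting them, so it would never see a noindex header on them.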

The other thing you could do (maybe combine this with the X-Robots stuff) is to get all of those URLs to serve status code 410 (Gone) instead of 404 (Not Found). A 410 tells Google the URL has been removed deliberately, whereas a 404 says nothing about whether it's coming back.
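As a rough illustration of combining the two, here's a small Python sketch that picks the status line and headers for a path: 410 Gone plus X-Robots-Tag: noindex for anything under /item/, and a normal 200 otherwise. The function name respond and the prefix constant are hypothetical. Note that a 410 would presumably also stop those URLs forwarding visitors to the affiliate, so whether that's acceptable is the client's call.

```python
from http import HTTPStatus

ITEM_PREFIX = "/item/"  # assumption: the shared URL pattern mentioned in the question


def respond(path):
    """Return (status_line, headers) for a request path."""
    if path.startswith(ITEM_PREFIX):
        # 410 says the URL was removed on purpose; the header asks for de-indexing.
        status = HTTPStatus.GONE
        headers = [("X-Robots-Tag", "noindex")]
    else:
        status = HTTPStatus.OK
        headers = []
    return f"{status.value} {status.phrase}", headers
```

For example, respond("/item/123") gives the status line "410 Gone" along with the noindex header, while any path outside /item/ is left alone.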
