Removing a site from Google's index
-
We have a site we'd like to have pulled from Google's index. Back in late June, we disallowed robot access to the site through the robots.txt file and added a robots meta tag with "noindex, nofollow" directives. The expectation was that Google would eventually crawl the site and remove it from the index in response to those tags. The problem is that Google hasn't come back to crawl the site since late May. Is there a way to speed up this process and communicate to Google that we want the entire site out of the index, or do we just have to wait until it's eventually crawled again?
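For reference, the setup described above would look something like the following. This is a sketch of the two pieces, not the poster's actual files:

```text
# robots.txt at the site root: blocks all compliant crawlers
User-agent: *
Disallow: /
```

and, in the head of every page:

```html
<meta name="robots" content="noindex, nofollow">
```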
-
OK. Not abundantly clear upon first reading. Thank you for your help.
-
Thank you for pointing that out, Arlene. I do see it now.
The statement just before that line is of key importance for an accurate quote: "If you own the site, you can verify your ownership in Webmaster Tools and use the verified URL removal tool to remove an entire directory from Google's search results."
It could be worded better, but what they're saying is that AFTER your site has already been removed from Google's index via the URL removal tool, THEN you can block it with robots.txt. The URL removal tool removes the pages and keeps them out of the index for 90 days. That's when changing the robots.txt file helps.
-
"Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site)."
The above is a quote from the page. You have to expand the section I referenced in my last comment. Just re-posting google's own words.
-
I thought you were offering a quote from the page. It seems that was your summarization. I apologize for my misunderstanding.
I can see how you could reach that conclusion, but it is not accurate. Robots.txt does not ensure a page won't get indexed. I always recommend using the noindex tag, which should be 100% effective for the major search engines.
-
Go here: http://www.google.com/support/webmasters/bin/answer.py?answer=164734
Then expand the option down below that says: "I want to remove an entire site or the contents of a directory from search results."
They basically instruct you to block all robots in the robots.txt file, then request removal of your site. Once it's removed, the robots.txt file will keep it from getting back into the index. They also recommend putting a "noindex" meta tag on each page to ensure nothing gets picked up. I think we have it taken care of at this point. We'll see.
-
Arlene, I checked the link you offered, but I could not locate that quote anywhere on the page. I am sure it is referring to a different context. Using robots.txt as a blocking tool is fine BEFORE a site or page is indexed, but not after.
-
I used the removal tool and just entered a "/" which put in a request to have everything in all of my site's directories pulled from the index. And I have left "noindex" tags in place on every page. Hopefully this will get it done.
Thanks for your comments guys!
-
We blocked robots from accessing the site because Google told us to. This is straight from the webmaster tools help section:
Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site).
-
I have Webmaster Tools set up, but I don't see an option to remove the whole site. There is a URL removal tool, but there are over 700 pages I want pulled out of the index. Is there an option in Webmaster Tools to have the whole site pulled from the index?
-
Actually, since you have access to the site, you can leave the robots.txt disallowed if you go into Google Webmaster Tools, verify your site, and request removal of your entire site. Let me know if you'd like a link with more information. This will involve adding an HTML file or meta tag to your site to verify ownership.
-
Thank you. Didn't realize we were shooting ourselves in the foot.
-
Hi Arlene.
The problem is that by blocking the site with robots.txt, you are preventing Google from re-crawling your site, so it cannot see the noindex tag. If you have properly placed the noindex tag on all the pages in your site, then modify your robots.txt file to allow Google to see your site. Once that happens, Google will begin crawling your site and will be able to deindex your pages.
The only other suggestion is to submit a sitemap and/or remove the "nofollow" tag. With the nofollow tag on all your pages, Google may only visit a single page at a time, since you are telling the crawler not to follow any links it finds. You are blocking its normal discovery of your site.
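For example, a robots.txt that stops blocking crawlers while the per-page tags do the work might look like this (a sketch; an empty Disallow value permits everything):

```text
# robots.txt: let crawlers back in so they can see the noindex tags
User-agent: *
Disallow:
```

with each page keeping its meta tag:

```html
<meta name="robots" content="noindex">
```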
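With 700+ pages, it's also worth spot-checking that the noindex tag actually made it into every template before re-opening the site to crawlers. Here is a minimal sketch in Python using only the standard library; the helper names are my own, not from any SEO tool:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name", "").lower() == "robots":
                self.directives.append(attrs.get("content", "").lower())

def has_noindex(html):
    """Return True if the page's HTML carries a robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)

# Check two small page fixtures
blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
open_page = '<html><head><title>Hi</title></head></html>'
print(has_noindex(blocked))    # True
print(has_noindex(open_page))  # False
```

In practice you would feed `has_noindex` the HTML fetched from each URL in your sitemap; any page that returns False is one Google could re-index once crawling resumes.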