Removing a site from Google's index
-
We have a site we'd like to have pulled from Google's index. Back in late June, we disallowed robot access to the site through the robots.txt file and added a robots meta tag with "no index,no follow" commands. The expectation was that Google would eventually crawl the site and remove it from the index in response to those tags. The problem is that Google hasn't come back to crawl the site since late May. Is there a way to speed up this process and communicate to Google that we want the entire site out of the index, or do we just have to wait until it's eventually crawled again?
-
ok. Not abundantly clear upon first reading. Thank you for your help.
-
Thank you for pointing that out Arlene. I do see it now.
The statement before that line is of key importance for an accurate quote. "If you own the site, you can verify your ownership in Webmaster Tools and use the verified URL removal tool to remove an entire directory from Google's search results."
It could be worded better but what they are saying is AFTER your site has already been removed from Google's index via the URL removal tool THEN you can block it with robots.txt. The URL removal tool will remove the pages and keep them out of the index for 90 days. That's when changing the robots.txt file can help.
-
"Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site)."
The above is a quote from the page. You have to expand the section I referenced in my last comment. Just re-posting google's own words.
-
I thought you were offering a quote from the page. It seems that is your summarization. I apologize for my misunderstanding.
I can see how you can make that conclusion but it not accurate. Robots.txt does not ensure a page wont get indexed. I always recommend use of the noindex tag which should be 100% effective for the major search engines.
-
Go here: http://www.google.com/support/webmasters/bin/answer.py?answer=164734
Then expand the option down below that says: "<a class="zippy zippy-track zippy-collapse" name="RemoveDirectory">I want to remove an entire site or the contents of a directory from search results"</a>
They basically instruct you to block all robots in the robots.txt file, then request removal of your site. Once it's removed, the robots file will keep it from getting back into the index. They also recommend putting a "noindex" meta tag on each page to ensure nothing will get picked up. I think we have it taken care of at this point. We'll see
-
Arlene, I checked the link you offered but I could not locate the quote you offered anywhere on the page. I am sure it is referring to a different context. Using robots.txt as a blocking tool is fine BEFORE a site or page is indexed, but not after.
-
I used the removal tool and just entered a "/" which put in a request to have everything in all of my site's directories pulled from the index. And I have left "noindex" tags in place on every page. Hopefully this will get it done.
Thanks for your comments guys!
-
We blocked robots from accessing the site because Google told us to. This is straight from the webmaster tools help section:
Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site).
-
I have webmaster tools setup, but I don't see an option to remove the whole site. There is a URL removal tool, but there are over 700 pages I want pulled out of the index. Is there an option in webmaster tools to have the whole site pulled from the index?
-
Actually, since you have access to the site, you can leave the robots.txt at disallowed -- if you go into Google Webmaster Tools, verify your site, and request removal of your entire site. Let me know if you'd like a link on this with more information. This will involve adding an html file or meta tag to your site to verify you have ownership.
-
Thank you. Didn't realize we were shooting ourselves in the foot.
-
Hi Arlene.
The problem is that when you blocked the site with robots.txt, you are preventing Google from re-crawling your site so they cannot see the noindex tag. If you have properly placed the noindex tag on all the pages in your site, then modify your robots.txt file to allow Google to see your site. Once that happens Google will begin crawling your site and then be able to deindex your pages.
The only other suggestion is to submit a sitemap and/or remove the "nofollow" tag. With the nofollow tag on all your pages, Google may visit your site for a single page at a time since you are telling the crawler not to follow any links it finds. You are blocking it's normal discovery of your site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Godaddy and Soft 404's
Hello, We've found that a website we manage has a list of not-found URLS in Google webmaster tools which are "soft 404's " according to Google. I went to the hosting company GoDaddy to explain and to see what they could do. As far as I can see GoDaddy's server are responding with a 200 HTTP error code - meaning that the page exists and was served properly. They have sort of disowned this as their problem. Their server is not serving up a true 404 response. This is a WordPress site. 1) Has anyone seen this problem before with GoDaddy?Is it a GoDaddy problem?2) Do you know a way to sort this issue? When I use the command site:mydomain.co.uk the number of URLs indexed is about right except for 2 or 3 "soft URLs" . So I wonder why webmaster tools report so many yet I can't see them all in the index?
Technical SEO | | AL123al0 -
Link's that are an internal site search?
Hi hope your're all well. I sell Red, Blue, Green Widgets within each color I have many sub types, the subtypes change all the time,and a sub type has many variations in itself. I'd like to set up links that direct customers to popular searches of sub types say: widgets.com/red/blue-spots....search string... Will Google crawl these search links and see that there is good content behind it? How does Google handle links that are also a site search? Can it be bad and should I "no follow" them? Hope someone can give me some direction on these, many thanks in advance!
Technical SEO | | Thea880 -
No confirmation page on Google's Disavow links tool?
I've been going through and doing some spring cleaning on some spammy links to my site. I used Google's Disavow links tool, but after I submit my text file, nothing happens. Should I be getting some sort of confirmation page? After I upload my file, I don't get any notifications telling me Google has received my file or anything like that. It just takes me back to this page: http://cl.ly/image/0S320q46321R/Image 2013-04-26 at 11.15.25 AM.png Am I doing something wrong or is this what everyone else is seeing too?
Technical SEO | | shawn810 -
What's the best way to handle Overly Dynamic Url's?
So my question is What the best way to handle Overly Dynamic Url's. I am working on a real estate agency website. They are selling/buying properties and the url is as followed. ttp://www.------.com/index.php?action=calculator&popup=yes&price=195000
Technical SEO | | Angelos_Savvaidis0 -
Google having trouble accessing my site
Hi google is having problem accessing my site. each day it is bringing up access denied errors and when i have checked what this means i have the following Access denied errors In general, Google discovers content by following links from one page to another. To crawl a page, Googlebot must be able to access it. If you’re seeing unexpected Access Denied errors, it may be for the following reasons: Googlebot couldn’t access a URL on your site because your site requires users to log in to view all or some of your content. (Tip: You can get around this by removing this requirement for user-agent Googlebot.) Your robots.txt file is blocking Google from accessing your whole site or individual URLs or directories. Test that your robots.txt is working as expected. The Test robots.txt tool lets you see exactly how Googlebot will interpret the contents of your robots.txt file. The Google user-agent is Googlebot. (How to verify that a user-agent really is Googlebot.) The Fetch as Google tool helps you understand exactly how your site appears to Googlebot. This can be very useful when troubleshooting problems with your site's content or discoverability in search results. Your server requires users to authenticate using a proxy, or your hosting provider may be blocking Google from accessing your site. Now i have contacted my hosting company who said there is not a problem but said to read the following page http://www.tmdhosting.com/kb/technical-questions/other/robots-txt-file-to-improve-the-way-search-bots-crawl/ i have read it and as far as i can see i have my file set up right which is listed below. they said if i still have problems then i need to contact google. can anyone please give me advice on what to do. the errors are responce code 403 User-agent: *
Technical SEO | | ClaireH-184886
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/0 -
Why Can't I Get on Google?
I've employed many of the suggestions of SEOMoz and getting a Grade "A" on a particular keyword. I'm now #4 on Yahoo and Bing. However, my site hasn't cracked the top 50 in Google. Why? I see a similar pattern with other keywords, many on yahoo and bing but only a few of my subpages get #45-48 on Google. Any ideas? http://www.gospelebooks.net
Technical SEO | | mrjgardiner0 -
Replacing H1's with images
We host a few Japanese sites and Japanese fonts tend to look a bit scruffy the larger they are. I was wondering if image replacement for H1 is risky or not? eg in short... spiders see: Some header text optimized for seo then in the css h1 {
Technical SEO | | -Al-
text-indent: -9999px;
} h1.header_1{ background:url(/images/bg_h1.jpg) no-repeat 0 0; } We are considering this technique, I thought I should get some advise before potentially jeopardising anything, especially as we are dealing with one of the most important on page elements. In my opinion any attempt to hide text could be seen as keyword stuffing, is it a case that in moderation it is acceptable? Cheers0 -
Url's don't want to show up in google. Please help?
Hi Mozfans 🙂 I'm doing a sitescan for a new client. http://www.vacatures.tuinbouw.nl/ It's a dutch jobsite. Now the problem is here: The url http://www.vacatures.tuinbouw.nl/vacatures/ is in google.
Technical SEO | | MaartenvandenBos
On the same page there are jobs (scroll down) with a followed link.
To a url like this: http://www.vacatures.tuinbouw.nl/vacatures/722/productie+medewerker+paprika+teelt/ The problem is that the second url don't show up in google. When i try to make a sitemap with Gsitecrawler the second url isn't in de sitemap.. :S What am i doing wrong? Thanks!0