Robots.txt & meta noindex--site still shows up on Google Search
-
I have set up my robots.txt like this:
User-agent: *
Disallow: /
and I have this meta tag in my <head> on a WordPress site, set up with SEO Yoast:
<meta name="robots" content="noindex,follow"/>
I ran "Fetch as Google" in my Google Search Console.
My website is still showing up in the search results and it says this:
"A description for this result is not available because of this site's robots.txt"
This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.
-
CleverPhd,
Really nice to see a detailed yet to-the-point answer.
Thanks for contributing, and being in the Moz community.
Regards,
Vijay
-
Thanks for that clarification CleverPhD, forgot to mention that.
-
This one has my vote. You have to allow them access in order to see that you don't want the pages indexed. If you block them from seeing this rule...well they won't be able to see it.
-
Just to be clear on what Logan said: you have to allow Google to crawl your site by opening up your robots.txt to Google, so that it can see the noindex directive on each of the pages. Otherwise Google will never "see" the noindex directive on your pages.
The same goes for sitemap.xml. If you are blocking Google from crawling the sitemap with robots.txt, then Google will not read the sitemap, find all your pages that have the noindex directive on them, and remove those pages from the index.
A great article is here
https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466
From the mouth of Google "Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag, and the page can still appear in search results, for example if other pages link to it."
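To make the point above concrete, here is a minimal Python sketch of the check: a noindex tag only counts if the crawler is allowed to fetch the page in the first place. It uses only the standard library. The function name `noindex_status`, the class `RobotsMetaParser`, and the example.com URL are made up for illustration, and Python's robots.txt parser is only an approximation of how Googlebot reads the file.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def noindex_status(robots_txt, url, html, user_agent="Googlebot"):
    """Hypothetical helper: report whether a crawler could see a noindex tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # The page is never fetched, so the meta tag is never read.
        return "blocked: crawler cannot fetch the page, so any noindex tag goes unseen"
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "noindex visible"
    return "crawlable, no noindex"


page = '<html><head><meta name="robots" content="noindex,follow"/></head></html>'
print(noindex_status("User-agent: *\nDisallow: /", "https://example.com/", page))
print(noindex_status("User-agent: *\nDisallow:", "https://example.com/", page))
```

With the original poster's robots.txt (Disallow: /) the first call reports the page as blocked, while an empty Disallow lets the tag be seen.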
The other point that Logan makes is that Google might list your site if there are enough sites linking to it. The steps above should take care of this, as you are deindexing the page, but here is what I think he is referencing:
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google will include a site that is blocked in robots.txt if enough pages link to it, even if they have not crawled the URL.
You can go into Search Console and find all the links that they say are pointing to your site. You can also use tools like CognitiveSEO or Ahrefs, Majestic or Moz etc and gather up all of those sites to find links to your site and include those in a disavow file that you put into Search Console and tell Google to ignore all of those links to your site.
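As a rough sketch of the disavow step, the Python below turns a list of linking URLs (exported from whichever backlink tool you use) into the plain-text format that the disavow tool accepts: one entry per line, `domain:` prefixes for whole domains, and `#` for comments. The function name `build_disavow_file` and the sample URLs are invented for illustration. Review the list by hand before uploading anything.

```python
from urllib.parse import urlparse


def build_disavow_file(linking_urls, whole_domains=True):
    """Hypothetical helper: build disavow file text from linking URLs."""
    entries = []
    seen = set()
    for url in linking_urls:
        host = urlparse(url).hostname
        if not host:
            continue  # skip anything that is not a full URL
        # Either disavow the whole linking domain or only the exact URL.
        entry = f"domain:{host}" if whole_domains else url
        if entry not in seen:
            seen.add(entry)
            entries.append(entry)
    header = "# Links to disavow, gathered from backlink reports"
    return "\n".join([header, *entries]) + "\n"


links = [
    "http://spam.example/page-a",
    "http://spam.example/page-b",
    "https://links.example/directory",
]
print(build_disavow_file(links))
```

Disavowing by domain keeps the file short when one site links to you from many pages. Pass `whole_domains=False` to list the exact URLs instead.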
Secret bonus method: putting a noindex directive in your robots.txt
https://www.deepcrawl.com/knowledge/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
This allows you to manage your noindex directives in your robots.txt. It makes things easier, as you can control all your noindex directives from a central location and block whole folders at a time. This would stop Google from crawling AND indexing pages all in one place, and you can just leave the rest of the site alone and not worry about whether a noindex tag should or should not be on a certain page.
Good luck!
-
As mentioned by Logan, the noindex meta tag is the most effective way to remove indexed pages. It sometimes takes time, and you have to submit the right sitemap.xml, which covers the pages/posts you wish to get removed from the Google index.
-
I did read that about the robots.txt and that is why I added the noindex.
I use SEO Yoast for sitemap.xml, so shouldn't all my pages be there? I believe they are because I just looked at it a couple days ago.
So are you saying I should look through my backlink profile (WMT) and try to remove any backlinks?
Would 'Fetch as Google' not ping Google to tell them to recrawl?
Thanks for your help.
-
Hi,
First things first, it's a common misconception that the robots.txt disallow: / will prevent indexing. It's only intended to prevent crawling, which is why you don't get a meta description pulled into the result snippet. If you have links pointing to that page and a disallow: / in your robots.txt, it's still eligible for indexation.
Second, it's pretty weird that the noindex tag isn't effective, as that's the only sure-fire way to get de-indexed intentionally. I would recommend creating an XML sitemap for all URLs on that domain that are noindex'd and resubmitting that in Search Console. If Google hasn't crawled your site since you added the noindex, they don't know it's there. In my experience, forcing them to recrawl via XML submission has been effective at getting noindex noticed quicker.
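For anyone who wants to build that sitemap by hand rather than through a plugin, here is a minimal Python sketch using the standard library. The function name `build_sitemap` and the example.com URLs are placeholders. The namespace is the standard sitemaps.org one.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return sitemap XML listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Each page gets a <url><loc>...</loc></url> entry.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


noindexed_pages = [
    "https://example.com/",
    "https://example.com/old-page/",
]
print(build_sitemap(noindexed_pages))
```

Save the output as an .xml file on the domain and submit its URL in Search Console. Remember that robots.txt must not block the sitemap or the pages it lists.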
I would also recommend taking a look at the link profile and removing any links you can that point to your noindex pages; this will help prevent them from being picked up for indexing again in the future.