Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt & meta noindex--site still shows up on Google Search
-
I have set up my robots.txt like this:
User-agent: *
Disallow: /and I have this meta tag in my on a Wordpress site, set up with SEO Yoast
name="robots" content="noindex,follow"/>
I did "Fetch as Google" on my Google Search Console
My website is still showing up in the search results and it says this:
"A description for this result is not available because of this site's robots.txt"
This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.
-
CleverPhd,
Really since to see a detailed yet to the point answer.
Thanks for contributing, and being in the Moz community.
Regards,
Vijay
-
Thanks for that clarification CleverPhD, forgot to mention that.
-
This one has my vote. You have to allow them access in order to see that you don't want the pages indexed. If you block them from seeing this rule...well they won't be able to see it.
-
Just to be clear on what Logan said. You have to allow Google to crawl your site by opening up your robots.txt to Google so it can see your noindex directive that is on each of the pages. Otherwise Google will never "see" the noindex directive on your pages.
Likewise, on sitemap.xml. If you are not allowing Google to crawl the sitemap (because you are blocking it with robots.txt) then Google will not read the sitemap, find all your pages that have the noindex directive on them and then remove those pages from the index.
A great article is here
https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466
From the mouth of Google "Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag, and the page can still appear in search results, for example if other pages link to it."
The other point that logan makes is that Google might list your site if there are enough sites linking to it. The steps above should take care of this, as you are deindexing the page, but here is what I am thinking he is referencing
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google will include a site that is blocked in robots.txt if enough pages link to it, even if they have not crawled the url.
You can go into Search Console and find all the links that they say are pointing to your site. You can also use tools like CognitiveSEO or Ahrefs, Majestic or Moz etc and gather up all of those sites to find links to your site and include those in a disavow file that you put into Search Console and tell Google to ignore all of those links to your site.
Secret bonus method. Putting a noindex directive in your robots
https://www.deepcrawl.com/knowledge/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
This allows you to manage your noindex directives in your robots.txt. Makes it easier as you can control all your noindex directives from a central location and block whole folders at a time. This would stop Google from crawling AND indexing pages all in one page and you can just leave the rest of the site alone and not worry about if a noindex tag should or should not be on a certain page.
Good luck!
-
As mentioned by Logan,noindex meta tag
is the most effective way to remove indexed pages. It sometimes takes time, you have to submit the right sitemap.xml which cover the pages/post you wish to get removed from google index.
-
I did read that about the robots.txt and that is why I added the noindex.
I use SEO Yoast for sitemap.xml, so shouldn't all my pages be there? I believe they are because I just looked at it a couple days ago.
So are you saying I should look through my backlink profile (WMT) and try to remove any backlinks?
Would 'Fetch as Google' not ping Google to tell them to recrawl?
Thanks for your help.
-
Hi,
First things first, it's a common misconception that the robots.txt disallow: / will prevent indexing. It's only indented to prevent crawling, which is why you don't get a meta description pulled into the result snippet. If you have links pointing to that page and a disallow: / on your robots, it's still eligible for indexation.
Second, it's pretty weird that the noindex tag isn't effective, as that's the only sure-fire way to get de-indexed intentionally. I would recommend creating an XML sitemap for all URLs on that domain that are noindex'd and resubmit that in Search Console. If Google hasn't crawled your site since adding the noindex, they don't know it's there. In my experience, forcing them to recrawl via XML submission has been effective at getting noindex noticed quicker.
I would also recommend taking a look at the link profile and removing any possible links pointing to your noindex pages, this will help future attempts at indexing.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can I still monitor noindex, nofollow pages with Google Analytics?
I have a private/login site where all pages are noindex, nofollow. Can I still monitor external site links with Google Analytics?
Technical SEO | | jasmine.silver0 -
Why images are not getting indexed and showing in Google webmaster
Hi, I would like to ask why our website images not indexing in Google. I have shared the following screenshot of the search console. https://www.screencast.com/t/yKoCBT6Q8Upw Last week (Friday 14 Sept 2018) it was showing 23.5K out 31K were submitted and indexed by Google. But now, it is showing only 1K 😞 Can you please let me know why might this happen, why images are not getting indexed and showing in Google webmaster.
Technical SEO | | 21centuryweb0 -
Disallow wildcard match in Robots.txt
This is in my robots.txt file, does anyone know what this is supposed to accomplish, it doesn't appear to be blocking URLs with question marks Disallow: /?crawler=1
Technical SEO | | AmandaBridge
Disallow: /?mobile=1 Thank you0 -
Google Search Console Not Sending Messages
One of our sites received a Manual Penalty for unnatural links by Google. However, we never received a message in Google Search Console or an email about the manual action. The only reason we knew about the penalty is by the obvious drop in rankings, then signing into search console to look for any manual actions, which we found. Since then, we have submitted a disavow file and a reconsideration request. However, once again we did not receive an email or message in search console that shows confirmation of the disavow or that they received the reconsideration request. The disavow file does show up after I upload it, and it says it was successfully uploaded... but no messages or emails. After many hours of investigating the various canonical versions of our website on Search Console, we found out that there were several “owners” of the various canonical versions of our site that had “could not find the email address” as a site owner. We found out that these were previous employees who no longer worked with the company and their email address was deleted. After unverifying these site owners, (all the ones that had “could not find the email address” as the site owner), the notifications, emails and messages in Search Console started to appear. However, the only place they did not appear, is the main canonical version of our site. Of course, the main canonical version of our site (https://www) is the version that we uploaded the disavow and reconsideration request. This is the canonical version of the site that we need to receive these messages to know if our reconsideration request was granted! We’ve just reuploaded the disavow file and reconsideration request to all of the other canonical versions (2 of the 3 received the message about the penalty)…. and we are currently awaiting a response. Has anybody else had problems with not receiving notifications in search console due to deleted email addresses?
Technical SEO | | Fiyyazp0 -
Desktop & Mobile XML Sitemap Submitted But Only Desktop Sitemap Indexed On Google Search Console
Hi! The Problem We have submitted to GSC a sitemap index. Within that index there are 4 XML Sitemaps. Including one for the desktop site and one for the mobile site. The desktop sitemap has 3300 URLs, of which Google has indexed (according to GSC) 3,000 (approx). The mobile sitemap has 1,000 URLs of which Google has indexed 74 of them. The pages are crawlable, the site structure is logical. And performing a Landing Page URL search (showing only Google/Organic source/medium) on Google Analytics I can see that hundreds of those mobile URLs are being landed on. A search on mobile for a longtail keyword from a (randomly selected) page shows a result in the SERPs for the mobile page that judging by GSC has not been indexed. Could this be because we have recently added rel=alternate tags on our desktop pages (and of course corresponding canonical ones on mobile). Would Google then 'not index' rel=alternate page versions? Thanks for any input on this one. PmHmG
Technical SEO | | AlisonMills0 -
Removing robots.txt on WordPress site problem
Hi..am a little confused since I ticked the box in WordPress to allow search engines to now crawl my site (previously asked for them not to) but Google webmaster tools is telling me I still have robots.txt blocking them so am unable to submit the sitemap. Checked source code and the robots instruction has gone so a little lost. Any ideas please?
Technical SEO | | Wallander0 -
Adding 'NoIndex Meta' to Prestashop Module & Search pages.
Hi Looking for a fix for the PrestaShop platform Look for the definitive answer on how to best stop the indexing of PrestaShop modules such as "send to a friend", "Best Sellers" and site search pages. We want to be able to add a meta noindex ()to pages ending in: /search?tag=ball&p=15 or /modules/sendtoafriend/sendtoafriend-form.php We already have in the robot text: Disallow: /search.php
Technical SEO | | reallyitsme
Disallow: /modules/ (Google seems to ignore these) But as a further tool we would like to incude the noindex to all these pages too to stop duplicated pages. I assume this needs to be in either the head.tpl or the .php file of each PrestaShop module.? Or is there a general site wide code fix to put in the metadata to apply' Noindex Meta' to certain files. Current meta code here: Please reply with where to add code and what the code should be. Thanks in advance.0 -
Robots.txt Sitemap with Relative Path
Hi Everyone, In robots.txt, can the sitemap be indicated with a relative path? I'm trying to roll out a robots file to ~200 websites, and they all have the same relative path for a sitemap but each is hosted on its own domain. Basically I'm trying to avoid needing to create 200 different robots.txt files just to change the domain. If I do need to do that, though, is there an easier way than just trudging through it?
Technical SEO | | MRCSearch0