Google is indexing blocked content in robots.txt
-
Hi, Google is indexing some URLs that I don't want indexed, and it is also indexing the same URLs over https. These URLs are blocked in robots.txt. I've tried to block them through Google Webmaster Tools, but Google won't let me because the URLs are https. The robots.txt file is correct, so what can I do to keep this content out of the index?
-
I think you will find that the URLs in Google's index are either:
- Indexed before the robots.txt disallow was put in place. Check the Google SERP and click on "in cache" to see the date.
- Heavily linked to by other external domains.
- Both of the above.
@cleverphd has a great solution. Follow that.
-
This will sound backwards but it works.
-
Add the meta noindex tag to all pages you want out of the index.
-
Take those same pages out of the robots.txt and allow them to be crawled.
The meta noindex tag tells Google to remove the page from the index. It is preferred over using robots.txt.
http://moz.com/learn/seo/robotstxt
The robots.txt file blocks Google from crawling the page, but the URL can still show up in the index if other pages link to the page you are trying to remove.
http://www.youtube.com/watch?v=KBdEwpRQRD0
You have to allow Google to crawl the pages (by taking them out of the robots.txt) so it can read the noindex meta tags that then tell Google to take them out of the index.
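To make the interaction between the two mechanisms concrete, here is a minimal sketch in Python. It is only an illustration, not how Google itself evaluates pages: the function names (`has_noindex`, `crawl_allowed`, `diagnose`) and the example.com URLs are made up for this example, and the standard library's `urllib.robotparser` does plain prefix matching, so it will not reproduce Googlebot's wildcard handling.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # "googlebot" is the Google-specific variant of the robots meta name.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """True if the HTML carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text lets user_agent fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def diagnose(robots_txt, url, html):
    """Summarise which of the situations from the thread a page is in."""
    allowed = crawl_allowed(robots_txt, url)
    noindex = has_noindex(html)
    if noindex and not allowed:
        return "blocked: the noindex tag exists but the crawler cannot read it"
    if noindex and allowed:
        return "ok: the crawler can read the noindex tag"
    if not allowed:
        return "blocked: not crawled, but the URL can still be indexed via links"
    return "indexable: crawlable and no noindex tag"


# Hypothetical inputs for illustration only.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(diagnose(ROBOTS, "https://example.com/private/page.html", PAGE))
print(diagnose("User-agent: *\nDisallow:\n", "https://example.com/private/page.html", PAGE))
```

The first call shows the situation described in the question (a noindex that can never be seen), the second shows the state after the disallow rule is removed.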
-
-
Thank you, but that is not the problem. The robots.txt file has been in place for a long time.
-
It seems you added or modified the robots.txt file later. Wait for some time, say 15 days.
Also check the syntax of your robots.txt. Regards,
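For the syntax check, a rough sketch of a robots.txt linter in Python might look like the following. The function name `lint_robots_txt`, the list of known fields, and the specific checks are assumptions chosen for this example; it catches only a few common slips (missing colon, misspelled field, rule before any User-agent line) and is not a substitute for a search engine's own testing tool.

```python
# Field names this sketch treats as valid; anything else is flagged.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, "rule appears before any User-agent line"))
            elif value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


# Hypothetical file with two typical mistakes.
sample = "User-agent: *\nDisalow: /private/\nDisallow private/\n"
for line_number, message in lint_robots_txt(sample):
    print(f"line {line_number}: {message}")
```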