Google is indexing content blocked in robots.txt

elisainteractive

Hi, Google is indexing some URLs that I don't want indexed, and it is also indexing the same URLs over https. These URLs are blocked in the robots.txt file. I've tried to block these URLs through Google Webmaster Tools, but Google doesn't let me do it because these URLs are https. The robots.txt file is correct, so what can I do to keep this content from being indexed?

bjs2010

I think you will find that the URLs in Google's index are either:

Indexed before the robots.txt disallow was put in place. Check the Google SERP and click "in cache" to see the date.
Heavily linked to from external domains.
Both of the above.

@cleverphd has a great solution. Follow that.

CleverPhD

This will sound backwards but it works.

Add the meta noindex tag to all pages you want out of the index.
Take those same pages out of the robots.txt and allow them to be crawled.

The meta noindex tag tells Google to remove the page from the index. It is preferred over using robots.txt for this purpose.
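If you want to confirm that a page really carries the tag, here is a minimal Python sketch. It assumes the tag looks like `<meta name="robots" content="noindex">` in the page's head; the class and function names (`RobotsMetaParser`, `has_noindex`) and the sample HTML are made up for illustration, and it only inspects markup you pass in, not what Google has actually seen.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample page, for illustration only.
sample_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body></body></html>"
)
print(has_noindex(sample_page))
```

Run it against the HTML of each page you want out of the index. If it reports no noindex directive, the tag is missing or malformed.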

http://moz.com/learn/seo/robotstxt

The robots.txt file blocks Google from crawling the page, but the page can still show up in the index if other pages link to the page you are trying to remove.

http://www.youtube.com/watch?v=KBdEwpRQRD0

You have to allow Google to crawl the pages (by taking them out of the robots.txt) so it can read the noindex meta tags that then tell Google to take them out of the index.
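To check that the pages are actually crawlable after you edit robots.txt, a rough sketch with Python's standard `urllib.robotparser` may help. The robots.txt rules, the example.com URLs, and the `can_crawl` helper below are all assumptions for illustration. Python's parser is also not guaranteed to match Googlebot's own rule matching in every edge case, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A page under a disallowed path cannot be crawled, so a noindex tag
# on it would never be read.
print(can_crawl(robots_txt, "http://example.com/private/page.html"))

# A page outside the disallowed path can be crawled.
print(can_crawl(robots_txt, "http://example.com/public/page.html"))
```

Any page that is still reported as blocked needs to come out of robots.txt before the noindex tag can take effect.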

elisainteractive

Thank you, but that is not the problem. The robots.txt file has been in place for a long time.

EastEssence22

It seems you added or modified the robots.txt file later. Wait for some time, say 15 days.
Also check the syntax of your robots.txt file.

Regards,
