Google is indexing blocked content in robots.txt
-
Hi, Google is indexing some URLs that I don't want indexed, and it is also indexing the same URLs over https. These URLs are blocked in robots.txt. I've tried to block them through Google Webmaster Tools, but Google won't let me because the URLs are https. The robots.txt file is correct, so what can I do to keep this content out of the index?
-
I think you will find that the URLs in Google's index are either:
- Indexed before the robots.txt disallow was put in place. Check the Google SERP and click "in cache" to see the date.
- Heavily linked to by other external domains.
- Both of the above.
@cleverphd has a great solution. Follow that.
-
This will sound backwards but it works.
-
Add the meta noindex tag to all pages you want out of the index.
-
Take those same pages out of the robots.txt and allow them to be crawled.
The meta noindex tag tells Google to remove the page from the index. It is preferred over using robots.txt.
http://moz.com/learn/seo/robotstxt
The robots.txt file blocks Google from crawling the page, but the URL can still show up in the index if other pages link to the page you are trying to remove.
http://www.youtube.com/watch?v=KBdEwpRQRD0
You have to allow Google to crawl the pages (by taking them out of the robots.txt) so it can read the noindex meta tags that then tell Google to take them out of the index.
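If it helps, here is a rough Python sketch of that check, using only the standard library. It flags the conflict described above: a page that carries a noindex tag but is also disallowed in robots.txt, so the crawler never gets to read the tag. The function names, the sample robots.txt, and the example.com URL are all made up for illustration, and Python's `urllib.robotparser` is only an approximation of how Googlebot matches rules, so treat the result as a hint rather than a verdict.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """True if the page source contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text allows user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def diagnose(robots_txt, url, html):
    """Hypothetical helper: combine both checks into one short message."""
    crawlable = is_crawlable(robots_txt, url)
    noindex = has_noindex(html)
    if noindex and not crawlable:
        return "noindex is hidden: robots.txt blocks the crawl"
    if noindex:
        return "ok: crawlable and noindex"
    if not crawlable:
        return "blocked from crawling only: can still be indexed via links"
    return "crawlable and indexable"


# Made-up inputs for illustration only.
blocking_robots = "User-agent: *\nDisallow: /private/\n"
open_robots = "User-agent: *\nDisallow:\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
url = "https://www.example.com/private/page.html"

print(diagnose(blocking_robots, url, page))
print(diagnose(open_robots, url, page))
```

With the made-up inputs, the first call reports the conflict and the second shows the state you want to end up in: the page is crawlable and carries noindex. To run it against a real site you would fetch the robots.txt and the page source yourself and pass them in as strings.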
-
-
Thank you, but that is not the problem. The robots.txt file has been in place for a long time.
-
It seems you added or modified the robots.txt file later. Wait for some time, say 15 days.
Also check the syntax of your robots.txt.
Regards,