Have you ever seen or experienced a page indexed which is actually from a website which is blocked by robots.txt?
-
Hi all,
We use robots file and meta robots tags for blocking website or website pages to block bots from crawling. Mostly robots.txt will be used for website and expect all the pages to not getting indexed. But there is a condition here that any page from website can be indexed by Google even the site is blocked from robots.txt; because crawler may find the page link somewhere on internet as stated here at last paragraph. I wonder if this really the case where some webpages have got indexed.
And even we use meta tags at page level; do we need to block from robots.txt file? Can we use both techniques at a time?
Thanks
-
Hi vtmoz,
The most mandatory way to prevent any page to be indexed is by using a meta robots tag with a _noindex _parameter.
Then using robots.txt will help to optimize your server resources and is a way that prevent google to crawl any new page that do not have the meta robots tag.And yeah, its very common to have indexed pages even the robots.txt file blocks the entire website.
If what you are looking for is to remove from index the pages, follow this steps:
- Allow the whole website to be crawable (or at least that specific pages/section) in the robots.txt
- add the robots meta tag with "noindex,follow" parametres
- wait several weeks, 6 to 8 weeks is a fairly good time. Or just do a followup on those pages
- when you got the results (all your desired pages to be de-indexed) re-block with robots.txt those pages
- DO NOT erase the meta robots tag.
Hope it helps.
Best luck.
GR.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why google is not visiting any website from past 10 days
Hi, I observed why Google is not visiting www.SubhaVastu.com from past 10 days, later I checked thoroughly, not only for my site, google stopped visiting all websites from past 10/12 days. Is Google releasing any new updates to the crawler? Any new system is releasing soon. I am expecting Google updated their crawler by this Sunday night and it may visit as usual to all sites from 12-midnight pacific time. Has anyone observed it, any information regarding on this Google step. Thanks.
Algorithm Updates | | SubhaVaastu0 -
Need only tens of pages to be indexed out of hundreds: Robots.txt is Okay for Google to proceed with?
Hi all, We 2 sub domains with hundreds of pages where we need only 50 pages to get indexed which are important. Unfortunately the CMS of these sub domains is very old and not supporting "noindex" tag to be deployed on page level. So we are planning to block the entire sites from robots.txt and allow the 50 pages needed. But we are not sure if this is the right approach as Google been suggesting to depend mostly on "noindex" than robots.txt. Please suggest whether we can proceed with robots.txt file. Thanks
Algorithm Updates | | vtmoz0 -
Translate page?
Hi Guy's I have a question about Google asking me to translate the page when i search on brandname. It's in the Dutch Google and the searchquery is: "rotomshop" I can find why Google wants to translate the page because all our settings are Dutch. Does anyone have a suggestion? Thanks!
Algorithm Updates | | Happy-SEO1 -
Hi guys, I have a question about linking to a product page for linkbuilding. Does that count adversely vs. linking to a homepage?
Hi - so until now we have been building links via blog posts and articles and linking them to the homepage. It seems the ranking of some of my top keywords has fallen so had a few questions/concerns: Does it affect the rankings adversely if I link to the product page vs the homepage? What is rule of thumb for increasing rankings of inside pages/keywords and building links to them? Thanks
Algorithm Updates | | DGM0 -
Increased 404 and Blocked URL Notifications in Webmaster Tools
In the last 45 days, I am receiving an increasing number of 404 alerts in Google Webmaster Tools. When I audit the notifications, they are not "new" broken links, these are all links that have been pointing to non-existent pages for years that for some reason Google is just notifying me about them. This has also coincided with about a 30% drop in organic traffic from late April to early May. The site is www.petersons.com and its been around for a while and the site attracts a fair amount of natural links so in the 2 years I've managed the campaign I've done very little link-building. I'm in the process of setting up redirects for these urls but why is Google now notifying me of years old broken links and could that be one of the reasons for my drop in traffic. My second issue is my I am being notified that I am blocking over 8,000 urls in my Robots file when I am not. I attached a screenshot. Here is a link to a screenshot. http://i.imgur.com/ncoERgV.jpg
Algorithm Updates | | CUnet0 -
Canonicalization on more than one page?
is it proper to "canocalize" more than one page in a site? Or should it only be on the home page? eg: http://www.sundayschoolnetwork.com">
Algorithm Updates | | sakeith0 -
Why does Google say they have more URLs indexed for my site than they really do?
When I do a site search with Google (i.e. site:www.mysite.com), Google reports "About 7,500 results" -- but when I click through to the end of the results and choose to include omitted results, Google really has only 210 results for my site. I had an issue months back with a large # of URLs being indexed because of query strings and some other non-optimized technicalities - at that time I could see that Google really had indexed all of those URLs - but I've since implemented canonical URLs and fixed most (if not all) of my technical issues in order to get our index count down. At first I thought it would just be a matter of time for them to reconcile this, perhaps they were looking at cached data or something, but it's been months and the "About 7,500 results" just won't change even though the actual pages indexed keeps dropping! Does anyone know why Google would be still reporting a high index count, which doesn't actually reflect what is currently indexed? Thanks!
Algorithm Updates | | CassisGroup0 -
Does google index non-public pages ie. members logged in page
hi, I was trying to locate resources on the topics regarding how much the google bot indexes in order to qualify a 'good' site on their engine. For example, our site has many pages that are associated with logged in users and not available to the public until they acquire a login username and password. Although those pages show up in google analytics, they should not be made public in the google index which is what happens. In light of Google trying to qualify a site according to how 'engaged' a user is on the site, I would feel that the activities on those member pages are very important. Can anyone offer suggestions on how Google treats those pages since we are planning to do further SEO optimization of those pages. Thanks
Algorithm Updates | | jumpdates0