How can a page be indexed without being crawled?
-
Hey Moz fans,
In the Google getting started guide it says:
"Note: Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed."
How can that happen? I don't really get the point.
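The key detail in the quoted note is that a robots meta tag lives inside the page's HTML, so a search engine only learns about it after fetching the page. Below is a minimal sketch, using only Python's standard library, of how a crawler might check fetched HTML for a `noindex` directive. The class and function names are hypothetical and this is an illustration under those assumptions, not how Google actually processes pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content="noindex, follow" -> {"noindex", "follow"}
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the fetched HTML carries noindex (or "none", i.e. noindex, nofollow)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives or "none" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```

The check can only run on HTML the crawler has actually downloaded, which is why the rest of this thread turns on whether the page gets crawled at all.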
Thank you -
The pleasure is all mine, my friend. You are most welcome. In my opinion, the Moz SEO community is an indispensable asset and weapon in any SEO's inventory. We learn a great deal here while helping others. I am really thankful to each and every one here in the Moz community. Long live Moz and Mozzers. YOU ROCK!!
-
Oh man, you always come to me with great ideas. I never thought about that.
Thank you very much, and keep rocking! -
Yes, of course, my friend. Google has to crawl the page to see the page-level meta robots tag, but to date I have not seen any page in Google's index that has been blocked using both the robots.txt file and the page-level meta robots tag. Password protecting the page via .htaccess would be overkill if you just want Google not to index it. If you want Google to remove a particular page from its index, you can do that from your Webmaster Tools account. Here you go for more: https://support.google.com/webmasters/answer/1663419?hl=en
Good luck to you, my friend.
Best regards,
Devanur Rafi
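Password protection through .htaccess is usually HTTP Basic authentication configured in Apache. The sketch below is not an .htaccess file; it is a Python WSGI analog, assuming Basic auth, that shows what a crawler without credentials receives: a 401 response and none of the page's content. `private_page`, `require_basic_auth`, `fetch` and the credentials are all hypothetical names for illustration.

```python
import base64
import hmac
from wsgiref.util import setup_testing_defaults


def private_page(environ, start_response):
    """The page we want to keep out of search results (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Members only</body></html>"]


def require_basic_auth(app, username, password, realm="Private"):
    """Wrap a WSGI app so every request needs HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    expected = f"Basic {token}".encode("latin-1")

    def wrapper(environ, start_response):
        # WSGI exposes header values as latin-1 decoded strings (PEP 3333).
        given = environ.get("HTTP_AUTHORIZATION", "").encode("latin-1")
        if not hmac.compare_digest(given, expected):
            start_response("401 Unauthorized", [
                ("WWW-Authenticate", f'Basic realm="{realm}"'),
                ("Content-Type", "text/plain; charset=utf-8"),
            ])
            return [b"Authentication required"]
        return app(environ, start_response)

    return wrapper


def fetch(app, authorization=None):
    """Simulate one request, e.g. from a crawler that has no credentials."""
    environ = {}
    setup_testing_defaults(environ)
    if authorization is not None:
        environ["HTTP_AUTHORIZATION"] = authorization
    status = {}

    def start_response(status_line, headers, exc_info=None):
        status["line"] = status_line

    body = b"".join(app(environ, start_response))
    return status["line"], body


protected = require_basic_auth(private_page, "member", "s3cret")
print(fetch(protected)[0])  # 401 Unauthorized
```

The crawler never receives the HTML, so there is no content to index; as noted above, though, this is a heavy-handed option if the only goal is keeping one page out of the index.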
-
Thank you, guys.
Devanur, you've got the point, but let me correct you on one thing:
You can't tell Google to remove a page from its index just by using the meta robots tag, because it can't read the meta tag until it crawls the page.
So the only solution seems to be password protection via .htaccess.
Anyway, thanks for your efforts. -
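The point that Google cannot read a meta robots tag until it crawls the page also means robots.txt and the meta tag can work against each other: if robots.txt disallows a URL, a compliant crawler never fetches it and so never sees its `noindex`. A minimal sketch with Python's standard `urllib.robotparser`; the robots.txt rules, URLs and the function name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site in question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_can_see_meta_tag(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """A compliant crawler only reads a page's meta robots tag if it may fetch the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(crawler_can_see_meta_tag(ROBOTS_TXT, "https://example.com/public/page.html"))   # True
print(crawler_can_see_meta_tag(ROBOTS_TXT, "https://example.com/private/page.html"))  # False
```

Under these assumed rules, a `noindex` tag on `/private/page.html` would never be read, because the page is never fetched.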
I'm also thinking of sitemaps, but I'm not really sure Google trusts them enough to index URLs listed in them that it hasn't crawled.
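On the sitemap idea: a sitemap is just a list of URLs, so it can tell a search engine that a page exists before the engine has fetched it. Whether Google would index such a URL without crawling it is exactly the open question raised here; the sketch only shows what the file contains. A minimal Python example using the standard library and the sitemaps.org namespace; the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML listing each URL in its own <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["https://example.com/", "https://example.com/page"]))
```

Nothing in the file describes the page's content, which is why a sitemap entry on its own gives a search engine very little to show in results.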
-
Hi friend,
If a page has been blocked using the robots.txt file, Google will not crawl and index it from within the website. But what if a reference to that page is found on a third-party website? In cases like this, link discovery happens and the page can be indexed without a description snippet; such pages show the following text in place of a description in the search results:
"A description for this result is not available because of this site's robots.txt – learn more"
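The link-discovery scenario can be sketched in Python: a crawler reading a third-party page collects links and their anchor text, then checks the target site's robots.txt. Disallowed URLs are never fetched, yet they are still known, together with the anchor text pointing at them, which is the kind of information that lets a URL appear in results with no description. This is a simplified model; `LinkCollector`, `discover`, the hosts and the robots.txt rules are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects (absolute URL, anchor text) pairs from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self._href = urljoin(self.base_url, href)
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def discover(third_party_html, base_url, robots_txt, target_host, user_agent="Googlebot"):
    """Split links to target_host into crawlable URLs and known-but-blocked URLs."""
    collector = LinkCollector(base_url)
    collector.feed(third_party_html)
    collector.close()
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    crawlable, known_but_blocked = [], []
    for url, anchor_text in collector.links:
        if urlparse(url).netloc != target_host:
            continue  # link points somewhere else
        if robots.can_fetch(user_agent, url):
            crawlable.append(url)
        else:
            # Never fetched, but the URL and its anchor text are now known.
            known_but_blocked.append((url, anchor_text))
    return crawlable, known_but_blocked
```

A page that shows up only in `known_but_blocked` is the situation described above: listed in results, with no snippet because its content was never read.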
So, in order to completely stop Google from indexing a page, you should block it with a page-level meta robots tag.
Here you go for more: https://support.google.com/webmasters/answer/156449?hl=en
Please feel free to post back if you have any other queries in this regard.
Best regards,
Devanur Rafi