Site being indexed by Google before it has launched
-
We are nearing the end of a site migration and are at the final stage of testing redirects etc. However, to our horror, we've just discovered that Google has started indexing the new site. Any ideas on how this could have happened? Most recently, I asked for robots.txt to exclude anything with a certain parameter in the URL. Is there a chance that this, wrongly implemented, could have caused it?
-
Duplicate question. Closing this one so all answers can be given at http://www.seomoz.org/q/site-being-indexed-by-google-before-it-has-launched-2
-
Many ways. Google discovers URLs through a large number of methods, primarily through links. I have seen some pretty surprising routes to discovery, though:
- Links posted in emails where the emails ended up on the web (like a private newsletter with a public archive)
- Links showing up in clickstream data services like Alexa
- Links showing up from "recently registered" domain lists
The rule of thumb is to always, ALWAYS start with a robots.txt. It is the first thing you should do when setting up a dev environment.
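As a rough illustration of that rule of thumb, here is a minimal sketch for checking what a dev site's robots.txt actually blocks, using Python's standard `urllib.robotparser`. The hostname `staging.example.com`, the sample paths, and the helper name `is_blocked` are made up for the example. Note that, as far as I know, Python's parser does plain prefix matching and does not implement Google's wildcard extensions, so treat it as a sanity check rather than a faithful model of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a dev/staging site: block every compliant crawler.
STAGING_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical narrower file that only excludes one section of the site.
PARTIAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://staging.example.com"  # assumed hostname, for illustration only
    for name, rules in (("block-all", STAGING_ROBOTS_TXT), ("partial", PARTIAL_ROBOTS_TXT)):
        for path in ("/", "/products/widget", "/search?q=1"):
            status = "blocked" if is_blocked(rules, base + path) else "CRAWLABLE"
            print(f"{name:10} {path:20} {status}")
```

The second file shows why a rule that only excludes some URLs is not enough on a pre-launch site: everything outside that rule stays open to crawling.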