Site being indexed by Google before it has launched
-
We are nearing the end of a site migration and are in the final stage of testing redirects. To our horror, we've just discovered that Google has started indexing the new site. Any ideas on how this could have happened? I recently asked for robots.txt to be changed to exclude any URL containing a certain parameter. Is there a chance that this change, wrongly implemented, could have caused this?
-
Duplicate question. Closing this one so all answers can be given at http://www.seomoz.org/q/site-being-indexed-by-google-before-it-has-launched-2
-
Many ways. Google discovers URLs through a large number of methods, although primarily through links. I have seen some pretty amazing ways of discovery, though:
- Links posted in emails where the emails ended up on the web (like a private newsletter with a public archive)
- Links showing up in clickstream data services like Alexa
- Links showing up in "recently registered" domain lists
The rule of thumb is to always, ALWAYS start with a robots.txt. It is the first thing you should do when setting up a dev environment.
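As a rough illustration of that rule of thumb, here is a minimal sketch of how you might sanity-check a dev environment's robots.txt before anyone links to it. The host `staging.example.com`, the helper `is_blocked`, and the sample rules are all hypothetical, not something from the original poster's setup. It uses Python's standard `urllib.robotparser`, which only does simple prefix matching, so it is not a faithful stand-in for Googlebot's handling of wildcard or parameter rules. Also keep in mind that robots.txt controls crawling: a blocked URL that is linked from elsewhere can still show up in the index as a bare URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a pre-launch host: block every crawler everywhere.
STAGING_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on the dev host, including one with a query parameter.
    urls = (
        "https://staging.example.com/",
        "https://staging.example.com/products?id=1",
    )
    for url in urls:
        status = "blocked" if is_blocked(STAGING_ROBOTS_TXT, url) else "ALLOWED"
        print(url, status)
```

If a check like this reports a URL as allowed on a site that has not launched, the robots.txt is the first thing to fix. For rules that target a URL parameter, as in the question, it is safer to confirm the behaviour with Google's own robots.txt testing tools than to rely on this parser.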