Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How preproduction website is getting indexed in Google.
-
Hi team,
Can anybody please help me to find how my preproduction website and urls are getting indexed in Google.
-
As Eric hinted, the best method to prevent any pages being indexed would be to use htaccess password protection dialog on your development site. It's fairly easy to implement. You can find instructions to do so here: http://www.htaccesstools.com/articles/password-protection/
-
Hi Anoop! Have everyone's answers helped? Do you still have any questions?

-
Anoop, when a 'development' or 'preproduction' website or subdomain is getting indexed, that means that you haven't stopped the search engines from crawling it. The search engines, especially Google, are very aggressive at crawling, and they will crawl just about any URL that they find. It seems as though all you have to do is visit that page and it's going to get crawled.
Best way to stop Google from crawling (then indexing) a website is to stop it from getting crawled using the robots.txt file. Keep in mind, though, that even if you tell them to stay out of it using the robots.txt file they will still index those URLs.
The only way to stop Google from crawling would be to password protect the website or make it available only on a private server, or available via VPN only.
-
In addition to noindexing the pages using the meta tag, if you have WMT / Search Console set up, you can request Google remove those URLs from their index for the time being. I've found that this may take up to a couple of hours from the removal request to the time of actual removal.
As to how they were found, there's a good chance that Google crawled a link to a preproduction webpage and went from there.
-
Hi
To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the section of your page:
To prevent only Google web crawlers from indexing a page:
You should be aware that some search engine web crawlers might interpret the
noindexdirective differently. As a result, it is possible that your page might still appear in results from other search engines.here is complete guide: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag?csw=1
-
Hi,
Have you noindexed & nofollowed the site and pages? I would also suggest you block all crawlers by disallowing access in the robots.txt file.
Do you know if this has all been done?
-Andy
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why images are not getting indexed and showing in Google webmaster
Hi, I would like to ask why our website images not indexing in Google. I have shared the following screenshot of the search console. https://www.screencast.com/t/yKoCBT6Q8Upw Last week (Friday 14 Sept 2018) it was showing 23.5K out 31K were submitted and indexed by Google. But now, it is showing only 1K 😞 Can you please let me know why might this happen, why images are not getting indexed and showing in Google webmaster.
Technical SEO | | 21centuryweb0 -
Fake Links indexing in google
Hello everyone, I have an interesting situation occurring here, and hoping maybe someone here has seen something of this nature or be able to offer some sort of advice. So, we recently installed a wordpress to a subdomain for our business and have been blogging through it. We added the google webmaster tools meta tag and I've noticed an increase in 404 links. I brought this up to or server admin, and he verified that there were a lot of ip's pinging our server looking for these links that don't exist. We've combed through our server files and nothing seems to be compromised. Today, we noticed that when you do site:ourdomain.com into google the subdomain with wordpress shows hundreds of these fake links, that when you visit them, return a 404 page. Just curious if anyone has seen anything like this, what it may be, how we can stop it, could it negatively impact us in anyway? Should we even worry about it? Here's the link to the google results. https://www.google.com/search?q=site%3Amshowells.com&oq=site%3A&aqs=chrome.0.69i59j69i57j69i58.1905j0j1&sourceid=chrome&es_sm=91&ie=UTF-8 (odd links show up on pages 2-3+)
Technical SEO | | mshowells0 -
Blocked URL parameters can still be crawled and indexed by google?
Hy guys, I have two questions and one might be a dumb question but there it goes. I just want to be sure that I understand: IF I tell webmaster tools to ignore an URL Parameter, will google still index and rank my url? IS it ok if I don't append in the url structure the brand filter?, will I still rank for that brand? Thanks, PS: ok 3 questions :)...
Technical SEO | | catalinmoraru0 -
Why is my blog disappearing from Google index?
My Google blogger blog is about 10 months old. In that time i have worked really hard with adding unique content, building relationships with other bloggers in the same niche, and done some inbound marketing. 2 weeks ago I updated the template to something cleaner, with a little more "wordpress" feel to it. This means i've messed about with the code a lot in these weeks, adding social buttons etc. The problem is that from some point late last week thurs/fri my pages started disappearing from Googles index. I have checked webmaster tools and have no manual actions. My link profile is pretty clean as its a new site, and i have manually checked every piece of content published for plagiarism etc. So what is going on? Did i break my blog? Or is something else amiss? Impressions are down 96% comparing Nov 1-5th to previous 5 days. site is here: http://bit.ly/174beVm Thanks for any help in advance.
Technical SEO | | Silkstream0 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
CDN Being Crawled and Indexed by Google
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases. It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt? Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL. Thanks! Scott
Technical SEO | | Scott-Thomas0 -
De-indexed from Google
Hi Search Experts! We are just launching a new site for a client with a completely new URL. The client can not provide any access details for their existing site. Any ideas how can we get the existing site de-indexed from Google? Thanks guys!
Technical SEO | | rikmon0 -
Why are Google search results different if you are log'd into Google or not?
I get different results when I'm log'd into my Google account associated with my website than if I'm not. The same country is occurring. So how can I rely on the google results I'm seeing? For instance my site is page 1 with the improvements I made based on SEOMOZ if I'm log'd in. Yet I'm not on the first 25 pages if I'm not logged in.
Technical SEO | | Romana0