Dev Site Was Indexed By Google
-
Two of our dev sites (subdomains) were indexed by Google. We made them private as soon as we found the problem. Should we take a further step to remove the subdomains through robots.txt, or just let it ride out?
From what I understand, to remove the subdomains from Google we would verify each subdomain in GWT, give it its own robots.txt, and disallow everything.
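For reference, a disallow-everything robots.txt on a dev subdomain is just two lines:

```
User-agent: *
Disallow: /
```

Note that this only blocks future crawling; URLs that are already indexed still need a removal request in GWT, as described in the answers below.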
Any advice is welcome; I just wanted to discuss this before making a decision.
-
We ran into this in the past, and one thing that we think happened is that links to the dev site were sent via email to several Gmail accounts. We think this is how Google then indexed the site, as there were no inbound links posted anywhere.
I think the main issue is how it's perceived by the client, and whether they are freaking out about it. In that case, putting the site behind an access-control password will keep anyone from seeing it.
The robots.txt file should flush it out, but yes, it takes a little bit of time.
-
I've had this happen before. In the dev subdomain, I added a robots.txt that excluded everything, verified the subdomain as its own site in GWT, then asked for that site (dev subdomain) to be removed.
I then used a free code-monitoring service that checks a URL for changes once a day. I set it up to watch the live site's robots.txt and the robots.txt of each dev site, so I'd know within 24 hours if the developers had tweaked them.
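If you'd rather roll your own than use a service, here's a minimal sketch of that kind of daily check in Python (the URLs and the local hash file are assumptions for illustration; any change-monitoring service does the equivalent):

```python
# Minimal sketch of a daily robots.txt change monitor.
# The watched URLs and state-file name are hypothetical examples.
import hashlib
import json
import pathlib
import urllib.request

WATCHED = [
    "https://www.example.com/robots.txt",  # live site
    "https://dev.example.com/robots.txt",  # dev subdomain
]
STATE = pathlib.Path("robots_hashes.json")

def fingerprint(url):
    """Fetch a robots.txt and return a hash of its contents."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

# Load the hashes from the previous run, if any.
previous = json.loads(STATE.read_text()) if STATE.exists() else {}
current = {url: fingerprint(url) for url in WATCHED}

# A changed hash means someone touched the file since yesterday.
for url, digest in current.items():
    if url in previous and previous[url] != digest:
        print(f"CHANGED: {url}")  # hook up email/alerting here

STATE.write_text(json.dumps(current))
```

Run it once a day from cron and wire the print statement up to email, and you get the same 24-hour warning.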
-
Hi Tyler,
You definitely don't want to battle yourself over duplicate content. If the current sub-domains have little link juice (inbound links) pointing to them, I would simply block the domain from being indexed further. If there are a couple of high-value pages, it may be worth the time to set up 301 redirects to avoid losing any links / juice.
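For those high-value pages, a minimal sketch of that kind of 301 in Apache .htaccess on the dev subdomain (the domain and paths are examples):

```
# .htaccess on dev.example.com (paths are illustrative)
RewriteEngine On
RewriteRule ^high-value-page/?$ https://www.example.com/high-value-page [R=301,L]
```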
Using robots.txt or noindex tags may work, but in my personal experience the easiest and most efficient way to block indexing is simply to use .htaccess / .htpasswd. This prevents anybody without credentials from even viewing the site, effectively blocking all spiders, bots, and unwanted snoopers.
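A minimal sketch of that setup, assuming Apache (the .htpasswd path is an example; keep it outside the web root):

```
# .htaccess on the dev subdomain
# Create the credentials file first with: htpasswd -c /home/user/.htpasswd devuser
AuthType Basic
AuthName "Dev Site"
AuthUserFile /home/user/.htpasswd
Require valid-user
```

Because the server answers 401 to every request without credentials, Googlebot can't fetch anything at all.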
-
Hey Tyler,
We would follow the same protocol if in your shoes. Remove any instance of the indexed dev subdomain(s) with GWT's URL removal tool, then create new robots.txt files for each subdomain as an extra step. Also, double-check and even resubmit your root domain's XML sitemap so Google can reindex your main content/links as a precautionary measure.
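As a side note, besides resubmitting through GWT, Google also accepts a simple GET ping with the sitemap URL; a sketch, assuming the sitemap sits at the default location:

```
http://www.google.com/ping?sitemap=http://www.example.com/sitemap.xml
```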
PS - We develop on a separate server and domain for any new work for our site or any client sites. Doing this allows us to block Google from everything.
Hope this was helpful! - Patrick
Related Questions
-
Google indexing is slowing down?
I have up to 20 million unique pages, and so far I've only submitted about 30k of them in my sitemap. We had a few load-related errors during Google's initial visits, and it thought some pages were duplicates, but we fixed all that. We haven't had a crawl-related error for two weeks now. Google appears to be indexing fewer and fewer URLs every time it visits. Any ideas why? I'm not sure how to get all our pages indexed if it's going to operate like this. Would love some help, thanks!
Technical SEO | RyanTheMoz
-
Pages are Indexed but not Cached by Google. Why?
Hello, we run a Magento 2 extensions website, mageants.com. For about a year Google re-cached all of our pages every 15 days, but over the last 15 days the cached versions of our pages have started returning 404 errors. I checked Search Console but didn't find any errors, and I manually used Fetch and Render, but most pages still show the same 404. Example page: https://www.mageants.com/free-gift-for-magento-2.html Error: http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&rlz=1C1CHBD_enIN803IN804&oq=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&aqs=chrome..69i57j69i58.1569j0j4&sourceid=chrome&ie=UTF-8 Does anyone have a solution for this issue?
Technical SEO | vikrantrathore
-
Google Indexed a version of my site w/ MX record subdomain
We're doing a site audit and found "internal" links to a page in Search Console that appear to be from a subdomain of our site based on our MX record. We use Google Mail internally. The links ultimately redirect to our correct preferred subdomain "www", but I am concerned as to why this is happening and whether it can have any negative SEO implications. Example of one of the links: aspmx3.googlemail.com.sullivansolarpower.com/about/solar-power-blog/daniel-sullivan/renewable-energy-and-electric-cars-are-not-political-footballs I did a site operator search, site:aspmx3.googlemail.com.sullivansolarpower.com, on Google and it returns several results.
Technical SEO | SS.Digital
-
Should I remove these pages from the Google index?
Hi there, please have a look at the following URL: http://www.elefant-tours.com/index.php?callback=imagerotator&gid=65&483. It's a "sitemap" generated by a WordPress plug-in called NextGEN Gallery, and it maps all the images that have been added to the site through the plugin, which is quite a lot in this case. I can see that these "sitemap" pages have been indexed by Google, and I'm wondering whether I should remove them. In my opinion these are pages that a search engine would never want to serve as a search result and that a visitor would never want to see. Attracting traffic through Google Images is irrelevant in this case. What is your advice: block them, leave them indexed, or something else?
Technical SEO | Robbern
-
Micro-site homepage not being indexed
http://www.reebok.com/en-US/reebokonehome/ This is the homepage for an instructor-network micro-site on Reebok.com. The robots.txt file was excluding the /en-US/ directory; we've since removed that exclusion and resubmitted this URL for indexing via Google Webmaster Tools, but we are still not seeing it in the index. Any advice would be very helpful; we may be missing some blocking issue, or perhaps we just need to wait longer?
Technical SEO | PatrickDugan
-
Remove Site from Google
How can I get my website out of Google? I want all pages completely gone. Thanks!
Technical SEO | tylerfraser
-
Google crawl index issue with our website...
Hey there. We've run into a mystifying issue with Google's crawl index of one of our sites. When we do a "site:www.burlingtonmortgage.biz" search in Google, we're seeing lots of 404 errors on pages that don't exist on our site or, seemingly, on the remote server. In the search results, Google is showing nonsensical folders off the root domain, with the actual page inside that non-existent folder. An example: Google shows this in its index of the site (as a 404 error page): www.burlingtonmortgage.biz/MQnjO/idaho-mortgage-rates.asp The actual page on the site is: www.burlingtonmortgage.biz/idaho-mortgage-rates.asp Google is showing the folder MQnjO, which doesn't exist anywhere on the remote server. Other pages it shows have different folder names that are just as wacky. We called our hosting company, who said the problem isn't coming from them... Has anyone had something like this happen to them? Thanks so much for your insight! Megan
Technical SEO | ILM_Marketing