Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best way to remove full demo (staging server) website from Google index
-
I've recently taken over an in-house role at a property auction company, they have a main site on the top-level domain (TLD) and 400+ agency sub domains!
I recently found that the web development team have a demo domain per site, which is found on a subdomain of the original domain - mirroring the site. The problem is that they have all been found and indexed by Google:
Obviously this is a problem as it is duplicate content and so on, so my question is... what is the best way to remove the demo domain / sub domains from Google's index?
We are taking action to add a noindex tag into the header (of all pages) on the individual domains but this isn't going to get it removed any time soon! Or is it?
I was also going to add a robots.txt file into the root of each domain, just as a precaution! Within this file I had intended to disallow all.
The final course of action (which I'm holding off in the hope someone comes up with a better solution) is to add each demo domain / sub domain into Google Webmaster and remove the URLs individually.
Or would it be better to go down the canonical route?
-
Why couldn't I just put a password on the staging site, and let Google sort out the rest? Just playing devil's advocate.
-
If you've enough time to verify each subdomain in WMT and also removing 400+ domains one by one, then you can go for solution 2. You can't remove subdomain from verified WMT account of main domain, that's why you need to verify each domain.
Adding canonical is a better option, it wouldn't remove all of the demo domains from Google's index rapidly, you have to wait for few months, but you'll be on the safe side.
-
Out of curiosity, why wouldn't you recommend solution 2?
You mentioned that you faced a similar kind of situation in the past, how did that work out? Which of the 3 solutions (or all) did you opt for?
-
Good advice but an IP restriction for the demo sites won't be possible on this occasion as our router throws out a range of different IP addresses and we occasionally need the sites to be viewed externally! Any other suggestions to help?
-
I'd also recommend putting in an IP restriction for any of the demo sites.
So that if anyone visits the demo sites from a non-whitelisted IP address, then you can display an error message, or simply redirect them over to the live site.
That will likely have the search results quickly removed from the search engine.
Hope this helps!
-- Jeff
-
Solution 1:
Add robots.txt on all demo domains and block them, or add noindex in their header.
Solution 2: Verify each domain in webmaster tools and remove it entirely from the link removal section ( I wouldn't recommend this).
Solution 3:
If your both domains like agency1.domain.com and demo.agency1.domain.com have same coding and are clone then you should just add canonical url to the agency1.domain.com and canonical will be http://agency.domain.com/ it will work if it will be automatically shown in the demo domain. if it doesn't show up in the demo domain automatically then add the same canonical to the demo domain.
It will take some time to deindexed from serps, but it will surely work. I've faced the same kind of situation in past.
-
Noindex is your best option, really. It might take weeks, but I don't think any other method is going to be faster. Plus, technically speaking, "noindex" is the proper method for what you want to do - canonical tags or a robots.txt may do the job, but they aren't exactly the right way.
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best way to set up URL structure for reviews off of PDP pages.
We are adding existing customer reviews to Product Detail Pages pages. There are about 300 reviews per product so we're going to have to paginate reviews off of the PDP page. I'm wondering what the best url structure for reviews pages is to get the most seo benefit. For example, would it be something like this? site.com/category/product/reviews/page-1 or something that used parameters, such as: site.com/reviews?product=a Also, what is the best way to show that the internal link on the PDP page to "All Reviews" is a higher priority link than the other links on the page?
Intermediate & Advanced SEO | | katseo10 -
How do internal search results get indexed by Google?
Hi all, Most of the URLs that are created by using the internal search function of a website/web shop shouldn't be indexed since they create duplicate content or waste crawl budget. The standard way to go is to 'noindex, follow' these pages or sometimes to use robots.txt to disallow crawling of these pages. The first question I have is how these pages actually would get indexed in the first place if you wouldn't use one of the options above. Crawlers follow links to index a website's pages. If a random visitor comes to your site and uses the search function, this creates a URL. There are no links leading to this URL, it is not in a sitemap, it can't be found through navigating on the website,... so how can search engines index these URLs that were generated by using an internal search function? Second question: let's say somebody embeds a link on his website pointing to a URL from your website that was created by an internal search. Now let's assume you used robots.txt to make sure these URLs weren't indexed. This means Google won't even crawl those pages. Is it possible then that the link that was used on another website will show an empty page after a while, since Google doesn't even crawl this page? Thanks for your thoughts guys.
Intermediate & Advanced SEO | | Mat_C0 -
New Subdomain & Best Way To Index
We have an ecommerce site, we'll say at https://example.com. We have created a series of brand new landing pages, mainly for PPC and Social at https://sub.example.com, but would also like for these to get indexed. These are built on Unbounce so there is an easy option to simply uncheck the box that says "block page from search engines", however I am trying to speed up this process but also do this the best/correct way. I've read a lot about how we should build landing pages as a sub-directory, but one of the main issues we are dealing with is long page load time on https://example.com, so I wanted a kind of fresh start. I was thinking a potential solution to index these quickly/correctly was to make a redirect such as https://example.com/forward-1 -> https:sub.example.com/forward-1 then submit https://example.com/forward-1 to Search Console but I am not sure if that will even work. Another possible solution was to put some of the subdomain links accessed on the root domain say right on the pages or in the navigation. Also, will I definitely be hurt by 'starting over' with a new website? Even though my MozBar on my subdomain https://sub.example.com has the same domain authority (DA) as the root domain https://example.com? Recommendations and steps to be taken are welcome!
Intermediate & Advanced SEO | | Markbwc0 -
Pages are Indexed but not Cached by Google. Why?
Here's an example: I get a 404 error for this: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all But a search for qjamba restaurant coupons gives a clear result as does this: site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all What is going on? How can this page be indexed but not in the Google cache? I should make clear that the page is not showing up with any kind of error in webmaster tools, and Google has been crawling pages just fine. This particular page was fetched by Google yesterday with no problems, and even crawled again twice today by Google Yet, no cache.
Intermediate & Advanced SEO | | friendoffood2 -
Proper 301 in Place but Old Site Still Indexed In Google
So i have stumbled across an interesting issue with a new SEO client. They just recently launched a new website and implemented a proper 301 redirect strategy at the page level for the new website domain. What is interesting is that the new website is now indexed in Google BUT the old website domain is also still indexed in Google? I even checked the Google Cached date and it shows the new website with a cache date of today. The redirect strategy has been in place for about 30 days. Any thoughts or suggestions on how to get the old domain un-indexed in Google and get all authority passed to the new website?
Intermediate & Advanced SEO | | kchandler0 -
How to get google to categorize a website in search results?
Hello everyone and thanks in advance for your time. I have a good understanding about SEO, backlinks etc but nowhere near to professional! A good friend of mine has an online store made with opencart e commerce platform he would like to have have category view when his company name is searched on google. Does anyone has any idea how can this be achieved?
Intermediate & Advanced SEO | | superofelia0 -
Best way to permanently remove URLs from the Google index?
We have several subdomains we use for testing applications. Even if we block with robots.txt, these subdomains still appear to get indexed (though they show as blocked by robots.txt. I've claimed these subdomains and requested permanent removal, but it appears that after a certain time period (6 months)? Google will re-index (and mark them as blocked by robots.txt). What is the best way to permanently remove these from the index? We can't use login to block because our clients want to be able to view these applications without needing to login. What is the next best solution?
Intermediate & Advanced SEO | | nicole.healthline0 -
Best practice for removing indexed internal search pages from Google?
Hi Mozzers I know that it’s best practice to block Google from indexing internal search pages, but what’s best practice when “the damage is done”? I have a project where a substantial part of our visitors and income lands on an internal search page, because Google has indexed them (about 3 %). I would like to block Google from indexing the search pages via the meta noindex,follow tag because: Google Guidelines: “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.” http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769 Bad user experience The search pages are (probably) stealing rankings from our real landing pages Webmaster Notification: “Googlebot found an extremely high number of URLs on your site” with links to our internal search results I want to use the meta tag to keep the link juice flowing. Do you recommend using the robots.txt instead? If yes, why? Should we just go dark on the internal search pages, or how shall we proceed with blocking them? I’m looking forward to your answer! Edit: Google have currently indexed several million of our internal search pages.
Intermediate & Advanced SEO | | HrThomsen0