How is Google finding our preview subdomains?
-
I've noticed that Google is able to find, crawl and index preview subdomains we set up for new client sites (e.g. clientpreview.example.com). I know now to use "meta name="robots" and robots.txt) to block the search engines from crawling these subdomains. My question though, is how is Google finding these subdomains? We don't link to these preview domains from anywhere else, so I can't figure out how Google is even getting there.
Does anybody have any insight on this?
-
Thanks for your response Irving. We put some of our preview sites on subdomains of our main domain, but then remove them after the site goes live, so their shouldn't be any duplicate content issues. The main question is just how Google is finding these subdomains.
-
Thanks for the insight guys.
-
I don't specifically use the Google Toolbar, but others in the office may (although I don't think so). It sounds like Chrome could be a potential source as well?
-
I think that this is a good idea. But you gotta be careful.
Our competitor (who ranked #1 and we ranked at #2) had their site redesigned and the design company included the noindex on every page. They forgot to take it off when the new design went live. It took them quite a while to figure it out and we enjoyed all of their sales for about a month.
We are #1 now and they are #2. Must have been a bad design job.
-
If the subdomains are added to WMT google will know about it. if you are designing sites for clients and putting them on your site as subdomains it behooves you to make sure 100% that their dev sites are not being seen by Google. It's duplicate content and your subdomain is the original source of this content. Looks unprofessional too
a) verify any subdomain you are creating for a client in WMT
b) block it in robots.txt and noindex nofollow all pages globally
c) for the ones that are already indexed, go into google WMT and go into that subdomain account and request removal of the site in Googles index. This will remove the indexing for that subdomain only don't worry it won't remove your main site from the index.
-
I would also consider adding a noindex tag if you want the urls removed.
-
I agree with Mat. You never know, but yes Chrome could be another major source. It also depends what you set as your privacy when you setup Chrome (Send anonymous usage data to Google, Yes/No ?) and so on.
-
We usually put them behind an .htaccess login now. We've had situations where the development site have been outranking the live site. Great demo of the power of on-site optimisation, but still a bit annoying for the client.
People used to always blame google toolbar for this. Likewise using chrome could potentially add something to the "to crawl" list. I wonder what the respective privacy policies say about that. I've also seen staging sites pick up links. When an external link on the staging site has been clicked it has alerted someone else, appeared as a link back/trackback etc.
-
The discovery can be from multiple mediums. Do you or the client have Google Toolbar installed ?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Not all images indexed in Google
Hi all, Recently, got an unusual issue with images in Google index. We have more than 1,500 images in our sitemap, but according to Search Console only 273 of those are indexed. If I check Google image search directly, I find more images in index, but still not all of them. For example this post has 28 images and only 17 are indexed in Google image. This is happening to other posts as well. Checked all possible reasons (missing alt, image as background, file size, fetch and render in Search Console), but none of these are relevant in our case. So, everything looks fine, but not all images are in index. Any ideas on this issue? Your feedback is much appreciated, thanks
Technical SEO | | flo_seo1 -
Subdomain as News Section instead of Source in Google News?
Hi, trying to dig into Google News for a large site, mostly containing news.
Technical SEO | | m.m
The structure of the site network is subdomain.domain.se, and each subdomain has it's own brand with it's own news: x.domain.se
y.domain.se
z.domain.se
etc... Each brand/subdomain is more or less to equate with its own subjectfield/section. In Google News every subdomain is configured with it's own Site Source url, but also having the set up with one section with the same url. It seems like they're getting conflicts in Google News, Google can't always figure out which news article to which brand. Example: an article owned by brand A, but it is sometimes happens that articles getting labeled as brand B in the news SERP, though the link takes you correctly to brand A. I am thinking that this config in News Publisher Center may be a problem? Anyone having any thoughts if that would be better if we delete all source urls except for domain.se-brand and then put all the other subdomains as sections? www.domain.se x.domain.se y.doamin.se z.domain.se Any smart thoughts on this one? Or anything else that could make this wrong labeling (all content included images are hosted in same domain for example). Regards,
Magnus0 -
What does Google PageSpeed measure?
What does the PageSpeed tool actually measure? Does it say that a webserver is fast or slow? Thanks in advanced!
Technical SEO | | DanielMulderNL0 -
Should I remove these pages from the Google index?
Hi there, Please have a look at the following URL http://www.elefant-tours.com/index.php?callback=imagerotator&gid=65&483. It's a "sitemap" generated by a Wordpress plug-in called NextGen gallery and it maps all the images that have been added to the site through this plugin, which is quite a lot in this case. I can see that these "sitemap" pages have been indexed by Google and I'm wondering whether I should remove these or not? In my opinion these are pages that a search engine would never would want to serve as a search result and pages that a visitor never would want to see. Attracting any traffic through Google images is irrelevant in this case. What is your advice? Block it or leave it indexed or something else?
Technical SEO | | Robbern0 -
Google Sitelinks
Is there anyway to control the sitelinks under a listing in Google? I have a group of lawyers where 1 of the them is showing up in the sitelinks. They want all of the lawyers to show up. Right now it is showing 1 lawyer, about page, contact us page, etc. Thanks!!!!
Technical SEO | | SixTwoInteractive0 -
Google Alerts News Images
I have a Google Alert setup which is pulling information from a blog. I am receiving images as part of the alert. The issue that I am having is that the images have nothing to do with the blog post. Is there a way to control what images are received in the alert. From what I have gathered, if it grabs an image it should be part of the blog post.
Technical SEO | | ricknakao0 -
Best blocking solution for Google
Posting this for Dave SottimanoI Here's the scenario: You've got a set of URLs indexed by Google, and you want them out quickly Once you've managed to remove them, you want to block Googlebot from crawling them again - for whatever reason. Below is a sample of the URLs you want blocked, but you only want to block /beerbottles/ and anything past it: www.example.com/beers/brandofbeer/beerbottles/1 www.example.com/beers/brandofbeer/beerbottles/2 www.example.com/beers/brandofbeer/beerbottles/3 etc.. To remove the pages from the index should you?: Add the Meta=noindex,follow tag to each URL you want de-indexed Use GWT to help remove the pages Wait for Google to crawl again If that's successful, to block Googlebot from crawling again - should you?: Add this line to Robots.txt: DISALLOW */beerbottles/ Or add this line: DISALLOW: /beerbottles/ "To add the * or not to add the *, that is the question" Thanks! Dave
Technical SEO | | goodnewscowboy0