Development site crawled
-
We just found out that our password-protected development site has been crawled. We are worried about duplicate content. What are the best steps to take to correct this, beyond adding rules to robots.txt?
-
Unfortunately, robots.txt on its own won't keep your pages out of the index: if an external site links to yours, Google can still index those URLs. What you need to do is use a noindex robots meta tag on all your development pages. I don't know how big your site is, so this may or may not be a lot of work. Once that is done, your pages should be dropped from the SERPs after Google next crawls them.
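For anyone following along: the tag this answer most likely refers to is the robots meta tag with a noindex value, placed in the head of each page. Below is a rough Python sketch for checking whether a page's HTML already carries it, which may help when auditing a lot of development pages. The names NOINDEX_TAG, RobotsMetaFinder and has_noindex are made up for this example, and the sample pages are assumptions, not anything from the original poster's site.

```python
from html.parser import HTMLParser

# The tag to place inside <head> on every development page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


class RobotsMetaFinder(HTMLParser):
    """Records whether a robots meta tag with a noindex directive was seen."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag that says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


if __name__ == "__main__":
    page = "<html><head>" + NOINDEX_TAG + "</head><body>Dev page</body></html>"
    print(has_noindex(page))
```

One thing to keep in mind: Google has to be able to fetch a page in order to see the tag, so a page that is also blocked in robots.txt may never have its noindex read.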
-
Thanks Stephen & Kyle! We had the site behind a login, so we're not sure how this happened. Any idea?
-
Put the site behind a login
-
Oops! That sounds unfortunate, Marcy. How did that happen?
Once you have added the correct rules to robots.txt (I'm guessing you're using "Disallow: /"), you can ask Google to remove the site from its index, provided your development site is registered in Google Webmaster Tools.
www.google.com/webmasters/tools/url-removal
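If you want to double-check the robots.txt rules before submitting the removal request, here is a small Python sketch using the standard library's urllib.robotparser. The robots.txt content is the "Disallow: /" guess from above, and dev.example.com is a placeholder hostname, not the real development site; is_blocked is just a helper name chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for the development site: block everything for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt rules disallow this URL for this crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # dev.example.com is a placeholder for the development hostname.
    print(is_blocked(ROBOTS_TXT, "http://dev.example.com/any/page"))
```

This only confirms what the rules say to a well-behaved crawler. It does not tell you whether the pages are already in Google's index.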
Hope that helps,
K