Google indexing staging / development site that is redirected...
-
Hi Moz Fans! - Please help.
We had acme.stagingdomain.com while a site was in development. When the site went live, the staging domain was redirected (302) to acmeprofessionalservices.com (real names redacted!!).
There are no known external links to the staging site,
although the staging site URL has been emailed from Google Apps(!!!).
We have now found that the staging site is in the index even though it redirects to the proper public site,
and some (but not all) of its pages are in the index too. They all redirect to the proper public site when visited.
It is convenient for the team to have a redirect from the staging site to the new one, since Chrome etc. remember frequently visited sites. It would be a shame to lose that.
Yes, these pages can be removed using webmaster tools.
But how did they get into the index to start with? And if we're building a new site for a customer who has an existing site, is there a danger of duplicate content etc. penalties caused by the staging site?
We had a similar incident recently when a PDF that was not linked anywhere on the site appeared in the index. The link had been emailed through Google Apps, and visited in Chrome, but that was it.
So 3 questions.
Why is the staging site still in the index despite the redirects?
How did they get in the index in the first place?
Will the new staging site affect the rank of the existing site, e.g. duplicate content penalties?
-
Hi There
1. It could still be in the index because the redirects are 302s and not 301s. A 302 is temporary, so Google may not de-index those URLs. It also takes time: I've seen Google take months to de-index redirecting URLs. Also, make sure you are not blocking crawling of the dev site, or Google will not see the redirects.
2. I am not sure how they got there to begin with. I can pretty much always find some sort of error: maybe someone tweeted a staging URL, maybe crawling wasn't blocked, maybe there was one link to staging from the live site, etc. Regardless, somehow Google crawled it. To prevent this in the future, always block crawling of staging servers well before you ever put anything on them.
3. Usually Google tries to sort this out. They won't give you a penalty for "technical" duplicate content (penalties are more for "malicious" duplicate content, i.e. stealing people's content). So you won't get penalized, but the more you can help Google out by sorting it out, the more time Google can spend crawling the correct site.
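As a rough illustration of the "block crawling" advice in point 2, here is a small Python sketch that checks whether a given robots.txt would let Googlebot fetch a staging URL. The helper name `crawl_allowed` and the `BLOCK_ALL` rules are made up for this example, and the hostname is the redacted one from the question; it only evaluates the robots.txt rules and says nothing about what is actually in the index.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks all crawlers to stay away from every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical staging URL, using the redacted hostname from the question.
print(crawl_allowed(BLOCK_ALL, "https://acme.stagingdomain.com/about"))
```

Keep the caveat from point 1 in mind: once the staging host is redirecting, a blanket Disallow like this would stop Google from seeing the redirects, so it suits a staging server that is still in development rather than one that now forwards to the live site.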
What I would do now: if you do want the staging URLs to redirect (which might not be the best solution if you ever want to go back and work on the staging server again), use 301 redirects and make sure you are allowing crawling of the staging site. Keep it registered in webmaster tools so you can monitor the indexation levels.
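For what a 301 from staging to live looks like at the HTTP level, here is a minimal Python sketch using only the standard library. In practice this would more likely be a rule in your web server or hosting config; the names `LIVE_SITE`, `redirect_target`, `PermanentRedirect` and `serve` are invented for the example, and the domain is the redacted one from the question.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Redacted live domain from the question; swap in the real one.
LIVE_SITE = "https://acmeprofessionalservices.com"


def redirect_target(path: str) -> str:
    """Map a staging path to the same path on the live site."""
    return LIVE_SITE + path


class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a 301 pointing at the live site."""

    def _redirect(self) -> None:
        self.send_response(301)  # permanent, unlike the 302 used originally
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port: int = 8080) -> None:
    """Run the redirecting server until interrupted (not called here)."""
    HTTPServer(("", port), PermanentRedirect).serve_forever()
```

Calling `serve()` on the staging host would send each path to the matching path on the live site, which also keeps the team's remembered staging URLs working.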