How did my dev site end up in the search results?
-
We use a subdomain for our dev site. I never thought anything of it because the only way you can reach the dev site is through a vpn. Google has somehow indexed it. Any ideas on how that happened? I am adding the noindex tag, should I used canonical? Or is there anything else you can think of?
-
Personally, I'd still recommend using robots.txt to disallow all crawlers, even if more steps are taken.
-
Don't use tool removal, it can go bad indeed. Now, are you sure that there are no external links coming from anywhere?
For now I'd recommend putting noindex, nofollow on that dev subdomain and do manual recrawl through GWT.
-
It just uses internal links. Do you think I should try the webmaster tools removal? That seems like it could go wrong.
-
I never used screaming frog, does it check both external and internal links?
-
I have ran screaming frog to see if there are any links to any pages and but couldn't see any. Even if Google did try to follow it the firewall would stop them. It is so strange.
-
Then my first assumption is that it's linked from somewhere - read my comment a little above.
-
Then there is a leak somewhere - Google bots can "see" your subdomain.
Or it's been simply linked from somewhere. Then Google will try to follow the link and that would make it indexed.
-
They are telling me that there are no holes, and I have tried getting to the pages but can not do it unless I am on my vpn.
-
We never updated the robots.txt because the site was behind a firewall. If you click on any of the results it will not load the page unless on my VPN.
-
Robots.txt won't help anyhow. Bots still can see that there is such directory, they just won't see what's inside of those directories/subdomains.
-
Hi there.
If what you say is true, then there are only two answers: you got a leak somewhere or your settings/configuration is messed up.I'd say go talk to your system admin and make sure that everything what's supposed to be closed is closed, IPs, which are supposed to be open for use are open and those IPs only.
-
Have you updated the dev sites robots.txt to disallow everything? It is up to the bot to listen, but that combined with removing all of the dev URLs from Google Webmaster tools should do the trick.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Search Console Site Property Questions
I have a few questions regarding Google Search Console. Google Search Console tells you to add all versions of your website https, http, www, and non-www. 1.) Do I than add ALL the information for ALL versions? Sitemaps, preferred site, etc.? 2.) If yes, when I add sitemaps to each version, do I add the sitemap url of the site version I'm on or my preferred version? - For instance when adding a sitemap to a non-www version of the site, do I use the non-www version of the sitemap? Or since I prefer a https://www.domain.com/sitemap.xml do I use it there? 3.) When adding my preferred site (www or non-www) do I use my preferred site on all site versions? (https, http, www, and non-www) Thanks in advance. Answers vary throughout Google!
Intermediate & Advanced SEO | | Mike.Bean0 -
On-site Search - Revisited (again, *zZz*)
Howdy Moz fans! Okay so there's a mountain of information out there on the webernet about internal search results... but i'm finding some contradiction and a lot of pre-2014 stuff. Id like to hear some 2016 opinion and specifically around a couple of thoughts of my own, as well as some i've deduced from other sources. For clarity, I work on a large retail site with over 4 million products (product pages), and my predicament is thus - I want Google to be able to find and rank my product pages. Yes, I can link to a number of the best ones by creating well planned links via categorisation, silos, efficient menus etc (done), but can I utilise site search for this purpose? It was my understanding that Google bots don't/can't/won't use a search function... how could it? It's like expeciting it to find your members only area, it can't login! How can it find and index the millions of combinations of search results without typing in "XXXXL underpants" and all the other search combinations? Do I really need to robots.txt my search query parameter? How/why/when would googlebot generate that query parameter? Site Search is B.A.D - I read this everywhere I go, but is it really? I've read - "It eats up all your search quota", "search results have no content and are classed as spam", "results pages have no value" I want to find a positive SEO output to having a search function on my website, not just try and stifle Mr Googlebot. What I am trying to learn here is what the options are, and what are their outcomes? So far I have - _Robots.txt - _Remove the search pages from Google _No Index - _Allow the crawl but don't index the search pages. _No Follow - _I'm not sure this is even a valid idea, but I picked it up somewhere out there. _Just leave it alone - _Some of your search results might get ranked and bring traffic in. It appears that each and every option has it's positive and negative connotations. It'd be great to hear from this here community on their experiences in this practice.
Intermediate & Advanced SEO | | Mark_Elton0 -
Why is this url redirecting to our site?
I was doing an audit on our site and searching for duplicate content using some different terms from each of our pages. I came across the following result: www.sswug.org/url/32639 redirects to our website. Is that normal? There are hundreds of these url's in google all with the exact same description. I thought it was odd. Any ideas and what is the consequence of this?
Intermediate & Advanced SEO | | Sika220 -
Is this ok for content on our site?
We run a printing company and as an example the grey box (at the bottom of the page) is what we have on each page http://www.discountbannerprinting.co.uk/banners/vinyl-pvc-banners.html We used to use this but tried to get most of the content on the page, but we now want to add a bit more in-depth information to each page. The question i have is - would a 1200 word document be ok in there and not look bad to Google.
Intermediate & Advanced SEO | | BobAnderson0 -
I have search result pages that are completely different showing up as duplicate content.
I have numerous instances of this same issue in our Crawl Report. We have pages showing up on the report as duplicate content - they are product search result pages for completely different cruise products showing up as duplicate content. Here's an example of 2 pages that appear as duplicate : http://www.shopforcruises.com/carnival+cruise+lines/carnival+glory/2013-09-01/2013-09-30 http://www.shopforcruises.com/royal+caribbean+international/liberty+of+the+seas We've used Html 5 semantic markup to properly identify our Navigation <nav>, our search widget as an <aside>(it has a large amount of page code associated with it). We're using different meta descriptions, different title tags, even microformatting is done on these pages so our rich data shows up in google search. (rich snippet example - http://www.google.com/#hl=en&output=search&sclient=psy-ab&q=http:%2F%2Fwww.shopforcruises.com%2Froyal%2Bcaribbean%2Binternational%2Fliberty%2Bof%2Bthe%2Bseas&oq=http:%2F%2Fwww.shopforcruises.com%2Froyal%2Bcaribbean%2Binternational%2Fliberty%2Bof%2Bthe%2Bseas&gs_l=hp.3...1102.1102.0.1601.1.1.0.0.0.0.142.142.0j1.1.0...0.0...1c.1.7.psy-ab.gvI6vhnx8fk&pbx=1&bav=on.2,or.r_qf.&bvm=bv.44442042,d.eWU&fp=a03ba540ff93b9f5&biw=1680&bih=925 ) How is this distinctly different content showing as duplicate? Is SeoMoz's site crawl flawed (or just limited) and it's not understanding that my pages are not dupe? Copyscape does not identify these pages as dupe. Should we take these crawl results more seriously than copyscape? What action do you suggest we take? </aside> </nav>
Intermediate & Advanced SEO | | JMFieldMarketing0 -
A Site in Flash to Optimize
Hello, I have to understand if this site www.spacemilanmodels.com.pt can be optimize since the entire website is in flash wich is not good for optimizacion. What do you guys suggest? Recommendations? Is it possible only with link-building? Tks for the help!
Intermediate & Advanced SEO | | PedroM0 -
Does Google crawl the pages which are generated via the site's search box queries?
For example, if I search for an 'x' item in a site's search box and if the site displays a list of results based on the query, would that page be crawled? I am asking this question because this would be a URL that is non existent on the site and hence am confused as to whether Google bots would be able to find it.
Intermediate & Advanced SEO | | pulseseo0 -
Block all search results (dynamic) in robots.txt?
I know that google does not want to index "search result" pages for a lot of reasons (dup content, dynamic urls, blah blah). I recently optimized the entire IA of my sites to have search friendly urls, whcih includes search result pages. So, my search result pages changed from: /search?12345&productblue=true&id789 to /product/search/blue_widgets/womens/large As a result, google started indexing these pages thinking they were static (no opposition from me :)), but i started getting WMT messages saying they are finding a "high number of urls being indexed" on these sites. Should I just block them altogether, or let it work itself out?
Intermediate & Advanced SEO | | rhutchings0