Disallow statement - is this tiny anomaly enough to render Disallow invalid?
-
A Google site search (site:hbn.hoovers.com) shows 171,000 results for this subdomain. That is not a desired result: this site is 100% duplicate content, and we don't want search engines spending any time here.
Robots.txt is set up mostly right to disallow all search engines from indexing this site. That asterisk at the end of the disallow statement looks pretty harmless - but could that be why the site has been indexed?
User-agent: *
Disallow: /*
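For comparison, the standard form most sites use to block everything looks like this (a minimal sketch; under Google's documented wildcard rules, a trailing `*` is redundant, so `Disallow: /*` and `Disallow: /` block the same URLs):

```text
User-agent: *
Disallow: /
```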
-
Interesting. I'd never heard that before.
We've never had GA or GWT on these mirror sites before, so it's hard to say what Google is doing these days.
But the goal is definitely to make them and their contents invisible to SEs. We'll get GWT on there and start removing URLs.
Thanks!
-
The additional asterisk shouldn't do you any harm, although standard practice is just to use "/".
Does it seem like Google is still crawling this subdomain when you look at the crawl stats in Webmaster Tools? While a Disallow rule in robots.txt will usually stop bots from crawling, it doesn't prevent them from indexing pages, or from keeping pages indexed that were crawled before the Disallow was put in place. If you want these pages removed from the index, you can request removal through Webmaster Tools and also use a meta robots noindex tag instead of the robots.txt file (note that crawlers have to be able to fetch a page to see its noindex tag, so the robots.txt block would need to be lifted for the tag to take effect). Moz has a good article about it here: http://moz.com/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
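For reference, a per-page noindex can be expressed either as a meta tag or as an HTTP response header (a minimal sketch; the header form is handy for non-HTML files such as PDFs):

```html
<!-- In the <head> of each page that should drop out of the index: -->
<meta name="robots" content="noindex">
```

The equivalent HTTP header is `X-Robots-Tag: noindex`, sent in the server's response for the affected URLs.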
If you're just worried about bots crawling the subdomain, it's possible they've already stopped crawling it but continue to index it due to history or other signals suggesting they should.
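To make the equivalence concrete: under Google-style wildcard rules, `Disallow: /` and `Disallow: /*` match exactly the same URL paths. The sketch below is my own illustration of that matching logic (the function name is made up; this is not Google's actual implementation):

```python
import re

def disallow_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt Disallow pattern matches a URL path,
    using Google-style rules: '*' matches any run of characters, '$'
    anchors the end of the URL, and plain patterns match as prefixes."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    # re.match anchors at the start only, which gives prefix semantics.
    return re.match(regex, path) is not None

# "Disallow: /" and "Disallow: /*" block exactly the same paths:
for p in ["/", "/company/12345", "/search?q=banners"]:
    assert disallow_matches("/", p)
    assert disallow_matches("/*", p)
```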
Related Questions
-
I submitted sitemaps from AIO SEO to Google Search Console. If I now delete the AIO plugin, do my sitemaps become invalid?
I use Yoast as the SEO plugin for my new WordPress website https://www.satisfiedshoes.com/; however, I couldn't get the sitemaps with Yoast, as it was giving me a 404 error, and regardless of what I tried it wasn't working. So I then installed All In One SEO while still having Yoast installed, easily got the AIO sitemaps, and submitted them successfully to Google Search Console. My question is: now that Google has the sitemaps, since I'd rather use Yoast, if I delete AIO will the sitemaps given to Google become invalid? There's no point keeping both SEO plugins active, right? Thank you.
Technical SEO | iamzain160
-
Invalid microdata - How much of an impact does invalid microdata have on SERPs?
The lowdown: we are located in Australia and run our business on the Bigcommerce platform (Stencil Cornerstone-based template). The problem is that Google is crawling our Bigcommerce site in USD and displaying our microdata price in USD instead of AUD. How much of a problem is this in terms of SEO? We have seen a steady decline, with many of our top-3 rankings shifting down to the middle or bottom of the top 10, and we're also getting Google Shopping microdata warnings. Does anyone have a solution they can help me with to resolve this microdata price issue on the Bigcommerce platform? Are there any other technical elements, at first glance, on our website that may be a potential cause of the SERP decline from top 3 to top 10? URL: https://www.fishingtackleshop.com.au
Technical SEO | oceanstorm0
-
H1 on responsive pages - Not enough room
Hi everyone, I'm running a real estate site, and I'm wondering how to deal with the H1 on the responsive version of my search results. My first idea is to have a dynamic H1 that changes according to the filters being used, for instance: "140 Apartments on Sale at Miami Beach with 2 Bedrooms" (that's four filters). The problem arrives if a mobile user comes around: I can't show them the same H1, since I don't have enough room for it. Do you think it might be a problem if I output that H1 in the HTML but keep it out of the user's sight? Or perhaps there is a way to switch the H1 depending on whether the responsive version is active. Any help would be much appreciated.
Technical SEO | JoaoCJ0
-
Medium-sized forum with 1000s of thin-content gallery pages. Disallow or noindex?
I have a forum at http://www.onedirection.net/forums/ which contains a gallery with 1000s of very thin-content pages. We currently have these photo pages disallowed from the main Googlebot via robots.txt, but we do allow the Google Images crawler access. Now I've been reading that we shouldn't really use Disallow, and should instead add a noindex tag on the page itself. It's a little awkward to edit the source of the gallery pages (and to keep any amendments the next time the forum software gets updated). What's the best way of handling this? Chris.
Technical SEO | PixelKicks0
-
Have I done enough SEO on this page to make a difference?
Hi, my home page has been a thorn in my side for as long as I can remember. On normal sites I am OK with SEO, but when it comes to my magazine site it is a whole new ball game; everything is different. I have been working with a developer who told me to remove the intro to the site on the home page and to move the section at the bottom of the site which was about the magazine, but I am not sure if this is right. I want the site to rank well for "lifestyle magazine". Before our upgrade we ranked well for this and other terms; we were number one for a very long time and then stayed on the first page, but now, since the upgrade, I am jumping between pages 9, 10, and 6, and I'm not sure why that is happening. I would like to know if the advice I have been given is correct: have I done enough on the page to rank well for "lifestyle magazine", or should I be doing what I was taught previously, having an intro to the site so Google can pick up the words "lifestyle magazine" and other terms? The site is www.in2town.co.uk. Many thanks for your input.
Technical SEO | ClaireH-1848860
-
Can I disallow my subdomain for Penguin recovery?
Hi, I have a site, BannerBuzz.com. Before the last Penguin update, all my site's keywords were in good positions in Google, but after Penguin hit my website, all my keywords have been going down day by day. I have made some changes to my website for improvement, but about one change I have some confusion. I have a subdomain (http://reviews.bannerbuzz.com/) which displays user reviews for all my website's keywords, and 15 reviews from each category are also displayed on my main website, http://www.bannerbuzz.com. So would those user reviews be considered duplicate content between the subdomain and the main website? Can I disallow the subdomain for all search engines? Currently the subdomain is open to all search engines; would blocking it be helpful? Thanks
Technical SEO | CommercePundit0
-
Should search pages be disallowed in robots.txt?
The SEOmoz crawler picks up "search" pages on a site as having duplicate page titles, which of course they do. Does that mean I should put a "Disallow: /search" rule in my robots.txt? When I put the URLs into Google, they aren't coming up in any SERPs, so I assume everything's OK. I try to abide by the SEOmoz crawl errors as much as possible; that's why I'm asking. Any thoughts would be helpful. Thanks!
Technical SEO | MichaelWeisbaum0
-
How do you disallow HTTPS?
I currently have a site (startuploans.org) that runs everything over HTTP. Recently we decided to start an online application to process loan apps, and for that one section we configured SSL (https://www.startuploans.org/secure/). But if I go to the HTTPS URL for any of my other pages, they show up too. I was going to just 301 everything from HTTPS, but because the secure section is in a subdirectory I can't. Canonical URLs won't work either, because it's a totally different system and the pages are generated in an odd manner. It's really just one section that needs to be disallowed. Is there any way to disallow all HTTPS requests via robots.txt while keeping all the HTTP requests working as normal?
Technical SEO | WebsiteConsultants0