Disallow statement - is this tiny anomaly enough to render Disallow invalid?
-
A Google site search (site:'hbn.hoovers.com') shows 171,000 results for this subdomain. That is not the desired result - this subdomain is 100% duplicate content, and we don't want search engines spending any time here.
Robots.txt is set up mostly right to disallow all search engines from indexing this site. That asterisk at the end of the disallow statement looks pretty harmless - but could that be why the site has been indexed?
User-agent: *
Disallow: /*
-
Interesting. I'd never heard that before.
We've never had GA or GWT on these mirror sites before, so it's hard to say what Google is doing these days.
But the goal is definitely to make them and their contents invisible to SEs. We'll get GWT on there and start removing URLs.
Thanks!
-
The additional asterisk shouldn't do you any harm, although standard practice seems to be just putting the "/".
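A minimal sketch of that standard form - under the original robots.txt rules, Disallow matches by URL prefix, so the trailing asterisk adds nothing for crawlers that honor the directive:

User-agent: *
Disallow: /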
Does it seem like Google is still crawling this subdomain when you look at the crawl stats in Webmaster Tools? While a disallow in robots.txt will usually stop bots from crawling, it doesn't prevent them from indexing URLs, or from keeping pages in the index that were picked up before the disallow was put in place. If you want these pages removed from the index, you can request removal through Webmaster Tools and also use a meta robots noindex tag as opposed to the robots.txt file. Moz has a good article about it here: http://moz.com/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
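As a rough sketch, the meta tag goes in the head of each page you want dropped. One caveat worth flagging (it's the point of the Moz article above): Google has to be able to crawl a page to see the tag, so the robots.txt block would need to be lifted while the noindex takes effect:

<meta name="robots" content="noindex">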
If you're just worried about bots crawling the subdomain, it's possible they've already stopped crawling it but continue to index it due to history or other signals suggesting they should.
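If editing every page template on a mirror site isn't practical, an alternative sketch - assuming the subdomain runs on Apache with mod_headers enabled - is to send the equivalent signal as an HTTP header for every URL on the host:

# Hypothetical Apache config for the mirror subdomain
Header set X-Robots-Tag "noindex"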
Related Questions
-
Invalid Microdata - How much of an impact does invalid microdata have on SERPs?
How much of an impact does invalid microdata have on SERPs? The lowdown: we are located in Australia and run our business on the Bigcommerce platform. The problem is that Google is crawling our Bigcommerce store in USD and displaying our microdata price in USD instead of AUD. How much of a problem is this in terms of SEO? We have seen many of our top-3 rankings slip a few pegs to the middle or bottom of the top 10, and we're also getting Google Shopping microdata warnings. Does anyone have a solution for fixing this invalid price microdata (USD displayed where it should be AUD) on the Bigcommerce platform (Stencil Cornerstone based template)? And are there any other technical elements you notice at first glance on our website that might be contributing to the SERP decline from the top 3 to the top 10? URL: https://wwww.fishingtackleshop.com.au
Technical SEO | oceanstorm
-
Fetch and Render misses middle chunk of page
Hey folks, I was checking out a site with Search Console's "Fetch and Render" function and found something potentially worrisome. A big chunk of the middle of the page (the homepage) shows up as empty space in the preview render window. The site isn't doing so hot in terms of rankings, and I'm wondering if this issue is the cause (since it could mean that 80% of the copy on the homepage is invisible to Google). A few other details: the specific content isn't showing in either view - both the "What Google sees" and "What the visitor sees" renders are missing this chunk of the page. The content IS visible in cached versions of the page, and the HTML for the content seems to be in the page source. The "Fetch" part returns "Complete" as opposed to "Partial", so I don't THINK it's a matter of JavaScript getting blocked by robots.txt. This website was built using the WordPress theme "Suco", and the parts of the page that aren't rendering are all built with the Themify Builder tool. Not ALL of the Themify Builder elements are showing up as blank - there's a slider element that's rendering just fine. Any ideas on what could cause whole portions of a page not to show up in Fetch and Render? Thanks!
Technical SEO | BrianAlpert78
-
Have I done enough SEO on this page to make a difference?
Hi, my home page has been a thorn in my side for as long as I can remember. On normal sites I am OK with SEO, but when it comes to my magazine site it is a whole new ball game - everything is different. I have been working with a developer who told me to remove the intro to the site on the home page and to move the section at the bottom about the magazine, but I am not sure this is right. I want the site to rank well for "lifestyle magazine". Before our upgrade we ranked well for this and other terms - we were number one for a very long time and then stayed on the first page - but since the upgrade I am jumping between pages 9, 10, and 6, and I'm not sure why. I would like to know if the advice I have been given is correct: have I done enough on the page to rank well for "lifestyle magazine", or should I do what I was taught previously and have an intro to the site so Google can pick up "lifestyle magazine" and other terms? The site is www.in2town.co.uk. Many thanks for your input.
Technical SEO | ClaireH-184886
-
Spider Indexed Disallowed URLs
Hi there, In order to reduce the huge amount of duplicate content and titles for a client, we disallowed all spiders from some areas of the site in August via the robots.txt file. This was followed by a huge decrease in errors in our SEOmoz crawl report, which, of course, made us satisfied. In the meantime, we haven't changed anything in the back-end, the robots.txt file, FTP, the website, or anything else. But our crawl report came in this November, and all of a sudden all the errors were back. We've checked the errors and noticed URLs that are definitely disallowed. The disallowing of these URLs is also verified by Google Webmaster Tools and other robots.txt checkers, and when we search for a disallowed URL in Google, it says that it's blocked for spiders. Where did these errors come from? Did the SEOmoz spider break through our disallow, or something? You can see the drop and the increase in errors in the attached image (LAAFj.jpg). Thanks in advance.
Technical SEO | ooseoo
-
Is Noindex Enough To Solve My Duplicate Content Issue?
Hello SEO gurus! I have a client who runs 7 web properties: 6 of them are satellite websites, and the 7th is his company's main website. For a long while, my company has, among other things, blogged on a hosted blog at www.hismainwebsite.com/blog, and when we were optimizing for one of the other satellite websites, we would simply link to it in the article. Now, however, the client has gone ahead and set up separate blogs on every one of the satellite websites as well, and he has a nifty plug-in on the main website's blog that pipes the articles we write into the corresponding satellite blog. My concern is duplicate content. In a sense, this is like autoblogging - the only thing that keeps it from being heinous is that the client is autoblogging himself. He thinks it will be a great feature for giving users of his satellite websites some fresh content to read - and I agree, as I think the combination of publishing and e-commerce is a thing of the future - but I really want to avoid the duplicate content issue and a possible SEO/SERP hit. I am thinking that noindexing each of the satellite websites' blog pages might suffice, but I'd like to hear from all of you whether even this may not be a foolproof solution. Thanks in advance! Kind regards, Mike
Technical SEO | RCNOnlineMarketing
-
How long does it take for traffic to bounce back from an accidental robots.txt disallow of root?
We accidentally uploaded a robots.txt that disallowed the root for all user agents last Tuesday, and we did not catch the error until yesterday - so six days of total exposure. Organic traffic is down 20%. Google has since indexed the correct version of the robots.txt file. However, we're still seeing awful titles/descriptions in the SERPs, and traffic is not coming back. GWT shows that not many pages were actually removed from the index, but we're still seeing drastic ranking decreases. Anyone been through this? Any sort of timeline for a recovery? Much appreciated!
Technical SEO | bheard
-
How to disallow Google and Roger?
Hey guys and girls, I have a question. I want to disallow all robots from accessing a certain root link - get rid of the bots:

User-agent: *
Disallow: /index.php?_a=login&redir=/index.php?_a=tellafriend%26productId=*

Will this stop the bots from accessing any link that has the prefix you see before the asterisk? And will at least Google and Roger obey it by reading "User-agent: *"? I know this isn't the standard procedure, but if it works for Google and the SEOmoz bot, we are good.
Technical SEO | iFix
-
Differences between Lynx Viewer, Fetch as Googlebot and SEOMoz Googlebot Rendering
Three tools to render a site as Googlebot would see it: the SEOmoz toolbar, the Lynx Viewer (http://www.yellowpipe.com/yis/tools/lynx/lynx_viewer.php), and Fetch as Googlebot. I have a website where I can see the dropdown menus in regular browser rendering, in the Lynx Viewer, and in Fetch as Googlebot. However, in the SEOmoz toolbar's "render as Googlebot" tool, I am unable to see these dropdown menus when I have JavaScript disabled. Does this matter? Which of these tools is the best way to see how Googlebot views your site?
Technical SEO | qlkasdjfw
Fetch as Googlebot. I have a website where I can see dropdown menus in regular browser rendering, Lynxviewer and Fetch as Googlebot. However, in the SEOMoz toolbar 'render as googlebot' tool, I am unable to see these dropdown menus when I have javascript disabled. Does this matter? Which of these tools is a better way to see how googlebot views your site?0