Why are these results being shown as blocked by robots.txt?
-
If you perform this search, you'll see that all of the m. results are shown as blocked by robots.txt: http://goo.gl/PRrlI. But when I reviewed the robots.txt file (http://goo.gl/Hly28), I didn't see anything that would block crawlers from these pages.
Any ideas why these are showing as blocked?
-
Hi,
Your robots.txt file is very... healthy. On steroids, even; it's a universe of its own.
Are you 100% sure all of the entries are legitimate and clean?
The first thing I would do is check Webmaster Tools for the mobile subdomain. If you don't have it set up yet, that's a good place to start: verify the m. subdomain.
Once in Webmaster Tools, you can debug this in no time.
Cheers.
-
But even when I search from my mobile device, I get the same results (that m. is blocked).
-
I can't run that test because I haven't claimed m. in GWT.
-
If you haven't already done so, I recommend testing your robots.txt file against one of your mobile pages (such as m.healthline.com/treatments) in Google Webmaster Tools. You can do this by logging into GWT, then clicking Health, then Blocked URLs.
If you have already tested it in GWT, can you let us know what the results said?
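As a quick local first pass (a sketch, not a substitute for GWT's own tester): crawlers fetch robots.txt per hostname, so for m. URLs Google consults m.healthline.com/robots.txt, not the www file reviewed above. Python's standard library can approximate the check; note that urllib.robotparser uses classic prefix matching and ignores Googlebot's '*' and '$' wildcard extensions, so treat the output as indicative only. The sample URLs are the ones mentioned in this thread.

from urllib.robotparser import RobotFileParser

# Fetch and parse the mobile host's own robots.txt; the www file does
# not govern URLs on the m. subdomain.
parser = RobotFileParser("http://m.healthline.com/robots.txt")
parser.read()

for url in ("http://m.healthline.com/",
            "http://m.healthline.com/treatments"):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)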
-
Another good article from the community
-
So after a little bit of research (I've never come across this before, as all the sites we build are responsive), I found this:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=72462
It seems Google won't index a site it considers a mobile site within the main SERPs, and vice versa...
Hope that helps, because it had me puzzled.
Regards,
John
-
Which directory are you storing your mobile website files in?
-
Oh, sorry. On further investigation I see it's just your mobile site that's being blocked...
Related Questions
-
Schema Markup Validator vs. Rich Results Test
I am working on a schema markup project. When I test the schema code in the Schema Markup Validator, everything looks fine, no errors detected. However, when I test it in the Rich Results Test, a few errors come back. What is the difference between these two tests? Should I trust one over the other?
Intermediate & Advanced SEO | Collegis_Education
-
Wildcarding Robots.txt for Particular Word in URL
Hey All, So I know that this isn't a standard robots.txt question; I'm aware of how to block or wildcard certain folders, but I'm wondering whether it's possible to block all URLs with a certain word in them. We have a client that was hacked a year ago, and now they want us to help remove some of the pages that were being autogenerated with the word "viagra" in them. I saw this article and tried implementing it: https://builtvisible.com/wildcards-in-robots-txt/ and it seems that I've been able to remove some of the URLs (although I can't confirm until I do a full pull of the SERPs on the domain). However, when I test certain URLs inside of WMT it still says that they are allowed, which makes me think that it's not working fully, or not working at all. In this case these are the lines I've added to the robots.txt:
Disallow: /*&viagra
Disallow: /*&Viagra
I know I have the option of individually requesting URLs to be removed from the index, but I want to see if anybody has ever had success wildcarding URLs with a certain word in their robots.txt. The individual URL route could be very tedious. Thanks! Jon
Intermediate & Advanced SEO | EvansHunt
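Since WMT says the URLs are still allowed, one hedged way to sanity-check wildcard rules offline is to translate them into regexes: Google documents '*' in a robots.txt path as matching any sequence of characters, and a trailing '$' as an end anchor. A minimal Python sketch (the helper names and sample paths are illustrative, not from the post):

import re

def robots_rule_to_regex(rule):
    # Translate a robots.txt path rule into a regex: '*' matches any
    # sequence of characters; a trailing '$' anchors the end of the URL.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_blocked(path, disallow_rules):
    # A path is blocked if any Disallow rule matches from its start.
    return any(robots_rule_to_regex(r).match(path) for r in disallow_rules)

rules = ["/*&viagra", "/*&Viagra"]
print(is_blocked("/page.php?id=9&viagra-pills", rules))  # True
print(is_blocked("/page.php?viagra-pills=1", rules))     # False

Note that /*&viagra only matches URLs where an '&' immediately precedes the word, so hacked URLs of the form /page.php?viagra-... would slip through; if that is what WMT is showing, the broader Disallow: /*viagra may be what's needed. Also, robots.txt stops crawling rather than removing pages from the index, so already-indexed URLs may still need removal requests.
-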
Baidu Spider appearing in robots.txt
Hi, I'm not too sure what to do about this or what to think of it. This magically appeared in my company's robots.txt file (it literally magically appeared; the text is below):
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
I know that Baidu is the Google of China, but I'm not sure why this would appear in our robots.txt all of a sudden. Should I be worried about a hack? Also, would I want to disallow Baidu from crawling my company's website? Thanks for your help, -Reed
Intermediate & Advanced SEO | IceIcebaby
-
Is there a way to show random blocks of text to users without it affecting SEO? Cloaking for good?
My client has a pretty creative idea for his web copy. In the body of his page there will be a big block of text that contains random industry-related terms, but within it he will bold and colorize certain words that create a coherent sentence. Something to the effect of "cut through the noise with a marketing team that gets results". Get it? So if you were to read the paragraph word for word, it would make no sense at all. It's basically a bunch of random words. He's worried this will affect his SEO and appear to be keyword stuffing to Google. My question is: is there a way to block certain text on a webpage from search engines but show it to users? I guess it would be the opposite of cloaking? But it's still cloaking... isn't it? In the end we'll probably just make the block of text an image instead, but I was just wondering if anyone has any creative solutions. Thanks!
Intermediate & Advanced SEO | TheOceanAgency
-
Robots.txt Question
For our company website, faithology.com, we are attempting to block any URLs that contain a question mark, to keep Google from seeing some pages as duplicates. Our robots.txt is as follows:
User-Agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/
Is the above correct? We want them not to crawl any URL with a "?" in it, but we don't want to harm our SEO. Thanks for your help!
Intermediate & Advanced SEO | BMPIRE
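One thing worth knowing about the two groups above: under robots.txt group-selection rules, a crawler obeys only the most specific User-agent group that matches it, not that group plus the '*' group. As written, rogerbot is barred only from /community/ and remains free to crawl '?' URLs. A small sketch with Python's standard-library parser illustrates the group selection (caveat: urllib.robotparser does plain prefix matching, so it won't evaluate the /*? wildcard the way Googlebot does):

from urllib.robotparser import RobotFileParser

rules = """\
User-Agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# rogerbot matches its own group, so only /community/ applies to it;
# it does not also inherit the '*' group's rules.
print(parser.can_fetch("rogerbot", "http://faithology.com/community/thread"))  # False
print(parser.can_fetch("rogerbot", "http://faithology.com/page?x=1"))          # True
-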
New server update + wrong robots.txt = lost SERP rankings
Over the weekend, we updated our store to a new server. Before the switch, we had a robots.txt file on the new server that disallowed its contents from being indexed (we didn't want duplicate pages from both old and new servers). When we finally made the switch, we somehow forgot to remove that robots.txt file, so the new pages weren't indexed. We quickly put our good robots.txt in place, and we submitted a request for a re-crawl of the site. The problem is that many of our search rankings have changed. We were ranking #2 for some keywords, and now we're not showing up at all. Is there anything we can do? Google Webmaster Tools says that the next crawl could take weeks! Any suggestions will be much appreciated.
Intermediate & Advanced SEO | 9Studios
-
Search Engine Blocked by robots.txt for Dynamic URLs
Today, I was checking crawl diagnostics for my website and found a "search engine blocked by robots.txt" warning. I have added the following syntax to the robots.txt file for all dynamic URLs:
Disallow: /*?osCsid
Disallow: /*?q=
Disallow: /*?dir=
Disallow: /*?p=
Disallow: /*?limit=
Disallow: /*review-form
The dynamic URLs look like this:
http://www.vistastores.com/bar-stools?dir=desc&order=position
http://www.vistastores.com/bathroom-lighting?p=2
and many more... So why does it show me a warning for this? Does it really matter, or is there another solution for these kinds of dynamic URLs?
Intermediate & Advanced SEO | CommercePundit
-
Subdomains - duplicate content - robots.txt
Our corporate site provides MLS data to users, with the end goal of generating leads. Each registered lead is assigned to an agent, essentially in a round robin fashion. However we also give each agent a domain of their choosing that points to our corporate website. The domain can be whatever they want, but upon loading it is immediately directed to a subdomain. For example, www.agentsmith.com would be redirected to agentsmith.corporatedomain.com. Finally, any leads generated from agentsmith.easystreetrealty-indy.com are always assigned to Agent Smith instead of the agent pool (by parsing the current host name). In order to avoid being penalized for duplicate content, any page that is viewed on one of the agent subdomains always has a canonical link pointing to the corporate host name (www.corporatedomain.com). The only content difference between our corporate site and an agent subdomain is the phone number and contact email address where applicable. Two questions: Can/should we use robots.txt or robot meta tags to tell crawlers to ignore these subdomains, but obviously not the corporate domain? If question 1 is yes, would it be better for SEO to do that, or leave it how it is?
Intermediate & Advanced SEO | EasyStreet
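On question 1: robots.txt is awkward for this, both because each subdomain serves its own robots.txt file and because blocking crawling doesn't remove pages crawlers already know about; worse, a blocked crawler can never see the canonical tag you're relying on. A per-request robots meta tag composes better with the canonical approach already in place. A minimal sketch of the idea (the host name and helper are hypothetical, not from the post):

CORPORATE_HOST = "www.corporatedomain.com"  # hypothetical canonical host

def robots_meta_for(host):
    # Agent subdomains get noindex (dropping them from the index) while
    # keeping follow; the corporate host remains fully indexable.
    if host.lower() != CORPORATE_HOST:
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'

print(robots_meta_for("agentsmith.corporatedomain.com"))
print(robots_meta_for(CORPORATE_HOST))

If you use this, don't also Disallow the subdomains in robots.txt: crawlers have to be able to fetch the pages to see either the noindex or the canonical at all.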