Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Disallow wildcard match in Robots.txt
-
This is in my robots.txt file. Does anyone know what it is supposed to accomplish? It doesn't appear to be blocking URLs with question marks.
Disallow: /?crawler=1
Disallow: /?mobile=1
Thank you
-
This is a good reply.
Everyone gets really confused because robots.txt has very minor, partial wildcard support, and that makes people think robots.txt files use regex, which they do not. Instead of this weird half-and-half implementation, it would be much better IMO if the robots.txt standard were updated to say "yes, you can use full regular expressions for URL string matching".
Many people are left playing a silly guessing game because Google doesn't properly spell out the matching rules, or extend them to what everyone already (publicly) assumes they are.
People assume that if "*" matches any string of characters, "?" will match any single character when used in a robots.txt file. That would make sense, but it's not the case: "?" is just a literal question mark. AFAIK the only supported wildcards are "*" (any sequence of characters) and "$" (end of the URL), which is why people get confused and go looking for escape characters and the like.
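To make the "?" point concrete, here is a rough Python sketch of Google-style matching as it is usually described: "*" matches any run of characters, a trailing "$" anchors the end of the URL, and everything else (including "?") is a literal character matched from the start of the path. The function name rule_matches is made up for this example, and this is only an approximation, not Google's actual parser.

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt path rule matches a URL path (query string included)."""
    # A trailing "$" anchors the rule to the end of the URL.
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # "*" is the only wildcard; every other character, including "?", is literal.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += r"\Z"
    # re.match anchors at the start of the string, so a rule is a prefix match.
    return re.match(pattern, path) is not None

print(rule_matches("/?mobile=1", "/?mobile=1"))  # True: "?" matches a literal "?"
print(rule_matches("/?mobile=1", "/xmobile=1"))  # False: "?" is not a one-character wildcard
```

If "?" behaved like a shell-style wildcard, the second call would return True as well.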
-
Hi Amanda,
Those lines tell Googlebot not to crawl URLs that start with those text fragments directly after the domain.
For example, it won't crawl: domain.com/?mobile=1
BUT that doesn't mean it won't crawl every URL with a question mark. Rules match from the start of the path, so a URL like domain.com/category/product?mobile=1 is not covered. To block every URL containing a question mark, the line should look like this:
Disallow: /*?
I highly recommend reading these guides:
About /robots.txt - Official site - Robotstxt.org
Robots.txt - Moz
Robots.txt: the ultimate guide - YOAST
The Complete Guide to Robots.txt - PORTENT
Hope it helps.
Best of luck.
GR
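A quick way to see what rules like these actually block is to test them against a few sample URLs. The sketch below pulls the Disallow lines out of a robots.txt string and checks each URL's path plus query string against them. It assumes Google-style matching (prefix match from the start of the path, "*" as a wildcard, a trailing "$" as an end anchor), ignores User-agent groups and Allow lines entirely, and uses made-up names (disallow_rules, blocked_by, example.com).

```python
import re
from typing import List, Optional
from urllib.parse import urlsplit

# The two rules from the question.
ROBOTS_TXT = """\
Disallow: /?crawler=1
Disallow: /?mobile=1
"""

def disallow_rules(robots_txt: str) -> List[str]:
    """Collect the path pattern from every non-empty Disallow line."""
    rules = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules

def blocked_by(url: str, rules: List[str]) -> Optional[str]:
    """Return the first rule matching the URL's path + query string, or None."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    for rule in rules:
        body = rule[:-1] if rule.endswith("$") else rule
        # "*" becomes ".*"; everything else (including "?") stays literal.
        pattern = ".*".join(re.escape(p) for p in body.split("*"))
        if rule.endswith("$"):
            pattern += r"\Z"
        if re.match(pattern, target):  # anchored at the start: prefix match
            return rule
    return None

rules = disallow_rules(ROBOTS_TXT)
for url in ("https://example.com/?mobile=1",
            "https://example.com/category/product?mobile=1"):
    print(url, "->", blocked_by(url, rules))
```

With only the two rules from the question, the root URL is reported as blocked and the product URL is not. Appending "/*?" to the rule list makes the product URL match as well.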