Disallow wildcard match in Robots.txt

AmandaBridge

This is in my robots.txt file. Does anyone know what it is supposed to accomplish? It doesn't appear to be blocking URLs with question marks.

Disallow: /?crawler=1
Disallow: /?mobile=1

Thank you

effectdigital

This is a good reply.

Everyone gets really confused because Robots.txt has very minor, partial wildcard support and that makes people think that Robots.txt files use Regex, which they do not. Instead of having some weird half and half implementation, it would be much better IMO if the Robots.txt initiative / directive were updated to say "yes, you can use full regular expressions with regards to URL string matching".

Many people are left in a kind of silly guessing game because Google doesn't 'properly' elaborate or invest in expanding the definitions to their currently (publicly) assumed end-game.

People assume that if "*" will match any string of characters, "?" will match any individual character when used in a robots.txt file. This would make sense, but it's not the case: "?" is treated as a literal question mark. AFAIK the major crawlers only support two special characters, "*" (any sequence of characters) and "$" (end of the URL), and that's why people get confused, looking for escape characters and the like.
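
To make that concrete, here is a rough Python sketch of how a Google-style matcher would treat these patterns. It assumes the behaviour described in Google's robots.txt documentation: a rule is matched from the start of the path (plus query string), "*" matches any run of characters, a trailing "$" anchors the end of the URL, and everything else, including "?", is literal. The function names rule_to_regex and is_blocked are made up for illustration, and this is not how any real crawler is implemented.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex (illustrative only)."""
    # A trailing "$" means "the URL must end here".
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every piece literally (so "?" stays a plain question mark),
    # then join the pieces with ".*" wherever the rule had a "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule: str, path_and_query: str) -> bool:
    """True if the rule matches from the start of the path and query string."""
    # re.match only matches at the beginning, which mirrors prefix matching.
    return rule_to_regex(rule).match(path_and_query) is not None


if __name__ == "__main__":
    checks = [
        ("/?mobile=1", "/?mobile=1"),
        ("/?mobile=1", "/category/product?mobile=1"),
        ("/*?mobile=1", "/category/product?mobile=1"),
        ("/*?", "/category/product?mobile=1"),
        ("/*?", "/category/product"),
        ("/a?c", "/abc"),
    ]
    for rule, url in checks:
        print(f"Disallow: {rule:<12} {url:<30} blocked={is_blocked(rule, url)}")
```

Under these assumptions, "Disallow: /?mobile=1" only catches the homepage with that query string, while "Disallow: /*?" catches any URL containing a question mark. The last check shows that "?" does not behave like a single-character wildcard.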

Gaston Riera

Hi Amanda,

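Those lines tell crawlers such as Googlebot not to crawl URLs whose path and query string start with that text.
For example, it won't crawl: domain.com/?mobile=1

Because rules are matched from the beginning of the path, they do not cover deeper URLs such as domain.com/category/product?mobile=1. For that, the rule needs a wildcard in front:
Disallow: /*?mobile=1

BUT, that doesn't mean it will skip every URL with a question mark. For that, the line should look like this:
Disallow: /*?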

I highly recommend reading these guides:
About /robots.txt - Official site - Robotstxt.org
Robots.txt - Moz
Robots.txt: the ultimate guide - YOAST
The Complete Guide to Robots.txt - PORTENT

Hope it helps.
Best of luck.
GR
