What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
Disallow: /?* is the same thing as Disallow:/?, since the asterisk is a wildcard, both of those disallows prevent any URL that begins with /? from being crawled.
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
-
Thanks again Logan.
What would Disallow: /?* do because that is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard that says anything between / and ? is applicable to the disallow.
It's very easy to disallow the wrong this especially in regards to parameters, for this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL will prevent indexing or URLs with parameters.
Hope that's helpful!
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the folder name, the URL would have to start with /french-wines/. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What should I include in disavow file and/or reconsideration request?
My client got a manual penalty notice. Need to submit a disavow file and reconsideration request which is new territory for me. The task of contacting/disavowing 100's of sites to remove 1000's of links is a bit overwhelming. Answers to any of these questions would be greatly appreciated. Search console is showing 100's of hacked websites pointing to the site. Many of the incoming links showing in search console are already gone. Should I include in the disavow file or is the disavow file only for links that persist? I have read that Google does not actually read the #remarks in the disavow file. Since its manual penalty should I include them anyway since it's possible that a human could look it over? If anyone who has submitted a reconsideration request for unnatural links can comment on their use or non use of #remarks and the result that would be very helpful. You can tell that Google wants an effort to be made that the site owners are contacted. What is the best way to document that? In the reconsideration request?: The disavow file? or both.
Intermediate & Advanced SEO | | KentH0 -
P.O Box VS. Actual Address
We have a website (http://www.delivertech.ca) that uses a P.O Box number versus an actual address as their "location". Does this affect SEO? Is it better to use an actual address? Thanks.
Intermediate & Advanced SEO | | Web3Marketing870 -
URL Parameters Settings in WMT/Search Console
On an large ecommerce site the main navigation links to URLs that include a legacy parameter. The parameter doesn’t actually seem to do anything to change content - it doesn’t narrow or specify content, nor does it currently track sessions. We’ve set the canonical for these URLs to be without the parameter. (We did this when we started seeing that Google was stripping out the parameter in the majority of SERP results themselves.) We’re trying to best strategize on how to set the parameters in WMT (search console). Our options are to set to: 1. No: Doesn’t affect page content’ - and then the Crawl field in WMT is auto-set to ‘Representative URL’. (Note, that it's unclear what ‘Representative URL’ is defined as. Google’s documentation suggests that a representative URL is a canonical URL, and we've specifically set canonicals to be without the parameter so does this contradict? ) OR 2. ‘Yes: Changes, reorders, or narrows page content’ And then it’s a question of how to instruct Googlebot to crawl these pages: 'Let Googlebot decide' OR 'No URLs'. The fundamental issue is whether the parameter settings are an index signal or crawl signal. Google documents them as crawl signals, but if we instruct Google not to crawl our navigation how will it find and pass equity to the canonical URLs? Thoughts? Posted by Susan Schwartz, Kahena Digital staff member
Intermediate & Advanced SEO | | AriNahmani0 -
Intro to programming/coding for seo
Hello, I am currently a SEO and am looking for an Intro to programming/coding course to help me implement various technical SEO tasks for my clients and the business-as the programming dept will not help me, as they do not see the value of SEO. Could someone pls recommend an online course that would introduce me to basic concepts and also specifically, the information that would help me to enhance our SEO? I would also like to better understand APIs. Thanks so much in advance for your help! Lauren
Intermediate & Advanced SEO | | lfrazer1 -
Block subdomain directory in robots.txt
Instead of block an entire sub-domain (fr.sitegeek.com) with robots.txt, we like to block one directory (fr.sitegeek.com/blog).
Intermediate & Advanced SEO | | gamesecure
'fr.sitegeek.com/blog' and 'wwww.sitegeek.com/blog' contain the same articles in one language only labels are changed for 'fr' version and we suppose that duplicate content cause problem for SEO. We would like to crawl and index 'www.sitegee.com/blog' articles not 'fr.sitegeek.com/blog'. so, suggest us how to block single sub-domain directory (fr.sitegeek.com/blog) with robot.txt? This is only for blog directory of 'fr' version even all other directories or pages would be crawled and indexed for 'fr' version. Thanks,
Rajiv0 -
Which is better /section/ or section/index.php?
I have noticed that Google has started to simply link to /section/ as opposed to /section/index.php and I haven't changed any canonical tags etc. I have looked at my pages moz authority for the two /section/ = 28/100
Intermediate & Advanced SEO | | TimHolmes
/section/index.php = 42/100 How would I go about transferring the authority to /section/ from /section/index.php to hopefully help me in my organic serp positions etc. Any insight would be great 🐵0 -
How do I handle this 301/indexing mess?
I'm working on a client's site and noticed a brisk drop in rankings. In doing some digging I found that the homepage (domain.com) is 301'd to domain.com/home.html. Here's my problem/questions: 1. domain.com is indexed by Google 2. domain.com/home.html is not indexed by Google 3. both domains have some healthy linking 4. Is the fact that domain.com/home.html impacting rankings? 5. How do carefully handle this situation (ex. redirect domain.com/home.html back to domain.com?) 6. See the attached jpeg for a visual representation of my debacle. hcIiPAs
Intermediate & Advanced SEO | | rhoadesjohn0 -
404 Errors with my RSS Feed/sitemap
In my google webmasters I just started getting 404 errors that I'm not sure how to redirect. I'm getting quite a few that are ending in /feed/ for instance /nyc-accident-injury/feed/
Intermediate & Advanced SEO | | jsmythd
contact-us-thank-you/feed/ and then also a problem with my sitemap I guess? With /site-map/?postsort=tags The domain is pulversthompson.com0