What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
Disallow: /?* is the same thing as Disallow: /?. Robots.txt rules are matched as prefixes, so the trailing asterisk (a wildcard) adds nothing: both of those disallows prevent any URL that begins with /? from being crawled.
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
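To see why the trailing asterisk makes no difference, here is a rough Python sketch of the matching logic. It assumes Google-style matching (a rule matches from the start of the path, * stands for any run of characters, a trailing $ anchors the end). The rule_matches helper and the sample paths are made up for illustration and are not part of any library, so treat the Search Console tester as the real authority.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a robots.txt path pattern match a path plus query string?"""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Treat every character literally, except '*' which becomes 'any run of characters'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, which gives prefix matching.
    return re.match(regex, path) is not None

# Made-up sample paths, compared against both spellings of the rule.
paths = ["/?s=wine", "/?", "/", "/french-wines/?page=2"]
for path in paths:
    print(path, rule_matches("/?*", path), rule_matches("/?", path))
```

Both columns come out the same for every path, and /french-wines/?page=2 is not matched by either rule because it does not begin with /?.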
-
Thanks again Logan.
What would Disallow: /?* do? That is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard that says anything between / and ? is applicable to the disallow.
It's very easy to disallow the wrong thing, especially when it comes to parameters. For this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL will prevent indexing of URLs with parameters.
Hope that's helpful!
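As a rough illustration of what Disallow: /*? catches, here is a small Python sketch. It assumes Google-style matching (rules match from the start of the path, * stands for any run of characters, a trailing $ anchors the end). The rule_matches helper and the example paths are hypothetical, so do check real rules in the Search Console tester.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a robots.txt path pattern match a path plus query string?"""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Treat every character literally, except '*' which becomes 'any run of characters'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, which gives prefix matching.
    return re.match(regex, path) is not None

# Made-up paths: three with a query string, one without.
for path in [
    "/?s=wine",
    "/french-wines/?page=2",
    "/french-wines/rhone-region/?sort=price",
    "/french-wines/",
]:
    print(path, rule_matches("/*?", path))
```

Under these assumptions every path containing a ? is matched at any folder depth, while the plain /french-wines/ URL is not.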
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the folder name, the URL would have to start with /french-wines/. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended, so deeper folders such as /french-wines/rhone-region/ are not covered, even when their URLs include a question mark.
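Here is a rough Python sketch of that behaviour, assuming Google-style matching (rules match from the start of the path, * stands for any run of characters, a trailing $ anchors the end). The rule_matches helper and the ?page=2 query string are invented for illustration and are not part of any library.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a robots.txt path pattern match a path plus query string?"""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Treat every character literally, except '*' which becomes 'any run of characters'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, which gives prefix matching.
    return re.match(regex, path) is not None

rule = "/french-wines/?*"
for path in [
    "/french-wines/?page=2",               # folder URL with a query string
    "/french-wines/",                      # folder URL with no query string
    "/french-wines/rhone-region/?page=2",  # deeper folder with a query string
]:
    print(path, rule_matches(rule, path))
```

Under these assumptions only the first path is blocked. The deeper folder slips through because the character right after /french-wines/ is not a question mark.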