Robots.txt file question? NEver seen this command before
-
Hey Everyone!
Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant).
the command line is as follows:
Disallow: /*?*
I'm guessing this might have something to do with blocking php string searches on the site?. It might also have something to do with blocking sub-domains, but the "?" mark puzzles me
Any help would be greatly appreciated!
Thanks, Rob
-
I don't think this is correct.
? is an attempt at using a RegEx in Robots file which I don't think works.
Further, if it was a properly formed regex, it would be ?
- is a special character for the user agent to mean all. For the disallow line, I believe you have to use a specific directory or page.
http://www.robotstxt.org/robotstxt.html
I could be wrong, but the info on this site has been my understanding from the past too.
-
It depends on how your site is structured.
For example if you have a page at
http://www.yourdomain.com/products.php
and this shows different things based on the parameter, like:
http://www.yourdomain.com/products.php?type=widgets
You will want to get rid of this line in your robots.txt
However if the parameter(s) doesn't change the content on the page, you can leave it in.
-
Thanks Ryan and Ryan! I'm just unfamiliar with this command set in the robots file, and getting settled into the company (5 weeks).. so I am still learning the site's structure and arch. With it all being new to me with limitations I am seeing from the CMS side, I was wondering if this might have been causing crawl issues for Bing and or Yahoo... I'm trying to gauge where we might be experiencing problems with the sites crawl functions.
-
Its not a bad idea in the robots.txt, but unless you are 100% confidant that you wont block something that you really want, i would consider just handling unwanted parameters and pages through the new Google Webmaster url handling toolset. that way you have more control over which ones do and dont get blocked.
-
So, for this parameter, should I keep it in the robots file?
-
Its preventing spiders from crawling pages with parameters in the URL. For example when you search on google you'll see a URL like so:
http://www.google.com/search?q=seo
This passes the parameter of q with a value of 'seo' to the page at google.com for it to work its magic with. This is almost definitely a good thing, unless the only way to access some content on your site is via URL parameters.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Using one robots.txt for two websites
I have two websites that are hosted in the same CMS. Rather than having two separate robots.txt files (one for each domain), my web agency has created one which lists the sitemaps for both websites, like this: User-agent: * Disallow: Sitemap: https://www.siteA.org/sitemap Sitemap: https://www.siteB.com/sitemap Is this ok? I thought you needed one robots.txt per website which provides the URL for the sitemap. Will having both sitemap URLs listed in one robots.txt confuse the search engines?
Technical SEO | | ciehmoz0 -
Google PR Rank Question (s)
Hi
Technical SEO | | damientown
My Google PR rank is still 1/10 (www.abouttownmarketing.com) after 12 weeks of daily SEO work building what I thought were quality back links. Does anyone know how often Google updates its PR rank? Also is it a linear measure or link adwords quality score is it exponential?
Many thanks
Damien0 -
Confirming Robots.txt code deep Directories
Just want to make sure I understand exactly what I am doing If I place this in my Robots.txt Disallow: /root/this/that By doing this I want to make sure that I am ONLY blocking the directory /that/ and anything in front of that. I want to make sure that /root/this/ still stays in the index, its just the that directory I want gone. Am I correct in understanding this?
Technical SEO | | cbielich0 -
No indexing url including query string with Robots txt
Dear all, how can I block url/pages with query strings like page.html?dir=asc&order=name with robots txt? Thanks!
Technical SEO | | HMK-NL0 -
Does Bing ignore robots txt files?
Bonjour from "Its a miracle is not raining" Wetherby Uk 🙂 Ok here goes... Why despite a robots text file excluding indexing to site http://lewispr.netconstruct-preview.co.uk/ is the site url being indexed in Bing bit not Google? Does bing ignore robots text files or is there something missing from http://lewispr.netconstruct-preview.co.uk/robots.txt I need to add to stop bing indexing a preview site as illustrated below. http://i216.photobucket.com/albums/cc53/zymurgy_bucket/preview-bing-indexed.jpg Any insights welcome 🙂
Technical SEO | | Nightwing0 -
Restricted by robots.txt does this cause problems?
I have restricted around 1,500 links which are links to retailers website and links that affiliate links accorsing to webmaster tools Is this the right approach as I thought it would affect the link juice? or should I take the no follow out of the restricted by robots.txt file
Technical SEO | | ocelot0 -
What is the sense of robots.txt?
Using robots.txt to prevent search engine from indexing the page is not a good idea. so what is the sense of robots.txt? just for attracting robots to crawl sitemap?
Technical SEO | | jallenyang0 -
How do I use the Robots.txt "disallow" command properly for folders I don't want indexed?
Today's sitemap webinar made me think about the disallow feature, seems opposite of sitemaps, but it also seems both are kind of ignored in varying ways by the engines. I don't need help semantically, I got that part. I just can't seem to find a contemporary answer about what should be blocked using the robots.txt file. For example, I have folders containing site comps for clients that I really don't want showing up in the SERPS. Is it better to not have these folders on the domain at all? There are also security issues I've heard of that make sense, simply look at a site's robots file to see what they are hiding. It makes it easier to hunt for files when they know the directory the files are contained in. Do I concern myself with this? Another example is a folder I have for my xml sitemap generator. I imagine google isn't going to try to index this or count it as content, so do I need to add folders like this to the disallow list?
Technical SEO | | SpringMountain0