Robots.txt file question? Never seen this command before
-
Hey Everyone!
Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant).
The line is as follows:
Disallow: /*?*
I'm guessing this might have something to do with blocking PHP query-string searches on the site? It might also have something to do with blocking sub-domains, but the "?" puzzles me.
Any help would be greatly appreciated!
Thanks, Rob
-
I don't think this is correct.
/*?* is an attempt at using a regex in the robots file, which I don't think works.
Further, if it was a properly formed regex, it would be ?
* is a special character on the User-agent line, meaning all. For the Disallow line, I believe you have to use a specific directory or page.
http://www.robotstxt.org/robotstxt.html
I could be wrong, but the info on this site has been my understanding from the past too.
-
It depends on how your site is structured.
For example, if you have a page at
http://www.yourdomain.com/products.php
and this shows different things based on the parameter, like:
http://www.yourdomain.com/products.php?type=widgets
You will want to get rid of this line in your robots.txt, because it stops crawlers from reaching those parameterized pages.
However, if the parameters don't change the content on the page, you can leave it in.
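To see which URLs the rule would catch, here is a minimal Python sketch. It assumes the crawler follows the wildcard rules that Google and Bing support, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The function name `rule_matches` is made up for illustration and is not part of any library, and real crawlers also weigh Allow lines and rule length, which this ignores.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (query string included)."""
    # A trailing "$" anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; every other character is literal,
    # so the "?" in the rule is a plain question mark, not a regex operator.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules are matched from the start of the path.
    return re.match(regex, path) is not None


RULE = "/*?*"
print(rule_matches(RULE, "/products.php"))               # False
print(rule_matches(RULE, "/products.php?type=widgets"))  # True
```

Under those assumptions, the plain page stays crawlable and only the versions with a query string are blocked.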
-
Thanks Ryan and Ryan! I'm just unfamiliar with this command set in the robots file, and I'm still getting settled into the company (5 weeks), so I am still learning the site's structure and architecture. With it all being new to me, and with the limitations I am seeing from the CMS side, I was wondering if this might have been causing crawl issues for Bing and/or Yahoo. I'm trying to gauge where we might be experiencing problems with the site's crawl functions.
-
It's not a bad idea in the robots.txt, but unless you are 100% confident that you won't block something that you really want, I would consider just handling unwanted parameters and pages through the new Google Webmaster URL handling toolset. That way you have more control over which ones do and don't get blocked.
-
So, for this parameter, should I keep it in the robots file?
-
It's preventing spiders from crawling pages with parameters in the URL. For example, when you search on Google you'll see a URL like so:
http://www.google.com/search?q=seo
This passes the parameter of q with a value of 'seo' to the page at google.com for it to work its magic with. This is almost definitely a good thing, unless the only way to access some content on your site is via URL parameters.
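If it helps to see the pieces, here is a small Python sketch that splits that URL with the standard library's `urllib.parse`. The helper name `split_url` is just something I made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def split_url(url: str):
    """Return the path and a dict of query parameters for a URL."""
    parts = urlsplit(url)
    # parse_qs maps each parameter name to a list of its values.
    return parts.path, parse_qs(parts.query)


path, params = split_url("http://www.google.com/search?q=seo")
print(path)    # /search
print(params)  # {'q': ['seo']}
```

Everything after the "?" is the part a rule like the one above is aimed at.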