Robots.txt file question? Never seen this command before
-
Hey Everyone!
Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant).
The line is as follows:
Disallow: /*?*
I'm guessing this might have something to do with blocking PHP query strings on the site? It might also have something to do with blocking sub-domains, but the "?" puzzles me.
Any help would be greatly appreciated!
Thanks, Rob
-
I don't think this is correct.
*?* is an attempt at using a regex in the robots file, which I don't think works.
Further, if it was a properly formed regex, it would be ?
* is a special character for the user agent to mean all. For the disallow line, I believe you have to use a specific directory or page.
http://www.robotstxt.org/robotstxt.html
I could be wrong, but the info on this site has been my understanding from the past too.
-
It depends on how your site is structured.
For example if you have a page at
http://www.yourdomain.com/products.php
and this shows different things based on the parameter, like:
http://www.yourdomain.com/products.php?type=widgets
You will want to get rid of this line in your robots.txt
However if the parameter(s) doesn't change the content on the page, you can leave it in.
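For illustration, here is a minimal Python sketch of how a crawler that supports wildcard rules might apply that line to the two URLs above. The function name rule_matches is hypothetical, and the matching logic ("*" matches any run of characters, a trailing "$" anchors the end of the URL) is an assumption about wildcard-aware crawlers; not every crawler behaves this way.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path plus query.

    Assumed semantics (hypothetical helper, not any crawler's real code):
    '*' matches any sequence of characters, a trailing '$' anchors the end,
    and everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters in each literal piece, then join the
    # pieces with '.*' so every robots.txt '*' becomes "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


rule = "/*?*"
for path in ("/products.php", "/products.php?type=widgets"):
    status = "blocked" if rule_matches(rule, path) else "allowed"
    print(path, "->", status)
```

Under these assumptions the plain page stays crawlable and only the version with a query string is blocked, which is why the rule matters when the parameter changes what the page shows.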
-
Thanks Ryan and Ryan! I'm just unfamiliar with this command set in the robots file, and I'm still getting settled into the company (5 weeks), so I am still learning the site's structure and architecture. With it all being new to me, and with the limitations I am seeing from the CMS side, I was wondering if this might have been causing crawl issues for Bing and/or Yahoo. I'm trying to gauge where we might be experiencing problems with the site's crawl functions.
-
It's not a bad idea in the robots.txt, but unless you are 100% confident that you won't block something that you really want, I would consider just handling unwanted parameters and pages through the new Google Webmaster URL handling toolset. That way you have more control over which ones do and don't get blocked.
-
So, for this parameter, should I keep it in the robots file?
-
It's preventing spiders from crawling pages with parameters in the URL. For example, when you search on Google you'll see a URL like so:
http://www.google.com/search?q=seo
This passes the parameter of q with a value of 'seo' to the page at google.com for it to work its magic with. This is almost definitely a good thing, unless the only way to access some content on your site is via URL parameters.
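To make the parameter passing concrete, here is a small Python sketch that uses the standard urllib.parse module to pull the parameter out of the URL above. The helper name query_parameters is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url: str) -> dict:
    """Return the URL's query parameters as a dict of name -> list of values."""
    # urlsplit separates the part after '?' into the .query attribute;
    # parse_qs then decodes it into parameter names and their values.
    return parse_qs(urlsplit(url).query)


print(query_parameters("http://www.google.com/search?q=seo"))
# prints {'q': ['seo']}
```

Any URL for which this returns a non-empty dict contains a "?" and so falls under the Disallow: /*?* line for crawlers that honor wildcards.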