Robots.txt questions...
-
All,
My site is rather complicated, but I will try to break down my question as simply as possible.
I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/I have another robots.txt file in another level down, which is my holiday database - www.mysite.com/holiday-database/ - this is to disallow access to /holiday-database/ControlPanel/, my database CMS. This looks like this:
**User-agent: ***
Disallow: /ControlPanel/Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/Or, like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/
Disallow: /ControlPanel/Thanks in advance.
Matt
-
Good answer Yannick.
here are some resources:
http://www.free-seo-news.com/all-about-robots-txt.htm
http://www.robotstxt.org/robotstxt.html
Good luck
-
Cheers gents.
-
Like:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism**User-agent: ***
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/Search engines typically only look in the root of your domain to find robots.txt and sitemap.xml files.
-
Hey Matt
The first of your options looks right and google and other engines look for the robots.txt file in the site root rather than for each directory.
If you had a reason for not wanting that info in the root robots.txt file you can always use the robots meta tag on the pages in a given directory.
Few useful links:
Robots.txt
http://www.google.com/support/webmasters/bin/answer.py?answer=156449&&hl=enRobots Meta Tag
http://www.google.com/support/webmasters/bin/answer.py?answer=93710Marcus
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt error
Moz Crawler is not able to access the robots.txt due to server error. Please advice on how to tackle the server error.
Technical SEO | | Shanidel0 -
Robots.txt Syntax for Dynamic URLs
I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page= Which is the proper syntax?
Technical SEO | | btreloar
Disallow: ?Page=
Disallow: ?Page=*
Disallow: ?Page=
Or something else?0 -
Sub Domains and Robot.txt files...
This is going to seem like a stupid question, and perhaps it is but I am pulling out what little hair I have left. I have a sub level domain on which a website sits. The Main domain has a robots.txt file that disallows all robots. It has been two weeks, I submitted the sitemap through webmaster tools and still, Google has not indexed the sub domain website. My question is, could the robots.txt file on the main domain be affecting the crawlability of the website on the sub domain? I wouldn't have thought so but I can find nothing else. Thanks in advance.
Technical SEO | | Vizergy0 -
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URL's that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URL's I was hoping that wildcards were still valid.
Technical SEO | | mkhGT0 -
Another Penalty Question - Should I Start from Scratch?
I've seen many questions on google penalties recently. Not really sure where to go from here. I realised a year or so we would be living on borrowed time with our link building methods. We have been really successful in the past and are keen to build a site that has a bit more longevity. We have not received a warning from google but have lost pretty much all of our ranking for everything. My question is with our backlink profile as it is. Building links from various blog networks for the past 3 years. Is it just worth rebranding and starting from scratch rather than trying to get over a million links removed? We have a lot of content that I guess could be classed as spam. Should I really remove all of the content? or leave it running as we are still getting some traffic from other marketing activities. Or should I just get a new domain and transfer all the decent content?
Technical SEO | | DaveDawson2 -
OK to block /js/ folder using robots.txt?
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU) But what if you have lots and lots of JS and you dont want to waste precious crawl resources? Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc. And the legacy versions show up in Google Webmaster Tools as 404s. For example: http://www.discoverafrica.com/js/global_functions.js?v=1.1
Technical SEO | | AndreVanKets
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1 Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether? Isn't that what robots.txt was made for? Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks. We're just trying to power our content and UX elegantly with javascript. What do you guys say: Obey Matt? Or run the javascript gauntlet?0 -
Canonical tags/wordpress permalink question
Need help: Do canonical tags do the exact same thing that wordpress already does with it’s permalink function? Or are these 2 separate things? thank you.
Technical SEO | | bonnierSEO1 -
Robots.txt file question? NEver seen this command before
Hey Everyone! Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant). the command line is as follows: Disallow: /*?* I'm guessing this might have something to do with blocking php string searches on the site?. It might also have something to do with blocking sub-domains, but the "?" mark puzzles me 😞 Any help would be greatly appreciated! Thanks, Rob
Technical SEO | | RobMay0