Robots.txt questions...
-
All,
My site is rather complicated, but I will try to break down my question as simply as possible.
I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
I have another robots.txt file one level down, in my holiday database (www.mysite.com/holiday-database/), to disallow access to /holiday-database/ControlPanel/, my database CMS. It looks like this:
User-agent: *
Disallow: /ControlPanel/
Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
Or, like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
Thanks in advance.
Matt
-
Good answer Yannick.
Here are some resources:
http://www.free-seo-news.com/all-about-robots-txt.htm
http://www.robotstxt.org/robotstxt.html
Good luck
-
Cheers gents.
-
Like:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
Search engines typically only look in the root of your domain to find robots.txt and sitemap.xml files.
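If it helps to see why the full path is needed, here is a rough sketch using Python's standard urllib.robotparser to compare the two candidate root files from the question. The helper name `allowed`, the crawler name "ExampleBot", and the `/login` page under ControlPanel are made up for illustration, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# First option from the question: full path from the site root.
OPTION_1 = """\
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
"""

# Second option: path written relative to /holiday-database/.
OPTION_2 = """\
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
"""

# Hypothetical page inside the database CMS.
CONTROL_PANEL = "http://www.mysite.com/holiday-database/ControlPanel/login"

if __name__ == "__main__":
    print("Option 1 allows crawl:", allowed(OPTION_1, CONTROL_PANEL))
    print("Option 2 allows crawl:", allowed(OPTION_2, CONTROL_PANEL))
```

Disallow rules are matched as prefixes of the URL path starting at the site root, so under this parser only the first option blocks the ControlPanel pages. The second option would only block a top-level /ControlPanel/ directory.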
-
Hey Matt
The first of your options looks right: Google and other engines look for the robots.txt file in the site root rather than in each directory.
If you had a reason for not wanting that info in the root robots.txt file, you can always use the robots meta tag on the pages in a given directory.
A few useful links:
Robots.txt
http://www.google.com/support/webmasters/bin/answer.py?answer=156449&&hl=en
Robots Meta Tag
http://www.google.com/support/webmasters/bin/answer.py?answer=93710
Marcus