Robots.txt questions...
-
All,
My site is rather complicated, but I will try to break down my question as simply as possible.
I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
I have another robots.txt file one level down, in my holiday database (www.mysite.com/holiday-database/). It is there to disallow access to /holiday-database/ControlPanel/, my database CMS. It looks like this:
User-agent: *
Disallow: /ControlPanel/
Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
Or, like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
Thanks in advance.
Matt
-
Good answer Yannick.
Here are some resources:
http://www.free-seo-news.com/all-about-robots-txt.htm
http://www.robotstxt.org/robotstxt.html
Good luck
-
Cheers gents.
-
Like:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
Search engines typically only look in the root of your domain to find robots.txt and sitemap.xml files.
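To illustrate the "root only" point, here is a minimal Python sketch of how a crawler would typically derive the robots.txt location from any page URL. The function name robots_txt_url is made up for this example, and the URL is the one from the question; real crawlers may differ in details.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url.

    Hypothetical helper for illustration: it keeps the scheme and host
    and replaces the whole path with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page deep inside /holiday-database/ still maps to the root file,
# so a robots.txt stored at /holiday-database/robots.txt is never requested.
print(robots_txt_url("http://www.mysite.com/holiday-database/ControlPanel/"))
# prints http://www.mysite.com/robots.txt
```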
-
Hey Matt
The first of your options looks right: Google and other engines look for the robots.txt file in the site root rather than in each directory.
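If you want to sanity-check the two candidate files yourself, a rough sketch with Python's standard urllib.robotparser module is below. It parses both versions from the question and asks whether a URL under the control panel may be fetched. The /login page is an assumed example path, and this parser's simple prefix matching is only an approximation of what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


# First option from the question: full path from the site root.
OPTION_ONE = """\
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
"""

# Second option: path written relative to /holiday-database/.
OPTION_TWO = """\
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
"""

# Assumed example page inside the database control panel.
URL = "http://www.mysite.com/holiday-database/ControlPanel/login"

first = build_parser(OPTION_ONE)
second = build_parser(OPTION_TWO)

print(first.can_fetch("*", URL))   # False: the control panel is blocked
print(second.can_fetch("*", URL))  # True: /ControlPanel/ does not match this path
```

Disallow paths are matched from the root of the site, which is why only the first option covers /holiday-database/ControlPanel/.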
If you had a reason for not wanting that info in the root robots.txt file, you can always use the robots meta tag on the pages in a given directory.
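For reference, the robots meta tag sits in a page's head and carries directives such as noindex. The sketch below uses Python's standard html.parser to read those directives from a page, which is a quick way to confirm the tag is really being served. The class and function names and the sample page are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Made-up control panel page carrying the tag.
PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body>Control panel</body></html>"
)

print(robots_directives(PAGE))  # ['noindex', 'nofollow']
```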
A few useful links:
Robots.txt
http://www.google.com/support/webmasters/bin/answer.py?answer=156449&&hl=en
Robots Meta Tag
http://www.google.com/support/webmasters/bin/answer.py?answer=93710
Marcus