Robots.txt
-
Hello Everyone,
I'm not sure where the robots.txt file should live on our server.
We have our main domain (company.com) with a robots.txt file in the root of the site, but we also have our blog (company.com/blog), where we're trying to disallow certain directories from being crawled for SEO purposes...
Does a blog in a sub-directory still need its own robots.txt? Or can I list the blog directories I don't want crawled in the root robots.txt file?
Thanks for your insight on this matter.
-
Thanks John & Naghimiac,
Both your responses helped me understand the robots.txt file and the proper ways of implementing it.
Thanks again for all your help!
-
The bots won't care about that. If you have your site on www.company.com, your robots.txt will reside at www.company.com/robots.txt, and its directives will apply to any pages living under www.company.com. When a bot comes to www.company.com/blog, it'll look for the robots.txt at www.company.com/robots.txt to see if it's allowed to crawl there. It won't look in a subdirectory. Robots.txt always resides on the root level.
If you had your blog at blog.company.com instead of company.com/blog, then you would have to have a separate robots.txt at blog.company.com/robots.txt. As you have your blog in a subdirectory rather than a subdomain, one robots.txt is all you need.
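To illustrate the point above, here is a minimal sketch in Python of how the robots.txt location follows from a page URL. The helper name `robots_url` and the example page paths are made up for illustration; the only rule it encodes is the one described above: keep the scheme and host, drop the path, and look at `/robots.txt`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    Hypothetical helper: it keeps the scheme and host and replaces
    the path with /robots.txt, ignoring any subdirectory.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Blog in a subdirectory: same host, so the root robots.txt applies.
print(robots_url("https://www.company.com/blog/some-post/"))
# Blog on a subdomain: different host, so it needs its own robots.txt.
print(robots_url("https://blog.company.com/some-post/"))
```

The first call resolves to the robots.txt on www.company.com, the second to a separate file on blog.company.com, which is why only the subdomain setup needs a second file.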
-
Thanks Naghimiac,
Your link is very helpful, but I was looking for something more specific about blogs in a sub-directory. I know that by default WordPress has its own .htaccess file in the root of the blog directory, and I have a separate .htaccess file in the root of my main domain. This is why I was thinking the blog needed its own robots.txt file.
Does robots.txt only ever live at the root level of the main domain, even if a blog is in a sub-directory?
-
You only need one robots.txt file, in the root directory of your site, and it applies to the whole website.
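As a rough sketch of what that could look like for a blog in a sub-directory, the snippet below feeds a root robots.txt to Python's standard `urllib.robotparser` and checks a few URLs. The disallowed directories (`/blog/wp-admin/`, `/blog/tag/`) and the page URLs are assumptions chosen for illustration, not paths taken from the original question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of www.company.com/robots.txt.
# The blog paths are written relative to the site root.
ROOT_ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/tag/
"""

parser = RobotFileParser()
parser.parse(ROOT_ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """Check whether a generic crawler ("*") may fetch url."""
    return parser.can_fetch("*", url)


for url in (
    "https://www.company.com/blog/wp-admin/options.php",
    "https://www.company.com/blog/tag/seo/",
    "https://www.company.com/blog/hello-world/",
    "https://www.company.com/products/",
):
    print(url, "->", "allowed" if allowed(url) else "blocked")
```

The two URLs under the disallowed blog directories come back blocked, while the ordinary blog post and the main-site page stay crawlable, all from the single root file.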
If you want more information about robots.txt, there is a very good post from Lindsay: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
With this, I think it will be easier for you to become a pro with robots.txt files. Good luck!