Robots.txt Syntax
-
Does the order of the robots.txt syntax matter in SEO?
For example (are there potential problems with this format):
User-agent: *
Sitemap:
Disallow: /form.htm
Allow: /

Disallow: /cgnet_directory
-
Rodrigo -
Thanks, and thanks for the follow-up. To be honest with you, though, I have not seen or experienced anything about this. I tend to follow the suggested rules with code.
So my answer is "I don't know". Anyone else know?
I also agree with you on the meta tags. Robots.txt is best used for disallowing folders and such, not pages. For instance, I might do a "Disallow: /admin" in the robots.txt file, but would never block a category page or something to that effect. If I wanted to remove it from the index, I'd also use the meta "noindex,follow" attribute. Good point!
-
Thanks, John, good response. I think the biggest takeaway for me is that none of the "dis-order" above will actually cause errors in the file. However, I completely agree with your recommendations as to where the Sitemap: line should go, and why the Allow: parameter is unnecessary.
Last question: do you know if the blank line between the Allow: and the second Disallow: parameter causes any issues?
Side note for those using robots.txt to block content: also consider the noindex,follow attribute in the META tag as an alternative, to save some of the link value those pages may be getting.
-
Rodrigo -
Good question. The syntax does in fact matter, though not necessarily for SEO rankings. It matters because if you screw up your robots.txt, you can inadvertently disallow your whole site (I did it last week. Not pretty. Blog post forthcoming).
To get to your question, it is usually best to put the "Sitemap: " line at the bottom of the robots.txt, but it is not required to have it there, so far as I know.
You do not need the Allow: / parameter, because if you leave it out, Google assumes that you want everything crawled except what is put in the "Disallow: " lines.
In your case, you are disallowing "http://www.site.com/form.htm" and everything in your cgnet_directory folder (the rule matches any URL path that starts with /cgnet_directory). If you want both hidden from crawlers, you have done exactly what you need to do.
I'm still learning about this, so I'm open to any correction the rest of the community has.
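For anyone who wants to sanity-check a file like this before uploading it, here is a minimal sketch using Python's standard-library urllib.robotparser. It uses the simplified version of the file discussed above (no Allow: / line, and no Sitemap: line since the URL was not given). The paths /cgnet_directory/page.htm and /index.htm and the helper name is_allowed are made up for illustration. Keep in mind that Python's parser is not Googlebot and may disagree with it in edge cases, so treat this as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Simplified version of the file from the question, without "Allow: /".
ROBOTS_TXT = """\
User-agent: *
Disallow: /form.htm
Disallow: /cgnet_directory
"""


def is_allowed(path, user_agent="*"):
    """Return True if user_agent may fetch the given path on www.site.com."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "http://www.site.com" + path)


# Hypothetical paths, chosen only to exercise each rule.
for path in ("/form.htm", "/cgnet_directory/page.htm", "/index.htm"):
    print(path, "allowed" if is_allowed(path) else "blocked")
```

With these rules, the first two paths should come back blocked and /index.htm allowed, which matches the "everything is crawlable unless disallowed" behaviour described above.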
Related Questions
-
Robots.txt Disallow: / in Search Console
Two days ago I found out through Search Console that my website's robots.txt has changed to:
User-agent: *
Disallow: /
When I check the robots.txt on the website it looks fine; I see it blocked only in Search Console (in the robots.txt tester). When I try Fetch as Google on the homepage, I see it is blocked. Any ideas why robots.txt would block my website? It was fine until the weekend. Before that, in the last 3 months, I saw I had blocked resources on the website and I brought back pages with Fetch as Google. Any ideas?
Technical SEO | | RAN_SEO0 -
HTTP Status showing up in opensiteexplorer top pages as blocked by robot.txt file
I am trying to find an answer to this question: a lot of URLs on this page show no data when I go into the data source and search for noindex or robot.txt, but the site is visible in the search engines?
Technical SEO | | ReSEOlve0 -
Is having no robots.txt file the same as having one and allowing all agents?
The site I am working on currently has no robots.txt file. However, I have just uploaded a sitemap and would like to point the robots.txt file to it. Once I upload the robots.txt file, if I allow access to all agents, is this the same as when the site had no robots.txt file at all? Do I need to specify crawler access, or can the robots.txt file just contain the link to the sitemap?
Technical SEO | | pugh0 -
Removal request for entire catalog. Can be done without blocking in robots?
A bunch of thin content (catalog) pages were modified with "follow, noindex" a few weeks ago. The site has been completely re-crawled, and the related cache shows that these pages were not indexed again. So it's good, I suppose 🙂 But all of them are still in the main Google index and show up from time to time in SERPs. Will they eventually disappear, or do we need to submit a removal request? The problem is we really don't want to add these pages to robots.txt (they are passing link juice down to product pages). Thanks!
Technical SEO | | LocalLocal0 -
Help needed with robots.txt regarding wordpress!
Here is my robots.txt from Google Webmaster Tools. These are the pages that are being blocked, and I am not sure which of these to get rid of in order to unblock blog posts from being searched. http://ensoplastics.com/theblog/?cat=743 http://ensoplastics.com/theblog/?p=240 These category pages and blog posts are blocked, so do I delete the /? ...I am new to SEO and web development, so I am not sure why the developer of this robots.txt file would block pages and posts in WordPress. It seems to me that is the reason why someone has a blog: so it can be searched and get more exposure for SEO purposes. Is there a reason I should block any pages contained in WordPress?
Sitemap: http://www.ensobottles.com/blog/sitemap.xml
User-agent: Googlebot
Disallow: /*/trackback
Disallow: /*/feed
Disallow: /*/comments
Disallow: /?
Disallow: /*?
Disallow: /page/
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /trackback
Disallow: /comments
Disallow: /feed
Technical SEO | | ENSO0 -
No follow syntax
is ref=nofollow the same as rel=nofollow? in other words, does ref=nofollow not pass any link juice?
Technical SEO | | SoulSurfer80 -
Should I set up a disallow in the robots.txt for catalog search results?
When the crawl diagnostics came back for my site, they showed around 3,000 pages of duplicate content. Almost all of them are catalog search results pages. I also did a site search on Google, and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ subfolder, but I'm not sure if this will have any negative effect?
Technical SEO | | JordanJudson0 -
Robots.txt File Redirects to Home Page
I've been doing some site analysis for a new SEO client, and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering: Is there a benefit to setting up your robots.txt file to do this? Will this affect how their site gets indexed? Thanks for your response! Kyle Site URL: http://www.radisphere.net/
Technical SEO | | kchandler0