Robots.txt Syntax
-
Does the order of the robots.txt syntax matter in SEO?
For example (are there potential problems with this format):
User-agent: * Sitemap: Disallow: /form.htm Allow: / Disallow: /cgnet_directory
-
Rodrigo -
Thanks, and thanks for the follow-up. To be honest with you though...I have not seen or experienced anything about this. I tend to follow the suggested rules with code
So my answer is "I don't know". Anyone else know?
I also agree with you on the meta tags. Robots.txt is best used for disallowing folders and such, not pages. For instance, I might do a "Disallow: /admin" in the robots.txt file, but would never block a category page or something to that effect. If I wanted to remove it from the index, I'd also use the meta "noindex,follow" attribute. Good point!
-
Thanks John- good response. I think the biggest takeaway for me is to know that none of the "dis-order" above will actually cause errors in the file. However, I completely agree with your recommendations as to where the sitemap: should go, and why the allow parameter is unnecessary.
Last question, do you know if the blank line in-between the allow: and second disallow: parameter cause any issues?
side note for those using the robots.txt to block content, also consider the noindex,follow attribute in the META tag as an alternative to save some link value that those pages may be getting.
-
Rodrigo -
Good question. The syntax does in fact matter, though not necessarily for SEO rankings. It matters because if you screw up your robots.txt, you can inadvertently disallow your whole site (I did it last week. Not pretty. Blog post forthcoming).
To get to your question, it is usually best to put the "Sitemap: " line at the bottom of the robots.txt, but it is not required to have it there, so far as I know.
You do not need the Allow: / parameter, because if you leave it out, Google assumes that you want everything indexed except what is put in the "Disallow: " lines.
In your case, you are disallowing "http://www.site.com/form.htm" and everything in your cgnet_directory folder. If you want everything in these folders hidden from crawlers...you have done exactly what you need to do.
I'm still learning about this, so I'm open to any correction the rest of the community has.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Role of Robots.txt and Search Console parameters settings
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"?
Technical SEO | | LivDetrick0 -
301 redirect syntax for htaccess
I'm working on some htaccess redirects for a few stray pages and have come across a few different varieties of 301s that are confusing me a bit....Most sources suggest: Redirect 301 /pageA.html http://www.site.com/pageB.html or using some combination of: RewriteRule + RewriteCond + RegEx I've also found examples of: RedirectPermanent /pageA.html http://www.site.com/pageB.html I'm confused because our current htaccess file has quite a few (working) redirects that look like this: Redirect permanent /pageA.html http://www.site.com/pageB.html This syntax seems to work, but I'm yet to find another Redirect permanent in the wild, only examples of Redirect 301 or RedirectPermanent Is there any difference between these? Would I benefit at all from replacing Redirect permanent with Redirect 301?
Technical SEO | | SamKlep1 -
Adding your sitemap to robots.txt
Hi everyone, Best practice question: When adding your sitemap to your robots.txt file, do you add the whole sitemap at once or do you add different subcategories (products, posts, categories,..) separately? I'm very curious to hear your thoughts!
Technical SEO | | WeAreDigital_BE0 -
Should a login page for a payroll / timekeeping comp[any be no follow for robots.txt?
I am managing a Timekeeping/Payroll company. My question is about the customer login page. Would this typically be nofollow for robots?
Technical SEO | | donsilvernail0 -
Do I have a robots.txt problem?
I have the little yellow exclamation point under my robots.txt fetch as you can see here- http://imgur.com/wuWdtvO This version shows no errors or warnings- http://imgur.com/uqbmbug Under the tester I can currently see the latest version. This site hasn't changed URLs recently, and we haven't made any changes to the robots.txt file for two years. This problem just started in the last month. Should I worry?
Technical SEO | | EcommerceSite0 -
Guys & Gals anyone know if urllist.txt is still used?
I'm using a tool which generates urllist.txt and looking on the SEO Forums it seems that Yahoo used to use this. What I'd like to know is is it still used anywhere and should we have it on the site?
Technical SEO | | danwebman0 -
Can't find mistake in robots.txt
Hi all, we recently filled our robots.txt file to prevent some directories from crawling. Looks like: User-agent: * Disallow: /Views/ Disallow: /login/ Disallow: /routing/ Disallow: /Profiler/ Disallow: /LILLYPROFILER/ Disallow: /EventRweKompaktProfiler/ Disallow: /AccessIntProfiler/ Disallow: /KellyIntProfiler/ Disallow: /lilly/ now, as Google Webmaster Tools hasn't updated our robots.txt yet, I checked our robots.txt in some ckeckers. They tell me that the User agent: * contains an error. **Example:** **Line 1: Syntax error! Expected <field>:</field> <value></value> 1: User-agent: *** **`I checked other robots.txt written the same way --> they work,`** accordign to the checkers... **`Where the .... is the mistake???`** ```
Technical SEO | | accessKellyOCG0 -
How can I make Google Webmaster Tools see the robots.txt file when I am doing a .htacces redirec?
We are moving a site to a new domain. I have setup an .htaccess file and it is working fine. My problem is that Google Webmaster tools now says it cannot access the robots.txt file on the old site. How can I make it still see the robots.txt file when the .htaccess is doing a full site redirect? .htaccess currently has: Options +FollowSymLinks -MultiViews
Technical SEO | | RalphinAZ
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?michaelswilderhr.com$ [NC]
RewriteRule ^ http://www.s2esolutions.com/ [R=301,L] Google webmaster tools is reporting: Over the last 24 hours, Googlebot encountered 1 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%.0