Robots.txt | any SEO advantage to having one vs not having one?
-
Neither of my sites has a robots.txt file. I guess I have never been bothered by any particular bot enough to exclude it.
Is there any SEO advantage to having one anyway?
-
It's good practice, especially if you are running a CMS that generates crawlable URLs which cause duplicate content problems, create "junk" pages, etc. For example: http://www.asos.com/robots.txt
Google dislikes internal search results pages being indexed, so you can block crawlers from those, e.g. http://moz.com/robots.txt
You can disallow the archive.org bot if you don't want old versions of your site appearing in its archive, and, as others have said, you can point to your XML sitemap.
It's not a bad resource to have at your disposal for site hygiene / maintenance reasons, but it's not an absolute necessity either.
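To make the points above concrete, here is a rough sketch that checks a small robots.txt with Python's standard urllib.robotparser. The file contents, the /search path and the example.com URLs are made up for illustration and are not taken from the asos.com or moz.com files. Note that this parser only does simple prefix matching, so the sketch avoids wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all bots out of internal search results,
# and keep the archive.org crawler (ia_archiver) out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search

User-agent: ia_archiver
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Internal search results are blocked for ordinary crawlers.
print(parser.can_fetch("Googlebot", "https://www.example.com/search?q=shoes"))

# Normal pages stay crawlable for ordinary crawlers.
print(parser.can_fetch("Googlebot", "https://www.example.com/products/shoes"))

# The archive.org bot is blocked everywhere.
print(parser.can_fetch("ia_archiver", "https://www.example.com/products/shoes"))
```

Keep in mind that a Disallow rule only stops crawling. A blocked URL can still show up in results without a description if other pages link to it.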
-
There are actually a couple of good reasons, but in short, it's "best practice", so adding one won't hurt. It won't take more than a couple of minutes.
-
Just good practice. One SEO advantage would be to include a reference to your sitemap within the robots.txt file.
Aside from that, if you want all of your pages crawled and don't have a sitemap (although you should), there's no need for a robots.txt file.
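As a small illustration of the sitemap reference, the sketch below parses a permissive robots.txt that blocks nothing and only points to a sitemap. The example.com sitemap URL is a placeholder, and site_maps() assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "allow everything" robots.txt that only advertises a sitemap.
SITEMAP_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(SITEMAP_ROBOTS_TXT.splitlines())

# Sitemap URLs declared in the file (site_maps() returns None if there are none).
print(sitemap_parser.site_maps())

# An empty Disallow line blocks nothing, so every page remains crawlable.
print(sitemap_parser.can_fetch("*", "https://www.example.com/any/page"))
```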
Related Questions
-
Should a login page for a payroll / timekeeping company be nofollow for robots.txt?
I am managing a Timekeeping/Payroll company. My question is about the customer login page. Would this typically be nofollow for robots?
Technical SEO | | donsilvernail0 -
Robots File
For some reason the robots file on this site, http://rushhour.net.au/robots.txt, is giving this in Google: www.rushhour.net.au/bootcamp.html "A description for this result is not available because of this site's robots.txt". Can anyone tell me why, please? Thanks.
Technical SEO | | SuitsAdmin0 -
Magento technical SEO issues
Hi. This is a lot of questions and I don't expect full answers, but if anyone can help or put me in touch with someone who can, that would be great. Here are 3 issues we found while auditing our site.
Firstly, on pages like https://www.tidy-books.co.uk/shop-with-us/sort-by/price/sort-direction/desc (any page with a sort-by), the canonical link doesn't seem to be working correctly. Here it is https://www.tidy-books.co.uk/shop-with-us/sort-by/price/sort-direction/desc but it should be https://www.tidy-books.co.uk/shop-with-us
Secondly, we have a lot of duplicate title tags, mainly caused by the blog and the above problem (see http://prntscr.com/b2t9xe). Regarding the blog, we also have an issue where 2 canonical links appear. For example, on the page https://www.tidy-books.co.uk/blog/page/19/ there are 2 canonical links appearing: https://www.tidy-books.co.uk/blog/page/19/ We want it to be https://www.tidy-books.co.uk/blog/
Thirdly, our mobile usability issues have gone up a lot (see http://prntscr.com/b2tado). I can see what the issue is: this folder https://www.tidy-books.co.uk/skin/frontend/tidybooks/default/images/ was being crawled by Google and contains lots of 'index of' pages. I've disallowed the directory in robots.txt as shown here: http://prntscr.com/b2tbc5. Is that correct? Any help would be great. Just to let you know, we use Magento v1.7, the SEO Suite Ultimate extension, and Fishpig's WordPress extension. Thanks
Technical SEO | | tidybooks0 -
SEO for Parallax Website
Hi, Are there any implications of having a parallax website and the URL not changing as you scroll down the page? So basically the whole site is under the same URL? However, when you click on the menu the URL does change? Cheers
Technical SEO | | National-Homebuyers0 -
GWT returning 200 for robots.txt, but it's actually returning a 404?
Hi, Just wondering if anyone has had this problem before. I'm just checking a client's GWT and I'm looking at their robots.txt file. In GWT, it's saying that it's all fine and returns a 200 code, but when I manually visit (or click the link in GWT) the page, it gives me a 404 error. As far as I can tell, the client has made no changes to the robots.txt recently, and we definitely haven't either. Has anyone had this problem before? Thanks!
Technical SEO | | White.net0 -
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URLs that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URLs, I was hoping that wildcards were still valid.
Technical SEO | | mkhGT0 -
Robots.txt
Hi there, My question relates to the robots.txt file. This statement: /*/trackback Would this block domain.com/trackback and domain.com/fred/trackback ? Peter
Technical SEO | | PeterM220 -
Site not being Indexed that fast anymore, Is something wrong with this Robots.txt
My WordPress site's robots.txt used to be this:
User-agent: *
Disallow:
Sitemap: http://www.domainame.com/sitemap.xml.gz
I also have All in One SEO installed, and other than posts, tags are also index,follow on my site. My new posts used to appear on Google within seconds of publishing. I changed the robots.txt to the following, and now post indexing takes hours. Is there something wrong with this robots.txt?
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /wp-login.php
Disallow: /wp-login.php
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /author
Disallow: /category
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /login/
Disallow: /wget/
Disallow: /httpd/
Disallow: /*.php$
Disallow: /?
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /?
Disallow: /*?
Allow: /wp-content/uploads
User-agent: TechnoratiBot/8.1
Disallow:
# ia_archiver
User-agent: ia_archiver
Disallow: /
# disable duggmirror
User-agent: duggmirror
Disallow: /
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow: /wp-includes/
Allow: /*
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
Sitemap: http://www.domainname.com/sitemap.xml.gz
Technical SEO | | ideas1230