Robots.txt best practices & tips
-
Hey,
I was wondering if someone could give me some advice on whether I should block the robots.txt file from the average user (not from googlebot, yandex, etc)?
If so, how would I go about doing this? With .htaccess I'm guessing - but not an expert.
What can people do with the information in the file? Maybe someone can give me some "best practices"? (I have a WordPress-based website)
Thanks in advance!
-
Asking about the ideal configuration for a robots.txt file for WordPress is opening a huge can of worms. There's plenty of discussion and disagreement about exactly what's best, but a lot of it depends on the actual configuration and goals of your own website. That's too long a discussion to get into here, but below is what I can recommend as a pretty basic, failsafe version that should work for most sites:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/

Sitemap: http://www.yoursite.com/sitemap.xml
I always prefer to explicitly declare the location of my site map, even if it's in the default location.
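If you want a quick sanity check that rules like these do what you expect, Python's standard-library robots.txt parser can simulate how a spec-compliant crawler reads them. The URLs below are hypothetical stand-ins for your own site:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt matching the basic rules recommended above.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Backend paths are blocked for all crawlers...
print(parser.can_fetch("*", "http://www.yoursite.com/wp-admin/"))      # False
# ...but ordinary posts and pages remain crawlable.
print(parser.can_fetch("*", "http://www.yoursite.com/my-blog-post/"))  # True
```

Note that `urllib.robotparser` implements only the original robots.txt rules (plain path prefixes), so it can't be used to test the wildcard patterns some search engines support.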
There are other directives you can include, but they depend more on how you have handled other aspects of your website - e.g. trackbacks, comments, search results pages and feeds. This is where things can get murky, as there are multiple ways to accomplish this depending on how your site is optimised, but here's a representative example:
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/*/*
Disallow: /*?*
Disallow: /*?

Sorry I can't be more specific on the above example, but it's where things really come down to how you're managing your specific site, and that's a much bigger discussion. A web search for "best WordPress robots.txt file" will certainly show you the range of opinions on this.
The key thing to remember with a robots.txt file is that it does not cause blocked URLs to be removed from the index; it only stops the crawlers from traversing those pages. It's designed to help the crawlers spend their time on the pages that you have declared useful, instead of wasting it on pages that are more administrative in nature. A crawler has a limited amount of time to spend on your site, and you want it to spend that time looking at the valuable pages, not the backend.
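One corollary worth spelling out: if the goal is to keep a page out of the index entirely, the usual mechanism is a robots meta tag (or an equivalent X-Robots-Tag HTTP header) rather than robots.txt. A generic example, not specific to any one site:

```html
<!-- In the <head> of the page you want kept out of the index.
     The page must NOT be blocked in robots.txt, because a crawler
     that never fetches the page will never see this directive. -->
<meta name="robots" content="noindex, follow">
```

This is also why blocking a page in robots.txt can backfire for privacy purposes: the crawler can't fetch the page, so it can't see any noindex instruction on it.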
Paul
-
Thanks for the detailed answer Paul!
Do you think there is anything I should block for a WordPress website? I blocked /admin.
-
There is really no reason to block the robots.txt file from human users, Jazy. They'll never see it unless they actively go looking for it, and even if they do, it's just directives for where you want the search crawlers to go and where you want them to stay away from.
The only thing a human user will learn from this is which sections of your site you consider nonessential to a search crawler. Even without the robots file, if they were really interested in this information, they could acquire it in other ways.
If you're trying to hide pages on your website that you want to keep private or don't want anyone to know about, robots.txt is the wrong place to do it anyway. (That's handled in .htaccess, which should itself be blocked from human readers.)
There's enough complexity in managing a website already; there's no reason to add more by trying to block your robots file from human users.
Hope that helps?
Paul