Is robots.txt a must-have for a well-structured 150-page site?
-
Looking at my logs, I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that lets the bots index the whole site (which they are doing). There are no secret areas I don't want the bots to find; the genuinely private areas are behind a login, so the bots won't see them.
I have used rel="nofollow" on the internal links that point to my login page.
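For example, something like this (the href is just a placeholder for my actual login path):
<a href="/login" rel="nofollow">Log in</a>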
Is there any reason to include a generic robots.txt file that contains "User-agent: *"?
I do have one minor reason: stopping the 404 errors would clean up my error logs so I can spot real issues. But I'm wondering: is having no robots.txt file at all equivalent to having a default blank file (or a one-line file giving all bots full access)?
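In other words, would a minimal "allow everything" file like this (as I understand it, an empty Disallow value blocks nothing) behave any differently from returning a 404?
User-agent: *
Disallow: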
-
Thanks, Keri. No, it's a hand-built blog. No CMS.
I think Googlebot is doing a good job of indexing my site. The site is small, and when I search for my content I do find it in Google. I was pretty sure Google worked the way you describe. So it sounds like sitemaps are an optional hint, and perhaps not needed for relatively small sites (a couple hundred pages of well-linked content). Thanks.
-
The phrase "blog entries" makes me ask are you on a CMS like Wordpress, or are the blog entries pages you are creating from scratch?
If you're on WP or another CMS, you'll want a robots.txt so that your admin, plugin, and other directories aren't crawled. On the plus side, WP (and other CMSs) have plugins that will generate a sitemap.xml file for you as you add pages.
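As a starting point, a WordPress robots.txt often looks something like this (the directory names assume a stock WP install; adjust for your setup):
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/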
Google will find pages even if you don't have a sitemap, or if you forget to add them to it. The sitemap is a way to let Google know what's out there, but it (a) isn't required for Google to index a page and (b) won't force Google to index a page.
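For reference, a sitemap is just an XML list of URLs in this shape (example.com is a placeholder):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/blog/my-first-post</loc>
  </url>
</urlset>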
-
Thanks, Keith. Makes sense.
So how important is an XML sitemap for a 150-page site with clean navigation? As near as I can tell (from the site: command), my whole site is already indexed by Google. Does a sitemap buy me anything? And what happens if my sitemap is partial (i.e., if I forget to add new pages to it but do link to them from my other indexed pages, will the new pages still get indexed)? I'm a little worried about sitemap maintenance as I add new blog entries and so on...
-
Hi Mike...
I am sure you are always going to get a range of opinions on this kind of question.
I think that for your site the answer may simply be that having a robots.txt file is a "belt and braces" safe-harbour-type thing. The same goes for, say, the keywords meta tag: many say these pieces of code are of marginal value, but when you are competing head to head for a #1 listing (i.e., 35%+ of the clicks), you should use every option and weapon available. Furthermore, if your site is likely to grow significantly, or will eventually contain content/files that you want excluded, it's simply a tidy thing to have had in place over time.
Also, don't forget that best practice for robots.txt is to include a pointer to your XML sitemap(s).
Here is an example from one of our sites...
User-agent: *
Disallow: /design_examples.xml
Disallow: /case_studies.xml

User-agent: Googlebot-Image
Disallow: /

Sitemap: http://www.sitetopleveldomain.com/sitemap.xml
In this example, two root files are specifically excluded from all bots, and the site also blocks the Google Images bot entirely. The owners were getting a lot of traffic from image searches and then seeing the same copyrighted images turn up on a hundred junk sites; blocking the bot doesn't stop image scraping, but it certainly makes the images harder to find.
In relation to the “or 1-line file giving all bots all access” part of your question...
Some bots (most notably Google's) now support an additional field called "Allow:".
As the name suggests, "Allow:" lets you explicitly mark specific files/folders as crawlable, typically as an exception to a broader Disallow rule. However, this field is not part of the original robots.txt protocol and so isn't universally supported; my suggestion would be to test it on your site for a week, as it might confuse some less sophisticated crawlers.
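Here's a sketch of how Allow: is typically combined with Disallow: (the paths are hypothetical):
User-agent: Googlebot
Disallow: /private/
Allow: /private/annual-report.html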
So, in summary, my recommendation is to keep a simple robots.txt file, test whether the Allow: field works for you, and make sure the file points to your XML sitemap. Wearing a belt and braces might not be a good look, but at least your pants are unlikely to fall down.