Is robots.txt a must-have for a well-structured 150-page site?
-
Looking through my logs, I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that allows the bots to index the whole site (which they are doing). There are no secret areas I don't want the bots to find (the secret areas are behind a login, so the bots won't see them anyway).
I have used rel=nofollow for internal links that point to my Login page.
Is there any reason to include a generic robots.txt file that contains "User-agent: *"?
I have a minor reason: to stop getting 404 errors and clean up my error logs so I can find other issues that may exist. But I'm wondering: is not having a robots.txt file the same as having a default blank file (or a one-line file giving all bots full access)?
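For context, major crawlers treat a 404 on robots.txt the same as an allow-all file. A quick sketch with Python's standard-library robots.txt parser (the bot name and URL below are made up for illustration) shows that an empty Disallow rule blocks nothing:

```python
from urllib.robotparser import RobotFileParser

# A one-line "allow everything" file: an empty Disallow value blocks nothing.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow:",
])

# Any bot may fetch any URL under these rules.
print(parser.can_fetch("SomeBot", "http://example.com/any/page.html"))  # True
```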
-
Thanks, Keri. No, it's a hand-built blog. No CMS.
I think the Googlebot is doing a good job of indexing my site. The site is small, and when I search for my content I do find it in Google. I was pretty sure that Google worked the way you describe. So it sounds like sitemaps are an optional hint, and perhaps not needed for relatively small sites (a couple hundred pages of well-linked content). Thanks.
-
The phrase "blog entries" makes me ask: are you on a CMS like WordPress, or are the blog entries pages you are creating from scratch?
If you're on WP or another CMS, you'll want a robots.txt so that your admin, plugin, and other directories aren't crawled. On the plus side, WP (and other CMSs) have plugins that will generate a sitemap.xml file for you as you add pages.
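As a rough sketch, a WordPress-style robots.txt (the paths below are the common WP defaults; adjust for your install) might look like:

```text
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
```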
Google will still find pages if you don't have a sitemap, or if you forget to add them to it. The sitemap is a way to let Google know what is out there, but it a) isn't required for Google to index a page and b) won't force Google to index a page.
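If you do add one, a minimal sitemap.xml is just a list of URLs; the domain and dates here are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/blog/my-first-post.html</loc>
    <lastmod>2011-06-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/blog/my-second-post.html</loc>
  </url>
</urlset>
```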
-
Thanks, Keith. Makes sense.
So how important is an XML sitemap for a 150-page site with clean navigation? As near as I can tell (from the site: command), my whole site is already being indexed by Google. Does a sitemap buy me anything? What happens if my sitemap is partial (i.e., if I forget to add new pages to it but do link to them from my other indexed pages, will the new pages still get indexed)? I'm a little worried about sitemap maintenance as I add new blog entries and so on...
-
Hi Mike...
I am sure that you are always going to get a range of opinions on this kind of question.
I think that for your site the answer may simply be that having a robots.txt file is more of a “belt and braces”, safe-harbour-type thing. The same goes for, say, the keywords meta tag: many say these pieces of code are of marginal value, but when you are competing head to head for a #1 listing (i.e. 35%+ of the clicks) you should use every option and weapon possible. Furthermore, if your site is likely to grow significantly or will eventually have content/files that you want excluded, it’s just a “tidy” thing to have had in place over time.
Also, don’t forget that best practice is for your robots.txt file to also point to your XML sitemap(s).
Here is an example from one of our sites...
User-agent: *
Disallow: /design_examples.xml
Disallow: /case_studies.xml

User-agent: Googlebot-Image
Disallow: /

Sitemap: http://www.sitetopleveldomain.com/sitemap.xml
In this example there are two root files specifically excluded from all bots, and this site has also specifically excluded the Google Images bot: they were getting a lot of traffic from image searches and then seeing the same copyrighted images turn up on a hundred junk sites. This doesn’t stop image scraping, but it certainly makes the images harder to find.
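If you want to sanity-check how a file like this behaves, Python's standard-library parser can replay it (the bot name and paths here are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt above, fed to Python's stdlib parser line by line.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /design_examples.xml",
    "Disallow: /case_studies.xml",
    "",
    "User-agent: Googlebot-Image",
    "Disallow: /",
    "",
    "Sitemap: http://www.sitetopleveldomain.com/sitemap.xml",
])

# Generic bots may crawl normal pages, but not the two excluded files.
print(parser.can_fetch("SomeBot", "/index.html"))         # True
print(parser.can_fetch("SomeBot", "/case_studies.xml"))   # False
# Googlebot-Image is shut out of the whole site.
print(parser.can_fetch("Googlebot-Image", "/photos/x.jpg"))  # False
```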
In relation to the “or 1-line file giving all bots all access” part of your question...
Some bots (most notably Google) now support an additional field called "Allow:".
As the name suggests, "Allow:" lets you specifically indicate which files/folders CAN be crawled, typically as exceptions to a broader Disallow rule. However, this field is currently not part of the original "robots.txt" protocol and so is not universally supported; my suggestion would be to test it on your site for a week, as it might confuse some less intelligent crawlers.
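As a sketch of how Allow: interacts with Disallow (paths invented; note that Python's standard-library parser applies rules in file order, while Google uses the most specific match, so the Allow exception is listed first here):

```python
from urllib.robotparser import RobotFileParser

# "Allow:" carves an exception out of a broader "Disallow:" rule.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Allow: /private/brochure.pdf",
    "Disallow: /private/",
])

print(parser.can_fetch("SomeBot", "/private/brochure.pdf"))  # True
print(parser.can_fetch("SomeBot", "/private/secret.html"))   # False
```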
So, in summary, my recommendation is to keep a simple robots.txt file, test whether the Allow: field works for you, and ensure you include that pointer to your XML sitemap – although wearing a belt and braces might not be a good look, at least your pants are unlikely to fall down.