Application & understanding of robots.txt
-
Hello Moz World!
I have been reading up on robots.txt files, and I understand the basics. I am looking for a deeper understanding on when to deploy particular tags, and when a page should be disallowed because it will affect SEO. I have been working with a software company who has a News & Events page which I don't think should be indexed. It changes every week, and is only relevant to potential customers who want to book a demo or attend an event, not so much search engines. My initial thinking was that I should use noindex/follow tag on that page. So, the pages would not be indexed, but all the links will be crawled.
I decided to look at some of our competitors robots.txt files. Smartbear (https://smartbear.com/robots.txt), b2wsoftware (http://www.b2wsoftware.com/robots.txt) & labtech (http://www.labtechsoftware.com/robots.txt).
I am still confused on what type of tags I should use, and how to gauge which set of tags is best for certain pages. I figured a static page is pretty much always good to index and follow, as long as it's public. And, I should always include a sitemap file. But, What about a dynamic page? What about pages that are out of date? Will this help with soft 404s?
This is a long one, but I appreciate all of the expert insight. Thanks ahead of time for all of the awesome responses.
Best Regards,
Will H.
-
Yup.. also don't forget that robots.txt is just a "recommendation" for robots. they do not obey it
Basically Google does what ever it wants to
Also if you want to block a folder so its inner content wont be "accessed", in case anylink will point to this page, even if its coming from outside of your domain, it will be indexed.. Although the content of it wont be shown on search results but it will show up with a notice stating that the site content is blocked due to the sites robots.txt..best of luck!
-
Great Advice Yossi & Chris. Thanks for taking the time to reply. I will have to dig into the Google Guidelines for additional information, but both of your points are valid. I think I was looking at robots.txt the wrong way. Thanks Again Guys!
-
I completely agree with Yossi here; no need to go blocking that page at all.
I can't really add any further value to the points he has covered but one other part of your question suggested that perhaps you're looking at this the wrong way (and it's very common, don't worry!). Rather than having your site stay as-is and just obscuring the bad parts of it from search engines, the thought process should really around creating a great website instead.
If you're ever considering blocking a page from search engines, the first step should always be "why am I blocking this page(s); could I just fix the issue instead?".
For example, you asked if this might help with soft 404s. Rather than trying to find a way to hide these soft 404s, spend that time fixing them instead!
-
Hi Will
There are some concerns that you have which I do not understand.
Why you want to block News & Events page? If it has unique content and on top of that if it is updated regularly, you have no reason to block access to the page. If it is "relevant to potential customers who want to book a demo" its great. I would definitely keep it indexed and followed.Google explicitly states that you should not block access to a page if you simply want to de-index it/remove it. If the page should not be indexed publicly you should remove it or password protect it (a google suggestion).
About tags, i assume you are talking about meta tags, correct?
There is no need to use any kind of meta tag to signal search engines that they need to index or follow the page, you use it only when you want to limit them not to take certain actions.
Also there is no difference between a static or dynamic page when it comes to tag usage. There is no rules for that. A page perfectly be static for years and still get indexed and ranked very good. (but, well we all know that updating the site is a ranking signal)
If you believe that certain page should be tagged "noindex" it is not because it is not updated within the last month or year. Just for an example: contact us pages, about us pages and terms of use pages. These are super static pages that in many cases probably wont be changed for years.best
Yossi
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
SEM Rush & Duplicate content
Hi SEMRush is flagging these pages as having duplicate content, but we have rel = next etc implemented: https://www.key.co.uk/en/key/brand/bott https://www.key.co.uk/en/key/brand/bott?page=2 Or is it being flagged as they're just really similar pages?
Intermediate & Advanced SEO | | BeckyKey0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Looking for SEO & design help
Our website is www.mosquitocurtains.com built by an amateur (me). Traffic has been on the decline, slipping from enviable rankings, higher bounce rates. Of course I have a modest budget but can't seem to sift through those that say, "Pay us a bundle and cross your fingers that we're any good."
Intermediate & Advanced SEO | | Kurtyj0 -
What could be the reasons why PA & DA changed
Hi, What could be the reason why PA & DA of the site dropped? Thank you
Intermediate & Advanced SEO | | Webdeal0 -
What should I block with a robots.txt file?
Hi Mozzers, We're having a hard time getting our site indexed, and I have a feeling my dev team may be blocking too much of our site via our robots.txt file. They say they have disallowed php and smarty files. Is there any harm in allowing these pages? Thanks!
Intermediate & Advanced SEO | | Travis-W1 -
Panda/Penguin & more than one services site in niche
Hello, My friend has a personal development training site. I have been advised not to make separate personal coaching sites for the owners of the training sites. Do you have experience that Panda/Penguin could penalize for separate sites in a similar niche? Do you need any more info to give a good response? Thank you.
Intermediate & Advanced SEO | | BobGW0 -
How long will Google take to read my robots.txt after updating?
I updated www.egrecia.es/robots.txt two weeks ago and I still haven't solved Duplicate Title and Content on the website. The Google SERP doesn't show those urls any more but SEOMOZ Crawl Errors nor Google Webmaster Tools recognize the change. How long will it take?
Intermediate & Advanced SEO | | Tintanus0 -
Canonical Fix Value & Pointer To Good Instructions?
Could you tell me whether the "canonical fix" is still a relevant and valuable SEO method? I'm talking about the .htaccess (or ISAPI for Microsoft) level fix to make all of the non-www page URLs on a website redirect to the www. version - so that SEO "value" isn't split between the two. I'm NOT talking about the newer <rel= canonical="" http:="" ...="">tag that goes in the HEAD section on an HTML page - as a fix for some duplicate content issues (I guess). </rel=> I still hear about the latter, but less about the former. But the former is different than the latter right - it doesn't replace it? And I'm not sure if the canonical fix is relevant to a WordPress-based website - are you? Also I can never find any page or article on the Web, etc. that explains clearly how to implement the canonical fix for Apache and Microsoft servers. Could you please point me to one? Thanks in advance!
Intermediate & Advanced SEO | | DenisL0