Application & understanding of robots.txt
-
Hello Moz World!
I have been reading up on robots.txt files, and I understand the basics. I am looking for a deeper understanding of when to deploy particular tags, and when a page should be disallowed because it will affect SEO. I have been working with a software company that has a News & Events page which I don't think should be indexed. It changes every week, and is only relevant to potential customers who want to book a demo or attend an event, not so much to search engines. My initial thinking was that I should use a noindex,follow tag on that page. That way the page would not be indexed, but all of its links would still be crawled.
I decided to look at some of our competitors' robots.txt files: Smartbear (https://smartbear.com/robots.txt), b2wsoftware (http://www.b2wsoftware.com/robots.txt) & labtech (http://www.labtechsoftware.com/robots.txt).
I am still confused about what type of tags I should use, and how to gauge which set of tags is best for certain pages. I figured a static page is pretty much always good to index and follow, as long as it's public. And I should always include a sitemap file. But what about a dynamic page? What about pages that are out of date? Will this help with soft 404s?
This is a long one, but I appreciate all of the expert insight. Thanks ahead of time for all of the awesome responses.
Best Regards,
Will H.
-
Yup. Also, don't forget that robots.txt is just a "recommendation" for robots; they are not obliged to obey it.
Basically, Google does whatever it wants to.
Also, if you block a folder so that its inner content won't be "accessed", a page in it can still be indexed if any link points to it, even a link coming from outside of your domain. Its content won't be shown in search results, but the URL will show up with a notice stating that the content is blocked by the site's robots.txt. Best of luck!
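To make the crawl-versus-index point concrete, here is a minimal sketch of how a well-behaved crawler reads a Disallow rule. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs and the `ExampleBot` user agent are all made up for illustration. The rule only answers "may I fetch this URL?". It says nothing about whether a URL that other sites link to ends up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from any real domain).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching. "ExampleBot" is an invented user agent.
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/news-events/")

print(blocked)  # False: the crawler should not fetch this URL
print(allowed)  # True: no rule matches, so fetching is permitted
```

Nothing in this check involves indexing, which is why a blocked URL can still appear in results without a description.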
-
Great Advice Yossi & Chris. Thanks for taking the time to reply. I will have to dig into the Google Guidelines for additional information, but both of your points are valid. I think I was looking at robots.txt the wrong way. Thanks Again Guys!
-
I completely agree with Yossi here; no need to go blocking that page at all.
I can't really add any further value to the points he has covered, but one other part of your question suggested that perhaps you're looking at this the wrong way (and it's very common, don't worry!). Rather than having your site stay as-is and just obscuring the bad parts of it from search engines, the thought process should really be around creating a great website instead.
If you're ever considering blocking a page from search engines, the first step should always be "why am I blocking this page(s); could I just fix the issue instead?".
For example, you asked if this might help with soft 404s. Rather than trying to find a way to hide these soft 404s, spend that time fixing them instead!
-
Hi Will
There are some concerns that you have which I do not understand.
Why do you want to block the News & Events page? If it has unique content and, on top of that, it is updated regularly, you have no reason to block access to the page. If it is "relevant to potential customers who want to book a demo", it's great. I would definitely keep it indexed and followed. Google explicitly states that you should not block access to a page if you simply want to de-index or remove it. If the page should not be indexed publicly, you should remove it or password protect it (a Google suggestion).
About tags, I assume you are talking about meta tags, correct?
There is no need to use any kind of meta tag to signal to search engines that they should index or follow a page; you use one only when you want to stop them from taking certain actions.
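As a rough illustration of that point, the sketch below reads the robots meta tag from a page with Python's standard `html.parser`. The two HTML snippets and the helper name `robots_directives` are invented for this example. A page with no robots meta tag yields no directives, which means nothing is being restricted. Only the page that wants to limit search engines carries a tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise them.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html):
    """Hypothetical helper: return the set of robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical pages for illustration.
untagged_page = "<html><head><title>About us</title></head><body></body></html>"
noindex_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body></body></html>"
)

print(robots_directives(untagged_page))         # set(): no tag, nothing restricted
print(sorted(robots_directives(noindex_page)))  # ['follow', 'noindex']
```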
Also, there is no difference between a static and a dynamic page when it comes to tag usage. There are no rules for that. A page can perfectly well be static for years and still get indexed and rank very well (but, well, we all know that updating the site is a ranking signal).
If you believe that a certain page should be tagged "noindex", it should not be because it has not been updated within the last month or year. Just as an example: contact us pages, about us pages and terms of use pages. These are super static pages that in many cases probably won't be changed for years.
Best,
Yossi