Application & understanding of robots.txt
-
Hello Moz World!
I have been reading up on robots.txt files, and I understand the basics. I am looking for a deeper understanding of when to deploy particular tags, and when a page should be disallowed because it will affect SEO. I have been working with a software company that has a News & Events page which I don't think should be indexed. It changes every week, and is only relevant to potential customers who want to book a demo or attend an event, not so much to search engines. My initial thinking was that I should use a noindex/follow tag on that page, so the page would not be indexed, but all of its links would still be crawled.
I decided to look at some of our competitors' robots.txt files: Smartbear (https://smartbear.com/robots.txt), b2wsoftware (http://www.b2wsoftware.com/robots.txt) and labtech (http://www.labtechsoftware.com/robots.txt).
I am still confused about what type of tags I should use, and how to gauge which set of tags is best for certain pages. I figured a static page is pretty much always good to index and follow, as long as it's public, and that I should always include a sitemap file. But what about a dynamic page? What about pages that are out of date? Will this help with soft 404s?
This is a long one, but I appreciate all of the expert insight. Thanks ahead of time for all of the awesome responses.
Best Regards,
Will H.
-
Yup. Also, don't forget that robots.txt is just a "recommendation" for robots; they are not obliged to obey it.
Basically, Google does whatever it wants to.
Also, if you block a folder so that its content won't be "accessed", any page in it can still be indexed if a link points to it, even a link from outside your domain. The page's content won't be shown in search results, but the URL will show up with a notice stating that the content is blocked by the site's robots.txt. Best of luck!
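To make the "recommendation" point concrete, here is a minimal sketch of the check a polite crawler runs voluntarily before fetching a URL. It uses Python's standard `urllib.robotparser`; the robots.txt content, the `/private/` folder, the example.com URLs and the `may_crawl` helper are all made up for illustration. Nothing here stops a crawler that skips the check, and a disallowed URL can still be indexed from links alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the paths and the sitemap URL
# are assumptions made up for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler asks before fetching; the file itself enforces nothing.
blocked = may_crawl("https://www.example.com/private/report.html")
allowed = may_crawl("https://www.example.com/news-events/")

# Sitemap lines are exposed separately (Python 3.8+).
sitemaps = parser.site_maps()

print(blocked, allowed, sitemaps)
```

Note that this only answers "may I crawl this?". It says nothing about whether the URL ends up in the index, which is why robots.txt is the wrong tool for de-indexing a page.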
-
Great Advice Yossi & Chris. Thanks for taking the time to reply. I will have to dig into the Google Guidelines for additional information, but both of your points are valid. I think I was looking at robots.txt the wrong way. Thanks Again Guys!
-
I completely agree with Yossi here; no need to go blocking that page at all.
I can't really add any further value to the points he has covered, but one other part of your question suggested that perhaps you're looking at this the wrong way (and it's very common, don't worry!). Rather than having your site stay as-is and just obscuring the bad parts of it from search engines, the thought process should really be around creating a great website instead.
If you're ever considering blocking a page from search engines, the first step should always be "why am I blocking this page(s); could I just fix the issue instead?".
For example, you asked if this might help with soft 404s. Rather than trying to find a way to hide these soft 404s, spend that time fixing them instead!
-
Hi Will
There are some concerns that you have which I do not understand.
Why do you want to block the News & Events page? If it has unique content and, on top of that, is updated regularly, you have no reason to block access to it. If it is "relevant to potential customers who want to book a demo", that's great. I would definitely keep it indexed and followed. Google explicitly states that you should not block access to a page if you simply want to de-index or remove it. If the page should not be publicly indexed, you should remove it or password-protect it (a Google suggestion).
About tags, I assume you are talking about meta tags, correct?
There is no need to use any kind of meta tag to signal to search engines that they should index or follow a page; you use them only when you want to stop search engines from taking certain actions.
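As a rough illustration of that default, the sketch below reads the robots meta tag from a page and treats the page as indexable unless it carries `noindex` (or `none`). It uses Python's standard `html.parser`; the `RobotsMetaParser` class, the helper functions and the two sample pages are hypothetical, and real search engines apply more rules than this (for example, bot-specific meta names and HTTP headers).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_indexable(html: str) -> bool:
    """No robots meta tag means index/follow; only noindex or none opts out."""
    return not ({"noindex", "none"} & robots_directives(html))


# Hypothetical pages, made up for illustration.
plain_page = "<html><head><title>About us</title></head><body></body></html>"
noindex_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(is_indexable(plain_page), is_indexable(noindex_page))
```

The page without any tag comes out indexable, which is the point: you only add the tag when you want to restrict something.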
Also, there is no difference between a static and a dynamic page when it comes to tag usage. There are no rules for that. A page can perfectly well be static for years and still get indexed and rank very well (but, well, we all know that updating the site is a ranking signal).
If you believe that a certain page should be tagged "noindex", it should not be because it has not been updated within the last month or year. Just as an example: contact us pages, about us pages and terms of use pages. These are super static pages that in many cases probably won't be changed for years.
Best,
Yossi