Application & understanding of robots.txt
-
Hello Moz World!
I have been reading up on robots.txt files, and I understand the basics. I am looking for a deeper understanding of when to deploy particular tags, and when a page should be disallowed because it will affect SEO. I have been working with a software company that has a News & Events page which I don't think should be indexed. It changes every week and is only relevant to potential customers who want to book a demo or attend an event, not so much to search engines. My initial thinking was to use a noindex,follow meta tag on that page, so the page would not be indexed but all of its links would still be crawled.
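To make sure I'm describing it right, the tag I have in mind would go in the `<head>` of the News & Events page and look something like this (just my understanding, happy to be corrected):

```html
<!-- Keep this page out of the index, but let crawlers
     still follow the links on it -->
<meta name="robots" content="noindex, follow">
```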
I also decided to look at some of our competitors' robots.txt files: Smartbear (https://smartbear.com/robots.txt), b2wsoftware (http://www.b2wsoftware.com/robots.txt) & labtech (http://www.labtechsoftware.com/robots.txt).
I am still confused about what type of tags I should use, and how to gauge which set of tags is best for certain pages. I figure a static page is pretty much always good to index and follow, as long as it's public, and that I should always include a sitemap file. But what about a dynamic page? What about pages that are out of date? Will this help with soft 404s?
This is a long one, but I appreciate all of the expert insight. Thanks ahead of time for all of the awesome responses.
Best Regards,
Will H.
-
Yup.. also don't forget that robots.txt is just a "recommendation" for robots; they are not obliged to obey it.
Basically, Google does whatever it wants to.
Also, if you block a folder so that its inner content won't be "accessed": if any link points to a page inside it, even one coming from outside of your domain, that page can still be indexed. Its content won't be shown in the search results, but the URL will show up with a notice stating that the content is blocked by the site's robots.txt. Best of luck!
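To see what a Disallow rule actually does (and doesn't do), Python's built-in `urllib.robotparser` evaluates robots.txt rules the way a compliant crawler would. The rules and URLs below are hypothetical:

```python
# urllib.robotparser applies robots.txt rules like a well-behaved bot.
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt blocking one folder for all user agents
rules = """\
User-agent: *
Disallow: /news-events/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant bot will not *crawl* URLs under the blocked folder...
print(parser.can_fetch("*", "https://example.com/news-events/demo-day"))  # False
print(parser.can_fetch("*", "https://example.com/pricing"))               # True
# ...but nothing here stops a blocked URL from being *indexed* if an
# outside page links to it: robots.txt controls crawling, not indexing.
```

Note that this only models obedient crawlers; as Yossi says, nothing forces a bot to honor these rules.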
-
Great advice, Yossi & Chris. Thanks for taking the time to reply. I will have to dig into the Google guidelines for additional information, but both of your points are valid. I think I was looking at robots.txt the wrong way. Thanks again, guys!
-
I completely agree with Yossi here; no need to go blocking that page at all.
I can't really add any further value to the points he has covered, but one other part of your question suggested that perhaps you're looking at this the wrong way (and it's very common, don't worry!). Rather than keeping your site as-is and just obscuring the bad parts of it from search engines, the thought process should really be about creating a great website instead.
If you're ever considering blocking a page from search engines, the first step should always be to ask: "why am I blocking this page; could I just fix the issue instead?"
For example, you asked if this might help with soft 404s. Rather than trying to find a way to hide these soft 404s, spend that time fixing them instead!
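A soft 404 is a "page not found" response that returns HTTP 200 instead of a real 404 status, so fixing it means returning the correct status code. Here's a minimal sketch using Python's built-in `http.server`; the page paths are hypothetical:

```python
# Minimal sketch of the soft-404 fix: a missing page should carry a
# real 404 status code, not a 200 with "not found" text in the body.
from http.server import BaseHTTPRequestHandler

KNOWN_PAGES = {"/", "/news-events/"}  # hypothetical site structure

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in KNOWN_PAGES:
            self.send_response(200)          # real page: 200 OK
            self.end_headers()
            self.wfile.write(b"<h1>Welcome</h1>")
        else:
            self.send_response(404)          # missing page: a real 404,
            self.end_headers()               # not a 200 "soft" 404
            self.wfile.write(b"<h1>Page not found</h1>")

    def log_message(self, *args):
        pass  # keep the example quiet
```

However your stack implements it, the principle is the same: search engines trust the status code, not the wording on the page.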
-
Hi Will
There are a few concerns you raised that I don't quite follow.
Why do you want to block the News & Events page? If it has unique content, and on top of that is updated regularly, you have no reason to block access to it. If it is "relevant to potential customers who want to book a demo", that's great. I would definitely keep it indexed and followed.
Google explicitly states that you should not block access to a page if you simply want to de-index or remove it. If the page should not be indexed publicly, you should remove it or password-protect it (a Google suggestion).
About tags, I assume you are talking about meta tags, correct?
There is no need to use any kind of meta tag to signal search engines that they should index or follow a page; you use them only when you want to restrict search engines from taking certain actions.
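In other words, "index, follow" is the default behavior and never needs to be declared; the meta robots tag only earns its place when it restricts something. Some hypothetical examples:

```html
<!-- Meta robots tags are only needed for restrictions: -->
<meta name="robots" content="noindex">            <!-- keep this page out of the index -->
<meta name="robots" content="nofollow">           <!-- don't follow the links on this page -->
<meta name="robots" content="noindex, nofollow">  <!-- both restrictions at once -->
```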
Also, there is no difference between a static and a dynamic page when it comes to tag usage. There are no rules for that. A page can perfectly well stay static for years and still get indexed and ranked very well. (But, well, we all know that updating a site is a ranking signal.)
If you believe a certain page should be tagged "noindex", it is not because it hasn't been updated within the last month or year. Just for example: contact us pages, about us pages, and terms of use pages. These are super static pages that in many cases probably won't be changed for years. Best,
Yossi