Robots.txt Blocking - Best Practices
-
Hi All,
We have a web provider who's not willing to remove the wildcard rule blocking all agents from crawling our client's site (User-agent: *, Disallow: /). They have other lines allowing certain bots to crawl the site, but we're wondering if they're missing out on organic traffic by keeping this main blocking rule. It's also a pain because we're unable to set up Moz Pro, potentially because of that first rule.
We've researched and haven't found a ton of best practices regarding blocking all bots, then allowing certain ones. What do you think is a best practice for these files?
Thanks!
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
Crawl-delay: 5

User-agent: Yahoo-slurp
Disallow:

User-agent: bingbot
Disallow:

User-agent: rogerbot
Disallow:

User-agent: *
Crawl-delay: 5
Disallow: /new_vehicle_detail.asp
Disallow: /new_vehicle_compare.asp
Disallow: /news_article.asp
Disallow: /new_model_detail_print.asp
Disallow: /used_bikes/
Disallow: /default.asp?page=xCompareModels
Disallow: /fiche_section_detail.asp
-
Thanks for taking the time to respond in depth, GreenStone. We appreciate the advice and have passed your response along to the web hosting company, along with a frustrated email explaining that they're not adhering to anyone's best practices. Hopefully this will convince them!
-
Thanks, Dmitrii, for your response! Our research turned up similar recommendations, and it helps to have more evidence to back them up. Hopefully these guys will give in a bit!
-
Completely agree. I really wouldn't want to host my stuff with a company that can't figure out what the best practices really are ;-). This lays out very well why you shouldn't want your robots.txt set up the way it is right now.
-
In general, I definitely wouldn't recommend the way the web provider is handling this.
- Disallowing everything and then adding exceptions should never be the norm. Allowing everyone to crawl, and then adding exceptions for specific crawlers other than Google, would generally be best practice.
- It makes a lot more sense to allow crawlers full access, add crawl delays for non-Google crawlers, and disallow only the specific paths you don't want crawled (see the sketch after this list):
  Disallow: /new_vehicle_detail.asp
  Disallow: /new_vehicle_compare.asp
  Disallow: /news_article.asp
  Disallow: /new_model_detail_print.asp
  Disallow: /used_bikes/
  Disallow: /default.asp?page=xCompareModels
  Disallow: /fiche_section_detail.asp
- The Crawl-delay: 5 under User-agent: Googlebot does you no good, because Google ignores the crawl-delay directive. Googlebot's crawl rate can only be adjusted in Search Console.
- You can test what is visible to Googlebot with Search Console's robots.txt tester to verify exactly what it can access (a quick local check with a robots.txt parser also works; see the second sketch below).
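Putting those pieces together, a cleaned-up file along these lines could look roughly like the sketch below. The Disallow paths are copied from the existing file; the 5-second delay is just an illustrative value, and because Google ignores Crawl-delay, this single group throttles only the crawlers that honor that directive:

# Everything is crawlable by default; only the listed paths are excluded.
# Crawlers that honor Crawl-delay are asked to wait 5 seconds between requests.
# Googlebot ignores this directive, so its crawl rate is managed in Search Console.
User-agent: *
Crawl-delay: 5
Disallow: /new_vehicle_detail.asp
Disallow: /new_vehicle_compare.asp
Disallow: /news_article.asp
Disallow: /new_model_detail_print.asp
Disallow: /used_bikes/
Disallow: /default.asp?page=xCompareModels
Disallow: /fiche_section_detail.asp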
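If you want to double-check a proposed file outside of Search Console, here is a minimal sketch using Python's built-in urllib.robotparser to see what a given user agent may fetch. The domain is a placeholder, and the paths are just examples worth spot-checking; note that the standard-library parser doesn't mirror Google's matching rules exactly, so Search Console remains the authority for Googlebot.

from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"  # placeholder; swap in the client's actual domain

rp = RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()  # fetches and parses the live robots.txt

# Homepage plus one of the intentionally blocked paths from the file above.
paths = ["/", "/new_vehicle_detail.asp"]

for agent in ["Googlebot", "bingbot", "rogerbot"]:
    for path in paths:
        allowed = rp.can_fetch(agent, SITE + path)
        print(f"{agent:10} {path:30} allowed={allowed}")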
-
Here is another video from Matt - https://www.youtube.com/watch?v=I2giR-WKUfY
Lots of good points there too.
-
Hi.
Super weird client - that's for sure.
User-agent: *
Disallow: /

That blocks every bot! How in the world are they ranking?
Watch that video; it has good ideas on controlling bots and crawlers, and you can treat those as best practices. And yes, what they have now is ridiculous.
https://moz.com/community/q/should-we-use-google-s-crawl-delay-setting
Here is a Q&A about crawl delays. As far as I know, Google ignores the crawl-delay directive anyway, and there's little to gain from using it.
Hope this helps.