SEO Best Practices Regarding robots.txt Disallow
-
I cannot find hard-and-fast direction on the following issue:
It looks like the robots.txt file on my server has been set up to disallow "account" and "search" pages within my site (Disallow: /Account/ and Disallow: /?search=), so I am receiving warnings from Google Search Console that URLs are being blocked by robots.txt. Do you recommend unblocking these URLs?
I'm getting a warning that over 18,000 URLs are blocked by robots.txt ("Sitemap contains urls which are blocked by robots.txt"). It seems I wouldn't want that many URLs blocked, would I?
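For reference, the relevant section of the file presumably looks something like this (reconstructed from the directives above; the User-agent line is assumed):

    User-agent: *
    Disallow: /Account/
    Disallow: /?search=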
Thank you!!
-
Mmm, it depends.
It's really hard for me to answer without knowing your site, but I would say you're headed in the right direction. You want to give Google more ways to reach your quality content.
Now, do you have any other pages bringing bots there via normal user navigation, or is it all search-driven?
While Google can crawl pages discovered via internal/external links, it can't reproduce searches by typing into your search box, so I doubt those pages are extremely valuable unless you link to them somehow. In that case you may want to keep Google crawling them.
A different question is whether you want to "index" them: being searches, they are probably aggregating information that is already present elsewhere on the site. For indexation purposes you may want to keep them out of the index while still allowing the bot to run through them.
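A minimal sketch of that crawl-but-don't-index setup: leave the URLs unblocked in robots.txt and add a robots meta tag to the search results template (the template file is yours to locate; the tag itself is standard):

    <!-- In the <head> of the internal search results template -->
    <meta name="robots" content="noindex, follow">

Note this only works if the page is NOT blocked in robots.txt; a blocked page is never fetched, so Google never sees the noindex tag.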
Again, beware of your crawl budget: you don't want Google wandering around millions of search results instead of your money pages, unless you're able to let it crawl only a subportion of those results, as sketched below.
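One hedged way to carve out that subportion, assuming your search URLs use a ?search= query and an &page= parameter for pagination (both parameter names are assumptions; adjust to your actual URL structure):

    User-agent: *
    # Keep low-value account pages blocked
    Disallow: /Account/
    # First-page search results stay crawlable, but deep pagination is blocked
    # ("&page=" is an assumed parameter name)
    Disallow: /*&page=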
I hope this made sense
-
Thank you for your response! I'm going to do a bit more research, but I think I will keep "account" disallowed and unblock "search". The search feature on my site pulls up quality content, so it seems I would want that to be crawled. Does this sound logical to you?
-
That could be completely normal. Google sends a warning because you're giving conflicting directions: you are preventing them from crawling pages (via robots.txt) that you asked them to index (via the sitemap).
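Here is the conflict, illustrated with placeholder values (example.com and the sample query are invented for illustration):

    # robots.txt tells Google "do not crawl this":
    Disallow: /?search=

    <!-- ...while sitemap.xml tells Google "please index this": -->
    <url>
      <loc>https://www.example.com/?search=leather+boots</loc>
    </url>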
They do not know how important those pages may be to you, so you are the one who needs to assess what to do next.
Are those pages important to you? Do you want them in the index? If so, change your robots.txt rule; if not, remove them from the sitemap.
About the previous answer: robots.txt is not used to block hackers, quite the opposite. Hackers can easily see in robots.txt which pages you'd like to block and visit them, as they may be key pages (e.g. wp-admin). But let's not dwell on that; hackers have so many ways to find core pages that it's beside the point. robots.txt is normally used to avoid duplication issues and to keep Google from crawling low-value pages and wasting crawl budget.
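The reason is that the file is public: anyone can fetch /robots.txt on your domain and read exactly which paths you tried to hide. A hypothetical example (wp-admin is from the post above; /private-reports/ is invented for illustration):

    User-agent: *
    # These lines advertise sensitive paths to anyone who reads the file
    Disallow: /wp-admin/
    Disallow: /private-reports/

Real access control belongs in server-side authentication, not in robots.txt.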
-
Typically, you only want robots.txt to block access points that would allow hackers into your site, like an admin page (e.g. www.examplesite.com/admin/). You definitely don't want it blocking your whole site. A developer or webmaster would be better at speaking to the specifics, but that's the quick, high-level answer.