Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
SEO Best Practices regarding Robots.txt disallow
-
I cannot find hard and fast direction about the following issue:
It looks like the Robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from the Google Search console that URLs are being blocked by Robots.txt. (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs?
I'm getting a warning that over 18,000 Urls are blocked by robots.txt. ("Sitemap contains urls which are blocked by robots.txt"). Seems that I wouldn't want that many urls blocked. ?
Thank you!!
-
mmm it depends.
it's really hard for me to answer without knowing your site but I would say that you're in the good direction. You want to provide google more ways to reach your quality content.
Now do you have any other page that is bringing bots there via a normal user navigation or is it all search driven?
While google can crawl pages that discovered via internal/external links it can't reproduce searches by typing in your nav bar, so I doubt those pages should be extremely valuable unless you link to them somehow. In that case you may want to keep google crawling them.
A different thing would be if you want to "index" them, as being searches they are probably aggregating different information already present on the site. For indexation purposes you may want to keep them out of the index while still allowing the bot to run through them.
Again beware of the crawl budget, you don't want google to be wandering around millions of search results instead of your money pages, unless you're able to let them crawl only a sub portion of that.
I hope this made sense
-
Thank you for your response! I'm going to do a bit more research but I think I will disallow "account", but unblock "search". The search feature on my site pulls up quality content, so seems like I would want that to be crawled. Does this sound logical to you?
-
That could be completely normal. Google sends a warning because you're giving conflicting directions as you are preventing them to crawl pages (via robots) you asked them to index (via sitemap).
They do not know how important those pages may be for you so you are the one that needs to assess what to do net.
Are those pages important for you? Do you want them to be in the index? if that's the case change your robots.txt rule, if not then remove them from the sitemap.
About the previous answer robots text is not used to block hackers but quite the opposite. Hackers can easily find via the robots txt which are the pages you'd like to block and visit them as they may be key pages (ex. wp-admin), but let's not focus on that as hackers have so many ways to find core pages that it's not the topic. Robots txt is normally used to avoid duplication issues and to prevent google from crawling low value pages and waste crawl budget.
-
Typically, you only want robots.txt to block access points that would allow hackers into your site like an admin page (e.g. www.examplesite.com/admin/). You definitely don't want it blocking your whole site. A developer or webmaster would be better at speaking to the specifics, but that's the quick, high-level answer.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Faceted Navigation URLs Best Practices
Hi, We are developing new Products Pages with faceted filters. You can see it here: https://www.viatrading.com/wholesale-products/ We have a feature allowing to Order By and Group By, which alters the order of all products. There will also be the option to view Products as a table, which will contain same products but with different design and maybe slightly different content of each product. All this will happen without changing the URL, https://www.viatrading.com/all/ Is this the best practice? Thanks,
Intermediate & Advanced SEO | | viatrading10 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
Onsite SEO vs Offsite SEO
Hey I know the importance of both onsite & offsite, primarily with regard to outreach/content/social. One thing I am trying to determine at the moment, is how much do I invest in offsite. My current focus is to improve our onpage content on product pages, which is taking some time as we have a small team. But I also know our backlinks need to improve. I'm just struggling on where to spend my time. Finish the onsite stuff by section first, or try to do a bit of both onsite/offsite at the same time?
Intermediate & Advanced SEO | | BeckyKey1 -
Image URLs - best practice
Hi - I'm assuming image URL best practice follows same principles as non image URLs (not too many files and so on) - I notice alot of web devs putting photos in subdomains, so wonder if I'm missing something (I usually avoid subdomains like the plague)!
Intermediate & Advanced SEO | | McTaggart1 -
Best Practices for Homepage Title Tag
Hi, I would like to know if there is any update about the best practices for the homepage title tag. I mean, a couple of years ago, it was still working placing main keywords in the homepage title tag. But since the last google SERP update, the number of characters that are being shown were reduced, and now we try to work with 55 and 56 characters. That has reduced our capacity of including many keywords on the title tag. Besides, search engines are smarter now to choose the correct inner page to show in SERP. But I am wondering if the Homepage Title should have a branded orientation or should include main keywords, cause it is still working that strategy. I would appreciatte any update in this issue. Thank you!
Intermediate & Advanced SEO | | teconsite0 -
Slug best practices?
Hello, my team is trying to understand how to best construct slugs. We understand they need to be concise and easily understandable, but there seem to be vast differences between the three examples below. Are there reasons why one might be better than the others? http://www.washingtonpost.com/news/morning-mix/wp/2014/06/20/bad-boys-yum-yum-violent-criminal-or-not-this-mans-mugshot-is-heating-up-the-web/ http://hollywoodlife.com/2014/06/20/jeremy-meeks-sexy-mug-shot-felon-viral/ http://www.tmz.com/2014/06/19/mugshot-eyes-felon-sexy/
Intermediate & Advanced SEO | | TheaterMania0 -
Should comments and feeds be disallowed in robots.txt?
Hi My robots file is currently set up as listed below. From an SEO point of view is it good to disallow feeds, rss and comments? I feel allowing comments would be a good thing because it's new content that may rank in the search engines as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly. What's your take? I'm also concerned about the /page being blocked. Not sure how that benefits my blog from an SEO point of view as well. Look forward to your feedback. Thanks. Eddy User-agent: Googlebot Crawl-delay: 10 Allow: /* User-agent: * Crawl-delay: 10 Disallow: /wp- Disallow: /feed/ Disallow: /trackback/ Disallow: /rss/ Disallow: /comments/feed/ Disallow: /page/ Disallow: /date/ Disallow: /comments/ # Allow Everything Allow: /*
Intermediate & Advanced SEO | | workathomecareers0 -
Meta NoIndex tag and Robots Disallow
Hi all, I hope you can spend some time to answer my first of a few questions 🙂 We are running a Magento site - layered/faceted navigation nightmare has created thousands of duplicate URLS! Anyway, during my process to tackle the issue, I disallowed in Robots.txt anything in the querystring that was not a p (allowed this for pagination). After checking some pages in Google, I did a site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original with
Intermediate & Advanced SEO | | bjs2010
"There is no information about this page because it is blocked by robots.txt" So I had added in Meta Noindex, follow on all these duplicates also but I guess it wasnt being read because of Robots.txt. So coming to my question. Did robots.txt block access to these pages? If so, were these already in the index and after disallowing it with robots, Googlebot could not read Meta No index? Does Meta Noindex Follow on pages actually help Googlebot decide to remove these pages from index? I thought Robots would stop and prevent indexation? But I've read this:
"Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”. I'm a bit confused about how to use these in both preventing duplicate content in the first place and then helping to address dupe content once it's already in the index. Thanks! B0