Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
SEO Best Practices regarding Robots.txt disallow
-
I cannot find hard and fast direction about the following issue:
It looks like the Robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from the Google Search console that URLs are being blocked by Robots.txt. (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs?
I'm getting a warning that over 18,000 Urls are blocked by robots.txt. ("Sitemap contains urls which are blocked by robots.txt"). Seems that I wouldn't want that many urls blocked. ?
Thank you!!
-
mmm it depends.
it's really hard for me to answer without knowing your site but I would say that you're in the good direction. You want to provide google more ways to reach your quality content.
Now do you have any other page that is bringing bots there via a normal user navigation or is it all search driven?
While google can crawl pages that discovered via internal/external links it can't reproduce searches by typing in your nav bar, so I doubt those pages should be extremely valuable unless you link to them somehow. In that case you may want to keep google crawling them.
A different thing would be if you want to "index" them, as being searches they are probably aggregating different information already present on the site. For indexation purposes you may want to keep them out of the index while still allowing the bot to run through them.
Again beware of the crawl budget, you don't want google to be wandering around millions of search results instead of your money pages, unless you're able to let them crawl only a sub portion of that.
I hope this made sense
-
Thank you for your response! I'm going to do a bit more research but I think I will disallow "account", but unblock "search". The search feature on my site pulls up quality content, so seems like I would want that to be crawled. Does this sound logical to you?
-
That could be completely normal. Google sends a warning because you're giving conflicting directions as you are preventing them to crawl pages (via robots) you asked them to index (via sitemap).
They do not know how important those pages may be for you so you are the one that needs to assess what to do net.
Are those pages important for you? Do you want them to be in the index? if that's the case change your robots.txt rule, if not then remove them from the sitemap.
About the previous answer robots text is not used to block hackers but quite the opposite. Hackers can easily find via the robots txt which are the pages you'd like to block and visit them as they may be key pages (ex. wp-admin), but let's not focus on that as hackers have so many ways to find core pages that it's not the topic. Robots txt is normally used to avoid duplication issues and to prevent google from crawling low value pages and waste crawl budget.
-
Typically, you only want robots.txt to block access points that would allow hackers into your site like an admin page (e.g. www.examplesite.com/admin/). You definitely don't want it blocking your whole site. A developer or webmaster would be better at speaking to the specifics, but that's the quick, high-level answer.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemaps: Best Practice
What should and what shouldn't go in the sitemap? In particular, pages like subscribe to our newsletter/ unsubscribe to our newsletter? Is there really any benefit in highlighting those pages to the SEs? Thanks for any advice/ anecdotes 🙂
Intermediate & Advanced SEO | | Fubra0 -
Best SEO Strategy for Badges & Awards.
Hello Moz Friends! I was wondering what the correct "SEO friendly" strategy is with badges and awards. We recently got BBB accredited and added their badge to the footer of the website. We also added a review badge from shopper approved to the footer. As I'm joining other communities, I see there's badges given to us. For example, Alignable. Great place for networking. They offer a badge that says "locals recommend us" or something. Should I embed these badges onto our website someplace? Should I create a page for just badges or place them in the footer or sidebar widgets? What the best SEO practice for this? Thank you!!
Intermediate & Advanced SEO | | LindsayE2 -
Best-practice URL structures with multiple filter combinations
Hello, We're putting together a large piece of content that will have some interactive filtering elements. There are two types of filters, topics and object types. The architecture under the hood constrains us so that everything needs to be in URL parameters. If someone selects a single filter, this can look pretty clean: www.domain.com/project?topic=firstTopic
Intermediate & Advanced SEO | | digitalcrc
or
www.domain.com/project?object=typeOne The problems arise when people select multiple topics, potentially across two different filter types: www.domain.com/project?topic=firstTopic-secondTopic-thirdTopic&object=typeOne-typeTwo I've raised concerns around the structure in general, but it seems to be too late at this point so now I'm scratching my head thinking of how best to get these indexed. I have two main concerns: A ton of near-duplicate content and hundreds of URLs being created and indexed with various filter combinations added Over-reacting to the first point above and over-canonicalizing/no-indexing combination pages to the detriment of the content as a whole Would the best approach be to index each single topic filter individually, and canonicalize any combinations to the 'view all' page? I don't have much experience with e-commerce SEO (which this problem seems to have the most in common with) so any advice is greatly appreciated. Thanks!0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | | YairSpolter0 -
Best practice for duplicate website content: same root domain name but different extension
Hi there I have a new client who has two websites: http://www.bayofislandsteambuilding.co.nz
Intermediate & Advanced SEO | | turnbullholdingsltd
http://www.bayofislandsteambuilding.org.nz They are the same in every regard apart from the domain extension (.co.nz & .org.nz) which is likely to be causing them issues with Google ranking given the huge amount of duplicate content. What is the best practice approach to fixing this? Normally, if I was starting from scratch, I would set one of the extensions as an alias which redirects to the main domain. Thanks in advance. Laurie0 -
E-commerce site, one product multiple categories best practice
Hi there, We have an e-commerce shopping site with over 8000 products and over 100 categories. Some sub categories belong to multiple categories - for example, A Christmas trees can be under "Gardening > Plants > Trees" and under "Gifts > Holidays > Christmas > Trees" The product itself (example: Scandinavian Xmas Tree) can naturally belong to both these categories as well. Naturally these two (or more) categories have different breadcrumbs, different navigation bars, etc. From an SEO point of view, to avoid duplicate content issues, I see the following options: Use the same URL and change the content of the page (breadcrumbs and menus) based on the referral path. Kind of cloaking. Use the same URL and display only one "main" version of breadcrumbs and menus. Possibly add the other "not main" categories as links to the category / product page. Use a different URL based on where we came from and do nothing (will create essentially the same content on different urls except breadcrumbs and menus - there's a possibiliy to change the category text and page title as well) Use a different URL based on where we came from with different menus and breadcrumbs and use rel=canonical that points to the "main" category / product pages This is a very interesting issue and I would love to hear what you guys think as we are finalizing plans for a new website and would like to get the most out of it. Thank you all!
Intermediate & Advanced SEO | | arikbar0 -
Could you use a robots.txt file to disalow a duplicate content page from being crawled?
A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO. I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution?
Intermediate & Advanced SEO | | gregelwell0