Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
SEO Best Practices regarding Robots.txt disallow
-
I cannot find hard and fast direction about the following issue:
It looks like the Robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from the Google Search console that URLs are being blocked by Robots.txt. (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs?
I'm getting a warning that over 18,000 Urls are blocked by robots.txt. ("Sitemap contains urls which are blocked by robots.txt"). Seems that I wouldn't want that many urls blocked. ?
Thank you!!
-
mmm it depends.
it's really hard for me to answer without knowing your site but I would say that you're in the good direction. You want to provide google more ways to reach your quality content.
Now do you have any other page that is bringing bots there via a normal user navigation or is it all search driven?
While google can crawl pages that discovered via internal/external links it can't reproduce searches by typing in your nav bar, so I doubt those pages should be extremely valuable unless you link to them somehow. In that case you may want to keep google crawling them.
A different thing would be if you want to "index" them, as being searches they are probably aggregating different information already present on the site. For indexation purposes you may want to keep them out of the index while still allowing the bot to run through them.
Again beware of the crawl budget, you don't want google to be wandering around millions of search results instead of your money pages, unless you're able to let them crawl only a sub portion of that.
I hope this made sense
-
Thank you for your response! I'm going to do a bit more research but I think I will disallow "account", but unblock "search". The search feature on my site pulls up quality content, so seems like I would want that to be crawled. Does this sound logical to you?
-
That could be completely normal. Google sends a warning because you're giving conflicting directions as you are preventing them to crawl pages (via robots) you asked them to index (via sitemap).
They do not know how important those pages may be for you so you are the one that needs to assess what to do net.
Are those pages important for you? Do you want them to be in the index? if that's the case change your robots.txt rule, if not then remove them from the sitemap.
About the previous answer robots text is not used to block hackers but quite the opposite. Hackers can easily find via the robots txt which are the pages you'd like to block and visit them as they may be key pages (ex. wp-admin), but let's not focus on that as hackers have so many ways to find core pages that it's not the topic. Robots txt is normally used to avoid duplication issues and to prevent google from crawling low value pages and waste crawl budget.
-
Typically, you only want robots.txt to block access points that would allow hackers into your site like an admin page (e.g. www.examplesite.com/admin/). You definitely don't want it blocking your whole site. A developer or webmaster would be better at speaking to the specifics, but that's the quick, high-level answer.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If my website do not have a robot.txt file, does it hurt my website ranking?
After a site audit, I find out that my website don't have a robot.txt. Does it hurt my website rankings? One more thing, when I type mywebsite.com/robot.txt, it automatically redirect to the homepage. Please help!
Intermediate & Advanced SEO | | binhlai0 -
Large robots.txt file
We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Best practice for expandable content
We are in the middle of having new pages added to our website. On our website we will have a information section containing various details about a product, this information will be several paragraphs long. we were wanting to show the first paragraph and have a read more button to show the rest of the content that is hidden. Whats googles view on this, is this bad for seo?
Intermediate & Advanced SEO | | Alexogilvie0 -
Robots.txt, does it need preceding directory structure?
Do you need the entire preceding path in robots.txt for it to match? e.g: I know if i add Disallow: /fish to robots.txt it will block /fish
Intermediate & Advanced SEO | | Milian
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything But would it block?: en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
**en/fish.php?id=anything (taken from Robots.txt Specifications)** I'm hoping it actually wont match, that way writing this particular robots.txt will be much easier! As basically I'm wanting to block many URL that have BTS- in such as: http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob But have other pages that I do not want blocked, in subfolders that also have BTS- in, such as: http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy Thanks for listening0 -
How to Disallow Tag Pages With Robot.txt
Hi i have a site which i'm dealing with that has tag pages for instant - http://www.domain.com/news/?tag=choice How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed. Any suggestions? Cheers, Mark
Intermediate & Advanced SEO | | monster990 -
If you have an unlimited SEO budget, what would you do?
Here's a bit of background information: I've achieved the targets and is now being offered what is essentially an unlimited budget. I have a nice list of ideas but thought I would the brilliant people here at the SEOMOZ community what they would do. So as to promote as much response as possible, I'm going to keep my list to myself for now. And by "SEO", I mean I can do things like content strategy, blogging, infographics, etc. Shoot away!
Intermediate & Advanced SEO | | andrep0 -
Robots.txt & url removal vs. noindex, follow?
When de-indexing pages from google, what are the pros & cons of each of the below two options: robots.txt & requesting url removal from google webmasters Use the noindex, follow meta tag on all doctor profile pages Keep the URLs in the Sitemap file so that Google will recrawl them and find the noindex meta tag make sure that they're not disallowed by the robots.txt file
Intermediate & Advanced SEO | | nicole.healthline0 -
Does capitalization matter for SEO?
Two places capitalization comes into play: (1) on-page use (title, h1, body text, img alt text, etc) (2) external anchor text I didn't think it mattered from Google's point of view for on-page usage (is this correct?) but I notice that OpenSiteExplorer' s 'anchor text distribution' tab shows different counts for the same keyword if it's capitalized in different ways (eg seomoz.org is listed separate from SEOmoz.org). Is that just OSE or does Google treat the keyword/phrase different based on its capitalization, too? And if so, then should I be creating external links to my site with the 'regular' and 'Capitalized' versions of my key phrases?
Intermediate & Advanced SEO | | scanlin1