Robots.txt - Googlebot - Allow... what's it for?
-
Hello - I just came across this in a robots.txt file for the first time and was wondering why it is used. Why would you have to proactively tell Googlebot to crawl JS/CSS, and why would you want it to? Any help would be much appreciated - thanks, Luke
User-Agent: Googlebot
Allow: /*.js*
Allow: /*.css*
-
Thanks Tom - that's very useful - appreciated - and thanks also Clever PhD re: the robots.txt tester info - Luke
-
Just as a follow-up to Tom's great post: if you want to test a robots.txt setup, especially one that uses a wildcard or combines an Allow with a Disallow, Google Search Console has a robots.txt Tester under the Crawl section. It shows the most recent copy of your robots.txt file that Google has fetched. You can modify that version and enter a URL at the bottom to see whether it would be allowed or blocked. It is pretty handy, especially if you have a large robots.txt file. Note that the tool does not change how Google crawls your site or alter your live robots.txt file - it is just for testing. Once you find a configuration that works, you still need to update the robots.txt on your server.
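If you also want a quick sanity check from the command line before uploading a new file, a rough sketch like the one below works (the domain is a placeholder, and Python's built-in parser does not understand Google's * wildcards or longest-match precedence, so the Search Console tester remains the authoritative check):

# Rough local sanity check (not a substitute for the Search Console robots.txt Tester):
# fetch the live robots.txt and ask whether specific URLs may be crawled.
# Caveat: urllib.robotparser ignores Google's * wildcards and longest-match
# precedence, so results can differ from Googlebot for wildcard rules.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.example.com/robots.txt")  # placeholder domain
parser.read()  # downloads and parses the file

for url in ("https://www.example.com/", "https://www.example.com/example/page.html"):
    print(url, "->", "allowed" if parser.can_fetch("Googlebot", url) else "blocked")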
-
Hi Luke
As you have correctly assumed, that particular robots.txt rule would be pointless.
Googlebot does follow Allow directives (while some other crawlers do not), but Allow should only be used as an exception to a Disallow rule.
So, for example, if you had a rule that blocked pages within a sub-directory, with:
Disallow: /example/*
You could create an Allow rule that lets a specific page within that directory be crawled, like:
Allow: /example/page.html
A couple of things to point out here. "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule" (Google source). In this example, because the Allow rule is the more specific (longer) one, it will prevail. It is also best practice to put your Allow rules at the top of the robots.txt file.
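To make that Disallow/Allow exception concrete, here is a minimal sketch using Python's standard library with the example rules above (an illustration, not part of the original answer). Note that urllib.robotparser applies the first matching rule and treats wildcards literally, which is one practical reason to list Allow exceptions first; Googlebot itself resolves the conflict by the longest-matching path, as quoted above.

# Minimal sketch: verify the Disallow /example/ + Allow exception locally.
# urllib.robotparser applies the FIRST matching rule in a group and treats *
# literally, so the Allow line is listed first and the trailing wildcard from
# the example above is dropped; Googlebot itself resolves conflicts by the
# longest matching path instead.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /example/page.html
Disallow: /example/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "/example/page.html"))   # True  - the Allow exception
print(parser.can_fetch("Googlebot", "/example/other.html"))  # False - caught by the Disallow
print(parser.can_fetch("Googlebot", "/anything-else.html"))  # True  - crawlable by default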
But in your example, if they have Allow rules for JS and CSS files without having Disallow rules for those directories/paths etc., it's a waste of space. Google will attempt to crawl anything it can by default - unless you disallow access.
TL;DR - You don't need to proactively tell Google to crawl CSS and JS - it will by default.
Hope this helps.