Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
OK to block /js/ folder using robots.txt?
-
I know Matt Cutts suggests we allow bots to crawl CSS and JavaScript folders (http://www.youtube.com/watch?v=PNEipHjsEPU).
But what if you have lots and lots of JS and you don't want to waste precious crawl resources?
Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc.
And the legacy versions show up in Google Webmaster Tools as 404s. For example:
http://www.discoverafrica.com/js/global_functions.js?v=1.1
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1
Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether?
Isn't that what robots.txt was made for?
Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks.
We're just trying to power our content and UX elegantly with javascript.
What do you guys say:
Obey Matt? Or run the javascript gauntlet?
-
Hey!
So, I listened to Matt's video. I see his point about wanting to crawl the JS files just in case something tricky is going on. Do understand that this is a risk you take. I don't see an issue with blocking crawling of those files from a logical perspective, but if you, or someone who takes over for you in the future, does do something sneaky with JS and you are caught ... plus you have blocked access to the offending files ... it is going to take a lot more work to get back in their good graces.
It's like a cop searching your car. You have every right to ban them from doing so, but if you have nothing to hide, why make trouble? Matt is right: banning crawling of these files is not going to save you much, but if you think it's an issue, feel free. Just know that they might take it as a possible flag in the future.
Kate
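As a rough illustration of the option discussed in this thread, here is a minimal sketch of a robots.txt rule that blocks the /js/ folder, checked locally with Python's standard urllib.robotparser. The robots.txt content and the is_allowed helper are assumptions made for this example, not the site's actual configuration, and Python's parser does plain prefix matching, so treat the result as an approximation of how any particular crawler would behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: blocks only the /js/ folder
# for all crawlers. This is NOT the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
"""


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Two of the versioned script URLs from the question, plus the home page.
urls = [
    "http://www.discoverafrica.com/js/global_functions.js?v=1.1",
    "http://www.discoverafrica.com/js/global.js?v=1.2",
    "http://www.discoverafrica.com/",
]

for url in urls:
    print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Because the rule matches on the path prefix, every ?v= version of a file under /js/ is covered by the single Disallow line, while pages outside that folder stay crawlable.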
-
Harald, it looks like the response you've quoted is from http://groups.google.com/a/googleproductforums.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/9MGYEoROdkg, which is a question about a menu that has javascript. I think this poster has a slightly different question. I'll ask another associate to come on in and take a look.
-
Hi Discover, I think we have all visited web pages that throw a run-time error and ask whether we want to debug. That error message is helpful only to developers, not to users.
I think you should refer to the following link:
The truth about non javascript
I hope the above content helps to answer your query.