Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing CSS & JS Files from Index
-
Hi,
Google has indexed a few .CSS and .JS files that belong to our WordPress plugins and themes. I had them blocked via robots, but realized this doesn't prevent indexation (and can likely hurt us since Google wants to access these files).
I've since removed the robots instructions, submitted a removal request via Search Console, but want to make sure they don't come back.
Is there a way to put a noindex tag within .CSS and .JS files? Or should I do something with .htaccess instead?
-
I figured .htaccess would be the best route. Thank you for researching and confirming. I appreciate it.
-
Hi Tim,
Assigning a noindex tag to these files will not block them, only prevent them from showing in SERPs. This is the intended goal and the reason I deleted my robots.txt file which prevented crawling.
-
There's quite a big difference between crawling directives, which block and indexing directives. This article by (former?) Moz user S_ebastian_ is a good foundation read.
This article at developers.google.com is a good second read. If I'm understanding it right, Google thinks in terms of crawling directives vs indexing / serving directives.
My attempt at <tl rl="">:</tl>
crawling = looking, using in any way :: controlled via robots.txt
indexing / serving = indexing, archiving, displaying snippets in results, etc :: controlled via html meta tags or web server htaccess (or similar for other web servers).
I'm not convinced yet, that asking for noindex via htaccess causes the same sort of grief that deny in robots.txt causes.
-
I would seriously think again when it comes to blocking/no-indexing your CSS and JS files - Google has in the past stated that if they cannot fully render your site properly then this could lead to poorer rankings.
You will also likely get notifications in your Search Console as errors for this too.
Check out this great article from July this year which goes into more details.
-
I haven't encountered undesirable .css or .js indexing myself (yet), but as you surmised, maybe this htaccess directive might be worth trying?
<filesmatch ".(txt|log|xml|css|js)$"="">Header set X-Robots-Tag "noindex"</filesmatch>
Google seems to support it
-
Unless I'm severely misreading the links provided, which I've read before, it seems Google is stating that they read, render, and sometimes index .CSS and .JS files. Here's an article written a week after the second article you posted.
The aforementioned WordPress plugin and theme files hosted on my server are indeed showing up in Google SERPs.
I do not want to prevent Googlebot from reaching these files as they're needed for optimal site performance, but I do want them to be no-indexed. Thus, I don't want robots.txt to prevent crawling, only indexing.
Let me know if I'm misunderstanding.
-
TL;DR - You're hesitated about problem that doesn't exist.
Googlebot doesn't index CSS or JS files. They index text files, HTML, PDF, DOC, XLS, etc. But doesn't index style sheets or javascript files.
All you need in WordPress is to create blank robots.txt file where WP is installed with this content:
User-agent: *
Disallow:
Sitemap: http://site/sitemap-file-name.xmlAnd that's all. This is explain many times:
http://googlewebmastercentral.blogspot.bg/2014/05/understanding-web-pages-better.html
http://googlewebmastercentral.blogspot.bg/2014/10/updating-our-technical-webmaster.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Indexed pages
Just started a site audit and trying to determine the number of pages on a client site and whether there are more pages being indexed than actually exist. I've used four tools and got four very different answers... Google Search Console: 237 indexed pages Google search using site command: 468 results MOZ site crawl: 1013 unique URLs Screaming Frog: 183 page titles, 187 URIs (note this is a free licence, but should cut off at 500) Can anyone shed any light on why they differ so much? And where lies the truth?
Technical SEO | Oct 30, 2016, 3:12 PM | muzzmoz1 -
Z-indexed content
I have some content on a page that I am not using any type of css hiding techniques, but I am using an image with a higher z-index in order to prevent the text from being seen until a user clicks a link to have the content scroll down. Are there any negative repercussions for doing this in regards to SEO?
Technical SEO | Sep 11, 2015, 3:32 PM | cokergroup0 -
Using the Google Remove URL Tool to remove https pages
I have found a way to get a list of 'some' of my 180,000+ garbage URLs now, and I'm going through the tedious task of using the URL removal tool to put them in one at a time. Between that and my robots.txt file and the URL Parameters, I'm hoping to see some change each week. I have noticed when I put URL's starting with https:// in to the removal tool, it adds the http:// main URL at the front. For example, I add to the removal tool:- https://www.mydomain.com/blah.html?search_garbage_url_addition On the confirmation page, the URL actually shows as:- http://www.mydomain.com/https://www.mydomain.com/blah.html?search_garbage_url_addition I don't want to accidentally remove my main URL or cause problems. Is this the right way this should look? AND PART 2 OF MY QUESTION If you see the search description in Google for a page you want removed that says the following in the SERP results, should I still go to the trouble of putting in the removal request? www.domain.com/url.html?xsearch_... A description for this result is not available because of this site's robots.txt – learn more.
Technical SEO | Jul 9, 2014, 6:27 AM | sparrowdog1 -
Remove html file extension and 301 redirects
Hi Recently I ask for some work done on my website from a company, but I am not sure what they've done is right.
Technical SEO | Apr 11, 2013, 10:03 AM | ulefos
What I wanted was html file extensions to be removed like
/ash-logs.html to /ash-logs
also the index.html to www.timports.co.uk
I have done a crawl diagnostics and have duplicate page content and 32 page title duplicates. This is so doing my head in please help This is what is in the .htaccess file <ifmodule pagespeed_module="">ModPagespeed on
ModPagespeedEnableFilters extend_cache,combine_css, collapse_whitespace,move_css_to_head, remove_comments</ifmodule> <ifmodule mod_headers.c="">Header set Connection keep-alive</ifmodule> <ifmodule mod_rewrite.c="">Options +FollowSymLinks -MultiViews</ifmodule> DirectoryIndex index.html RewriteEngine On
# Rewrite valid requests on .html files RewriteCond %{REQUEST_FILENAME}.html -f RewriteRule ^ %{REQUEST_URI}.html?rw=1 [L,QSA]
# Return 404 on direct requests against .html files RewriteCond %{REQUEST_URI} .html$
RewriteCond %{QUERY_STRING} !rw=1 [NC]
RewriteRule ^ - [R=404] AddCharset UTF-8 .html # <filesmatch “.(js|css|html|htm|php|xml|swf|flv|ashx)$”="">#SetOutputFilter DEFLATE #</filesmatch> <ifmodule mod_expires.c="">ExpiresActive On
ExpiresByType image/gif "access plus 1 years"
ExpiresByType image/jpeg "access plus 1 years"
ExpiresByType image/png "access plus 1 years"
ExpiresByType image/x-icon "access plus 1 years"
ExpiresByType image/jpg "access plus 1 years"
ExpiresByType text/css "access 1 years"
ExpiresByType text/x-javascript "access 1 years"
ExpiresByType application/javascript "access 1 years"
ExpiresByType image/x-icon "access 1 years"</ifmodule> <files 403.shtml="">order allow,deny allow from all</files> redirect 301 /PRODUCTS http://www.timports.co.uk/kiln-dried-logs
redirect 301 /kindling_firewood.html http://www.timports.co.uk/kindling-firewood.html
redirect 301 /about_us.html http://www.timports.co.uk/about-us.html
redirect 301 /log_delivery.html http://www.timports.co.uk/log-delivery.html redirect 301 /oak_boards_delivery.html http://www.timports.co.uk/oak-boards-delivery.html
redirect 301 /un_edged_oak_boards.html http://www.timports.co.uk/un-edged-oak-boards.html
redirect 301 /wholesale_logs.html http://www.timports.co.uk/wholesale-logs.html redirect 301 /privacy_policy.html http://www.timports.co.uk/privacy-policy.html redirect 301 /payment_failed.html http://www.timports.co.uk/payment-failed.html redirect 301 /payment_info.html http://www.timports.co.uk/payment-info.html1 -
De-indexed from Google
Hi Search Experts! We are just launching a new site for a client with a completely new URL. The client can not provide any access details for their existing site. Any ideas how can we get the existing site de-indexed from Google? Thanks guys!
Technical SEO | Nov 14, 2012, 2:20 PM | rikmon0 -
Removing URL Parentheses in HTACCESS
Im reworking a website for a client, and their current URLs have parentheses. I'd like to get rid of these, but individual 301 redirects in htaccess is not practical, since the parentheses are located in many URLs. Does anyone know an HTACCESS rule that will simply remove URL parantheses as a 301 redirect?
Technical SEO | Nov 26, 2012, 6:35 PM | JaredMumford0 -
How to tell if PDF content is being indexed?
I've searched extensively for this, but could not find a definitive answer. We recently updated our website and it contains links to about 30 PDF data sheets. I want to determine if the text from these PDFs is being archived by search engines. When I do this search http://bit.ly/rRYJPe (google - site:www.gamma-sci.com and filetype:pdf) I can see that the PDF urls are getting indexed, but does that mean that their content is getting indexed? I have read in other posts/places that if you can copy text from a PDF and paste it that means Google can index the content. When I try this with PDFs from our site I cannot copy text, but I was told that these PDFs were all created from Word docs, so they should be indexable, correct? Since WordPress has you upload PDFs like they are an image could this be causing the problem? Would it make sense to take the time and extract all of the PDF content to html? Thanks for any assistance, this has been driving me crazy.
Technical SEO | Dec 14, 2011, 8:06 PM | zazo0 -
Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
Hi, I come across a tough problem. I am working on an online-store website which contains the functionlaity of viewing products details in .PDF format (by the way, the website is built on Joomla CMS), now when I search my site's name in Google, the SERP simply displays my .PDF files in the first couple positions (shown in normal .PDF files format: [PDF]...)and I cannot find the normal pages there on SERP #1 unless I search the full site domain in Google. I really don't want this! Would you please tell me how to figure the problem out and solve it. I can actually remove the corresponding component (Virtuemart) that are in charge of generating the .PDF files. Now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality, I plan to regenerate a sitemap of my site and submit it to Google, will it be working for me? I really appreciate that if you could help solve this problem. Thanks very much. Sincerely SEOmoz Pro Member
Technical SEO | Apr 1, 2013, 7:15 PM | fugu0