Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
-
Hi MOZers,
This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers?
A) would Google totally ignore the image and the ALT tags information? OR
B) Google would consider the ALT tags information?
I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently.
Looking forward to all your responses
Malika
-
May I ask why you/your webmaster would have noindexed your images in the first place?
-
-
Hi Malika,
Blocking image directories or images themselves in robots.txt only prevents the image from being added to "image" search results. You will still get the full benefit of the alt text on the page, the image just won't appear in the image results.
How this actually works is the crawler will crawl the site and index all the text and weight (h1, h2, alt etc..) then when the crawler moves to add the image to the search cache it finds it can't access it due to robots.txt and simply ignores it and goes on.This leaves your original text as what is indexed as a search result, and nothing for image results.
If you are using Apache you may want to not use robots.txt as the method of blocking images. I would recommend using the .htaccess file with a code like this...
<filesmatch ".(bmp|gif|jpg|png|tif)$"="">Header set X-Robots-Tag "noindex"</filesmatch>
This is a blanket declaration and would prevent indexing of any images with the noted extensions on your site. This is particularly useful if you have multiple image directories. Further more if there are a few images you want indexed you could pick a particular extension like .jpeg for example (note jpeg not jpg), then just convert those few images and know they will be indexed as they are not in the exclusion list.
Another benefit of handling it this way is if you already have images that are indexed, using the noindex tag will get them out out of the image directory much faster than blocking them. The reason is you are giving Google a new directive which is "noindex", otherwise they will just treat them as inaccessible and move on, leaving any cached version to appear in the directory for some time.
Hope that makes sense and helps,
Don
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt blocked internal resources Wordpress
Hi all, We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one: User-agent: *
Intermediate & Advanced SEO | | Mat_C
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!2 -
Can you index a Google doc?
We have updated and added completely new content to our state pages. Our old state content is sitting in a our Google drive. Can I make these public to get them indexed and provide a link back to our state pages? In theory it sounds like a great link building strategy... TIA!
Intermediate & Advanced SEO | | LindsayE1 -
Can noindexed pages accrue page authority?
My company's site has a large set of pages (tens of thousands) that have very thin or no content. They typically target a single low-competition keyword (and typically rank very well), but the pages have a very high bounce rate and are definitely hurting our domain's overall rankings via Panda (quality ranking). I'm planning on recommending we noindexed these pages temporarily, and reindex each page as resources are able to fill in content. My question is whether an individual page will be able to accrue any page authority for that target term while noindexed. We DO want to rank for all those terms, just not until we have the content to back it up. However, we're in a pretty competitive space up against domains that have been around a lot longer and have higher domain authorities. Like I said, these pages rank well right now, even with thin content. The worry is if we noindex them while we slowly build out content, will our competitors get the edge on those terms (with their subpar but continually available content)? Do you think Google will give us any credit for having had the page all along, just not always indexed?
Intermediate & Advanced SEO | | THandorf0 -
Google indexed wrong pages of my website.
When I google site:www.ayurjeewan.com, after 8 pages, google shows Slider and shop pages. Which I don't want to be indexed. How can I get rid of these pages?
Intermediate & Advanced SEO | | bondhoward0 -
Dev Subdomain Pages Indexed - How to Remove
I own a website (domain.com) and used the subdomain "dev.domain.com" while adding a new section to the site (as a development link). I forgot to block the dev.domain.com in my robots file, and google indexed all of the dev pages (around 100 of them). I blocked the site (dev.domain.com) in robots, and then proceeded to just delete the entire subdomain altogether. It's been about a week now and I still see the subdomain pages indexed on Google. How do I get these pages removed from Google? Are they causing duplicate content/title issues, or does Google know that it's a development subdomain and it's just taking time for them to recognize that I deleted it already?
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
Where to put a page ID in a URL?
Hello, My company is going to change URLs to example.com/category or example.com/product. When we will change the URLs to product or category pages somehow we have to check whether the requested page is from category table in DB or from products table (this gives much speed to page load time). So we have to choose how to make the different product and category pages.
Intermediate & Advanced SEO | | komeksimas
Programmers said that we need to insert id to URL. So the question is: Which is the better way to place an id to an URL? example.com/product-name?id=111 example.com/product-name/111 example.com/product_name-111 Or maybe we should use some other punctuation mark to separate id from product name? p.s. I have read Dynamic URLs vs. static URLs by Google and it still didn't answered which is the best for all of the pages. Somehow others solve this problem by typing only the names to the URL, but could anyone tell what that technology should be?0 -
Soft 404's from pages blocked by robots.txt -- cause for concern?
We're seeing soft 404 errors appear in our google webmaster tools section on pages that are blocked by robots.txt (our search result pages). Should we be concerned? Is there anything we can do about this?
Intermediate & Advanced SEO | | nicole.healthline4 -
Block an entire subdomain with robots.txt?
Is it possible to block an entire subdomain with robots.txt? I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
Intermediate & Advanced SEO | | kylesuss12