Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file thus allowing googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images - which can/has hurt our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs from Webmaster Tools, what's the best way to check how frequently/ deeply our site is getting crawled?
Thanks all!
-
I did this accidentally as well recently and had 100% of my products disallowed from google shopping within 48 hours. Sounds like it's not an option. They need the crawl your images folder to make sure you have valid images in you product listings.
-
if your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking googlebot from crawling our images at all (through disallowing /img/ and /imgp/ in robots.txt file. We removed this block after recieving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
_Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results. _
While I totally agree that image traffic will not convert like standard traffic, it is free and who knows, we may just pick up a few sales from it. Of course if this comes at the cost of eating up a disproportionate amount of our crawl allowance relative to the value (or avoiding any penalties from Google Product Search) we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google product search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If it is not huge, I do not see it penalizing you. I would make sure all images are named using keywords as crawling pic001.jpg, pic002.jpg, product01.jpg, logo.gif will not do you any good anyway.
Also I find bad linking coming from Google image searches. No one searches to purchase a coffee cup and looks in Google images to do so. Conversely, if someone is searching images of coffee cups to use in whatever, having them click over to your site is a waste of time. They are just going to grab the image and go leaving your metrics a mess.
I hope that helps.
-
It may effect crawl allowance but depends on the size of your site, page rank and trust etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time and and will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you to find out if the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic, or conversions from Google shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats on who is accessing your site, this should tell you what bots are going to your pages when. I don't know what control panel you are using for your site, but if you are using Cpanel, I am sure there are tutorials online to help you find this information.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL structuring / redirect question
Hi there, I have a URL structuring / redirect question. I have many pages on my site but I set each page up to fall under one of two folders as I serve two unique markets and want each side to be indexed properly. I have SIDE A: www.domain/FOLDER-A.com and SIDE B: www.domain/FOLDER-B. The problem is that I have a page for www.domain.com and www.domain/FOLDER-A/page1.com but I do NOT have a page for www.domain/FOLDER-A. The reason for this is that I've opted to make what would be www.domain/FOLDER-A be www.domain.com and act the primary landing page the site. As a result, there is no page located at www.domain/FOLDER-A. My WordPress template (Divi by Elegant Themes) forced me to create a blank page to be able to build off the FOLDER-A framework. My question is that given I am forced to have this blank page, do I leave it be or create a 302 or 307 redirect to www.domain.com? I fear using a 301 redirect given I may want to utilize this page for content at some point in the future. This isn't the easiest post to follow so please let me know if I need to restate the question. Many thanks in advance!
Technical SEO | | KurtWSEO0 -
Site Crawling with Firewall Plugin
Just wondering if anyone has any experience with the WordPress Simple Firewall plugin. I have a client who is concerned about security as they've had issues in that realm in the past and they've since installed this plugin: https://wordpress.org/support/view/plugin-reviews/wp-simple-firewall?filter=4 Problem is, even with a proper robots file and appropriate settings within the firewall, I still cannot crawl the site with site crawler tools. Google seems to be accessing the site fine, but I still wonder if it is in anyway potentially hindering search spiders.
Technical SEO | | BrandishJay0 -
Subdomain/subfolder question
Hi community, Let's say I have a men's/women's clothing website. Would it be better to do clothing.com/mens and clothing.com/womens OR mens.clothing.com and womens.clothing.com? I understand Moz's stance on blogs that it should be clothing.com/blog, but wanted to ask for this different circumstance. Thanks for your help!
Technical SEO | | IceIcebaby0 -
ALT attribute keyword on the same image but different pages
Hi there, As i'm sure you're probably aware, moz advises to use a keyword within the ALT attribute on pages... On a new website I am launching, I have the ability to add an alt keyword to image headers. On multiple pages we have the exact same image but with different keywords associated them inside the alt attribute. The image itself is a collage of different images and so the keywords used can, quite sneakily, match the image. My question is therefore, will using different keywords on the same image on different pages have a negative effect on SEO? Thanks, Stuart
Technical SEO | | Stuart260 -
/~username
Hello, The utility on this site that crawls your site and highlights what it sees as potential problems reported an issue with /~username access seeing it as duplicate content i.e. mydomain.com/file.htm is the same as mydomain.com~/username/file.htm so I went to my server hosts and they disabled it using mod_userdir but GWT now gives loads of 404 errors. Have I gone about this the wrong way or was it not really a problem in the first place or have I fixed something that wasn't broken and made things worse? Thanks, Ian
Technical SEO | | jwdl0 -
Replacing H1's with images
We host a few Japanese sites and Japanese fonts tend to look a bit scruffy the larger they are. I was wondering if image replacement for H1 is risky or not? eg in short... spiders see: Some header text optimized for seo then in the css h1 {
Technical SEO | | -Al-
text-indent: -9999px;
} h1.header_1{ background:url(/images/bg_h1.jpg) no-repeat 0 0; } We are considering this technique, I thought I should get some advise before potentially jeopardising anything, especially as we are dealing with one of the most important on page elements. In my opinion any attempt to hide text could be seen as keyword stuffing, is it a case that in moderation it is acceptable? Cheers0 -
Image optimisation, alt and title tags
Is making the alt and title tags in an image the same bad for seo? Does anyone have any recommendations? any help much appreciated.
Technical SEO | | pauledwards0 -
Crawl Errors and Duplicate Content
SEOmoz's crawl tool is telling me that I have duplicate content at "www.mydomain.com/pricing" and at "www.mydomain.com/pricing.aspx". Do you think this is just a glitch in the crawl tool (because obviously these two URL's are the same page rather than two separate ones) or do you think this is actually an error I need to worry about? Is so, how do I fix it?
Technical SEO | | MyNet0