Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file thus allowing googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images - which can/has hurt our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs from Webmaster Tools, what's the best way to check how frequently/ deeply our site is getting crawled?
Thanks all!
-
I did this accidentally as well recently and had 100% of my products disallowed from google shopping within 48 hours. Sounds like it's not an option. They need the crawl your images folder to make sure you have valid images in you product listings.
-
if your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking googlebot from crawling our images at all (through disallowing /img/ and /imgp/ in robots.txt file. We removed this block after recieving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
_Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results. _
While I totally agree that image traffic will not convert like standard traffic, it is free and who knows, we may just pick up a few sales from it. Of course if this comes at the cost of eating up a disproportionate amount of our crawl allowance relative to the value (or avoiding any penalties from Google Product Search) we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google product search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If it is not huge, I do not see it penalizing you. I would make sure all images are named using keywords as crawling pic001.jpg, pic002.jpg, product01.jpg, logo.gif will not do you any good anyway.
Also I find bad linking coming from Google image searches. No one searches to purchase a coffee cup and looks in Google images to do so. Conversely, if someone is searching images of coffee cups to use in whatever, having them click over to your site is a waste of time. They are just going to grab the image and go leaving your metrics a mess.
I hope that helps.
-
It may effect crawl allowance but depends on the size of your site, page rank and trust etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time and and will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you to find out if the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic, or conversions from Google shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats on who is accessing your site, this should tell you what bots are going to your pages when. I don't know what control panel you are using for your site, but if you are using Cpanel, I am sure there are tutorials online to help you find this information.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Size of image for article Schema
Hi, I implemented schema markup for an article and all tested fine and I can see it being fired in preview mode of Google Tag Manager. But when I run the URL which has it applied through Google Structured Testing tool it is not appearing. I have now read that the image needs to be a certain size. For AMP articles this appears to be 12oo pixels wide http://www.thesempost.com/google-changes-image-size-requirements-amp-articles/ But what about non-AMP articles? Does it need to be that big too?
Technical SEO | | AL123al0 -
How google crawls images and which url shows as source?
Hi, I noticed that some websites host their images to a different url than the one their actually website is hosted but in the end google link to the one that the site is hosted. Here is an example: This is a page of a hotel in booking.com: http://www.booking.com/hotel/us/harrah-s-caesars-palace.en-gb.html When I try a search for this hotel in google images it shows up one of the images of the slideshow. When I click on the image on Google search, if I choose the Visit Page button it links to the url above but the actual image is located in a totally different url: http://r-ec.bstatic.com/images/hotel/840x460/135/13526198.jpg My question is can you host your images to one site but show it to another site and in the end google will lead to the second one?
Technical SEO | | Tz_Seo0 -
Do YouTube videos in iFrames get crawled?
There seems to be quite a few articles out there that say iframes cause problems with organic search and that the various bots can't/won't crawl them. Most of the articles are a few years old (including Moz's video sitemap article). I'm wondering if this is still the case with YouTube/Vimeo/etc videos, all of which only offer iFrames as an embed option. I have a hard time believing that a Google property (YT) would offer an embed option that it's own bot couldn't crawl. However, let me know if that is in fact the case. Thanks! Jim
Technical SEO | | DigitalAnarchy0 -
Image Search
Hello Community, I have been reading and researching about image search and trying to find patterns within the results but unfortunately I could not get to a conclusion on 2 matters. Hopefully this community would have the answers I am searching for. 1) Watermarked Images (To remove or not to remove watermark from photos) I see a lot of confusion on this subject and am pretty much confused myself. Although it might be true that watermarked photos do not cause a punishment, it sure does not seem to help. At least in my industry and on a bunch of different random queries I have made, watermarked images are hard to come by on Google's images results. Usually the first results do not have any watermarks. I have read online that Google takes into account user behavior and most users prefer images with no watermark. But again, it is something "I have read online" so I don't have any proof. I would love to have further clarification and, if possible, a definite guide on how to improve my image results. 2) Multiple nested folders (Folder depth) Due to speed concerns our tech guys are using 1 image per folder and created a convoluted folder structure where the photos are actually 9 levels deep. Most of our competition and many small Wordpress blogs outrank us on Google images and on ALL INSTANCES I have checked, their photos are 3, 4 or 5 levels deep. Never inside 9 nested folders.
Technical SEO | | Koki.Mourao
So... A) Should I consider removing the watermark - which is not that intrusive but is visible?
B) Should I try to simplify the folder structure for my photos? Thank you0 -
Should Sitemaps be placed in the sub folder they reference?
I have a sitemap-index.xml file in the root. I then have several sitemaps linked to from the index in example.com/sitemaps/sitemap1.xml, example.com/sitemaps/sitemap2.xml, etc. I have seen on other sites that for example a sitemap containing blogs where the blogs are located at example.com/blog/blog1/ would be located at example.com/blog/sitemap.xml. Is it necessary to have the sitemap located in the same folder like this? I would like to have all sitemaps in a single sitemap folder for convenience but not if it will confuse search engines. My index count for URLs in some sitemaps has dropped dramatically in Google Webmaster Tools over the past month or so and I'm not sure if this is having an effect. If it matters, I have all sitemap files, including the index, listed in the robots.txt file.
Technical SEO | | Giovatto0 -
How to Delete the slug /category/ from wordpress category pages
Hi all, I would like to ask you what's the better way to eliminate the slug /category/ form the wordpress category pages. I need to delete the slug /category/ to make the url seo frendly. The problem is that my site is an old site with the page indexed by Google for a long time. Thanks for your advice.
Technical SEO | | salvyy0 -
/~username
Hello, The utility on this site that crawls your site and highlights what it sees as potential problems reported an issue with /~username access seeing it as duplicate content i.e. mydomain.com/file.htm is the same as mydomain.com~/username/file.htm so I went to my server hosts and they disabled it using mod_userdir but GWT now gives loads of 404 errors. Have I gone about this the wrong way or was it not really a problem in the first place or have I fixed something that wasn't broken and made things worse? Thanks, Ian
Technical SEO | | jwdl0 -
Image Size for SEO
Hi there I have a website which has some png images on pages, around 300kb - is this too much? How many kbs a page, to what extent do you know does Google care about page load speed? is every kb important, is there a limit? Any advice much appreciated.
Technical SEO | | pauledwards0