Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file, which allows Googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl our images, which can hurt (and has hurt) our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs in Webmaster Tools, what's the best way to check how frequently and how deeply our site is getting crawled?
Thanks all!
-
I accidentally blocked my images folder recently as well and had 100% of my products disallowed from Google Shopping within 48 hours. Sounds like blocking is not an option. They need to crawl your images folder to make sure you have valid images in your product listings.
-
If your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking Googlebot from crawling our images at all (by disallowing /img/ and /imgp/ in our robots.txt file). We removed this block after receiving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results.
While I totally agree that image traffic will not convert like standard traffic, it is free and, who knows, we may just pick up a few sales from it. Of course, if this comes at the cost of eating up a disproportionate amount of our crawl allowance relative to the value we get (extra traffic, or avoiding any penalties from Google Product Search), we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google product search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
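For anyone who wants to double-check a change like this before relying on it, here is a minimal sketch using Python's standard `urllib.robotparser`. The two rule sets and the example.com URL are assumptions for illustration, not our actual robots.txt; only the /img/ and /imgp/ paths come from our setup.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical "before" rules: both image folders are blocked for all bots.
OLD_RULES = """\
User-agent: *
Disallow: /img/
Disallow: /imgp/
"""

# Hypothetical "after" rules: the two Disallow lines have been removed.
NEW_RULES = """\
User-agent: *
Disallow:
"""

if __name__ == "__main__":
    image_url = "https://www.example.com/img/mug.jpg"  # placeholder URL
    print("old rules allow image:", can_crawl(OLD_RULES, "Googlebot", image_url))
    print("new rules allow image:", can_crawl(NEW_RULES, "Googlebot", image_url))
```

Python's parser follows the basic robots.txt rules and may not match Googlebot's own matching in every edge case, so treat it as a sanity check rather than the final word.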
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If the folder is not huge, I do not see it penalizing you. I would make sure all images are named using keywords, as crawling pic001.jpg, pic002.jpg, product01.jpg, logo.gif will not do you any good anyway.
Also, I find that Google image searches send poor-quality traffic. No one searching to purchase a coffee cup looks in Google Images to do so. Conversely, if someone is searching for images of coffee cups to use somewhere, having them click over to your site is a waste of time. They are just going to grab the image and go, leaving your metrics a mess.
I hope that helps.
-
It may affect crawl allowance, but that depends on the size of your site, its PageRank, trust, etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time, which will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you find out whether the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
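If you have not built an image sitemap before, a rough sketch in Python might look like the following. The function name `build_image_sitemap` and the example.com URLs are made up for illustration; the two namespaces are the standard sitemap namespace and Google's image extension namespace.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: Dict[str, List[str]]) -> str:
    """Build sitemap XML mapping each page URL to the images shown on it.

    Hypothetical helper: `pages` maps a page URL to a list of image URLs.
    """
    # Serialize with a default namespace plus the "image:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image_el = ET.SubElement(url_el, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image_el, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data: one product page with two images.
    print(build_image_sitemap({
        "https://www.example.com/products/mug": [
            "https://www.example.com/img/mug-front.jpg",
            "https://www.example.com/img/mug-side.jpg",
        ],
    }))
```

Splitting the output into one file per site section (for example products versus blog) is what lets you compare submitted versus indexed counts per section in Webmaster Tools.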
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic, or conversions from Google shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats to see who is accessing your site. This should tell you which bots are visiting your pages and when. I don't know what control panel you are using for your site, but if you are using cPanel, I am sure there are tutorials online to help you find this information.
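If you can download the raw access log, a short script will give you more detail than the control panel graphs. This is only a sketch: it assumes the common Apache/Nginx "combined" log format, and the sample lines (IPs, dates, paths) are invented for illustration.

```python
import re
from collections import Counter

# Matches the date, request path, and user agent in a "combined" format line.
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):[^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests per (day, top-level folder)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            path = match.group("path")
            # "/img/mug.jpg" -> "/img"; "/" stays "/"
            section = "/" + path.lstrip("/").split("/", 1)[0]
            hits[(match.group("day"), section)] += 1
    return hits


# Hypothetical sample data, not real log entries.
SAMPLE_LOG = '''\
66.249.66.1 - - [22/Apr/2024:10:15:32 +0000] "GET /img/mug.jpg HTTP/1.1" 200 5120 "-" "Googlebot-Image/1.0"
66.249.66.1 - - [22/Apr/2024:10:15:40 +0000] "GET /img/cup.jpg HTTP/1.1" 200 4096 "-" "Googlebot-Image/1.0"
66.249.66.1 - - [22/Apr/2024:10:16:02 +0000] "GET /products/mug HTTP/1.1" 200 20480 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.9 - - [22/Apr/2024:10:17:11 +0000] "GET /products/mug HTTP/1.1" 200 20480 "-" "Mozilla/5.0 (Windows NT 10.0)"
'''

if __name__ == "__main__":
    for (day, section), count in sorted(googlebot_hits(SAMPLE_LOG.splitlines()).items()):
        print(day, section, count)
```

Comparing the /img and /imgp counts against the counts for your product and category folders, day by day, shows whether image crawling is crowding out page crawling. Keep in mind that user-agent strings can be spoofed, so for anything important verify the requesting IPs with a reverse DNS lookup.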