Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file thus allowing googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images - which can/has hurt our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs from Webmaster Tools, what's the best way to check how frequently/ deeply our site is getting crawled?
Thanks all!
-
I did this accidentally as well recently and had 100% of my products disallowed from google shopping within 48 hours. Sounds like it's not an option. They need the crawl your images folder to make sure you have valid images in you product listings.
-
if your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking googlebot from crawling our images at all (through disallowing /img/ and /imgp/ in robots.txt file. We removed this block after recieving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
_Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results. _
While I totally agree that image traffic will not convert like standard traffic, it is free and who knows, we may just pick up a few sales from it. Of course if this comes at the cost of eating up a disproportionate amount of our crawl allowance relative to the value (or avoiding any penalties from Google Product Search) we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google product search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If it is not huge, I do not see it penalizing you. I would make sure all images are named using keywords as crawling pic001.jpg, pic002.jpg, product01.jpg, logo.gif will not do you any good anyway.
Also I find bad linking coming from Google image searches. No one searches to purchase a coffee cup and looks in Google images to do so. Conversely, if someone is searching images of coffee cups to use in whatever, having them click over to your site is a waste of time. They are just going to grab the image and go leaving your metrics a mess.
I hope that helps.
-
It may effect crawl allowance but depends on the size of your site, page rank and trust etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time and and will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you to find out if the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic, or conversions from Google shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats on who is accessing your site, this should tell you what bots are going to your pages when. I don't know what control panel you are using for your site, but if you are using Cpanel, I am sure there are tutorials online to help you find this information.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Cross Domain Canonicalization for Site Folder
Hello colleagues! I have a client who decided to launch a separate domain so they could offer their content translated for other countries. Each country (except the US/English) content lives in its own country folder as follows: client.com/01/02/zh
Technical SEO | | SimpleSearch
client.com/01/02/tw etc. The problem is that they post the content in US/English on this domain too. It does NOT have its own folder, but exists righth after the date (as in the above example) Oh, and the content is the same as on their "main" domain so google likes to index that sometimes vs. the original client on the domain where we want the traffic to go. SO, is there a way to say "hey google, please index the US content only on the main domain, but continue to index the translated content in these folders on this totally separate domain?" Thank you so much in advance.0 -
How to allow bots to crawl all but WP-content
Hello, I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others. User-agent: *
Technical SEO | | Tom3_15
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/ User-agent: GoogleBot
Allow: / User-agent: GoogleBot-Mobile
Allow: / User-agent: GoogleBot-Image
Allow: / User-agent: Bingbot
Allow: / User-agent: Slurp
Allow: /0 -
Sudden decrease in indexed AMP pages after 8/1/16 update
After the AMP update on 8/1/16, the number of AMP pages indexed suddenly dropped by about 50% and it's crushing our search traffic- I haven't been able to find any documentation on any changes to look out for and why we are getting a penalty- any advice or something I should look out for?
Technical SEO | | nystromandy0 -
Title Tag vs. H1 / H2
OK, Title tag, no problem, it's the SEO juice, appears on SERP, etc. Got it. But I'm reading up on H1 and getting conflicting bits of information ... Only use H1 once? H1 is crucial for SERP Use H1s for subheads Google almost never looks past H2 for relevance So say I've got a blog post with three sections ... do I use H1 three times (or does Google think you're playing them ...) Or do I create a "big" H1 subhead and then use H2s? Or just use all H2s because H1s are scary? 🙂 I frequently use subheads, it would seem weird to me to have one a font size bigger than another, but of course I can adjust that in settings ... Thoughts? Lisa
Technical SEO | | ChristianRubio0 -
All of my Image Looks like as a Link
Hey guyz,
Technical SEO | | atakala
I know I ask so many questions these days 😄 but you know ...
Anyway;
I have a website which has only 22 pages and today I try to crawl it with screaming frog and then I face with many image internal link.
But the problem is I dont see any of them when look at the site and source code of the site.
So why does it happen ?
Is there any idea ?
Please share with me.
Thank you .
http://i.imgur.com/dtj7Ws7.png dtj7Ws7.png0 -
Logo / H1 page structure
Not sure why I did it this way, but I need to know if this is bad for SEO - I suspect it is. Consider this is how the Logo and company name is shown on each page of our site. # UpCounsel, Inc Where the #title tag has a background image showing our company logo, rather than the actual company words. Then the CSS is also displacing the company text in favor of the logo, but also helps with accessbility so the first thing they see on the site is the company name. My question is - I should probably not use an H1 tag for this, right? And then in order to change this around, I should use H1's for the important title of the page. I understand there is some question about whether the page <title>and H1 should be the same, but we don't need to go there right now. ;-)</p></title>
Technical SEO | | Mase0 -
/forum/ or /hookah-forum/
I'm building a new website on Hookah.org. It will have a forum and blog. Should I put them in Hookah.org/hookah-forum/ and Hookah.com/hookah-blog/ or Hookah.org/forum and Hookah.org/blog I think /forum/ and /blog/ are easier for users but am not sure how much adding the word hookah helps with SEO.
Technical SEO | | Heydarian0 -
Www/nonwww .co.uk/.com
When I started SEO - I didn't really know what I was doing (still don't!) Just wondering if anyone can help me with this small problem. I now understand that I basically have 4 URLs www.ablemagazine.com (Page Authority: 38/100) www.ablemagazine.co.uk (Page Authority: 47/100) ablemagazine.com (Page Authority: 3/100) ablemagazine.co.uk (Page Authority: 51/100) What should be configuration be to ensure I'm not loosing masses amounts of linkjuice? At the moment I have ablemagazine.co.uk set as my default domain in webmaster tools. www.ablemagazine.com www.ablemagazine.co.uk and ablemagazine.com all 301 redirect here (I think)
Technical SEO | | craven220