Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Dilemma about "images" folder in robots.txt
-
Hi, Hope you're doing well.
I am sure, you guys must be aware that Google has updated their webmaster technical guidelines saying that users should allow access to their css files and java-scripts file if it's possible. Used to be that Google would render the web pages only text based. Now it claims that it can read the css and java-scripts. According to their own terms, not allowing access to the css files can result in sub-optimal rankings. "Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.htmlWe have allowed access to our CSS files. and Google bot, is seeing our webapges more like a normal user would do. (tested it in GWT)Anyhow, this is my dilemma. I am sure lot of other users might be facing the same situation. Like any other e commerce companies/websites.. we have lot of images. Used to be that our css files were inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txtRight now we are blocking images folder, as it is very huge, very heavy, and some of the images are very high res. The reason we are blocking that is because we feel that Google bot might spend almost all of its time trying to crawl that "images" folder only, that it might not have enough time to crawl other important pages. Not to mention, a very heavy server load on Google's and ours. we do have good high quality original pictures. We feel that we are losing potential rankings since we are blocking images. I was thinking to allow ONLY google-image bot, access to it. But I still feel that google might spend lot of time doing that. **I was wondering if Google makes a decision saying, hey let me spend 10 minutes for google image bot, and let me spend 20 minutes for google-mobile bot etc.. or something like that.. , or does it have separate "time spending" allocations for all of it's bot types. I want to unblock the images folder, for now only the google image bot, but at the same time, I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because of having tons & tons of images, and Google spending enough time already just to crawl that folder.**Any advice? recommendations? suggestions? technical guidance? Plan of action? Pretty sure I answered my own question, but I need a confirmation from an Expert, if I am right, saying that allow only Google image access to my images folder. Sincerely,Shaleen Shah
-
Yup my images send me traffic from Google images on most of my sites and attractive images attract hotlinks as well. At the moment people are hosting their images on a different domain (cdn) and are still being credited with the images but I haven't tried to do that myself ie I don't know if they've set some "ownership" somewhere and somehow.
-
I recommend allowing Google to crawl those images. Google optimizes its crawl rate and once it has done a complete crawl it will understand how often to crawl certain areas of your site. My main concern would be that you are losing potential rankings and indexing from those images - if they are unique and high quality you definitely want them to index the images, understand the file names, and appropriately index them.
I wouldn't be concerned about Google bot eating up your server resources. If it does become a problem, then you can go back and adjust the bot access through the robots.txt, as you've done already. However, I would let them in first and only react if it becomes a problem.
I have tens of thousands of product images accessed by the google bot and it is no concern to my ecommerce company and the server resources. I'm not saying that it can't be a potential problem, but the benefit outweighs the risk of it being one - I choose a reactive stance in this situation.
Closely monitor your Google Webmaster Tools account, watch the crawl rate and statistics, and if it becomes an issue then decide on which image folders should or shouldn't be indexed.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change, no ratings or product reviews so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
Rel="prev" / "next"
Hi guys, The tech department implemented rel="prev" and rel="next" on this website a long time ago.
Intermediate & Advanced SEO | | AdenaSEO
We also added a canonical tag to the 'own' page. We're talking about the following situation: https://bit.ly/2H3HpRD However we still see a situation where a lot of paginated pages are visible in the SERP.
Is this just a case of rel="prev" and "next" being directives to Google?
And in this specific case, Google deciding to not only show the 1st page in the SERP, but still show most of the paginated pages in the SERP? Please let me know, what you think. Regards,
Tom1 -
Baidu Spider appearing on robots.txt
Hi, I'm not too sure what to do about this or what to think of it. This magically appeared in my companies robots.txt file (literally magically appeared/text is below) User-agent: Baiduspider
Intermediate & Advanced SEO | | IceIcebaby
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: / I know that Baidu is the Google of China, but I'm not sure why this would appear in our robots.txt all of a sudden. Should I be worried about a hack? Also, would I want to disallow Baidu from crawling my companies website? Thanks for your help,
-Reed0 -
When is it recommended to use a self referencing rel "canonical"?
In what type of a situation is it the best type of practice to use a self referencing rel "canonical" tag? Are there particular practices to be cautious of when using a self referencing rel "canonical" tag? I see this practice used mainly with larger websites but I can't find any information that really explains when is a good time to make use of this practice for SEO purposes. Appreciate all feedback. Thank you in advance.
Intermediate & Advanced SEO | | SEO_Promenade0 -
Why is "Noindex" better than a "Canonical" for Pagination?
"Noindex" is a suggested pagination technique here: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284, and everyone seems to agree that you shouldn't canonicalize all pages in a series to the first page, but I'd love if someone can explain why "noindex" is better than a canonical?
Intermediate & Advanced SEO | | nicole.healthline0 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to file that the robots.txt file had been updated and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that that vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly has anyone had any experience with this sort of situation and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
Could you use a robots.txt file to disalow a duplicate content page from being crawled?
A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO. I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution?
Intermediate & Advanced SEO | | gregelwell0 -
Google Said "Repeat the search with the omitted results included."
We have some pages targeting the different countries but with the Near to Similar content/products, just distinguished with the country name etc. one of the page was assigned to me for optimizing. two or three Similar pages are ranked with in top 50 for the main keyword. I updated some on page content to make it more distinguish from others. After some link building, I found that this page still not showing in Google result, even I found the following message on the google. "In order to show you the most relevant results, we have omitted some entries very similar to the 698 already displayed.
Intermediate & Advanced SEO | | alexgray
If you like, you can repeat the search with the omitted results included." I clicked to repeat omitted result and found that my targeted url on 450th place in google (before link building this was not) My questions are Is google consider this page low quality or duplicate content? Is there any role of internal linking to give importance a page on other (when they are near to similar)? Like these pages can hurt the whole site rankings? How to handle this issue?0