Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Blocking Pages Via Robots, Can Images On Those Pages Be Included In Image Search
-
Hi!
I have pages within my forum where visitors can upload photos. When they upload photos they provide a simple statement about the photo but no real information about the image,definitely not enough for the page to be deemed worthy of being indexed. The industry however is one that really leans on images and having the images in Google Image search is important to us.
The url structure is like such: domain.com/community/photos/~username~/picture111111.aspx
I wish to block the whole folder from Googlebot to prevent these low quality pages from being added to Google's main SERP results. This would be something like this:
User-agent: googlebot
Disallow: /community/photos/
Can I disallow Googlebot specifically rather than just using User-agent: * which would then allow googlebot-image to pick up the photos? I plan on configuring a way to add meaningful alt attributes and image names to assist in visibility, but the actual act of blocking the pages and getting the images picked up... Is this possible?
Thanks!
Leona
-
Are you seeing the images getting indexed, though? Even if GWT recognize the Robots.txt directives, blocking the pages may essentially keep the images from having any ranking value. Like Matt, I'm not sure this will work in practice.
Another option would be to create an alternate path to just the images, like an HTML sitemap with just links to those images and decent anchor text. The ranking power still wouldn't be great (you'd have a lot of links on this page, most likely), but it would at least kick the crawlers a bit.
-
Thanks Matt for your time and assistance! Leona
-
Hi Leona - what you have done is something along the lines of what I thought would work for you - sorry if I wasn't clear in my original response - I thought you meant if you created a robots.txt and specified Googlebot to be disallowed then Googlebot-image would pick up the photos still and as I said this wouldn't be the case as it Googlebot-image will follow what it set out for Googlebot unless you specify otherwise using the allow directive as I mentioned. Glad it has worked for you - keep us posted on your results.
-
Hi Matt,
Thanks for your feedback!
It is not my belief that Googlebot overwrides googlebot-images otherwise specifying something for a specific bot of Google's wouldn't work, correct?
I setup the following:
User-agent: googlebot
Disallow: /community/photos/
User-agent: googlebot-Image
Allow: /community/photos/
I tested the results in Google Webmaster Tools which indicated:
Googlebot: Blocked by line 26: Disallow: /community/photos/Detected as a directory; specific files may have different restrictions
Googlebot-Image: Allowed by line 29: Allow: /community/photos/Detected as a directory; specific files may have different restrictions
Thanks for your help!
Leona
-
Hi Leona
Googlebot-image and any of the other bots that Google uses follow the rules set out for Googlebot so blocking Googlebot would block your images as it overrides Googlebot-image. I don't think that there is a way around this using the disallow directive as you are blocking the directory which contains your images so they won't be indexed using specific images.
Something you may want to consider is the Allow directive -
Disallow: /community/photos/
Allow: /community/photos/~username~/
that is if Google is already indexing images under the username path?
The allow directive will only be successful if it contains more or equal number of characters as the disallow path, so bare in mind that if you had the following;
Disallow: /community/photos/
Allow: /community/photos/
the allow will win out and nothing will be blocked. please note that i haven't actioned the allow directive myself but looked into it in depth when i studied the robots.txt for my own sites it would be good if someone else had an experience of this directive. Hope this helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change, no ratings or product reviews so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
Images on their own page?
Hi Mozers, We have images on their own separate pages that are then pulled onto content pages. Should the standalone pages be indexable? On the one hand, it seems good to have an image on it's own page, with it's own title. On the other hand, it may be better SEO for crawler to find the image on a content page dedicated to that topic. Unsure. Would appreciate any guidance! Yael
Intermediate & Advanced SEO | | yaelslater1 -
Switching from Http to Https, but what about images and image link juice?
Hi Ya'll. I'm transitioning our http version website to https. Important question: Do images have to have 301 redirects? If so, how and where? Please send me a link or explain best practices. Best, Shawn
Intermediate & Advanced SEO | | Shawn1241 -
How long to re-index a page after being blocked
Morning all! I am doing some research at the moment and am trying to find out, just roughly, how long you have ever had to wait to have a page re-indexed by Google. For this purpose, say you had blocked a page via meta noindex or disallowed access by robots.txt, and then opened it back up. No right or wrong answers, just after a few numbers 🙂 Cheers, -Andy
Intermediate & Advanced SEO | | Andy.Drinkwater0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Can we retrieve all 404 pages of my site?
Hi, Can we retrieve all 404 pages of my site? is there any syntax i can use in Google search to list just pages that give 404? Tool/Site that can scan all pages in Google Index and give me this report. Thanks
Intermediate & Advanced SEO | | mtthompsons0 -
Should I noindex the site search page? It is generating 4% of my organic traffic.
I read about some recommendations to noindex the URL of the site search.
Intermediate & Advanced SEO | | lcourse
Checked in analytics that site search URL generated about 4% of my total organic search traffic (<2% of sales). My reasoning is that site search may generate duplicated content issues and may prevent the more relevant product or category pages from showing up instead. Would you noindex this page or not? Any thoughts?0 -
Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server, which I did not develop, it was done by the web designer at the company before me. Then there is a word press plugin that generates a robots.txt file. How Do I unblock all the wordpress pages from googlebot?
Intermediate & Advanced SEO | | ENSO0