How to Disallow Specific Folders and Sub Folders for Crawling?
-
Today, I have checked indexing for my website in Google. I found very interesting result over there. You can check that result by following result of Google.
I aware about use of robots.txt file and can disallow images folder to solve this issue.
But, It may block my images to get appear in Google image search.
So, How can I fix this issue?
-
You can, but then the content will be removed from Google's index for 90 days. I am not sure what effect this would have on pages with the images. It shouldn't have any effect, but I would hate for you to have rankings in any way affected for 90 days.
I have no experience in having images indexed in this manner. Perhaps someone else has more knowledge to share on this topic.
-
Can I use Remove URL facility from Google webmaster tools?
-
I checked your URL: http://www.lampslightingandmore.com/images/. The folder is now properly restricted and the images can no longer be seen using this method. Going forward, Google will not be able to index new images in the same manner your other images were indexed.
With respect to the images which have been indexed, I am not certain how Google will respond. The image links are still valid so they may keep them. On the other hand, the links are gone so they may remove them. If it were my site, I would wait 30 days to see if Google removed the results.
Another way you can resolve the issue is to change the file path to your images from /images to /image. This will immediately break all the links. You would need to ensure all the links on your site are updated properly. It still may take Google a month to de-index those results but it would certainly happen in that case.
-
I have added Options -Indexes for images folder in htaccess file.
But, I still able to find out images folder in Google indexing.
Can I check? Is it working properly or not? I don't want to index or display images folder in web search any more.
-
I am going to add following code to my htaccess page.
Options -Indexes
Will it work for me or not?
-
If you have a development team, they should instantly understand the problem.
A simple e-mail to any developer
E-mail title: Please fix
http://www.lampslightingandmore.com/images/
That's it. No other text should be needed. A developer should be able to look at the page and understand the index was left open and how to fix it. If you wish to be nicer then a simple "my index is open for the world to see, please don't allow public access to my server folders" should suffice.
-
Yes, I have similar problem with my code structure. Yesterday, I have set Relative path for all URLs. But, I am not sure about replacing of image name in code after make change in folder.
So, I don't want to go with that manner. I also discussed with my development team and recommend to go with htaccess method.
But, give me caution to follow specific method otherwise it may create big issue for crawling or indexing. Right??
-
The link you shared is perfect. Near the top there is a link for OPTIONS. Click on it and you will be on this page: http://httpd.apache.org/docs/1.3/mod/core.html#options
I want to very clearly state you should not make changes to your .htaccess file unless you are comfortable working with code. The slightest mistake and your entire site becomes unavailable. You can also damage the security of your site.
With that said, if you decide to proceed anyway you can add the text I shared to the top of your .htaccess file. You definitely should BACK UP the file before making any changes.
The suggestion vishalkialani made was to rename your /images folder to something else, perhaps /image. The problem is that if your site was not dynamically coded, you would break your image links.
-
In addition to what Ryan mentioned I would rename that folder on your server. That will make google's index outdated and you won't get any visitors on the server
-
I can't getting you.
-
also you can rename it so when google 's index shows up the results you won't get any hits.
if thats what you want.
-
Yes, I checked article to know more about it.
http://httpd.apache.org/docs/1.3/howto/htaccess.html
But, I am not able to find my solution. Can you suggest me specific article which suppose to help me more in same direction?
-
Hello.
You have left your site open in a manner which is not recommended. Please take a look at the following URL: http://www.lampslightingandmore.com/images/. On a properly secured server, you should receive a 404 Page Not Found or Access Denied type of error. Since the folder is left open, a Google crawler found it and you are seeing the results.
The means to secure your site varies based on your software configuration. If you are on an Apache web server (the most common setup) then these settings are controlled by your htaccess file. I am not an htaccess expert but I believe adding the following code to your .htaccess file at the top will fix the issue:
Options -Indexes
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Structure for geo location for specific page
On hackerearth.com/challenges page, there is an option to select languages. This option is in the footer. Once you select the language the url changes. Ex - if we select French, the URL changes to hackereath.com/fr/challenges. In case we decide to change the URL of this page with Geo, what should be the URL structure which accommodates languages as well. My research says that it would good to keep the url like domainname.com/page/language.
Intermediate & Advanced SEO | | Rajnish_HE0 -
Using "nofollow" internally can help with crawl budget?
Hello everyone. I was reading this article on semrush.com, published the last year, and I'd like to know your thoughts about it: https://www.semrush.com/blog/does-google-crawl-relnofollow-at-all/ Is that really the case? I thought that Google crawls and "follows" nofollowed tagged links even though doesn't pass any PR to the destination link. If instead Google really doesn't crawl internal links tagged as "nofollow", can that really help with crawl budget?
Intermediate & Advanced SEO | | fablau0 -
Dilemma about "images" folder in robots.txt
Hi, Hope you're doing well. I am sure, you guys must be aware that Google has updated their webmaster technical guidelines saying that users should allow access to their css files and java-scripts file if it's possible. Used to be that Google would render the web pages only text based. Now it claims that it can read the css and java-scripts. According to their own terms, not allowing access to the css files can result in sub-optimal rankings. "Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.htmlWe have allowed access to our CSS files. and Google bot, is seeing our webapges more like a normal user would do. (tested it in GWT)Anyhow, this is my dilemma. I am sure lot of other users might be facing the same situation. Like any other e commerce companies/websites.. we have lot of images. Used to be that our css files were inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txtRight now we are blocking images folder, as it is very huge, very heavy, and some of the images are very high res. The reason we are blocking that is because we feel that Google bot might spend almost all of its time trying to crawl that "images" folder only, that it might not have enough time to crawl other important pages. Not to mention, a very heavy server load on Google's and ours. we do have good high quality original pictures. We feel that we are losing potential rankings since we are blocking images. I was thinking to allow ONLY google-image bot, access to it. But I still feel that google might spend lot of time doing that. **I was wondering if Google makes a decision saying, hey let me spend 10 minutes for google image bot, and let me spend 20 minutes for google-mobile bot etc.. or something like that.. , or does it have separate "time spending" allocations for all of it's bot types. I want to unblock the images folder, for now only the google image bot, but at the same time, I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because of having tons & tons of images, and Google spending enough time already just to crawl that folder.**Any advice? recommendations? suggestions? technical guidance? Plan of action? Pretty sure I answered my own question, but I need a confirmation from an Expert, if I am right, saying that allow only Google image access to my images folder. Sincerely,Shaleen Shah
Intermediate & Advanced SEO | | Modbargains1 -
Sub domain vs. sub folder
I know this has probably been asked many times and answered too, but things change a lot, so I would like to know with current search engine algos and co. The scenario is as follows: Building an ecommerce site and also want to incorporate a Q&A section, for support and FAQ's and such. should we go ahead and sub domain this like: community.test.com or rater go with test.com/community. I would really like to know why, why not and maybe some real life examples. Thank you all
Intermediate & Advanced SEO | | s-s0 -
URL Parameter & crawl stats
Hey Guys,I recently used the URL parameter tool in WBT to mark different urls that offers the same content.I have the parameter "?source=site1" , "?source=site2", etc...It looks like this: www.example.com/article/12?source=site1The "source parameter" are feeds that we provide to partner sites and this way we can track the referral site with our internal analytics platform.Although, pages like:www.example.com/article/12?source=site1 have canonical to the original page www.example.com/article/12, Google indexed both of the URLs
Intermediate & Advanced SEO | | Mr.bfz
www.example.com/article/12?source=site1andwww.example.com/article/12Last week I used the URL parameter tool to mark "source" parameter "No, this parameter doesnt effect page content (track usage)" and today I see a 40% decrease in my crawl stats.In one hand, It makes sense that now google is not crawling the repeated urls with different sources but in the other hand I thought that efficient crawlability would increase my crawl stats.In additional, google is still indexing same pages with different source parameters.I would like to know if someone have experienced something similar and by increasing crawl efficiency I should expect my crawl stats to go up or down?I really appreciate all the help!Thanks!0 -
Subdomain or folder for a section not focused on my core business
Hello there, I'm installing your analytics tool and it seems really great. I'm gonna use it for sure but I've a question that is more strategic and it's something the tool can't help me with 😛 I've a website active from 2008 and really well known in my country as a service website... we're like your "advisor" for utilities and insurances. The reason why is "savings" but really focused on utilities (broadband, gas, electricity) and check accounts or insurances. I’ve always used folders in my URLs instead of subdomains (for example www.site.com/section1 or www.site.com/section2 ). In this period I’m planning to open a new website section related to saving but not really close with what we really do in the rest of the website. This section is about coupons, vouchers and little offers. The problem is that with that section I’m going to write really a lot (a lot) of content trying to gain a lot of external links. It’s obvious that I already have a lot of contents about my core business and I’m going to write contents for original categories too. This section is anyway secondary for my business and my worry is that Google can identify me in the future as a website mainly focused on this new product. I’m really well indexed so I don’t want this decision to have any effect on my original situation. Finally the question 😛 Is it better to maintain for this section the same website structure with folders or indentify it as a subdomain to remark that it’s going to be like a totally different site with his dedicated news and all the rest? That’s why I’m evaluating a subdomain but I’m not really convinced cause subdomains can be considered as a different approach compared to original structure and of course using folder can be useful to gain root’s site rank. On the other hand, what can Google think about my core business? Thanks a lot for your help
Intermediate & Advanced SEO | | Uby850 -
Hey guys i have this issues on my crawling report what should i do to exlude the pages? are d
Overly-Dynamic URL Overly-Dynamic URL Although search engines can crawl dynamic URLs, search engine representatives have warned against using over 2 parameters in a given URL. Search engines may also see dynamic versions of the same URL as unique URLs, creating duplicate content.
Intermediate & Advanced SEO | | adulter0 -
Disallowed Pages Still Showing Up in Google Index. What do we do?
We recently disallowed a wide variety of pages for www.udemy.com which we do not want google indexing (e.g., /tags or /lectures). Basically we don't want to spread our link juice around to all these pages that are never going to rank. We want to keep it focused on our core pages which are for our courses. We've added them as disallows in robots.txt, but after 2-3 weeks google is still showing them in it's index. When we lookup "site: udemy.com", for example, Google currently shows ~650,000 pages indexed... when really it should only be showing ~5,000 pages indexed. As another example, if you search for "site:udemy.com/tag", google shows 129,000 results. We've definitely added "/tag" into our robots.txt properly, so this should not be happening... Google showed be showing 0 results. Any ideas re: how we get Google to pay attention and re-index our site properly?
Intermediate & Advanced SEO | | udemy0