How to Disallow Specific Folders and Sub Folders for Crawling?
-
Today, I have checked indexing for my website in Google. I found very interesting result over there. You can check that result by following result of Google.
I aware about use of robots.txt file and can disallow images folder to solve this issue.
But, It may block my images to get appear in Google image search.
So, How can I fix this issue?
-
You can, but then the content will be removed from Google's index for 90 days. I am not sure what effect this would have on pages with the images. It shouldn't have any effect, but I would hate for you to have rankings in any way affected for 90 days.
I have no experience in having images indexed in this manner. Perhaps someone else has more knowledge to share on this topic.
-
Can I use Remove URL facility from Google webmaster tools?
-
I checked your URL: http://www.lampslightingandmore.com/images/. The folder is now properly restricted and the images can no longer be seen using this method. Going forward, Google will not be able to index new images in the same manner your other images were indexed.
With respect to the images which have been indexed, I am not certain how Google will respond. The image links are still valid so they may keep them. On the other hand, the links are gone so they may remove them. If it were my site, I would wait 30 days to see if Google removed the results.
Another way you can resolve the issue is to change the file path to your images from /images to /image. This will immediately break all the links. You would need to ensure all the links on your site are updated properly. It still may take Google a month to de-index those results but it would certainly happen in that case.
-
I have added Options -Indexes for images folder in htaccess file.
But, I still able to find out images folder in Google indexing.
Can I check? Is it working properly or not? I don't want to index or display images folder in web search any more.
-
I am going to add following code to my htaccess page.
Options -Indexes
Will it work for me or not?
-
If you have a development team, they should instantly understand the problem.
A simple e-mail to any developer
E-mail title: Please fix
http://www.lampslightingandmore.com/images/
That's it. No other text should be needed. A developer should be able to look at the page and understand the index was left open and how to fix it. If you wish to be nicer then a simple "my index is open for the world to see, please don't allow public access to my server folders" should suffice.
-
Yes, I have similar problem with my code structure. Yesterday, I have set Relative path for all URLs. But, I am not sure about replacing of image name in code after make change in folder.
So, I don't want to go with that manner. I also discussed with my development team and recommend to go with htaccess method.
But, give me caution to follow specific method otherwise it may create big issue for crawling or indexing. Right??
-
The link you shared is perfect. Near the top there is a link for OPTIONS. Click on it and you will be on this page: http://httpd.apache.org/docs/1.3/mod/core.html#options
I want to very clearly state you should not make changes to your .htaccess file unless you are comfortable working with code. The slightest mistake and your entire site becomes unavailable. You can also damage the security of your site.
With that said, if you decide to proceed anyway you can add the text I shared to the top of your .htaccess file. You definitely should BACK UP the file before making any changes.
The suggestion vishalkialani made was to rename your /images folder to something else, perhaps /image. The problem is that if your site was not dynamically coded, you would break your image links.
-
In addition to what Ryan mentioned I would rename that folder on your server. That will make google's index outdated and you won't get any visitors on the server
-
I can't getting you.
-
also you can rename it so when google 's index shows up the results you won't get any hits.
if thats what you want.
-
Yes, I checked article to know more about it.
http://httpd.apache.org/docs/1.3/howto/htaccess.html
But, I am not able to find my solution. Can you suggest me specific article which suppose to help me more in same direction?
-
Hello.
You have left your site open in a manner which is not recommended. Please take a look at the following URL: http://www.lampslightingandmore.com/images/. On a properly secured server, you should receive a 404 Page Not Found or Access Denied type of error. Since the folder is left open, a Google crawler found it and you are seeing the results.
The means to secure your site varies based on your software configuration. If you are on an Apache web server (the most common setup) then these settings are controlled by your htaccess file. I am not an htaccess expert but I believe adding the following code to your .htaccess file at the top will fix the issue:
Options -Indexes
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl and Indexation Error - Googlebot can't/doesn't access specific folders on microsites
Hi, My first time posting here, I am just looking for some feedback on a indexation issue we have with a client and any feedback on possible next steps or items I may have overlooked. To give some background, our client operates a website for the core band and a also a number of microsites based on specific business units, so you have corewebsite.com along with bu1.corewebsite.com, bu2.corewebsite.com. The content structure isn't ideal, as each microsite follows a structure of bu1.corewebsite.com/bu1/home.aspx, bu2.corewebsite.com/bu2/home.aspx and so on. In addition to this each microsite has duplicate folders from the other microsites so bu1.corewebsite.com has indexable folders bu1.corewebsite.com/bu1/home.aspx but also bu1.corewebsite.com/bu2/home.aspx the same with bu2.corewebsite.com has bu2.corewebsite.com/bu2/home.aspx but also bu2.corewebsite.com/bu1/home.aspx. Therre are 5 different business units so you have this duplicate content scenario for all microsites. This situation is being addressed in the medium term development roadmap and will be rectified in the next iteration of the site but that is still a ways out. The issue
Intermediate & Advanced SEO | | ImpericMedia
About 6 weeks ago we noticed a drop off in search rankings for two of our microsites (bu1.corewebsite.com and bu2.corewebsite.com) over a period of 2-3 weeks pretty much all our terms dropped out of the rankings and search visibility dropped to essentially 0. I can see that pages from the websites are still indexed but oddly it is the duplicate content pages so (bu1.corewebsite.com/bu3/home.aspx or (bu1.corewebsite.com/bu4/home.aspx is still indexed, similiarly on the bu2.corewebsite microsite bu2.corewebsite.com/bu3/home.aspx and bu4.corewebsite.com/bu3/home.aspx are indexed but no pages from the BU1 or BU2 content directories seem to be indexed under their own microsites. Logging into webmaster tools I can see there is a "Google couldn't crawl your site because we were unable to access your site's robots.txt file." This was a bit odd as there was no robots.txt in the root directory but I got some weird results when I checked the BU1/BU2 microsites in technicalseo.com robots text tool. Also due to the fact that there is a redirect from bu1.corewebsite.com/ to bu1.corewebsite.com/bu4.aspx I thought maybe there could be something there so consequently we removed the redirect and added a basic robots to the root directory for both microsites. After this we saw a small pickup in site visibility, a few terms pop into our Moz campaign rankings but drop out again pretty quickly. Also the error message in GSC persisted. Steps taken so far after that In Google Search Console, I confirmed there are no manual actions against the microsites. Confirmed there is no instances of noindex on any of the pages for BU1/BU2 A number of the main links from the root domain to microsite BU1/BU2 have a rel="noopener noreferrer" attribute but we looked into this and found it has no impact on indexation Looking into this issue we saw some people had similar issues when using Cloudflare but our client doesn't use this service Using a response redirect header tool checker, we noticed a timeout when trying to mimic googlebot accessing the site Following on from point 5 we got a hold of a week of server logs from the client and I can see Googlebot successfully pinging the site and not getting 500 response codes from the server...but couldn't see any instance of it trying to index microsite BU1/BU2 content So it seems to me that the issue could be something server side but I'm at a bit of a loss of next steps to take. Any advice at all is much appreciated!0 -
GG webmaster tool crawl rate
Hello, I just redid my website and suddenly saw a spike in the crawl rate for about 2 weeks and I am now back to where I was on average. Is it normal ? My guess it that this increase was due to the change in site and new links ? However,I just want to make sure it is normal that it is back to a "normal rate" now that it has discovered all the links. Thank you,
Intermediate & Advanced SEO | | seoanalytics0 -
Should you bother disallowing low quality links with brand/non-commercial anchor text?
Hi Guys, Doing a link audit and have come across lots of low quality web directories pointing to the website. Most of the anchor text of these directories are the websites URL and not comercial/keyword focused anchor text. So if thats the case should we even bother doing a link removal request via google webmaster tools for these links, as the anchor text is non-commercial? Cheers.
Intermediate & Advanced SEO | | spyaccounts140 -
How can a Page indexed without crawled?
Hey moz fans,
Intermediate & Advanced SEO | | atakala
In the google getting started guide it says **"
Note: **Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed.
" How can it happen, I dont really get the point.
Thank you0 -
Microsite as a stand-alone site under one domain and sub-domained under another: duplicate content penalty?
We developed and maintain a microsite (example: www.coolprograms.org) for a non-profit that lives outside their main domain name (www.nonprofit-mainsite.org) and features content related to a particular offering of theirs. They are utilizing a Google Grant to run AdWords campaigns related to awareness. They currently drive traffic from the AdWords campaigns to both the microsite (www.coolprograms.org) and their main site (www.nonprofit-mainsite.org). Google recently announced a change in their policy regarding what domains a Google Grant recipient can send traffic to via AdWords: https://support.google.com/nonprofits/answer/1657899?hl=en. The ads must all resolve to one root domain name (nonprofit-mainsite.org). If we were to subdomain the microsite (example: coolprograms.nonprofit-mainsite.org) and keep serving the same content via the microsite domain (www.coolprograms.org) is there a risk of being penalized for duplicate content? Are there other things we should be considering?
Intermediate & Advanced SEO | | marketing-iq0 -
How do I 301 Redirect a complete folder?
Hi, I am want to delete a folder and all the contents. I then need to redirect anyone that is trying to reach a file in that folder to another page on my site. example: www.mydomain.com/folder/ (contains 50 pages) I want to delete the folder and all 50 pages. So if someone tries to reach www.mydomain.com/folder/page1.php the redirect would take them to a specific page on my site. Doing this to clean up old content. How would I do this on the .htaccess? I have redirected a page but not a folder. Thanks in advance! Force7
Intermediate & Advanced SEO | | Force70 -
Canonical tags and GA tracking on premium sub-domain?
Hello! I'm launching a premium service on my site that will deliver two fairly distinct user experiences, but with nearly identical page content across the two. I'm thinking of placing the "upgraded" version on a subdomain, e.g. www.mysite.com, premium.mysite.com. Simple enough. I've run into two obstacles, however: -I don't want the premium site crawled separately, so I'd like to use canonical tags to pull all premium.* back to their www.* parents. --How different can page content be before canonical tags backfire? --Is there any other danger in using canonicals across subdomains like this? -Less importantly: with Google Analytics, if I track against the subdomain my visits will split naturally, and it should generate a second cookie for a new registrant who crosses subdomains. I could also use a visitor-level custom var. Good idea? Bad idea? Thanks! -m
Intermediate & Advanced SEO | | grumbles0 -
Could Sub domains damage our SEO?
Hi there, We're currently looking into integrating a new internal search function to our site which will involve housing the search results on a sub domain of our site. We have no intention of these search result pages becoming landing pages for organic traffic but would the inclusion of a sub domain affect the optimization of the main domain? i.e. could it effect our authority? Nige
Intermediate & Advanced SEO | | NigelJ0