How to Hide Directories in Search?
-
I noticed bad 404 error links in Google Webmaster Tools and they were pointing to directories that do not have an actual page, but hold information.
Ex: there are links pointing to our PDF folder which holds all of our pdf documents. If i type in , example.com/pdf/ it brings up a unformated webpage that displays all of our PDF links.
How do I prevent this from happening. Right now I am blocking these in my robots.txt file, but if i type them in, they still appear.
Or should I not worry about this?
-
Yes, a visit to example.com/dir should now return a 404 error (if you haven't done any redirecting/canonicalizing). This will increase your 404 count in Web Master tools but it's far preferable to the alternative. If you're not redirecting the robots.txt will eventually work and hopefully the links will just fall out of WMT.
-
My hosting company turned off directory browsing and now everything is how it should be. So to my understanding, if the server sees a file that does not have a index file, it should not be view able and should be forbidden. This shoujld not affect us from an SEO standpoint should it? My hosting company said they disabled all directories in our site, however everything still works, except for the forbidden file directories.
-
Basically it shouldn't really have an affect; those unformatted file listings are literally the web server automatically saying 'here's the files that are in this folder', there's no meta tags, description, on page elements, etc.
If you have these pages and they're ranking well, you generally don't want them to be. The automatic file browsing pages don't have your name, your company, etc. in them, and they're generally pretty ugly. They also theoretically could be 'stealing' juice from your 'real' pages, if your internal structure isn't flowing relevance properly.
Basically what I'm saying is that if these pages are having some kind of SEO effect, you probably don't want them to be since they're so basic.
Also I can't overstate the security concerns that directory browsing might be introducing. If someone can directory browse to where your code lives (.php, .aspx.vb, whatever) they may be able to read it. Code sometimes has important things like logins, passwords, merchant account ids, etc. in it that you definitely don't want people reading.
-
Agreed with Valerie that step 1 is to turn off those directory listing pages - that can be a security issue and you don't necessarily want people to see/access the whole list. Also, make doubly sure you don't have any internal links to that directory (Google crawled it somehow).
Generally, Robots.txt should prevent crawling, but it's not foolproof, and it's pretty bad about removing pages once they're indexed. If you can block the page from browsing and return a 404 for the root page, that should be fine. The other option would be to have the page removed in Google Webmaster Tools. You could request removal for the entire folder, but I'm guessing that you may want the actual PDFs indexed.
-
Will turning of directory browsing affect Search for all directories?
-
I really don't want to 301 redirect them as they are just holding files. This is happening with my includes file too. that holds our header, footer, navigation etc. I can check with our hosting company to find out.
-
I'd create an index.html for the directory, and then redirect it somewhere. This way, you're capturing the inbound links and then rescuing some of the inbound juice.
Otherwise, you can also check out this post for more info on other solutions and modifying your htaccess file to prevent the directory view - http://perishablepress.com/better-default-directory-views-with-htaccess/
-
Blocking it in robots.txt will work to hide it from search engines.
If you want to hide it from users or people to who type in the url, you can simply drop a blank "index.html" in the /pdf folder.
-
I would suggest 301'ing them to their /index.htm or /pdf.htm equivalents. If you don't know, a 301 is a signal to a web browser (or search crawler) saying "this page has permanently moved, please go to (otherpage.htm) instead".
Here's a good SEOMoz article explaining it a bit more:
http://www.seomoz.org/learn-seo/redirection
What might be more of a concern, is it sounds like your web server has directory browsing enabled. This could be a security issue (depending on your web server setup). Generally you don't want to expose directories if you don't have to because it gives a potential attacker insight into your system setup. Here's an example how to do it in Apache:
www.camelrichard.org/topics/Apache/Turn_OffDirectoryBrowsing
And IIS:
technet.microsoft.com/en-us/library/cc731109(v=ws.10).aspx
If you like I can confirm if you have open directories if you give me the link, either here or through private message.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How get google reviews on search results?
Hi, We have good google reviews. (4,8) Can we get this rating stars also on our organic search results ? Best remco
Intermediate & Advanced SEO | | remcoz0 -
Pages canonicaled to another appearing before the canonical on google searches
Hello, When I do this google search, this page(amandine roses category) appears before the one it is canonical-ed to(this multi-product version of amandine roses). This happens often with this multi-product template, where they don't rank as well as their category version(that are canonical to the multi-product version). Can someone maybe point us in the right direction on what the issue may be? What can be improved?
Intermediate & Advanced SEO | | globalrose.com0 -
Error 404 Search Console
Hi all, We have a number of 404 https status listed in Search Console even treated, not decrease. What happened: We launched a website with the urls www.meusite.com/url-abc. We launched these urls sitemap. Google has indexed. ... For some reason, the urls were changed four days later by some developer in my equipe. So I asked the redirection of URLs "old" already indexed to the new (of: / url-abc to / url-xyz) all correspondingly. I submit the sitemap with new urls. We fixed the internal links. And than marked as fixed in the Search Console. But it does not work! Has anyone had a similar experience? Thanks for any advice!
Intermediate & Advanced SEO | | mobic0 -
Site: inurl: Search
I have a site that allows for multiple filter options and some of these URL's have these have been indexed. I am in the process of adding the noindex, nofollow meta tag to these pages but I want to have an idea of how many of these URL's have been indexed so I can monitor when these have been re crawled and dropped. The structure for these URL's is: http://www.example.co.uk/category/women/shopby/brand1--brand2.html The unique identifier for the multiple filtered URL's is --, however I've tried using site:example.co.uk inurl:-- but this doesn't seem to work. I have also tried using regex but still no success. I was wondering if there is a way around this so I can get a rough idea of how many of these URL's have been indexed? Thanks
Intermediate & Advanced SEO | | GrappleAgency0 -
Will multiple domains from the same company rank for the same keyword search?
I'm trying to convince people that we need good marketing reasons for starting multiple domains, as it will be more difficult to rank multiple sites. Does anyone know if Google actively discourages multiple domains from the same company appearing in the search results for the same keyword? We are creating a separate content website which is related to an existing company website. Would you agree that is best to have these sites on one domain with the content site on a sub-domain perhaps? I'm worried about duplication of effort and cross-keyword targeting in particular. These sites would not have duplicate content.
Intermediate & Advanced SEO | | RG_SEO0 -
I have search result pages that are completely different showing up as duplicate content.
I have numerous instances of this same issue in our Crawl Report. We have pages showing up on the report as duplicate content - they are product search result pages for completely different cruise products showing up as duplicate content. Here's an example of 2 pages that appear as duplicate : http://www.shopforcruises.com/carnival+cruise+lines/carnival+glory/2013-09-01/2013-09-30 http://www.shopforcruises.com/royal+caribbean+international/liberty+of+the+seas We've used Html 5 semantic markup to properly identify our Navigation <nav>, our search widget as an <aside>(it has a large amount of page code associated with it). We're using different meta descriptions, different title tags, even microformatting is done on these pages so our rich data shows up in google search. (rich snippet example - http://www.google.com/#hl=en&output=search&sclient=psy-ab&q=http:%2F%2Fwww.shopforcruises.com%2Froyal%2Bcaribbean%2Binternational%2Fliberty%2Bof%2Bthe%2Bseas&oq=http:%2F%2Fwww.shopforcruises.com%2Froyal%2Bcaribbean%2Binternational%2Fliberty%2Bof%2Bthe%2Bseas&gs_l=hp.3...1102.1102.0.1601.1.1.0.0.0.0.142.142.0j1.1.0...0.0...1c.1.7.psy-ab.gvI6vhnx8fk&pbx=1&bav=on.2,or.r_qf.&bvm=bv.44442042,d.eWU&fp=a03ba540ff93b9f5&biw=1680&bih=925 ) How is this distinctly different content showing as duplicate? Is SeoMoz's site crawl flawed (or just limited) and it's not understanding that my pages are not dupe? Copyscape does not identify these pages as dupe. Should we take these crawl results more seriously than copyscape? What action do you suggest we take? </aside> </nav>
Intermediate & Advanced SEO | | JMFieldMarketing0 -
Yelp, yahoo, local directories..
We are currently in one location and we are moving our office. Would it be better to leave all the directories and add new listings with the new address? This way we get 2 listings until someone in this address claims a business here. or Would it be better to change all the existing to the new address?
Intermediate & Advanced SEO | | SEODinosaur0 -
De-indexing search results noindex, follow or noindex, nofollow
If search results were not originally blocked with robots.txt, and need to be de-indexed, is it better to use noindex, nofollow or noindex, follow?
Intermediate & Advanced SEO | | nicole.healthline0