How to Hide Directories in Search?
-
I noticed bad 404 error links in Google Webmaster Tools and they were pointing to directories that do not have an actual page, but hold information.
Ex: there are links pointing to our PDF folder, which holds all of our PDF documents. If I type in example.com/pdf/, it brings up an unformatted webpage that displays all of our PDF links.
How do I prevent this from happening? Right now I am blocking these in my robots.txt file, but if I type them in, they still appear.
Or should I not worry about this?
-
Yes, a visit to example.com/dir should now return a 404 error (if you haven't done any redirecting/canonicalizing). This will increase your 404 count in Webmaster Tools, but it's far preferable to the alternative. If you're not redirecting, the robots.txt will eventually work and hopefully the links will just fall out of WMT.
-
My hosting company turned off directory browsing and now everything is how it should be. So to my understanding, if the server sees a directory that does not have an index file, it should not be viewable and should be forbidden. This should not affect us from an SEO standpoint, should it? My hosting company said they disabled directory browsing for all directories in our site; everything still works, except that the file directories are now forbidden.
-
Basically it shouldn't really have an effect; those unformatted file listings are literally the web server automatically saying 'here are the files that are in this folder'. There are no meta tags, descriptions, on-page elements, etc.
If you have these pages and they're ranking well, you generally don't want them to be. The automatic file browsing pages don't have your name, your company, etc. in them, and they're generally pretty ugly. They also theoretically could be 'stealing' juice from your 'real' pages, if your internal structure isn't flowing relevance properly.
Basically what I'm saying is that if these pages are having some kind of SEO effect, you probably don't want them to be since they're so basic.
Also I can't overstate the security concerns that directory browsing might be introducing. If someone can directory browse to where your code lives (.php, .aspx.vb, whatever) they may be able to read it. Code sometimes has important things like logins, passwords, merchant account ids, etc. in it that you definitely don't want people reading.
-
Agreed with Valerie that step 1 is to turn off those directory listing pages - that can be a security issue and you don't necessarily want people to see/access the whole list. Also, make doubly sure you don't have any internal links to that directory (Google crawled it somehow).
Generally, robots.txt should prevent crawling, but it's not foolproof, and it's pretty bad about removing pages once they're indexed. If you can block the page from browsing and return a 404 for the root page, that should be fine. The other option would be to have the page removed in Google Webmaster Tools. You could request removal for the entire folder, but I'm guessing that you may want the actual PDFs indexed.
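To see what a folder-level robots.txt rule actually covers, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the `is_crawlable` helper are made up for illustration; only the /pdf/ path comes from the question. It shows that a rule on the folder also covers the PDFs inside it, which matters if you want those indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; not the asker's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/pdf/",
        "http://example.com/pdf/guide.pdf",
        "http://example.com/index.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

Keep in mind this only models what a well-behaved crawler is asked to do. It says nothing about whether a person typing the URL can still see the listing.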
-
Will turning off directory browsing affect search for all directories?
-
I really don't want to 301 redirect them, as they are just holding files. This is happening with my includes folder too, which holds our header, footer, navigation, etc. I can check with our hosting company to find out.
-
I'd create an index.html for the directory, and then redirect it somewhere. This way, you're capturing the inbound links and then rescuing some of the inbound juice.
Otherwise, you can also check out this post for more info on other solutions and modifying your htaccess file to prevent the directory view - http://perishablepress.com/better-default-directory-views-with-htaccess/
-
Blocking it in robots.txt will work to hide it from search engines.
If you want to hide it from users or people who type in the URL, you can simply drop a blank "index.html" in the /pdf folder.
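If there are many folders, the blank index file could be added with a short script. This is only a sketch in Python: the `add_blank_index` helper and the list of index file names are assumptions, and your server may be configured to look for different names.

```python
from pathlib import Path
from typing import List

# Assumed index file names; the server's own configuration decides the real list.
INDEX_NAMES = ("index.html", "index.htm", "index.php")


def add_blank_index(root: Path) -> List[Path]:
    """Create an empty index.html in `root` and every subfolder lacking an index file.

    Returns the paths of the placeholder files that were created.
    """
    # Collect the folders first so newly created files do not affect the walk.
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    created = []
    for folder in folders:
        if not any((folder / name).exists() for name in INDEX_NAMES):
            placeholder = folder / "index.html"
            placeholder.touch()
            created.append(placeholder)
    return created
```

Calling `add_blank_index(Path("/path/to/site"))` on a local copy of the site would leave existing index files alone and only fill the gaps. Turning off directory browsing on the server is still the more thorough fix.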
-
I would suggest 301'ing them to their /index.htm or /pdf.htm equivalents. If you don't know, a 301 is a signal to a web browser (or search crawler) saying "this page has permanently moved, please go to (otherpage.htm) instead".
Here's a good SEOMoz article explaining it a bit more:
http://www.seomoz.org/learn-seo/redirection
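For anyone who wants to see what a 301 looks like on the wire, here is a rough Python sketch using the standard library's `http.server`. The `REDIRECTS` table, `redirect_target`, and `RedirectHandler` are hypothetical names for illustration; on a real site you would normally set this up in the web server's configuration rather than in a script.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical mapping from bare directory URLs to the pages that replace them.
REDIRECTS = {
    "/pdf/": "/pdf.htm",
    "/includes/": "/index.htm",
}


def redirect_target(path: str) -> Optional[str]:
    """Return the new location for a bare directory URL, or None if unmapped."""
    normalized = path if path.endswith("/") else path + "/"
    return REDIRECTS.get(normalized)


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers mapped directory URLs with a permanent redirect."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            # This sketch only knows about redirects; everything else is a 404.
            self.send_error(404)
            return
        self.send_response(301)               # "moved permanently"
        self.send_header("Location", target)  # where the browser or crawler should go
        self.end_headers()
```

To try it locally, you could serve the handler with `http.server.HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()` and request `/pdf/`. The response should carry status 301 and a `Location: /pdf.htm` header.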
What might be more of a concern is that it sounds like your web server has directory browsing enabled. This could be a security issue (depending on your web server setup). Generally you don't want to expose directories if you don't have to, because it gives a potential attacker insight into your system setup. Here's an example of how to turn it off in Apache:
www.camelrichard.org/topics/Apache/Turn_OffDirectoryBrowsing
And IIS:
technet.microsoft.com/en-us/library/cc731109(v=ws.10).aspx
If you like I can confirm if you have open directories if you give me the link, either here or through private message.
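If you'd rather check for open directories yourself, something like the following Python sketch might do. It is a heuristic, not a definitive test: it assumes Apache-style listings, whose page title starts with "Index of /", and the `classify` and `check` helpers are names made up for this example.

```python
import urllib.error
import urllib.request


def classify(status: int, body: str) -> str:
    """Guess from a status code and page body whether a directory listing is exposed."""
    # Assumption: Apache-style auto-generated listings use a title like "Index of /pdf".
    if status == 200 and "<title>index of /" in body.lower():
        return "listing exposed"
    if status in (403, 404):
        return "listing blocked"
    return "no listing detected"


def check(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` and classify the response; error statuses are classified too."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(65536).decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        return classify(err.code, "")
```

Running `check("http://example.com/pdf/")` against each directory you are worried about would give a quick first pass. Other servers format their listings differently, so a "no listing detected" result is not proof that browsing is off.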