How to Hide Directories in Search?
-
I noticed bad 404 error links in Google Webmaster Tools, and they were pointing to directories that do not have an actual page but hold information.
Ex: there are links pointing to our PDF folder, which holds all of our PDF documents. If I type in example.com/pdf/, it brings up an unformatted webpage that displays all of our PDF links.
How do I prevent this from happening? Right now I am blocking these in my robots.txt file, but if I type them in, they still appear.
Or should I not worry about this?
-
Yes, a visit to example.com/dir should now return a 404 error (if you haven't done any redirecting/canonicalizing). This will increase your 404 count in Webmaster Tools, but it's far preferable to the alternative. If you're not redirecting, the robots.txt block will eventually work and hopefully the links will just fall out of WMT.
-
My hosting company turned off directory browsing and now everything is how it should be. So to my understanding, if the server sees a directory that does not have an index file, it should not be viewable and should be forbidden. This should not affect us from an SEO standpoint, should it? My hosting company said they disabled browsing for all directories on our site; everything still works, except that the bare directory URLs are now forbidden.
-
Basically it shouldn't really have an effect; those unformatted file listings are literally the web server automatically saying 'here are the files that are in this folder'. There are no meta tags, description, on-page elements, etc.
If you have these pages and they're ranking well, you generally don't want them to be. The automatic file browsing pages don't have your name, your company, etc. in them, and they're generally pretty ugly. They also theoretically could be 'stealing' juice from your 'real' pages, if your internal structure isn't flowing relevance properly.
Basically what I'm saying is that if these pages are having some kind of SEO effect, you probably don't want them to be since they're so basic.
Also I can't overstate the security concerns that directory browsing might be introducing. If someone can directory browse to where your code lives (.php, .aspx.vb, whatever) they may be able to read it. Code sometimes has important things like logins, passwords, merchant account ids, etc. in it that you definitely don't want people reading.
-
Agreed with Valerie that step 1 is to turn off those directory listing pages - that can be a security issue and you don't necessarily want people to see/access the whole list. Also, make doubly sure you don't have any internal links to that directory (Google crawled it somehow).
Generally, Robots.txt should prevent crawling, but it's not foolproof, and it's pretty bad about removing pages once they're indexed. If you can block the page from browsing and return a 404 for the root page, that should be fine. The other option would be to have the page removed in Google Webmaster Tools. You could request removal for the entire folder, but I'm guessing that you may want the actual PDFs indexed.
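To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The two rules in `ROBOTS_LINES` are an assumption about what the asker's file might contain (only the /pdf/ path comes from the thread), and `is_crawl_allowed` is a hypothetical helper name. It only shows what a well-behaved crawler is asked to skip; it says nothing about what a browser can open or what is already indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; only the /pdf/ path comes from the thread.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /pdf/",
]


def is_crawl_allowed(url, user_agent="*"):
    """Return True if the rules above let this user agent crawl the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/pdf/",
        "https://example.com/pdf/report.pdf",
        "https://example.com/index.html",
    ):
        print(url, "crawl allowed:", is_crawl_allowed(url))
```

Note that a rule like this asks crawlers to skip everything under /pdf/, including the PDFs themselves, which is worth keeping in mind if you want those documents indexed.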
-
Will turning off directory browsing affect search for all directories?
-
I really don't want to 301 redirect them, as they are just holding files. This is happening with my includes directory too, which holds our header, footer, navigation, etc. I can check with our hosting company to find out.
-
I'd create an index.html for the directory, and then redirect it somewhere. This way, you're capturing the inbound links and then rescuing some of the inbound juice.
Otherwise, you can also check out this post for more info on other solutions and modifying your htaccess file to prevent the directory view - http://perishablepress.com/better-default-directory-views-with-htaccess/
-
Blocking it in robots.txt will work to hide it from search engines.
If you want to hide it from users or people who type in the URL, you can simply drop a blank "index.html" in the /pdf folder.
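Why the blank index file works can be sketched with a simplified model of how a web server answers a request for a directory URL. This is an illustration only, not any particular server's code: `directory_response` and `INDEX_NAMES` are hypothetical names, and the index file names are assumed defaults that real servers let you configure.

```python
import os
import tempfile

# Assumed default index file names; real servers make this configurable.
INDEX_NAMES = ("index.html", "index.htm")


def directory_response(dir_path, listing_enabled):
    """Simplified model of the answer to a request for a directory URL."""
    # An index file, even an empty one, takes precedence over any listing.
    for name in INDEX_NAMES:
        if os.path.isfile(os.path.join(dir_path, name)):
            return 200, "serve " + name
    # No index file: fall back to the auto-generated listing if it is enabled.
    if listing_enabled:
        return 200, "auto-generated file listing"
    # No index file and listings disabled: the request is refused.
    return 403, "forbidden"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pdf_dir:
        open(os.path.join(pdf_dir, "brochure.pdf"), "w").close()
        print(directory_response(pdf_dir, listing_enabled=True))
        print(directory_response(pdf_dir, listing_enabled=False))
        # Drop a blank index.html into the folder, as suggested above.
        open(os.path.join(pdf_dir, "index.html"), "w").close()
        print(directory_response(pdf_dir, listing_enabled=True))
```

The blank file hides the listing for that one folder only; turning off directory browsing on the server covers every folder at once.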
-
I would suggest 301'ing them to their /index.htm or /pdf.htm equivalents. If you don't know, a 301 is a signal to a web browser (or search crawler) saying "this page has permanently moved, please go to (otherpage.htm) instead".
Here's a good SEOMoz article explaining it a bit more:
http://www.seomoz.org/learn-seo/redirection
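If it helps to see what a 301 looks like at the HTTP level, here is a toy sketch: the response carries status 301 plus a `Location` header naming the new URL. The `REDIRECTS` mapping and the `respond` function are hypothetical, and in practice you would configure this on your web server rather than write it by hand.

```python
from http import HTTPStatus

# Hypothetical mapping from bare directory URLs to real pages.
REDIRECTS = {
    "/pdf/": "/pdf.htm",
}


def respond(path):
    """Return (status, headers) for a request path in this toy model."""
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers the move is permanent.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": target}
    return HTTPStatus.OK, {}


if __name__ == "__main__":
    for path in ("/pdf/", "/pdf/brochure.pdf"):
        status, headers = respond(path)
        print(path, int(status), status.phrase, headers)
```

Only the bare directory URL is redirected here; the files inside it keep answering normally.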
What might be more of a concern is that it sounds like your web server has directory browsing enabled. This could be a security issue (depending on your web server setup). Generally you don't want to expose directories if you don't have to, because it gives a potential attacker insight into your system setup. Here's an example of how to turn it off in Apache:
www.camelrichard.org/topics/Apache/Turn_OffDirectoryBrowsing
And IIS:
technet.microsoft.com/en-us/library/cc731109(v=ws.10).aspx
If you like I can confirm if you have open directories if you give me the link, either here or through private message.
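If you'd rather check yourself, a rough heuristic is to look at the HTML a directory URL returns and see whether it resembles a default listing page. The sketch below assumes two markers (an Apache-style "Index of /" title and an IIS-style "[To Parent Directory]" link); customized listings may use neither, so treat a negative result with caution. `looks_like_directory_listing` is a hypothetical helper, and fetching the page is left out so the example runs offline.

```python
import re

# Heuristic markers; these are assumptions about default listing pages.
LISTING_PATTERNS = (
    re.compile(r"<title>\s*Index of /", re.IGNORECASE),     # Apache-style title
    re.compile(r"\[To Parent Directory\]", re.IGNORECASE),  # IIS-style parent link
)


def looks_like_directory_listing(html):
    """Return True if the HTML resembles an auto-generated directory listing."""
    return any(pattern.search(html) for pattern in LISTING_PATTERNS)


if __name__ == "__main__":
    samples = {
        "apache-style": "<html><head><title>Index of /pdf</title></head></html>",
        "iis-style": '<pre><a href="/">[To Parent Directory]</a></pre>',
        "normal page": "<html><head><title>Our PDFs</title></head></html>",
    }
    for label, html in samples.items():
        print(label, "->", looks_like_directory_listing(html))
```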