Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robot.txt : How to block a specific file type in several subdirectories ?
-
Hello everyone !
I need help setting up a robot.txt.
I'm trying to block all pdf files in particular directories so I'm using this command. In the example below the line is blocking all .gif in the entire site.
Block files of a specific file type (for example,
.gif) | Disallow: /*.gif$2 questions :
- Can I use this command to specify one particular directory in which I want to block pdf files ? Will this line be recognized by googlebots ?
Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$
- Then I realized that I would have to write as many lines as many directories there are in which I want to block pdf files.
Let's say I want to block pdf files in all these 3 directories
/fileadmin/directory1
/fileadmin/directory1/sub1
/fileadmin/directory1/sub1/pdf
Is there a pattern-matching rule I could use to blocks access to pdf files in all subdirectories instead of writing 3x the above line for each subdirectory ? For exemple :
Disallow: /fileadmin/directory1*/
Many thanks in advance for any insight you may have.
-
Hey thank you for your answer, really appreciate it.
-
Use this code -
Disallow: /*.f$
If you want to block only one folder then use this -
Disallow: /folder1/.*f$
This rule will help to block both files only .pdf and .gif
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt in subfolders and hreflang issues
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations: UK site - https://www.clientname.com/uk/ - robots.txt location: UK site - https://www.clientname.com/uk/robots.txt
Technical SEO | | lauralou82
US site - https://www.clientname.com/us/ - robots.txt location: UK site - https://www.clientname.com/us/robots.txt We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US. They have the following hreflang tags across all pages: We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously). Search Console says there are no hreflang tags at all. Additionally, we have a robots.txt file on each site which has a link to the corresponding sitemap files, but when viewing the robots.txt tester on Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you’ll get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location. Any suggestions how we can remove UK listings from Google US and vice versa?0 -
Robots txt. in page with 301 redirect
We currently have a a series of help pages that we would like to disallow from our robots txt. The thing is that these help pages are located in our old website, which now has a 301 redirect to current site. Which is the proper way to go around? 1- Add the pages we want to disallow to the robots.txt of the new website? 2- Break the redirect momentarily and add the pages to the robots.txt of the old one? Thanks
Technical SEO | | Kilgray0 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
Is there any value in having a blank robots.txt file?
I've read an audit where the writer recommended creating and uploading a blank robots.txt file, there was no current file in place. Is there any merit in having a blank robots.txt file? What is the minimum you would include in a basic robots.txt file?
Technical SEO | | NicDale0 -
Should I block robots from URLs containing query strings?
I'm about to block off all URLs that have a query string using robots.txt. They're mostly URLs with coremetrics tags and other referrer info. I figured that search engines don't need to see these as they're always better off with the original URL. Might there be any downside to this that I need to consider? Appreciate your help / experiences on this one. Thanks Jenni
Technical SEO | | ShearingsGroup0 -
301 Redirect on a PDF, DOCX files?
Hi, I have to rename many pdf and docx files. How can I implement 301 redirect on them as they are linked from 'n' number of places? Regards, Shailendra Sial
Technical SEO | | IM_Learner1 -
Microsite on subdomain vs. subdirectory
Based on this post from 2009, it's recommended in most situations to set up a microsite as a subdirectory as opposed to a subdomain. http://www.seomoz.org/blog/understanding-root-domains-subdomains-vs-subfolders-microsites. The primary argument seems to be that the search engines view the subdomain as a separate entity from the domain and therefore, the subdomain doesn't benefit from any of the trust rank, quality scores, etc. Rand made a comment that seemed like the subdomain could SOMETIMES inherit some of these factors, but didn't expound on those instances. What determines whether the search engine will view your subdomain hosted microsite as part of the main domain vs. a completely separate site? I read it has to do with the interlinking between the two.
Technical SEO | | ryanwats0 -
How to find a specific link on my website (currently causing redirects)
Hi everyone, I've used crawlers like Xenu to find broken links before, and I love these tools. What I can't figure out is how to find specific pieces of code within my site. For example, Webmaster Tools tells me there are still links to old pages somewhere on my website but I just can't find them. Do you know of a crawler that can search for a specific link within the html? Thanks in advance, Josh
Technical SEO | | dreadmichael0