Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robot.txt : How to block a specific file type in several subdirectories ?
-
Hello everyone !
I need help setting up a robot.txt.
I'm trying to block all pdf files in particular directories so I'm using this command. In the example below the line is blocking all .gif in the entire site.
Block files of a specific file type (for example,
.gif
) | Disallow: /*.gif$2 questions :
- Can I use this command to specify one particular directory in which I want to block pdf files ? Will this line be recognized by googlebots ?
Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$
- Then I realized that I would have to write as many lines as many directories there are in which I want to block pdf files.
Let's say I want to block pdf files in all these 3 directories
/fileadmin/directory1
/fileadmin/directory1/sub1
/fileadmin/directory1/sub1/pdf
Is there a pattern-matching rule I could use to blocks access to pdf files in all subdirectories instead of writing 3x the above line for each subdirectory ? For exemple :
Disallow: /fileadmin/directory1*/
Many thanks in advance for any insight you may have.
-
Hey thank you for your answer, really appreciate it.
-
Use this code -
Disallow: /*.f$
If you want to block only one folder then use this -
Disallow: /folder1/.*f$
This rule will help to block both files only .pdf and .gif
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How do I complete a reverse DNS check when completing log file analysis?
I'm doing some log file analysis and need to run a reverse DNS check to ensure that I'm analysing logs from Google and not any imposters. Is there a command I can use in terminal to do this? If not, whats the best way to verify Googlebot? Thanks
Technical SEO | | daniel-brooks0 -
Recommended log file analysis software for OS X?
Due to some questions over direct traffic and Googlebot behavior, I want to do some log file analysis. The catch is this is a Mac shop, so all our systems are on OS X. I have Windows 8 running in an emulator, but for the sake of simplicity I'd rather run all my software in OS X. This post by Tim Resnik recommended Web Log Explorer, but it's for Windows only. I did discover Sawmill, which claims to run on any platform. Any other suggestions? Bear in mind our site is load balanced over three servers, so please take that into consideration.
Technical SEO | | ufmedia0 -
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it.
Technical SEO | | zeepartner0 -
Creating a CSV file for uploading 301 redirect URL map
Hi if i'm bulk uploading 301 redirects whats needed to create a csv file? is it just a case of creating an excel spreadsheet & have the old urls in column A and new urls in column B and then just convert to csv and upload ? or do i need to put in other details or paremeters etc etc ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Block Baidu crawler?
Hello! One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China since our market is Uk. Is it a good idea to block the Baidu crawler in the robots.txt or could it have any adverse effects on SEO of our site? What do you suggest?
Technical SEO | | AJPro0 -
Internal search : rel=canonical vs noindex vs robots.txt
Hi everyone, I have a website with a lot of internal search results pages indexed. I'm not asking if they should be indexed or not, I know they should not according to Google's guidelines. And they make a bunch of duplicated pages so I want to solve this problem. The thing is, if I noindex them, the site is gonna lose a non-negligible chunk of traffic : nearly 13% according to google analytics !!! I thought of blocking them in robots.txt. This solution would not keep them out of the index. But the pages appearing in GG SERPS would then look empty (no title, no description), thus their CTR would plummet and I would lose a bit of traffic too... The last idea I had was to use a rel=canonical tag pointing to the original search page (that is empty, without results), but it would probably have the same effect as noindexing them, wouldn't it ? (never tried so I'm not sure of this) Of course I did some research on the subject, but each of my finding recommanded one of the 3 methods only ! One even recommanded noindex+robots.txt block which is stupid because the noindex would then be useless... Is there somebody who can tell me which option is the best to keep this traffic ? Thanks a million
Technical SEO | | JohannCR0 -
Can I Disallow Faceted Nav URLs - Robots.txt
I have been disallowing /*? So I know that works without affecting crawling. I am wondering if I can disallow the faceted nav urls. So disallow: /category.html/? /category2.html/? /category3.html/*? To prevent the price faceted url from being cached: /category.html?price=1%2C1000
Technical SEO | | tylerfraser
and
/category.html?price=1%2C1000&product_material=88 Thanks!0 -
Robots.txt File Redirects to Home Page
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering: Is there a benfit to setup your robots.txt file to do this? Will this effect how their site will get indexed? Thanks for your response! Kyle Site URL: http://www.radisphere.net/
Technical SEO | | kchandler0