Robots.txt: How to block a specific file type in several subdirectories?
-
Hello everyone!
I need help setting up a robots.txt file.
I'm trying to block all PDF files in particular directories, so I'm using the directive below. In the example, the line blocks all .gif files across the entire site.
Block files of a specific file type (for example, .gif): Disallow: /*.gif$
Two questions:
- Can I use this directive to target one particular directory in which I want to block PDF files? Will the line below be recognized by Googlebot?
Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$
- Then I realized that I would have to write as many lines as there are directories in which I want to block PDF files.
Let's say I want to block PDF files in all three of these directories:
/fileadmin/directory1
/fileadmin/directory1/sub1
/fileadmin/directory1/sub1/pdf
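Written out one directory at a time, I assume that would mean a separate Disallow line for each, something like this (paths simplified from my real ones):
User-agent: *
Disallow: /fileadmin/directory1/*.pdf$
Disallow: /fileadmin/directory1/sub1/*.pdf$
Disallow: /fileadmin/directory1/sub1/pdf/*.pdf$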
Is there a pattern-matching rule I could use to block access to PDF files in all subdirectories, instead of repeating that line for every subdirectory? For example:
Disallow: /fileadmin/directory1*/
Many thanks in advance for any insight you may have.
-
Hey, thank you for your answer, really appreciate it.
-
Use rules like these -
Disallow: /*.pdf$
Disallow: /*.gif$
If you want to block only one folder, then use this -
Disallow: /folder1/*.pdf$
Robots.txt wildcards can't match several extensions with a single rule, so .pdf and .gif each need their own Disallow line.
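Regarding the subdirectories: in Googlebot's pattern matching, * matches any sequence of characters, including slashes, so a rule anchored at the parent directory should also cover its subdirectories. A minimal sketch using the directories from your question:
User-agent: *
# Should block .pdf URLs in /fileadmin/directory1/ and in any of its subdirectories
Disallow: /fileadmin/directory1/*.pdf$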
Related Questions
-
Same URL, different Drupal content types
Hi all, I am working in Drupal which isn't always SEO-friendly. I want to convert some of our articles that are currently in an old article type to our new shiny longform template without losing SEO value. The process we use right now is to: change the URL of the old article in the CMS from /article-title to /article-title-old and then make the longform template /article-title in the CMS. Then hit publish. That way we can avoid having to mess with redirects. My concerns are that this will be seen as a bait and switch by Google. They are, after all, two separate pages — node-1 and node-2 on the back end — that are being smushed into the same skin aka same URL. I don't know if updating to the new template wipes out some of the info Google may have deemed important. I guess you could argue it's a redesign by CMS but I'm still not sure. Thoughts?
Technical SEO | webbedfeet0
-
Can an .htaccess file affect page load times?
We have a large, old site. As we've transitioned from one CMS to another, we've needed to create 301 redirects using our .htaccess file. I'm not a technical SEO person, but I'm concerned that the size of our .htaccess file might be contributing to long page download times. Can large .htaccess files cause slow page load times? Or is the coding of the 301 redirects a cause of slow page downloads? Thanks
Technical SEO | ahw1
-
How to implement schema.org for different hotel room types
I'm working on a resort site that has different types of rooms available. Does anyone know how to use schema.org to mark up a hotel with different hotel room types? I looked at the hotel schema but I did not see any room types. Thanks!
Technical SEO | ppapola0
-
Exclude root URL in robots.txt?
Hi, I have the following setup: www.example.com/nl
www.example.com/de
www.example.com/uk
etc
www.example.com is 301'd to www.example.com/nl. But now www.example.com is ranking instead of www.example.com/nl.
Should I block www.example.com in robots.txt so only the subfolders get ranked?
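I assume that rule would look something like the sketch below, since $ anchors the match to the end of the URL:
User-agent: *
# Should block only the bare root URL, not /nl, /de, /uk, etc.
Disallow: /$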
Or will I lose my ranking by doing this?
Technical SEO | mikehenze
-
Google (GWT) says my homepage and posts are blocked by Robots.txt
Hi guys, I have a very annoying issue. My WordPress blog over at www.Trovatten.com has some indexation problems. Google Webmaster Tools data:
GWT says the following: "Sitemap contains URLs which are blocked by robots.txt." and shows me my homepage and my blog posts. This is my robots.txt: http://www.trovatten.com/robots.txt
"User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/ Do you have any idea why it says that the URL's are being blocked by robots.txt when that looks how it should?
I've read in a couple of places that it can be caused by a WordPress plugin creating a virtual robots.txt, but I can't validate it. 1. I have set WP-Privacy to allow my site to be crawled
2. I have deactivated all WP plugins and I still get the same GWT warnings. Looking forward to hearing if you have an idea that might work!
Technical SEO | FrederikTrovatten22
-
Having to type Google CAPTCHA all the time
Hi guys, our office has about 15 computers all on the same IP address, and about 10 actively search on Google. Recently we have been asked to type in a CAPTCHA almost every single time we search on Google, and we would like to know if you have any suggestions for resolving this. We do use Firefox Rank Checker to check rankings once per week (around 400 keywords), but we use Hide My Ass to hide the IP. No malware or viruses were detected on computers in the network. Many thanks for your help in advance. David
Technical SEO | sssrpm0
-
Should we block a URL param in Webmaster Tools after URL migration?
Hi, We have just released a new version of our website that now has human-readable, nice URLs. Our old ugly URLs are still accessible and cannot be blocked/redirected. These old URLs use a URL param with an XPath-like expression language to define the location in our catalog. We have about 2 million pages indexed with this old URL param, while we have approximately 70k nice URLs after the migration. This high number of old URLs is due to faceting that was done using this URL param. I wonder if we should now completely block this URL param in Google Webmaster Tools so that these ugly URLs will be removed from the Google index. Or will this harm our position in Google? Thanks, Chris
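If it helps, I imagine a robots.txt rule for those parameterised URLs would look roughly like the sketch below (the parameter name here is only a placeholder, not our real one):
User-agent: *
# 'catalogpath' is a hypothetical parameter name standing in for the real one
Disallow: /*?*catalogpath=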
Technical SEO | eCommerceSEO0
-
What is considered best practice today for blocking admin pages from getting indexed?
What is considered best practice today for blocking pages, for instance xyz.com/admin pages, from getting indexed by the search engines or easily found? Do you recommend still disallowing them in the robots.txt file, or is robots.txt not the best place to list your /admin location because of hackers and such? Is it better to hide /admin behind an obscure name, use the noindex tag on the page, and not list it in the robots.txt file?
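For reference, the robots.txt approach I'm describing would just be something like this, which of course advertises the path to anyone who reads the file:
User-agent: *
# Keeps compliant crawlers out of /admin, but the path is visible to anyone viewing robots.txt
Disallow: /admin/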
Technical SEO | david-2179970