Robots.txt, does it need preceding directory structure?
-
Do you need the entire preceding path in robots.txt for it to match?
e.g:
I know that if I add Disallow: /fish to robots.txt, it will block:
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything
But would it also block these?
en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything
(taken from Robots.txt Specifications)
I'm hoping it actually won't match; that way, writing this particular robots.txt will be much easier!
Basically, I want to block many URLs that contain BTS-, such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob
But I have other pages that I do not want blocked, in subfolders, that also contain BTS-, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy
Thanks for listening
-
Yes, this is what I thought, but I wanted some second opinions.
Although I wouldn't actually need a wildcard after BTS-, as leaving the rule open is the same as using a wildcard:
/fish* is equivalent to "/fish": the trailing wildcard is ignored. https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
Thanks for the link, I'll take a look.
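For anyone who wants to sanity-check the matching behaviour described here, the following is a minimal sketch in Python. The `rule_matches` helper is hypothetical (not part of any library) and assumes the documented behaviour: a rule is compared against the start of the URL path, `*` matches any run of characters, and a trailing `$` anchors the end.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Hypothetical helper for illustration only. Matching starts at the
    beginning of the path, '*' matches any run of characters, and a
    trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string (prefix matching).
    return re.match(regex, path) is not None


paths = [
    "/fish",
    "/fish.html",
    "/fishheads/yummy.html",
    "/en/fish",
    "/en/fish.html",
]
for path in paths:
    # Compare the open rule with the trailing-wildcard rule.
    print(path, rule_matches("/fish", path), rule_matches("/fish*", path))
```

Under these assumptions, "/fish" and "/fish*" give the same answer for every path, and the paths under /en/ are not matched because the rule is compared from the first character of the path.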
-
You're right that **Disallow: /fish** in the robots file blocks all of those initial links, but it won't block the versions inside the /en/ folder. To block those as well, you would need Disallow: /en/fish
You could use a wildcard in the robots.txt file to do something along the lines of Disallow: /BTS-*
This _should_ work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS bookmarklet that lets you test whether a page is blocked by robots.txt, which can be found here: http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/
In addition to this, you could also use the 'blocked URLs' tool in Google Webmaster Tools (GWT) to see if the pages are successfully blocked once you've implemented the code.
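A quick local check is also possible before uploading anything. This is a rough sketch using Python's standard `urllib.robotparser`; the robots.txt content below is an assumption based on the URLs in the question, not the site's real file. As far as I know, the standard library parser does plain prefix matching rather than Google's wildcard extensions, so the rule is written without the trailing `*` (which is equivalent anyway).

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for the BTS- case discussed in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /BTS-
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "http://www.example.com/BTS-something",
    "http://www.example.com/BTS-somethingelse",
    "http://www.example.com/somesubfolder/BTS-thingy",
    "http://www.example.com/anothersubfolder/BTS-otherthingy",
]
for url in urls:
    # can_fetch() returns True when the given user agent may crawl the URL.
    status = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(status, url)
```

This only shows how a simple prefix-matching parser reads the rule, so it is still worth confirming the result in Google's own tool.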
Hope this helps!