Robots.txt, does it need preceding directory structure?
-
Do you need the entire preceding path in robots.txt for it to match?
e.g:
I know if i add Disallow: /fish to robots.txt it will block
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anythingBut would it block?:
en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
**en/fish.php?id=anything(taken from Robots.txt Specifications)** I'm hoping it actually wont match, that way writing this particular robots.txt will be much easier!
As basically I'm wanting to block many URL that have BTS- in such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybobBut have other pages that I do not want blocked, in subfolders that also have BTS- in, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingyThanks for listening
-
Yes this is what I thought, but wanted some second opinions.
Although I wouldn't actually need a wild card after BTS, as just leaving it open is the same as using a wildcard:
/fish*.......... Equivalent to "/fish" -- the trailing wildcard is ignored. https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt Thanks for the link, I'll take a look
-
You're right in with the **Disallow: /fish **in the robots file blocking all those initial links, but if you wanted to block everything inside the /en/ folder, you would need to do disallow: /en/fish
You could use a wildcard in the robots.txt file to do something along the lines of Disallow: /BTS-*
This _'should' _work, but it's always worth checking using a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS tool which allows you to test if robots.txt files work correctly which can be found here - http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/
In addition to this, you could also use the 'blocked URLs' tool in GWT to see if the pages are successfully blocked once you've implemented the code.
Hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
AU and US site needs Hreflang?
Hi guys, Just want to confirm if we need some SEO actions on two of our sites. Example:
Intermediate & Advanced SEO | | brandonegroup
https://www.example.com.au/collections/dresses
https://exampleamerica.com/collections/dresses Will the domain naming will fix the issue of possible duplication?
Do we still need to implement hreflang markups?0 -
Do I need to do 301s in this situation?
We have an e-commerce site built on Magento 2. We launched a few months ago, and had about 2K categories. The categories got indexed in Google for the most part. Shortly after launch, we decided to go with SLI for search and navigation because the native search/navigation was too slow given our database. The navigation pages are now hosted navigation pages; meaning, the URLs have changed and they are now hosted by SLI. I have done 301s for the most popular categories, but I didn't do 301s for all categories as we have to go through each category one-by-one and map it to the correct navigation page. Our new category sitemap only lists the new SLI category URLs. Will the fact that we have not 301'd all of our former categories hurt us as far as SEO? Do I have to do 301 redirects for all former category pages?
Intermediate & Advanced SEO | | kevin_h0 -
What is the best structure for paginating comment structures on pages to preserve the maximum SEO juice?
You have a full webpage with a great amount of content, images & media. This is a social blogging site where other members can leave their comments and reactions to the article. Over time there are say 1000 comments on this page. So we set the canonical URL, and use Rel (Prev & Next) to tell the bots that the next subsequent block of 100 comments is attributed to the primary URL. Or... We allow the newest 10 comments to exist on the primary URL, with a "see all" comments link that refers to a new URL, and that is where the rest of the comments are paginated. Which option does the community feel would be most appropriate and would adhere to the best practices for managing this type of dynamic comment growth? Thanks
Intermediate & Advanced SEO | | HoloGuy0 -
Help with Robots.txt On a Shared Root
Hi, I posted a similar question last week asking about subdomains but a couple of complications have arisen. Two different websites I am looking after share the same root domain which means that they will have to share the same robots.txt. Does anybody have suggestions to separate the two on the same file without complications? It's a tricky one. Thank you in advance.
Intermediate & Advanced SEO | | Whittie0 -
CHange insite Urls structure
Hello Guys! I have a situation with a website and I need some opinions. Today, the structured of my site is: (I have had this site architecture since many years) Main country home (www.mysite.com.tld) o Product_1 Home (www.mysite.com.tld/product1/) § Product_1 articles www.mysite.com.tld/product1/product1_art1 www.mysite.com.tld/product1/product1_art2 www.mysite.com.tld/product1/product1_artx o Product_2 Home (www.mysite.com.tld/product2/) § Product_2 articles www.mysite.com.tld/product1/product2_art1 www.mysite.com.tld/product1/product2_art2 www.mysite.com.tld/product1/product2_artx I have several TLDs with their main and their products. We are thinking in modify this structure and begin to use subdomains for each product (The IT guys need this approach because is simpler to distribute the servers load). I not very friendly with subdomains and big changes like this always can produce some problem (although the SEO migration would be ok, problems could appear, like ranking drops), But, the solution (the reasons are technical stuff), requires the mix of directories and subdomains in each product, leaving the structured in this way: Main country home (www.mysite.com.tld) o Product_1 Home (www.mysite.com.tld/product1/) § Product_1 articles product1.mysite.com.tld/product1_art1 product1.mysite.com.tld/product1_art2 product1.mysite.com.tld/product1_artx o Product_2 Home (www.mysite.com.tld/product2/) § Product_2 articles product2.mysite.com.tld/product1_art1 product2.mysite.com.tld/product1_art2 product2.mysite.com.tld/product1_artx So, the product home will be in a directory buy the pages of the articles of this product will be in a subdomain. What do you think about this solution? Beyond that the SEO migration would be fine, 301s, etc, can bring us difficulties in the rankings or the change can be done without any consideration? Thanks very much! Agustin
Intermediate & Advanced SEO | | SEOTeamDespegar0 -
Site Structured Navigated by Cookies
Is it advisable to have a site structure that is navigated via URLs rather than cookies? In a website that has several location based pages - each with their own functions and information? Is this a SEO priority? Will it help to combat duplicate content? Any help would be greatly appreciated!
Intermediate & Advanced SEO | | J_Sinclair0 -
Meta canonical or simply robots.txt other domain names with same content?
Hi, I'm working with a new client who has a main product website. This client has representatives who also sells the same products but all those reps have a copy of the same website on another domain name. The best thing would probably be to shut down the other (same) websites and redirect 301 them to the main, but that's impossible in the minding of the client. First choice : Implement a conical meta for all the URL on all the other domain names. Second choice : Robots.txt with disallow for all the other websites. Third choice : I'm really open to other suggestions 😉 Thank you very much! 🙂
Intermediate & Advanced SEO | | Louis-Philippe_Dea0 -
What is the best URL structure for categories?
A client's site currently uses the URL structure: www.website.com/�tegory%/%postname% Which I think is optimised fairly well, as the categories are keywords being targeted. However, as they are using a category hierarchy, often times the URL looks like this: www.website.com/parent-category/child-category/some-post-titles-are-quite-long-as-they-are-long-tail-terms Best practise often dictates (such as point 3 in this Moz article) that shorter URLs are better for several reasons. So I'm left with a few options: Remove the category from the URL Flatten the category hierarchy Shorten post titles two a word or two - which would hurt my long tail search term traffic. Leave it as it is What do we think is the best route to take? Thanks in advance!
Intermediate & Advanced SEO | | underscorelive0