Robots.txt, does it need preceding directory structure?
-
Do you need the entire preceding path in robots.txt for it to match?
For example, I know that if I add Disallow: /fish to robots.txt it will block:
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything
But would it also block these?
/en/fish
/en/fish.html
/en/fish/salmon.html
/en/fishheads
/en/fishheads/yummy.html
/en/fish.php?id=anything
(The first list is taken from the Robots.txt Specifications.) I'm hoping it won't match, as that would make writing this particular robots.txt much easier!
Basically, I want to block many URLs that have BTS- in them, such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob
But I have other pages, in subfolders, that also have BTS- in them and that I do not want blocked, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy
Thanks for listening.
-
Yes, this is what I thought, but I wanted a second opinion.
I wouldn't actually need a wildcard after BTS-, though, since leaving the rule open-ended is the same as using a trailing wildcard:
/fish* is equivalent to "/fish"; the trailing wildcard is ignored.
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
Thanks for the link, I'll take a look.
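The equivalence quoted above can be checked locally with a few lines of Python. This is only a rough sketch of the matching behaviour described in the specification (rules anchored at the start of the path, `*` matching any run of characters, a trailing `$` anchoring the end); `rule_matches` is a made-up helper name, not anything Google provides, and real crawlers may differ in edge cases.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Hypothetical helper: the pattern is anchored at the start of the path,
    '*' matches any run of characters, and a trailing '$' anchors the end.
    Without '$', anything may follow the pattern.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored_end:
        regex += "$"
    # re.match only matches at the beginning of the string.
    return re.match(regex, path) is not None


paths = [
    "/fish",
    "/fish.html",
    "/fishheads/yummy.html",
    "/fish.php?id=anything",
    "/en/fish",
    "/en/fish/salmon.html",
]

for path in paths:
    # "/fish" and "/fish*" should always agree.
    assert rule_matches("/fish", path) == rule_matches("/fish*", path)
    print(path, "blocked" if rule_matches("/fish", path) else "allowed")
```

Because the match starts at the beginning of the path, the /en/ paths come out as allowed, and the same reasoning means Disallow: /BTS- would leave /somesubfolder/BTS-thingy alone.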
-
You're right that **Disallow: /fish** in the robots.txt file blocks all of those initial links. Rules are matched from the start of the URL path, so it would not block the /en/ versions; if you wanted to block those as well, you would need a separate rule, Disallow: /en/fish
For your BTS- pages, you could use a wildcard in the robots.txt file, along the lines of Disallow: /BTS-*
This _should_ work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled published a post a while back about a JS bookmarklet that lets you test whether a page is blocked by robots.txt, which can be found here - http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/
In addition, you could use the 'Blocked URLs' tool in Google Webmaster Tools (GWT) to confirm that the pages are blocked once you've implemented the rule.
Hope this helps!
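For a quick local check before relying on the online tools, something like the following may help. It is a minimal sketch using Python's standard-library `urllib.robotparser`; the robots.txt content, the `is_blocked` helper, and the `ExampleBot` user agent are all assumptions made up for illustration. As far as I know this parser does plain prefix matching and does not expand `*` inside paths, so the rule is written as Disallow: /BTS- with no trailing wildcard (which should be equivalent anyway).

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for the BTS- case discussed in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /BTS-
"""


def is_blocked(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "ExampleBot") -> bool:
    """Hypothetical helper: True if the given robots.txt disallows the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


urls = [
    "http://www.example.com/BTS-something",
    "http://www.example.com/BTS-somethingelse",
    "http://www.example.com/somesubfolder/BTS-thingy",
    "http://www.example.com/anothersubfolder/BTS-otherthingy",
]

for url in urls:
    print(url, "blocked" if is_blocked(url) else "allowed")
```

This only reflects how Python's parser reads the file, so it is worth confirming the result in Google's own tool as well.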