How do I block an entire category/directory with robots.txt?
-
Does anyone have any idea how to block an entire product category, including all the products in that category, using the robots.txt file? I'm using WooCommerce in WordPress and I'd like to prevent bots from crawling every one of my product URLs for now.
The confusing part right now is that I have several different URL structures linking to every one of my products, for example www.mystore.com/all-products, www.mystore.com/product-category, etc.
I'm not really sure what I'd write in the robots.txt file, or where to place the file.
Any help would be appreciated, thanks!
-
Thanks for the detailed answer, I will give it a try!
-
Hi
This should do it; place the robots.txt file in the root directory of your site:
User-agent: *
Disallow: /product-category/
You can check out some more examples here: http://www.seomoz.org/learn-seo/robotstxt
As for the multiple URLs linking to the same pages, you will just need to identify all the possible variants and make sure each one is covered in the robots.txt file.
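A minimal sketch covering the two structures you mentioned might look like this, assuming the paths really are /product-category/ and /all-products/ (adjust them to whatever your permalink settings actually produce):

User-agent: *
Disallow: /product-category/
Disallow: /all-products/

Each Disallow rule goes on its own line, and each path is matched as a prefix, so /product-category/ also blocks everything beneath it, such as /product-category/shirts/.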
Google Webmaster Tools has a page you can use to check whether the robots.txt file is doing what you expect (under Health -> Blocked URLs).
If you are running a plugin that allows it, it might be easier to block the pages with a meta robots tag, as described in the link above; that would also take care of all the different URL structures.
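For reference, the tag in question is the robots meta tag; a sketch of what the plugin would need to output in the <head> of each product page looks like this:

<meta name="robots" content="noindex, follow">

Unlike a robots.txt rule, this tag travels with the page itself, so it covers every URL structure that renders the same page. The crawler does have to fetch the page to see the tag, though, so don't combine it with a robots.txt block on the same URLs.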
Hope that helps!
On-Page Optimization | | digitalstream0