Best practice for disallowing URLs with robots.txt

centurysafety

Hi Everybody,

We are currently trying to tidy up the crawl errors that appear when we crawl the site. At first glance we were very worried, to say the least: 17,000+ errors. But after looking closer at the report, we found that the majority of these errors were caused by bad URLs featuring:

Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/"
Color - For example: "?color=91"
Price - For example: "?price=650-700"
Order - For example: "?dir=desc&order=most_popular"
Page - For example: "?p=1&standards=704"
Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/"

As a novice with robots.txt, my question is: what would be the best practice for disallowing URLs featuring these from being crawled?

Any advice would be appreciated!

TimHolmes

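If you are looking to disallow URL parameters, you could use something like the following as a convention.

Disallow: /*? to block every URL that contains a query string, or separate rules such as Disallow: /*?dir=, Disallow: /*&order= and Disallow: /*?p= if you want to target specific parameters. (A plain Disallow: /? only matches a query string on the home page, because rules are matched from the start of the URL path.) There have been a few Moz questions of this type over the last few years, if you do decide to block the parameters.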
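As a rough sketch of how the URL types in the question might be covered, the fragment below uses the * wildcard, which Google and Bing support. It assumes the currency and login paths sit at the site root and that none of these parameters is needed on a page you want crawled; both are assumptions to check against your own site. Because rules are matched from the start of the URL path, each parameter rule begins with /* so that it applies on any page, with one line for the parameter appearing first in the query string (?) and one for it appearing later (&).

```text
User-agent: *

# Currency switcher and login redirect URLs (assumed to live at the site root)
Disallow: /directory/currency/switch/
Disallow: /customer/account/login/

# Filter, sort and paging parameters, whether first (?) or later (&) in the query string
Disallow: /*?color=
Disallow: /*&color=
Disallow: /*?price=
Disallow: /*&price=
Disallow: /*?dir=
Disallow: /*&dir=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?p=
Disallow: /*&p=
Disallow: /*?standards=
Disallow: /*&standards=
```

If that is more granular than you need, the single rule Disallow: /*? blocks every URL with a query string. Either way, it is worth running the example URLs from the question through a robots.txt tester before the file goes live.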

Also try to ensure that the product pages you have listed are properly canonicalised and point to the original product. A good review on how to do this can be found here. In most cases this will be enough to resolve any indexation or duplicate-content issues.

JordanLowry

First, I assume you have Webmaster Tools set up?

It has a robots.txt tester tool in which you can try out different rules to make sure you get the syntax right. For example, the color parameter would be blocked on any page by: Disallow: /*?color= and the other parameters would follow more or less the same format.

If you are unsure, I highly recommend reading through Moz's robots.txt best practices guide before you make any changes. Be sure to test everything in Webmaster Tools (Search Console) > robots.txt Tester.

Let me know if you run into any problems.
