Best practice for disallowing URLs with robots.txt

centurysafety

Hi Everybody,

We are currently trying to tidy up the crawl errors that appear when we crawl the site. At first glance we were very worried, to say the least: 17,000+ errors. But on closer inspection of the report, we found that the majority of these errors were caused by bad URLs featuring:

Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/"
Color - For example: "?color=91"
Price - For example: "?price=650-700"
Order - For example: "?dir=desc&order=most_popular"
Page - For example: "?p=1&standards=704"
Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/"

My question, as a novice with robots.txt, is: what would be the best practice for stopping URLs that contain these from being crawled?

Any advice would be appreciated!

TimHolmes

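If you are looking to disallow URL parameters, you could use something like the following as a convention.

Disallow: /*? blocks every URL that contains a query string. If you wanted to be more precise and target specific parameters, use one rule per parameter, for example Disallow: /*?dir=, Disallow: /*?order= and Disallow: /*?p= (plus the matching /*&dir= style rules, since a parameter is not always first in the query string). Note that Disallow: /? without the wildcard only matches URLs that begin with /?, i.e. parameters on the home page. There have been a few Moz questions of this type over the last few years, if you do look to remove the parameters.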
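To see how wildcard rules behave against the kinds of URLs in the question, here is a minimal Python sketch of Google-style pattern matching, where * matches any run of characters and a trailing $ anchors the end of the path. The helper names and the /workwear sample path are assumptions made for illustration, Allow rules and rule precedence are not modelled, and it is no substitute for a real robots.txt tester.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the path, and everything else is matched literally from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any Disallow pattern matches the path (Allow rules ignored)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical rule set covering the parameter, currency and login URLs.
RULES = ["/*?", "/*currency/switch/", "/*customer/account/login/"]

if __name__ == "__main__":
    samples = [
        "/workwear",
        "/workwear?price=650-700",
        "/workwear?dir=desc&order=most_popular",
        "/directory/currency/switch/currency/GBP/uenc/abc/",
        "/customer/account/login/referer/xyz/",
    ]
    for path in samples:
        print(path, "blocked" if is_blocked(path, RULES) else "allowed")
```

Running it against the sample paths shows the difference that matters here: a rule without the wildcard, such as /?, only catches query strings on the home page, while /*? catches them on any path.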

Also try to ensure that the product and category pages you have listed are well canonicalised, so that each filtered or sorted variant points back to the original page. A good review on how to do this can be found here. This will in most cases be enough to remove any indexation/duplicate issues.
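As a rough sketch of what "pointing to the original" means for the filtered URLs in the question, the Python below strips the filter and sort parameters to produce the URL that a page's rel="canonical" link could reference. The parameter set, the function name and the example.com domain are assumptions for illustration; which parameters count as duplicates (pagination in particular) is a decision for each site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of parameters that only filter or sort an existing listing.
FILTER_PARAMS = {"color", "price", "dir", "order", "standards"}


def canonical_url(url):
    """Return the URL with filter/sort parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("https://example.com/workwear?price=650-700&standards=704"))
    print(canonical_url("https://example.com/workwear?dir=desc&order=most_popular&p=2"))
```

One thing to keep in mind: a crawler can only read a canonical tag on a page it is allowed to fetch, so a URL that is disallowed in robots.txt will not have its canonical seen.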

JordanLowry

First, I assume you have Webmaster Tools (Search Console) set up?

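It has a robots.txt tester tool in which you can try out different rules to make sure you get the syntax right. For example, color would be blocked by Disallow: /*?color= (the leading wildcard is needed because the parameter follows a category path, and no trailing * is required because rules are prefix matches). The other parameters follow more or less the same format.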

If you are confused, I highly recommend reading through Moz's robots.txt best practices guide before you make any changes. Be sure to test everything in Webmaster Tools (Search Console) > robots.txt Tester.

Let me know if you run into any problems.
