Best practice for disallowing URLS with Robots.txt
-
Hi Everybody,
We are currently trying to tidy up the crawling errors which are appearing when we crawl the site. On first viewing, we were very worried to say the least:17000+. But after looking closer at the report, we found the majority of these errors were being caused by bad URLs featuring:
- Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/"
- Color - For example: ?color=91
- Price - For example: "?price=650-700"
- Order - For example: ?dir=desc&order=most_popular
- Page - For example: "?p=1&standards=704"
- Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/"
My question now is as a novice of working with Robots.txt, what would be the best practice for disallowing URLs featuring these from being crawled?
Any advice would be appreciated!
-
If you are looking to disallow url parameters you could use something like the following as a convention.
Disallow: /? or Disallow: /?dir=&order=&p= if you wanted to be more accurate with specific parameters. There have been a few Moz questions of this type over the last few years, if you do look to remove the parameters.
Also try and ensure that the product pages you have listed are well canonicalised and point to the original product etc. A good review on how to do this can be found here. This will in most cases be enough to remove any indexation/duplicate issues.
-
First I assume you have webmaster tools set up?
They have a robots.txt tester tool which you can test out different parameters to make sure you get the right syntax. For example color would be blocked by: Disallow: /?color=91* and you would follow that similar format more or less.
If you are confused I highly recommend reading through Moz's robots.txt best practices guide before you make any changes. Be sure to test all out in webmaster tools(search console)>robots.txt tester.
Let me know if you run into any problems.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Set Robots.txt file to crawl my website at specific times
Our website provider has stated that they can only 'lift' their block on our website in order for it to be crawled as specific times. Is there any way to amend a robots.txt to ensure that it crawls our website at a specific time of day/night in order to coincide with the block being lifted? Many Thanks, Charlene
Intermediate & Advanced SEO | | CharleneKennedy120 -
¿Disallow duplicate URL?
Hi comunity, thanks for answering my question. I have a problem with a website. My website is: http://example.examples.com/brand/brand1 (good URL) but i have 2 filters to show something and this generate 2 URL's more: http://example.examples.com/brand/brand1?show=true (if we put 1 filter) http://example.examples.com/brand/brand1?show=false (if we put other filter) My question is, should i put in robots.txt disallow for these filters like this: **Disallow: /*?show=***
Intermediate & Advanced SEO | | thekiller990 -
Linking to other peoples you tube videos related to our "How to do " articles on our website. Is there Best practices ?
Hi All, We are currently writing some "How to do" articles on our tool hire website and as there is alot of DIY related you tube videos out there, we thought It would be good to link to some of these at the bottom of our articles. From an SEO perspective, is there any do's and don'ts with regards how we should implement this. We are unable to do our videos so linking to others would be our preferred option. Does anyone know if this would give an SEO ranking benefit even though it's an outbound link to someone's video etc. thanks Pete
Intermediate & Advanced SEO | | PeteC120 -
Title Tag Best Practices
In light of all the Google updates in 2013, have you updated/changed your title tag best practices? Is the format of (Keyword | Brand) still working well for your optimization efforts or have you started incorporating an approach similar to this format . (Keyword in a Sentence | Brand) Thanks in advance for your opinions.
Intermediate & Advanced SEO | | SEO5Team0 -
Robots.txt: how to exclude sub-directories correctly?
Hello here, I am trying to figure out the correct way to tell SEs to crawls this: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ or this: http://www.mysite.com/directory/sub-directory2/sub-directory/... But with the fact I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions in a manageable way: disallow: /directory/sub-directory/ disallow: /directory/sub-directory2/ disallow: /directory/sub-directory/sub-directory/ disallow: /directory/sub-directory2/subdirectory/ etc... I would end up having thousands of definitions to disallow all the possible sub-directory combinations. So, is the following way a correct, better and shorter way to define what I want above: allow: /directory/$ disallow: /directory/* Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
Intermediate & Advanced SEO | | fablau1 -
Will Canonical tag on parameter URLs remove those URL's from Index, and preserve link juice?
My website has 43,000 pages indexed by Google. Almost all of these pages are URLs that have parameters in them, creating duplicate content. I have external links pointing to those URLs that have parameters in them. If I add the canonical tag to these parameter URLs, will that remove those pages from the Google index, or do I need to do something more to remove those pages from the index? Ex: www.website.com/boats/show/tuna-fishing/?TID=shkfsvdi_dc%ficol (has link pointing here)
Intermediate & Advanced SEO | | partnerf
www.website.com/boats/show/tuna-fishing/ (canonical URL) Thanks for your help. Rob0 -
Best Practice for ALT tags of flags to interlink multinational site
For a partial keyword match domain name what would you recommend as ALT tag to internlink country domains (different CCTLD)? Option 1)
Intermediate & Advanced SEO | | lcourse
DOMAIN.com DOMAIN.de DOMAIN.co.uk => I am a bit concerned about this option in terms of potential penalty for keywords in ALT (since partial match domains) Option 2)
UK
DE
FR ... Option 3)
English UK
Deutsch Deutschland
Deutsch Österreich
Francais France => concerned here about mixing lots of languages in ALT tags in each page, which may confuse google language detection.0 -
Mobile Sitemap Best Practices w/ Responsive Design
I'm looking for insight into mobile sitemap best practices when building sites with responsive design. If a mobile site has the same urls as the desktop site the mobile sitemap would be very similar to the regular sitemap. Is a mobile sitemap necessary for sites that utilize responsive design? If so, is there a way to have a mobile sitemap that simply references the regular sitemap or is a new sitemap that has all urls tagged with the "" tag with each url required?
Intermediate & Advanced SEO | | AdamDorfman0