Robots.txt usage
-
Hey Guys,
I am about to make an important improvement to our site's robots.txt.
We have a large number of properties on our site, with different views for them: list, gallery and map view. By default the list view shows up, and the user can switch to the gallery view.
We do not want the gallery pages to get indexed, and we want to save our crawl budget for more important pages.
This is one example from our site:
http://www.holiday-rentals.co.uk/France/r31.htm
When you click on "gallery view" the URL will remain the same in your address bar, but when you mouse over the "gallery view" tab it will show you the URL with the parameter "view=g". There are a number of parameters: "view=g", "view=l" and "view=m".
http://www.holiday-rentals.co.uk/France/r31.htm?view=l
http://www.holiday-rentals.co.uk/France/r31.htm?view=g
http://www.holiday-rentals.co.uk/France/r31.htm?view=m
Now my question is:
If I restrict bots by adding "Disallow: ?view=" to our robots.txt, will it affect the list view too?
I would be very thankful if you could look into this for us.
Many thanks
Hassan
I will test this on another site within our network before applying it to the important one, so I can measure the impact, but I will be waiting for your recommendations. Thanks
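To be concrete, this is roughly the rule I have in mind (just a sketch, nothing is live yet; I understand wildcard rules may need a leading /*):

User-agent: *
# Intended to block every URL containing the view parameter --
# but would this also catch the list view (?view=l)?
Disallow: /*?view=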
-
Others are right, by the way, that canonical may be better, but if you insist on the robots.txt restriction you should add two patterns for each parameter:
Disallow: /*?view=m
Disallow: /*?view=m*
so that you block the URLs that have the parameter at the end and also the ones that have it in the middle.
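A sketch of what that would look like in the file (assuming Googlebot-style wildcard support; in practice the trailing * is harmless but redundant, because robots.txt rules are prefix matches):

User-agent: *
# Either pattern matches ?view=m whether it ends the URL or is followed by
# more text, e.g. /France/r31.htm?view=m&page=2
Disallow: /*?view=m
Disallow: /*?view=m*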
-
I had a similar issue with my website: there were many ways of sorting a list of items (date, title, etc.) which ended up causing duplicate content. We solved the issue a couple of days ago by restricting the "sorted" pages using the robots.txt file. HOWEVER, this morning I found this text in the Google Webmaster Tools support section:
"Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools."
Source: http://www.google.com/support/webmasters/bin/answer.py?answer=66359
I haven't seen any negative effect on my site (yet), but I would agree with SuperlativB in the sense that YOU might be better off using "canonical" tags on these links:
http://www.holiday-rentals.co.uk/...?view=l
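For example, on the gallery and map view URLs the canonical link element would point back to the default page, something along these lines (a sketch using the France page from above; adjust to your own URLs):

<!-- in the <head> of /France/r31.htm?view=g and /France/r31.htm?view=m -->
<link rel="canonical" href="http://www.holiday-rentals.co.uk/France/r31.htm" />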
-
If these parameters are not at the very end of the URL, you should add * after the letter of the parameter in the restriction as well.
You got my point, thanks for looking into this. Our search page loads with the list view by default, so the parameter is not in the URL, but view=l still represents the list view.
I want to disallow both parameters, "view=g" and "view=m", in any URL from bots.
If these parameters are sometimes in the middle and sometimes at the end of the URL, what workaround would you suggest for both cases?
Thanks for looking into this...
-
You can do the restriction you want, but if I get it right, m stands for map view, g stands for gallery view and l stands for list view. So if you want the list view to be indexed and the map and gallery views not to be indexed, you should add two lines of restriction:
Disallow: /*?view=m
Disallow: /*?view=g
If these parameters are not at the very end of the URL, you should add * after the letter of the parameter in the restriction as well.
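Put together, a minimal sketch of the file might look like this (assuming the parameter always appears as ?view=...; if it can also follow another parameter as &view=..., add matching rules for that form too, and test it in the Webmaster Tools robots.txt tester before rolling it out):

User-agent: *
# Block gallery and map views; the default URLs and ?view=l stay crawlable
Disallow: /*?view=g
Disallow: /*?view=m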
-
Sounds like this is something canonical could solve for you. If you disallow ?view=* you would disallow every "?view" URL on your site; if you are unsure, you should go for an exact match rather than blocking them all.
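If you do want the stricter, exact-match style, Googlebot's robots.txt syntax supports a $ end-of-URL anchor, so something like this would only block URLs that end with the parameter (a sketch; the canonical approach above avoids all of this):

User-agent: *
Disallow: /*?view=g$
Disallow: /*?view=m$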