Robots.txt usage
-
Hey Guys,
I am about make an important improvement to our site's robots.txt
we have large number of properties on our site and we have different views for them. List, gallery and map view. By default list view shows up and user can navigate through gallery view.
We donot want gallery pages to get indexed and want to save our crawl budget for more important pages.
this is one example of our site:
http://www.holiday-rentals.co.uk/France/r31.htm
When you click on "gallery view" URL of this site will remain same in your address bar: but when you mouse over the "gallery view" tab it will show you URL with parameter "view=g". there are number of parameters: "view=g, view=l and view=m".
http://www.holiday-rentals.co.uk/France/r31.htm?view=l
http://www.holiday-rentals.co.uk/France/r31.htm?view=g
http://www.holiday-rentals.co.uk/France/r31.htm?view=m
Now my question is:
I If restrict bots by adding "Disallow: ?view=" in our robots.txt will it effect the list view too?
Will be very thankful if yo look into this for us.
Many thanks
Hassan
I will test this on some other site within our network too before putting it to important one's. to measure the impact but will be waiting for your recommendations. Thanks
-
Others are right by the way canonical may be better, but if you insist on robots restriction you should add two schemas to each parameter:
disallow:?view=m disallow:?view=m*
so that you block the urls that contain the parameter at the end and block the ones that have it in the middle as well.
-
I had a similar issue with my website: there were many ways of sorting a likst of items (date, title, etc) which ended up causing duplicate content, we solved the issue a couple of days ago by restricting the "sorted" pages using the robots.txt file. HOWEVER, this morning i found this text in the Google Webmaster Tools support section:
Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the
rel="canonical"
link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.source:
http://www.google.com/support/webmasters/bin/answer.py?answer=66359I havent seen any negative effect on my site (yet), but I would agree with SuperlativB in the sense that YOU might be better off using "canonical" tags on these links
http://www.holiday-rentals.co.uk/...?view=l
-
For these paratmeters are not at the very end os the url you should add * after the letter of the parameter as well in the restriction
you got my point, thanks for looking into this. Since our search page load with list view by default and it is not in URL but still v=l represents the list view.
I want to disallow both parameters "view=g, view=m" in any URL from bots.
If these parameters are sometimes in between and some time at the end of URL what will be the work around for for both cases, you suggest?
Thanks for looking into this...
-
You can do the restriction you want but if i get it right m stands for map view g stands for gallery view and l stands for list view. So if you want list view to be indexed and map and gallery view not to be indexed you should add two lines of distriction:
disallow:?view=m disallow:?view=g
if these paratmeters are not at the very end os the url you should add * after the letter of the parameter as well in the restriction
-
Sounds like this is something canonical could solve for you. If you disallow ?view=* you would disallow all "?view" on your homepage, if you are unsure you should go for exact match rather that all.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I have two robots.txt pages for www and non-www version. Will that be a problem?
There are two robots.txt pages. One for www version and another for non-www version though I have moved to the non-www version.
Technical SEO | | ramb0 -
Little confused regarding robots.txt
Hi there Mozzers! As a newbie, I have a question that what could happen if I write my robots.txt file like this... User-agent: * Allow: / Disallow: /abc-1/ Disallow: /bcd/ Disallow: /agd1/ User-agent: * Disallow: / Hope to hear from you...
Technical SEO | | DenorL0 -
Should a login page for a payroll / timekeeping comp[any be no follow for robots.txt?
I am managing a Timekeeping/Payroll company. My question is about the customer login page. Would this typically be nofollow for robots?
Technical SEO | | donsilvernail0 -
Will a Robots.txt 'disallow' of a directory, keep Google from seeing 301 redirects for pages/files within the directory?
Hi- I have a client that had thousands of dynamic php pages indexed by Google that shouldn't have been. He has since blocked these php pages via robots.txt disallow. Unfortunately, many of those php pages were linked to by high quality sites mulitiple times (instead of the static urls) before he put up the php 'disallow'. If we create 301 redirects for some of these php URLs that area still showing high value backlinks and send them to the correct static URLs, will Google even see these 301 redirects and pass link value to the proper static URLs? Or will the robots.txt keep Google away and we lose all these high quality backlinks? I guess the same question applies if we use the canonical tag instead of the 301. Will the robots.txt keep Google from seeing the canonical tags on the php pages? Thanks very much, V
Technical SEO | | Voodak0 -
One server, two domains - robots.txt allow for one domain but not other?
Hello, I would like to create a single server with two domains pointing to it. Ex: domain1.com -> myserver.com/ domain2.com -> myserver.com/subfolder. The goal is to create two separate sites on one server. I would like the second domain ( /subfolder) to be fully indexed / SEO friendly and have the robots txt file allow search bots to crawl. However, the first domain (server root) I would like to keep non-indexed, and the robots.txt file disallowing any bots / indexing. Does anyone have any suggestions for the best way to tackle this one? Thanks!
Technical SEO | | Dave1000 -
Robots.txt and joomla
Hello, I use joomla for my website and automatically all those files are blocked is that good or bad, so I remove anything and if so why ? User-agent: *
Technical SEO | | seoanalytics
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/ I also added to my robots.txt files my email address ( is that useful, I am afraid google passes PR to the email address )
and a javascript: void (0) because I have tabs on my webpage ( is that useful )
as well as a .pdf ( is it also useful ) any comments ? does anything need to be changed or is it ok ? Thank you,0 -
What can I do if Google Webmaster Tools doesn't recognize the robots.txt file?
I'm working on a recently hacked site for a client and and in trying to identify how exactly the hack is running I need to use the fetch as Google bot feature in GWT. I'd love to use this but it thinks the robots.txt is blocking it's acces but the only thing in the robots.txt file is a link to the sitemap. Unde the Blocked URLs section of the GWT it shows that the robots.txt was last downloaded yesterday but it's incorrect information. Is there a way to force Google to look again?
Technical SEO | | DotCar0 -
How does robots.txt affect aliased domains?
Several of my sites are aliased (hosted in subdirectories off the root domain on a single hosting account, but visible at www.theSubDirectorySite.com) Not ideal, I know, but that's a different issue. I want to block bots from viewing those files that are accessible in subdirectories on the main hosting account, www.RootDomain.com/SubDirectorySite/, and force the bots to look at www.SubDirectorySite.com instead. I utilized the canonical meta tag to point bots away from the sub directory site, but I am wondering what will happen if I use robots.txt to block those files from within the root domain. Will the bots, specifically Google bot, still index the site at its own URL, www.AnotherSite.com even if I've blocked that directory with Disallow: /AnotherSite/ ? THANK YOU!!!
Technical SEO | | michaelj_me0