Robots.txt usage
-
Hey guys,
I am about to make an important improvement to our site's robots.txt.
We have a large number of properties on our site, and we have different views for them: list, gallery and map view. By default the list view shows up, and the user can navigate to the gallery view.
We do not want the gallery pages to get indexed, and we want to save our crawl budget for more important pages.
This is one example from our site:
http://www.holiday-rentals.co.uk/France/r31.htm
When you click on "gallery view", the URL in your address bar will stay the same, but when you mouse over the "gallery view" tab it will show you a URL with the parameter "view=g". There are a number of parameters: "view=g", "view=l" and "view=m".
http://www.holiday-rentals.co.uk/France/r31.htm?view=l
http://www.holiday-rentals.co.uk/France/r31.htm?view=g
http://www.holiday-rentals.co.uk/France/r31.htm?view=m
Now my question is:
If I restrict bots by adding "Disallow: ?view=" to our robots.txt, will it affect the list view too?
I will be very thankful if you look into this for us.
Many thanks
Hassan
I will test this on some other site within our network before putting it on the important one, to measure the impact, but I will be waiting for your recommendations. Thanks
-
Others are right, by the way: canonical may be better. But if you insist on a robots.txt restriction, you should write each rule with a leading wildcard, for example:
Disallow: /*?view=m
A plain "Disallow: ?view=m" would not match these URLs at all, because rules are matched from the start of the path. With the leading wildcard, the rule blocks URLs that have the parameter at the end and ones that have it in the middle as well. A minimal sketch of the whole file is below.
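A minimal robots.txt sketch along these lines, assuming Google-style wildcard support (the rules are illustrative and worth testing in Webmaster Tools before deploying):
User-agent: *
Disallow: /*?view=g
Disallow: /*?view=m
The list-view URLs, with or without "?view=l", stay crawlable because neither rule matches them.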
-
I had a similar issue with my website: there were many ways of sorting a list of items (date, title, etc.) which ended up causing duplicate content. We solved the issue a couple of days ago by restricting the "sorted" pages using the robots.txt file. HOWEVER, this morning I found this text in the Google Webmaster Tools support section:
"Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools."
Source: http://www.google.com/support/webmasters/bin/answer.py?answer=66359
I haven't seen any negative effect on my site (yet), but I would agree with SuperlativB in the sense that you might be better off using "canonical" tags on these links:
http://www.holiday-rentals.co.uk/...?view=l
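For example, a minimal sketch of the canonical tag the "?view=" pages could carry in their <head>, pointing at the default list-view URL from the thread (treat the exact markup as illustrative):
<link rel="canonical" href="http://www.holiday-rentals.co.uk/France/r31.htm" />
With this in place, Google can still crawl the gallery and map URLs, but it consolidates them onto the main page instead of treating them as separate, unique pages.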
-
You got my point, thanks for looking into this. Our search page loads with the list view by default, so the parameter is not in the URL, but "view=l" still represents the list view.
I want to disallow both parameters, "view=g" and "view=m", in any URL.
If these parameters are sometimes in the middle and sometimes at the end of the URL, what workaround would you suggest to cover both cases?
Thanks for looking into this...
-
You can do the restriction you want, but if I get it right, "m" stands for map view, "g" stands for gallery view and "l" stands for list view. So if you want the list view to be indexed and the map and gallery views not to be indexed, you should add two lines of restriction:
Disallow: /*?view=m
Disallow: /*?view=g
The leading "/*" is needed because rules are matched from the start of the path. And since robots.txt rules are prefix matches, these lines also cover URLs where the parameter is not at the very end, so a trailing * is redundant; the sketch below demonstrates the matching.
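If you want to sanity-check which URLs a pattern blocks, here is a small Python sketch of Google-style robots.txt matching (prefix match, with * as a wildcard and $ as an end anchor); the function and the sample paths are just for illustration, not an official parser:

import re

def rule_matches(pattern: str, path: str) -> bool:
    # Google-style matching: the rule is a prefix match against the
    # URL path + query string, '*' matches any run of characters, and
    # a trailing '$' anchors the rule to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as a wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.search("^" + regex + ("$" if anchored else ""), path) is not None

# The plain list-view URL is not matched, so it stays crawlable:
assert not rule_matches("/*?view=m", "/France/r31.htm")
# Gallery and map variants are matched whether the parameter
# sits at the end or in the middle of the URL:
assert rule_matches("/*?view=g", "/France/r31.htm?view=g")
assert rule_matches("/*?view=m", "/France/r31.htm?view=m&page=2")

Running this shows that a single wildcard rule per parameter covers both positions, which answers the middle-versus-end question above.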
-
Sounds like this is something canonical could solve for you. If you disallow "?view=" with a wildcard you would disallow every "?view" URL, including the list view; if you are unsure, you should go for exact matches on the views you want blocked rather than a catch-all.
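For example, the difference in a hypothetical sketch (Google-style wildcards assumed):
Disallow: /*?view=    # too broad: blocks the list view as well
Disallow: /*?view=g   # exact: blocks only the gallery view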