Robots.txt: excluding URL

anakyn

Hi,

Spiders crawl some dynamic URLs on my website (for example, http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/ and http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/open=true) as different pages, which of course results in duplicate content.

What is the syntax for disallowing these kinds of URLs in robots.txt?

Thanks so much

john4math

You don't want to do this in robots.txt. If you serve pages with these parameters, people will inevitably link to them, and even if they're disallowed in your robots.txt file, Google may still index them, according to this: "While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web."

This is what the rel=canonical tag is designed for. Use it to tell Google that a page duplicates another page on your site, and that Google should refer to that other page instead. You can read (and watch a video) about that here.
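
To make that concrete, here is a minimal sketch in Python of how the canonical tag for the two example URLs could be generated. It assumes the duplicate always differs from the real page only by a trailing "open=true" segment, as in the question. The function names and the DUPLICATE_SUFFIX constant are made up for illustration, so adapt them to however your site actually builds its pages.

```python
from html import escape

# Assumption: the duplicate URL differs only by this trailing path segment,
# as in the example from the question.
DUPLICATE_SUFFIX = "open=true"


def canonical_url(url: str) -> str:
    """Return the URL without a trailing 'open=true' segment, if present."""
    trimmed = url.rstrip("/")
    if trimmed.endswith("/" + DUPLICATE_SUFFIX):
        # Drop the suffix together with the slash in front of it.
        trimmed = trimmed[: -len(DUPLICATE_SUFFIX) - 1]
    return trimmed + "/"


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="{}" />'.format(href)


if __name__ == "__main__":
    base = "http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/"
    # Both variants should print the same tag, pointing at the base page.
    print(canonical_link_tag(base))
    print(canonical_link_tag(base + "open=true"))
```

Both versions of the page would then carry the same tag in their head section, pointing at the URL without "open=true", so Google can consolidate them into one page.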
