Robots.txt: excluding URLs
-
Hi,
Spiders crawl some dynamic URLs on my website as separate pages, for example http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/ and http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/open=true, which of course results in duplicate content.
What is the syntax to disallow this kind of URL in robots.txt?
Thanks so much
-
You don't want to do this in robots.txt. If you serve pages with these parameters, people will inevitably link to them, and even if they're disallowed in your robots.txt file, Google may still index them. As Google's documentation puts it: "While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web."
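For reference, the syntax you asked about would be a wildcard Disallow rule (Google supports the * wildcard in robots.txt), but for the reasons above I wouldn't rely on it here:

```text
User-agent: *
# Blocks any URL whose path contains "open=true", e.g. .../714/open=true
Disallow: /*open=true
```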
This is what the rel=canonical tag is designed for. Use it to tell Google that the page is a duplicate of another page on your site, and that the other page is the one that should be indexed. You can read (and watch a video) about it here.
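A minimal sketch using the URLs from your question; the tag goes in the <head> of the parameterized (duplicate) version of the page:

```html
<!-- In the <head> of the /714/open=true variant -->
<link rel="canonical" href="http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/" />
```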
Related Questions
-
Redirect chains error on the home page URL
Hello Everyone, I'm getting a redirect chains error on the home page URL http://ebitdacatalyst.com in WordPress. I've checked my redirection list in the plugin and haven't found any redirections on http://ebitdacatalyst.com. Can anyone please help me solve this issue? I don't know where it's coming from.
On-Page Optimization | Nikhil_Falcon
-
URL shows up in "inurl:" but not when using time parameters
Hey everybody, I have been testing the inurl: feature of Google to try to gauge how long ago Google indexed our page. So, this brings me to my question: if we run inurl:https://mysite.com, all of our domains show up, and if we run inurl:https://mysite.com/specialpage, the domain shows up as being indexed. But if I add the "&as_qdr=y15" string to the URL, https://mysite.com/specialpage does not show up. Does anybody have any experience with this? Also, on the same note: when I look at how many pages Google has indexed, it is about half of the pages we see in our backend/sitemap. Any thoughts would be appreciated. TY!
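(For anyone trying to reproduce this, the combination described above looks like the following search URL; mysite.com is the asker's placeholder domain.)

```text
https://www.google.com/search?q=inurl:https://mysite.com/specialpage&as_qdr=y15
```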
On-Page Optimization | HashtagHustler
-
Multi Keyword URL Ranking at Number 1
Here is part of a URL that takes the local number 1 spot for "implant dentist glasgow": [website]/implant-dentistry-glasgow-scotland/implant-dentistry-glasgow-scotland.html. The first /implant-dentistry-glasgow-scotland/ directory (or page) is protected and presumably exists just for ranking reasons. I am tempted to copy that URL structure on a client's implant page to compete for the keyword (I believe I have better content). Given that it works well for the other site, can you think of any reason it would be a bad idea? Thanks very much.
On-Page Optimization | neilmac
-
How do I create multiple page URLs that are optimized for location and keywords that may be overlapping or the same?
Hi guys, I am attempting to create unique URLs for several different pages on a website. Let's say, hypothetically, that this is a website for a chain of ice cream shops in Missouri with 15 locations in Springfield. I would ideally like to optimize each Springfield location's page for the main keyword (ice cream) as well as the geo-specific location (Springfield), but we obviously can't have duplicate URLs for these 15 locations. We also have several secondary keywords we could use, such as frozen yogurt or waffle cone, although it would most likely be more powerful to use the primary keyword. Any suggestions for how to go about this most effectively? Thanks!
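(A purely illustrative URL pattern, not from the original question: one common approach is to add a unique qualifier, such as the neighborhood or street, to each location's path.)

```text
example.com/ice-cream/springfield-mo/downtown/
example.com/ice-cream/springfield-mo/battlefield-road/
```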
On-Page Optimization | GreenStone
-
Paginated URLs are getting Indexed
Hi, for example: my site is www.abc.com, and the paginated URLs for www.abc.com/jobs-in-delhi are in the format www.abc.com/jobs-in-delhi-1, www.abc.com/jobs-in-delhi-2, and so on. I have also used the pagination tags rel=next and rel=prev. My concern is that all the paginated URLs are getting indexed. Is there any disadvantage to this? I have read that link juice may get distributed in the case of pagination. Wouldn't it be better to use noindex, follow so that Google understands the paginated pages are less important and should not be ranked?
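(For context, the setup described above would look something like this in the <head> of a middle page; the page numbers are illustrative.)

```html
<!-- On http://www.abc.com/jobs-in-delhi-2 -->
<link rel="prev" href="http://www.abc.com/jobs-in-delhi-1">
<link rel="next" href="http://www.abc.com/jobs-in-delhi-3">

<!-- The noindex,follow alternative being considered: -->
<meta name="robots" content="noindex, follow">
```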
On-Page Optimization | vivekrathore
-
Similar URLs
I'm making a site of LSAT explanations. The content is very meaningful for LSAT students; I'm less sure the URLs and headings are meaningful to Google. I'll give you an example. Here are the URLs and headings for two separate pages:

http://lsathacks.com/explanations/lsat-69/logical-reasoning-1/q-10/ - LSAT 69, Logical Reasoning I, Q 10
http://lsathacks.com/explanations/lsat-69/logical-reasoning-2/q10/ - LSAT 69, Logical Reasoning II, Q10

There are two logical reasoning sections on LSAT 69: the first URL is for question 10 from the first section, and the second URL is for question 10 from the second LR section. I noticed that google.com only displays 23 URLs when I search "site:http://lsathacks.com"; a couple of days ago it displayed over 120 (i.e. the entire site).

1. Am I hurting myself with this structure, even if it makes sense for users?
2. What could I do to avoid it?

I'll eventually have thousands of pages of explanations, and they'll all be very similar in terms of how a human would categorize them, e.g. "LSAT 52, logic games question 12". I should note that the content of each page is very different, but the URL, title, and H1 are similar.

Edit: I could, for example, add a random keyword to differentiate the titles and URLs (but not the H1). For example: http://lsathacks.com/explanations/lsat-69/logical-reasoning-2/q10-car-efficiency/ - LSAT 69, Logical Reasoning I, Q 10, Car efficiency. But the URL is already fairly long as is. Would that be a good idea?
On-Page Optimization | graemeblake
-
Ecommerce URLs
This is for a clothing retailer's ecommerce site. In an effort to reduce the length of our product names, we are considering removing terms like long-sleeve, short-sleeve, etc., but leaving that information in the URL. The concern is that we would lose some traction in the SERPs if those descriptive words are left out, since the product name is also our page title. Then again, I think keywords as broad as "long-sleeve shirt" wouldn't serve us well anyway. One idea is that the alt attribute on the product image could still carry the longer product name, including long-sleeve, etc., thus keeping the keyword on the product page. Any ideas or suggestions? Hope this is clear; it seems redundant from a user standpoint to state long-sleeve, etc. in every product name. Thanks - your answers are always so helpful!
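(A sketch of the alt-attribute idea; the product name and filename are hypothetical.)

```html
<img src="/images/oxford-shirt-blue.jpg" alt="Men's Long-Sleeve Oxford Shirt in Blue">
```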
On-Page Optimization | kennyrowe
-
Using Magento's own URL rewrites
We are changing ecommerce platforms. Is it best to use Magento's own URL rewrites to redirect every page of the site from its old URL to its new one?
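(Whichever mechanism is used, the goal is a single-hop 301 per page. A generic illustration in Apache .htaccess syntax with hypothetical paths, not Magento-specific:)

```apache
Redirect 301 /old-category/old-product.html /new-category/new-product.html
```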
On-Page Optimization | LadyApollo