Robots.txt: excluding URL
-
Hi,
Spiders crawl some dynamic URLs on my website (for example, http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/ and http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/open=true) as different pages, which of course results in duplicate content.
What is the syntax for disallowing these kinds of URLs in robots.txt?
Thanks so much
-
You don't want to do this in robots.txt. If you serve pages with these parameters, people will inevitably link to them, and even if they're disallowed in your robots.txt file, Google may still index them, according to this: "While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web."
This is what the rel=canonical tag is designed for. Use it to tell Google that the page duplicates another page on your site, and that Google should treat that other page as the preferred version. You can read (and watch a video) about that here.
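To make that concrete, here is a minimal sketch of how a page template could build the tag. It assumes the duplicate URL always ends with the literal segment `open=true`, as in the example from the question; the helper names `canonical_url` and `canonical_link_tag` are hypothetical and not part of any library.

```python
from html import escape


def canonical_url(url: str, marker: str = "open=true") -> str:
    """Return the URL with a trailing `open=true` segment removed.

    Assumption: the duplicate variant differs from the real page only by
    this trailing segment, as in the example URLs from the question.
    """
    if url.endswith(marker):
        return url[: -len(marker)]
    return url


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    duplicate = (
        "http://www.keihome.it/elettrodomestici/cappe/"
        "cappa-vision-con-tv-falmec/714/open=true"
    )
    # Both the original page and the open=true variant would emit this tag.
    print(canonical_link_tag(duplicate))
```

Both versions of the page would output the same tag in their `<head>`, so whichever URL Google finds, it is pointed at the clean one.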