Robots.txt: excluding URL
-
Hi,
Spiders crawl some dynamic URLs on my website (for example, http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/ and http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/open=true) as different pages, which of course results in duplicate content.
What is the syntax to disallow these kinds of URLs in robots.txt?
Thanks so much
-
You don't want to do this in robots.txt. If you serve pages with these parameters, people will inevitably link to them, and even if they're disallowed in your robots.txt file, Google may still index them, according to this: "While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web."
This is what the rel=canonical tag is designed for. Use it to tell Google that the page duplicates another page on your site and that the other page is the version it should index. You can read (and watch a video) about that here.
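To make that concrete, here is a rough Python sketch of how the canonical tag for the two URLs in the question could be generated. It assumes the only difference between the variants is a trailing `open=true` path segment; the names `VARIANT_SEGMENTS`, `canonical_url`, and `canonical_tag` are made up for illustration and are not part of any CMS or library.

```python
from html import escape

# Hypothetical list of trailing path segments that change how the page
# opens but not what it contains (assumption based on the question).
VARIANT_SEGMENTS = {"open=true"}


def canonical_url(url: str) -> str:
    """Return the URL with a trailing variant segment removed, if present."""
    base, sep, last = url.rstrip("/").rpartition("/")
    if sep and last in VARIANT_SEGMENTS:
        return base + "/"
    return url


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    variant = (
        "http://www.keihome.it/elettrodomestici/cappe/"
        "cappa-vision-con-tv-falmec/714/open=true"
    )
    print(canonical_tag(variant))
```

Both the plain product URL and the `open=true` variant would then carry the same tag in their `<head>`, pointing at the plain URL, so the duplicate signals are consolidated there instead of being blocked from crawling.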