Robots.txt: excluding URLs
-
Hi,
Spiders crawl some dynamic URLs on my website as separate pages, for example http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/ and http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/open=true, which of course results in duplicate content.
What is the syntax to disallow this kind of URL in robots.txt?
Thanks so much
-
You don't want to do this in robots.txt. If you serve pages with these parameters, people will inevitably link to them, and even if they're disallowed in your robots.txt file, Google may still index them. As Google's documentation puts it: "While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web."
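For reference, the syntax you asked about would be a wildcard Disallow rule (Google supports the * wildcard in robots.txt), but for the reasons above I wouldn't rely on it here:

```text
User-agent: *
# Blocks any URL whose path contains "open=true", e.g. .../714/open=true
Disallow: /*open=true
```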
This is what the rel=canonical tag is designed for. Use it to tell Google that the page is a duplicate of another page on your site, and that the other page is the one that should be indexed. You can read (and watch a video) about it here.
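A minimal sketch using the URLs from your question; the tag goes in the <head> of the parameterized (duplicate) version of the page:

```html
<!-- In the <head> of the /714/open=true variant -->
<link rel="canonical" href="http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/" />
```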
Related Questions
-
Redirect chains error on the home page URL
Hello Everyone, I'm getting a redirect chains error on the home page URL http://ebitdacatalyst.com in WordPress. I've checked my redirection list in the plugin and haven't found any redirections on http://ebitdacatalyst.com. Can anyone please help me solve this issue? I don't know where it's coming from.
On-Page Optimization | Nikhil_Falcon
-
URL shows up in "inurl:" but not when using time parameters
Hey everybody, I have been testing the inurl: feature of Google to try to gauge how long ago Google indexed our page. So, this brings me to my question: if we run inurl:https://mysite.com, all of our domains show up, and if we run inurl:https://mysite.com/specialpage, the domain shows up as being indexed. But if I add the "&as_qdr=y15" string to the URL, https://mysite.com/specialpage does not show up. Does anybody have any experience with this? Also, on the same note: when I look at how many pages Google has indexed, it is about half of the pages we see in our backend/sitemap. Any thoughts would be appreciated. TY!
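(For anyone trying to reproduce this, the combination described above looks like the following search URL; mysite.com is the asker's placeholder domain.)

```text
https://www.google.com/search?q=inurl:https://mysite.com/specialpage&as_qdr=y15
```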
On-Page Optimization | HashtagHustler
-
Multi Keyword URL Ranking at Number 1
Here is part of a URL that takes the local number 1 spot for "implant dentist glasgow": [website]/implant-dentistry-glasgow-scotland/implant-dentistry-glasgow-scotland.html. The first /implant-dentistry-glasgow-scotland/ directory (or page) is protected and presumably exists just for ranking reasons. I am tempted to copy that URL structure on a client's implant page to compete for the keyword (I believe I have better content). Given that it works well for the other site, can you think of any reason it would be a bad idea? Thanks very much.
On-Page Optimization | neilmac
-
How do I create multiple page URLs that are optimized for location and keywords that may be overlapping or the same?
Hi guys, I am attempting to create unique URLs for several different pages on a website. Let's say, hypothetically, that this is a website for a chain of ice cream shops in Missouri with 15 locations in Springfield. I would ideally like to optimize each Springfield location's page for the main keyword (ice cream) as well as the geo-specific location (Springfield), but we obviously can't have duplicate URLs for these 15 locations. We also have several secondary keywords we could use, such as frozen yogurt or waffle cone, although it would most likely be more powerful to use the primary keyword. Any suggestions for how to go about this most effectively? Thanks!
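(A purely illustrative URL pattern, not from the original question: one common approach is to add a unique qualifier, such as the neighborhood or street, to each location's path.)

```text
example.com/ice-cream/springfield-mo/downtown/
example.com/ice-cream/springfield-mo/battlefield-road/
```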
On-Page Optimization | GreenStone
-
Paginated URLs are getting Indexed
Hi, for example: my site is www.abc.com, and the paginated URLs for www.abc.com/jobs-in-delhi are in the format www.abc.com/jobs-in-delhi-1, www.abc.com/jobs-in-delhi-2, and so on. I have also used the pagination tags rel=next and rel=prev. My concern is that all the paginated URLs are getting indexed. Is there any disadvantage to this? I have read that link juice may get distributed in the case of pagination. Wouldn't it be better to use noindex, follow so that Google understands the paginated pages are less important and should not be ranked?
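(For context, the setup described above would look something like this in the <head> of a middle page; the page numbers are illustrative.)

```html
<!-- On http://www.abc.com/jobs-in-delhi-2 -->
<link rel="prev" href="http://www.abc.com/jobs-in-delhi-1">
<link rel="next" href="http://www.abc.com/jobs-in-delhi-3">

<!-- The noindex,follow alternative being considered: -->
<meta name="robots" content="noindex, follow">
```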
On-Page Optimization | vivekrathore
-
Similar URLs
I'm making a site of LSAT explanations. The content is very meaningful for LSAT students; I'm less sure the URLs and headings are meaningful to Google. I'll give you an example. Here are the URLs and headings for two separate pages:

http://lsathacks.com/explanations/lsat-69/logical-reasoning-1/q-10/ - LSAT 69, Logical Reasoning I, Q 10
http://lsathacks.com/explanations/lsat-69/logical-reasoning-2/q10/ - LSAT 69, Logical Reasoning II, Q10

There are two logical reasoning sections on LSAT 69: the first URL is for question 10 from the first section, and the second URL is for question 10 from the second LR section. I noticed that google.com only displays 23 URLs when I search "site:http://lsathacks.com"; a couple of days ago it displayed over 120 (i.e. the entire site).

1. Am I hurting myself with this structure, even if it makes sense for users?
2. What could I do to avoid it?

I'll eventually have thousands of pages of explanations, and they'll all be very similar in terms of how a human would categorize them, e.g. "LSAT 52, logic games question 12". I should note that the content of each page is very different, but the URL, title, and H1 are similar.

Edit: I could, for example, add a random keyword to differentiate the titles and URLs (but not the H1). For example: http://lsathacks.com/explanations/lsat-69/logical-reasoning-2/q10-car-efficiency/ - LSAT 69, Logical Reasoning I, Q 10, Car efficiency. But the URL is already fairly long as is. Would that be a good idea?
On-Page Optimization | graemeblake
-
Ecommerce URLs
This is for a clothing retailer's ecommerce site. In an effort to reduce the length of our product names, we are considering removing terms like long-sleeve, short-sleeve, etc., but leaving that information in the URL. The concern is that we would lose some traction in the SERPs if those descriptive words are left out, since the product name is also our page title. Then again, I think keywords as broad as "long-sleeve shirt" wouldn't serve us well anyway. One idea is that the alt attribute on the product image could still carry the longer product name, including long-sleeve, etc., thus keeping the keyword on the product page. Any ideas or suggestions? Hope this is clear; it seems redundant from a user standpoint to state long-sleeve, etc. in every product name. Thanks - your answers are always so helpful!
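(A sketch of the alt-attribute idea; the product name and filename are hypothetical.)

```html
<img src="/images/oxford-shirt-blue.jpg" alt="Men's Long-Sleeve Oxford Shirt in Blue">
```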
On-Page Optimization | kennyrowe
-
Using Magento's own URL rewrites
We are changing ecommerce platforms. Is it best to use Magento's own URL rewrites to redirect every page of the site from its old URL to its new one?
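(Whichever mechanism is used, the goal is a single-hop 301 per page. A generic illustration in Apache .htaccess syntax with hypothetical paths, not Magento-specific:)

```apache
Redirect 301 /old-category/old-product.html /new-category/new-product.html
```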
On-Page Optimization | LadyApollo