Robots.txt: excluding URLs
-
Hi,
Spiders crawl some dynamic URLs on my website as different pages (for example: http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/ and http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/open=true), which of course results in duplicate content.
What is the syntax to disallow these kinds of URLs in robots.txt?
Thanks so much
-
You don't want to do this in robots.txt. If you serve pages with these parameters, people will inevitably link to them, and even if they're disallowed in your robots.txt file, Google may still index them, according to this: "While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web."
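For completeness, the syntax the question asks about would look roughly like the sketch below, assuming the duplicate URLs all share the open=true suffix. But as the quote above notes, this only blocks crawling; it does not guarantee the URLs stay out of the index.

```
# Sketch: block crawling of URLs whose path contains "open=true".
# The "*" wildcard is supported by Googlebot and most major crawlers.
User-agent: *
Disallow: /*open=true
```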
This is what the rel=canonical tag is designed for. Use it to tell Google that the page is duplicate content of another page on your site, and that the other page is the one it should refer to. You can read (and watch a video) about that here.
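As a sketch, the duplicate URL's page would carry a link element like this in its head, pointing at the preferred version (the URL below is taken from the question):

```html
<head>
  <!-- Tells search engines this page is a duplicate and that the
       URL below is the preferred (canonical) version to index -->
  <link rel="canonical"
        href="http://www.keihome.it/elettrodomestici/cappe/cappa-vision-con-tv-falmec/714/" />
</head>
```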
Related Questions
-
Correct robots.txt for WordPress
Hi. So I recently launched a website on WordPress (1 main page and 5 internal pages). The main page got indexed right off the bat, while the other pages seem to be blocked by robots.txt. Would you please look at my robots file and tell me what's wrong? I wanted to block the contact page, plugin elements, users' comments (I have a discussion space on every page of my website) and the website search section (to prevent duplicate pages from appearing in Google search results). It looks like one of the lines is blocking every page after "/" from indexing, even though everything seems right. Thank you so much.
On-Page Optimization | AslanBarselinov
-
URL Structure Suggestion
Hi
My site URL: http://goo.gl/AiOgu1
We are working on the URL structure of our website, and I have one query about it: which URL structure is better from a user and SEO perspective? The targeted keyword for the particular page is "wordpress live chat". Is it worthwhile to rewrite the present URL "https://www.abc.com/wordpress" to "https://www.abc.com/wordpress-live-chat"? Please suggest.
On-Page Optimization | sandeep.clickdesk
-
Similar URLs
I'm making a site of LSAT explanations. The content is very meaningful for LSAT students; I'm less sure the URLs and headings are meaningful for Google. For example, here are the URLs and headings for two separate pages:
http://lsathacks.com/explanations/lsat-69/logical-reasoning-1/q-10/ - LSAT 69, Logical Reasoning I, Q 10
http://lsathacks.com/explanations/lsat-69/logical-reasoning-2/q10/ - LSAT 69, Logical Reasoning II, Q10
There are two logical reasoning sections on LSAT 69: the first URL is for question 10 from section 1, and the second URL is for question 10 from the second LR section. I noticed that google.com only displays 23 URLs when I search "site:http://lsathacks.com". A couple of days ago it displayed over 120 (i.e. the entire site).
1. Am I hurting myself with this structure, even if it makes sense for users?
2. What could I do to avoid it?
I'll eventually have thousands of pages of explanations. They'll all be very similar in terms of how I would categorize them to a human, e.g. "LSAT 52, logic games question 12". I should note that the content of each page is very different, but the URL, title and h1 are similar.
Edit: I could, for example, add a random keyword to differentiate titles and URLs (but not the H1). For example: http://lsathacks.com/explanations/lsat-69/logical-reasoning-2/q10-car-efficiency/ - LSAT 69, Logical Reasoning II, Q10, Car efficiency. But the URL is already fairly long as is. Would that be a good idea?
On-Page Optimization | graemeblake
-
Two different keywords - one URL
We're new to SEO, but we have two keywords that are really not quite the same, yet Google has targeted the same URL for both... which means that SEOmoz is recommending we optimize the same URL for opposite keywords (using the on-page SEO tool). For example, say the keywords (these aren't our actual keywords) are "beer brewing" and "ways to make beer for small breweries", and both point at our home page. The on-page SEO shows that "beer brewing" has a Google ranking of 9, while "ways to ..." has a Google ranking of 47. So what am I supposed to do now? Do I rewrite the page to make "ways to ..." more prominent? I can't really have the title and h1s include both. What do I do now? We have about 3 or 4 of these "pairs". -- Anthony
On-Page Optimization | apresley
-
Can duplicate content issues be solved with a noindex robots meta tag?
Hi all. I have a number of duplicate content issues arising from a recent crawl diagnostics report. Would using a robots meta tag (like below) on the pages I don't necessarily mind not being indexed be an effective way to solve the problem? Thanks for any / all replies.
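The tag this question alludes to (not shown in the original post) would typically be a sketch like this, placed in the head of each page that should stay out of the index:

```html
<head>
  <!-- Ask search engines not to index this page,
       while still allowing them to follow its links -->
  <meta name="robots" content="noindex, follow" />
</head>
```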
On-Page Optimization | joeprice
-
Close URL owned by competitors.
The following example is exactly analogous to our situation (site names slightly altered):
We own www.business-skills.com. It's our main site. We don't own, and would rather avoid paying for, www.businessskills.com; it's a parked domain and the owners want a very large sum for it. We own www.business-skills.co.uk and point it to our main site. We don't own www.businessskills.co.uk; this is owned by our biggest competitor. We also own www.[ourbrand].com and .co.uk, and point them to the main site.
My question is: how much traffic do you think we may be missing due to these nearly-but-not-quite URL matches? Does it matter in terms of lost revenue? What sort of things should I be looking at to get a very rough estimate?
On-Page Optimization | JacobFunnell
-
What's the best practice for implementing a "content disclaimer" that doesn't block search robots?
Our client needs a content disclaimer on their site. This is a simple "If you agree to these rules then click YES; if not, click NO" gate, and clicking NO pushes you back to the home page. I have a gut feeling that this may cause an upset with the search robots. Any advice? R/ John
On-Page Optimization | TheNorthernOffice79
-
Can someone please help me identify where all these URLs to my homepage are coming from?
Hi. I installed the SEOmoz toolbar for Firefox, analyzed my home page, then clicked on 'get a full site analysis at Site Explorer'. This is what came up: http://www.opensiteexplorer.org/www.frs-solutions.com%252Fcontent%252Fhome/a!links?src=mb
I hope that link works. If not, the URL is www.frs-solutions.com. Anyway, there are about 57 different URLs within my site all pointing to my homepage! I have no idea where they are coming from. Can someone with an experienced eye take a quick look and tell me what I might be up against? Thank you!
On-Page Optimization | aprilm-189040