Google Indexing Duplicate URLs: Ignoring Robots & Canonical Tags
-
Hi Moz Community,
We have the following robots.txt rule, which should prevent URLs with tracking parameters from being indexed.
Disallow: /*?
We have noticed that Google has started indexing pages that use tracking parameters. Example below.
These pages are identified as duplicate content yet have the correct canonical tags:
With various affiliate feeds available for our site, we effectively have duplicate versions of every page, one per tracking query, and Google seems willing to index them, ignoring both the robots.txt rules and the canonical tags.
Can anyone shed any light onto the situation?
-
Google's multi-layered, multi-algorithm system has come a long way in being able to "figure it all out", yet at the same time it falls far short of always "getting it right".
Robots.txt files are no longer an absolute directive. They're now "just another signal", as are canonical tags, meta robots instructions, and Google's own URL Parameters settings in Webmaster Tools.
Because of this, it's critical to be consistent across all signals. If you've got the robots.txt file set to block those pages, but also have inbound links from affiliates, that's a prime example of where inbound link signals can override the robots.txt instruction if the links aren't nofollowed. Keep in mind that a robots.txt Disallow blocks crawling rather than indexing, so Google can't fetch a blocked page to read its canonical tag.
While they technically SHOULD not index them after discovering them off-site (because the destination says "index this other version"), that's part of their confused multilayered system.
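To make the robots.txt side concrete, here's a rough Python sketch of how a Google-style wildcard rule such as Disallow: /*? can be matched against a URL. The helper names (robots_pattern_to_regex, is_disallowed) and the example.com URLs are made up for illustration, and the sketch skips real-world details such as Allow rules, longest-match precedence and percent-encoding, so treat it as an approximation rather than how Google actually evaluates the file.

```python
import re
from urllib.parse import urlsplit


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    (Hypothetical helper, simplified for illustration.)
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url: str, pattern: str = "/*?") -> bool:
    """Return True if the URL's path plus query matches the Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, like a robots.txt path prefix.
    return robots_pattern_to_regex(pattern).match(target) is not None


if __name__ == "__main__":
    for url in (
        "https://www.example.com/sofas?ec=123",  # hypothetical tracked URL
        "https://www.example.com/sofas",         # hypothetical clean URL
    ):
        print(url, "->", "blocked" if is_disallowed(url) else "crawlable")
```

The point of the sketch is only that a match stops the page from being crawled. It does nothing to stop the URL itself from being discovered through affiliate links.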
I have a question though - from the limited information you've provided, this example is based on a URL parameter of ?ec=
When I search Google using site:http://www.oakfurnitureland.co.uk/ inurl:ec
I see only three such pages that are "fully" indexed. All the rest (over 1,000 additional URLs) are in the Google system; however, every one of those has a meta description of "A description for this result is not available because of this site's robots.txt - learn more."
What that means is they are NOT fully indexing those pages - there is no worry to be had about duplicate content for those. Google is simply tracking that those URLs exist.
So - is that the only URL parameter you're worried about? If so, it's not a major problem on your site. Except for those few exceptions, Google is doing what you need them to do with those.
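If you want to double-check the handful of exceptions, one option is to work out what the canonical URL ought to be for each tracked URL and compare it with the canonical tag the page actually serves. Here's a minimal Python sketch of the first half of that, assuming ec is the only tracking parameter. The TRACKING_PARAMS set, the canonical_url function and the example.com URLs are assumptions for illustration, not anything taken from your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: 'ec' is the only tracking parameter; add others as needed.
TRACKING_PARAMS = {"ec"}


def canonical_url(url: str, tracking_params=TRACKING_PARAMS) -> str:
    """Return the URL with tracking parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in tracking_params
    ]
    # An empty query string produces a URL without a trailing '?'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("http://www.example.com/sofas?ec=aff1"))
    print(canonical_url("http://www.example.com/sofas?ec=aff1&page=2"))
```

Any non-tracking parameters (the hypothetical page=2 above) are kept, so only the affiliate variants collapse onto a single URL.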