Duplicate pages in Google index despite canonical tag and URL Parameter in GWMT
-
Good morning Moz...
This is a weird one. It seems to be a "bug" with Google, honest...
We migrated our site www.three-clearance.co.uk to a Drupal platform over the new year. The old site used URL-based tracking for heat map purposes, so for instance
www.three-clearance.co.uk/apple-phones.html
..could be reached via
www.three-clearance.co.uk/apple-phones.html?ref=menu or
www.three-clearance.co.uk/apple-phones.html?ref=sidebar and so on.
GWMT was told about the ref parameter, and the canonical tag was used to indicate our preference. As expected, we encountered no duplicate content issues and everything was good.
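For anyone following along, the canonical preference described above amounts to mapping every ?ref= variant back to one root URL. Here is a minimal Python sketch of that mapping. It assumes `ref` is the only tracking parameter and an `http://` scheme (the post doesn't state one); the function names are hypothetical and this is not the code the site actually ran.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "ref" is the only tracking parameter that should be ignored.
TRACKING_PARAMS = {"ref"}


def canonical_url(url: str) -> str:
    """Return the URL with tracking parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://www.three-clearance.co.uk/apple-phones.html?ref=menu"))
```

Any other query parameters are kept, so only the tracking noise is dropped from the canonical.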
This is the chain of events:
-
Site migrated to new platform following best practice, as far as I can attest to.
-
Only known issue was that the verification for both Google Analytics (meta tag) and GWMT (HTML file) didn't transfer as expected, so between relaunch on the 22nd Dec and the fix on 2nd Jan we have no GA data, and presumably there was a period where GWMT became unverified.
-
URL structure and URIs were maintained 100% (which may be a problem, now)
-
Yesterday I discovered 200-ish 'duplicate meta titles' and 'duplicate meta descriptions' in GWMT. Uh oh, thought I. Expand the report out and the duplicates are in fact ?ref= versions of the same root URL. Double uh oh, thought I.
-
Run, not walk, to Google and do some Fu:
http://is.gd/yJ3U24 (9 versions of the same page, in the index, the only variation being the ?ref= URI)
Checked BING and it has indexed each root URL once, as it should.
Situation now:
-
Site no longer uses the ?ref= parameter, although of course there still exist some external backlinks that use it. Dropping the parameter was intentional and happened when we migrated.
-
I 'reset' the URL parameter in GWMT yesterday, given that there's no "delete" option. The "URLs monitored" count went from 900 to 0, but today is at over 1,000 (another wtf moment)
I also resubmitted the XML sitemap and fetched 5 'hub' pages as Google, including the homepage and HTML site-map page.
-
The ?ref= URLs in the index have the disadvantage of actually working, given that we transferred the URL structure and of course the webserver just ignores the nonsense arguments and serves the page. So I assume Google assumes the pages still exist, and won't drop them from the index but will instead apply a dupe content penalty. Or maybe call us a spam farm. Who knows.
Options that occurred to me (other than maybe making our canonical tags bold or locating a Google bug submission form) include:
A) robots.txt-ing .?ref=. but to me this says "you can't see these pages", not "these pages don't exist", so isn't correct
B) Hand-removing the URLs from the index through a page removal request per indexed URL
C) Apply 301 to each indexed URL (hello BING dirty sitemap penalty)
D) Post on SEOMoz because I genuinely can't understand this.
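Option C doesn't have to mean one redirect per indexed URL; a single generic rule that strips ref from any request would cover all of them. Here is a rough sketch of that rule as Python WSGI middleware, purely for illustration. The site runs on Drupal, so in practice this would more likely be a web server rewrite rule or similar, and the name `strip_ref_redirect` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def strip_ref_redirect(app):
    """WSGI middleware: 301 any request carrying a ref parameter to the same URL without it."""

    def middleware(environ, start_response):
        pairs = parse_qsl(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        kept = [(key, value) for key, value in pairs if key != "ref"]

        # No ref parameter present: hand the request to the real application.
        if len(kept) == len(pairs):
            return app(environ, start_response)

        # Rebuild the requested path, keeping any other query parameters.
        location = (environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
        if kept:
            location += "?" + urlencode(kept)

        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]

    return middleware
```

Because the rule keys on the parameter rather than on a list of URLs, it also catches old external backlinks that still carry ?ref=.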
Even if the gap in verification caused GWMT to forget that we had set ?ref= as a URL parameter, the parameter was no longer in use because the verification only went missing when we relaunched the site without this tracking. Google is seemingly 100% ignoring our canonical tags as well as the GWMT URL setting - I have no idea why and can't think of the best way to correct the situation.
Do you?
Edited To Add: As of this morning the "edit/reset" buttons have disappeared from the GWMT URL Parameters page, along with the option to add a new one. There are no messages explaining why, and of course the Google help page doesn't mention disappearing buttons (it doesn't even explain what 'reset' does, or why there's no 'remove' option).
-
-
GWT numbers sometimes ignore parameter handling, oddly, and can be hard to read. I'm only seeing about 40 indexed pages with "ref" in the URL, which hardly seems disastrous.
One note - once the pages get indexed, for whatever reason, de-indexing can take weeks, even if you do everything correctly. Don't change tactics every couple of days, or you're only going to make this worse, long-term. I think canonicals are fine for this, and they should be effective. It just may take Google some time to re-crawl and dislodge the pages.
You actually may want to create an XML sitemap (for Google only) that just contains the "ref=" pages Google has indexed. This can nudge them to re-crawl and honor the canonical. Otherwise, the pages could sit there forever.
You could 301-redirect - it would be perfectly valid in this case, since those URLs have no value to visitors. I wouldn't worry about the Bing sitemaps - just don't include the "ref=" URLs in the Bing maps, and you'll be fine.
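A Google-only sitemap of the indexed "ref=" pages, as suggested above, could be generated along these lines. This is a minimal Python sketch: the two example URLs are taken from the question with an assumed `http://` scheme, and the real list would have to come from whatever Google actually shows as indexed. The `build_sitemap` name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Assumption: illustrative URLs only; replace with the indexed ?ref= URLs.
    indexed_ref_urls = [
        "http://www.three-clearance.co.uk/apple-phones.html?ref=menu",
        "http://www.three-clearance.co.uk/apple-phones.html?ref=sidebar",
    ]
    print(build_sitemap(indexed_ref_urls))
```

The output would be saved as a separate file and submitted to Google only, leaving the main sitemap (and anything submitted to Bing) free of "ref=" URLs.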
-
Monday morning, still the same, still no reset/add parameters buttons in GWMT any more, still not understanding why Google is being so stubborn about this.
3 identical pages in the index, Google ignoring both GWMT URL parameter and canonical meta tag.
Sigh.
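One sanity check that fits here is to confirm that the ?ref= versions of a page really do serve a canonical tag pointing at the root URL. Below is a small Python sketch that pulls the canonical href(s) out of a page's HTML. Fetching the page is left out, the sample markup is an assumption rather than the site's actual source, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.canonicals.append(href)


def find_canonicals(html_text):
    """Return all canonical hrefs found in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals


if __name__ == "__main__":
    # Assumption: illustrative markup, not copied from the live site.
    sample = (
        "<html><head>"
        '<link rel="canonical" href="http://www.three-clearance.co.uk/apple-phones.html" />'
        "</head><body></body></html>"
    )
    print(find_canonicals(sample))
```

An empty list, more than one entry, or an href that still contains ?ref= would each point to a problem on the site side rather than with Google.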
-
Nope, nice clean site map that GWMT says provides the right number of URLs with no 404s and no ?ref= links.
It's like Google has always indexed these links separately but for some reason has decided to only show them now that they no longer exist.
-
They aren't in your XML sitemap, are they? You probably generated a new one when you moved the site over... that could possibly be overriding the parameters... maybe... weird...