Any SEO-wizards out there who can tell me why Google isn't following the canonicals on some pages?

Inevo

Hi,

I am banging my head against the wall over a customer's website: under "duplicate title tags" in GSC I can see that Google is indexing a whole bunch of parameterized versions of many of the URLs on the site. When I check the rel=canonical tag, everything seems correct. My customer is the biggest sports retailer in Norway. Their webshop has approximately 20 000 products, yet they have more than 400 000 pages indexed by Google.

So why is Google indexing pages like this? What is missing in this canonical?

https://www.gsport.no/herre/klaer/bukse-shorts?type-bukser-334=regnbukser&order=price&dir=desc

Why isn't Google just cutting off the ?type-bukser-334=regnbukser&order=price&dir=desc part of the URL?

Can it be the canonical tag itself, or could the problem be somewhere in the CMS?

Looking forward to your answers

Sigurd

Inevo

Thank you all! I have forwarded this to the owner of the site, so now we'll just sit back and see the effects.

danatanseo

Hi Inevo,

David and Jake's comments and recommendations are spot on. You need to update your robots.txt file. Jake was correct when he said "just because a canonical tag is in place, that doesn't prevent Google from crawling and indexing the page."

Sincerely,

Dana

davebuts

Hi Inevo,

Canonical tags are being used correctly and it doesn't actually look like any of the URLs with query strings are indexed in Google.

I'm going to move off the topic of canonicals now, though this is still related to how the site is crawled and indexed:

Has the site changed CMS in the last year or two? It's possible that some of the 400k URLs indexed are old or were not canonicalized properly at some point in time, so they were indexed.

The problem with how the site is currently set up is that it is basically impossible for search engines to crawl because of the product filter. I wrote an article about this a while ago (link), specifically about product filters in Magento. Product filters can turn your site into a 'black hole' for search engines, which is definitely happening in this case (try crawling it with Screaming Frog).
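To give a rough feel for why a product filter becomes a 'black hole', here is a small Python sketch that counts how many distinct query-string variants a single category page can produce. The filter names and the number of values per filter are made-up assumptions for illustration, not figures taken from the site, and the count ignores multi-select filters and parameter ordering, which would make it larger still.

```python
from math import prod


def crawlable_variants(filter_values, sort_fields, sort_dirs):
    """Count distinct query-string variants of ONE category page.

    filter_values maps a filter name to the number of values it offers.
    Each filter is either unset or set to exactly one value.
    """
    # (n + 1) choices per filter: one of its n values, or not applied at all
    filter_combos = prod(n + 1 for n in filter_values.values())
    # sorting is either absent, or one (field, direction) pair
    sort_combos = 1 + sort_fields * sort_dirs
    return filter_combos * sort_combos


if __name__ == "__main__":
    # Hypothetical filters for one category; none of these numbers come from the real shop.
    example_filters = {"type": 8, "brand": 30, "size": 10, "colour": 12}
    print(crawlable_variants(example_filters, sort_fields=3, sort_dirs=2))
```

With those assumed numbers, one category page already yields 279,279 crawlable URLs, so a few dozen categories are enough to reach the millions.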

I'd recommend blocking product filter URLs from being crawled so that search engines are only crawling important pages on the site.

You should be able to fix this by adding these 3 lines to your robots.txt:

Disallow: *?
Disallow: *+
Allow: *?p=

(Note: please check that you don't need to add more parameters to Allow)
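If you want to sanity-check which URLs those three rules would block before deploying them, a minimal Python sketch like the one below can help. It is a simplified approximation of Google's documented matching (the `*` wildcard, an optional trailing `$`, the longest matching pattern wins, and Allow wins a tie), not a full robots.txt parser. The helper names `pattern_to_regex` and `is_allowed` are my own, and Python's built-in `urllib.robotparser` is not used here because it does not handle wildcards.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


# The three rules from the post above, in the same order.
RULES = [
    ("disallow", "*?"),
    ("disallow", "*+"),
    ("allow", "*?p="),
]


def is_allowed(path, rules=RULES):
    """Return True if the path (including query string) may be crawled.

    The longest matching pattern wins; on equal length, allow beats disallow.
    A path that matches no rule is allowed.
    """
    best = None
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    base = "/herre/klaer/bukse-shorts"
    for path in (
        base,
        base + "?type-bukser-334=regnbukser&order=price&dir=desc",
        base + "?p=2",
        base + "?order=price&p=2",
    ):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under this sketch the clean category URL and `?p=2` stay crawlable, while the filtered and sorted URL is blocked. Note that `?order=price&p=2` is also blocked, because the Allow rule only matches when `p` is the first parameter.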

These changes will make crawling your site much more efficient - from millions of crawlable URLs, to probably 30-35k.

Let me know how this goes for you

Cheers,

David

HiveDigitalInc

I would definitely check to make sure the canonical tag is being used properly. Make sure it is an absolute URL rather than a relative URL.
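One way to run that check across many pages is a short script. The Python sketch below pulls every rel=canonical link out of an HTML document and reports whether its href is an absolute http(s) URL. The sample markup is a hypothetical example I wrote for illustration, not a copy of what the site actually serves, and the names `CanonicalFinder` and `canonical_report` are my own.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="canonical nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def is_absolute(href):
    """True only for URLs that carry both an http(s) scheme and a host."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def canonical_report(html):
    """Return a list of (href, is_absolute) pairs, one per canonical link found."""
    finder = CanonicalFinder()
    finder.feed(html)
    return [(href, is_absolute(href)) for href in finder.hrefs]


if __name__ == "__main__":
    # Hypothetical sample markup, assumed for illustration only.
    sample = (
        '<head><link rel="canonical" '
        'href="https://www.gsport.no/herre/klaer/bukse-shorts" /></head>'
    )
    print(canonical_report(sample))
```

More than one pair in the result is worth investigating too, since a page that declares several canonicals sends a conflicting signal.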

That being said, please note that just because a canonical tag is in place, that doesn't prevent Google from crawling and indexing the page, or from showing the page in results for a site:domain query. If you see the canonicalized URLs outranking their canonical, then you can start to question why Google isn't honoring the canonical.

Please note that canonical tags are a recommendation and not a directive, meaning Google doesn't have to honor them if it does not consider the target page to be the true canonical.

-Jake
