Pages excluded from Google's index due to "different canonicalization than user"

SvenRi

Hi MOZ community,

A few weeks ago we noticed a complete collapse in traffic on some of our pages (7 out of around 150 blog posts in question). We were able to confirm that those pages disappeared for good from Google's index at the end of January '18, they were still findable via all other major search engines.

Using Google's Search Console (previously Webmastertools) we found the unindexed URLs in the list of pages being excluded because "Google chose different canonical than user". Content-wise, the page that Google falsely determines as canonical instead has little to no similarity to the pages it thereby excludes from the index.

About our setup:

We are a SPA, delivering our pages pre-rendered, each with an (empty) rel=canonical tag in the HTTP header that's then dynamically filled with a self-referential link to the pages own URL via Javascript. This seemed and seems to work fine for 99% of our pages but happens to fail for one of our top performing ones (which is why the hassle ).

What we tried so far:

going through every step of this handy guide: https://moz.com/blog/panic-stations-how-to-handle-an-important-page-disappearing-from-google-case-study --> inconclusive (healthy pages, no penalties etc.)
manually requesting re-indexation via Search Console --> immediately brought back some pages, others shortly re-appeared in the index then got kicked again for the aforementioned reasons
checking other search engines --> pages are only gone from Google, can still be found via Bing, DuckDuckGo and other search engines

Questions to you:

How does the Googlebot operate with Javascript and does anybody know if their setup has changed in that respect around the end of January?
Could you think of any other reason to cause the behavior described above?

Eternally thankful for any help!

ldWB9

R0bin_L0rd

Hi SvenRi, that's an interesting one! The message you're getting from Google suggests that, rather than not finding the canonical tag, the system has reason to believe that the canonical is not representative of the best content.

One thing I'd bear in mind is that Google doesn't take canonical tags as gospel, but rather guidance, so it can just ignore them without there necessarily being a problem in how you've implemented that tag. Another is that while Google says that their crawlers can parse JavaScript, there's evidence that it doesn't parse the page content perfectly.

What happens when you fetch and render the pages in question using Search Console (both the page you want to rank and the page Google is selecting)? Can you see all of the content? Google uses the same JavaScript rendering as Chrome 41 (see here) have you tried accessing with that? You could also try a tool like Screaming Frog with JavaScript rendering switched on to see what kind of page content comes back. It could be worth making sure the canonical is generated properly but I'd also be checking that the page content is being rendered properly to make sure Google is seeing the pages as different as you describe. I'd also check to make sure there isn't a second, conflicting, canonical tag on the page. I know some SPA frameworks can have issues with double-opening HTML tags when one page is accessed after another, that could be something that would confuse a crawler so you could double-check that.

As ever, there are the rumours that Google will start giving much more weight to mobile in terms of indexing. Given your question about things changing recently - does your site have desktop and mobile parity?

If it looks as though everything is kosher, is it possible that the page Google is suggesting is much more heavily linked to internally or externally? If internally you could consider reviewing your internal linking (Will wrote a post about ways to think about internal linking here). You could use a tool like Majestic to look at who is linking to these pages externally, it may be worth double checking that all the links are genuine.

TL;DR I would start with the whole page content, not just the search directives, to make sure that's always being understood properly, then I would look in to linking. These are mainly areas of investigation and next debug steps, hopefully they'll help narrow down the search for you!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Pages excluded from Google's index due to "different canonicalization than user"

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Indexed Pages Different when I perform a "site:Google.com" site search - why?

Is the image property really required for Google's breadcrumbs structured data type?

Why isn't my site being indexed by Google?

Pages getting into Google Index, blocked by Robots.txt??

Why isn't my uneven link flow among index pages causing uneven search traffic?

What are the ranking factors for "Google News"? How can we compete?

Thousands of Web Pages Disappered from Google Index

Is 301 redirecting your index page to the root '/' safe to do or do you end up in an endless loop?