Pages excluded from Google's index due to "different canonicalization than user"
-
Hi MOZ community,
A few weeks ago we noticed a complete collapse in traffic on some of our pages (7 of the roughly 150 blog posts in question). We were able to confirm that those pages disappeared from Google's index at the end of January '18; they were still findable via all other major search engines.
Using Google's Search Console (previously Webmaster Tools), we found the unindexed URLs in the list of pages excluded because "Google chose different canonical than user". Content-wise, the page that Google wrongly picks as canonical has little to no similarity to the pages it thereby excludes from the index.
About our setup:
We are an SPA, delivering our pages pre-rendered, each with an (empty) rel=canonical tag in the HTML head that's then dynamically filled with a self-referential link to the page's own URL via JavaScript. This seemed, and still seems, to work fine for 99% of our pages but fails for one of our top-performing ones (hence the hassle).
What we tried so far:
- going through every step of this handy guide: https://moz.com/blog/panic-stations-how-to-handle-an-important-page-disappearing-from-google-case-study --> inconclusive (healthy pages, no penalties etc.)
- manually requesting re-indexation via Search Console --> immediately brought back some pages; others briefly reappeared in the index and were then dropped again for the same reason
- checking other search engines --> pages are only gone from Google, can still be found via Bing, DuckDuckGo and other search engines
Questions to you:
- How does Googlebot handle JavaScript, and does anybody know whether its setup changed in that respect around the end of January?
- Can you think of any other reason for the behavior described above?
Eternally thankful for any help!
-
Hi SvenRi, that's an interesting one! The message you're getting from Google suggests that, rather than not finding the canonical tag, the system has reason to believe that the canonical is not representative of the best content.
One thing I'd bear in mind is that Google doesn't take canonical tags as gospel, but rather as guidance, so it can ignore them without there necessarily being a problem in how you've implemented the tag. Another is that while Google says its crawlers can parse JavaScript, there's evidence that they don't always parse the page content perfectly.
What happens when you fetch and render the pages in question using Search Console (both the page you want to rank and the page Google is selecting)? Can you see all of the content? Google uses the same JavaScript rendering as Chrome 41; have you tried accessing the pages with that? You could also try a tool like Screaming Frog with JavaScript rendering switched on to see what kind of page content comes back. It could be worth making sure the canonical is generated properly, but I'd also check that the page content is being rendered properly, to make sure Google is seeing the pages as being as different as you describe. I'd also check that there isn't a second, conflicting canonical tag on the page. I know some SPA frameworks can have issues with double-opening HTML tags when one page is accessed after another. That could confuse a crawler, so it's worth double-checking.
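If it helps with the canonical checks above, here is a minimal sketch in Python (standard library only) that audits a page's HTML for missing, empty, duplicate, or non-self-referential canonical tags. The function name `audit_canonicals` and the example URLs are my own placeholders, not anything from your setup, and it assumes you have already saved the HTML you want to test (ideally both the raw pre-rendered response and the rendered DOM).

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            # A missing or valueless href is recorded as an empty string.
            self.hrefs.append(attr.get("href") or "")


def audit_canonicals(html, page_url):
    """Return a list of problems found with the canonical tags in `html`.

    `page_url` is the URL the page is expected to declare as canonical
    (hypothetical parameter: pass whatever URL you want indexed).
    """
    collector = CanonicalCollector()
    collector.feed(html)
    hrefs = collector.hrefs

    problems = []
    if not hrefs:
        problems.append("no canonical tag")
    if len(hrefs) > 1:
        problems.append("multiple canonical tags")
    if any(h == "" for h in hrefs):
        problems.append("empty canonical href")
    if any(h and h != page_url for h in hrefs):
        problems.append("canonical points elsewhere")
    return problems


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="canonical" href="">'
        '<link rel="canonical" href="https://example.com/a"></head></html>'
    )
    print(audit_canonicals(sample, "https://example.com/a"))
```

Running it against both the pre-rendered HTML and the rendered DOM of a dropped page would show whether the empty tag is ever left unfilled or whether a second tag sneaks in.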
As ever, there are the rumours that Google will start giving much more weight to mobile in terms of indexing. Given your question about things changing recently - does your site have desktop and mobile parity?
If it looks as though everything is kosher, is it possible that the page Google is suggesting is much more heavily linked to, internally or externally? If internally, you could consider reviewing your internal linking (Will wrote a post about ways to think about internal linking). You could use a tool like Majestic to look at who is linking to these pages externally; it may be worth double-checking that all the links are genuine.
TL;DR I would start with the whole page content, not just the search directives, to make sure that's always being understood properly, then I would look into linking. These are mainly areas of investigation and next debug steps; hopefully they'll help narrow down the search for you!
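For the internal linking question, a rough sketch like the one below can count how many distinct pages on your own site link to each URL, so you can compare the page Google picks with the pages it drops. It is only an illustration: the function name `inbound_internal_links` and the `pages` mapping (page URL to its rendered HTML) are assumptions of mine, and you would need to supply the HTML yourself, for example from a crawl export.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def inbound_internal_links(pages):
    """Count, for each URL, how many distinct same-host pages link to it.

    `pages` maps a page URL to its HTML (hypothetical input format).
    Self-links and links to other hosts are ignored; fragments are dropped.
    """
    counts = Counter()
    for source_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        host = urlparse(source_url).netloc

        targets = set()
        for href in collector.hrefs:
            # Resolve relative links and strip any #fragment.
            target = urldefrag(urljoin(source_url, href)).url
            if urlparse(target).netloc == host and target != source_url:
                targets.add(target)
        counts.update(targets)
    return counts


if __name__ == "__main__":
    pages = {
        "https://example.com/": '<a href="/a">A</a><a href="/b">B</a>',
        "https://example.com/a": '<a href="https://other.com/">ext</a>',
        "https://example.com/b": '<a href="/a">A</a>',
    }
    for url, n in inbound_internal_links(pages).most_common():
        print(n, url)
```

A large gap between the count for the page Google selects and the counts for the excluded posts would be one hint that link signals are outweighing your canonical tags.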