"Google-selected canonical different to user-declared" - issues
-
Hi Moz!
We are having issues on a number of our international sites where Google is choosing page 2 of a category as the canonical over page 1. Example: https://www.yoursclothing.de/kleider-grosse-groessen (image attached).
We currently use infinite loading; however, when JavaScript is disabled we have a text link to page 2, which is reached via a query string of '?filter=true&view=X&categoryid=X&page=2'.
Page 2 is blocked via robots.txt and has a canonical pointing at page 1.
Due to Google selecting page 2 as the canonical, the page is no longer ranking. For the main keyphrase, a subcategory page is ranking poorly in its place.
-
Sounds like you had the best of intentions in providing a non-JS fallback, but it came back to bite you.
By the way, this is evidence for something I'm always, always banging on about: Google 'can' render JS and do headless-browser renders of a web page when crawling, but they don't do this for everyone and they don't do it all the time (even for sites large enough to warrant the increased crawl resources). Rendered crawling is something like 10x slower than basic source-code scraping, and Google's mission is to index the web. Obviously they're not going to take a 10x efficiency hit on their MO for just anyone.
Sorry about that, I needed to get it off my chest, as people are always linking articles saying "LOOK! Google can do JS crawling now, we don't have to make sure our non-modified source code is solid any more." YES YOU DO, INTERNET.
Ok done now. Let's focus on the query at hand
So you have this lovely page here which you have quoted: https://www.yoursclothing.de/kleider-grosse-groessen
It looks like this:
https://d.pr/i/QVNfKR.png (screenshot)
And you can scroll down, and it infinitely loads; you only see the bottom of the results (with no page-changing button) when the results run out, like this:
https://d.pr/i/XECK5Q.png (screenshot)
But when JS is disabled (or if you're fast like some kind of ninja cat and scroll down to find the button before the infinite load modifies the page contents... but no, mainly just when JS is disabled), you get this button here:
https://d.pr/i/4Y9T9Y.png (screenshot)
... and when you click the button you end up on another page like this one:
https://www.yoursclothing.de/kleider-grosse-groessen?filter=true&view=32&categoryid=3440&page=2
... where the "&page=2" at the end is the parameter that changes the active page of contents.
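Broken out with Python's standard-library URL tools (just to make the moving parts visible), those parameters look like this:

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://www.yoursclothing.de/kleider-grosse-groessen"
       "?filter=true&view=32&categoryid=3440&page=2")

# parse_qs returns a dict of parameter name -> list of values
params = parse_qs(urlsplit(url).query)

params["page"]  # ['2'] - the active page of results
sorted(params)  # ['categoryid', 'filter', 'page', 'view']
```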
Google is sometimes choosing the sub-pages of results as canonical when you don't want them to. You want to know why, why what you have done isn't really working, and what you could do instead. Got it.
IMPORTANT disclaimer: Google decides to rank pages for a number of reasons. If Google really does feel that, sometimes, sub-pages of your results are 'better' (maybe they have better products on some of the paginated URLs, a better mix of products, or products which fit Google's idea of fair pricing better than the default feed), there is no guarantee that 'correcting' this 'error' will result in the same rankings you have now. I just want to be 100% clear on that point: you might even lose some rankings if Google has really made up its mind. They have told you they are overriding your choice, and usually there's some kind of reason for that. Sometimes it's a 'just past the post' decision where you can correct them and get basically the same rankings on other pages; other times you can lose rankings, or they just won't shift it.
Still with me? Ok let's look at what you did here:
-
On page 2 (and page 3, and however many paginated URLs there are) you have a canonical tag pointing to the parent
-
And you have blocked the paginated URLs in robots.txt
I need to start by querying your statement that the page 2s (and presumably other sub-pages, like the page 3s, e.g. https://www.yoursclothing.de/kleider-grosse-groessen?filter=true&view=32&categoryid=3440&page=3) are blocked in robots.txt
DeepCrawl's indexation plugin doesn't see them as blocked:
https://d.pr/i/1cRShK.png (screenshot)
It reports the canonical tag, but it says nothing about robots.txt at all!
So let's look at your robots.txt file:
https://www.yoursclothing.de/robots.txt
https://d.pr/i/YbyEGl.png (screenshot)
Nothing under # BlockSecureAreas handles pagination
But then under # NoIndex we have this entry:
Disallow: /filter=true
That _should_ handle it, as pagination never occurs without a filter being applied (at least as far as I can see)
Indeed, using this tool that I like, if I paste in just the relevant parts:
https://d.pr/i/TVafTL.png (screenshot)
**We can see that the block is effective** (so DeepCrawl, your Chrome tool is probably wrong somehow; maybe they will see this, read it and fix it!)
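As an aside, tools can disagree on cases like this because robots.txt matching is stricter than people assume. Per Google's documented rules, a pattern is anchored at the start of the URL path, `*` matches any run of characters, and `$` anchors the end. Here is my own toy sketch of that matching (simplified, not Google's code; the URL is just your paginated example):

```python
import re

def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Toy version of Google-style robots.txt pattern matching:
    anchored at the start of the path, '*' matches any run of
    characters, a trailing '$' anchors the end of the URL."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path_and_query) is not None

paginated = "/kleider-grosse-groessen?filter=true&view=32&categoryid=3440&page=2"

# A literal rule only matches URLs whose path *starts* with the pattern...
rule_matches("/filter=true", paginated)   # False
# ...whereas a wildcard form catches the parameter anywhere in the URL:
rule_matches("/*filter=true", paginated)  # True
```

If one of your testing tools applies strict start-of-path matching and another applies looser substring matching, that alone could explain why they disagree about whether `Disallow: /filter=true` catches these URLs; worth double-checking in Search Console's own robots.txt tester.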
I did notice some weird, unnecessary indentation in your robots.txt file. Could that cause problems for Google? Could it, at the least, make Google think "well, if there are syntax errors in here, maybe it's not worth obeying, as it's probably wrong"? Quite possibly
In my opinion that's not likely to be part of it
So if it's not that, then what!?
Well, it could be that you're using robots.txt in the wrong capacity. Robots.txt _doesn't_ stop Google from indexing web pages or tell them not to index web pages (which is why it's funny that you have commented the section with "# NoIndex": that's not what robots.txt does!)
Robots.txt dissuades Google from 'crawling' (but not indexing) a URL. If they can find signals from around the web (maybe backlinks), or if they believe the content on the URL is valuable by other means, they can (and will) still index a URL without necessarily crawling it. Robots.txt does not do what a meta noindex does (which can be fired through the HTTP header, or via HTML)
Also, riddle me this if you will: if Google isn't allowed to crawl your URLs any more, how will it continue to find your canonical tags, or find any new noindex tags? Why give Google a directive (a canonical tag) on a URL which Google isn't allowed to crawl, and thus will never see? Sounds backwards to me
My proposed steps:
-
Read, understand and make your own decision on the "disclaimer" I wrote up earlier in this very post
-
If you still want to go ahead, enact the following (otherwise don't!)
-
Remove the robots.txt block so Google can crawl those URLs; or, if that rule covers more than just the paginated URLs, leave it in place but add an exception so the paginated URLs may be crawled
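If you go the 'exception' route, the shape would be something like this (hypothetical rules, assuming Google's wildcard syntax and its 'longest, most specific rule wins' precedence; test before deploying):

```text
User-agent: *
# keep blocking filtered views in general...
Disallow: /*filter=true
# ...but this Allow rule is longer (more specific), so it wins for
# paginated URLs, letting Google crawl them and see the canonical tags
Allow: /*filter=true*page=
```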
-
Leave all the canonical tags on; good work. Maybe supplement them with a 'noindex' directive, telling Google not to index those pages (there is no guarantee the canonical URL will replace the noindexed URL, but you can try your luck; read the disclaimer)
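On the mechanics of that noindex: as noted above, it can travel as an HTML meta tag or as an HTTP response header. A minimal server-side sketch (hypothetical helper name; your platform's hook for setting response headers will differ):

```python
def robots_header_for(query_params: dict) -> dict:
    """Return extra response headers for a category URL: paginated
    views get 'X-Robots-Tag: noindex', which Google treats the same
    as <meta name="robots" content="noindex"> in the HTML."""
    page = query_params.get("page")
    if page is not None and page != "1":
        return {"X-Robots-Tag": "noindex"}
    return {}

robots_header_for({"filter": "true", "page": "2"})  # {'X-Robots-Tag': 'noindex'}
robots_header_for({})                               # {}
```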
-
Maybe serve a 410 status code, only to Googlebot (by user-agent), when it visits the paginated URLs specifically, to encourage Google to think of those URLs as gone. Leave the contents alone, otherwise it's cloaking: serve the same content to Google and users, but give Googlebot a 410 (Gone) status
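A sketch of that user-agent-conditional status (hypothetical helper name; note that a plain substring check on the user-agent string is the naive version, since anyone can claim to be Googlebot; verifying it properly means a reverse-DNS lookup):

```python
def status_for(user_agent: str, query_params: dict) -> int:
    """Same HTML body for everyone; only the status code differs.
    Paginated URLs return 410 (Gone) to Googlebot, 200 to users."""
    is_paginated = query_params.get("page") not in (None, "1")
    claims_googlebot = "googlebot" in user_agent.lower()
    return 410 if (is_paginated and claims_googlebot) else 200

status_for("Mozilla/5.0 (compatible; Googlebot/2.1)", {"page": "2"})  # 410
status_for("Mozilla/5.0 (Windows NT 10.0)", {"page": "2"})            # 200
status_for("Mozilla/5.0 (compatible; Googlebot/2.1)", {})             # 200
```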
-
Before enacting the super-aggressive 410 stance, give Google plenty of time to swallow the new 'noindex' tags on paginated URLs which weren't there before. A 410, whilst powerful, may cause those tags never to be read, so do give Google time (a few weeks, IMO)
-
If you do adopt the 410 stance, one downside is that Google will think your JS fallback is a broken link, and this will appear in Google Search Console. To make it less severe (though it will probably still happen), add a rel="nofollow" attribute to the pagination JS-fallback link / button wherever it appears
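For the fallback link, that's just a link-level attribute (illustrative markup only; your real href carries the actual parameter values and your own link text):

```html
<!-- hypothetical non-JS pagination fallback with the nofollow hint -->
<a href="?filter=true&amp;view=32&amp;categoryid=3440&amp;page=2" rel="nofollow">
  Page 2
</a>
```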
-
Once Google seems to have swallowed your wishes and has removed most of these URLs from its index, THEN put the robots.txt block for paginated URLs back on (so it won't all happen again in the future)
-
Try removing the weird indentation from your robots.txt file
-
Smile
Well, that's it from me. Thanks for this one, it was pretty interesting