"Google-selected canonical different to user-declared" - issues
-
Hi Moz!
We are having issues on a number of our international sites where Google is choosing page 2 of a category as the canonical over page 1. Example: https://www.yoursclothing.de/kleider-grosse-groessen (image attached).
We currently use infinite loading; however, when JavaScript is disabled we provide a text link to page 2, which works via a query string of '?filter=true&view=X&categoryid=X&page=2'
Page 2 is blocked via robots.txt and has a canonical pointing at page 1.
Due to Google selecting page 2 as the canonical, the page is no longer ranking; for the main keyphrase, a subcategory page is ranking instead, and poorly.
-
Sounds like you had the best of intentions in providing a non-JS fallback, but it came back to bite you
By the way, this gives evidence to something else that I'm always, always banging on about - Google 'can' render JS and do headless browser renders of a web page when crawling, but they don't do this for everyone and they don't do it all the time (even for sites large enough to warrant the increased crawl resources). Rendered crawling is roughly 10x slower than basic source-code scraping, and Google's mission is to index the web. Obviously they're not going to take a 10x efficiency hit on their MO for just anyone
Sorry about that, I needed to get it off my chest, as people are always linking articles saying "LOOK! Google can do JS crawling now, we don't have to make sure our unmodified source code is solid any more". YES YOU DO - INTERNET
Ok done now. Let's focus on the query at hand
So you have this lovely page here which you have quoted: https://www.yoursclothing.de/kleider-grosse-groessen
It looks like this:
https://d.pr/i/QVNfKR.png (screenshot)
And you can scroll down, and it infinitely loads - you only see the bottom of the results (with no page-change button) when the results run out, like this:
https://d.pr/i/XECK5Q.png (screenshot)
But when JS is disabled (or if you're fast like some kind of ninja cat, and you scroll down to the bottom of the page and find the button before the infinite load modifies the page-contents... but no mainly, just when JS is disabled) - then you get this button here:
https://d.pr/i/4Y9T9Y.png (screenshot)
... and when you click the button you end up on another page like this one:
https://www.yoursclothing.de/kleider-grosse-groessen?filter=true&view=32&categoryid=3440&page=2
... where you see "&page=2" at the end, which is the parameter that changes the active page of contents
Google are sometimes choosing the sub-pages of results as canonical when you guys don't want them to do that. You want to know why, why what you have done isn't really working, and what you could do instead. Got it
IMPORTANT Disclaimer: Google decides to rank pages for a number of reasons. If Google really does feel that, sometimes, sub-pages of your results are 'better' (maybe they have better products on some of the paginated URLs, a better mix of products, or products which fit Google's idea of fair pricing better than the default feed...), there is no guarantee that 'correcting' this 'error' will result in the same rankings you have now. I just want to be 100% clear on that point: you might even lose some rankings if Google has really decided. They have told you they are overriding your choice, and usually there's some kind of reason for that. Sometimes it's a 'just past the post' decision where you can correct them and get basically the same rankings on other pages; other times you can lose rankings, or they just won't shift it
Still with me? Ok let's look at what you did here:
-
On page 2 (and page 3, and however many paginated URLs there are) you have a canonical tag pointing to the parent
-
And you have blocked the paginated URLs in robots.txt
I need to start by querying your statement that the page 2s (and presumably the other sub-pages, like the page 3s - e.g: https://www.yoursclothing.de/kleider-grosse-groessen?filter=true&view=32&categoryid=3440&page=3) are blocked in robots.txt
DeepCrawl's indexation plugin doesn't see them as blocked:
https://d.pr/i/1cRShK.png (screenshot)
It mentions the canonical tag, but it says nothing about the robots.txt at all!
So let's look at your robots.txt file:
https://www.yoursclothing.de/robots.txt
https://d.pr/i/YbyEGl.png (screenshot)
Nothing under # BlockSecureAreas handles pagination
But then under # NoIndex we have this entry:
Disallow: /filter=true
That _should_ handle it, as pagination never occurs without a filter being applied (at least as far as I can see)
Indeed using this tool that I like, if I just paste in only the relevant parts:
https://d.pr/i/TVafTL.png (screenshot)
**We can see that the block is effective** (so DeepCrawl, your Chrome tool is probably wrong somehow - maybe they will see this new link, read it and fix it!)
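As an aside on why tools can disagree here: under Google's documented matching rules, a Disallow pattern is anchored at the start of the URL path (query string included), with `*` as a wildcard and `$` as an end-of-URL anchor - so a rule aimed at a query-string parameter generally needs a leading wildcard to catch it part-way through a URL. A minimal sketch of that matching logic (my own illustration, not Google's code):

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt pattern match: anchored at the start of the
    URL path (query string included), '*' matches any run of characters,
    and a trailing '$' anchors the end of the URL."""
    parts = []
    for i, ch in enumerate(pattern):
        if ch == "*":
            parts.append(".*")
        elif ch == "$" and i == len(pattern) - 1:
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.match("^" + "".join(parts), path) is not None

# A rule with no wildcard only matches URLs that *start* with it:
assert rule_matches("/filter=true", "/filter=true&page=2")
# ...whereas a leading wildcard catches the parameter anywhere in the URL:
assert rule_matches("/*filter=true",
                    "/kleider-grosse-groessen?filter=true&view=32&page=2")
```

This start-anchoring quirk is exactly the sort of thing that makes one crawler report a URL as blocked and another report it as open, so it's worth testing the precise pattern in Google's own robots.txt tester rather than trusting any third-party tool.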
I did notice that there's some weird, unrequired indentation in your robots.txt file. Could that cause problems for Google? Could it, at the least, make Google think "well, if there are syntax errors in here, maybe it's not worth obeying as it's probably wrong"? Quite possibly
In my opinion that's not likely to be part of it
So if it's not that, then what!?
Well, it could be that you're using robots.txt in the wrong capacity. Robots.txt _doesn't_ stop Google from indexing web pages or tell them not to index web pages (which is why it's funny that you have commented with "# NoIndex" - that's not what robots.txt does!)
Robots.txt dissuades Google from 'crawling' (but not indexing) a URL. If they can find signals from around the web (maybe backlinks), or if they believe via other means that the content on the URL is worth indexing, they can (and will) still index a URL without necessarily crawling it. Robots.txt does not do what a meta noindex does (which can be fired through the HTTP header, or via HTML)
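To be concrete, a noindex directive (which robots.txt cannot express) takes one of two standard forms, applied to each paginated URL - sketched below, with the header variant shown as a comment:

```html
<!-- Option 1: in the <head> of each paginated URL -->
<meta name="robots" content="noindex">

<!-- Option 2 (equivalent): sent as an HTTP response header instead,
     which is handy for non-HTML resources:
     X-Robots-Tag: noindex -->
```

Either form works, but crucially Google has to be able to crawl the URL to see it - which is the crux of the problem here.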
Also, riddle me this if you will: if Google isn't allowed to crawl your URLs any more, how will it continue to find your canonical tags, or find any new noindex tags? Why give Google a directive (a canonical tag) on a URL which Google isn't allowed to crawl, and thus will never see? Sounds backwards to me
My proposed steps:
-
Read, understand and make your own decision on the "disclaimer" I wrote up earlier in this very post
-
If you still want to go ahead, enact the following (otherwise don't!)
-
Remove the robots.txt block so Google can crawl those URLs, or if that rule covers more than just the paginated URLs - leave it in place but add an exclusion for the paginated URLs so they may be crawled
-
Leave all the canonical tags on - good work. Maybe supplement these with a 'noindex' directive, which would tell Google not to index those pages (there is no guarantee the canonical URL will replace the noindexed URL, but you can try your luck - read the disclaimer)
-
Maybe serve status code 410 (gone), only to Googlebot (by user-agent), when it visits the paginated URLs specifically - to try and encourage Google to think of those URLs as gone. Leave the contents alone, otherwise it's cloaking: serve the same content to Google and users, but serve Googlebot the 410 status
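For illustration only, the decision logic for that step could look something like the sketch below (the function name and shape are mine, not from any framework; note the user-agent sniff is naive - a production check should confirm real Googlebot via reverse-DNS verification before trusting the header):

```python
def status_for_request(user_agent: str, query: dict) -> int:
    """Return 410 for Googlebot hitting a paginated URL, 200 for everyone else.
    The user-agent sniff is naive -- verify real Googlebot via reverse-DNS
    lookup before trusting this header in production."""
    is_googlebot = "googlebot" in user_agent.lower()
    is_paginated = int(query.get("page", 1)) > 1  # page 1 is the canonical page
    return 410 if (is_googlebot and is_paginated) else 200

# Googlebot on page 2 gets the 410; a normal browser (or page 1) gets a 200
print(status_for_request("Mozilla/5.0 (compatible; Googlebot/2.1)", {"page": "2"}))  # 410
print(status_for_request("Mozilla/5.0 (Windows NT 10.0) Chrome/70.0", {"page": "2"}))  # 200
```

The response body stays the same for every visitor - only the status line changes for Googlebot - which keeps this on the right side of the cloaking line described above.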
-
Before enacting the super-aggressive 410 stance, give Google plenty of time to swallow the new 'noindex' tags on the paginated URLs, which weren't there before. A 410, whilst powerful, may cause those tags never to be read - so do give Google time (a few weeks, IMO)
-
If you do adopt the 410 stance, one downside will be that Google will think your JS fallback is a broken link, and this will appear in Google Search Console. To make this less severe (though it will probably still happen), add a nofollow directive to the pagination JS-fallback link / button wherever it appears
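Concretely, that just means adding `rel="nofollow"` to the fallback anchor - sketched below using the example URL from earlier in this post (the link text is illustrative):

```html
<a href="/kleider-grosse-groessen?filter=true&amp;view=32&amp;categoryid=3440&amp;page=2"
   rel="nofollow">Page 2</a>
```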
-
Once Google seems to have swallowed your wishes and seems to have removed most of these URLs from their index, THEN put the robots.txt block for paginated URLs back on (so it won't all happen again in the future)
-
Try removing the weird indentation formatting from your robots.txt file
-
Smile
Well, that's it from me. Thanks for this one, it was pretty interesting