What could cause Google to not honor canonical URLs?
-
I have a strange situation on a website. When I do a Google query of site:example.com, all the top indexed results appear to be queries that users can perform on the website. So any random term a user searches for on the website is, for some reason, causing the search result page to get indexed, like example.com/search/query/random-keywords
However, the search results page has a canonical tag on it that points to example.com/search, and that doesn't seem to be doing anything. Any thoughts or ideas why this could be happening?
-
Hi there,
First of all, it's a mistake to assume that the first results of a _site:_ search are the most important or the most relevant. Google has said a few times that we shouldn't rely much on what that operator shows.
Blocking search results with robots.txt won't help: it will not remove already-indexed pages, and it can't prevent new pages from being indexed (if there's an external link to a robots.txt-blocked page, Google can still index it). It'll only prevent Googlebot from discovering new ones FROM YOUR SITE.
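To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule set and the helper name `is_crawl_allowed` are assumptions for illustration, not the site's real configuration. Note that this parser does simple prefix matching and does not understand Google-style wildcards, and that the answer only describes crawling: a disallowed URL can still end up indexed if something links to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every internal search result URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def is_crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/search/query/random-keywords",
        "https://example.com/products",
    ):
        # Crawl permission only; this says nothing about whether the URL is indexed.
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```

A blocked URL also hides its canonical tag from the crawler, since the page is never fetched.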
Again, I'd try to dig deeper to understand where the links to internal searches that Google is finding come from. Googlebot will not perform any searches on your site.
The thing with GSC might be related to quite a few reasons. I can't say much because I don't know any more specifics, but from what you are telling me, it looks like you are getting impressions in searches that you don't relate to your site and that land on pages that Google says it isn't indexing. Yeah, I'm repeating the obvious, hehe.
In my experience, Google can behave strangely like this. There are cases where a page is canonicalized but can still be shown in SERPs. Don't ask me why, but it happens. It takes Google a little time to fully replace it with the correct one.
I'd wait a little longer to see how Google is handling them. I don't know if I'm helping you.
It took me a few minutes to understand and process what you wrote and come up with an answer. Please feel free to ask again or comment on my reply if I misunderstood something.
Best luck,
Gaston
-
Hi, here's some more background info on this situation that makes it even stranger. I can perform some pretty specific searches on Google where these indexed search result pages show up. And I can look in Google Search Console under the performance section and see that those pages receive impressions and clicks. However, if I inspect the URL, Search Console says it is not included in Google's index, and the reason it gives under indexing is that it is honoring the canonical URL. So Search Console is saying the page isn't indexed because of the canonical, but I can do searches and find that exact URL in the index. Any ideas what could be causing this?
-
Hi Gaston,
Thanks for the response. I can confirm that, in the example, /search and /search?q=foo are pretty much identical. However, that may not always be the case; it only happens when a user searches for something that returns no results. So, on a website that sells widgets, /search and /search?q=widgets would not be identical, and in that case it would make sense that Google would not honor the canonical link. What's really strange is that if I search Google using the site: operator for the domain, the top pages are not user queries for things that make sense. The top indexed pages are random, non-relevant user searches.
I do not have a way with this system to control noindex tags on these search result pages. The only thing I could do is take the nuclear option and just block it all with robots.txt using wildcards. But that means no search result pages would get indexed, relevant or not.
-
Hi there,
In my experience, when Google doesn't honor canonicals, it's because the pages aren't similar enough.
By definition, canonicals are meant for two or more pages that have the same content. If you are finding it problematic, I'd suggest using noindex tags for those search pages.
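As a quick way to see which signals a search results page is actually sending, a small script can pull the canonical link and any robots meta tag out of the HTML. This is only a sketch using Python's standard `html.parser`. The sample markup, the class name `IndexingSignals`, and the function `indexing_signals` are made up for illustration, and it only reads the HTML, not an `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser

# Made-up markup resembling a search results page with a canonical tag.
SAMPLE_HTML = """
<html><head>
  <title>Search results for random-keywords</title>
  <link rel="canonical" href="https://example.com/search" />
</head><body>No results found.</body></html>
"""


class IndexingSignals(HTMLParser):
    """Collects rel=canonical hrefs and robots meta directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonicals.append(attr.get("href", ""))
        elif tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            self.robots.append(attr.get("content", "").lower())


def indexing_signals(html: str) -> dict:
    """Return the canonical URLs found and whether a noindex directive is present."""
    parser = IndexingSignals()
    parser.feed(html)
    parser.close()
    noindex = any("noindex" in directive for directive in parser.robots)
    return {"canonicals": parser.canonicals, "noindex": noindex}


if __name__ == "__main__":
    print(indexing_signals(SAMPLE_HTML))
```

If a page reports more than one canonical, or a canonical together with noindex, the signals conflict and are worth cleaning up before drawing conclusions.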
I'd investigate whether there are links pointing to those internal search pages, as it's not common for Google to discover search pages on its own.
Hope it helps,
Best luck.
Gaston