What could cause Google to not honor canonical URLs?
-
I have a strange situation on a website, when I do a Google query of site:example.com all the top indexed results appear to be queries that users can perform on the website. So any random term the user searches for on the website for some reason is causing the search result page to get indexed - like example.com/search/query/random-keywords
However, the search results page has a canonical tag on it that points to example.com/search, but that doesn't seem to be doing anything. Any thoughts or ideas why this could be happening?
-
Hi there,
First of all, its a mistake to think that when searching with _site: _operator, the first results are the most important nor the more relevant. Google has said a few times that we shouldn't rely that much on what that search in terms of what's being shown.
Blocking search results with robots.txt wont be of help, as it will not remove already indexed pages and cant prevent for new pages to be indexed (if there's an external link to a robots.txt blocked page, google can still index it) it'll only prevent Googlebot from discovering new ones FROM YOUR SITE.
Again, i'd try to dig deeper to understand where are the links to internal searches that google is finding. Googlebot will not do any search in your site.
The thing with GSC, might be related to quite a few reasons. I cant say much because I don't know any more specifics, but from what you are telling me it looks like you are getting impressions in searches that you don't relate to your site and that land on pages that google is noindexing. Yeah im repeating the obvious, hehe.
In my experience, Google can have these strange behaviours. You know, there are cases when a page is canonicalized, but it can still be shown in SERPS. Dont ask me why, but it happens. It takes a little time to google fully replace it with the correct one.
I'd wait a little longer to see how Google is handling them.I don't know if im helping you.
it kinda took me a few minutes to understand/process what you wrote and come up with an answer.Please, feel free ask again or comment on my reply if I misunderstood something.
Best luck,
Gaston -
Hi here's some more background info on this situation that makes it even stranger. I can perform some pretty specific searches on Google where these indexed search result pages show up. And I can look in Google Search Console under the performance section and see that those pages receive impressions and clicks. However, if I inspect the URL, Search Console says it is not included in Google's index, and the reason it gives under indexing is because it says it is honoring the canonical URL. So search console is saying it isn't indexed because of the canonical, but I can do searches and find that exact URL in the index. Any ideas what this could be from?
-
Hi Gaston,
Thanks for the response. I can confirm that the example, /search and /search?q=foo are pretty much identical. However that may not always be the case, only when a user searches for something that would return no results. So, a website that sells widgets, /search and /search?q=widgets would not be identical, and in that case it would make sense that Google would not honor the canonical link. What's really strange is if I search google for the site: operator of the domain, the top pages are not user queries for things that make sense. The top indexed pages are random, non-relevant user searches.
I do not have a way with this system to control noindex tags on these search result pages. The only thing I could do is take the nuclear option and just block it all with robots.txt using wildcards. But that means no search result pages would get indexed, relevant or not.
-
Hi there,
in my experience, when google doesn't honor Canonicals, is because pages arent similar.
In its definition, canonical are there for two or more pages that have the same content.If you are finding it problematic, i'd suggest to use noindex tags for that search pages.
I'd investigate If there are links pointing to those internal search pages, as its not common for google to discover search pages.Hope it helps,
Best luck.
Gaston
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Weird Cigarette URLs showing up in Google Webmaster Tools
Hi there, I'm noticing a bunch of URLs showing up in my google webmaster tools that are all cigarette related (they are appearing as 404s in the crawl error report). They are throwing 404 errors which is why they are listed here... Anyone have any idea of what this could be? I recently switched from Wordpress to Shopify and these weird URLs just started appearing on my webmaster tools in the last week. Kinda bizarre / a little alarming! Thanks,
Technical SEO | | TheBatesMillStore
Bianca0 -
Hashtag in url seems to remove the google plus one
My site has a catalogue page (catalog in US) with #anchors so that customers can get straight to them. I even link from other pages to the #anchor on the catalogue page. so I have for example: www.example.co.uk/catalogue.htm www.example.co.uk/catalogue.htm#blueitems www.example.co.uk/catalogue.htm#redtems I understand google doesn't index after the #, here is the post I found: http://moz.com/community/q/hashtag-anchor-text-within-content#reply_91192 So I shouldn't have an seo problem. BUT, if I navigate to www.example.co.uk/catalogue.htm and plusone the page it will show the plusone and then I navigate to www.example.co.uk/catalogue.htm#blueitems the plus one is gone. The same happens in reverse, if i plusone www.example.co.uk/catalogue.htm#redtems, then that plusone doesn't show in www.example.co.uk/catalogue.htm. I added rel=canonical and that fixed the plusone problem, now if you plus one /catalogue.htm#redtems it still shows on catalogue.htm This seems a bit extreme and did I do the right things??
Technical SEO | | Peter24680 -
I was googling the word "best web hosting" and i notice the 1st and 3rd result were results with google plus. Does Google plus now play a role in improving ranking for the website?
I was googling the word "best web hosting" and i notice the 1st and 3rd result were results with google plus. Does Google plus now play a role in improving ranking for the website?I see a person's name next to the website too
Technical SEO | | mainguy0 -
Will rel=canonical cause a page to be indexed?
Say I have 2 pages with duplicate content: One of them is: http://www.originalsite.com/originalpage This page is the one I want to be indexed on google (domain rank already built, etc.) http://www.originalpage.com is more of an ease of use domain, primarily for printed material. If both of these sites are identical, will rel=canonical pointing to "http://www.originalsite.com/originalpage" cause it to be indexed? I do not plan on having any links on my site going to "http://www.originalsite.com/originalpage", they would instead go to "http://www.originalpage.com".
Technical SEO | | jgower0 -
Canonical
I am seeing canonical implementation in many sites for non identical pages. Google honoring these implementation and didn't have any issue. Did anyone have different experience? Thanks.
Technical SEO | | gmk15670 -
Home page URL disappears in Google after switching to WordPress
It was a 10 page static HTML page website. 3 year old, PR2. Monday night, copied a WordPress from somewhere to this website's public_html folder and activate it. The home page was "index.html" before switching to WordPress. Now this html file (index.html) has been deleted, so WordPress' Home page can work. All other 9 static html pages are still there in Google index. Just notice it today that the home page URL disappears in Google completely. Why? All other 9 static html pages' URL are still in Google. robots.txt is Allow: / What may have gone wrong to remove the home domain URL from Google index? Thank you for your help!
Technical SEO | | johnzhel0 -
Google causing Magento Errors
I have an online shop - run using Magento. I have recently upgraded to version 1.4, and I installed a extension called Lightspeed, a caching module which makes tremendous improvements to Magento's performance. Unfortunately, a confoguration problem, meant that I had to disable the module, because it was generating errors relating to the session, if you entered the site from any page other than the home page. The site is now working as expected. I have Magento's error notification set to email - I've not received emails for errors generated by visitors. However over a 72 hour period, I received a deluge of error emails, which where being caused by Googlebot. It was generating an erro in a file called lightspeed.php Here is an example: URL: http://www.jacksgardenstore.com/tahiti-vulcano-hammock IP Address: 66.249.66.186 Time: 2011-06-11 17:02:26 GMT Error: Cannot send headers; headers already sent in /home/jack/jacksgardenstore.com/user/jack_1.4/htdocs/lightspeed.php, line 444 So several things of note: I deleted lightspeed.php from the server, before any of these error messages began to arrive. lightspeed.php was never exposed in the URL, at anytime. It was referred to in a mod_rewrite rule in .htaccess, which I also commented out. If you clicked on the URL in the error message, it loaded in the browser as expected, with no error messages. It appears that Google has cached a version of the page which briefly existed whilst Lightspeed was enabled. But I though that Google cached generated HTML. Since when does cache a server-side PHP file ???? I've just used the Fetch as Googlebot facility on Webmaster Tools for the URL in the above error message, and it returns the page as expected. No errors. I've had to errors at all in the last 48 hours, so I'm hoping it's just sorted itself out. However I'm concerned about any Google related implications. Any insights would be greatly appreciated. Thanks Ben
Technical SEO | | atticus70