Any SEO wizards out there who can tell me why Google isn't following the canonicals on some pages?
-
Hi,
I am banging my head against the wall over a customer's website: under "Duplicate title tags" in GSC I can see that Google is indexing parameterized versions of many of the URLs on the site. When I check the rel=canonical tag, everything seems correct. My customer is the biggest sports retailer in Norway. Their webshop has approximately 20,000 products, yet more than 400,000 pages are indexed by Google.
So why is Google indexing pages like this? What is missing in this canonical?
https://www.gsport.no/herre/klaer/bukse-shorts?type-bukser-334=regnbukser&order=price&dir=desc
Why isn't Google just cutting off the ?type-bukser-334=regnbukser&order=price&dir=desc part of the URL? Could it be the canonical tag itself, or could the problem be somewhere in the CMS?
Looking forward to your answers
- Sigurd
-
Thank you all! I have forwarded this to the owner of the page, so now we'll just sit back and see the effects
-
Hi Inevo,
David's and Jake's comments and recommendations are spot on. You need to update your robots.txt file. Jake is right that "just because a canonical tag is in place, that doesn't prevent Google from crawling and indexing the page."
Sincerely,
Dana
-
Hi Inevo,
Canonical tags are being used correctly and it doesn't actually look like any of the URLs with query strings are indexed in Google.
I'm going to go off the topic of canonicals now, but still related to the crawl and index of the site:
Has the site changed CMS in the last year or two? It's possible that some of the 400k URLs indexed are old or were not canonicalized properly at some point in time, so they were indexed.
The problem with how the site is currently set up is that it is basically impossible for search engines to crawl it fully because of the product filters. I wrote an article about this a while ago (link), specifically to do with product filters in Magento. Product filters can turn your site into a 'black hole' for search engines - which is definitely happening in this case (try crawling it with Screaming Frog).
I'd recommend blocking product filter URLs from being crawled so that search engines are only crawling important pages on the site.
You should be able to fix this by adding these three lines to your robots.txt:
Disallow: *?
Disallow: *+
Allow: *?p=
(Note: please check that you don't need to add more parameters to Allow.)
These changes will make crawling your site much more efficient - from millions of crawlable URLs, to probably 30-35k.
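If you want to sanity-check rules like these before deploying them, here's a rough Python sketch. A caveat on my assumptions: Python's built-in urllib.robotparser doesn't support wildcards the way Googlebot does, so this rolls a tiny regex matcher instead, and the sample paths are just made-up examples in the style of the URLs from the question.

import re

def robots_pattern_to_regex(pattern):
    # Googlebot treats '*' as "any run of characters" and '$' as an
    # end-of-URL anchor; everything else is matched literally.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile(regex)

DISALLOW = [robots_pattern_to_regex(p) for p in ("*?", "*+")]
ALLOW = [robots_pattern_to_regex(p) for p in ("*?p=",)]

def is_crawlable(path):
    # Simplification: a matching Allow wins outright. Google actually
    # resolves Allow/Disallow conflicts by the longest matching rule,
    # but with rules this short the outcome is the same.
    if any(rx.match(path) for rx in ALLOW):
        return True
    return not any(rx.match(path) for rx in DISALLOW)

tests = [
    "/herre/klaer/bukse-shorts",                             # plain category page
    "/herre/klaer/bukse-shorts?type-bukser-334=regnbukser",  # filter URL
    "/herre/klaer/bukse-shorts?p=2",                         # pagination parameter
]
for path in tests:
    print(path, "->", "crawl" if is_crawlable(path) else "blocked")

Running it should show the filter URL blocked while the plain category page and the ?p= pagination URLs stay crawlable.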
Let me know how this goes for you
Cheers,
David
-
I would definitely check to make sure the canonical tag is being used properly. Make sure it is an absolute URL rather than a relative one.
That being said, please note that just because a canonical tag is in place, that doesn't prevent Google from crawling and indexing the page, and including the page in search results with the site:domain command. If you see the canonicalized URLs outranking their canonical, then you can start to question why Google isn't honoring the canonical.
Please note that canonical tags are a recommendation and not a directive, meaning Google doesn't have to honor them if it doesn't consider the page truly a duplicate of the canonical.
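If you want to spot-check a page quickly, here's a minimal Python sketch - assuming the requests and beautifulsoup4 packages are installed, and using one of the example URLs from the question - that pulls the canonical tag and reports whether it's absolute:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

url = "https://www.gsport.no/herre/klaer/bukse-shorts?order=price&dir=desc"
html = requests.get(url, timeout=10).text
tag = BeautifulSoup(html, "html.parser").find("link", rel="canonical")

if tag is None:
    print("No canonical tag found")
else:
    href = tag.get("href", "")
    parsed = urlparse(href)
    # An absolute URL carries both a scheme and a host
    print("canonical:", href)
    print("absolute:", bool(parsed.scheme and parsed.netloc))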
-Jake
Related Questions
-
Google Not Indexing Pages (WordPress)
Hello, recently I started noticing that Google is not indexing our new pages or our new blog posts. We are simply getting a "Discovered - Currently Not Indexed" message on all new pages. When I click "Request Indexing" it takes a few days, but eventually the page does get indexed and is on Google. This is very strange, as our website has been around since the late '90s and the quality of the new content is neither duplicate nor "low quality". We started noticing this happening around February. We also do not have many pages - maybe 500 maximum? I have looked at all the obvious answers (allowing for indexing, etc.), but just can't seem to pinpoint a reason why. Has anyone had this happen recently? It is getting very annoying having to manually go in and request indexing for every page, and it makes me think there may be some underlying issues with the website that should be fixed.
Technical SEO | Hasanovic1
-
Redirects and sitemap aren't showing
We had a malware hack and spent three days trying to get Bluehost to fix things. Since they made changes, two things are happening:
1. Our .xml sitemap cannot be created (https://www.caffeinemarketing.co.uk/sitmap.xml); we have tried external tools.
2. We had 301 redirects from the http (www and non-www versions) and the https (non-www version) throughout the whole website to https://www.caffeinemarketing.co.uk/ and subsequent pages. Whilst the redirects seem to be happening, when you go into tools such as https://httpstatus.io, every version of every page shows only a 200 code, whereas before they were showing the 301 redirects (see the sketch below). Have Bluehost messed things up? Hope you can help, thanks.
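A quick way to see the raw redirect chain for each variant is a small Python sketch like this (just an illustration - it assumes the requests package is installed and uses the domain from the question):

import requests

variants = [
    "http://caffeinemarketing.co.uk/",
    "http://www.caffeinemarketing.co.uk/",
    "https://caffeinemarketing.co.uk/",
    "https://www.caffeinemarketing.co.uk/",
]

for url in variants:
    resp = requests.get(url, timeout=10)
    # resp.history lists every redirect hop; an empty list means
    # the URL answered directly with no redirect at all
    chain = [f"{r.status_code} {r.url}" for r in resp.history]
    chain.append(f"{resp.status_code} {resp.url}")
    print(" -> ".join(chain))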
Technical SEO | Caffeine_Marketing0
-
Rel=Canonical for filter pages
Hi folks, I have a bit of a dilemma that I'd appreciate some advice on. We'll just use the solid wood flooring section of our website as an example in this case. We use the rel=canonical tag on the solid wood flooring listings pages where the listings get sorted alphabetically, by price, etc.
e.g. http://www.kensyard.co.uk/products/category/solid-wood-flooring/?orderBy=highestprice uses the canonical tag to point to http://www.kensyard.co.uk/products/category/solid-wood-flooring/ as the main page. However, we also use filters on our site which allow users to filter their search by more specific product features, e.g.:
http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/18mm/
http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/natural-lacquered/
We don't use the canonical tag on these pages because they are great long-tail keyword-targeted pages, so I want them to rank for phrases like "18mm solid wood flooring". But in not using the canonical tag, I'm finding Google is getting confused and ranking the wrong page, as the filters mean there is a huge number of possible URLs for a given list of products. For example, Google ranks this page for the phrase "18mm solid wood flooring": http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/18mm,116mm/ This is no good. This is a combination of two filters, so the listings are very refined; if someone types the above phrase into Google and lands on this page, their first reaction will be "there are not many products here". Google should be ranking the page with only the 18mm filter applied: http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/18mm How would you recommend I go about rectifying this situation?
Thanks, Luke
Technical SEO | LukeyB30
-
Data Highlighter doesn't show page
We have an event-related website, http://www.sbo.nl, so I wanted to use Data Highlighter because most of our event pages are the same. But Data Highlighter doesn't show those pages; I only see an empty page. For example: http://www.sbo.nl/veiligheid/brandveiligheid-gebouwen/ Does anyone understand what is going on? Data Highlighter does show the homepage. I am thinking it is maybe because of tabbed browsing, or the chat function on that page. You can see a screenshot of Data Highlighter here: http://www.clipular.com/c?6659030=2X2gQv4O8_9RzcZ1Hk_7xGtCPYo&f=d40975c80bdd11dc357f050cafa73a80 I hope someone can help because I am lost 🙂 Cheers, Ruud
Technical SEO | RuudHeijnen0
-
Google insists robots.txt is blocking... but it isn't.
I recently launched a new website. During development, I'd enabled the option in WordPress to prevent search engines from indexing the site. When the site went public (over 24 hours ago), I cleared that option. At that point, I added a specific robots.txt file that only disallowed a couple of directories. You can view the robots.txt at http://photogeardeals.com/robots.txt Google (via Webmaster Tools) is insisting that my robots.txt file contains a "Disallow: /" on line 2 and that it's preventing Google from indexing the site and preventing me from submitting a sitemap. These errors are showing both in the sitemap section of Webmaster Tools and in the Blocked URLs section. Bing's webmaster tools are able to read the site and sitemap just fine. Any idea why Google insists I'm disallowing everything even after telling it to re-fetch?
Technical SEO | ahockley0
-
Best way to handle indexed pages you don't want indexed
We've had a lot of pages indexed by Google which we didn't want indexed. They relate to an ajax category filter module that works fine for front-end customers, but under the bonnet Google has been following all of the links. I've put a rule in the robots.txt file to stop Google from following any dynamic pages (with a ?) and also any ajax pages, but the pages are still indexed on Google. At the moment there are over 5,000 indexed pages which I don't want on there, and I'm worried this is causing issues with my rankings. Would a redirect rule work, or could someone offer any advice? https://www.google.co.uk/search?q=site:outdoormegastore.co.uk+inurl:default&num=100&hl=en&safe=off&prmd=imvnsl&filter=0&biw=1600&bih=809#hl=en&safe=off&sclient=psy-ab&q=site:outdoormegastore.co.uk+inurl%3Aajax&oq=site:outdoormegastore.co.uk+inurl%3Aajax&gs_l=serp.3...194108.194626.0.194891.4.4.0.0.0.0.100.305.3j1.4.0.les%3B..0.0...1c.1.SDhuslImrLY&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=ff301ef4d48490c5&biw=1920&bih=860
Technical SEO | gavinhoman0
-
We changed the URL structure 10 weeks ago and Google hasn't indexed it yet...
We recently modified the whole URL structure on our website, changing the URLs to nice human-readable ones, which resulted in a huge number of 404 pages. We did this in the middle of March - about 10 weeks ago... We had around 5,000 404 pages in the beginning, but this number is decreasing slowly (we have around 3,000 now). On some parts of the website we have also set up a 301 redirect from the old URLs to the new ones, to avoid showing a 404 page and thus make the "indexing transmission", but it doesn't seem to have made any difference. We've lost a significant amount of traffic because of the URL changes, as Google removed the old URLs but hasn't indexed our new URLs yet. Is there anything else we can do to get our website indexed with the new URL structure quicker? It might also be useful to know that we are a PageRank 4 and have over 30,000 unique users a month, so I am sure Google comes to the site quite often. Pages we have made since then that only have the new URL structure are indexed within hours - sometimes they appear in search the next day!
Technical SEO | jack860
-
How can I get Google to crawl my site daily?
I was wondering if there is a trick to getting Google to crawl my website daily.
Technical SEO | labradoodlelocator0