Large site with faceted navigation using rel=canonical, but Google still has issues
-
First off, I just wanted to mention I did post this on one other forum so I hope that is not completely against the rules here or anything. Just trying to get an idea from some of the pros at both sources. Hope this is received well. Now for the question.....
"Googlebot found an extremely high number of URLs on your site:"
Gotta love these messages in GWT. Anyway, I wanted to get some other opinions here so if anyone has experienced something similar or has any recommendations I would love to hear them.
First off, the site is very large and utilizes faceted navigation to help visitors sift through results. I have implemented rel=canonical for many months now to have each page url that is created based on the faceted nav filters, push back to the main category page. However, I still get these damn messages from Google every month or so saying that they found too many pages on the site. My main concern obviously is wasting crawler time on all these pages that I am trying to do what they ask in these instances and tell them to ignore and find the content on page x.
So at this point I am thinking about possibly using robots.txt file to handle these, but wanted to see what others around here thought before I dive into this arduous task. Plus I am a little ticked off that Google is not following a standard they helped bring to the table.
Thanks for those who take the time to respond in advance.
-
Yes that's a different situation. You're now talking about pagination, which quite rightly, canonicals to parent page is not to be used.
For faceted/filtered navigation it seems like canonical usage is indeed the right way to go about it, given Peter's experience just mentioned above, and the article you linked to that says, "...(in part because Google only indexes the content on the canonical page, so any content from the rest of the pages in the series would be ignored)."
-
As for my situation it worked out quite nicely, I just wasn't patient enough. After about 2 months the issue corrected itself for the most part and I was able to reduce about a million "waste" pages out of the index. This is a very large site so losing a million pages in a handful of categories helped me gain in a whole lot of other areas and spread the crawler around to more places that were important for us.
I also spent some time doing some restructuring of internal linking from some of our more authoritative pages that I believe also assisted with this, but in my case rel="canonical" worked out pretty nicely. Just took some time and patience.
-
I should actually add that Google doesn't condone using rel-canonical back to the main search page or page 1. They allow canonical to a "View All" or a complex mix of rel-canonical and rel=prev/next. If you use rel-canonical on too many non-identical pages, they could ignore it (although I don't often find that to be true).
Vanessa Fox just did a write-up on Google's approach:
http://searchengineland.com/implementing-pagination-attributes-correctly-for-google-114970
I have to be honest, though - I'm not a fan of Google's approach. It's incredibly complicated, easy to screw up, doesn't seem to work in all cases, and doesn't work on Bing. This is a very complex issue and really depends on the site in question. Adam Audette did a good write-up:
http://searchengineland.com/five-step-strategy-for-solving-seo-pagination-problems-95494
-
Thanks Dr Pete,
Yes I've used meta no-index on pages that are simply not useful in any way shape or form for Google to find.
I would be hesitant noindexing my filters in question, but it sounds promising that you are backing the canonical approach and there is a latency on reporting. Our PA and DA is extremely high and we get crawled daily, so curious about your measurement tip (inurl) which is a good one!
Many thanks.
Simon
-
I'm working on a couple of cases now, and it is extremely tricky. Google often doesn't re-crawl/re-cache deeper pages for weeks or months, so getting the canonical to work can be a long process. Still, it is generally a very effective tag and can happen quickly.
I agree with others that Robots.txt isn't a good bet. It also tends to work badly with pages that are already indexed. It's good for keeping things out of the index (especially whole folders, for example), but once 1000s of pages are indexed, Robots.txt often won't clean them up.
Another option is META NOINDEX, but it depends on the nature of the facets.
A couple of things to check:
(1) Using site: with inurl:, monitor the faceted navigation pages in the Google index. Are the numbers gradually dropping? That's what you want to see - the GWT error may not update very often. Keep in mind that these numbers can be unreliable, so monitor them daily over a few weeks.
(2) Are there are other URLs you're missing? On a large, e-commerce site, it's entirely possibly this wasn't the only problem.
(3) Did you cut the crawl paths? A common problem is that people canonical, 301-redirect, or NOINDEX, but then nofollow or otherwise cut links to those duplicates. Sounds like a good idea, except that the canonical tag has to be crawled to work. I see this a lot, actually.
-
Did you find a solution for this? I have exactly the same issue and have implemented the rel canonical in exactly the same way.
The issue you are trying to address is improving crawl bandwidth/equity by not letting Google crawl these faceted pages.
I am thinking of Ajax loading in these pages to the parent category page and/or adding nofollow to the links. But the pages have already been indexed, so I wonder if nofollow will have any effect.
Have you had any progress? Any further ideas?
-
Because rel canonical does nothing more than give credit to teh chosen page and aviod duplicat content. it does not tell the SE to stop indexing or redirect. as far as finding the links it has no affect
-
thx
-
OK, sorry I was thinking too many pages, not links.
using no-index will not stop PR flowing, the search engine will still follow the links. -
Yeah that is why I am not real excited about using robots.txt or even a no index in this instance. They are not session ids, but more like:
www.example.com/catgeoryname/a,
www.example.com/catgeoryname/b
www.example.com/catgeoryname/c
etc
which would show all products that start with those letters. There are a lot of other filters too, such as color, size, etc, but the bottom line is I point all those back to just www.example.com/categoryname using rel canonical and am not understanding why it isn't working properly.
-
There are a large number of urls like this because of the way the faceted navigation works and I have considered no index, but somewhat concerned as we do get links to some of these urls and would like to maintain some of that link juice. The warning shows up in Google Webmaster tools when Googlebot finds a large number of urls. The rest of the message reads like this:
"Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site."
rel canonical should fix this, but apparently it is not
-
Check how you are getting these pages.
Robots.txt is not an ideal solution. If Google finds pages in other places, still these pages will be crawled.
Normally print pages won't have link value and you may no index them.
If there are pages with session ids or campaign codes, use canonical if they have link value. Otherwise no index will be good.
-
the rel canonical with stop you getting duplicate content flags, but there is still a large number of pages its not going to hide them.
I have never seen this warning, how many pages are we talking about?, either it is very very high, or they are confusing the crawler.You may need to no index them
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Still a good idea to have a niche store?
I have had a niche store since 2008. It has a higher conversion rate than my main site, and does fairly well. The site template is an exact copy of my main site, and all the product URLs are the same. The only difference is, it only has items from a certain category. My questions is, are niche store still a good idea? Will I be better of doing a page to page 301 to my main site, and just focus on a single site? Am I better of as far as SEO with one main site? If anyone has experience with this, I would like to hear about your experience. Any thoughts are appreciated.
Algorithm Updates | | inhouseseo0 -
Am I the only one experiencing this Google SERP problem?
I perform Google searches every single day, sometimes several times in a day. These searches have nothing to do with being a marketer--they're simply as a consumer, researcher, person who needs a question answered, or in other words: a typical person. For about the past month or so, I have been unsuccessful at finding what I'm looking for on the first try EVERY SINGLE TIME. Yes, I mean it--every single time. I'm left either going all the way to the third page, clicking dozens of results and retuning to the SERPs, or having to start over with a differently worded query. This is far too often to be a coincidence. Has this been happening to anymore else? I know there was a recent significant algorithm update, right? I always look at algorithm updates through the eyes of an SEO, but I'm currently looking at it through the eyes for an average searcher, and I'm frustrated! It's been like trying to find something on Bing!
Algorithm Updates | | UnderRugSwept0 -
Why is Google changing my title tags?
I have a few sites set up this way with their title tags: "Keyword rich phrase(s) | Company name" and Google is showing more and more of them like this in the SERPs - "Company name: Keyword rich phrase(s)" I don't see this happening to many other sites...am I hallucinating or what's going on here? Is this happening to anyone else? I don't see it necessarily affecting rankings, but for my sites with little brand recognition I want those keywords first. Bueller? Bueller?
Algorithm Updates | | NetvantageMarketing0 -
Should I use canonical tags on my site?
I'm trying to keep this a generic example, so apologies if this is too vague. On my main website, we've always had a duplicate content issue. The main focus of our site is breaking down to specific, brick and mortar locations. We have to duplicate the description of product/service for every geographic location (this is a legal requirement). So for example, you might have the parent "product/service" page targeting the term, and then 100's of sub pages with "product/service San Francisco", "product/service Austin", etc. These pages have identical content except for the geographic location is dynamically swapped out. There is also additional useful content like google map of area, local resources, etc. As I said this was always seen as an SEO issue, specifically you could see in the way that googlebot would crawl pages and how pagerank flowed through the site that having 100's of pages with identical copy and just swapping out the geographic location wasn't seen as good content, however we still always received traffic and conversions for the long tail geographic terms so we left it. Las year, with Panda, we noticed a drop in traffic and thought it was due to this duplicate issue so I added canonical tags to all our geographic specific product/service pages that pointed back to the parent page, that seemed to be received well by google and traffic was back to normal in short order. However, recently what I notice a LOT in our SERP pages is if I type in a geographic specific term, i.e. "product/service san francisco", our deep page with the canonical tag is what google is ranking. Google inserts its own title tag on the SERP page and leaves the description blank as it doesn't index the page due to the canonical tag on the page. Essentially what I think it is rewarding is the site architecture which organizes the content to the specific geo in the URL: site.com/service/location/san-francisco. Other than that there is no reason for it to rank that page. Sorry if this is lengthy, thanks for reading all of that! Essentially my question is, should I keep the canonical tags on the site or take them off since Google insists on ranking the page? If I am ranking already then the potential upside to doing that is ranking higher (we're usually in the 3-6 spot on the result page) and also higher CTR because we can get a description back on our resulting page. The counter argument is I'm already ranking so leave it and focus on other things. Appreciate your thoughts on this!
Algorithm Updates | | edu-SEO0 -
Google Places......
Hi guys, Anyone come across a problem with Google places and it impacting organic search results? I have a client the was ranking top 3 on Google for all local keywords (letting agent). They have set up Google Places and now appear with a nice Google places ad but the organic position has now disappeared....Gone down to page 8 and nothing. Any ideas? Ta
Algorithm Updates | | SEOwins0 -
Google Update on the 6th July
Hi Mozzers, Has anyone noticed a Google update on the 6th July? A price comparison site I optimise has fallen off the SERPs for most generic terms, however still getting traffic for longer tail phrases. Cheers Aran
Algorithm Updates | | Entrusteddev0 -
What are the good strategies using satellite sites in SEO??
Hello to everybody, We'are thinking about launching a massive amount of satellite websites in order to promote our website. Is it really efficient in terms of link building? Or is the ROI really small due to the amount of time and money needed to create and manage these websites? Thanks a lot!!! Update: Thanks to all of you for all these interesting answers!
Algorithm Updates | | sarenausa1 -
What are the differences between Google SEO and Bing SEO?
I came across this question on why the poster's rankings in Bing/Yahoo were so much lower than his rankings in Google. One of the links responded with was a presentation Rand gave about the difference in ranking elements of Google and Bing. My purpose for looking into this is to boost rankings in Bing to be more in line with my Google rankings. My takeaways from Rand's presentation were that Bing likes shorter URLs than Google and it's better to have more links from more root domains with more precise anchor text. Unfortunately this presentation was given at last year's SMX Advanced and is almost a year old. Since then Microsoft has been accused of basically scraping the Google SERPs and Google unleashed at least two maybe three rabid Pandas. Needless to say the environment has changed. So my question is for those people who are happy with how they rank in Bing: What SEO factors are you seeing make a bigger impact in Bing vs. how they impact your Google rankings?
Algorithm Updates | | rball11