Is there a limit to the number of duplicate pages pointing to a rel='canonical' primary?
-
We have a situation on twiends where a number of our 'dead' user pages have generated links for us over the years. Our options are to 404 them, 301 them to the home page, or just serve back the home page with a canonical tag.
We've been 404'ing them for years, but I understand that we lose all the link juice from doing this. Correct me if I'm wrong?
Our next plan would be to 301 them to the home page. This is probably the best solution, but our concern is that if a user page is only temporarily down (under review, etc.) it could be permanently removed from the index, or at least cached for a very long time.
A final plan is to just serve back the home page on the old URL, with a canonical tag pointing to the home page URL. This is quick, retains most of the link juice, and allows the URL to become active again in the future. The problem is that there could be hundreds of thousands of these.
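To be concrete, that third option would mean each dead URL keeps returning the home page's HTML with a canonical hint in its head, something like the line below (the href is just a placeholder standing in for our actual home page URL):

    <link rel="canonical" href="https://twiends.com/" />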
Q1) Is it a problem to have 100,000 URLs pointing to a primary with a rel=canonical tag? (Problem for Google?)
Q2) How long does it take a canonical duplicate page to become unique in the index again if the tag is removed? Will Google recrawl it and add it back into the index? Do we need to use WMT to speed this process up?
Thanks
-
I'll add this article by Rand that I came across too. I'm busy testing the solution presented in it:
https://moz.com/blog/are-404-pages-always-bad-for-seo
In summary: 404 all dead pages with a good custom 404 page so as not to waste crawl bandwidth, then selectively 301 those dead pages that have accrued some good link value.
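What I'm testing is roughly the following (an Apache-style .htaccess sketch with made-up paths, and our home page URL assumed, just to show the shape of it):

    # Most dead user pages simply return a real 404, but with a helpful custom page
    ErrorDocument 404 /custom-404.html

    # The few dead pages that have earned good links get a 301 to the home page instead
    Redirect 301 /user/some-dead-user-with-links https://twiends.com/
    Redirect 301 /user/another-linked-dead-user https://twiends.com/

That keeps the bulk of the dead pages as plain 404s while preserving the juice on the handful of URLs that actually have any.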
Thanks Donna/Tammy for pointing me in this direction.
-
In this scenario, yes: a customized 404 page with links to a few useful top-level pages would better serve both the user and Google. From a strictly SEO standpoint, 100,000 redirects and/or canonical tags would not benefit your SEO.
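Something as simple as the sketch below is enough, as long as the server still returns a real 404 status code with it (the specific links here are hypothetical; use whatever your most useful top-level pages are):

    <!-- Illustrative custom 404 page: served with a 404 status, links to a few useful pages -->
    <!DOCTYPE html>
    <html>
    <head><title>Page not found</title></head>
    <body>
      <h1>Sorry, that page is no longer here</h1>
      <p>You might be looking for one of these:</p>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/explore">Browse users</a></li>
      </ul>
    </body>
    </html>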
-
Thanks Donna, good points.
We return a hard 404, so it's treated correctly by Google. We are just looking at this from an SEO point of view now to see if there's any way to reclaim this lost link juice.
Your point about looking at the value of those incoming links is a good one. I suppose it's not worth making Google crawl 100,000 more pages for the sake of a few links. We've just started seeing these pop up in Moz Analytics as link opportunities, and we can see them as 404s in Site Explorer too. There are a few hundred of these incoming links that point to a 404, so we feel this could have an impact.
I suppose we could selectively 301 any higher-value links to the home page. It will be an administrative nightmare, but doable.
How do others tackle this problem? Does everyone just hard 404 a page, even though that loses the link juice from incoming links to it?
Thanks
-
Hi David,
When you say "we've been 404'ing them for years", does that mean you've created a custom 404 page that explains the situation to site visitors, or does it mean you've been letting them naturally error and return the appropriate 404 (page not found) status code to Google? It makes a difference. If the pages truly no longer exist and there is no equivalent replacement, you should be letting them naturally error (return a 404 status code) so as not to mislead Google's robots and site visitors.
Have you looked at the value of those incoming links? They may be low value anyway. There may be more valuable things you could be doing with your time and budget.
To answer your specific questions:
Q1) Is it a problem to have 100,000 URLs pointing to a primary with a rel=canonical tag? (Problem for Google?)
Yes, if those pages (or valuable replacements) don't actually exist. You'd be wasting valuable crawl budget. This looks like it might be especially true in your case given the size of your site. Check out this article. I think you might find it very helpful. It's an explanation of soft 404 errors and what you should do about them.
Q2) How long does it take a canonical duplicate page to become unique in the index again if the tag is removed? Will Google recrawl it and add it back into the index? Do we need to use WMT to speed this process up?
If the canonical tag is changed or removed, Google will find and reindex it next time it crawls your site (assuming you don't run out of crawl budget). You don't need to use WMT unless you're impatient and want to try to speed the process up.
-
Thanks Sandi, I did. It's a great article and it answered many questions for me, but I couldn't really get clarity on my last two questions above.
-
Hey David
Check out this Moz Blog post about rel=canonical, appropriately named Rel=Confused?
-
Related Questions
-
Canonicals
I dynamically generated pages using rewrite functions in WordPress (new-york, san-diego, san-francisco). All these pages look the same but have different content. Yoast (the WordPress SEO plugin) was unaware of this and set canonicals relating to the WordPress page used as the template for the dynamic pages (City_home_page), so all these pages had the canonical https://dinnerdancecruises.com/City_Home_Page. Using Search Console, I saw that Google indexed my site, looked at all these dynamically created pages (about 30 pages), and took them all in as duplicate pages. This happened sometime in April or May. I fixed the problem and set up unique canonicals for each dynamically created page, but now Google is not crawling them for some reason. I'm not sure why; it's been months and these pages are not indexed. I thought to myself, is it because these links end up on multiple pages? Sort of like having a "terms of agreement" link in the footer: every single page has that terms of agreement link. Does Google look at those links as duplicates and not index the page at all? This is where my issue lies. I need Google to crawl regularly, see those pages with their new, unique canonicals, and re-index them correctly. But it seems that, to save CPU resources, Google feels once a thief, always a thief. I could be wrong, but this is why I need your suggestion. Thank you.
On-Page Optimization | | bobperez7360950 -
Should I be worried about our 'Duplicate' content
Hi guys... I've just been working through some issues to give our site a little cleanup. I'm working through our duplicate content issues (we have some legitimate duplicate pages that need removing, and some of our dynamic content is problematic; our web developers are going to sort it with canonical tags this week). However, there are some pages that are actually different products, but are similar enough that Moz flags them as duplicate pages. Here's an example: http://www.toaddiaries.co.uk/filofax-refills/filo-12-month-inserts-personal-size/fortnight-view-filofax-personal and http://www.toaddiaries.co.uk/filofax-refills/filo-12-month-inserts-personal-size/week-to-a-view-filofax-personal They are very similar refill products; it's just the diary format that is different. Question: Should I be worried about this? I've never seen our rankings change in the past when 'cleaning up' duplicate content. What do you guys think? Isaac.
On-Page Optimization | | isaac6630 -
Pages with near duplicate content
Hi Mozzers, I need your opinion on the following. Imagine that we have a product X (brand Sony, for example); if we sell parts for different models of this product, we end up with numerous product pages per model number: Sony camera parts, parts for Sony Camera XYZ, parts for Sony Camera XY, etc. The thing is that these pages are very similar, like 90% duplicate, and there are duplicate pages for Panasonic and Canon, say, with small tweaks in content. I know those are duplicates and I would experiment with removing a category for one brand only (the least searched for), but at the same time I cannot remove the rest as they convert a lot, being close to the customer's search query (the customer looks for parts for Sony XYZ, lands on the page and buys, instead of staying on a generic Sony parts page where they would additionally have to browse for the model number). What would you advise to make these pages as unique as possible? I am thinking about changing page titles and meta descriptions, and tweaking the content as much as I can (very difficult, there is nothing fancy or different in those :(). I will start with the top pages that really drive traffic first and see how it goes. I will also remove the least-visited pages and prominently put the model number on the Sony parts page, to see how it goes in terms of organic and, most importantly, conversions. Any other ideas? I am really concerned about dupes and a penalty, but I'm trying to think of solutions that won't kill conversions at this point. Have a lovely Monday.
On-Page Optimization | | SammyT0 -
Recommendation: Add a canonical URL tag referencing this URL to the header of the page.
Please clarify: In the page optimization tool, SEOmoz recommends using the canonical URL tag on the unique page itself. Is it the same canonical URL tag used when you want juice to go to the original page? "Although the canonical URL tag is generally thought of as a way to solve duplicate content problems, it can be extremely wise to use it on every (unique) page of a site to help prevent any query strings, session IDs, scraped versions, licensing deals or future developments to potentially create a secondary version and pull link juice or other metrics away from the original. We believe the canonical URL tag is a best practice to help prevent future problems, even if nothing is specifically duplicate/problematic today." Please give an example.
On-Page Optimization | | AllIsWell0 -
Different pages for OS's vs 1 Page with Dynamic Content (user agent), what's the right approach?
We are creating a new homepage and the product is at different stages of development for different OS's. The value prop, messaging, and some target keywords will be different for the various OS's for that reason. The question is, for SEO reasons, is it better to separate them into different pages, or to use one page and swap different content in based on the user agent?
On-Page Optimization | | JoeLin0 -
Image URL's have knocked my sub-pages down (WP)
I had most of my keywords within the top 10 for this site, and some were even ranking in the top 5. For a possible minor boost, and more so to cover all the bases, I decided to add images to all of the pages. They were uploaded as a gallery, with most of the image file names being the same as the keyword. Thus, URLs were created with our targeted phrases, extending off of the corresponding sub-page. After that, Google quickly picked up the URLs to the images and began indexing them, and when that occurred the sub-page which was to be the landing page quickly tanked. Nothing else on-site changed besides the uploading of the images, so I'm sure they're conflicting and for whatever reason Google can't decide which page to index: the page that contains the images used, or the actual intended landing page. With WP I didn't see a way, at least with the stock gallery, to have them not link to anything at all and just be static. So, my question is: how can I quickly alleviate this problem, and what should I do in the future to avoid this? I believe if I change the thumbnails to link to the image file instead of the attachment page, that should fix the issue... Then I'll have dead URLs, which I suppose I should 301 to the sub-page. Alternatively, is there a better solution that will work? I was also thinking about no-indexing the attachment URLs, but that doesn't seem to be an option.
On-Page Optimization | | JayAdams320 -
Canonical URL's - Fixed but still negatively impacted
I recently noticed that our canonical URLs were not set up correctly. The incorrect setup predates me, but it could have been in place for close to a year, maybe a bit more: each of the URLs had a "sortby" parameter on it. I had our platform provider make the fix and now everything is as it should be. I do see issues caused by this in Google Webmaster Tools; for instance, in the HTML suggestions it's telling me that pages have duplicate title tags when in fact it is the same page with a variety of URL parameters at the end of the URL. To me this just highlights that there is a problem and we are being negatively impacted by the previous implementation. My question is: has anyone been in this situation? Is there any way to flush this out or push Google to relook at this? Or is this a sit-and-be-patient situation? I'm also slightly curious whether Google will at some point look and see that the canonical URLs were changed and then throw up a red flag, even though they are finally the way they should be. Any feedback is appreciated. Thanks,
Dave
On-Page Optimization | | dgmiles0 -
How do you see a list of URLs with duplicate page titles?
When looking at the Duplicate Page Title report, the Other URLs column has various numbers that presumably indicate the number of pages that share the same title. When I click on one of these numbers, say a URL that shows 4 in that column, the next page reports "No sample duplicate URLs to report". Why isn't it showing me the other 3 URLs with the same page title?
On-Page Optimization | | jkenyon0