Is there a limit to the number of duplicate pages pointing to a rel='canonical' primary?
-
We have a situation on twiends where a number of our 'dead' user pages have generated links for us over the years. Our options are to 404 them, 301 them to the home page, or just serve back the home page with a canonical tag.
We've been 404'ing them for years, but I understand that we lose all the link juice by doing this. Correct me if I'm wrong?
Our next plan would be to 301 them to the home page. This is probably the best solution, but our concern is that if a user page is only temporarily down (under review, etc.), it could be permanently removed from the index, or at least the redirect could be cached for a very long time.
A final plan is to just serve back the home page on the old URL, with a canonical tag pointing to the home page URL. This is quick, retains most of the link juice, and allows the URL to become active again in future. The problem is that there could be hundreds of thousands of these.
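To make the three options concrete, here is a minimal Python sketch of what the server would send back for a dead user page under each plan. It is only an illustration: `HOME_URL`, the `Response` class, and `respond_to_dead_page` are hypothetical names, not our actual code.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder for the real home page URL.
HOME_URL = "https://example.com/"


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def respond_to_dead_page(strategy: str) -> Response:
    """Return the response for a dead user page under one of the three plans."""
    if strategy == "404":
        # Hard 404: the URL is reported as gone.
        return Response(404, body="<h1>Page not found</h1>")
    if strategy == "301":
        # Permanent redirect to the home page.
        return Response(301, headers={"Location": HOME_URL})
    if strategy == "canonical":
        # Serve the home page content on the old URL (status 200),
        # with a canonical tag pointing at the home page URL.
        html = (
            "<html><head>"
            f'<link rel="canonical" href="{HOME_URL}">'
            "</head><body>home page content</body></html>"
        )
        return Response(200, body=html)
    raise ValueError(f"unknown strategy: {strategy}")
```

Note that the canonical option is the only one that answers with a 200 on the dead URL.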
Q1) Is it a problem to have 100,000 URLs pointing to a primary with a rel=canonical tag? (Problem for Google?)
Q2) How long does it take a canonical duplicate page to become unique in the index again if the tag is removed? Will Google recrawl it and add it back into the index? Do we need to use WMT to speed this process up?
Thanks
-
I'll add this article by Rand that I came across too. I'm busy testing the solution presented in it:
https://moz.com/blog/are-404-pages-always-bad-for-seo
In summary, 404 all dead pages with a good custom 404 page so as to not waste crawl bandwidth. Then selectively 301 those dead pages that have accrued some good link value.
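A rough Python sketch of that rule, as I understand it. The names (`status_for_dead_page`, `inbound_links`, `min_links`) and the threshold of 5 links are my own assumptions for illustration; the article doesn't prescribe a number, and the link counts would have to come from whatever link data you have.

```python
# Hypothetical placeholder for the real home page URL.
HOME_URL = "https://example.com/"


def status_for_dead_page(path, inbound_links, min_links=5):
    """Pick a response for a dead page.

    inbound_links maps a URL path to the number of external links
    pointing at it (assumed to be exported from a link tool).
    Pages at or above min_links get a 301 to the home page;
    everything else gets a plain 404 with the custom 404 page.
    """
    count = inbound_links.get(path, 0)
    if count >= min_links:
        return 301, {"Location": HOME_URL}
    return 404, {}


# Example with made-up paths and counts:
links = {"/user/alice": 12, "/user/bob": 1}
print(status_for_dead_page("/user/alice", links))
print(status_for_dead_page("/user/bob", links))
```

Only the few hundred pages with real link value end up redirected, so the redirect list stays small.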
Thanks Donna/Tammy for pointing me in this direction.
-
In this scenario, yes: a customized 404 page with a few useful top-level links would serve both users and Google better. From a strictly SEO standpoint, 100,000 redirects and/or canonical tags would not benefit your SEO.
-
Thanks Donna, good points.
We return a hard 404, so it's treated correctly by Google. We are just looking at this from an SEO point of view now to see if there's any way to reclaim this lost link juice.
Your point about looking at the value of those incoming links is a good one. I suppose it's not worth making Google crawl 100,000 more pages for the sake of a few links. We've just started seeing these pop up in Moz Analytics as link opportunities, and we can see them as 404s in site explorer too. There are a few hundred of these incoming links that point to a 404, so we feel this could have an impact.
I suppose we could selectively 301 any higher-value links to the home page. It will be an administrative nightmare, but doable.
How do others tackle this problem? Does everyone just hard 404 a page, even though that loses the link juice from incoming links to it?
Thanks
-
Hi David,
When you say "we've been 404'ing them for years", does that mean you've created a custom 404 page that explains the situation to site visitors, or does it mean you've been letting them naturally error and return the appropriate 404 (page not found) error to Google? It makes a difference. If the pages truly no longer exist and there is no equivalent replacement, you should be letting them naturally error (return a 404 status code) so as not to mislead Google's robots and site visitors.
Have you looked at the value of those incoming links? They may be low value anyway. There may be more valuable things you could be doing with your time and budget.
To answer your specific questions:
Q1) Is it a problem to have 100,000 URLs pointing to a primary with a rel=canonical tag? (Problem for Google?)
Yes, if those pages (or valuable replacements) don't actually exist. You'd be wasting valuable crawl budget. This looks like it might be especially true in your case given the size of your site. Check out this article. I think you might find it very helpful. It's an explanation of soft 404 errors and what you should do about them.
Q2) How long does it take a canonical duplicate page to become unique in the index again if the tag is removed? Will google recrawl it and add it back into the index? Do we need to use WMT to speed this process up?
If the canonical tag is changed or removed, Google will find and reindex it next time it crawls your site (assuming you don't run out of crawl budget). You don't need to use WMT unless you're impatient and want to try to speed the process up.
-
Thanks Sandi, I did. It's a great article and it answered many questions for me, but I couldn't really get clarity on my last two questions above.
-
Hey David
Check this Moz Blog post about rel=canonical, appropriately named "Rel=Confused?"