Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Internal search : rel=canonical vs noindex vs robots.txt
-
Hi everyone,
I have a website with a lot of internal search results pages indexed. I'm not asking if they should be indexed or not, I know they should not according to Google's guidelines. And they make a bunch of duplicated pages so I want to solve this problem.
The thing is, if I noindex them, the site is gonna lose a non-negligible chunk of traffic : nearly 13% according to google analytics !!!
I thought of blocking them in robots.txt. This solution would not keep them out of the index. But the pages appearing in GG SERPS would then look empty (no title, no description), thus their CTR would plummet and I would lose a bit of traffic too...
The last idea I had was to use a rel=canonical tag pointing to the original search page (that is empty, without results), but it would probably have the same effect as noindexing them, wouldn't it ? (never tried so I'm not sure of this)
Of course I did some research on the subject, but each of my finding recommanded one of the 3 methods only ! One even recommanded noindex+robots.txt block which is stupid because the noindex would then be useless...
Is there somebody who can tell me which option is the best to keep this traffic ?
Thanks a million
-
Yeah, normally I'd say to NOINDEX those user-generated search URLs, but since they're collecting traffic, I'd have to side with Alan - a canonical may be your best bet here. Technically, they aren't "true" duplicates, but you don't want the 1K pages in the index, you don't want to lose the traffic (which NOINDEX would do), and you don't want to kill those pages for users (which a 301 would do).
Only thing I'd add is that, if some of these pages are generating most of the traffic (e.g. 10 pages = 90% of the traffic for these internal searches), you might want to make those permanent pages, like categories in your site architecture, and then 301 the custom URLs to those permanent pages.
-
Huh not sure since I'm not a developer (and didn't work on that website dev) but I'd say all of the above^^. If useful, here are their url structure, there's two kind :
- /searchpage.htm?action=search&pagenumber=xx&query=product+otherterms
So I guess they are generated when a user makes a search
paginated (about 15 pages generally),
and I can approximately know how much they are duplicates, I can tell some are probably overlapping when there's a lot of variations for the product. There are just a few complete duplicates (when the product searched is the same with different added terms, doesn't happen a lot in this list).
- /searchpage-searchterm-addedterm-number.htm
Those I find surprising, I don't know if they are pages generated with a fixed url, or if they are rewritten (Haven't looked at the htaccess yet, but I will, god I have a headache just thinking about reading that thing lol)
There's about a thousand of them all (from GGanalytics, about half of each sort, and nearly all are indexed by Google), on a website with about 12 thou total in pages.
Maybe the traffic loss will be compensated by the removed competition between those search pages and the product pages (and the rel=canonical is surely way less brutal than a noindex for that matter), but without experience in these kind of situations it's hard to make a decision...
Really appreciate you guys taking the time to help !
-
Alan's absolutely right about how canonical works, but I just want to clarify something - what about these pages is duplicated? In other words, are these regular searches (like product searches) with duplicate URLs, are these paginated searches (with page 2, 3, etc. that appear thin), or are these user-generated searches spinning out into new search pages (not exact duplicates but overlapping)? The solutions can vary a bit with the problem, and internal search is tricky.
-
Just one more point, a canonical is just a hint to the search engines, it is not a directive, so if they think that the pages should not be merged, they will ignore them, so in that way, they may make the decision for you
-
Not a lot of real duplicates, they're more alike, and the most visited are unique, so I'll keep the most important ones and just toss a few duplicates.
Thanks a lot for your help, problem solved !
-
no not like a noindex. more like a merge.
will it make you rank for many keywords? not necessarly, as a page all about blue widgets is going to rank higher then a page has many different subjects including blue widgets.
A canonical is really for duplicate content, or very alike content.
So you have to decide what your page is, is it duplicate or alike content, or is it unique?
if the pages are unique then do nothing, let them rank. if yopu think they are alike, then use a canonical. if there are only a few, then i would not worry either way.
if you decide they are unique, they I would look at making the page title unique also, maybe even description too.
-
Thanks for your answer
Ok you're saying indeed it will act like a noindex over time.
So if one of the result page would have ranked for a particular query, it will not rank any more, like with a noindex => it will lose the 13% of traffic it generated...
Otherwise it would be too easy to make a page rank for the keywords used in a bunch of other pages that refer to it via rel=canonical... wouldn't it ?
I'm starting to think I can't do anything... Maybe just noindex a bunch of them that cause duplicates, and leave the rest in the index.
-
Rel=canonical is tge way to go, it will tell the search results that all credit for all diffrent urls go to the original search page. eventual onl;y the original search page will exist in the index.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt allows wp-admin/admin-ajax.php
Hello, Mozzers!
Technical SEO | | AndyKubrin
I noticed something peculiar in the robots.txt used by one of my clients: Allow: /wp-admin/admin-ajax.php What would be the purpose of allowing a search engine to crawl this file?
Is it OK? Should I do something about it?
Everything else on /wp-admin/ is disallowed.
Thanks in advance for your help.
-AK:2 -
Rel=canonical on Godaddy Website builder
Hey crew! First off this is a last resort asking this question here. Godaddy has not been able to help so I need my Moz Fam on this one. So common problem My crawl report is showing I have duplicate home pages www.answer2cancer.org and www.answer2cancer.org/home.html I understand this is a common issue with apache webservers which is why the wonderful rel=canonical tag was created! I don't want to go through the hassle of a 301 redirect of course for such a simple issue. Now here's the issue. Godaddy website builder does not make any sense to me. In wordpress I could just go add the tag to the head in the back end. But no such thing exist in godaddy. You have to do this weird drag and drop html block and drag it somewhere on the site and plug in the code. I think putting before the code instead of just putting it in there. So I did that but when I publish and inspect in chrome I cannot see the tag in the head! This is confusing I know. the guy at godaddy didn't stand a chance lol. Anyway much love for any replies!
Technical SEO | | Answer2cancer0 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
301 redirect: canonical or non canonical?
Hi, Newbie alert! I need to set up 301 redirects for changed URLs on a database driven site that is to be redeveloped shortly. The current site uses canonical header tags. The new site will also use canonical tags. Should the 301 redirects map the canonical URL on the old site to the corresponding canonical for the new design . . . or should they map the non canonical database URLs old and new? Given that the purpose of canonicals is to indicate our preferred URL, then my guess is that's what I should use. However, how can I be sure that Google (for example) has indexed the canonical in every case? Thx in anticipation.
Technical SEO | | ztalk1120 -
Tool to search relative vs absolute internal links
I'm preparing for a site migration from a .co.uk to a .com and I want to ensure all internal links are updated to point to the new primary domain. What tool can I use to check internal links as some are relative and others are absolute so I need to update them all to relative.
Technical SEO | | Lindsay_D0 -
Googlebot does not obey robots.txt disallow
Hi Mozzers! We are trying to get Googlebot to steer away from our internal search results pages by adding a parameter "nocrawl=1" to facet/filter links and then robots.txt disallow all URLs containing that parameter. We implemented this late august and since that, the GWMT message "Googlebot found an extremely high number of URLs on your site", stopped coming. But today we received yet another. The weird thing is that Google gives many of our nowadays robots.txt disallowed URLs as examples of URLs that may cause us problems. What could be the reason? Best regards, Martin
Technical SEO | | TalkInThePark0 -
Oh no googlebot can not access my robots.txt file
I just receive a n error message from google webmaster Wonder it was something to do with Yoast plugin. Could somebody help me with troubleshooting this? Here's original message Over the last 24 hours, Googlebot encountered 189 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%. Recommended action If the site error rate is 100%: Using a web browser, attempt to access http://www.soobumimphotography.com//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot. If your robots.txt is a static page, verify that your web service has proper permissions to access the file. If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure. If the site error rate is less than 100%: Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors. The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website. After you think you've fixed the problem, use Fetch as Google to fetch http://www.soobumimphotography.com//robots.txt to verify that Googlebot can properly access your site.
Technical SEO | | BistosAmerica0 -
Syndication: Link back vs. Rel Canonical
For content syndication, let's say I have the choice of (1) a link back or (2) a cross domain rel canonical to the original page, which one would you choose and why? (I'm trying to pick the best option to save dev time!) I'm also curious to know what would be the difference in SERPs between the link back & the canonical solution for the original publisher and for sydication partners? (I would prefer not having the syndication partners disappeared entirely from SERPs, I just want to make sure I'm first!) A side question: What's the difference in real life between the Google source attribution tag & the cross domain rel canonical tag? Thanks! PS: Don't know if it helps but note that we can syndicate 1 article to multiple syndication partners (It would't be impossible to see 1 article syndicated to 50 partners)
Technical SEO | | raywatson0