Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Internal search : rel=canonical vs noindex vs robots.txt
-
Hi everyone,
I have a website with a lot of internal search results pages indexed. I'm not asking if they should be indexed or not, I know they should not according to Google's guidelines. And they make a bunch of duplicated pages so I want to solve this problem.
The thing is, if I noindex them, the site is gonna lose a non-negligible chunk of traffic : nearly 13% according to google analytics !!!
I thought of blocking them in robots.txt. This solution would not keep them out of the index. But the pages appearing in GG SERPS would then look empty (no title, no description), thus their CTR would plummet and I would lose a bit of traffic too...
The last idea I had was to use a rel=canonical tag pointing to the original search page (that is empty, without results), but it would probably have the same effect as noindexing them, wouldn't it ? (never tried so I'm not sure of this)
Of course I did some research on the subject, but each of my finding recommanded one of the 3 methods only ! One even recommanded noindex+robots.txt block which is stupid because the noindex would then be useless...
Is there somebody who can tell me which option is the best to keep this traffic ?
Thanks a million
-
Yeah, normally I'd say to NOINDEX those user-generated search URLs, but since they're collecting traffic, I'd have to side with Alan - a canonical may be your best bet here. Technically, they aren't "true" duplicates, but you don't want the 1K pages in the index, you don't want to lose the traffic (which NOINDEX would do), and you don't want to kill those pages for users (which a 301 would do).
Only thing I'd add is that, if some of these pages are generating most of the traffic (e.g. 10 pages = 90% of the traffic for these internal searches), you might want to make those permanent pages, like categories in your site architecture, and then 301 the custom URLs to those permanent pages.
-
Huh not sure since I'm not a developer (and didn't work on that website dev) but I'd say all of the above^^. If useful, here are their url structure, there's two kind :
- /searchpage.htm?action=search&pagenumber=xx&query=product+otherterms
So I guess they are generated when a user makes a search
paginated (about 15 pages generally),
and I can approximately know how much they are duplicates, I can tell some are probably overlapping when there's a lot of variations for the product. There are just a few complete duplicates (when the product searched is the same with different added terms, doesn't happen a lot in this list).
- /searchpage-searchterm-addedterm-number.htm
Those I find surprising, I don't know if they are pages generated with a fixed url, or if they are rewritten (Haven't looked at the htaccess yet, but I will, god I have a headache just thinking about reading that thing lol)
There's about a thousand of them all (from GGanalytics, about half of each sort, and nearly all are indexed by Google), on a website with about 12 thou total in pages.
Maybe the traffic loss will be compensated by the removed competition between those search pages and the product pages (and the rel=canonical is surely way less brutal than a noindex for that matter), but without experience in these kind of situations it's hard to make a decision...
Really appreciate you guys taking the time to help !
-
Alan's absolutely right about how canonical works, but I just want to clarify something - what about these pages is duplicated? In other words, are these regular searches (like product searches) with duplicate URLs, are these paginated searches (with page 2, 3, etc. that appear thin), or are these user-generated searches spinning out into new search pages (not exact duplicates but overlapping)? The solutions can vary a bit with the problem, and internal search is tricky.
-
Just one more point, a canonical is just a hint to the search engines, it is not a directive, so if they think that the pages should not be merged, they will ignore them, so in that way, they may make the decision for you
-
Not a lot of real duplicates, they're more alike, and the most visited are unique, so I'll keep the most important ones and just toss a few duplicates.
Thanks a lot for your help, problem solved !
-
no not like a noindex. more like a merge.
will it make you rank for many keywords? not necessarly, as a page all about blue widgets is going to rank higher then a page has many different subjects including blue widgets.
A canonical is really for duplicate content, or very alike content.
So you have to decide what your page is, is it duplicate or alike content, or is it unique?
if the pages are unique then do nothing, let them rank. if yopu think they are alike, then use a canonical. if there are only a few, then i would not worry either way.
if you decide they are unique, they I would look at making the page title unique also, maybe even description too.
-
Thanks for your answer
Ok you're saying indeed it will act like a noindex over time.
So if one of the result page would have ranked for a particular query, it will not rank any more, like with a noindex => it will lose the 13% of traffic it generated...
Otherwise it would be too easy to make a page rank for the keywords used in a bunch of other pages that refer to it via rel=canonical... wouldn't it ?
I'm starting to think I can't do anything... Maybe just noindex a bunch of them that cause duplicates, and leave the rest in the index.
-
Rel=canonical is tge way to go, it will tell the search results that all credit for all diffrent urls go to the original search page. eventual onl;y the original search page will exist in the index.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt Tester - syntax not understood
I've looked in the robots.txt Tester and I can see 3 warnings: There is a 'syntax not understood' warning for each of these. XML Sitemaps:
Technical SEO | | JamesHancocks1
https://www.pkeducation.co.uk/post-sitemap.xml
https://www.pkeducation.co.uk/sitemap_index.xml How do I fix or reformat these to remove the warnings? Many thanks in advance.
Jim0 -
One robots.txt file for multiple sites?
I have 2 sites hosted with Blue Host and was told to put the robots.txt in the root folder and just use the one robots.txt for both sites. Is this right? It seems wrong. I want to block certain things on one site. Thanks for the help, Rena
Technical SEO | | renalynd270 -
Link rel="prev" AND canonical
Hi guys, When you have several tabs on your website with products, you can most likely navigate to page 2, 3, 4 etc...
Technical SEO | | AdenaSEO
You can add the link rel="prev" and link rel="next" tags to make sure that 1 page get's indexed / ranked by Google. am I correct? However this still means that all the pages can get indexed, right? For example a webshop makes use of the link rel="prev" and ="next" tags. In the Google results page though, all the seperate tabs pages are still visible/indexed..
http://www.domain.nl/watches/?tab=1
http://www.domain.nl/watches/?tab=24
http://www.domain.nl/watches/?tab=19
etc..... Can we prevent this, and make sure only the main page get's indexed and ranked, by adding a canonical link on every 'tab page' to the main page --> www.domain.nl/watches/ I hope I explained it well and I'm looking forward to hearing from you. Regards, Tom1 -
how to set rel canonical on wordpress.com sites
I know how to do this with a wordpress.org site but I have a client that does not want to switch and without a plugin I am lost. any help would be greatly appreciated. Jeremy Wood
Technical SEO | | SOtBOrlando0 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Rel canonical between mirrored domains
Hi all & happy new near! I'm new to SEO and could do with a spot of advice: I have a site that has several domains that mirror it (not good, I know...) So www.site.com, www.site.edu.sg, www.othersite.com all serve up the same content. I was planning to use rel="canonical" to avoid the duplication but I have a concern: Currently several of these mirrors rank - one, the .com ranks #1 on local google search for some useful keywords. the .edu.sg also shows up as #9 for a dirrerent page. In some cases I have multiple mirrors showing up on a specific serp. I would LIKE to rel canonical everything to the local edu.sg domain since this is most representative of the fact that the site is for a school in Singapore but...
Technical SEO | | AlexSG
-The .com is listed in DMOZ (this used to be important) and none of the volunteers there ever respoded to requests to update it to the .edu.sg
-The .com ranks higher than the com.sg page for non-local search so I am guessing google has some kind of algorithm to mark down obviosly local domains in other geographic locations Any opinions on this? Should I rel canonical the .com to the .edu.sg or vice versa? I appreciate any advice or opinion before I pull the trigger and end up shooting myself in the foot! Best regards from Singapore!0 -
Google insists robots.txt is blocking... but it isn't.
I recently launched a new website. During development, I'd enabled the option in WordPress to prevent search engines from indexing the site. When the site went public (over 24 hours ago), I cleared that option. At that point, I added a specific robots.txt file that only disallowed a couple directories of files. You can view the robots.txt at http://photogeardeals.com/robots.txt Google (via Webmaster tools) is insisting that my robots.txt file contains a "Disallow: /" on line 2 and that it's preventing Google from indexing the site and preventing me from submitting a sitemap. These errors are showing both in the sitemap section of Webmaster tools as well as the Blocked URLs section. Bing's webmaster tools are able to read the site and sitemap just fine. Any idea why Google insists I'm disallowing everything even after telling it to re-fetch?
Technical SEO | | ahockley0 -
How to handle (internal) search result pages?
Hi Mozers, I'm not quite sure what the best way is to handle internal search pages. In this case it's for an ecommerce website with about 8.000+ products and search pages currently look like: example.com/search.php?search=QUERY+HERE. I'm leaning towards making them follow, noindex. Since pages like this can be easily abused for duplicate content and because I'd rather have the category pages ranked. How would you handle this?
Technical SEO | | Qon0