Canonical OR redirect
-
Hi,
I have a sports site that covers matches, with a page for each match.
Last week there was a match between T1 and T2,
so a page was created:
www.domain.com/match/T1vT2 - Page1
This week T2 hosts T1, so there's a new page:
www.domain.com/match/T2vT1 - Page2
Each page has unique content with Authorship, but the URL, Title, Description, and H1
look very similar, because the only difference is that T2 comes before T1.
Though Page2 has been up for a few days, with site links and a sitemap entry, Page1 is what appears (high up) in the SERP for the search query "T2 T1 match".
Of course I want Page2 to rank for that query, since it's the relevant match.
I don't see Page2 anywhere in the SERP, and I think it hasn't been indexed.
Questions:
1. Do you think Google sees both pages as duplicates even though the content is different?
2. Is there a difference between searching for
T1 vs T2
OR
T2 vs T1
?
3. Should I 301-redirect Page1 to Page2? Consider that all of Page1's content and its G+ Authorship would be lost.
4. Should I put rel=canonical on Page1 pointing to Page2?
5. Should I just let Google sort it out?
I know it's a long one; thanks for your patience.
Thanks,
Assaf
-
Thanks for everything.
I'll stick to the slower method and see what happens in the index.
-
(2) It could take a while, yes. There is no speedy way to de-index a lot of content that is no longer crawlable, I'm afraid, unless it's currently in a directory that can be removed in Google Webmaster Tools.
(3) So, basically, let's say all the pages live under "/events" - you'd create "/events2", put all the new events in that going forward, and then remove "/events" in GWT?
It could work for removal, but changing your site architecture that way carries a significant amount of risk. You'll also have to make sure that you have a plan going forward for de-indexing new content that becomes outdated, because this is not something you want to do every couple of months. Honestly, unless you know the old content is harming your rankings, I probably wouldn't do this. I'd stick to the slower method.
-
Dear Dr. Meyers,
Very insightful!
I must clear all the irrelevant pages, and the sooner the better.
(1) Could take months or years.
(2) Sounds like a very good approach - I'm building my sitemap with code, so that's not a problem. The only issue is that, at a few hundred at a time, it could also take a long time. And wouldn't Google spend a lot of crawl time on those pages and index fewer of the fresh new ones?
(3) What about Google's removal tool? This ties into my point in the last post about setting up a new site architecture:
- for all new matches (= pages), create a new directory (without the irrelevant pages)
- ask the WMT removal tool to remove the old directory, and with it all the irrelevant pages (following the guidelines for that tool, of course)
What do you think of this approach?
Thanks again for all your help - I really appreciate it!
Assaf.
-
Oh, wow - yeah if only 2K are current and 120K are indexed, you definitely should be proactive about this. Unfortunately de-indexing content that's already been indexed is tough. Robots.txt isn't terribly effective after-the-fact, and the folder-based approach you've described won't work. You can move the pages and remove the folder (either with Robots.txt or in Webmaster Tools), but you haven't tied the old URLs to the new URLs. To remove them, first you have to tell Google they've moved.
First, pick your method. If these old events have any links/traffic/etc., then you may want to rel=canonical or 301-redirect. Otherwise, you could META NOINDEX or even 404. It depends a bit on their value. Then, a couple of options:
(1) You can wait and see. Let Google clear out the old events over time. If you're not at any risk, this may be fine. Monitor and see what happens.
(2) Encourage Google to re-crawl the old pages by creating a new, stand-alone sitemap. Then, monitor that sitemap in GWT for indexation. You don't have to do all 120K at once, but you could start with a few hundred (hopefully, you can build the XML with code, not by hand) and see how it progresses.
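Since the thread mentions building the XML with code, here's a minimal sketch of generating a stand-alone sitemap for a batch of old URLs. The URLs below are placeholders for illustration; in practice they'd come from the site's database.

```python
# Build a minimal sitemap.xml for a batch of old match URLs so Google
# re-crawls them and sees the noindex/canonical/301 on each page.
# The domain and paths are placeholders, not the real site's.
from xml.etree import ElementTree as ET

def build_sitemap(urls):
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")

# Start with a few hundred old pages, not all 120K at once:
old_urls = [f"http://www.domain.com/sport/match/old-match-{i}" for i in range(1, 201)]
xml = build_sitemap(old_urls)
```

You'd then submit the generated file in GWT and watch its indexation numbers as the pages get processed.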
-
Dear Dr. Meyers,
I'm starting to understand that I have a much bigger problem.
All finished matches are no longer relevant, and though you can still reach their pages from the SERP or by direct URL, they don't appear in site links or the sitemap. So the best idea is to remove all these old pages from Google's index - they contribute nothing, plus they've pushed my index status to 120k pages while only 2,000 are currently relevant.
This wastes Google's crawling on irrelevant pages, and there's a risk Google sees some of them as dupes, because in some cases most of the page is relatively similar.
One suggestion I got: after a match finishes, programmatically add <meta name="robots" content="noindex"> to the page, and Google will remove it from its index. But will it remove the page if there are no links/sitemap entries pointing to it???
But I also have to handle the huge index itself - the above approach may (or may not) handle pages from now on, but what about all the older pages with matches that finished long ago??? How can I remove them all from the index?
-
Adding <meta name="robots" content="noindex,follow"> to all of them could take months or more to clean the index, because they're probably rarely crawled.
-
A more aggressive approach would be to change the site architecture and block, via robots.txt, the folder that holds all the past, irrelevant pages.
So if today a match URL looks like this: www.domain.com/sport/match/T1vT2
restrict www.domain.com/sport/match/ in robots.txt
and from now on create all new matches in a different folder, like: www.domain.com/sport/new-match/T1vT2
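The restriction described here would be a one-line rule in robots.txt (using the hypothetical folders from the example above):

```
User-agent: *
Disallow: /sport/match/
```

The new folder, /sport/new-match/, stays crawlable because it doesn't match the blocked prefix. One caveat worth flagging: robots.txt blocks crawling, not indexing, so URLs already in the index won't necessarily drop out just because they can no longer be crawled.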
-
Is this a good solution?
-
Wouldn't Google penalize me for removing a directory with 100k pages?
-
If it's a good approach, how long will it take Google to clear all those pages from its index?
I know it's a long one, and I'll really appreciate your response.
Thanks a lot,
Assaf.
-
-
The problem with (2) is that, if you cut the crawl path, Google can't process any on-page directives, like 301s, canonicals, etc. Now, eventually, they might try to re-crawl from the index (knowing the URL used to exist), but that can take a long time. So, while canonical is probably appropriate here, you may have to leave the old event/URL active long enough for Google to process the tag.
If these are really isolated cases, I wouldn't worry too much. Maybe rel=canonical them, and eventually Google will flush out the old URL. If this starts happening a lot, I'd really consider some kind of permanent URL for certain match-ups and events.
There's no easy answer. This stuff is very site-specific and can be tricky.
-
I've got some good responses, but I'm not sure what to do.
Any other opinions would be highly appreciated.
Thanks!
-
Hi Dr. Meyers,
Thanks for your detailed response.
Just wanted to refine my scenario:
1. The case of pairs (a repeat match after a short interval) is rare, but I've encountered it.
2. There are no links or sitemap entries for a match that has already finished, but Google keeps it in the index. The page is reachable ONLY by direct URL or from the SERP.
3. I don't think I can force Google to automatically remove the old match from the index, and doing it manually for thousands of matches is not an option.
4. I thought Google examined the content of each page to determine whether it's a duplicate, not just the URL/title - according to a comparison tool, the content is only 66% similar.
5. I currently have this problem in two places - in one case I've set rel=canonical, and in the other I'm letting Google decide. When Google encounters a rel=canonical, does it go to the URL of the canonical?
Thanks,
Assaf.
-
This is a pretty common problem with event-oriented sites, and there's no easy solution. It's a trade-off - if you keep creating new URLs every time a new event is listed, you risk producing a lot of near-duplicates and eventually diluting your index. At best, you could have dozens or hundreds of pages competing for the same keywords.
You could canonical or 301-redirect to the most recent event, but that has trade-offs, too. For one, a huge number of either can look odd to Google. Also, the latest event may not always be an appropriate target page, especially if more than just the data is changing. Unfortunately, without seeing the content, it's really tough to tell.
The other option is to create a static URL for every pairing and update the content on that page (maybe creating archival URLs for the old content, that are lower priority in the site architecture). That way, the most current URL never changes. Again, this depends a lot on the site and the scope.
If you're just talking about a couple of URLs for a handful of events, I wouldn't worry too much about it. I probably wouldn't reverse the URL ("A vs. B" --> "B vs. A"), as it doesn't gain you much, but I also wouldn't lose sleep over it. If each pairing can generate dozens of URLs, though, I think you may want to consider a change in your site architecture.
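For reference, the canonical option mentioned above would be a tag in the head of the older event page pointing at the newer one - a sketch using the placeholder URLs from the question (real URLs will differ):

```html
<!-- On the older page, www.domain.com/match/T1vT2 -->
<link rel="canonical" href="http://www.domain.com/match/T2vT1" />
```

Note that rel=canonical is a hint, not a directive - Google can take a while to honor it, and the old page has to remain crawlable for the tag to be seen at all.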
-
Thanks Jesse!
1. The content is different - according to a comparison tool they're 64% similar, and considering the menus, header, and other elements that appear on every page, you could say they're unique, couldn't you? Even so, Google hasn't indexed the 2nd page, and it's been up for 5 days - the sitemap indexing rate is 90% according to Google Webmaster Tools. So what's wrong here?
2. Including the date seems like a good idea! But 2 questions about it:
- Won't the URL look messy with those numeric parts?
- The same match can be repeated in the future - isn't it good that the page is already indexed? I mean, the URL would stay the same; just the content would be different.
Thanks,
Assaf.
-
Highland, thanks for your quick response.
The pages are created dynamically, because at any moment we have more than 1,000 matches in our DB. It's impossible to create a manual URL for each page.
The case I described is rare, but it happened with a very important match.
-
1. If the content is different, then you should have no problem, and you can allow both pages to be indexed without needing to noindex or canonicalize either page.
2. Could you perhaps include the date in the URL?
As long as each page has genuinely different content, I would say you're fine. I would definitely consider adding the date to the URL. What if the two teams play again at a later date? Adding the date would differentiate those pages even more, and I believe it would help Google.
-
You need to better differentiate the content. T1vsT2 is not the best way to segment it, so I would actually change the URL structure to something like:
www.domain.com/match/week1/T1vT2
www.domain.com/match/week2/T2vT1
This better segments your content and makes the difference obvious, because, to an end user, the original URLs are confusing - and that confusion has extended to Google. Google will not treat the order of the team names as important unless you quote your search (which normal users won't do). Google matches content and context first.
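If the URL structure is changed this way, the old URLs would also need 301 redirects to their new equivalents so existing rankings and links aren't lost. A sketch for Apache's mod_alias, assuming the example paths above (the directives would differ on another server):

```apache
# Map the old match URLs to the new week-based structure
Redirect 301 /match/T1vT2 /match/week1/T1vT2
Redirect 301 /match/T2vT1 /match/week2/T2vT1
```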