Best posts made by Dr-Pete
-
RE: How to prevent channel identifiers from showing up in the SERPs?
There are a couple of ways to go about it, but this is a situation where the rel-canonical tag is probably your best bet. Set up correctly, it'll attribute all of those URLs back to the parent URL. If you can provide some specifics (or even a semi-fictional example), I can try to give you more guidance.
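For reference, a minimal sketch of the tag, assuming the clean parent URL is the hypothetical http://www.example.com/page/ - it would go in the <head> of every parameterized version of that page:

<link rel="canonical" href="http://www.example.com/page/" />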
-
RE: International Keyword Ranking
Just heard from a GeoEdge rep who says that they have proxy servers internationally (100+ locations) and a toolbar-based version as well, so it sounds like it's worth checking out.
-
RE: Dropped ranking - Penguin penalty or duplicate content issue?
It's unlikely the canonical is to blame here, if I'm understanding it correctly. If you tried to canonicalize Page B to Page A, and they were clearly different, one of two things should happen:
(1) Google will just ignore it.
(2) Google will follow it anyway, and drop Page B from the index.
Now, it's theoretically possible that, if Google thought you were using the canonical tag inappropriately to benefit Page A, they could punish Page A, but I've honestly never seen that happen (I've seen it with 301-redirects). Typically, Page B would also have to have a lot of links that you were trying to "clean" (think money laundering). Since Page B is new, this seems very unlikely.
If you're hitting exact-match (or close to it) anchor text hard on Page A, it's certainly possible Penguin came into play, especially if Page A is pushing keywords a bit too hard. It's been tough to confirm Penguin cases, but most of the verified ones I've seen are sudden drops. It's not a subtle, gradual impact.
You could wait for the next Penguin data update, but I suspect you may have to do some link clean up. If there's anything that's not only exact-match anchor text but is sitewide (especially footer links), I'd start there. They seem to be major targets of Penguin. Truthfully, though, we're still collecting data on it.
-
RE: When do i use disallow links in WMT?
So, these are sites that scraped your post and then linked back to it? If that's the case, the links are good, in a sense - they help Google remove the duplicates. I'm not sure what you mean by "there are 2-3 always".
What does your link profile look like outside of this? If there are 66 links like this, and these are the only 66 links you have, it's possible you could be at risk. If these are 66 out of 6,000 links, then I probably wouldn't worry about it, especially if they're not paid links or somehow engineered (part of a link network, etc.).
-
RE: Organic search traffic down 60% since 8/1/18. What now?
Unfortunately, I think the "not a penalty" line is sometimes a bit of a cop-out on Google's part -- anything that moves a bunch of sites up is naturally going to move a bunch of other sites down. Even if those sites have technically done nothing wrong, it sure feels to them like they're being penalized. I also don't like that Google claims they're rewarding good behavior, but then won't really tell us what that good behavior is. If it's genuinely good behavior, give us some guidelines (we're not asking for chunks of code from the algorithm).
I'm comparing a case right now where there are two sites in the same industry, both seem decent, but one got a huge boost on 8/1 and one took a big hit. Hoping to glean some insights, but there's just a ton of speculation at the moment.
-
RE: Canonical or 301 redirect, that is the question?
I'd agree that, theoretically, 301-redirects are better here, but if it's just the home-page, a canonical tag can definitely sweep up any problem duplicates. If you're getting www and non-www versions of multiple pages indexed, then you probably need 301s. I'd check the index with the site: operator and see. If you're really getting multiples of both indexed, you probably have internal linking issues (inconsistencies). Step 1 in any de-duplication is to make sure you're always linking to the same version. Same with "index.html" - link to "/" internally or the absolute URL of the site (without "index.html").
PHP (code-based) redirects should be fine, as long as they resolve correctly. I've used code-based headers in some other languages (like ColdFusion) and it's generally been ok. If that gets messy, though, and if it's just the home-page, the canonical tag will do in a pinch.
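If it helps, here's a minimal sketch of a code-based 301 in PHP, consolidating the non-www version to www - the hostname is hypothetical, so adjust it to your setup:

<?php
// Minimal sketch: 301-redirect non-www requests to the www host.
// "example.com" is a hypothetical hostname.
if ($_SERVER['HTTP_HOST'] === 'example.com') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>

This would sit at the top of a shared include, before any output is sent (headers have to go out first).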
-
RE: External Linking for ROOT DOMAIN METRICS
Open Site Explorer can be filtered just to show links that we think pass equity - it's under the [Show] pulldown:
http://www.opensiteexplorer.org/
...then set the [to] pulldown to "pages on this root domain".
Sorry, I'm not sure I'm fully understanding what you're looking for (and may have misunderstood), so please feel free to provide more details.
-
RE: Googlebot found an extremely high number of URLs on your site
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
-
RE: SEO Impact of IPv4 and iPv6?
Shared IPs can cause issues with reverse lookups, which could make spiders confuse you with sites sharing that IP. Admittedly, it's rare, but there have been cases of penalties jumping within an IP, for example. I suspect that, as IP sharing grows, Google is getting better about this, but I generally like to avoid it.
-
RE: redirect 404 pages to homepage
I'm very confused about where the 404 fits in - are these dead pages? To re-capture the link-juice, you'll need to 301-redirect (which it seems like you're also putting into play). In some cases, if the pages are gone or never existed, just let them be gone. If you redirect every single URL anyone could ever visit to the home-page, you may generate so many 301s that Google starts to get suspicious. We've seen some issues with this lately. In many cases, it's also just not a good user experience.
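As a minimal sketch of that middle ground (all paths and the hostname here are hypothetical), you'd 301 only the dead URLs that have a real replacement and let the rest 404 honestly:

<?php
// Minimal sketch: 301 only old URLs with a genuine new home;
// everything else returns a proper 404.
$redirects = array(
    '/old-page.html' => '/new-page/',
    '/old-category/' => '/new-category/',
);
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (isset($redirects[$path])) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $redirects[$path]);
} else {
    header('HTTP/1.1 404 Not Found');
}
exit;
?>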
-
RE: Does Bing support cross-domain canonical tags?
General consensus seems to be that cross-domain canonical is not supported on Bing. One SEO I know says he's tested it and confirmed that. Granted, it's hard to conclusively prove something doesn't work, but it seems like this one is Google-only for now.
-
RE: How to best use our blog posts for SEO?
I guess it depends on why this is on a separate domain at all. If you have the technical capacity to host the blog content under your company website, then in 95%+ of cases, I'd move the blog. Unless there's a clear need to brand a separate domain or legally separate the blog from the company website, having a separate domain is just splitting your authority. All the links that come to your blog are going to just funnel into 1 linking domain to your main site. If that content was on your main site, you'd have dozens or (down the road) hundreds of linking domains instead.
If it has to be separate, then I agree with Andrea - I wouldn't copy it. You're setting up duplicates and confusing your potential audience. In general, though, I'd want to understand why you made the split. Fully integrating them has much more bang for your buck, SEO-wise.
-
RE: Is a "Critical Acclaim" considered duplicate content on an eCommerce site?
I think you have to be a little careful here, and not just from an SEO standpoint. Now, you're talking about potentially taking someone else's content from behind their paywall and posting it publicly. I don't know the context or the industry very well, but you may be encroaching on a legal gray-area.
-
RE: Social Media valuable for organic rankings?
Agreed with comments on community-building and the broader value, but it does look like social mentions are having an indexing and probably a ranking impact. The Twitter connection is a little unclear, now that Google has cut the contract with Twitter for their direct data, but it does seem to be coming into play. Danny had a great interview on it with search engineers earlier this year:
http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389
It looks like Google+ is definitely coming into play, too, especially with personalized results. If you're logged in and have a Google+ account, chances are it's already impacting your rankings. This will probably only increase during 2012.
-
RE: Is it ok to use the H1 tag for bullet points?
I doubt it would harm you, but it is a bit unorthodox. Why not make the table header a header tag?
I'd argue with the "take up space unnecessarily" point a little. Headers aren't just for SEO (their SEO impact is probably pretty small these days) - they're for visitors. If these pages get direct visits (including from search), a prominent header can really help people know they're on the right path. Breadcrumbs are great for people who are already on your site and have a sense of it, but they're too small and complex for that 5-second test of "Am I in the right place?"
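As a minimal sketch (the heading and cell text are hypothetical), the table's title could simply be a lower-level header instead of H1 bullets:

<h2>Product Specifications</h2>
<table>
  <tr><th>Feature</th><th>Detail</th></tr>
  <tr><td>Weight</td><td>2.5 kg</td></tr>
</table>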
-
RE: Could you use a robots.txt file to disallow a duplicate content page from being crawled?
Generally agree, although I'd just add that Robots.txt also isn't so great at removing content that's already been indexed (it's better at prevention). So, I find that it's not just not ideal - it sometimes doesn't even work in these cases.
Rel-canonical is generally a good bet, and it should go on the duplicate (you can actually put it on both, although it's not necessary).
-
RE: How to extract URLs from a site (without bringing the server down!)
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
RE: Is using a platform to automatically cross post on the social bookmarking websites good or bad for SEO?
I tend to agree with Valery - it depends a lot on the audience and your engagement. If you're active on a site and have a strong profile, then auto-posting your new content often makes sense. If you're just submitting content to 100s of social bookmarking sites, with no real profile and very low authority, you're not going to accomplish much. Google will likely just ignore it, for the most part.
-
RE: Canonical tag help
I want to expand on what I think Istvan was trying to say. First, the canonical on "products.php" will consolidate all of the affiliate IDs. That should be a perfectly valid solution here.
The only warning is whether you have other IDs on that page that drive different product views. If it's just one single product page, then the canonical is great here. If, however, you have something like:
http://www.mysite.com/products.php?prod=1&ref=12345
http://www.mysite.com/products.php?prod=2&ref=12345
...where "prod=" (or something like it) represents separate products, then a canonical tag to "/products.php" would collapse ALL of your product pages into one. That's certainly not what you want. So, it does depend a lot on the details. In that case, the "?prod=1", etc. version would actually be the canonical version (you'd have to set the tag dynamically).
-
RE: Keeping Roger Happy - The Dynamic Dilemma!
I'm seeing a lot of duplicates in your forum pages - I think the issue is that any attempts to click into the forum go to the login page, but the URL stays the same. You may want to block those from crawlers somehow (META NOINDEX, for example), since Google can't log into member areas.
They don't seem to be currently in the Google index, but there is potential to dilute your site's ranking ability and for Google to think that your content is "thin". I do think it's a problem you should address.
-
RE: Best way to handle different views of the same page?
I'll 80% agree with Alan, although I've found that, in practice, the self-referencing canonical tag is usually fine. It wasn't the original intent, but at worst the search engines ignore it. For something like a session_ID, it can be pretty effective.
I would generally avoid Robots.txt blocking, as Alan said. If you can do a selective META NOINDEX, that's a safer bet here (for all 3 cases). You're unlikely to have inbound links to these versions of your pages, so you don't have to worry too much about link-juice. I just find that Robots.txt can be unpredictable, and if you block tons of pages, the search engines get crabby.
The other option for session_ID is to capture that ID as a cookie or server session, then 301-redirect to the URL with no session_ID. This one gets tricky fast, though, as it depends a lot on your implementation.
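A minimal sketch of that approach in PHP, assuming the parameter is literally named "session_id" and the hostname is a placeholder (both hypothetical):

<?php
// Minimal sketch: store the session ID in a cookie, then 301 to the
// same URL with the parameter stripped out.
if (isset($_GET['session_id'])) {
    setcookie('session_id', $_GET['session_id'], 0, '/');
    $clean = preg_replace('/([?&])session_id=[^&]*(&|$)/', '$1', $_SERVER['REQUEST_URI']);
    $clean = rtrim($clean, '?&');
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $clean);
    exit;
}
?>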
Unless you're seeing serious problems (like a Panda smackdown), I'd strongly suggest tackling one at a time, so that you can measure the changes. Large-scale blocking and indexation changes are always tricky, and it's good to keep a close eye on the data. If you try to remove everything at once, you won't know which changes accomplished what (good or bad). It all comes down to risk/reward. If you aren't having trouble and are being proactive, take it one step at a time. If you're having serious problems, you may have to take the plunge all at once.
-
RE: Rel Publisher Tag is showing in SERPS, with Rel Author Picture
Truthfully, we're having a terrible time with authorship here on Moz. Google is showing avatars of commenters and all sorts of weird stuff, and this is without even putting rel-publisher into the mix.
Wish I had some answers, but from what I'm seeing with new Google features, this landscape is evolving so quickly that they haven't really figured out how best to promote and display publisher info. Trying to tweak the signals to remove authorship could backfire, so I'd honestly probably let it ride for a couple of months unless it's a major issue for the brand. I don't know of any way to tell Google to remove a rich snippet.
-
RE: Duplicate content
I'm not seeing that Google is currently indexing either of these pages, so they may be too deep or duplicated in other ways. Pagination is a tough issue, but in general pages 2+ have little or no search value (and, post-Panda, can actually harm you).
I would strongly recommend NOT using a canonical tag to page 1 - Google generally advises against this. You can use rel=prev/next, although it's a bit tough to implement and isn't honored by Bing. Generally, I'd advise one of two things:
(1) META NOINDEX, FOLLOW pages 2, 3, etc. - they really have no SEO value.
(2) If you have a View All page, link to it and rel-canonical to view all. This seems to be accepted by Google, but then the larger page will rank.
Generally, I find (1) easier and pretty effective.
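For reference, a minimal sketch of option (1), assuming the template knows the current page number via a hypothetical $page variable:

<?php
// Minimal sketch: NOINDEX, FOLLOW everything past page 1.
if ($page > 1) {
    echo '<meta name="robots" content="noindex, follow" />';
}
?>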
Sorry, just saw Nakul's comment, and didn't realize you already have canonical tags in place. While it's not the preferred solution, since it's already there and seems to be keeping these pages out of the index, I'd probably leave it alone. It doesn't look like Google is indexing these pages at all right now, though, which you may need to explore in more depth.
-
RE: URL for offline purposes
I agree with Matt - as long as your primary, internal links are consistent, it's ok to use a short version for offline purposes. The canonical tag is perfectly appropriate for this.
The other option would be to use a third-party shortener that has built-in tracking, like Bit.ly. It uses a 301-redirect, but also captures the data. If you're just doing a test case, this might be easier all-around.
-
RE: Is there a work around for Rel Canonical without header access?
I hate to say this, but I'm going to, because I have no tolerance for design companies and hosting companies who hold clients hostage (and I've worked at a design/hosting company, so I don't buy 98% of the excuses for that behavior)...
Is there any way to hack the plug-in or META data, based on the access you DO have? For example, the META description sits in the header. What if you entered a description that closes the tag and injects the canonical itself, something like:
This is my meta description."><link rel="canonical" href="http://www.example.com/page.html" />
Short of that, there's not a lot you can do with no access. Push comes to shove, you may have to let the client know that, to do your job, they need to divorce the design from the hosting. A WordPress CMS can live anywhere - there's no reason the design company should be sitting on it.
Actually, just for reference, I'll add that there are other solutions, but they're usually very technical and somewhat costly. For example, some SEO companies have proxy hardware/software that sits on top of existing sites. What it basically does is inject code on top of what gets served up by the web server. That way, the SEO company can add tags, etc. without direct access to the server. You still need access to the host, though (or cooperation), and typically this is an enterprise-level solution (in other words, $$$).
-
RE: 1099 Google Plus 1 in a few days?
This is very odd - like Thomas said, the +1s seem legit (in that they exist - not sure if they're actually legitimate). Internet Archive has no history of the site, and it's odd that you have 1,300 +1s but no other social signals. We're not tracking any 301-redirects from other domains, so it must be something historical about the domain.
The other possibility is that they were just driving up social signals to make the domain more attractive when they sold it. If you bought it for $10, though, they apparently didn't think that plan through very well.
-
RE: Custom Landing Page URLs
It's a complicated issue, but adding 50K variations to 27K product pages can definitely be dangerous, especially post-Panda. At best, you're diluting your index and your ranking ability. At worst, Google could actually start de-indexing or at least devaluing core pages. Personally, I don't think the long-tail gains are worth the risk - these kinds of pages were behind the "May Day" update in 2010, and Panda continued that core philosophy. Google considers it a low-value tactic in 2012 - of that, I have no doubt at all.
Of course, it does depend on how you use them. To have custom landers for PPC and not index them is perfectly fine, for example. If you're tripling your indexed page count with thin content just to target SEO keywords, though, you're taking a very real risk, IMO.
-
RE: Rel="prev" and rel="next" implementation
Technically, rel=prev/next doesn't de-duplicate the way the canonical tag does, but it should solve any problems for Google. I don't believe we currently consider rel=prev/next when determining duplicate titles. Klarke is right - you could just give those pages semi-unique titles. We're not handling rel=prev/next as well as we could be (it turns out to be a tricky tag to parse well).
Looking at your pages, your implementation appears to be correct. My gut reaction is that you're probably ok here. You're doing what Google claims they want (at least what they want this week).
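For reference, a minimal sketch of the tags on page 2 of a paginated series (URLs are hypothetical):

<link rel="prev" href="http://www.example.com/category?page=1" />
<link rel="next" href="http://www.example.com/category?page=3" />

Page 1 would carry only rel="next", and the last page only rel="prev".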
-
RE: Omniture tracking code URLs creating duplicate content
I think the canonical is probably your best bet here. You can solve it with a 301-redirect, too, but it's a lot trickier. If you're really running into trouble, parameter blocking in GWT is ok here. Again, it's not my first choice, but it's not a black-and-white issue (just ideal vs. not-so-ideal).
If your pages are truly static, you'd have to write a canonical tag for each one, but most sites at least have a shared header and some dynamic components. In other words, your 1000s of pages may only actually be a few physical pages of code. In that case, you may be able to add the canonical tags on as little as one template (with some code). Unfortunately, this is completely dependent on the platform you're on - there's no universal answer (and the code is completely dependent on your URL structure). You'll probably need some quality time with your coders on that one.
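A minimal sketch of what that template code might look like in PHP, stripping the "s_cid" parameter - the hostname is hypothetical and the details depend entirely on your URL structure:

<?php
// Minimal sketch: build the canonical URL without the tracking parameter.
$path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$query = parse_url($_SERVER['REQUEST_URI'], PHP_URL_QUERY);
$params = array();
if ($query) {
    parse_str($query, $params);
}
unset($params['s_cid']);
$canonical = 'http://www.example.com' . $path;
if (!empty($params)) {
    $canonical .= '?' . http_build_query($params);
}
echo '<link rel="canonical" href="' . htmlspecialchars($canonical) . '" />';
?>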
The first thing I'd do, though, is to monitor your site with the "site:" operator in Google, along with "inurl:s_cid". In some cases, Google doesn't crawl these tracking URLs (or knows they're common to an analytics package). If they aren't being indexed, you may not have a problem here.
-
RE: Tweet favourites or re-tweets, how does it affect ?
I think it's important to note that we currently have no strong evidence that tweets (favorites or RTs) directly impact rankings. Google cut off the Twitter "firehose" data and claims they don't factor in social as a direct ranking factor (even Google+). I think "direct" is an important word there, and that makes sense - social is relatively easy to manipulate, at least in terms of raw signals. They're still trying to figure out the right mix.
That caveat aside:
(1) I agree with Ratan that RT's are generally more advantageous indirectly. They expose more people to your tweet, and those people will click through, drive up engagement, and potentially link to you. Eventually, this can have an indirect but very real impact on SEO. It's unlikely that favoriting has much impact even indirectly, IMO.
(2) Social signals, like RTs, can definitely be used by Google for indexing new content. Content posted on G+, for example, is indexed incredibly fast from decent accounts (not just big names, but any account that's clearly real). This isn't "ranking" per se, but you can't win if you don't play, so it matters.
(3) I strongly suspect social will be a corroborating layer, if it isn't already. In other words, if you have a piece of content with 500 +1s but no tweets, no Likes, and no links, that's going to look like spam to Google. If that same piece shows signals across several dimensions, then those 500 +1s may have an impact. RTs may eventually be part of that equation (personally, I don't think they are right now).
-
RE: Should I Remove This Subdirectory From Google?
It's tough - I do think it boils down to the numbers. If you were talking about 1,700 pages and the rest of your site had 50K indexed pages, I'd probably say not to worry about it (unless, as @Dejan said, you experienced a traffic drop or other problems). When the rest of your site is 500 pages, though, I'd start to worry, especially with Panda updates hitting sites with too much copied content (even if legitimately syndicated).
The fact that you're linking back does help (you're not trying to claim these are your articles), but if these pages, which are more than 75% of your index, only represent 4% of visits, I'd really start to question the usefulness and whether it's worth the potential SEO risk.
There might be a partial solution - you could NOINDEX a large chunk of the pages, but leave 50-100 of the articles, if those account for 90%+ of the traffic you're getting. Of course, that's going to take some analysis and is a bit trickier to implement, but it could let you keep most of that 3.9%.
I'd also see where that traffic is coming from - if it's 3.9% of total traffic, but only 0.9% of search traffic (mostly direct visits, bookmarks, etc.), then you've got even less to worry about if you de-index the whole subdirectory.
-
RE: Pagination & SEO
Unfortunately, I'm not a WordPress expert by a long shot, but I'm seeing what Stuart is seeing - no rel=prev/next. I think that probably is one of the safer bets these days. Your site isn't huge, but the paginated archive pages linked straight from the home page could dilute your index a bit and lessen the ability of your individual posts to rank well. I don't think it's a disaster, but testing out Joost's plugin probably is a good bet.
-
RE: Does Google pass link juice a page receives if the URL parameter specifies content and has the Crawl setting in Webmaster Tools set to NO?
If you're already using rel-canonical, then there's really no reason to also block the parameter. Rel-canonical will preserve any link-juice, and will also keep the page available to visitors (unlike a 301-redirect).
Are you seeing a lot of these pages indexed (i.e. is the canonical tag not working)? You could block the parameter in that case, but my gut reaction is that it's unnecessary and probably counter-productive. Google may just need time to de-index (it can be a slow process).
I suspect that Google passes some link-juice through blocked parameters and treats it more like a canonical, but it may be situational and I haven't seen good data on that. So many things in Google Webmaster Tools end up being a bit of a black box. Typically, I view it as a last resort.
-
RE: Authorship's Back. Could a custom URL be why?
Interesting - thanks. It's a bit hard to pin down, because Google has been changing the "volume" on authorship mark-up a lot these past two months, and that means both up and down on any given day. Authorship also seems to be page-based and probably query-based, which means any given site could have the mark-up or not depending on the page and/or query in play. It's a real-time evaluation on Google's part.
-
RE: Nofollow on site-wide banner links
Generally agree with the comments - links between two of your own sites shouldn't be a huge problem. Google could devalue them a bit, if they see the connection, but devalued isn't penalized. As Philipp said, if you also put contextual links to the webshop in the blog, it could be that the banners keep those links from counting - Google tends to disregard the 2nd, 3rd, etc. link to Page B from any given Page A. If the contextual links are specific, deep pages, though, you should still get value.
-
RE: Pagination & SEO
Truthfully, implementing these tags properly is way too complex, and I'm not thrilled with Google for how this solution has been structured. Stuart's absolutely right, though - rel=prev/next aren't being used properly on the current site. If you're mid-redesign, then it's probably best to do it right on the new site, since the current setup isn't a catastrophe.
-
RE: How to noindex lots of content properly: bluntly or progressively?
I wish I could convince people that more DOES NOT EQUAL better when it comes to index size. You'd think Panda would've been the nail in that coffin, but too many webmasters are still operating in 2005.
-
RE: Google+ Brand Page in Knowledge Graph
Hmmm... I'm seeing the Knowledge Panel for your brand for searches on both "tacticalgear.com" and "tactical gear" (which is generally a good sign), but, as you said, no Google+ (just Twitter). This is a bit of a black box, unfortunately, and a couple of networks disappeared a week or so ago. Many others seem to shuffle.
Your structured data looks fine, and your G+ page is successfully linked to your site. So, you've got the right signals in both directions. Your "sameAs" data even checks out in Google's own structured data testing tool.
Unfortunately, this may be a weird query-time phenomenon that doesn't have much to do with what you have or haven't done. Everything looks by-the-book to me. As Christopher said, you could try to put up a couple of text posts (and not all videos), but at this point it's probably going to come down to some trial and error.
Sorry, I know that's not really an answer, but I suspect you're in that frustrating territory where you've done everything you're supposed to and Google is either midway through changing the rules or making some real-time decision that we can't see.
-
RE: New CMS system - 100,000 old urls - use robots.txt to block?
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (and it's a very valid signal for SEO), but if you create 100,000 all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems from Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
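A minimal sketch of the block, assuming the old pages live under a hypothetical /old-cms/ folder:

User-agent: *
Disallow: /old-cms/

With that in place, you could then request removal of the whole /old-cms/ folder in Google Webmaster Tools.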
-
RE: Sitelinks (breadcrumbs) in SERPs
Just to clarify - you can use rel-canonical cross-domain, but its use is a bit more restrictive and Google may ignore the tag in some cases. I haven't seen this particular issue before, but it does seem like Google is selectively applying your canonical tag.
Did the "hreflang" tags not help? I see you have them in place (I think we may have even discussed this on another question). That would be the more appropriate choice here, but again, Google's application isn't always consistent.
They deprecated the syndication-source tag, so short of 301-redirecting, the canonical tag is about your only other tool. Personally, I'd probably let Google stick with just the hreflang tags for a bit and drop the canonical, giving it a few weeks to see what happens, but it does depend on your goal. Google can be really stubborn about same-language content in nearby countries. We see it a lot with England/Ireland and Holland/Belgium.
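For reference, a minimal sketch of hreflang for the England/Ireland case (URLs are hypothetical) - each page would reference both versions:

<link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk/page/" />
<link rel="alternate" hreflang="en-ie" href="http://www.example.ie/page/" />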
-
RE: Pagination question
Rel=prev/next is still pretty new, so we don't have a lot of data, but it seems to work like a canonical tag. It should pass link-juice up the chain. That said, it's pretty rare for "page X" of search results (where X > 1) to have inbound links or much in the way of search value. I think cleaning up pagination can help a lot, if it's a big chunk of your search index.
-
RE: Contacted by an SEO company..
I'll just add that, if by "legit", you mean the company will do what they say they'll do, then it's possible. The problem is that:
(1) They automatically get more from the exchange than the customer does. You're giving them links before they do anything, so it's not free.
(2) They take no risk and you take all the risk. You're more likely to be penalized by linking out to this link-farm than they are, in some ways (fair or not).
So, they may do what they outline, but it's not legit in my sense of the word and it's high-risk, low-reward SEO.
That's best-case. Worst case, these folders contain malware or dangerous payloads that could steal data and destroy sites.
-
RE: New CMS system - 100,000 old urls - use robots.txt to block?
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.