Best strategy to handle over 100,000 404 errors.
-
I was recently given a site that has over one hundred thousand 404 errors listed in Google Webmaster Tools.
It is really odd because, according to Webmaster Tools, the pages linking to these 404 pages are also pages that no longer exist (they are 404s themselves).
These errors are the result of a site migration.
I would appreciate any input on how one might go about auditing and repairing this many 404 errors.
Thank you.
-
This is a pretty thorough outline of what you need to do: http://moz.com/blog/web-site-migration-guide-tips-for-seos
My steps are usually:
- Identify pages that get significant organic traffic by pulling the Organic Traffic report in Google Analytics for the past year or so.
- Identify pages that have a significant number of links (or, have links from high traffic sources) in Open Site Explorer.
- Map where that content should live now, and 301 redirect the old URLs to the new pages (see the sketch after this list).
- Completely remove all old pages from the index by 404ing them and making sure that no links on new pages point to old pages.
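For the redirect-mapping step, here is a minimal sketch of how you could turn a hand-built spreadsheet of old-to-new URLs into server rules rather than writing them by hand. The redirect_map.csv file and its columns are assumptions for illustration, and the output is plain Apache Redirect 301 directives, so adapt it to whatever your server actually uses:

```python
import csv

# Minimal sketch (assumed inputs): reads redirect_map.csv with columns
# old_path,new_url and writes one Apache "Redirect 301" rule per row.
with open("redirect_map.csv", newline="") as src, open("redirects.conf", "w") as out:
    for row in csv.DictReader(src):
        old_path = row["old_path"].strip()   # e.g. /mens/some-old-page
        new_url = row["new_url"].strip()     # e.g. https://www.example.com/mens/new-page
        if old_path and new_url:
            out.write(f"Redirect 301 {old_path} {new_url}\n")
```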
Sounds quick and simple, but this definitely takes time. Good luck!
-
Kristina - thanks for the feedback.
By any chance, do you have a site migration guide that you recommend?
-
There really isn't a problem with having 100,000 404 "errors." Google's telling you that it thinks 100,000 pages exist, but when it tries to find them, it's getting a 404 code. That's fine: 404s tell Google that a page doesn't exist and to remove the page from Google's index. That's what we want.
The real problem is with your site migration, as FCBM pointed out. If you properly 301 redirect old pages to new ones, Google will be redirected to the new page instead of just hitting a 404. If you fix the problems with the site migration (rather than focusing on Google too much), the 404 errors will naturally subside.
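If you want to confirm what the old URLs are actually returning, a rough check like the sketch below will do it. It assumes Python with the requests library and an old_urls.txt file (one old URL per line, e.g. exported from the Crawl Errors report); both names are placeholders:

```python
import requests

# Rough check: print the status code (and redirect target, if any) that
# each old URL returns, without following redirects, so 301s and 404s
# are easy to tell apart. Assumes old_urls.txt holds one URL per line.
with open("old_urls.txt") as f:
    for url in (line.strip() for line in f):
        if not url:
            continue
        try:
            resp = requests.head(url, allow_redirects=False, timeout=10)
            print(f"{resp.status_code}\t{url}\t{resp.headers.get('Location', '')}")
        except requests.RequestException as exc:
            print(f"ERR\t{url}\t{exc}")
```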
The other option is to just take the hit from the migration, and Google will eventually remove all of these pages from its index and stop reporting on them, as long as there aren't live links pointing to the removed pages.
Good luck!
-
It is a problem with the site migration.
Nevertheless, I have a site right now with over 100,000 404 errors.
I'm looking for a game plan on how to deal with this many 404 errors in a time-effective way.
Any ideas on tools or shortcuts? Has anyone else had to deal with a similar issue?
-
Here's one thought to start the quest: identify whether the migration was done correctly.
For example, if the old site had example.com/mens, does the 301 point to newsite.com/mens? If not, you might be seeing tons of issues from a badly planned migration.
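To run that check across the whole list rather than URL by URL, something like this rough sketch (same assumptions as above: Python with requests and an old_urls.txt list; newsite.com stands in for the real new domain) flags old URLs that don't 301 or that redirect to a different path than they had before:

```python
from urllib.parse import urlparse
import requests

NEW_DOMAIN = "https://newsite.com"  # assumption: swap in the real new domain

# Flags old URLs that either don't 301 at all, or whose redirect target
# doesn't keep the same path on the new domain (a common sign of a
# sloppily planned migration, e.g. everything pointing at the homepage).
with open("old_urls.txt") as f:
    for url in (line.strip() for line in f):
        if not url:
            continue
        try:
            resp = requests.head(url, allow_redirects=False, timeout=10)
        except requests.RequestException as exc:
            print(f"ERROR: {url} ({exc})")
            continue
        location = resp.headers.get("Location", "")
        if resp.status_code not in (301, 308):
            print(f"NOT REDIRECTED ({resp.status_code}): {url}")
        elif not location.startswith(NEW_DOMAIN) or urlparse(location).path != urlparse(url).path:
            print(f"SUSPECT TARGET: {url} -> {location}")
```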
-
The WMT suggestion helps. Thank you.
The main concern is really timing. Are there any effective ways of going through thousands of 404 pages and finding valuable redirects?
-
404s are "not found" responses, which are fine if the pages really are gone and there isn't a different URL to point the original page to. One big issue could be that the old pages weren't 301'd during the migration, which would result in tons of 404s.
Go through the 404s and see whether they are real issues or just relics of old data. Then you can mark them as fixed in WMT.
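One time-effective way to triage thousands of them is to cross-reference the 404 list with link data before looking at anything by hand. Here's a minimal sketch, assuming a Webmaster Tools 404 export in crawl_errors.csv (with a URL column) and an Open Site Explorer backlink export in links.csv (with Target URL and Linking Root Domains columns); all file and column names are assumptions:

```python
import csv

# Minimal triage sketch: rank 404 URLs by how many linking root domains
# point at them, so redirect effort goes to URLs that actually have
# equity. File and column names are assumptions, not real export formats.
link_counts = {}
with open("links.csv", newline="") as f:          # assumed Open Site Explorer export
    for row in csv.DictReader(f):
        target = row["Target URL"].strip()
        link_counts[target] = link_counts.get(target, 0) + int(row["Linking Root Domains"] or 0)

with open("crawl_errors.csv", newline="") as f:   # assumed Webmaster Tools 404 export
    errors = [row["URL"].strip() for row in csv.DictReader(f)]

# Most-linked URLs first; anything at zero can usually be left to 404
# and drop out of the index on its own.
for url in sorted(errors, key=lambda u: link_counts.get(u, 0), reverse=True):
    print(link_counts.get(url, 0), url)
```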
Hope that helps