Best strategy to handle over 100,000 404 errors.
-
I have recently been given a site that has over one hundred thousand 404 errors listed in Google Webmaster Tools.
It is really odd because, according to Google Webmaster Tools, the pages that link to these 404 pages are also pages that no longer exist (they are 404 pages themselves).
These errors are the result of a site migration.
I'd appreciate any input on how one might go about auditing and repairing a large number of 404 errors.
Thank you.
-
This is a pretty thorough outline of what you need to do: http://moz.com/blog/web-site-migration-guide-tips-for-seos
My steps are usually:
- Identify pages that get significant organic traffic by pulling the Organic Traffic report in Google Analytics for the past year or so.
- Identify pages that have a significant number of links (or have links from high-traffic sources) in Open Site Explorer.
- Map where that content should be now, and 301 redirect to new pages.
- Completely remove all old pages from the index by 404ing them and making sure that no links on new pages point to old pages.
Sounds quick and simple, but this definitely takes time. Good luck!
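For the first two steps, here is a rough sketch of how the triage could be scripted once the data has been exported. It assumes you already have the list of 404 URLs, plus organic visits and inbound link counts keyed by URL path; the function name, the dictionaries, and the thresholds (100 visits, 5 links) are all made up for illustration, so tune them to your own site.

```python
from urllib.parse import urlsplit


def prioritize_404s(not_found_urls, organic_visits, inbound_links,
                    min_visits=100, min_links=5):
    """Split 404 URLs into ones worth a 301 and ones to leave as 404.

    organic_visits and inbound_links are hypothetical dicts keyed by URL
    path, e.g. built from analytics and link-report exports.
    The thresholds are arbitrary assumptions, not recommended values.
    """
    redirect, leave = [], []
    for url in not_found_urls:
        path = urlsplit(url).path or "/"
        visits = organic_visits.get(path, 0)
        links = inbound_links.get(path, 0)
        if visits >= min_visits or links >= min_links:
            redirect.append((path, visits, links))
        else:
            leave.append(path)
    # Put the highest-value pages first so they get mapped first.
    redirect.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return redirect, leave


if __name__ == "__main__":
    to_redirect, to_leave = prioritize_404s(
        ["http://example.com/mens", "http://example.com/old-promo"],
        organic_visits={"/mens": 1200},
        inbound_links={"/old-promo": 1},
    )
    print("map and 301:", to_redirect)
    print("leave as 404:", to_leave)
```

The mapping of each high-value path to its new home still has to be done by hand or by rule; the script only narrows down which of the 100,000 URLs deserve that attention.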
-
Kristina - thanks for the feedback.
By any chance, would you have a site migration guideline that you recommend?
-
There really isn't a problem with having 100,000 404 "errors." Google's telling you that it thinks 100,000 pages exist, but when it tries to find them, it's getting a 404 code. That's fine: 404s tell Google that a page doesn't exist and to remove the page from Google's index. That's what we want.
The real problem is with your site migration, as FCBM pointed out. If you properly 301 redirect old pages to new ones, Google will be redirected to the new page instead of hitting a 404. If you fix the problems with the site migration (without focusing on Google too much), the 404 errors will naturally subside.
The other option is to just take the hit from the migration, and Google will eventually remove all of these pages from its index and stop reporting on them, as long as there aren't live links pointing to the removed pages.
Good luck!
-
It is a problem with the site migration.
Nevertheless, I have a site right now with over 100,000 404 errors.
I'm looking for a game plan for dealing with this many 404 errors in a time-effective way.
Any ideas on tools or shortcuts? Has anyone else had to deal with a similar issue?
-
Here's one thought to start the quest: check whether the migration was done correctly.
E.g., if you had a page at example.com/mens, did it 301 to newsite.com/mens? If not, you might be having tons of issues from a badly planned migration.
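A minimal sketch of that check, assuming the migration was meant to keep the same paths on the new host. The function names are hypothetical, and the status code and Location header are assumed to come from whatever crawler or HTTP client you already use (fetching them is not shown here).

```python
from urllib.parse import urlsplit, urlunsplit


def expected_target(old_url, new_host):
    """Return the URL the old page should 301 to if paths were preserved."""
    parts = urlsplit(old_url)
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, ""))


def check_redirect(old_url, new_host, status, location):
    """Classify one crawl result for an old URL.

    status and location are the HTTP status code and Location header
    observed when requesting old_url (collected elsewhere).
    """
    if status != 301:
        return "not a 301"
    wanted = expected_target(old_url, new_host).rstrip("/")
    if (location or "").rstrip("/") == wanted:
        return "ok"
    return "301 to unexpected URL"


if __name__ == "__main__":
    print(check_redirect("http://example.com/mens", "newsite.com",
                         301, "http://newsite.com/mens"))
    print(check_redirect("http://example.com/mens", "newsite.com",
                         404, None))
```

If the new site changed its URL structure, the path-preserving assumption no longer holds and `expected_target` would need a lookup table instead.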
-
The WMT notion helps. Thank you.
The main concern is really timing. Are there any effective ways of going through thousands of 404 pages and finding valuable redirects?
-
404s are "not found" responses, which are fine if the pages really are gone and there isn't a different URL to point the original page to. One big issue could be that the old pages weren't 301'd during the migration, which would result in tons of 404s.
Go through the 404s and see whether they are real issues or just relics from old data. Then you can mark them as fixed in WMT.
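One way to make that review manageable is to group the 404 URLs by their top-level section, so you can judge whole groups at once instead of going URL by URL. This is only a sketch: it assumes you have exported the 404 list from WMT as plain URLs, and the function name is made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def group_by_section(not_found_urls):
    """Count 404 URLs by first path segment, largest group first."""
    counts = Counter()
    for url in not_found_urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts.most_common()


if __name__ == "__main__":
    urls = [
        "http://example.com/mens/shirts",
        "http://example.com/mens/shoes",
        "http://example.com/womens/hats",
    ]
    for section, count in group_by_section(urls):
        print(section, count)
```

A section with thousands of 404s usually points to one missing redirect rule, while small scattered groups are more likely to be old relics.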
Hope that helps