Mass 404 Checker?
-
Hi all,
I'm currently looking after a collection of old newspaper sites that have been through various redevelopments over the years. The problem is there are 404 pages all over the place and the sites are bleeding link juice everywhere, so I'm looking for a tool that can check a lot of URLs at once.
For example, from an OSE report I've done a random sampling of the target URLs, and some of them 404 (eek!), but there are too many to check manually to know which ones are still live and which have 404'd or are redirecting. Is there a tool anyone uses for this, or a way one of the SEOmoz tools can do it?
Also, I've asked a few people personally how to check this and they've suggested Xenu, but Xenu won't work as it only checks the current site navigation.
Thanks in advance!
-
Hi,
We're an SEO agency in Turkey called Clicksus. We use deadlinkchecker.com and it's very easy and good.
-
Glad I was able to help!
It would be great if you could mark the answers you found helpful, and mark the question as answered if you feel you got the information you needed. That will make it even more useful for other users.
Paul
-
Wow, nice one mate - I didn't know about that in the Top Pages tab, that's perfect! I'll remember to click around more often now.
I found this tool on my adventures which was exactly what I was after: http://www.tomanthony.co.uk/tools/bulk-http-header-compare/
Also, cheers for your walkthrough. The site is still bleeding 404 pages, but first things first: fixing the pages that have high-quality links pointing to them.
Cheers again!
-
Sorry, one additional - since you mentioned using Open Site Explorer...
Go to the Top Pages tab in OSE and filter the results to include only incoming links. One of the columns in that report is HTTP Status, which will tell you if the linked page's status is 404. Again, just download the full CSV, sort the resulting spreadsheet by the Status column, and you'll be able to generate a list of URLs that no longer have pages associated with them, ready to start fixing.
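As an aside for anyone doing this at scale: once you have the CSV export, a few lines of script can pull out just the 404'd URLs instead of sorting by hand. A minimal sketch in Python - the column names "URL" and "HTTP Status" here are assumptions, so check them against the header row of your actual export first:

```python
import csv
import io

def extract_404s(csv_text, status_col="HTTP Status", url_col="URL"):
    """Return the URLs whose status column reads 404 in an exported CSV.

    The column names are assumptions -- verify them against the real
    export before running this.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[url_col] for row in reader
            if (row.get(status_col) or "").strip().startswith("404")]

# Tiny inline example standing in for a real export file:
sample = "URL,HTTP Status\nhttp://example.com/live,200\nhttp://example.com/gone,404"
print(extract_404s(sample))  # ['http://example.com/gone']
```

For a real export you'd read the file with `csv.DictReader(open("ose-export.csv"))` and write the matches out to your work list.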
Paul
-
Ollie, if I'm understanding your question correctly, the easiest place for you to start is with Google Webmaster Tools. You're looking to discover URLs of pages that used to exist on the sites, but no longer do, yes?
If you click on the Health link in left sidebar, then click Crawl Errors, you get a page showing different kinds of errors the Google crawler has detected. Click on the Not Found error box and you'll get a complete list of all the pages Google is aware of that can no longer be found on your site (i.e. 404s).
You can then download the whole list as a CSV and start cleaning them up from there.
This list will basically include pages that have been linked to at one time or another from other sites on the web, so while not exhaustive, it will show the ones that are most likely to still be getting traffic. For really high-value incoming links, you might even want to contact the linking site and see if you can get them to relink to the correct new page.
Alternatively, if you can access the sites' server logs, they record all incoming 404s with their associated URLs as well, and you can get a dump from the log files to begin building your work list. I just find it's usually easier to get access to Webmaster Tools than to a client's server log files.
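If you do go the server-log route, filtering the dump down to 404s is a short script. A minimal sketch, assuming the logs are roughly in Apache's common/combined format - real log formats vary, so treat the pattern as a starting point:

```python
import re
from collections import Counter

# Matches the request path and status code in a common/combined-format
# access log line. Adjust if your log format differs.
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3})')

def four_oh_fours(log_lines):
    """Count how often each URL in the log returned a 404."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m and m.group(2) == "404":
            hits[m.group(1)] += 1
    return hits

# Two made-up log lines standing in for a real access log:
logs = [
    '1.2.3.4 - - [10/Oct/2012:13:55:36] "GET /old-article HTTP/1.1" 404 512',
    '1.2.3.4 - - [10/Oct/2012:13:55:37] "GET /index.html HTTP/1.1" 200 2326',
]
print(four_oh_fours(logs))  # Counter({'/old-article': 1})
```

Sorting the counter by hit count also tells you which dead URLs are still getting the most traffic, which is a sensible fix order.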
Is that what you're looking for?
Paul
-
To be honest, I don't know anyone who has bad things to say about Screaming Frog - aside from the cost, but as you said, it's really worth it.
However, it's free for crawls of up to 500 pages, so perhaps give it a go?
Andy
-
Cheers Andy & Kyle
The problem with this tool is that it works similarly to Xenu, which is great for making sure your current navigation isn't causing problems.
My problem is there are over 15k links pointing to all sorts of articles and I have no idea what's live and what's not. Running the site through that tool won't report pages that are no longer linked in the navigation but are still being linked to.
For example, manually checking some of the links, I've found that the site has quite a few links from the BBC going to 404 pages. Running the site through Xenu or Screaming Frog doesn't find these pages.
Ideally I'm after a tool where I can paste in a load of URLs and it'll do a simple HTTP header check on each of them. The only tools I can find do 1 or 10 at a time, which would take quite a while with 15k!
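For reference, the bulk header check described here is only a few lines of Python using just the standard library. A minimal sketch - note that urllib follows redirects by default, so a 301 chain ending in a live page reports the final 200; you'd need a non-redirecting opener to see the raw redirect status:

```python
import concurrent.futures
import urllib.error
import urllib.request

def head_status(url, timeout=10):
    """Return the HTTP status code for a single URL via a HEAD request."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code   # 404, 410, 500, ...
    except urllib.error.URLError:
        return None     # DNS failure, timeout, etc.

def bulk_check(urls, workers=20):
    """Check many URLs in parallel; map each URL to its status code."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(head_status, urls)))

# Usage on a 15k-line file of URLs, one per line:
#   results = bulk_check(open("urls.txt").read().split())
#   dead = [u for u, s in results.items() if s == 404]
```

With 20 worker threads, a 15k-URL list is a matter of minutes rather than days of checking 10 at a time.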
-
Agree with Screaming Frog. It's more comprehensive than Xenu's Link Sleuth.
It costs £99 for a year, but it's totally worth it.
I had a few issues with Xenu taking too long to compile a report or simply crashing.
-
Xenu Link Sleuth - it's free and will go through internal links, external links, or both, and it will also show you where the 404 page is being linked from.
It can also report 302s.
-
Screaming Frog Spider does a pretty good job...
It's as simple as entering the URL and leaving it to report back when completed.
Andy