Mass 404 Checker?
-
Hi all,
I'm currently looking after a collection of old newspaper sites that have been through various redevelopments over the years. The problem is there are 404 pages all over the place and the sites are bleeding link juice everywhere, so I'm looking for a tool that lets me check a lot of URLs at once.
For example, from an OSE report I've done a random sampling of the target URLs and some of them 404 (eek!), but there are too many to check manually, so I can't tell which ones are still live and which have 404'd or are redirecting. Is there a tool anyone uses for this, or a way one of the SEOmoz tools can do it?
I've also asked a few people personally how to check this and they suggested Xenu, but Xenu won't work here as it only checks the current site navigation.
Thanks in advance!
-
Hi,
We're an SEO agency in Turkey called Clicksus. We use deadlinkchecker.com and it's very easy and good.
-
Glad I was able to help!
It would be great if you could mark the answers you found helpful, and mark the question as answered if you feel you got the information you needed. That will make it even more useful for other users.
Paul
-
Wow, nice one mate, I didn't know about that in the Top Pages tab. That's perfect! I'll remember to click around more often now.
I found this tool on my adventures which was exactly what I was after: http://www.tomanthony.co.uk/tools/bulk-http-header-compare/
Also, cheers for your walkthrough. The site is still bleeding 404 pages, but first things first: fixing the pages that have high-quality links pointing at them.
Cheers again!
-
Sorry, one additional - since you mentioned using Open Site Explorer...
Go to the Top Pages tab in OSE and filter the results to include only incoming links. One of the columns in that report is HTTP Status, which will tell you if the linked page's status is 404. Again, just download the full CSV, sort the resulting spreadsheet by the Status column, and you'll be able to generate a list of URLs that no longer have pages associated with them and start fixing.
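If the export is too big to sort comfortably in a spreadsheet, the same filter can be scripted. A minimal sketch, assuming the CSV has "URL" and "HTTP Status" columns (the column names in your export may differ):

```python
import csv

def broken_pages(csv_path, url_col="URL", status_col="HTTP Status"):
    """Return the URLs whose recorded crawl status was 404."""
    with open(csv_path, newline="") as f:
        return [row[url_col]
                for row in csv.DictReader(f)
                if row.get(status_col, "").strip() == "404"]
```

That gives you the fix list in one step instead of sorting and eyeballing thousands of rows.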
Paul
-
Ollie, if I'm understanding your question correctly, the easiest place for you to start is with Google Webmaster Tools. You're looking to discover URLs of pages that used to exist on the sites, but no longer do, yes?
If you click on the Health link in the left sidebar, then click Crawl Errors, you get a page showing the different kinds of errors the Google crawler has detected. Click on the Not Found error box and you'll get a complete list of all the pages Google is aware of that can no longer be found on your site (i.e. 404s).
You can then download the whole list as a CSV and start cleaning them up from there.
This list will basically include pages that have been linked to at one time or another from other sites on the web, so while not exhaustive, it will show the ones that are most likely to still be getting traffic. For really high-value incoming links, you might even want to contact the linking site and see if you can get them to relink to the correct new page.
Alternatively, if you can access the sites' server logs, they record all the incoming 404s with their associated URLs as well, and you can get a dump from the log files to begin creating your work list. I just find it's usually easier to get access to Webmaster Tools than to get at a client's server log files.
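If you do get at the logs, pulling the 404'd URLs out of a combined-format Apache/Nginx access log is a one-liner. A rough sketch with an inline sample log standing in for the real file (your log path and format may differ):

```shell
# Two sample lines standing in for the real access log
cat > access.log <<'EOF'
1.2.3.4 - - [01/Jan/2013:00:00:00 +0000] "GET /old-article HTTP/1.1" 404 512 "-" "-"
1.2.3.4 - - [01/Jan/2013:00:00:01 +0000] "GET /live HTTP/1.1" 200 1024 "-" "-"
EOF
# Unique URLs that returned 404, busiest first
# ($9 is the status code and $7 the request path in combined log format)
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn
```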
Is that what you're looking for?
Paul
-
To be honest, I don't know anyone who has bad things to say about Screaming Frog - aside from the cost, but as you said, it's really worth it.
However, it's free for crawls of up to 500 URLs, so perhaps give it a go?
Andy
-
Cheers Andy & Kyle
The problem with this tool is it works similarly to Xenu, which is great for making sure your current navigation isn't causing problems.
My problem is there are over 15k links pointing to all sorts of articles and I have no idea what's live and what's not. Crawling the site with that tool won't report pages that are no longer linked in the navigation but are still being linked to from elsewhere.
For example, manually checking some of the links, I've found the site has quite a few links from the BBC going to 404 pages. Running the site through Xenu or Screaming Frog doesn't find these pages.
Ideally I'm after a tool where I can paste in a load of URLs and it'll do a simple HTTP header check on each. The only tools I can find do 1 or 10 at a time, which would take quite a while with 15k!
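For anyone who ends up scripting that kind of bulk header check themselves, a minimal sketch in Python: a thread pool firing HEAD requests and collecting the status codes (the timeout and worker count here are arbitrary):

```python
import concurrent.futures
import urllib.error
import urllib.request

def check_url(url, timeout=10):
    """Return (url, status) for one HEAD request; status is None if unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return url, resp.status
    except urllib.error.HTTPError as err:
        # Error statuses like 404 arrive here rather than as a response
        return url, err.code
    except urllib.error.URLError:
        return url, None

def check_urls(urls, workers=20):
    """Check many URLs concurrently; returns {url: status}."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_url, urls))
```

Feed it the 15k URLs from the link report and anything that comes back as 404 (or None) goes straight onto the fix list.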
-
Agree with Screaming Frog. It's more comprehensive than Xenu's Link Sleuth.
It costs £99 for a year but is totally worth it.
I had a few issues with Xenu taking too long to compile a report or simply crashing.
-
Xenu Link Sleuth - it's free and will check internal links, external links, or both, and it will also show you where each 404 page is being linked from.
It can also report 302s.
-
Screaming Frog Spider does a pretty good job...
It's as simple as entering the URL and leaving it to report back when complete.
Andy