Mass 404 Checker?
-
Hi all,
I'm currently looking after a collection of old newspaper sites that have had various developments during their time. The problem is there are so many 404 pages all over the place and the sites are bleeding link juice everywhere so I'm looking for a tool where I can check a lot of URLs at once.
For example from an OSE report I have done a random sampling of the target URLs and some of them 404 (eek!) but there are too many to check manually to know which ones are still live and which ones have 404'd or are redirecting. Is there a tool anyone uses for this or a way one of the SEOMoz tools can do this?
Also I've asked a few people personally how to check this and they've suggested Xenu, Xenu won't work as it only checks current site navigation.
Thanks in advance!
-
Hi,
we are seo agency at turkey, our name clicksus. We can deadlinkchecker.com and it is very easy & good.
-
Glad I was able to help!
It would be great if you could mark the answers you found helpful, and mark the question as answered if you feel you got the information you needed. That will make it even more useful for other users.
Paul
-
Wow nice one mate did not know that in the Top Pages tab that is perfect! I'll remember to click around more often now.
I found this tool on my adventures which was exactly what I was after: http://www.tomanthony.co.uk/tools/bulk-http-header-compare/
Also cheers for your walkthrough, having problems with the site still bleeding 404 pages, first thing first however is fixing these pages getting high quality links to them
Cheers again!
-
Sorry, one additional - since you mentioned using Open Site Explorer...
Go to the Top Pages tab in OSE and filter the results to include only incoming links. One of the columns in that report is HTTP Status. It will tell you if the linked page's status is 404. Again, just download the full CSV, sort the resulting spreadsheet by the Status column and you'll be able to generate a list of URLs that no longer have pages associated with them to start fixing.
Paul
-
Ollie, if I'm understanding your question correctly, the easiest place for you to start is with Google Webmaster Tools. You're looking to discover URLs of pages that used to exist on the sites, but no longer do, yes?
If you click on the Health link in left sidebar, then click Crawl Errors, you get a page showing different kinds of errors the Google crawler has detected. Click on the Not Found error box and you'll get a complete list of all the pages Google is aware of that can no longer be found on your site (i.e. 404s).
You can then download the whole list as a CSV and start cleaning them up from there.
This list will basically include pages that have been linked to at one time or another from other sites on the web, so while not exhaustive, it will show the ones that are most likely to still be getting traffic. For really high-value incoming links, you might even want to contact the linking site and see if you can get them to relink to the correct new page.
Alternatively, if you can access the sites' server logs, they will record all the incoming 404s with their associated URLs as well and you can get a dump from the log files to begin creating your work list. I just find it's usually easier to get access to Webmaster Tools than to get at a clients server log files.
Is that what you're looking for?
Paul
-
To be honest, I don't know anyone who has bad things to say about Screaming Frog - aside from the cost, but as you said, really worth it.
However, it is free for up to 500 page crawl limit, so perhaps give it a go?
Andy
-
Cheers Andy & Kyle
Problem with this tool as it works similar to Xenu which is great for making sure your current navigation isn't causing problems.
My problem is there are over 15k links pointing to all sorts of articles and I have no idea what's live and what's not. Running the site through that tool won't report the pages that aren't linked in the navigation anymore but are still being linked to.
Example is manually checking some of the links I've found that the site has quite a few links from the BBC going to 404 pages. Running the site through Xenu or Screamy Frog doesn't find these pages.
Ideally I'm after a tool I can slap in a load of URLs and it'll do a simple HTTP header check on them. Only tools I can find do 1 or 10 at a time which would take quite a while trying to do 15k!
-
Agree with Screaming Frog. It's more comprehensive than **Xenu's Link Sleuth. **
It costs £99 for a year but totally worth it.
I had a few issues with Xenu taking too long to compile a report or simply crashing.
-
Xenu Liunk Seuth - its free and will go through internal links, external or both, it will also show you where the 404 page is being linked from.
Also can report 302s.
-
Screaming Frog Spider does a pretty good job...
As simple as enter the URL and leave it to report back when completed.
Andy
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Links On Out Of Stock Product Pages Causing 404
Hi Moz Community! We're doing an audit of our e-commerce site at the moment and have noticed a lot of 404 errors coming from out of stock/discontinued product pages that we've kept 200 in the past. We kept these and added links on them for categories or products that are similar to the discontinued items but many other links of the page like images, blog posts, and even breadcrumbs have broken or are no longer valid causing lots of additional 404s. If the product has been discontinued for a long time and gets no traffic and has no link equity would you recommend adding a noindex robots tag on these pages so we're not wasting time fixing all the broken links on these? Any thoughts?Thanks
Technical SEO | | znotes0 -
Help: Blog post translations resulting in 404 Not Found?
A client set up a website that has multilingual functionality (WPML) and the back end is a bit of a mess. The site has around 6 translated versions of the 30 or so existing English blog posts in French, Italian and Spanish - all with their own URLs. The problem is that on the remaining 24 English blog posts, the language changer in the header is still there - even though the majority of posts have not been translated - so when you go to change the language to French, it adds **?lang=fr **onto the existing english URL, and is a page not found (4xx client error). I can't redirect anything because the page does not exist. Is there a way to stop this from happening? I have noticed it's also creating italian/french/spanish translation of the english Categories too. Thanks in advance.
Technical SEO | | skehoe0 -
GWT Soft 404 count is climbing. Important to fix?
In GWT I am seeing my mobile site's soft 404 count slowly rise from 5 two weeks ago to over 100 as of today. If I do nothing I expect it will continue to rise into the thousands. This is due to there being followed links on external sites to thousands of discontinued products we used to offer. The landing page for these links simply says the product is no longer available and gives links to related areas of our site. I know I can address this by returning a 404 for these pages, but doing so will cause these pages to be de-indexed. Since these pages still have utility in redirecting people to related, available products, I want these pages to stay in the index and so I don't want to return a 404. Another way of addressing this is to add more useful content to these pages so that Google no longer classifies them as soft 404. I have images and written content for these pages that I'm not showing right now, but I could show if necessary. But before investing any time in addressing these soft 404s, does anyone know the real consequences of not addressing them? Right now I'm getting 275k pages indexed and historically crawl budget has not been an issue on my site, nor have I seen any anomalous crawl activity since the climb in soft 404s began. Unchecked, the soft 404s could climb to 20,000ish. I'm wondering if I should start expecting effects on the crawl, and also if domain authority takes a hit when there are that many soft 404s being reported. Any information is appreciated.
Technical SEO | | merch_zzounds0 -
301 or 404 old Event pages
I have a site that lists events and then removes them from the site once the date and event has passed. Is it best to let the old event page 404 or 301 back up to a subfolder that lists the current events?
Technical SEO | | Marketing_Today0 -
Spam pages / content created due to hack. 404 cleanup.
A hosting company's server was hacked and one of our customer's sites was injected with 7,000+ pages of fake, bogus, promotional content. Server was patched and spammy content removed from the server. Reviewing Google Webmaster's Tools we have all the hacked pages showing up as 404's and have a severe drop in impressions, rank and traffic. GWT also has 'Some manual actions apply to specific pages, sections, or links'... What do you recommend for: Cleaning up 404's to spammy pages? (I am not sure redirect to home page is a right thing to do - is it?) Cleaning up links that were created off site to the spam pages Getting rank bank // what would you do in addition to the above?
Technical SEO | | GreenStone0 -
Automatically write Mass 301 redirects for csv
Hi Guys Does anyone know if there is away to write say 30 x 301 redirects in one go? I have a list from a client with old links and new links and I want to do it all at once. Any suggestions would be appreciate?
Technical SEO | | Cocoonfxmedia0 -
301 redirect all 404 pages
Hi I would like to have a second opinion on this. I am working on an ecommerce website that they 301 redirect all 404 pages (including the URLs entered incorrectly) to the “All categories page”. Will this have any negative SEO impact?
Technical SEO | | iThinkMedia0 -
Seek help correcting large number of 404 errors generated, 95% traffic halt
Hi, The following GWT screen tells a bit of the story: site: http://bit.ly/mrgdD0 http://www.diigo.com/item/image/1dbpl/wrbp On about Feb 8 I decided to fix a large number of 'duplicate title' warnings being reported in GWT "HTML Suggestions" -- these were for URLs which differed only in parameter case, and which had Canonical tags, but were still reported as dups in GWT. My traffic had been steady at about 1000 clicks/day. At midnight on 2/10, google traffic completely halted, down to 11 clicks/day. I submitted a recon request and was told 'no manual penalty' Also, the 'sitemap' indexes in GWT showed 'pending' for 24x7 starting then. By about the 18th, the 'duplicate titles' count dropped to about 600 or so... the next day traffic hopped right back to about 800 clicks/day - for a week - then stopped again, down to 10/day, a week later, on the 26th. I then noticed that GWT was reporting 20K page-not found errors - this has now grown to 35K such errors! I realized that bogus internal links were being generated as I failed to disable the PHP warning messages.... so I disabled PHP warnings and fixed what I thought was the source of the errors. However, the not-found count continues to climb -- and I don't know where these bad internal links are coming from, because the GWT report lists these link sources as 'unavailable'. I'v been through a similar problem last year and it took months (4) for google to digest all the bogus pages ad recover. If I have to wait that long again I will lose much $$. Assuming that the large number of 404 internal errors is the reason for the sudden shutoff... How can I a) verify the source of these internal links, given that google says the source pages are 'unavailable'.. Most critically, how can I do a 'RESET" and have google re-spider my site -- or block the signature of these URLs in order to get rid of these errors ASAP?? thanks
Technical SEO | | mantucket0