How to find all 404 dead links - Google Webmaster Tools only allows 1000 to be downloaded
-
Hi Guys
I have a question. I am currently working on a website that was hit by a spam attack.
The website was hacked and thousands of adult censored pages were created on the WordPress site.
The hosting company cleared all of the dubious files, but this has left thousands of dead 404 pages.
We want to fix the dead pages, but Google Webmaster Tools only shows 1000 of them and only lets you download 1000.
There are a lot more than 1000. Does anyone know of any good tools that allow you to identify all 404 pages?
Thanks, Duncan
-
The Moz crawl report will also show 404s. I sometimes find that different spiders may find different things. Between the Search Console report, Screaming Frog (great investment) and Moz, you should have a nice collection of things to fix.
-
I must second Dirk's suggestion of Screaming Frog. It's a great tool, I use it daily, and a license is well worth the cost. Bear in mind, though, that a spider crawl of the site will only point out 404s that are linked from an existing page, so if the hosting company cleaned up the spam pages, not all of these 404s will surface.
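To illustrate why a crawl can miss dead pages, here is a minimal sketch of a breadth-first crawl over a toy, in-memory link graph. The site layout, the URLs and the `crawl` function are all hypothetical and only stand in for what a real spider does; this is not how Screaming Frog is implemented.

```python
from collections import deque

def crawl(start, links, status):
    """Breadth-first crawl of an in-memory link graph; returns the 404 URLs reached."""
    seen = {start}
    queue = deque([start])
    found_404 = []
    while queue:
        url = queue.popleft()
        if status.get(url, 404) == 404:
            found_404.append(url)
            continue  # a dead page has no links to follow
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return found_404

# Hypothetical site: /spam-1 is still linked from /blog, /spam-2 is orphaned.
links = {"/": ["/blog"], "/blog": ["/spam-1"]}
status = {"/": 200, "/blog": 200, "/spam-1": 404, "/spam-2": 404}

linked_404s = crawl("/", links, status)
orphaned_404s = sorted(u for u, s in status.items() if s == 404 and u not in linked_404s)
print(linked_404s)    # ['/spam-1']
print(orphaned_404s)  # ['/spam-2']
```

The orphaned URL never shows up in the crawl because nothing links to it, which is why the exported list from GWT is still needed as a starting point.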
One approach I would suggest is to run the current 1000 404s from GWT through Screaming Frog as a manually added list (do it in 2 batches if you have the free version), start a spreadsheet of the resulting 404s, and start working through that list. Once you have dealt with those 404s, mark them as fixed in GWT and set a reminder to check back in a few days. After a few days, export the new list of 1000 404s and run these through Screaming Frog, adding the resulting list to your spreadsheet. Keep doing this until you get the 404 errors in GWT down to a manageable level.
I hope that helps, good luck.
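If you would rather script the spreadsheet bookkeeping, the batch-by-batch approach above could be sketched roughly like this. It assumes each export is a CSV with a header column named `URL` (an assumption; check the header in your real export), and the helper names `read_urls`, `merge_batch` and `fetch_status` are made up for illustration. `fetch_status` is a plain standard-library stand-in for re-checking a URL, not a replacement for Screaming Frog.

```python
import csv
import io
import urllib.error
import urllib.request

def read_urls(csv_text, column="URL"):
    """Return the URLs from one exported batch (CSV text with a header row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row.get(column)]

def merge_batch(master, batch_urls):
    """Add unseen URLs to the master dict (url -> note); return the new ones."""
    new = []
    for url in batch_urls:
        if url not in master:
            master[url] = "todo"
            new.append(url)
    return new

def fetch_status(url, timeout=10):
    """Return the HTTP status for a URL via a HEAD request (redirects are followed)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError

# Two hypothetical exports taken a few days apart
batch1 = "URL\nhttp://example.com/a\nhttp://example.com/b\n"
batch2 = "URL\nhttp://example.com/b\nhttp://example.com/c\n"

master = {}
new1 = merge_batch(master, read_urls(batch1))
new2 = merge_batch(master, read_urls(batch2))
print(new2)         # ['http://example.com/c']
print(len(master))  # 3
```

Each new export only adds the URLs you have not seen before, so the master list keeps growing until the exports stop producing anything new.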
-
Probably the easiest solution is to buy a Screaming Frog licence and crawl your site locally. The tool can do a lot of useful stuff to audit sites and will show you not only the full list of 4xx errors but also the pages that link to them.
There is also a free version, but it only allows you to crawl 500 pages. In your case that is probably not sufficient, but it would allow you to see how it works.
Hope this helps,
Dirk