Google Webmaster Tools: fixing 20,000+ crawl errors
-
Hi,
I'm trying to gather all the 404 crawl errors on my website after a recent hacking incident that I've been trying to rectify and clean up. Webmaster Tools states that I have 20,000+ crawl errors, but I can only download a sample of 1,000 of them. Is there any way to get the full list, instead of correcting 1,000 errors, marking them as fixed, and waiting for the next batch of 1,000 errors to be listed in Webmaster Tools?
The current method is quite time-consuming, and I want to take care of all the errors in one shot instead of over the course of a month.
-
You can use Screaming Frog to pinpoint where your 404s are coming from. Here's a great write-up with a few different ways to use SF for this: https://www.screamingfrog.co.uk/broken-link-checker/
Another option is Google Analytics.
- First, navigate to your All Pages report, then set the primary dimension to Page Title.
- Next, go to your site and trigger a 404 and take note of the page title; it should be something like 'Page Not Found'.
- Whatever that page title is on your 404 page, enter it in the inline filter and it'll narrow the report down to just 404 pages.
- Then drill down into that result and see a full list of URLs that are throwing a 404.
- Set the secondary dimension to Previous Page Path to see the page that linked to the broken page.
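Once you've exported that URL list from GA (or Screaming Frog), it's worth re-checking the status codes before marking anything as fixed, since some of those pages may already resolve. Here's a minimal sketch in Python, assuming the exported URLs sit one per line in a file called urls.txt (the filename is a placeholder):

```python
import requests

# Hypothetical input file: one URL per line, exported from GA or Screaming Frog.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

still_broken = []
for url in urls:
    try:
        # HEAD keeps the check lightweight; follow redirects so a
        # 301 -> 200 chain counts as fixed rather than broken.
        resp = requests.head(url, allow_redirects=True, timeout=10)
        if resp.status_code == 404:
            still_broken.append(url)
    except requests.RequestException:
        # Treat connection failures as worth a manual look.
        still_broken.append(url)

print(f"{len(still_broken)} of {len(urls)} URLs still return 404")
for url in still_broken:
    print(url)
```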
Hope that's helpful!
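One more option on the 1,000-row download limit itself: the Webmaster Tools (Search Console) API exposed crawl-error samples through its urlcrawlerrorssamples resource, which let you script the fetch-and-mark-as-fixed cycle instead of clicking through the UI. The sample set was still capped, but automation makes the batch-by-batch grind much faster. A rough sketch, assuming a service account with access to the verified property (the key filename and site URL are placeholders); note that this endpoint was later deprecated:

```python
import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
creds = ServiceAccountCredentials.from_json_keyfile_name("service-account.json", SCOPES)
service = build("webmasters", "v3", http=creds.authorize(httplib2.Http()))

# Pull 404 ("notFound") crawl-error samples for the desktop crawler.
# siteUrl must exactly match the property verified in Webmaster Tools.
response = service.urlcrawlerrorssamples().list(
    siteUrl="https://www.example.com/",
    category="notFound",
    platform="web",
).execute()

for sample in response.get("urlCrawlErrorSample", []):
    print(sample["pageUrl"])
```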
Related Questions
-
Google Detecting Real Page as Soft 404 Error
We migrated our site from HTTP to HTTPS in September 2017, but after the migration I noticed soft 404 errors gradually increasing. Example of a soft 404 page: https://bit.ly/2xBjy4J These soft 404 pages are real pages, but Google still detects them as soft 404s. When I check the Google cache, it shows me a cached copy of the HTTP page. We've tried all possible solutions but are unable to figure out why Google is still indexing the HTTP pages and detecting the HTTPS pages as soft 404 errors. Can someone please suggest a solution or a possible cause for this issue, or has anyone had the same issue in the past?
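One thing worth ruling out in a case like this is an incomplete redirect setup: if the old HTTP URLs don't 301 cleanly to their HTTPS counterparts, Google can hang on to the HTTP version. A minimal Python sketch for spot-checking the first redirect hop; the URLs below are placeholders for your own pages:

```python
import requests

# Hypothetical sample of old HTTP URLs to spot-check.
http_urls = [
    "http://www.example.com/",
    "http://www.example.com/some-page/",
]

for url in http_urls:
    # Don't follow redirects; we want to inspect the first hop itself.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "")
    ok = resp.status_code == 301 and location.startswith("https://")
    print(f"{url} -> {resp.status_code} {location} {'OK' if ok else 'CHECK'}")
```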
Intermediate & Advanced SEO | | bheard0 -
SEO Value of Google+?
Hi Mozers, Does having a Google+ page really impact SEO? Thanks, Yael
Intermediate & Advanced SEO | | yaelslater1 -
Would Google consider this the anchor text?
Hi guys, For a button-based link, can you define the anchor text Google will use? I have attached a screenshot of what I mean. Cheers.
Intermediate & Advanced SEO | | bridhard80 -
Webmaster Tools says that Structured Data is missing (author and updated)
Hi, Google Webmaster Tools tells me that every blog category and blog post is missing 'updated' and 'author'. I find this data under 'Structured Data'; the data type is 'hentry' and the markup is microformats.org. Is this a problem for SEO? How can I fix this? Best, Robin
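For context, these 'hentry' warnings usually mean the theme outputs the hAtom container class without the child fields Google looks for. Below is a minimal hAtom (microformats) sketch with both fields present; the class names are the standard hAtom ones, while the surrounding markup and values are only illustrative:

```html
<article class="hentry">
  <h1 class="entry-title">Post title</h1>
  <!-- 'updated' supplies the modification date flagged as missing. -->
  <time class="updated" datetime="2017-09-01T10:00:00+00:00">September 1, 2017</time>
  <!-- 'author' is expected as an hCard: a 'vcard' wrapper with an 'fn' name. -->
  <span class="author vcard"><span class="fn">Robin</span></span>
  <div class="entry-content">Post body…</div>
</article>
```

In WordPress this markup usually lives in the theme's content template, so a child-theme edit or a plugin that completes the hAtom fields is the typical fix.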
Intermediate & Advanced SEO | | soralsokal0 -
Webmaster Tools Content Keywords & Meta Tagging
In Webmaster Tools, Content Keywords give an indication of what Google thinks a site is about. This site is a health site (online shopping for health supplements), but one of the terms it thinks the site is about is "Dollar". I'm guessing this is because every page has a currency selector offering multiple currencies. How do I tell Google that this part of the page has nothing to do with what my site is about? Thanks in advance for your reply!
Intermediate & Advanced SEO | | s_EOgi_Bear0 -
Access denied errors in webmaster tools
I noticed today I have 2 access denied errors. I checked the help, which says: "Googlebot couldn't access a URL on your site because your site requires users to log in to view all or some of your content. (Tip: You can get around this by removing this requirement for user-agent Googlebot.)" I think this is because I have added a login page for users and Googlebot can't access it. I'm using WordPress and presume I need to amend robots.txt to remove the requirement for Google to log in, but how do I do that? Unless I'm misunderstanding the problem altogether!
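To be clear, robots.txt can't log Googlebot in; it can only tell it not to request the protected URLs at all, which is usually enough to clear these errors. A minimal sketch, assuming the protected area lives under /members/ alongside the default WordPress login script (both paths are placeholders to swap for your own):

```
User-agent: Googlebot
Disallow: /wp-login.php
Disallow: /members/
```

A physical robots.txt at the site root overrides the virtual one WordPress generates; if an SEO plugin manages robots.txt for you, add the rules through that plugin instead.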
Intermediate & Advanced SEO | | SamCUK0 -
Magic keywords in Google Webmaster Tools
Hi All, Recently moved a friend to a new WP back-end website, as they were on Flash, which is pretty but not necessarily the best for SEO: http://francesphotography.com My question is that once Google finally indexed the site, I noticed in Google Webmaster Tools that it found the most significant keyword to be "automatically", on the following top pages:
| tag/snow-boarding-photography/ |
| tag/style-photography/ |
| tag/underwater-photography/ |
| tag/vacation-photography/ |
| tag/wedding-photography-beaver-creek/ |
| tag/wedding-photography-copper-mountain/ |
| tag/wedding-photography-denver/ |
| tag/wedding-photography/ |
| underwater-photography-scuba-diving-cozumel-mexico/ |
| wedding-photography/ |
The goofy thing is I can't find anywhere that "automatically" is used; perhaps it is coming from a plug-in or magical keyword beans that Google found? Any guidance is appreciated.
Intermediate & Advanced SEO | | BoulderJoe0 -
How to prevent Google from crawling our product filter?
Hi All, We have a crawler problem on one of our sites, www.sneakerskoopjeonline.nl. On this site, visitors can specify criteria to filter the available products. These filters are passed as HTTP GET parameters, so the number of possible filter URLs is virtually limitless. To prevent duplicate content and an insane number of pages in the search indices, our software automatically adds noindex, nofollow, and noarchive directives to these filter result pages. However, we're unable to get crawlers (Google in particular) to ignore these URLs. We've already changed the on-page filter HTML to JavaScript, hoping this would cause the crawler to ignore it, but it seems that Googlebot executes the JavaScript and crawls the generated URLs anyway. What can we do to prevent Google from crawling all the filter options? Thanks in advance for the help. Kind regards, Gerwin
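For what it's worth, a meta noindex only takes effect after the page has been crawled, so it can't reduce crawling by itself; robots.txt wildcard patterns can stop the requests up front. A minimal sketch; the parameter names in option B are invented placeholders, not taken from the actual site:

```
User-agent: *
# Option A: block every URL that carries a query string.
Disallow: /*?
# Option B (narrower): block only known filter parameters.
# Disallow: /*?*kleur=
# Disallow: /*?*maat=
```

Keep in mind that once a URL is blocked in robots.txt, Googlebot can no longer see its noindex directive, so filter URLs that are already indexed may linger; the URL Parameters tool in Webmaster Tools is a complementary way to hint which parameters to ignore.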
Intermediate & Advanced SEO | | footsteps0