Crawl errors in GWT!
-
I have been seeing a large number of access denied and not found crawl errors. I have since fixed the issues causing these errors; however, I am still seeing them in Webmaster Tools.
At first I thought the data was outdated, but the data is tracked on a daily basis!
Does anyone have experience with this? Does GWT really re-crawl all those pages/links every day to see if the errors still exist?
Thanks in advance for any help/advice.
-
Neither access denied nor not found crawl errors are dealbreakers as far as Google is concerned. A not found error usually just means you have links pointing to pages that don't exist (which is how you can end up with more errors than pages crawled: the link to the missing page gets discovered, but since there's no page at that URL, nothing is actually crawled there). Access denied is usually caused by either requiring a login or blocking the search bots with robots.txt.
If the links causing 404 errors aren't on your site, it's certainly possible that errors would still be appearing. One thing you can do is double-check your 404 page to make sure the requested URL itself really is returning a 404 (Not Found) status. One common thing I've seen all over the place is that sites will institute a 302 redirect to a single error page (like www.example.com/notfound). Because the actual URL isn't returning a 404, bots will sometimes just keep crawling those links over and over again.
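One simple way to check this is to request a URL that shouldn't exist and inspect the raw HTTP status rather than what the browser renders. Here's a minimal sketch using Python's requests library; the test URL is just a placeholder for one of your own missing pages:

```python
import requests

# Hypothetical URL that should not exist on your site
url = "https://www.example.com/this-page-does-not-exist"

# allow_redirects=False exposes a 301/302 to a catch-all error page
# that would otherwise be hidden behind the final response
response = requests.get(url, allow_redirects=False)

if response.status_code == 404:
    print("OK: the URL itself returns 404 Not Found")
elif response.status_code in (301, 302):
    print("Redirects to", response.headers.get("Location"),
          "- the missing URL is not returning a 404 itself")
else:
    print("Unexpected status:", response.status_code)
```

If a missing URL reports a 301/302 instead of a 404, that's the pattern described above and worth fixing.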
Google doesn't necessarily crawl everything every day or update everything every day. If your traffic isn't being affected by these errors, I would just try as best you can to minimize them, and otherwise not worry too much about it.
-
Crawl errors can also come from links to those pages on other sites or in Google's own index. When Google revisits those pages and doesn't find them, it flags them as 404 errors.
-
BTW, the crawl stats show Google crawling about 3-10K pages a day, yet the daily errors number over 100K. Is this even possible? How can it find so many errors if the spiders aren't even crawling that many pages?
Thanks again!
Related Questions
-
Crawl Budget and Faceted Navigation
Hi, we have an ecommerce website with faceted navigation for the various options available. Google has 3.4 million webpages indexed, many of which are over 90% duplicates. Due to the low domain authority (15/100), Google is only crawling around 4,500 webpages per day, which we would like to improve/increase. We know that, in order not to waste crawl budget, we should use robots.txt to disallow parameter URLs (e.g. ?option=, ?search=, etc.). This makes sense as it would resolve many of the duplicate content issues and force Google to only crawl the main category and product pages, etc. However, having looked at Google Search Console, these parameter pages are getting a significant amount of organic traffic on a monthly basis. Is it worth disallowing these parameter URLs in robots.txt and hoping that this solves our crawl budget issues, thus helping the most important webpages get indexed and ranked in less time? Or is there a better solution? Many thanks in advance. Lee.
Intermediate & Advanced SEO | | Webpresence0 -
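For reference, the kind of robots.txt rules the question above describes would look roughly like the sketch below. The patterns are assumptions based on the ?option= and ?search= parameters mentioned; the real rules would need to match the site's actual URL structure, and disallowing crawling does not by itself remove already-indexed URLs or consolidate their ranking signals.

```
# Hypothetical rules blocking faceted-navigation parameters
User-agent: *
Disallow: /*?option=
Disallow: /*&option=
Disallow: /*?search=
Disallow: /*&search=
```

Googlebot honours the * wildcard in these patterns, so both a parameter appearing first (after ?) and later in the query string (after &) are covered.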
URL Errors Help - 350K Page Not Founds in 22 days
Got a good one for you all this time... For our site, Google Search Console is reporting 436,758 "Page Not Found" errors within the Crawl Error report. This is an increase of 350,000 errors in just 22 days (on Sept 21 we had 87,000 errors, which had been essentially steady at that number for the previous 4 months or more). Then on August 22nd the errors jumped to 140,000, climbed steadily from the 26th until the 31st, reaching 326,000 errors, and then climbed again slowly from Sept 2nd until today's 436K. Unfortunately I can only see the top 1,000 erroneous URLs in the console, which seem to be custom Google tracking URLs my team uses to track our pages. A few questions:
1. Is there any way to see the full list of 400K URLs Google is reporting it cannot find?
2. Should we be concerned at all about these?
3. Any other advice? Thanks in advance! C
Intermediate & Advanced SEO | | usnseomoz0 -
Does Google still not crawl forms with method=post?
I know back in '08 Google started crawling forms using method=get but not method=post. What's the latest? Is this still valid?
Intermediate & Advanced SEO | | Turkey0 -
Rankings gone, no WMT errors, help!
Hi, our client's Google rankings have been seriously hit. We have done everything we know of to see why this is the case, and there is no obvious explanation. The client used to dominate their search terms and is now down on page 7/8 for them. There are no errors in WMT, so we cannot resubmit for reconsideration. This is a genuine client and their business has been seriously affected. Can anybody offer help? Thanks in advance!
Intermediate & Advanced SEO | | roadjan0 -
VisitSweden indexing error
Hi all, Just got a new site up about weekend travel for VisitSweden, the official tourism office of Sweden. Everything went just fine except some issues with indexing. The site can be found at weekend.visitsweden.com/no/ For some weird reason the "frontpage" of the site does not get indexed. What I have done myself to find the issue: added sitemaps.xml, configured and added the site to Webmaster Tools, and checked the 301s so they are not faulty. By doing a simple site:weekend.visitsweden.com/no/ you can see that the frontpage is simply not in the index. Also, by doing a cache:weekend.visitsweden.com/no/ I see that Google tries to index the page without the trailing /no/ for some reason: http://webcache.googleusercontent.com/search?q=cache:http://weekend.visitsweden.com/no/ Any smart ideas to get this fixed or where to start looking? All help greatly appreciated. Kind regards, Fredrik
Intermediate & Advanced SEO | | Resultify0 -
Should I let Google crawl my production server if the site is still under development?
I am building out a brand new site. It's built on Wordpress so I've been tinkering with the themes and plug-ins on the production server. To my surprise, less than a week after installing Wordpress, I have pages in the index. I've seen advice in this forum about blocking search bots from dev servers to prevent duplicate content, but this is my production server so it seems like a bad idea. Any advice on the best way to proceed? Block or no block? Or something else? (I know how to block, so I'm not looking for instructions). We're around 3 months from officially launching (possibly less). We'll start to have real content on the site some time in June, even though we aren't planning to launch. We should have a development environment ready in the next couple of weeks. Thanks!
Intermediate & Advanced SEO | | DoItHappy0 -
Custom Error and page not found responses
When there is a 500 Internal Server Error, is it better to return an HTTP 500 response and custom error page from the requested URL, or is it better to return a 302 redirect? The redirect would send the browser to the custom error page, which would return the HTTP 500 result. We tell Google not to index or follow our error pages, so if Google sees an error at a URL, we don't necessarily want Google to think that the URL should be ignored. That's why the alternative would be to redirect to a custom error page with its own URL. Similarly, what's the best approach if the response is a 404? Return HTTP 404 and a custom 404 page from the requested URL, or redirect? Thanks.
Intermediate & Advanced SEO | | dbuckles0 -
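To make the first option in the question above concrete (serving the custom error page at the requested URL with the real status code, rather than 302-redirecting to a separate error URL), here is a minimal sketch using Flask; the framework and template paths are assumptions for illustration, not anything from the original question:

```python
from flask import Flask, render_template

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(error):
    # The custom page is rendered at whatever URL the visitor requested,
    # and the response still carries a genuine 404 status - no redirect.
    return render_template("errors/404.html"), 404

@app.errorhandler(500)
def internal_server_error(error):
    # Same idea for server errors: custom content, real 500 status code.
    return render_template("errors/500.html"), 500
```

Equivalent behaviour exists in most frameworks and web servers; the key point is that the error status travels with the originally requested URL.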
Old pages still crawled by SE returning 404s. Better to put a 301 or block with robots.txt?
Hello guys, A client of ours has thousands of pages returning 404s, visible in Google Webmaster Tools. These are all old pages which don't exist anymore, but Google keeps on detecting them. These pages belong to sections of the site which don't exist anymore. They are not linked externally and didn't provide much value even when they existed. What do you suggest we do: (a) do nothing, (b) redirect all these URLs/folders to the homepage through a 301, or (c) block these pages through robots.txt? Are we inappropriately using part of the crawl budget set by search engines by not doing anything? Thx
Intermediate & Advanced SEO | | H-FARM0