Crawl errors for pages that no longer exist
-
Hey folks,
I've been working on a site recently where I took a bunch of old, outdated pages down. In the Google Search Console "Crawl Errors" section, I've started seeing a bunch of "Not Found" errors for those pages. That makes perfect sense.
The thing that I'm confused about is that the "Linked From" list only shows a sitemap that I ALSO took down. Alternatively, some of them list other old, removed pages in the "Linked From" list.
Is there a reason that Google is trying to inform me that pages/sitemaps that don't exist are somehow still linking to other pages that don't exist? And is this ultimately something I should be concerned about?
Thanks!
-
Thanks for the question, this can definitely be annoying for webmasters!
Unfortunately, bots can don't everything in parallel. They have to take steps...
Step 1. Take List #1 of links.
Step 2. Crawl those links and build List #2.
Step 3. Crawl List #3 and build List #4...Now, sometimes it doesn't follow that same order. Let's say that in Step 3 it finds a bunch of pages with unique content. Maybe the next time around, it goes and checks some of those links in Step 3 without first checking if they were still linked. Why start the crawl all the way from the beginning again when you have a big list of URLs?
But, this creates a problem. When some of those links it crawled in Step 3 aren't there any more, Google will tell you they aren't there and tell you how they originally found them (which happened to be from a page in List #1). But what if Google hasn't checked that link in List #1 recently? What if you just removed it too?
Well, for a little while, at least, you will end up with errors.
Now, here comes the real rub - how long will it take for Google to find and correct that message it left you in the crawl report? Days? Weeks? Months? Who knows. Your best bet is to mark them as fixed and force Google to keep rechecking. Eventually, they will figure it out.
TL;DR; it is a data freshness and reporting issue that isn't your fault and isn't worth your time.
-
No - Google is just showing how slow it is when updating data in Webmaster tools.
Don't worry - if you wait long enough they'll go away. You could also mark them as solved (do this only if you are sure that there are no links pointing to these pages - to check if your internal linking is ok Screaming Frog is great tool)
Dirk
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to Diagnose "Crawled - Currently Not Indexed" in Google Search Console
The new Google Search Console gives a ton of information about which pages were excluded and why, but one that I'm struggling with is "crawled - currently not indexed". I have some clients that have fallen into this pit and I've identified one reason why it's occurring on some of them - they have multiple websites covering the same information (local businesses) - but others I'm completely flummoxed. Does anyone have any experience figuring this one out?
Reporting & Analytics | | brettmandoes2 -
Google Analytics reporting traffic for 404 pages
Hi guys, Unique issue with google analytics reporting for one of our sites. GA is reporting sessions for 404 pages (landing pages, organic traffic) e.g. for this page: http://www.milkandlove.com.au/breastfeeding-dresses/index.php the page is currently a 404 page but GA (see screenshot) is reporting organic traffic (to the landing page). Does anyone know any reasons why this is happening? Cheers. http://www.milkandlove.com.au/breastfeeding-dresses/index.php GK0zDzj.jpg
Reporting & Analytics | | jayoliverwright2 -
How can I see what Google sees when it crawls my page?
In other words, how can see the text and what not it sees from start to finish on each page. I know there was a site, but I can't remember it.
Reporting & Analytics | | tiffany11030 -
Switch to www from non www preference negatively hit # pages indexed
I have a client whose site did not use the www preference but rather the non www form of the url. We were having trouble seeing some high quality inlinks and I wondered if the redirect to the non www site from the links was making it hard for us to track. After some reading, it seemed we should be using the www version for better SEO anyway so I made a change on Monday but had a major hit to the number of pages being indexed by Thursday. Freaking me out mildly. What are people's thoughts? I think I should roll back the www change asap - or am I jumping the gun?
Reporting & Analytics | | BrigitteMN0 -
Figuring Out the Source of "direct traffic" by looking at landing page parameters
I have a client who runs an e-commerce website, and I noticed that 40% of his traffic and 25% of his sales are all attributable to Direct Traffic. At first, I tried to solve this problem by tagging all of the previously untagged links in his e-newsletter, which I expect to be very helpful. However, then I looked at the landing pages for his direct traffic, and I see that it is almost entirely filled with thousands of unique URLs that begin with a question mark followed by the name of his e-newsletter or shopping cart vendor. It would be the equivalent of having a url like the following: "www.willmarlow.com/?constantcontact=keya;sldkfjsdlfkjdf;sldkjf" If we have this amount of information in the link, shouldn't there be a way to add additional parameters to the URL to move this traffic out of the Direct column? Has anyone encountered this before? Thanks.
Reporting & Analytics | | williammarlow0 -
2 days in the past week Google has crawled 10x the average pages crawled per day. What does this mean?
For the past 3 months my site www.dlawlesshardware.com has had an average of about 400 pages crawled per day by google. We have just over 6,000 indexed pages. However, twice in the last week, Google crawled an enormous percentage of my site. After averaging 400 pages crawled for the last 3 months, the last 4 days of crawl stats say the following. 2/1 - 4,373 pages crawled 2/2 - 367 pages crawled 2/3 - 4,777 pages crawled 2/4 - 437 pages crawled What is the deal with these enormous spike in pages crawled per day? Of course, there are also corresponding spikes in kilobytes downloaded per day. Essentially, Google averages crawling about 6% of my site a day. But twice in the last week, Google decided to crawl just under 80% of my site. Has this happened to anyone else? Any ideas? I have literally no idea what this means and I haven't found anyone else with the same problem. Only people complaining about massive DROPS in pages crawled per day. Here is a screenshot from Webmaster Tools: http://imgur.com/kpnQ8EP The drop in time spent downloading a page corresponded exactly to an improvement in our CSS. So that probably doesn't need to be considered, although I'm up for any theories from anyone about anything.
Reporting & Analytics | | dellcos0 -
Please Give This Page a Good Ass Kicking
This page on my site has a high bounce rate (around 90%) despite being right on point for the search queries that lead visitors to it (i.e. keyword data shows visitors are searching for this information exactly). Also, Google keeps giving the page good placement and it receives a good bit of traffic. Anyone have thoughts as to why the bounce rate is so high? Feel free to offer candid criticism.
Reporting & Analytics | | JSOC0