Crawl errors for pages that no longer exist
-
Hey folks,
I've been working on a site recently where I took a bunch of old, outdated pages down. In the Google Search Console "Crawl Errors" section, I've started seeing a bunch of "Not Found" errors for those pages. That makes perfect sense.
The thing that I'm confused about is that the "Linked From" list only shows a sitemap that I ALSO took down. In other cases, the errors list other old, removed pages in the "Linked From" column.
Is there a reason that Google is trying to inform me that pages/sitemaps that don't exist are somehow still linking to other pages that don't exist? And is this ultimately something I should be concerned about?
Thanks!
-
Thanks for the question; this can definitely be annoying for webmasters!
Unfortunately, bots can't do everything in parallel. They have to take steps...
Step 1. Take List #1 of links.
Step 2. Crawl those links and build List #2.
Step 3. Crawl List #2 and build List #3... and so on.
Now, sometimes it doesn't follow that same order. Let's say that in Step 3 it finds a bunch of pages with unique content. Maybe the next time around, it goes and checks some of those links from Step 3 without first checking whether they are still linked. Why start the crawl all the way from the beginning again when you already have a big list of URLs?
But this creates a problem. When some of the links it crawled in Step 3 aren't there anymore, Google tells you they're gone and reports how it originally found them (which happened to be via a page in List #1). But what if Google hasn't re-checked that page in List #1 recently? What if you just removed it too?
Well, for a little while, at least, you will end up with errors.
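To make that concrete, here is a rough, purely illustrative sketch in Python (the URLs are hypothetical placeholders, and this is not how Googlebot actually works) of a crawler that works through stored lists of URLs and remembers only where each link was first discovered. Because the "Linked From" source is recorded once and never re-verified, a later recrawl of a stored list can report errors against a source page that has itself been removed:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

discovered_on = {}  # link -> the page it was first seen on ("Linked From")

def crawl_generation(urls):
    """Fetch each URL in the given list and return the links found on those pages."""
    next_list = []
    for url in urls:
        resp = requests.get(url, timeout=10)
        if resp.status_code == 404:
            # Report the error together with the *stored* discovery source,
            # even if that source page has since been removed as well.
            print(f"Not Found: {url} (linked from: {discovered_on.get(url, 'unknown')})")
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            discovered_on.setdefault(link, url)  # recorded once, never refreshed
            next_list.append(link)
    return next_list

# Step 1: take List #1 of links (e.g. the homepage).
list_1 = ["https://www.example.com/"]
# Step 2: crawl List #1 and build List #2.
list_2 = crawl_generation(list_1)
# Step 3: crawl List #2 and build List #3.
list_3 = crawl_generation(list_2)

# On a later pass, the crawler may start straight from a stored list rather than
# from Step 1, so removed pages in List #2 or #3 get reported against a
# "Linked From" page that may itself no longer exist.
list_3_again = crawl_generation(list_2)
```

In this sketch the stale report comes from the cached discovered_on map, which is the same kind of cached discovery data the crawl-error report appears to be drawing on.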
Now, here comes the real rub - how long will it take for Google to find and correct that message it left you in the crawl report? Days? Weeks? Months? Who knows. Your best bet is to mark them as fixed and force Google to keep rechecking. Eventually, they will figure it out.
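One quick sanity check before you mark everything as fixed: confirm that the removed URLs really do return a 404 or 410, rather than a soft 404 or an unexpected 200. A minimal sketch (the URLs are hypothetical stand-ins for your own removed pages):

```python
import requests

# Hypothetical stand-ins for the URLs you removed.
removed_urls = [
    "https://www.example.com/old-page/",
    "https://www.example.com/old-sitemap.xml",
]

for url in removed_urls:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    if resp.status_code in (404, 410):
        print(f"Gone as expected: {url} -> {resp.status_code}")
    else:
        print(f"Worth a closer look: {url} -> {resp.status_code}")
```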
TL;DR: it's a data-freshness and reporting issue that isn't your fault and isn't worth your time.
-
No - Google is just showing how slow it is at updating data in Webmaster Tools.
Don't worry - if you wait long enough, they'll go away. You could also mark them as solved (do this only if you are sure that there are no links pointing to these pages; to check whether your internal linking is OK, Screaming Frog is a great tool).
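If you don't have Screaming Frog handy, a rough do-it-yourself version of that internal-link check might look something like this sketch (Python; the start URL and removed URLs are hypothetical placeholders):

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://www.example.com/"           # hypothetical: your homepage
REMOVED = {"https://www.example.com/old-page/"}  # hypothetical: URLs you took down
MAX_PAGES = 500                                  # safety cap for the crawl

seen, queue = set(), [START_URL]
while queue and len(seen) < MAX_PAGES:
    page = queue.pop(0)
    if page in seen:
        continue
    seen.add(page)
    try:
        html = requests.get(page, timeout=10).text
    except requests.RequestException:
        continue
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        link = urljoin(page, a["href"]).split("#")[0]
        if link in REMOVED:
            print(f"Internal link to a removed page: {page} -> {link}")
        # Only follow links within your own site.
        if urlparse(link).netloc == urlparse(START_URL).netloc:
            queue.append(link)
```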
Dirk