Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Do search engines crawl links on 404 pages?
-
I'm currently in the process of redesigning my site's 404 page. I know there are all sorts of best practices from a UX standpoint, but what about search engines? Since these pages are roadblocks in the crawl process, I was wondering if there's a way to help the search engine continue its crawl.
Does putting links to "recent posts" or something along those lines allow the bot to continue on its way or does the crawl stop at that point because the 404 HTTP status code is thrown in the header response?
-
Okay, thanks Alan!
-
Hi Brad
Sorry I have only just come back to you. It was late at night here in the UK, but it looks like Alan has already answered your question.

Have you tested your 404 page with Fetch as Google in Webmaster Tools? You should see that Google can see the links on your 404 page, and as such it will continue crawling them, as Alan has said.
So, in my opinion, what benefits a user will also benefit Google when it crawls your site.
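Besides Fetch as Google, you can get a rough picture of what a crawler is handed on a 404 page by requesting a missing URL yourself and listing the links in the response body. The sketch below is only an illustration: the function and class names are made up for this example, and it shows what is in the response, not what Googlebot decides to do with it.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def status_and_links(url):
    """Fetch url and return (HTTP status, links found in the body).

    urlopen raises HTTPError for 4xx/5xx responses, but the error object
    still carries the status code and the response body.
    """
    try:
        with urlopen(url) as response:
            status, body = response.status, response.read()
    except HTTPError as error:
        status, body = error.code, error.read()
    return status, extract_links(body.decode("utf-8", errors="replace"), url)
```

Calling status_and_links() on a URL that does not exist on your site should give you a 404 status together with whatever links your custom error page contains.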

-
Sorry, yes, it should crawl the links - they used to do that.
But you can prove it to yourself, by doing what I said - and then report back.
-
Yes it will continue crawling or yes it will stop the crawl?
-
Yes, and you can test it by creating a page that is linked from the 404 page and nowhere else, and then checking your logs or analytics.
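For the log check, something along these lines would do. It is a rough sketch that assumes your server writes the common "combined" access log format; the test page path and the helper name are hypothetical. Matching on the user agent string alone is naive, since anyone can claim to be Googlebot, so treat a hit as a hint rather than proof.

```python
import re

# Matches the request, status and user-agent parts of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines, path):
    """Return (status, user agent) for each request to path claiming to be Googlebot."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the expected format
        if match["path"] == path and "Googlebot" in match["agent"]:
            hits.append((int(match["status"]), match["agent"]))
    return hits
```

If the test page is linked only from the 404 page and a Googlebot request for it shows up in the log, the link on the 404 page was followed.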
-
Hey Matt,
Thanks for the reply. I'm aware of all the best practice stuff but thanks for sending through. It didn't quite answer my question so let me rephrase...
Will a bot follow a hyperlink (like the example below) on a 404 page or will it stop the crawl on that page (not on the whole site) because the header response code is a 404?
-
Hi Brad
Firstly, it is great from a usability point of view to have a custom 404 page. I would link it to your most popular content and maybe add a site search feature to help visitors find the content that is missing. I have come across some nice 404 pages that include a very concise sitemap to help the visitor navigate the site. To prevent Google from indexing your 404 page, you need to make sure it returns an actual 404 HTTP status code.
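To make the point about the status code concrete, here is a minimal sketch of a custom 404 page that still carries helpful links but is sent with a real 404 status. It is written as a bare WSGI app purely for illustration; the page contents, paths and names are all hypothetical, and your own CMS or server config will have its own way of doing this.

```python
KNOWN_PAGES = {"/": "<h1>Home</h1>"}  # hypothetical site content

NOT_FOUND_PAGE = """<h1>Page not found</h1>
<p>Try one of these instead:</p>
<ul>
  <li><a href="/">Home</a></li>
  <li><a href="/blog/">Recent posts</a></li>
</ul>
"""


def app(environ, start_response):
    """Minimal WSGI app: known pages get 200, everything else a real 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        status, body = "200 OK", KNOWN_PAGES[path]
    else:
        # The custom page is still sent, but with a 404 status rather than
        # 200, so it is not served as if it were a normal page (a "soft 404").
        status, body = "404 Not Found", NOT_FOUND_PAGE
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

To try it locally you could serve it with the standard library, for example wsgiref.simple_server.make_server("", 8000, app).serve_forever(), and then request any path that is not in KNOWN_PAGES.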
To understand how Googlebot crawls your site, I would look at the following post from Google themselves - https://support.google.com/webmasters/answer/182072?hl=en
Rather than being concerned about whether a 404 page has links on it to keep the crawl going, make sure you have an XML sitemap that you have submitted to Google via Webmaster Tools, as this will help your crawl process.
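If you do not already have a sitemap, a bare-bones one is easy to generate. The sketch below builds the basic urlset/url/loc structure from the sitemaps.org protocol; the function name and example URLs are made up, and optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/",
])
```

You would write the result to a file such as sitemap.xml with an XML declaration at the top, and then submit it.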
Googlebot allots a set amount of time to crawling your site, and it doesn't just stop crawling because it encounters a 404 error. However, make sure that you monitor Google Webmaster Tools and take care of any reported 404s, for instance with 301 redirects if the page has changed location. You will notice that Googlebot reports 404 errors on the days it finds them, and there can often be multiple 404 errors encountered in one visit to your site. Keeping an eye on this and keeping it up to date will make your site as crawl efficient as possible, which is clearly what you are after - as we all are
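The clean-up logic described above boils down to a simple decision per URL, sketched here with made-up paths: pages that moved get a 301 to their new home, and anything else that is missing stays a genuine 404. The names and the idea of keeping the map in a dictionary are assumptions for the example; in practice this usually lives in your server or CMS redirect rules.

```python
# Hypothetical map of old paths to their new locations.
REDIRECTS = {"/old-pricing": "/pricing"}
LIVE_PAGES = {"/", "/pricing"}


def resolve(path):
    """Return (status code, redirect target or None) for a requested path."""
    if path in LIVE_PAGES:
        return 200, None
    if path in REDIRECTS:
        # The page moved: send a permanent redirect to the new location.
        return 301, REDIRECTS[path]
    # Nothing sensible to redirect to: leave it as a genuine 404.
    return 404, None
```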

I thought this would also be interesting reading in relation to this - http://googlewebmastercentral.blogspot.co.uk/2011/05/do-404s-hurt-my-site.html
Hope this helps