Do search engines crawl links on 404 pages?
-
I'm currently in the process of redesigning my site's 404 page. I know there's all sorts of best practices from UX standpoint but what about search engines? Since these pages are roadblocks in the crawl process, I was wondering if there's a way to help the search engine continue its crawl.
Does putting links to "recent posts" or something along those lines allow the bot to continue on its way or does the crawl stop at that point because the 404 HTTP status code is thrown in the header response?
-
Okay, thanks Alan!
-
Hi Brad
Sorry I have only just come back to you - it was late night here in the UK, but it looks like Alan has already answered your question
Have you tested your 404 page with fetch as Google in webmaster tools - you should see that it can see the links on your 404 page and as such will continue crawling them as Alan has said.
So what is a benefit to a user will also be a benefit to Google crawling your site in my opinion
-
Sorry, yes, it should crawl the links - they used to do that.
But you can prove it to yourself, by doing what I said - and then report back.
-
Yes it will continue crawling or yes it will stop the crawl?
-
Yes and you can test it by creating a page that is linked from nowhere else and then check your logs or analytics
-
Hey Matt,
Thanks for the reply. I'm aware of all the best practice stuff but thanks for sending through. It didn't quite answer my question so let me rephrase...
Will a bot follow a hyperlink (like the example below) on a 404 page or will it stop the crawl on that page (not on the whole site) because the header response code is a 404?
-
Hi Brad
Firstly it is great from a usability point of view to have a custom 404 page and I would link it to your most popular content and maybe add a search feature on the page for your site to help find the content that is missing. I have come across some nice 404s that actually have very concise sitemap in order to help the visitor navigate the site.In order to prevent Google from indexing your 404 page you need to make sure it returns an actuall 404 HTTP status code.
In order to understand how Goolgebot crawls your site I would look at the following post from Google themselves - https://support.google.com/webmasters/answer/182072?hl=en
Rather than being concerned about a 404 page having links on to keep the crawl going make sure you have an XML sitemap that you have submitted to Google via Webmaster Tools as this will help your crawl process.
Googlebot alots a set amount of time to crawling your site and it doesn't just stop crawling because it encounters a 404 error. However make sure that you monitor Google Webmaster Tools and take care of any reported 404s with 301 redirects for instance if the page has changed location. You will notice that Googlebot reports 404 erros on the days it finds them and these can often be multiple 404 errors encountered in one visit to your site by Googlebot. Keeing an eye on this and making sure you keep it updated will make your site as crawl efficient as possible which is clearly what you are after - as we all are
I thought this would also be interesting reading in relation to this - http://googlewebmastercentral.blogspot.co.uk/2011/05/do-404s-hurt-my-site.html
Hope this helps
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Adsbot crawling order confirmation pages?
Hi, We have had roughly 1000+ requests per 24 hours from Google-adsbot to our confirmation pages. This generates an error as the confirmation page cannot be viewed after closing or by anyone who didn't complete the order. How is google-adsbot finding pages to crawl that are not linked to anywhere on the site, in the sitemap or linked to anywhere else? Is there any harm in a google crawler receiving a higher percentage of errors - even though the pages are not supposed to be requested. Is there anything we can do to prevent the errors for the benefit of our network team and what are the possible risks of any measures we can take? This bot seems to be for evaluating the quality of landing pages used in for Adwords so why is it trying to access confirmation pages when they have not been set for any of our adverts? We included "Disallow: /confirmation" in the robots.txt but it has continued to request these pages, generating a 403 page and an error in the log files so it seems Adsbot doesn't follow robots.txt. Thanks in advance for any help, Sam
Intermediate & Advanced SEO | | seoeuroflorist0 -
Landing pages, are my pages competing?
If I have identified a keyword which generates income and when searched in google my homepage comes up ranked second, should I still create a landing page based on that keyword or will it compete with my homepage and cause it to rank lower?
Intermediate & Advanced SEO | | The_Great_Projects0 -
How to handle broken links to phantom pages appearing in webmaster tools
Hi,Would love to hear different experiences and thoughts on this one. We have a site that is plagued with 404's in the Webmaster Tools. A significant number of them have never existed, for instance affiliates have linked to them with the wrong URL or scraper sites have linked to them with a truncated version of the URL and an ellipsis eg; /my-nonexistent... What's the best way to handle these? If we do nothing and mark as fixed, they reappear in the broken links report. If we 301 redirect and mark as fixed they reappear. We tried 410 (gone forever) and marking as fixed; they re-appeared. We have a lot of legacy broken links and we would really like to clean up our WMT broken link profile - does anyone know of a way we can make these links to non extistent pages disappear once and for all? Many thanks in advance!
Intermediate & Advanced SEO | | dancape0 -
Intra-linking to pages with a different Canonical url ?
Hello Moz Community! I'm hoping to get some advice around intra-linking practices and the benefits when a page that is being linked to has a different canonical tag than it's own URL. Confused? Allow me to elaborate. Scenario: Background: Ecommerce Company is trying to increase its organic ranking for key, broad terms in the cycling industry. Ecommerce company is trying to rank its category pages for a main term. To help this, the company focusing on increasing the quality of its intra-linking structure (the links and anchor texts that link to another page within the site). Example goal: to have it's Road Cassettes category page rank for 'Road Cassettes' Company's 'cassettes' main category page is here: /Components/Drivetrain/Cassettes/ And the company uses filtered navigation logic to drill down into 'road cassettes' specifically: /Components/Drivetrain/Cassettes/?page_no=1&fq=ATR_RoadBiking:True SEOs are instructed to include occasional links back to this page, with SEO friendly anchor text, to help strengthen it's authority for the main term. The Issue / Question: Main category URL: /Components/Drivetrain/Cassettes/ Road Cassettes category URL: /Components/Drivetrain/Cassettes/?page_no=1&fq=ATR_RoadBiking:True Road Cassettes Canonical URL: /Components/Drivetrain/Cassettes/ The canonical URL of the filtered Road Cassettes category is its main category URL. Will Company be able to effectively rank its Road Cassettes category URL for 'Road Cassettes' if the canonical URL is the main category? Should the canonical URL not be the main category? OR Will increasing the intra-linking to the Road Cassettes URL help the main category URL rank for 'Road Cassettes' - by passing all it's authority?
Intermediate & Advanced SEO | | Ray-pp0 -
Varying Internal Link Anchor Text with Each New Page Load
I'm asking for people's opinions on varying internal anchor text. Before you jump in and say, "Oh yes, varying your anchor text is always a good idea", let me explain. I'm not talking about varying anchor text on different links scattered throughout a site. We all know that is a wise thing to do for a variety of reasons that have been covered in many places. What I'm talking about is including semi-useful links below the fold and then varying the anchor text with each page load. Each time Googlebot crawls a page, it sees different anchor text for each link. That way, Googlebot is seeing, for example, 'san diego bars', 'taverns in san diego', 'san diego clubs', and 'pubs in san diego' all pointing to a San Diego bar/tavern/club/pub page. I'm wondering if there is value in this approach. Will it help a site rank well for multiple search queries? Could it potentially be better than static anchor text as it may help Google better understand the targeted page? Is it a good way to protect a large site with a huge number of internal links from Penguin? To summarize, we're talking about the impact of varying the anchor text on a single page with each page load as opposed to varying the anchor text on different pages. Thoughts?
Intermediate & Advanced SEO | | RyanOD0 -
How can Google index a page that it can't crawl completely?
I recently posted a question regarding a product page that appeared to have no content. [http://www.seomoz.org/q/why-is-ose-showing-now-data-for-this-url] What puzzles me is that this page got indexed anyway. Was it indexed based on Google knowing that there was once content on the page? Was it indexed based on the trust level of our root domain? What are your thoughts? I'm asking not only because I don't know the answer, but because I know the argument is going to be made that if Google indexed the page then it must have been crawlable...therefore we didn't really have a crawlability problem. Why Google index a page it can't crawl?
Intermediate & Advanced SEO | | danatanseo0 -
Do 404 Pages from Broken Links Still Pass Link Equity?
Hi everyone, I've searched the Q&A section, and also Google, for about the past hour and couldn't find a clear answer on this. When inbound links point to a page that no longer exists, thus producing a 404 Error Page, is link equity/domain authority lost? We are migrating a large eCommerce website and have hundreds of pages with little to no traffic that have legacy 301 redirects pointing to their URLs. I'm trying to decide how necessary it is to keep these redirects. I'm not concerned about the page authority of the pages with little traffic...I'm concerned about overall domain authority of the site since that certainly plays a role in how the site ranks overall in Google (especially pages with no links pointing to them...perfect example is Amazon...thousands of pages with no external links that rank #1 in Google for their product name). Anyone have a clear answer? Thanks!
Intermediate & Advanced SEO | | M_D_Golden_Peak0 -
Better to re-direct to a completely un-related page or 404?
We have about 1000 pages we need to eliminate from our site (of about 18000 URLs). these URLs don't see a ton of traffic, but may have some valuable links. Would we be better to 404 these or re-direct them to our homepage? Could re-directing to our homepage hurt us?
Intermediate & Advanced SEO | | nicole.healthline0