Thousands of 404 Pages Indexed - Recommendations?
-
Background: I have a newly acquired client who has had a lot of issues over the past few months.
He had a major issue with broken dynamic URLs that created infinite loops through redirects and relative links. His previous SEO didn't pay attention to the sitemaps created by a backend generator, and hundreds of thousands of useless pages ended up indexed.
These useless pages all served a 404 error page that returned a 200 status instead of a 404 (a classic soft 404), which created a ton of duplicate content and bad relative links.
Now here I am, cleaning up this mess. I've fixed the error page so it returns a true 404 server response, and Google Webmaster Tools is now reporting thousands of "not found" errors, which is a great start. I've fixed all the site errors that caused infinite redirects, and I've cleaned up the sitemap and resubmitted it.
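For anyone double-checking the same kind of fix, a small script can confirm the formerly soft-404 URLs now return a real 404. This is only a rough sketch (Python 3 with the requests library assumed); the input file name is a placeholder for wherever you export the dead URLs, e.g. the "Not found" report:

```python
# Minimal sketch: verify that dead URLs now return a real 404 instead of a
# soft 404 (200). Assumes Python 3 with the "requests" library installed.
import requests

# Placeholder file: one dead URL per line, e.g. exported from the
# "Not found" report in Webmaster Tools.
with open("not_found_urls.txt") as f:
    dead_urls = [line.strip() for line in f if line.strip()]

for url in dead_urls:
    status = requests.get(url, allow_redirects=False, timeout=10).status_code
    if status == 200:
        print(f"still a soft 404 (returns 200): {url}")
    elif status not in (404, 410):
        print(f"unexpected status {status}: {url}")
```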
When I search site:www.(domainname).com, I still get an insane number of pages that no longer exist.
My question: how does Google handle all of these 404s? My client wants all the bad pages removed now, but I don't have that much control over it. Getting Google to drop pages that return a 404 is a slow process, and his rankings are still falling.
Is there a way of speeding up the process? It's not reasonable to enter tens of thousands of pages into the URL Removal Tool.
I want to clean house and have Google just index the pages in the sitemap.
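To make sure the resubmitted sitemap only hands Google live pages, that check can also be scripted. Again just a rough sketch under assumptions (Python 3 with requests; the sitemap URL is a placeholder):

```python
# Minimal sketch: confirm every URL in the sitemap returns 200.
# The sitemap URL below is a placeholder for the real one.
import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
for loc in root.findall(".//sm:loc", NS):
    url = loc.text.strip()
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        print(f"{status}  {url}")  # fix or drop anything listed here
```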
-
Yeah, all of the 301s are done, but I am trying to get around submitting tens of thousands of URLs to the URL removal tool.
-
Make sure you pay special attention to implementing rel canonical correctly. As Matt Cutts explains (see the first link below): when rel canonical was first introduced, we wanted to be a little careful. We didn't want to open it up for potential abuse, so you could only use rel canonical within one domain. The only exception was that you could use it between IP addresses and domains.
But over time we didn't see people abusing it much. And if you think about it, if some malicious hacker has compromised your website, he's probably going to put malware on the page or do a 301 redirect. He's probably not patient enough to add a rel canonical and then wait for the page to be re-crawled and re-indexed.
So there didn't seem to be a lot of abuse. Most webmasters use rel canonical in really smart ways, and we didn't see a lot of people accidentally shooting themselves in the foot, which is something we do have to worry about. So a little while after rel canonical was introduced, we added the ability to do cross-domain rel canonical.
It works essentially like a 301 redirect. If you can do a 301 redirect, that is still preferred, because every search engine knows how to handle those, and new search engines will also know how to process 301s and permanent redirects.
But we do take a rel canonical, and if it's on one domain and points to another domain, we will typically honor that. We always reserve the right to hold back if we think the webmaster is doing something wrong or making a mistake, but in general we will almost always abide by it.
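For reference, the tag itself is a single line in the page head, e.g. <link rel="canonical" href="https://www.example.com/preferred-page/" />. If it helps to audit which canonical (if any) each page declares, here is a rough sketch (Python standard library plus requests; the page list is a placeholder):

```python
# Minimal sketch: report the rel=canonical target declared by each page.
# Assumes Python 3 with "requests"; the page list below is a placeholder.
from html.parser import HTMLParser
import requests

class CanonicalFinder(HTMLParser):
    """Collects href values from <link rel="canonical"> tags."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel", "").lower() == "canonical":
            self.canonicals.append(attrs.get("href"))

pages = ["https://www.example.com/some-page"]  # placeholder URLs
for url in pages:
    finder = CanonicalFinder()
    finder.feed(requests.get(url, timeout=10).text)
    print(url, "->", finder.canonicals or "no canonical declared")
```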
Hope that helps.
I have a client who unfortunately had a dispute with her prior IT person, and that person made a mess of the site. Cleanup is not the quickest thing, and I agree that 301 redirects are by far the quickest way to go about it. If you're getting 404 errors on pages that are passing link juice, you'll want to redirect those scattered around the website to the most relevant live page (rough sketch below).
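As an illustration of that redirect mapping (not your actual stack), here's a rough sketch in Python/Flask; the paths are hypothetical, and in production the mapping would normally live in the web server or CMS configuration:

```python
# Minimal sketch: 301-redirect dead URLs that still have inbound links to
# the most relevant live page. Paths are hypothetical; Flask is assumed.
from flask import Flask, redirect

app = Flask(__name__)

# old path -> most relevant current page
REDIRECT_MAP = {
    "/old-promo-page": "/current-offers",
    "/products/discontinued-widget": "/products/widgets",
}

@app.route("/<path:old_path>")
def legacy_redirect(old_path):
    target = REDIRECT_MAP.get("/" + old_path)
    if target:
        return redirect(target, code=301)  # permanent: passes link equity
    return "Not Found", 404  # anything unmapped should be a real 404
```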
http://jamesmartell.com/matt-cutts/how-does-google-handle-not-found-pages-that-do-not-return-a-404/
http://www.seroundtable.com/404-links-google-15427.html
http://support.google.com/customsearch/bin/topic.py?hl=en&topic=11493&parent=1723950&ctx=topic
https://developers.google.com/custom-search/docs/indexing
https://developers.google.com/custom-search/docs/api
I hope I was of help to you,
Thomas
-
Have you 301-redirected the dead URLs to appropriate landing pages? After redirecting, use the URL removal tool. It works great for me: it showed results within 24 hours and removed every URL I submitted from Google's index.