404 page not found after site migration
-
Hi,
A question from our developer.
We have an issue in Google Webmaster Tools.
A few months ago we killed off one of our e-commerce sites and set up another to replace it. The new site uses different software on a different domain. I set up a mass 301 redirect that sent every URL to the new domain, so domain-one.com/product would redirect to domain-two.com/product. As it turns out, the new site doesn't use the same product URLs as the old one did, so I deleted the mass redirect.
We're getting a lot of URLs showing up as 404 Not Found in Webmaster Tools. These URLs used to exist on the old site and were linked from the old sitemap. Even URLs that have only recently started showing up as 404s are reported as linked from the old sitemap, though the sitemap itself no longer exists and has been returning a 404 for some time now. Normally I would set up a 301 redirect for each one and mark it as fixed, but almost a quarter of a million URLs are returning 404 errors, and the number is rising.
I'm sure there are some genuine problems in that list that need sorting out, but I just can't see them under the mass of errors for pages that used to be redirected from the old site. Because of this, I'm reluctant to set up a robots.txt file that disallows all of the 404 URLs.
The old site is no longer in the index; searching Google for site:domain-one.com returns no results.
Ideally, I'd like everything that was linked from the old sitemap to be removed from Webmaster Tools, and for Google to stop attempting to crawl those pages.
Thanks in advance.
-
I agree that the 301 redirect would be your best option, as it passes not only users but also the bots along to the right page. You may need to get a developer in to write some regular expressions that parse each incoming request and automatically work out the correct new URL. I have worked on sites with a large number of pages, and some sort of automation is the only way to go.
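If the two URL schemes differ in a predictable way, that remapping can be scripted. Here's a minimal Python sketch of the idea; the patterns and the domain-two.com URL templates below are made-up placeholders, since the real rules depend entirely on how the product URLs changed between the two sites:

```python
import re

# Hypothetical mapping rules: (old-URL pattern, new-URL template).
# The real patterns depend on how the two sites' URL schemes differ.
REDIRECT_RULES = [
    # e.g. /products/123-blue-widget -> https://domain-two.com/shop/blue-widget
    (re.compile(r"^/products/\d+-(?P<slug>[\w-]+)$"),
     r"https://domain-two.com/shop/\g<slug>"),
    # e.g. /category/tools/item/456 -> https://domain-two.com/tools?item=456
    (re.compile(r"^/category/(?P<cat>[\w-]+)/item/(?P<id>\d+)$"),
     r"https://domain-two.com/\g<cat>?item=\g<id>"),
]

def resolve_redirect(old_path):
    """Return the new URL for a legacy path, or None if no rule matches
    (unmatched paths fall through to your 404/410 handling)."""
    for pattern, template in REDIRECT_RULES:
        if pattern.match(old_path):
            return pattern.sub(template, old_path)
    return None

print(resolve_redirect("/products/123-blue-widget"))
# -> https://domain-two.com/shop/blue-widget
```

Your web server or framework would call something like resolve_redirect() on each incoming request and issue a 301 whenever it returns a URL.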
That said, if you simply want to kill the old URLs, you can serve 404s or 410s. But as you mention, you then end up with a pile of 404 errors in GWT. I have been there too; it's damned if you do, damned if you don't. We had some tracking URLs from an old site, and a year on (we have been serving 410s on those old tracking URLs the whole time) they still show up in GWT as errors.
We are trying a new approach for removing these URLs from the index without racking up 404 errors: we serve a 200 along with a minimal HTML page that carries the meta robots noindex tag.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. "
So, we allow Google to find the page, get a 200 (so no 404 errors), but then use the meta noindex tag to tell Google to remove it from the index and stop crawling the page.
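For anyone who wants to try the same thing, here's a rough sketch of that 200-plus-noindex setup as a tiny Python app (standard library only); the retired paths are hypothetical, and in practice the list would come from the old sitemap or a database export:

```python
from wsgiref.simple_server import make_server

# Hypothetical set of retired paths; load these from the old sitemap
# or a database export in practice.
RETIRED_PATHS = {"/old-product-1", "/old-product-2"}

NOINDEX_PAGE = (b"<!DOCTYPE html><html><head>"
                b'<meta name="robots" content="noindex">'
                b"<title>Page retired</title></head>"
                b"<body><p>This product is no longer available.</p></body></html>")

def app(environ, start_response):
    if environ["PATH_INFO"] in RETIRED_PATHS:
        # Serve a 200 so no crawl error is logged in GWT, and let the
        # meta robots noindex tag tell Google to drop the page.
        start_response("200 OK", [("Content-Type", "text/html")])
        return [NOINDEX_PAGE]
    # Everything else behaves normally.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```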
Remember, this is the "nuclear" option: you only want to do this when the goal is to remove the pages from the Google index entirely. Someone mentioned using GWT's URL removal tool, but if I remember correctly, you can only remove so many pages at a time that way.
If you list the files in robots.txt, Google will not spider them, but if you later take them out of the robots.txt file, it will start trying to spider them again. I have seen Google come back a year later on URLs once I removed them from robots.txt. That is what happened to us, so we tried just serving 410s/404s, but Google still kept crawling. We recently moved to the 200-plus-noindex-meta option, and it seems to be working.
Good luck!
-
You can, but the 404s should stop being crawled on their own. There's also a Webmaster Tools feature you can use to make that happen faster:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=64033
-
Yeah, it's a 404: http://www.tester.co.uk/17th-edition-equipment/multifunction-testers/fluke-1651b-multifunction-installation-tester
With over 200,000 404s, it's a lot to go through and 301. For some reason, when the site got migrated, they just pointed each old URL at a new one by swapping out the root domain name, without creating matching URLs. Doh.
I was thinking about blocking them all in robots.txt?
-
A 404 should cause Google to de-index the content. Go to one of the bad URLs and view the response headers to make sure your webserver is returning an actual 404 status code and not just a 404 "page" served with a 200.
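If you'd rather script that check than eyeball headers in a browser, a quick standard-library Python loop like the one below will print the real status codes; the URL in the list is just a placeholder for a sample of the URLs GWT is reporting:

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit

def status_of(url):
    """Fetch only the headers and return the real HTTP status code."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()

# Placeholder URL; substitute a sample of the URLs GWT is reporting.
for url in ["http://domain-one.com/some-dead-product"]:
    print(url, status_of(url))  # anything other than a 404/410 needs a look
```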
As hard and time-consuming as it might be, I would still pursue the 301 option. It's the cleanest way to resolve the issue. Just start nibbling at it and you'll make a dent; doing nothing only lets the problem grow.