404 page not found after site migration
-
Hi,
A question from our developer.
We have an issue in Google Webmaster Tools.
A few months ago we killed off one of our e-commerce sites and set up another to replace it. The new site uses different software on a different domain. I set up a mass 301 redirect that would redirect any URLs to the new domain, so domain-one.com/product would redirect to domain-two.com/product. As it turns out, the new site doesn’t use the same URLs for products as the old one did, so I deleted the mass 301 redirect.
We’re getting a lot of URLs showing up as 404 not found in Webmaster Tools. These URLs used to exist on the old site and were linked to from the old sitemap. Even URLs that have only recently shown up as 404 are reported as linked from the old sitemap. The old sitemap no longer exists and has been returning a 404 error for some time now. Normally I would set up a 301 redirect for each one and mark them as fixed, but there are almost a quarter of a million URLs returning 404 errors, and the number is rising.
I’m sure there are some genuine problems that need sorting out in that list, but I just can’t see them under the mass of errors for pages that have been redirected from the old site. Because of this, I’m reluctant to set up a robots file that disallows all of the 404 URLs.
The old site is no longer in the index. Searching Google for site:domain-one.com returns no results.
Ideally, I’d like anything that was linked from the old sitemap to be removed from Webmaster Tools and for Google to stop attempting to crawl those pages.
Thanks in advance.
-
I agree that the 301 redirect would be your best option, as it sends not only users but also the bots to the right page. You may need to get a developer in to write some regular expressions that parse the incoming request and automatically find the correct new URL. I have worked on sites with a large number of pages, and some sort of automation is the only way to go.
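As a rough idea of what that automation could look like, here is a minimal Python sketch. The URL patterns and the target layout on domain-two.com are made up for illustration, since the thread doesn't say how the old and new product URLs differ; real rules would have to come from the two sites' actual URL schemes.

```python
import re
from typing import Optional

# Hypothetical rewrite rules: (pattern for the old path, template for the new URL).
# The real patterns depend on how the old and new shops build their URLs.
REDIRECT_RULES = [
    (re.compile(r"^/product/(?P<slug>[a-z0-9-]+)/?$"),
     "https://domain-two.com/shop/{slug}"),
    (re.compile(r"^/category/(?P<name>[a-z0-9-]+)/page/(?P<n>\d+)/?$"),
     "https://domain-two.com/c/{name}?page={n}"),
]


def new_url_for(old_path: str) -> Optional[str]:
    """Return the 301 target for an old path, or None if no rule matches."""
    for pattern, template in REDIRECT_RULES:
        match = pattern.match(old_path)
        if match:
            # Fill the template with the named groups captured from the old path.
            return template.format(**match.groupdict())
    return None  # no mapping known: let the server answer 404 or 410
```

Paths that return None are the ones no rule covers, so logging them is one way to see which patterns are still missing.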
That said, if you simply want to kill the old URLs, you can serve 404s or 410s. As you mention, you then end up with a bunch of 404 errors in GWT. I have been there too; it's damned if you do, damned if you don't. We had some tracking URLs from an old site, and a year later (we have been serving 410s on them for over a year) they still show up in GWT as errors.
We are trying a new solution for removing these URLs from the index without getting 404 errors. We return a 200 and serve a minimal HTML page with the meta robots noindex tag.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. "
So, we allow Google to find the page, get a 200 (so no 404 errors), but then use the meta noindex tag to tell Google to remove it from the index and stop crawling the page.
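A minimal Python sketch of that 200-plus-noindex response, kept as a plain function so it can be wired into whatever server or framework you use. The `/old-tracking/` prefix and the function names are hypothetical; only the status code and the meta robots tag come from the approach described above.

```python
from typing import Dict, Tuple

# Minimal page body; the robots meta tag is what asks for de-indexing.
NOINDEX_PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head>"
    '<meta name="robots" content="noindex">'
    "<title>Page removed</title>"
    "</head><body><p>This page has been removed.</p></body></html>\n"
)


def is_retired(path: str) -> bool:
    """Hypothetical rule: everything under /old-tracking/ is retired."""
    return path.startswith("/old-tracking/")


def response_for(path: str) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a request path."""
    if is_retired(path):
        # 200 instead of 404/410, with noindex in the page itself.
        headers = {"Content-Type": "text/html; charset=utf-8"}
        return 200, headers, NOINDEX_PAGE
    # Anything else is outside this sketch; answer with a plain 404.
    return 404, {"Content-Type": "text/plain; charset=utf-8"}, "Not Found\n"
```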
Remember, this is the "nuclear" option. You only want to do this to remove the pages from the Google index. Someone mentioned using GWT to remove URLs, but if I remember correctly, you only have so many pages you can do this with at a time.
If you list the files in robots.txt, Google will not spider them, but if you later remove them from robots.txt, Google will start trying to spider them again. I have seen Google come back a year later for URLs once I took them out of robots.txt. This is what happened to us, so we tried just serving the 410/404, but Google still kept crawling. We recently moved to the 200 plus meta noindex option, and it seems to be working.
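If you do go the robots.txt route, it is worth checking that the rules block what you think they block before relying on them. A small Python sketch using the standard library's `urllib.robotparser`; the robots.txt content and the `/old-tracking/` path are hypothetical examples, not our real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a retired section of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-tracking/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# False: the path falls under the Disallow rule, so a compliant bot skips it.
retired_ok = parser.can_fetch("Googlebot", "https://domain-one.com/old-tracking/abc123")

# True: nothing in the file blocks this path.
product_ok = parser.can_fetch("Googlebot", "https://domain-one.com/product/widget")
```

This only tests the rules as the Python parser reads them; Google's own matching may differ in edge cases.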
Good luck!
-
You can, but the 404s should stop being crawled on their own. There's also a tool in Webmaster Tools that you can use to make that happen faster:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=64033
-
Yeah it's a 404 http://www.tester.co.uk/17th-edition-equipment/multifunction-testers/fluke-1651b-multifunction-installation-tester
With over 200,000 404s, it's a lot to go through and 301. For some reason, when it got migrated, they just pointed each old URL at a new one by replacing the root domain name, without creating matching URLs. Doh.
I was thinking about listing them all in robots.txt?
-
A 404 should cause Google to de-index the content. Go to one of the bad URLs and view the headers to make sure that your webserver is returning a status 404 and not just a 404 "page".
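One way to run that header check in bulk is a short Python script. This is a sketch under assumptions: the phrases used to spot an error page served with a 200 are guesses and would need tuning to your own templates, and `fetch_status` deliberately does not follow redirects so a 301 shows up as a 301.

```python
import http.client
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical phrases that suggest an error page; adjust to your templates.
ERROR_PHRASES = ("page not found", "404 not found")


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """GET the URL without following redirects; return (status, body text)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
    finally:
        conn.close()


def classify(status: int, body: str) -> str:
    """Label a response as a real 404/410, an error page sent with 200, or other."""
    if status in (404, 410):
        return "hard 404"
    if status == 200 and any(phrase in body.lower() for phrase in ERROR_PHRASES):
        return "soft 404"
    return "other"
```

Usage would be along the lines of `classify(*fetch_status(url))` for each URL in the error report; anything that comes back as "soft 404" is a 404 "page" without the 404 status.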
As hard and time-consuming as it might be, I would still pursue a 301 option. It's the cleanest way to resolve the issue. Just start nibbling at it and you can make a dent. Doing nothing just lets the problem grow.