404 page not found after site migration
-
Hi,
A question from our developer.
We have an issue in Google Webmaster Tools.
A few months ago we killed off one of our e-commerce sites and set up another to replace it. The new site uses different software on a different domain. I set up a mass 301 redirect that would redirect any URLs to the new domain, so domain-one.com/product would redirect to domain-two.com/product. As it turns out, the new site doesn’t use the same URLs for products as the old one did, so I deleted the mass 301 redirect.
We’re getting a lot of URLs showing up as 404 not found in Webmaster Tools. These URLs used to exist on the old site and were linked to from the old sitemap. Even URLs that have only recently shown up as 404s are reported as being linked to from the old sitemap. The old sitemap no longer exists and has been returning a 404 error for some time now. Normally I would set up 301 redirects for each one and mark them as fixed, but there are almost a quarter of a million URLs returning 404 errors, and the number is rising.
I’m sure there are some genuine problems that need sorting out in that list, but I just can’t see them under the mass of errors for pages that have been redirected from the old site. Because of this, I’m reluctant to set up a robots file that disallows all of the 404 URLs.
The old site is no longer in the index. Searching Google for site:domain-one.com returns no results.
Ideally, I’d like anything that was linked from the old sitemap to be removed from Webmaster Tools and for Google to stop attempting to crawl those pages.
Thanks in advance.
-
I agree that the 301 redirect would be your best option, as it sends not only users but also the bots to the right page. You may need to get a developer in to write some regular expressions that parse the incoming request and automatically find the correct new URL. I have worked on sites with a large number of pages, and some sort of automation is the only way to go.
That said, if you simply want to kill the old URLs you can show 404s or 410s. As you mention, you then end up with a bunch of 404 errors in GWT. I have been there too; it's damned if you do, damned if you don't. We had some tracking URLs from an old site, and a year later (we have been showing 410s on those old tracking URLs for over a year) they still show up in GWT as errors.
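To make the regular-expression idea concrete, here is a minimal sketch in Python. The URL patterns, the new path layout, and the https scheme are all hypothetical; the thread does not say what either site's URLs actually look like, so treat the rules as placeholders for whatever mapping applies.

```python
import re
from typing import Optional

# Assumption: the new site lives on domain-two.com and is served over https.
NEW_HOST = "https://domain-two.com"

# Hypothetical rules. Each pairs a pattern for an old path with a template
# for the new path. Replace these with the real old and new URL structures.
REDIRECT_RULES = [
    (re.compile(r"^/product/(?P<slug>[a-z0-9-]+)/?$"), r"/shop/\g<slug>"),
    (re.compile(r"^/category/(?P<name>[a-z0-9-]+)/page/(?P<n>\d+)/?$"),
     r"/c/\g<name>?page=\g<n>"),
]


def map_old_url(path: str) -> Optional[str]:
    """Return the 301 target for an old path, or None if no rule matches."""
    for pattern, template in REDIRECT_RULES:
        match = pattern.match(path)
        if match:
            # expand() fills the named groups into the template.
            return NEW_HOST + match.expand(template)
    return None
```

A request handler on the old domain could call map_old_url() and issue a 301 when it returns a URL, falling back to a 404 or 410 when it returns None. A handful of rules like these can cover a large share of a quarter of a million URLs if the old paths follow a regular structure.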
We are trying a new solution for removing these URLs from the index without getting 404 errors. We return a 200 and serve a minimal HTML page with the meta robots noindex tag.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it."
So, we allow Google to find the page, get a 200 (so no 404 errors), but then use the meta noindex tag to tell Google to remove it from the index and stop crawling the page.
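As a rough illustration of the 200-plus-noindex approach, here is a minimal WSGI sketch in Python. The page wording and the app name are made up for the example, and it is not the code we actually run; any server stack that can return a 200 with this meta tag would do the same job.

```python
# Minimal placeholder page. The only part that matters to the crawler is the
# meta robots tag; the title and body text are illustrative.
NOINDEX_PAGE = b"""<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex">
<title>Page removed</title>
</head>
<body><p>This page is no longer available.</p></body>
</html>
"""


def retired_url_app(environ, start_response):
    """WSGI app that answers every request with a 200 and the noindex page."""
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(NOINDEX_PAGE))),
    ])
    return [NOINDEX_PAGE]
```

In practice the app would be mounted only on the retired URL patterns, so that live pages are never served the noindex tag by mistake.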
Remember, this is the "nuclear" option. You only want to do this to remove the pages from the Google index. Someone mentioned using GWT to remove URLs, but if I remember correctly, you only have so many pages you can do this with at a time.
If you list the files in robots.txt, Google will not spider them, but if you later remove them from robots.txt, Google will start trying to spider them again. I have seen Google come back a year later on URLs when I took them out of robots.txt. This is what happened to us, so we tried just showing the 410/404, but Google still kept crawling. We recently moved to the 200/noindex meta option and it seems to be working.
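For anyone unsure what a robots.txt rule actually blocks, this small sketch uses Python's standard urllib.robotparser to evaluate a rule locally. The /old-tracking/ path is a hypothetical stand-in for the retired URLs, and the parser only approximates how a crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one retired section of the old site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-tracking/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawling is refused under the disallowed prefix and allowed elsewhere.
blocked = not parser.can_fetch("Googlebot", "https://domain-one.com/old-tracking/abc")
allowed = parser.can_fetch("Googlebot", "https://domain-one.com/product/abc")
```

Note that the rule only stops crawling while it is present, which matches what we saw: once the Disallow line is removed, the URLs become fetchable again.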
Good luck!
-
You can, but the 404s should stop being crawled on their own. There's also a webmaster tool you can use to make that happen faster:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=64033
-
Yeah, it's a 404: http://www.tester.co.uk/17th-edition-equipment/multifunction-testers/fluke-1651b-multifunction-installation-tester
With over 200,000 404s, it's a lot to go through and 301. For some reason, when the site got migrated they just pointed the old URLs at the new one, replacing the root domain name without creating matching URLs. Doh.
I was thinking about putting them all in robots.txt?
-
A 404 should cause Google to de-index the content. Go to one of the bad URLs and view the headers to make sure that your webserver is returning a status 404 and not just a 404 "page".
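If you would rather script the header check than do it by hand, here is a minimal sketch using Python's standard urllib. The function names are my own, and it sends a HEAD request on the assumption that your server answers HEAD the same way it answers GET.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a URL.

    urlopen follows redirects, so a 301 chain reports the status of the
    page it ends on rather than the 301 itself.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return error.code


def is_gone(url: str) -> bool:
    """True when the server reports a real 404 or 410 status for the URL."""
    return fetch_status(url) in (404, 410)
```

A URL that shows a "not found" page but makes fetch_status() return 200 is serving a 404 "page" without the 404 status, which is the case to watch for.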
As hard and time-consuming as it might be, I would still pursue a 301 option. It's the cleanest way to resolve the issue. Just start nibbling at it and you can make a dent. Doing nothing just lets the problem grow.