Killing 404 errors on our site in Google's index
-
We recently moved a site across to Magento. Redirects were obviously a large part of that, ensuring all the old products and categories linked up correctly with the new site structure.
However, we came up against an issue where we needed to add, delete, then re-add products. This, coupled with a misunderstanding of how the CSV upload is processed, meant that although the old URLs redirected, some of the new Magento URLs changed and then didn't redirect.
For example:
mysite/product
would get deleted, re-added, and become:
mysite/product-1324
We now know what we did wrong, so it shouldn't keep happening if we were to delete and re-add a product. But Google still holds all these old URLs in its index, so people search for products on Google, click through, and land on the 404 page - far from ideal.
We kind of assumed that, with continual updating of sitemaps and time, Google would realise and update the URLs accordingly. But this hasn't happened - we are still getting plenty of 404 errors on certain product searches. (These aren't appearing in SEOmoz because there are no links to the old URLs on the site. Only Google has them, as its index contains the old URLs.)
Aside from going through and finding the products affected (no easy task), and setting up redirects for each one, is there any way we can tell Google 'These URLs are no longer a thing, forget them and move on, let's make a fresh start and Happy New Year'?
-
No canonical back to the main product page?
-
Both helpful replies, thanks. Further investigation led me to this Magento bug:
http://www.magentocommerce.com/bug-tracking/issue/?issue=13662
(You need a Magento account to see the bug report.)
Seems there's a separate underlying issue which we need to fix first: the rewrite table grows exponentially every time we index Magento, and a new URL is created for every configurable product (i.e. a product that has one or more associated products with the same name, used for displaying different sizes and colours). This means that Google is picking up a new page for each configurable product each time it indexes: different URL, same content, same product SKU - a technical SEO nightmare!
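To get a feel for how many products are affected, something like the following might help. It is only a rough sketch: it assumes you can export (SKU, URL path) pairs from somewhere (a crawl, a sitemap, or the rewrite table), and the function name and sample rows are made up for illustration, not anything Magento provides.

```python
from collections import defaultdict


def duplicate_urls_by_sku(rows):
    """Group URL paths by SKU and keep only SKUs that have more than one path.

    `rows` is an iterable of (sku, path) pairs from a hypothetical export.
    """
    urls_by_sku = defaultdict(set)
    for sku, path in rows:
        urls_by_sku[sku].add(path)
    # A SKU with a single path is fine; more than one means duplicate pages.
    return {
        sku: sorted(paths)
        for sku, paths in urls_by_sku.items()
        if len(paths) > 1
    }


# Illustrative data only.
sample_rows = [
    ("TSHIRT-01", "/t-shirt"),
    ("TSHIRT-01", "/t-shirt-1324"),
    ("MUG-02", "/mug"),
    ("MUG-02", "/mug"),
]
duplicates = duplicate_urls_by_sku(sample_rows)
```

Any SKU that shows up in the result has more than one URL pointing at the same product, which is the duplicate-content situation described above.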
-
Hey Sean
This should take care of itself but there are a few things you can do to help.
1. Firstly, using webbug or some such, make sure the page is returning an HTTP 404 or 410 code. It may be displaying some kind of 404-like page, but you need to be sure it is actually sending the 4XX code back to Google (so they can update their index and remove the URLs).
2. Then, you can log into webmaster tools and remove URLs from your site:
Webmaster Tools > Optimisation > Remove URLs
This way you can manually remove them.
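If you have a lot of URLs to check for step 1, a small script can stand in for a header checker. This is a minimal sketch using only the Python standard library; the function names are my own, and the example URL in the comment is a placeholder rather than a real page.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx code itself is reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10.0):
    """Send a HEAD request and return the raw HTTP status code for `url`."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 3xx (when not followed), 4xx and 5xx; the code is on the error.
        return err.code


def describe_status(code):
    """Translate a status code into what it means for a dead product URL."""
    if code in (404, 410):
        return "gone: Google can drop this URL"
    if code in (301, 308):
        return "permanent redirect"
    if 200 <= code < 300:
        return "live: if this shows an error page, it is a soft 404"
    return "other: check manually"


# Example (placeholder URL, needs network access):
# print(describe_status(fetch_status("http://www.example.com/product")))
```

The case to watch for is a 2xx result on a page that looks like a 404. That means the server is telling Google the page is fine.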
Alternatively, you could always just manually add some 301 redirects for those pages which may be the quickest way to sort this out and certainly provides the best experience for any users clicking on those links in the SERPs.
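Finding the affected products by hand is the painful part, so here is a rough sketch of how the pairing could be automated. It assumes the broken URLs differ from the live ones only by a numeric suffix (as in the mysite/product and mysite/product-1324 example), that you have a list of 404ing paths (e.g. from the crawl errors report) and a list of live paths (e.g. from the sitemap), and that paths have no file extension. The function names are hypothetical.

```python
import re

# Matches a trailing "-<digits>" such as the "-1324" in "/product-1324".
_NUMERIC_SUFFIX = re.compile(r"-\d+$")


def base_key(path):
    """Normalise a path by dropping a trailing slash and a numeric suffix."""
    return _NUMERIC_SUFFIX.sub("", path.rstrip("/"))


def build_redirects(dead_paths, live_paths):
    """Pair each dead path with the single live path that shares its base key.

    Returns (redirects, unmatched). Dead paths with zero or several
    candidates go to `unmatched` for manual review rather than being guessed.
    """
    live_by_key = {}
    for path in live_paths:
        live_by_key.setdefault(base_key(path), []).append(path)

    redirects = {}
    unmatched = []
    for dead in dead_paths:
        candidates = [p for p in live_by_key.get(base_key(dead), []) if p != dead]
        if len(candidates) == 1:
            redirects[dead] = candidates[0]
        else:
            unmatched.append(dead)
    return redirects, unmatched


# Illustrative data only.
redirects, unmatched = build_redirects(
    ["/product", "/widget", "/lamp"],
    ["/product-1324", "/widget-2", "/widget-7", "/chair"],
)
```

Be careful with products whose real URL ends in a number (a model number, say), since those would be stripped too. Worth eyeballing the output before loading it in as 301s.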
Hope that helps!
Marcus
-
Complex thing. Not sure if this will help you or not:
Example meta tag
Add the following meta tag in the HTML source of your page:
<meta http-equiv="expires" content="mon, 27 sep 2010 14:30:00 GMT">