Killing 404 errors on our site in Google's index
-
Having moved a site across to Magento, obviously re-directs were a large part of that, ensuring all the old products and categories linked up correctly with the new site structure.
However, we came up against an issue where we needed to add, delete, then re-add products. This, coupled with a misunderstanding of the csv upload processing, meant that although the old urls redirected, some of the new Magento urls changed and then didn't redirect:
For Example:
mysite/product
would get deleted re-added and become:
mysite/product-1324
We now know what we did wrong to ensure it doesn't continue to happen if we weret o delete and re-add a product, but Google contains all these old URLs in its index which has caused people to search for products on Google, click through, then land on the 404 page - far from ideal.
We kind of assumed, with continual updating of sitemaps and time, that Google would realise and update the URL accordingly. But this hasn't happened - we are still getting plenty of 404 errors on certain product searches (These aren't appearing in SEOmoz, there are no links to the old URL on the site, only Google, as the index contains the old URL).
Aside from going through and finding the products affected (no easy task), and setting up redirects for each one, is there any way we can tell Google 'These URLs are no longer a thing, forget them and move on, let's make a fresh start and Happy New Year'?
-
No canonical back to the main product page?
-
Both helpful replies thanks. Further investigation led me to this Magento Bug:
http://www.magentocommerce.com/bug-tracking/issue/?issue=13662
(Need to have a magneto account to see the bug report).
Seems there's a spearate underlying issue which we need to fix first - the rewrite table grows exponentially every time we index Magento and creates a new URL for every configurable product. i.e. a product that has one or more associated products that will have the same name - used for displaying different sizes and colours. This means that Google is picking up a new page for each configurable product each time it indexes: different URL, same content, same product sku - a technical SEO nightmare!
-
Hey Sean
This should take care of itself but there are a few things you can do to help.
**1. **Firstly, using webbug or some such, just make sure the page is returning a HTTP 404 or 410 code to ensure that whilst it may be displaying some kind of 404 like page, that it is actually sending the 4XX code back to Google (so they can update this and remove them).
2. Then, you can log into webmaster tools and remove URLs from your site:
Webmaster Tools > Optimisation > Remove URLs
This way you can manually remove them.
Alternatively, you could always just manually add some 301 redirects for those pages which may be the quickest way to sort this out and certainly provides the best experience for any users clicking on those links in the SERPs.
Hope that helps!
Marcus -
complex thing. Not sure if this may help you or not -
Example meta tag
Add the following meta tag in the HTML source of your page:
<meta http-equiv="expires" content="mon, 27 sep 2010 14:30:00 GMT">
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Detecting Real Page as Soft 404 Error
We've migrated my site from HTTP to HTTPS protocols in Sep 2017 but I noticed after migration soft 404 granularly increasing. Example of soft 404 page: https://bit.ly/2xBjy4J But these soft 404 error pages are real pages but Google still detects them as soft 404. When I checked the Google cache it shows me the cache but with HTTP page. We've tried all possible solutions but unable to figure out why Google is still indexing to HTTP pages and detecting HTTPS pages as soft 404 error. Can someone please suggest a solution or possible cause for this issue or anyone same issue like this in past.
Intermediate & Advanced SEO | | bheard0 -
Moved company 'Help Center' from Zendesk to Intercom, got lots of 404 errors. What now?
Howdy folks, excited to be part of the Moz community after lurking for years! I'm a few weeks into my new job (Digital Marketing at Rewind) and about 10 days ago the product team moved our Help Center from Zendesk to Intercom. Apparently the import went smoothly, but it's caused one problem I'm not really sure how to go about solving: https://help.rewind.io/hc/en-us/articles/*** is where all our articles used to sit https://help.rewind.io/*** is where all our articles now are So, for example, the following article has now moved as such: https://help.rewind.io/hc/en-us/articles/115001902152-Can-I-fast-forward-my-store-after-a-rewind- https://help.rewind.io/general-faqs-and-billing/frequently-asked-questions/can-i-fast-forward-my-store-after-a-rewind This has created a bunch of broken URLs in places like our Shopify/BigCommerce app listings, in our email drips, and in external resources etc. I've played whackamole cleaning many of these up, but these old URLs are still indexed by Google – we're up to 475 Crawl Errors in Search Console over the past week, all of which are 404s. I reached out to Intercom about this to see if they had something in place to help, but they just said my "best option is tracking down old links and setting up 301 redirects for those particular addressed". Browsing the Zendesk forms turned up some relevant-ish results, with the leading recommendation being to configure javascript redirects in the Zendesk document head (thread 1, thread 2, thread 3) of individual articles. I'm comfortable setting up 301 redirects on our website, but I'm in a bit over my head in trying to determine how I could do this with content that's hosted externally and sitting on a subdomain. I have access to our Zendesk admin, so I can go in and edit stuff there, but don't have experience with javascript redirects and have read that they might not be great for such a large scale redirection. Hopefully this is enough context for someone to provide guidance on how you think I should go about fixing things (or if there's even anything for me to do) but please let me know if there's more info I can provide. Thanks!
Intermediate & Advanced SEO | | henrycabrown1 -
Forwarded vanity domains, suddenly resolving to 404 with appended URL's ending in random 5 characters
We have several vanity domains that forward to various pages on our primary domain.
Intermediate & Advanced SEO | | SS.Digital
e.g. www.vanity.com (301)--> www.mydomain.com/sub-page (200) These forwards have been in place for months or even years and have worked fine. As of yesterday, we have seen the following problem. We have made no changes in the forwarding settings. Now, inconsistently, they sometimes resolve and sometimes they do not. When we load the vanity URL with Chrome Dev Tools (Network Pane) open, it shows the following redirect chains, where xxxxx represents a random 5 character string of lower and upper case letters. (e.g. VGuTD) EXAMPLE:
www.vanity.com (302, Found) -->
www.vanity.com/xxxxx (302, Found) -->
www.vanity.com/xxxxx (302, Found) -->
www.vanity.com/xxxxx/xxxxx (302, Found) -->
www.mydomain.com/sub-page/xxxxx (404, Not Found) This is just one example, the amount of redirects, vary wildly. Sometimes there is only 1 redirect, sometimes there are as many as 5. Sometimes the request will ultimately resolve on the correct mydomain.com/sub-page, but usually it does not (as in the example above). We have cross-checked across every browser, device, private/non-private, cookies cleared, on and off of our network etc... This leads us to believe that it is not at the device or host level. Our Registrar is Godaddy. They have not encountered this issue before, and have no idea what this 5 character string is from. I tend to believe them because per our analytics, we have determined that this problem only started yesterday. Our primary question is, has anybody else encountered this problem either in the last couple days, or at any time in the past? We have come up with a solution that works to alleviate the problem, but to implement it across hundreds of vanity domains will take us an inordinate amount of time. Really hoping to fix the cause of the problem instead of just treating the symptom.0 -
Product Pages not indexed by Google
We built a website for a jewelry company some years ago, and they've recently asked for a meeting and one of the points on the agenda will be why their products pages have not been indexed. Example: http://rocks.ie/details/Infinity-Ring/7170/ I've taken a look but I can't see anything obvious that is stopping pages like the above from being indexed. It has a an 'index, follow all' tag along with a canonical tag. Am I missing something obvious here or is there any clear reason why product pages are not being indexed at all by Google? Any advice would be greatly appreciated. Update I was told 'that each of the product pages on the full site have corresponding page on mobile. They are referred to each other via cannonical / alternate tags...could be an angle as to why product pages are not being indexed.'
Intermediate & Advanced SEO | | RobbieD910 -
How to make Google index your site? (Blocked with robots.txt for a long time)
The problem is the for the long time we had a website m.imones.lt but it was blocked with robots.txt.
Intermediate & Advanced SEO | | FCRMediaLietuva
But after a long time we want Google to index it. We unblocked it 1 week or 8 days ago. But Google still does not recognize it. I type site:m.imones.lt and it says it is still blocked with robots.txt What should be the process to make Google crawl this mobile version faster? Thanks!0 -
What to do when you buy a Website without it's content which has a few thousand pages indexed?
I am currently considering buying a Website because I would like to use the domain name to build my project on. Currently that domain is in use and that site has a few thousand pages indexed and around 30 Root domains linking to it (mostly to the home page). The topic of the site is not related to what I am planing to use it for. If there is no other way, I can live with losing the link juice that the site is getting at the moment, however, I want to prevent Google from thinking that I am trying to use the power for another, non related topic and therefore run the risk of getting penalized. Are there any Google guidelines or best practices for such a case?
Intermediate & Advanced SEO | | MikeAir0 -
How does Google index pagination variables in Ajax snapshots? We're seeing random huge variables.
We're using the Google snapshot method to index dynamic Ajax content. Some of this content is from tables using pagination. The pagination is tracked with a var in the hash, something like: #!home/?view_3_page=1 We're seeing all sorts of calls from Google now with huge numbers for these URL variables that we are not generating with our snapshots. Like this: #!home/?view_3_page=10099089 These aren't trivial since each snapshot represents a server load, so we'd like these vars to only represent what's returned by the snapshots. Is Google generating random numbers going fishing for content? If so, is this something we can control or minimize?
Intermediate & Advanced SEO | | sitestrux0 -
Can links indexed by google "link:" be bad? or this is like a good example by google
Can links indexed by google "link:" be bad? Or this is like a good example shown by google. We are cleaning our links from Penguin and dont know what to do with these ones. Some of them does not look quality.
Intermediate & Advanced SEO | | bele0