Ecommerce SEO - Indexed product pages are returning 404s due to product database removal. HELP!
-
Hi all,
I recently took over an e-commerce start-up project from one of my co-workers (who left the job last week). This previous project manager had uploaded ~2000 products without setting up a robot.txt file, and as a result, all of the product pages were indexed by Google (verified via Google Webmaster Tool).
The problem came about when he deleted the entire product database from our hosting service, godaddy and performed a fresh install of Prestashop on our hosting plan. All of the created product pages are now gone, and I'm left with ~2000 broken URL's returning 404's. Currently, the site does not have any products uploaded. From my knowledge, I have to either:
- canonicalize the broken URLs to the new corresponding product pages,
or
- request that Google remove the broken URLs (I believe this is only a temporary solution, since Google honors URL removal requests for 90 days)
What is the best way to approach this situation? If I set up canonicalization, would I have to recreate the deleted pages (to match the URL addresses) and have those pages point to the new product pages via canonical tags?
Alex
-
Everett,
You're right on the money. I don't think you could have summarized my problem any better. I will take Dana's and your advice, let those pages sit "indexed" for a while, and serve a 404. According to GWT's Index Status, the product pages were indexed about a month ago, so I guess it won't hurt to wait a few more weeks until those pages drop out of Google's index naturally, especially since site development won't be done for another 6-7 weeks.
Thanks a bunch for all of your insights!
-
Right on, Everett. I agree 100%.
-
I want to make sure everyone, including myself, understands you, Alex. Correct me if I'm wrong, but you're saying that the website is totally new (a start-up) and nothing (at least nothing owned by the company you're with) has ever been on that domain name. While building the site, the previous guy accidentally allowed the development version of the site to be indexed, and/or allowed product pages that you don't want on the site at all to be indexed. Since it is a brand-new site, those "old" deleted pages didn't have any external links and didn't get any traffic from Google or elsewhere outside of the company.
IF that is the case, then you can probably just let those pages stay as 404s. Eventually, since nobody is linking to them, they will drop out of the index on their own.
I wouldn't use the URL removal tool in this case. For one thing, it is a dangerous tool, and if you don't have experience with this sort of thing, it could do more harm than good. It should only take a few weeks for URLs that were briefly live and indexed to go away, as long as you are serving a 404 or 410 HTTP response code on those URLs.
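For anyone following along, here is a minimal sketch of how you could serve an explicit 410 from the site's root .htaccess. This assumes Apache (which GoDaddy's shared hosting runs) and assumes the old PrestaShop product URLs share a common path prefix; the "product/" prefix below is hypothetical, so adjust it to match the actual URL structure:

```apache
# Return "410 Gone" instead of 404 for the deleted product URLs.
# A 410 tells Google the pages are gone for good, which can nudge
# them out of the index a bit faster than a plain 404.
<IfModule mod_rewrite.c>
  RewriteEngine On
  # Hypothetical prefix: replace "product/" with whatever the old
  # PrestaShop URLs actually looked like.
  RewriteRule ^product/ - [G,L]
</IfModule>
```

Either way, both a 404 and a 410 will get the job done here; the 410 just makes the "gone for good" signal explicit.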
I hope this helps. Please let us know if we have misinterpreted your problem.
-
Understood, Alex. Yes, of course you would have to rebuild the pages first before you can 301, but it sounds like you are planning on rebuilding them anyway (otherwise you wouldn't be able to use canonical tags either, because there wouldn't be a page to put them on).
I wouldn't just give up and ask Google to remove all of the old URLs; I agree with what Mike has to say about that below. A 302 is a good option if you are worried about the 404s sitting in the index while you are rebuilding your product pages. If you are still on the same platform (it sounds like that didn't change), I would suggest rebuilding as many of the old URLs as you can (if they were good, SEO-friendly URLs). That way you could bypass the 301 redirect entirely. If you want to build your new pages so that product options are rolled in and separate colors of things no longer need separate pages, you can then choose whether to 301 redirect those old URLs or simply let them 404.
404s aren't necessarily always a bad thing. Regarding the 2,000 of them you have now, if some of those pages just need to go away, you can let them 404 and they will eventually drop out of Google's index. You aren't required to manually submit them via GWT in order for them to be removed.
-
Hi Mike,
Thanks for weighing in. Recreating all of the old pages seems like a pain in the butt... Besides, the site never launched, so those pages had no traffic at all. Given that, do you think it's a good idea to go through URL removal in GWT and purge the broken links completely from Google's index?
- Alex
-
Hi Dana,
Thank you for your advice. I'm new at SEO, so I may be wrong but...
Mapping out the old/new URLs in a spreadsheet and setting up 301 redirects to the new URLs doesn't seem feasible to me, mainly because the new URLs literally do not exist yet (I have not created ANY product pages). Per your suggestion, would I have to create the new product pages first and then 301 redirect the broken URLs to those newly created pages? I'm not quite sure I'm understanding you correctly...
In addition, the previous project manager wasn't SEO-savvy (I'm not either... sigh...), so he didn't know that creating separate pages for a product with multiple attributes (such as flavor and size) would cause major duplicate content issues.
The site is going through a major design/layout overhaul, and I intend to come up with an SEO strategy before creating any categories or products.
Thus...
Do you think it's better to submit a URL removal request in GWT and get rid of the indexed URLs completely? I just re-read Google's policy on URL removal, and it states that as long as the URLs return a 4xx (404 or 410, I'm assuming), Google will honor the removal request.
- Alex
-
Rel canonical is not quite the right tool for this sort of issue.
If you're worried about the 404s sitting around too long and losing traffic in the meantime, you can 302 everything to a landing page, category page, or the homepage while you work on setting everything else up. You have two choices at this point: 1) recreate all of the old pages and old URLs, then remove the 302s; or 2) add new products and new URLs, then, as Dana said, map out all of your new product URLs and old URLs to determine which old URL should be 301 redirected where. Then set up the necessary 301s and test that they all work.
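As a rough illustration of the stop-gap 302 (again assuming Apache, and a hypothetical "product/" prefix on the old URLs; adjust both to your setup):

```apache
# Temporarily (302) send the old product URLs to the homepage while
# the new product pages are being built. Swap these out for proper
# 301s (or let them 404/410) once the new URL structure is final.
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteRule ^product/ / [R=302,L]
</IfModule>
```

The point of the 302 (rather than a 301) is that it signals "temporary," so you aren't committing Google to a destination you plan to change.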
-
Hi Alex, I am sorry to hear about this. What a mess, no? If it were me, I wouldn't rely solely on the canonical tag. I would also create a spreadsheet, map all the old URLs to the new URLs, and set up 301 redirects from old to new. 2,000 isn't too bad; you can probably knock them out in 2-3 days. Just be sure to test all of the 301s and make sure they are performing the way you expect them to. Hope that helps a little!
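To sketch what that spreadsheet turns into once the new pages exist (assuming Apache; the paths below are made-up examples, not Alex's real URLs), each old-to-new row becomes one line in .htaccess:

```apache
# One permanent (301) redirect per old product URL.
# Old URL path on the left, new destination on the right.
Redirect 301 /old-widget-red.html   /widgets/red-widget
Redirect 301 /old-widget-blue.html  /widgets/blue-widget
# ...and so on for each row in the spreadsheet.
```

Then spot-check a handful with something like `curl -I http://example.com/old-widget-red.html` and confirm you get a `301` status with the expected Location header.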