Site architecture change - +30,000 404's in GWT
-
So recently we decided to change the URL structure of our online e-commerce catalogue - to make it easier to maintain in the future.
But since the change, we have (partially expected) +30K 404s in GWT. When we made the change, I was adding 301 redirects for the 404s I found in our Apache server logs, but the number has just escalated.
Should I be concerned about "plugging" these 404s, either by removing them via the URL removal tool or by carrying on with 301 redirects? It's quite labour intensive, and most of these URLs have no incoming links, so is there any point?
Thanks,
Ben
-
Hi Ben,
The answer to your question boils down to usability and link equity:
- Usability: Did the old URLs get lots of Direct and Referring traffic? E.g., do people have them bookmarked, type them directly into the address bar, or follow links from other sites? If so, there's an argument to be made for 301 redirecting the old URLs to their equivalent, new URLs. That makes for a much more seamless user experience, and increases the odds that visitors from these traffic sources will become customers, continue to be customers, etc.
- Link equity: When you look at a Top Pages report (in Google Webmaster Tools, Open Site Explorer, or ahrefs), how many of those most-linked and / or best-ranking pages are old product URLs? If product URLs are showing up in these reports, they definitely require a 301 redirect to an equivalent, new URL so that link equity isn't lost.
However, if (as is common with a large number of ecommerce sites), your old product URLs got virtually zero Direct or Referring traffic, and had virtually zero deep links, then letting the URLs go 404 is just fine. I think I remember a link churn report in the early days of LinkScape when they reported that something on the order of 80% of the URLs they had discovered would be 404 within a year. URL churn is a part of the web.
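As a rough sketch of that triage, the script below counts 404 hits per old URL and flags the ones that arrived with an external referrer, since those are the candidates worth a 301. It assumes Apache's "combined" log format; the function name `triage_404s` and the `own_host` parameter are made up for illustration, and it only covers referrers (bookmarks and typed-in URLs show up with no referrer at all).

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumes Apache "combined" format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)

def triage_404s(log_lines, own_host):
    """Return (total, external) Counters of 404 hits keyed by request path.

    `external` only counts hits whose referrer host differs from `own_host`
    (a hypothetical parameter: your site's hostname).
    """
    total = Counter()
    external = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "404":
            continue  # skip unparseable lines and non-404 responses
        path = match.group("path")
        total[path] += 1
        referer_host = urlparse(match.group("referer")).hostname
        if referer_host and referer_host != own_host:
            external[path] += 1
    return total, external
```

Paths that appear in `external` are being reached from other sites and probably deserve a redirect; paths that only appear in `total` are mostly crawler re-checks and can usually be left alone.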
If you decide not to 301 those old URLs, then you simply want to serve a really consistent signal to engines that they're gone, and not coming back. JohnMu from Google recently suggested that there's a tiny difference in how Google treats 404 versus 410 response codes: 404s are often re-crawled (which leads to those 404 error reports in GWT), whereas 410 is treated as a more "permanent" indicator that the URL is gone for good, so 410s are removed from the index a tiny bit faster. Read more: http://www.seroundtable.com/google-content-removal-16851.html
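To make the "consistent signal" idea concrete, here is a minimal sketch of the decision for each old URL: 301 if it has a new equivalent, 410 if it was retired on purpose, and a plain 404 otherwise. The paths, the `REDIRECTS` and `RETIRED` tables, and the function name are all hypothetical; in practice the same rules would more likely live in your Apache configuration than in application code.

```python
# Hypothetical data: in practice this would come from the catalogue migration.
REDIRECTS = {"/old/widget-1": "/catalogue/widgets/widget-1"}
RETIRED = {"/old/discontinued-gadget"}

def respond_to_old_url(path, redirects=REDIRECTS, retired=RETIRED):
    """Return (status_code, location) for a requested path.

    301 -> the page moved and has an equivalent new URL
    410 -> the page was removed deliberately and is not coming back
    404 -> nothing is known about this path
    """
    if path in redirects:
        return 301, redirects[path]
    if path in retired:
        return 410, None
    return 404, None
```

The point is that each old URL always gets the same answer, rather than flipping between a redirect and an error from one crawl to the next.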
Hope that helps!
-
Hi,
Are you sure these old URLs are not being linked from somewhere (probably internally)? Maybe the sitemap.xml was forgotten and is still pointing to all the old URLs? I think that for 404s to show in GWT there needs to be a link to them from somewhere, so as a first step, go to the 404s in GWT and have a look at where they are linked from (you can do this with Moz reports also). If the source is an internal page like a sitemap, or some forgotten menu/footer feature that is still linking to old pages, then yes, you certainly want to clear this up! If this is the case, once you have fixed the internal linking issues you should have a significantly reduced list of 404s and can then deal with the rest on a more case-by-case basis (assuming they are being triggered by external links).
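A quick way to test the forgotten-sitemap theory is sketched below. It assumes a standard sitemaps.org `<urlset>` file and assumes the old URLs share a recognisable path prefix (the `old_prefix` argument is hypothetical; use whatever distinguishes your old structure).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def stale_sitemap_urls(sitemap_xml, old_prefix):
    """Return the sitemap <loc> URLs whose path starts with `old_prefix`."""
    root = ET.fromstring(sitemap_xml)
    stale = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and urlparse(url).path.startswith(old_prefix):
            stale.append(url)
    return stale
```

If this returns anything, the sitemap itself is feeding old URLs to Google, and regenerating it should clear a good chunk of the reported 404s.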
Hope that helps!