Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to de-index old URLs after redesigning the website?
-
Thank you for reading.
After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain).
It would be nonsense to 301 redirect them as there are to many URLs. (or would it be nonsense?)
What is the best way to deal with this issue?
-
Thank you Clever PhD, really valuable insights!
-
I completely agree with all of the above - I've taken her point more like my own. Where receiving thousands of annoying 404 errors from pages that haven't existed for many months just gets annoying!

-
I respectfully disagree with all of the above. Please repeat after me, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic.
After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain).
**Part 1 Internal links that 404s from Moz Crawl: **The 404s that show up in the Moz crawl are only going to be from an internal link on your website. The Moz crawl only looks at internal links and not links from other website. In other words, if you see 404s in your Moz crawl, that means, somewhere, you are linking to those pages and that is why the 404s are showing up. Download the CSV and you will find them in your Moz crawl. Other tools such as screaming frog, Botify, Deep Crawl, will show you a similar analysis.
Simple solution. Go through your code and remove the internal links on your site that direct the Moz crawler to those pages and the 404s will go away. (FYI this same approach will work for any internal 301s) These 404 errors in the Moz report are great diagnostic signals on where to fix your site. It is bad for users to click on a link within your website and get sent to a page that does not exist.
**Part 2 external links from Search Console: **The 404s that show up in Search console can come from your internal links on your site AND external links from other sites. Google will keep trying to crawl these links due to other sites linking to pages on your site and your own internal links. For internal link fixing - see suggestion above. For external links you need a different approach.
Look at the external links, where are they coming from? Are they from quality websites? Do they go to formerly important pages on your websites (ie pages that were good converters? If so, then use the 301 redirect to send them to the correct replacement page (and this is not always the home page). You get users to the correct page and also any link equity is passed along as well and this can help with your site rankings. If the link goes to former page on your site that was not any good to start with and the links that come into it are poor quality, then you just let the page 404. Tools such as Moz Open Site Explorer or Ahrefs or Majestic can help with this assessment - but usually you can just look at a site linking to you and tell if it is crap or not.
You need to consider the above regardless of if you want to get the pages that are 404ing in question out of the Google index as if you get Google to remove the page from the index, it will then see the internal link on your site and then find the 404 again. If you have removed the links to the 404 pages on your site, eventually Google will stop crawling them and drop out of the index.
Important note regarding the use of robots.txt. Blocking Google from crawling the 404s will not remove the pages from the index, Google will just stop crawling them. Google has to be able to crawl the URL to see the 404 and then see that it is a bad page and then remove the page from the index. Blocking with robots.txt stops Google from doing that. As soon as you take the page out of robots Google will recrawl and the 404 shows up again. Robots.txt treats a symptom that is a red herring, allowing the 404 to occur takes care of the issue permanently.
Dead pages are a natural part of the web. Let Google see the 404 (if it truly is a page that should 404 and has no link equity that should be passed along with a 301). Google will crawl the 404 several times, you will see it in search console several times. It is ok. You are not penalized for X number of 404s. You may lose ranking if you 404 a page that Google used to rank well, but this is just because Google will not keep a page highly ranked that does not exist :-). Help Google out by cleaning up your internal link structure so when it sees that you do not link to the page any more, then that is a signal that the page should 404. Google knows that due to the nature of the web, pages will time out on occasion and show an error. Google will continue to recrawl a page just to make sure, it wants to give you the benefit of the doubt. Therefore, you have to give clear directives by not linking to dead pages so that after Google double and triple checks the page, it will finally drop it. You will see the 404 in your Search Console for several months then it will eventually go away.
Hope that makes sense. Good luck!
-
Hey Lana, If you really think that 301 does not make sense in that case you can always add the URLs in the robots.txt file and once Google will recrawl your website, Google will de-index the pages from the index.
Another thing you can do is using the de-index feature in Google webmaster tool. You can do that by getting in to your GWT, Optimization > Remove URLs and do that accordingly.
Hope this helps!
-
I see the point. Thanks Liam. As the most of our 404 pages starts with /en-GB/ i will do like this:
Disallow: /en-GB/
-
Hi Lana,
I've been having the same problem on one of our websites. I've been 301 redirecting over 5,000 URL's but still receive a lot of 404 errors. One of the main reasons for these 404 errors still appearing is other bots such as Bing Bot that is still crawling the old URL's.
To resolve this, I would just block them in your robots.txt file. We blocked our old product URL's that were under a "product directory like this:
User-agent: *
Disallow: /product/
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does google ignore ? in url?
Hi Guys, Have a site which ends ?v=6cc98ba2045f for all its URLs. Example: https://domain.com/products/cashmere/robes/?v=6cc98ba2045f Just wondering does Google ignore what is after the ?. Also any ideas what that is? Cheers.
Intermediate & Advanced SEO | | CarolynSC0 -
Should I include URLs that are 301'd or only include 200 status URLs in my sitemap.xml?
I'm not sure if I should be including old URLs (content) that are being redirected (301) to new URLs (content) in my sitemap.xml. Does anyone know if it is best to include or leave out 301ed URLs in a xml sitemap?
Intermediate & Advanced SEO | | Jonathan.Smith0 -
Does Google Index URLs that are always 302 redirected
Hello community Due to the architecture of our site, we have a bunch of URLs that are 302 redirected to the same URL plus a query string appended to it. For example: www.example.com/hello.html is 302 redirected to www.example.com/hello.html?___store=abc The www.example.com/hello.html?___store=abc page also has a link canonical tag to www.example.com/hello.html In the above example, can www.example.com/hello.html every be Indexed, by google as I assume the googlebot will always be redirected to www.example.com/hello.html?___store=abc and will never see www.example.com/hello.html ? Thanks in advance for the help!
Intermediate & Advanced SEO | | EcommRulz0 -
URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
A website was hacked (URL injection) but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index but new URLs keep appearing every day. We looked at the cache dates on these URLs and they are vary in dates but none are recent and most are from a month ago when the initial hack occurred. My question is...should we continue to check the index every day and keep submitting these URLs to be removed manually? Or since they all lead to a 404 page will Google eventually remove these spammy URLs from the index automatically? Thanks in advance Moz community for your feedback.
Intermediate & Advanced SEO | | peteboyd0 -
Mobile website on a different URL address?
My client has an old eCommerce website that is ranking high in Google. The website is not responsive for mobile devices. The client wants to create a responsive design mobile version of the website and put it on a different URL address. There would be a link on the current page pointing to the external mobile website. Is this approach ok or not? The reason why the client does not want to change the design of the current website is because he does not have the budget to do so and there are a lot of pages that would need to be moved to the new design. Any advice would be appreciated.
Intermediate & Advanced SEO | | andypatalak0 -
Remove URLs that 301 Redirect from Google's Index
I'm working with a client who has 301 redirected thousands of URLs from their primary subdomain to a new subdomain (these are unimportant pages with regards to link equity). These URLs are still appearing in Google's results under the primary domain, rather than the new subdomain. This is problematic because it's creating an artificial index bloat issue. These URLs make up over 90% of the URLs indexed. My experience has been that URLs that have been 301 redirected are removed from the index over time and replaced by the new destination URL. But it has been several months, close to a year even, and they're still in the index. Any recommendations on how to speed up the process of removing the 301 redirected URLs from Google's index? Will Google, or any search engine for that matter, process a noindex meta tag if the URL's been redirected?
Intermediate & Advanced SEO | | trung.ngo0 -
De-indexing product "quick view" pages
Hi there, The e-commerce website I am working on seems to index all of the "quick view" pages (which normally occur as iframes on the category page) as their own unique pages, creating thousands of duplicate pages / overly-dynamic URLs. Each indexed "quick view" page has the following URL structure: www.mydomain.com/catalog/includes/inc_productquickview.jsp?prodId=89514&catgId=cat140142&KeepThis=true&TB_iframe=true&height=475&width=700 where the only thing that changes is the product ID and category number. Would using "disallow" in Robots.txt be the best way to de-indexing all of these URLs? If so, could someone help me identify how to best structure this disallow statement? Would it be: Disallow: /catalog/includes/inc_productquickview.jsp?prodID=* Thanks for your help.
Intermediate & Advanced SEO | | FPD_NYC0 -
Adding index.php at the end of the url effect it's rankings
I have just had my site updated and we have put index.php at the end of all the urls. Not long after the sites rankings dropped. Checking the backlinks, they all go to (example) http://www.website.com and not http://www.website.com/index.php. So could this change have effected rankings even though it redirects to the new url?
Intermediate & Advanced SEO | | authoritysitebuilder0