Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to de-index old URLs after redesigning the website?
-
Thank you for reading.
After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain).
It would be nonsense to 301 redirect them as there are to many URLs. (or would it be nonsense?)
What is the best way to deal with this issue?
-
Thank you Clever PhD, really valuable insights!
-
I completely agree with all of the above - I've taken her point more like my own. Where receiving thousands of annoying 404 errors from pages that haven't existed for many months just gets annoying!

-
I respectfully disagree with all of the above. Please repeat after me, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic.
After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain).
**Part 1 Internal links that 404s from Moz Crawl: **The 404s that show up in the Moz crawl are only going to be from an internal link on your website. The Moz crawl only looks at internal links and not links from other website. In other words, if you see 404s in your Moz crawl, that means, somewhere, you are linking to those pages and that is why the 404s are showing up. Download the CSV and you will find them in your Moz crawl. Other tools such as screaming frog, Botify, Deep Crawl, will show you a similar analysis.
Simple solution. Go through your code and remove the internal links on your site that direct the Moz crawler to those pages and the 404s will go away. (FYI this same approach will work for any internal 301s) These 404 errors in the Moz report are great diagnostic signals on where to fix your site. It is bad for users to click on a link within your website and get sent to a page that does not exist.
**Part 2 external links from Search Console: **The 404s that show up in Search console can come from your internal links on your site AND external links from other sites. Google will keep trying to crawl these links due to other sites linking to pages on your site and your own internal links. For internal link fixing - see suggestion above. For external links you need a different approach.
Look at the external links, where are they coming from? Are they from quality websites? Do they go to formerly important pages on your websites (ie pages that were good converters? If so, then use the 301 redirect to send them to the correct replacement page (and this is not always the home page). You get users to the correct page and also any link equity is passed along as well and this can help with your site rankings. If the link goes to former page on your site that was not any good to start with and the links that come into it are poor quality, then you just let the page 404. Tools such as Moz Open Site Explorer or Ahrefs or Majestic can help with this assessment - but usually you can just look at a site linking to you and tell if it is crap or not.
You need to consider the above regardless of if you want to get the pages that are 404ing in question out of the Google index as if you get Google to remove the page from the index, it will then see the internal link on your site and then find the 404 again. If you have removed the links to the 404 pages on your site, eventually Google will stop crawling them and drop out of the index.
Important note regarding the use of robots.txt. Blocking Google from crawling the 404s will not remove the pages from the index, Google will just stop crawling them. Google has to be able to crawl the URL to see the 404 and then see that it is a bad page and then remove the page from the index. Blocking with robots.txt stops Google from doing that. As soon as you take the page out of robots Google will recrawl and the 404 shows up again. Robots.txt treats a symptom that is a red herring, allowing the 404 to occur takes care of the issue permanently.
Dead pages are a natural part of the web. Let Google see the 404 (if it truly is a page that should 404 and has no link equity that should be passed along with a 301). Google will crawl the 404 several times, you will see it in search console several times. It is ok. You are not penalized for X number of 404s. You may lose ranking if you 404 a page that Google used to rank well, but this is just because Google will not keep a page highly ranked that does not exist :-). Help Google out by cleaning up your internal link structure so when it sees that you do not link to the page any more, then that is a signal that the page should 404. Google knows that due to the nature of the web, pages will time out on occasion and show an error. Google will continue to recrawl a page just to make sure, it wants to give you the benefit of the doubt. Therefore, you have to give clear directives by not linking to dead pages so that after Google double and triple checks the page, it will finally drop it. You will see the 404 in your Search Console for several months then it will eventually go away.
Hope that makes sense. Good luck!
-
Hey Lana, If you really think that 301 does not make sense in that case you can always add the URLs in the robots.txt file and once Google will recrawl your website, Google will de-index the pages from the index.
Another thing you can do is using the de-index feature in Google webmaster tool. You can do that by getting in to your GWT, Optimization > Remove URLs and do that accordingly.
Hope this helps!
-
I see the point. Thanks Liam. As the most of our 404 pages starts with /en-GB/ i will do like this:
Disallow: /en-GB/
-
Hi Lana,
I've been having the same problem on one of our websites. I've been 301 redirecting over 5,000 URL's but still receive a lot of 404 errors. One of the main reasons for these 404 errors still appearing is other bots such as Bing Bot that is still crawling the old URL's.
To resolve this, I would just block them in your robots.txt file. We blocked our old product URL's that were under a "product directory like this:
User-agent: *
Disallow: /product/
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Someone redirected his website to ours
Hi all, I have strange issue as someone redirected website http://bukmachers.pl to ours https://legalnibukmacherzy.pl We don't know exactly what to do with it. I checked backlinks and the website had some links which now redirect to us. I also checked this website on wayback machine and back in 2017 this website had some low quality content but in 2018 they made similar redirection to current one but to different website (our competitor). Can such redirection be harmful for us? Should we do something with this or leave it, as google stop encouraging to disavow low quality links.
Intermediate & Advanced SEO | | Kahuna_Charles1 -
If my website do not have a robot.txt file, does it hurt my website ranking?
After a site audit, I find out that my website don't have a robot.txt. Does it hurt my website rankings? One more thing, when I type mywebsite.com/robot.txt, it automatically redirect to the homepage. Please help!
Intermediate & Advanced SEO | | binhlai0 -
How can I make a list of all URLs indexed by Google?
I started working for this eCommerce site 2 months ago, and my SEO site audit revealed a massive spider trap. The site should have been 3500-ish pages, but Google has over 30K pages in its index. I'm trying to find a effective way of making a list of all URLs indexed by Google. Anyone? (I basically want to build a sitemap with all the indexed spider trap URLs, then set up 301 on those, then ping Google with the "defective" sitemap so they can see what the site really looks like and remove those URLs, shrinking the site back to around 3500 pages)
Intermediate & Advanced SEO | | Bryggselv.no0 -
Wrong URLs indexed, Failing To Rank Anywhere
I’m struggling with a client website that's massively failing to rank. It was published in Nov/Dec last year - not optimised or ranking for anything, it's about 20 pages. I came onboard recently, and 5-6 weeks ago we added new content, did the on-page and finally changed from the non-www to the www version in htaccess and WP settings (while setting www as preferred in Search Console). We then did a press release and since then, have acquired about 4 partial match contextual links on good websites (before this, it had virtually none, save for social profiles etc.) I should note that just before we added the (about 50%) new content and optimised, my developer accidentally published the dev site of the old version of the site and it got indexed. He immediately added it correctly to robots.txt, and I assumed it would therefore drop out of the index fairly quickly and we need not be concerned. Now it's about 6 weeks later, and we’re still not ranking anywhere for our chosen keywords. The keywords are around “egg freezing,” so only moderate competition. We’re not even ranking for our brand name, which is 4 words long and pretty unique. We were ranking in the top 30 for this until yesterday, but it was the press release page on the old (non-www) URL! I was convinced we must have a duplicate content issue after realising the dev site was still indexed, so last week, we went into Search Console to remove all of the dev URLs manually from the index. The next day, they were all removed, and we suddenly began ranking (~83) for “freezing your eggs,” one of our keywords! This seemed unlikely to be a coincidence, but once again, the positive sign was dampened by the fact it was non-www page that was ranking, which made me wonder why the non-www pages were still even indexed. When I do site:oursite.com, for example, both non-www and www URLs are still showing up…. Can someone with more experience than me tell me whether I need to give up on this site, or what I could do to find out if I do? I feel like I may be wasting the client’s money here by building links to a site that could be under a very weird penalty 😕
Intermediate & Advanced SEO | | Ullamalm0 -
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
Hi MOZers, This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers? A) would Google totally ignore the image and the ALT tags information? OR B) Google would consider the ALT tags information? I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | | Malika11 -
How can I get a list of every url of a site in Google's index?
I work on a site that has almost 20,000 urls in its site map. Google WMT claims 28,000 indexed and a search on Google shows 33,000. I'd like to find what the difference is. Is there a way to get an excel sheet with every url Google has indexed for a site? Thanks... Mike
Intermediate & Advanced SEO | | 945010 -
Best way to permanently remove URLs from the Google index?
We have several subdomains we use for testing applications. Even if we block with robots.txt, these subdomains still appear to get indexed (though they show as blocked by robots.txt. I've claimed these subdomains and requested permanent removal, but it appears that after a certain time period (6 months)? Google will re-index (and mark them as blocked by robots.txt). What is the best way to permanently remove these from the index? We can't use login to block because our clients want to be able to view these applications without needing to login. What is the next best solution?
Intermediate & Advanced SEO | | nicole.healthline0 -
How to deal with old, indexed hashbang URLs?
I inherited a site that used to be in Flash and used hashbang URLs (i.e. www.example.com/#!page-name-here). We're now off of Flash and have a "normal" URL structure that looks something like this: www.example.com/page-name-here Here's the problem: Google still has thousands of the old hashbang (#!) URLs in its index. These URLs still work because the web server doesn't actually read anything that comes after the hash. So, when the web server sees this URL www.example.com/#!page-name-here, it basically renders this page www.example.com/# while keeping the full URL structure intact (www.example.com/#!page-name-here). Hopefully, that makes sense. So, in Google you'll see this URL indexed (www.example.com/#!page-name-here), but if you click it you essentially are taken to our homepage content (even though the URL isn't exactly the canonical homepage URL...which s/b www.example.com/). My big fear here is a duplicate content penalty for our homepage. Essentially, I'm afraid that Google is seeing thousands of versions of our homepage. Even though the hashbang URLs are different, the content (ie. title, meta descrip, page content) is exactly the same for all of them. Obviously, this is a typical SEO no-no. And, I've recently seen the homepage drop like a rock for a search of our brand name which has ranked #1 for months. Now, admittedly we've made a bunch of changes during this whole site migration, but this #! URL problem just bothers me. I think it could be a major cause of our homepage tanking for brand queries. So, why not just 301 redirect all of the #! URLs? Well, the server won't accept traditional 301s for the #! URLs because the # seems to screw everything up (server doesn't acknowledge what comes after the #). I "think" our only option here is to try and add some 301 redirects via Javascript. Yeah, I know that spiders have a love/hate (well, mostly hate) relationship w/ Javascript, but I think that's our only resort.....unless, someone here has a better way? If you've dealt with hashbang URLs before, I'd LOVE to hear your advice on how to deal w/ this issue. Best, -G
Intermediate & Advanced SEO | | Celts180