How to de-index old URLs after redesigning the website?
-
Thank you for reading.
After redesigning my website (5 months ago), my crawl reports (Moz, Search Console) still show tons of 404 pages, which all seem to be URLs from my previous website (same root domain).
It would be nonsense to 301 redirect them, as there are too many URLs. (Or would it be nonsense?)
What is the best way to deal with this issue?
-
Thank you Clever PhD, really valuable insights!
-
I completely agree with all of the above - I read her point much like my own: receiving thousands of 404 errors from pages that haven't existed for many months just gets annoying!
-
I respectfully disagree with all of the above. Please repeat after me: 404s are not bad, they are diagnostic. 404s are not bad, they are diagnostic. 404s are not bad, they are diagnostic.
> After redesigning my website (5 months ago), my crawl reports (Moz, Search Console) still show tons of 404 pages, which all seem to be URLs from my previous website (same root domain).
**Part 1: Internal links that 404 in the Moz crawl:** The 404s that show up in the Moz crawl can only come from internal links on your website. The Moz crawl only looks at internal links, not links from other websites. In other words, if you see 404s in your Moz crawl, that means that somewhere you are linking to those pages, and that is why the 404s are showing up. Download the CSV and you will find them in your Moz crawl. Other tools such as Screaming Frog, Botify, and DeepCrawl will show you a similar analysis.
Simple solution: go through your code and remove the internal links on your site that direct the Moz crawler to those pages, and the 404s will go away. (FYI, this same approach will work for any internal 301s.) These 404 errors in the Moz report are great diagnostic signals showing where to fix your site. It is bad for users to click on a link within your website and get sent to a page that does not exist. A quick way to triage the CSV export is sketched below.
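A minimal sketch of that triage, assuming a crawl CSV export with "URL", "Status Code", and "Referring URL" columns (the filename and column names here are hypothetical; adjust them to whatever your crawl tool actually exports):

```python
import csv
from collections import defaultdict

# Hypothetical export filename and column names -- match them to your tool.
CRAWL_CSV = "moz_crawl_export.csv"

# Group 404 targets by the page that links to them, so you know
# exactly which internal links need removing.
broken = defaultdict(list)
with open(CRAWL_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row.get("Status Code", "").strip() == "404":
            broken[row.get("Referring URL", "unknown")].append(row["URL"])

for referrer, targets in broken.items():
    print(referrer)
    for url in targets:
        print(f"  links to 404: {url}")
```

The output gives you a to-do list: each referring page, with the dead links on it to remove.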
**Part 2: External links from Search Console:** The 404s that show up in Search Console can come from internal links on your site AND external links from other sites. Google will keep trying to crawl these URLs because other sites link to pages on your site, and so do your own internal links. For internal link fixing, see the suggestion above. For external links you need a different approach.
Look at the external links: where are they coming from? Are they from quality websites? Do they point to formerly important pages on your website (i.e. pages that were good converters)? If so, then use a 301 redirect to send them to the correct replacement page (and this is not always the home page). You get users to the correct page, and any link equity is passed along as well, which can help with your site rankings. If a link points to a former page on your site that was not any good to start with, and the links coming into it are poor quality, then just let the page 404. Tools such as Moz Open Site Explorer, Ahrefs, or Majestic can help with this assessment - but usually you can just look at a site linking to you and tell if it is crap or not. Once you have built a redirect map, it is worth verifying it, as in the sketch below.
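A minimal sketch for verifying such a redirect map, assuming a CSV with hypothetical "old_url" and "new_url" columns and the third-party requests library:

```python
import csv

import requests  # third-party: pip install requests

# Hypothetical filename and column names for the old-to-new URL map.
with open("redirect_map.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # allow_redirects=False so we inspect the first hop directly.
        resp = requests.head(row["old_url"], allow_redirects=False, timeout=10)
        location = resp.headers.get("Location", "")
        ok = resp.status_code == 301 and location == row["new_url"]
        print(f"{'OK ' if ok else 'FIX'} {row['old_url']} -> {resp.status_code} {location}")
```

Anything flagged FIX is either not redirecting, redirecting with the wrong status code (e.g. a 302), or pointing at the wrong target.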
You need to consider the above regardless of whether you want to get the 404ing pages in question out of the Google index: even if you get Google to remove a page from the index, it will then see the internal link on your site and find the 404 again. If you have removed the links to the 404 pages on your site, eventually Google will stop crawling them and they will drop out of the index.
Important note regarding the use of robots.txt: blocking Google from crawling the 404s will not remove the pages from the index; Google will just stop crawling them. Google has to be able to crawl the URL to see the 404, recognize that it is a dead page, and then remove the page from the index. Blocking with robots.txt stops Google from doing that. As soon as you take the page out of robots.txt, Google will recrawl it and the 404 shows up again. Robots.txt treats a symptom and is a red herring; allowing the 404 to occur takes care of the issue permanently. The sketch below illustrates the mechanism.
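A minimal sketch of that mechanism using Python's standard-library robots.txt parser (example.com and the paths are placeholders): a compliant crawler that is blocked never fetches the URL, so it never sees the 404.

```python
from urllib import robotparser

# Placeholders -- substitute your own domain and an old URL that now 404s.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt

old_url = "https://example.com/en-GB/old-page"
if rp.can_fetch("Googlebot", old_url):
    print("Crawlable: Googlebot can fetch the URL, see the 404, and drop it.")
else:
    print("Blocked: Googlebot never sees the 404, so the URL can linger in the index.")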
Dead pages are a natural part of the web. Let Google see the 404 (if it truly is a page that should 404 and has no link equity that should be passed along with a 301). Google will crawl the 404 several times, and you will see it in Search Console several times. It is OK. You are not penalized for X number of 404s. You may lose ranking if you 404 a page that Google used to rank well, but this is just because Google will not keep a page highly ranked that does not exist :-). Help Google out by cleaning up your internal link structure, so that when it sees you do not link to the page any more, that is a signal the page should 404. Google knows that, due to the nature of the web, pages will time out on occasion and show an error, so it will continue to recrawl a page just to make sure; it wants to give you the benefit of the doubt. Therefore, you have to give clear directives by not linking to dead pages, so that after Google double- and triple-checks the page, it will finally drop it. You will see the 404 in your Search Console for several months, and then it will eventually go away.
Hope that makes sense. Good luck!
-
Hey Lana, if you really think that a 301 does not make sense in your case, you can always add the URLs to the robots.txt file, and once Google recrawls your website it will de-index the pages.
Another thing you can do is use the de-index feature in Google Webmaster Tools. You can do that by going into your GWT under Optimization > Remove URLs and proceeding accordingly.
Hope this helps!
-
I see the point. Thanks, Liam. As most of our 404 pages start with /en-GB/, I will do it like this:
User-agent: *
Disallow: /en-GB/
-
Hi Lana,
I've been having the same problem on one of our websites. I've been 301 redirecting over 5,000 URLs but still receive a lot of 404 errors. One of the main reasons these 404 errors keep appearing is other bots, such as Bingbot, that are still crawling the old URLs (see the sketch after the snippet below for one way to confirm this from your access logs).
To resolve this, I would just block them in your robots.txt file. We blocked our old product URLs that were under a "product" directory like this:
User-agent: *
Disallow: /product/
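If you want to confirm which bots are still requesting the old URLs before you block them, here is a minimal sketch; it assumes a combined-format access log at a hypothetical path, and that the old product URLs live under /product/ as above:

```python
import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical path to a combined-format access log

# Match requests for old /product/ paths and capture the user-agent string
# (the last quoted field in the combined log format).
pattern = re.compile(r'"(?:GET|HEAD) (/product/\S*) [^"]*".*"([^"]*)"\s*$')

agents = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as f:
    for line in f:
        match = pattern.search(line)
        if match:
            agents[match.group(2)] += 1

for user_agent, count in agents.most_common():
    print(f"{count:6d}  {user_agent}")
```

A quick tally like this tells you whether it is really Bingbot (or some other crawler) generating the 404 noise, or your own internal links.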
Related Questions
-
How can I make a list of all URLs indexed by Google?
I started working for this eCommerce site 2 months ago, and my SEO site audit revealed a massive spider trap. The site should have been 3,500-ish pages, but Google has over 30K pages in its index. I'm trying to find an effective way of making a list of all URLs indexed by Google. Anyone? (I basically want to build a sitemap with all the indexed spider trap URLs, then set up 301s on those, then ping Google with the "defective" sitemap so they can see what the site really looks like and remove those URLs, shrinking the site back to around 3,500 pages.)
Intermediate & Advanced SEO | Bryggselv.no
-
Website using search term as URL brand name to cheat Google
Google has come a long way over the past 5 years; the quality updates have really helped bring top-quality content that is relevant to users' search terms to the top, although there is one really ANNOYING thing that still has not been fixed: websites using a brand name that matches a service search term to manipulate Google. I have a real example, but I wouldn't like to use it in case the brand mention flags up in their tools and they spot this post, so take a search like "Service+Location" for example. You will get 'service+location.com' ranking #1. Why? Heaven knows. They have fewer than 100 backlinks, which are of a very low, spammy quality from directories. The content is poor compared to the competition, and the competitors have amazing link profiles, great social engagement, and much better website user experience, and the data does not prove anything. All the competitors are targeting the same search term, yet the worst site is ranking the highest. Why on earth is Google not fixing this issue? The page we are seeing rank #1 does not even deserve to be ranking in the first 5 pages.
Intermediate & Advanced SEO | Jseddon92
-
New Site Launch - Redirecting Hundreds of Old Invalid URLs?
We have a client whose WMT shows a ton of "Not found" crawl errors after a new site launched. The URLs are largely a bunch of files that they had previously uploaded on their old site and which have external links pointing at them. What is their best course of action if they don't have files on the new site that correspond to these old files? Redirect to the home page? Leave them 404ing? Customize the 404 message? (Note: they're mostly generic, non-human-friendly URLs that are difficult to identify, like /gallery3.php/ppage/3, g/interior.php/pid/3/sid/24, /uploads/1213196781.pdf)
Intermediate & Advanced SEO | VTDesignWorks
-
Switching URL
I started working with a roofer/contractor about a year ago. His website is http://www.lancasterparoofing.com/. The name of his business is Spicher Home Improvements. He used to have spicherhomeimprovements.com; well, he still does. He was focusing on roofing and siding but now would like to branch out into other areas like interior remodeling, so adding interior work under LancasterPaRoofing.com is not applicable. I do not think starting another domain and having two is the best option. I think he should go back to using SpicherHomeImprovements.com; I assume he would take a small hit, but in time he should be better off. Plus, the URL is more applicable to the real name of his business. Thanks for any feedback I receive. Chad
Intermediate & Advanced SEO | ChadEisenhart
-
Numbers (2432423) in URL
Hello all Mozzers, quick question on URLs. I know the URL is important and should include keywords and all that, but my question is: does including numbers (not dates or page numbers, but numbers for internal use) in the URL affect SEO? For example, take www.domain.com/screw-driver,12,1,23345.htm. Is that any better or worse than www.domain.com/screw-driver.htm? I understand that this is not user friendly, but from an SEO standpoint, does it hurt ranking? What's your opinion on this? Thank you!
Intermediate & Advanced SEO | TommyTan
-
Best way to permanently remove URLs from the Google index?
We have several subdomains we use for testing applications. Even if we block them with robots.txt, these subdomains still appear to get indexed (though they show as blocked by robots.txt). I've claimed these subdomains and requested permanent removal, but it appears that after a certain time period (6 months?) Google will re-index them (and mark them as blocked by robots.txt). What is the best way to permanently remove these from the index? We can't use a login to block access because our clients want to be able to view these applications without needing to log in. What is the next best solution?
Intermediate & Advanced SEO | nicole.healthline
-
Why Would This Old Page Be Penalized?
Here's an old page on a trustworthy domain with no apparent negative SEO activity according to OSE and Ahrefs: http://www.gptours.com/Monaco-Grand-Prix It went from page 1 to page 13 for "monaco grand prix" within about 4 weeks. In week 2 we pulled out all the duplicate content in the history section. When the rank slipped further, we put it back. Yet it's still moving down, while other pages on the website are holding strong. Next steps will be to add some schema.org/Event microformats, but beyond that, do you have any ideas?
Intermediate & Advanced SEO | stevewiideman
-
How does a competing website with clearly black-hat-style SEO tactics have a far higher domain authority than our website, which only uses legitimate link building tactics?
Through the SEOmoz link analysis tools, we looked at a competing website's external followed links and discovered a large number of links going to blog pages on domains with authorities in the 90s (the blog pages' authorities were between 40 and 60); however, the single blog post written by this website was exactly the same in every instance and had been posted in August 2011. Some of these blog sites had 160 or so links linking back to this competing website, whose domain authority is 49 while ours is 28; their MozTrust is 5.43 while ours is 5.18. Examples of the blogs that link to the competing website are: http://advocacy.mit.edu/coulter/blog/?p=13 and http://pest-control-termite-inspection.posterous.com/ However, many of these links are "nofollow" and yet still show up in Open Site Explorer as some of this competing website's top linking pages. Admittedly, they have 584 linking root domains while we have only 35, but if most of them are the kind of websites posted above, we don't understand how Google is rewarding them with a higher domain authority. Our website is www.anteater.com.au. Are these tactics now the only way to get ahead?
Intermediate & Advanced SEO | Peter.Huxley59