Old pages still crawled by search engines returning 404s: better to 301 redirect or block with robots.txt?
-
Hello guys,
A client of ours has thousands of pages returning 404s, visible in Google Webmaster Tools. These are all old pages that no longer exist, but Google keeps detecting them. They belong to sections of the site that were removed; they are not linked externally and didn't provide much value even when they existed.
What do you suggest we do:
(a) do nothing
(b) redirect all these URL/folders to the homepage through a 301
(c) block these pages through robots.txt.
Are we wasting part of the crawl budget allotted by search engines by doing nothing?
Thanks!
-
Hi Matteo.
The first step I would suggest is determining the source of the links to these 404 pages. If these links are internal to your website, they should be removed or updated.
The next step I would recommend is to ensure your site has a helpful 404 page. The page should offer your site's navigation along with a search function so users can locate relevant content on your site.
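As an illustration, the exact mechanism depends on your server, but assuming Apache and a hypothetical /404.html page, a custom 404 page can be wired up with:

```apache
# Serve a helpful custom 404 page while still returning the 404 status code
ErrorDocument 404 /404.html
```

Note that the local path matters: pointing ErrorDocument at a full URL makes Apache issue a redirect (a 302) instead of a true 404, which would hide the error from search engines.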
I realize that thousands of broken links may seem overwhelming. It's a mess worth cleaning up, and how you proceed depends on how much you value SEO. If your rankings matter and you want to be the best, have someone investigate every link and make the appropriate adjustment: 301 redirect it to the most relevant page on your site, or allow it to resolve to the 404 page.
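For the URLs worth redirecting, a pattern-based 301 in Apache (mod_alias) might look like this; the folder and file names here are hypothetical placeholders:

```apache
# Permanently redirect an entire retired section to its closest surviving equivalent
RedirectMatch 301 ^/old-section/(.*)$ /new-section/

# Or map individual pages one-to-one where a close match exists
Redirect 301 /old-section/widget-guide.html /guides/widgets.html
```

Redirecting a whole folder to one page is fine for thin content; per-page mapping preserves more relevance for the URLs that actually had value.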
It's a search engine's job to help users find content, and 404s are a natural part of the web. There is nothing inherently wrong with having some 404 pages, but having thousands of them suggests your site has significant issues. Google's algorithms are not public, but it's reasonable to believe they may treat sites with a high percentage of 404 pages as less trustworthy. That's my belief, not necessarily the SEO community's.
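One caveat on option (c) for completeness: robots.txt only stops crawling, it does not remove already-indexed URLs, and it prevents Google from ever seeing the 404 (or a 301) on those pages. If you did block a retired section anyway (hypothetical path again), the rule would be:

```
User-agent: *
Disallow: /old-section/
```

For that reason, letting the URLs return 404 or 301 redirecting them is usually preferable to blocking them.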