How do I identify 404s that get links from external sites (but not search engines)?
-
One of our sites had poor site architecture, and as a result tens of thousands of 404s are currently being reported in Google Webmaster Tools.
-
Any ideas for easily detecting which of these thousands of 404s are caused by links from external websites (i.e., filtering out 404s caused by links from our own domain and 404s from search engines)?
-
Crawl bandwidth seems to be an issue on this domain. Is there anything that can be done to get Google to remove these 404 pages from its index faster? Given the number of 404s, manual submission in Google Webmaster Tools one by one is not an option.
Or do you believe Google will automatically stop crawling these 404 pages within a month or so, so that no action needs to be taken?
Thanks
-
Hi Robert,
Thanks a lot. So I will not take action to get the 404s out of Google's index.
Regarding your first point, I am not sure I understand how Screaming Frog would help. I have not used Screaming Frog yet, only Link Sleuth for status-code checks. URLs that show a 404 status in Google Webmaster Tools will presumably also return a 404 in Screaming Frog. My objective is to identify, among these thousands of 404s, the few that are caused by inaccurate or outdated links on external websites, so that I can create a 301 redirect for each of them.
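To make my objective concrete, here is a rough sketch of the kind of cross-referencing I have in mind: take the 404 URLs from a Webmaster Tools crawl-errors export and a list of (source, target) backlink pairs from a backlink export, and keep only the 404s that have at least one linking page on a truly external domain. (The domain names and data shapes below are hypothetical; the real exports would need a little parsing first.)

```python
from urllib.parse import urlparse

OWN_DOMAIN = "example.com"  # hypothetical: our own domain
SEARCH_ENGINE_DOMAINS = {"google.com", "bing.com", "yahoo.com"}

def domain(url):
    # Strip a leading "www." so www/non-www count as the same site
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def external_404s(not_found_urls, backlinks):
    """Return {404 URL: set of external pages linking to it}.

    not_found_urls: iterable of 404 URLs (e.g. a crawl-errors export)
    backlinks: iterable of (source_url, target_url) pairs
               (e.g. a backlink-tool export)
    """
    dead = set(not_found_urls)
    hits = {}
    for source, target in backlinks:
        src = domain(source)
        # Keep only links to a dead URL from a domain that is neither
        # our own site nor a search engine
        if target in dead and src != OWN_DOMAIN and src not in SEARCH_ENGINE_DOMAINS:
            hits.setdefault(target, set()).add(source)
    return hits
```

Each key in the result would be a 404 URL worth a 301 redirect, and the values show which external pages still link to it.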
Best,
Daniel
-
I would suggest downloading the free version of Screaming Frog for an easy way to get status codes on any or all links.
As to fixing things, and "crawl bandwidth" being a problem, I disagree. If you are not being crawled, it is because of all the 404s. I do not know the timeline for inaction on this, but I do believe that "manual submission is not an option" is a recipe for disaster. Because fully analyzing your issues is outside the scope of Q&A, I would suggest you start manually fixing the issues and, if you are on a CMS, start looking at plugins, etc. as a root cause.
Hope that helps
Robert