404 from a 404 that 301s
-
I must be missing something or skipping a step or lacking proper levels of caffeine.
Under my High Priority warnings I have a handful of 404s which are like that on purpose but I'm not sure how Moz is finding them. When I check the referrer info, the 404 is being linked to from a different 404 which is now a 301 (due to craziness of our system and what was easiest for the coders to fix a different problem ages ago). Basically, if a user decides to type in a non-existent model number into the URL there is a specific 404 that comes up. While the 404 error is "site.com/product/?model=abc123" the referrer is "site.com/product?model=abc123" (or more simply, one slash is missing). I can't see how Moz is finding the referrer so I can't figure out how to make Moz stop crawling it. I actually have the same problem in Google WMT for the same group of 404s.
What am I just not seeing that will fix this?
-
Let me know if it works Mike. There is actually a third possibility which is;
Some page(s) might generate a dynamic URL only upon being visited by a browser/search agent. If that's the case, then you can set up an event tracking through your website in conjuction with Google Analytics and track teh refferer;
_gaq.push(['_trackEvent', 'Error', '404', 'page: ' + document.location.pathname + document.location.search + ' ref: ' + document.referrer ]);
After you collect some data (Submit your website to Google WMT or wait for next MOZ visit) you can export and run your filter.
The alternative to this method could be one of the 2 following;
- enabling extreme debug/log mode on your programming platform and collect logs for further processing. You can run a small Python script to find the RegEx pattern. I advise to setup a demo copycat of your website on a subdomain and then run this experiment. You can then submit the demo sub domain to Google Webmaster tools and wait for the crawlers.
- Reconfigure your webserver logging (httpd.conf if using Apache) to log more details. Make sure you turn back into to the normal data collecting configuration to avoid storage consumption.
Good luck,
Ali
-
I had done about half of that... I'll take a look at all of it and try again tomorrow following your suggestions and see if I can figure it out then. Thanks.
-
Hi Mike,
Hope all is well. There are two things that might have made this confusion. Either you have some outdated links somewhere on your website that are leading to the custom 404 page or some external link is pointing back to your website with a wrong URL or missing product. In order to find the link (I say so, because a crawler has to hit a link to crawl so there is definitely one), you can use tools like Ahrefs link analysis and see what is pointing where. export to an excel and filter based on a RegEx you'd make out a 404 generating pattern you already have with Moz or Google WMT. You find one and you'll know where they are coming from and how to fix them. You'd be able to write custom redirects in your htaccess if they are not many. If they are many though, htaccess could slow down your website and the best way would be a back-end base redirect either custom coded or through a plugin based on your platform. I would start from
- my error_logs in webserver logs and match them with WMT and Moz report.
- download CSV and import to excel or program of your choice
- filter based on the pattern
- Match it with where you've found the link through Ahref
- and Voila, now you know exactly how to clean them up
Hope this helps Mike,
Have a nice day,
Ali
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
404 errors
Hi I am getting these show up in WMT crawl error any help would be very much appreciated | ?ecaped_fragment=Meditation-find-peace-within/csso/55991bd90cf2efdf74ec3f60 | 404 | 12/5/15 |
Technical SEO | | ReSEOlve
| | 2 | mobile/?escaped_fragment= | 404 | 10/26/15 |
| | 3 | ?escaped_fragment=Tips-for-a-balanced-lifestyle/csso/1 | 404 | 12/1/15 |
| | 4 | ?escaped_fragment=My-favorite-yoga-spot/csso/5598e2130cf2585ebcde3b9a | 404 | 12/1/15 |
| | 5 | ?escaped_fragment=blog/c19s6 | 404 | 11/29/15 |
| | 6 | ?escaped_fragment=blog/c19s6/Tag/yoga | 404 | 11/30/15 |
| | 7 | ?escaped_fragment=Inhale-exhale-and-once-again/csso/2 | 404 | 11/27/15 |
| | 8 | ?escaped_fragment=classes/covl | 404 | 10/29/15 |
| | 9 | m/?escaped_fragment= | 404 | 10/26/15 |
| | 10 | ?escaped_fragment=blog/c19s6/Page/1 | 404 | 11/30/15 | | |0 -
404 errors
Hi I am getting these show up in WMT crawl error any help would be very much appreciated | ?escaped_fragment=Meditation-find-peace-within/csso/55991bd90cf2efdf74ec3f60 | 404 | 12/5/15 |
Technical SEO | | ReSEOlve
| | 2 | mobile/?escaped_fragment= | 404 | 10/26/15 |
| | 3 | ?escaped_fragment=Tips-for-a-balanced-lifestyle/csso/1 | 404 | 12/1/15 |
| | 4 | ?escaped_fragment=My-favorite-yoga-spot/csso/5598e2130cf2585ebcde3b9a | 404 | 12/1/15 |
| | 5 | ?escaped_fragment=blog/c19s6 | 404 | 11/29/15 |
| | 6 | ?escaped_fragment=blog/c19s6/Tag/yoga | 404 | 11/30/15 |
| | 7 | ?escaped_fragment=Inhale-exhale-and-once-again/csso/2 | 404 | 11/27/15 |
| | 8 | ?escaped_fragment=classes/covl | 404 | 10/29/15 |
| | 9 | m/?escaped_fragment= | 404 | 10/26/15 |
| | 10 | ?escaped_fragment=blog/c19s6/Page/1 | 404 | 11/30/15 | | |0 -
Https and 404 code that goes into htaccess
The 404 error code we put into htaccess files for our websites does not work correctly for our https site. We recently changed one of our http sites to https. When we went to create a 404.html page for it by creating an htaccess folder with the 404 error code in it, once we uploaded the file all of our webpages were displaying incorrectly, as if the css was not attached. The 404 code we used works successfully for our other 404.html pages for our other sites (www.telfordinc.com/404.html). However, it does not work for the https site. Below is the 404 error code we are using for our https site (currently not uploaded until pages display correctly) ErrorDocument 404 /404-error.html RewriteEngine on RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http://(www.)?privatemoneyhardmoneyloan.com/.*$ [NC] RewriteRule .(gif|jpg|js|css)$ - [F] Options +FollowSymLinks RewriteEngine On RewriteBase / RewriteCond %{HTTP_HOST} !^www.privatemoneyhardmoneyloan.com$ [NC] RewriteRule ^(.*)$ http://www.privatemoneyhardmoneyloan.com/$1 [R=301,L] So we want to know if there is a different 404 error code that goes into the htaccess file for an https vs. http? Appreciate your feedback on this issue
Technical SEO | | Manifestation0 -
Easy Fix for 404 Errors foe Newbie
Hey there, I have two errors at these links that to my knowledge do not exist on my domain according to the MOZ. http://educateathletes.com/post/23804085842/educateathletes-ushl-gm-head-coach-jim http://educateathletes.com/products I'm really not sure what to do. The first is an old Tumblr blog post. The second is a page that was created on my site but the URL title was changed to http://educateathletes.com/enroll Any advice is appreciated to eliminate this. Sean
Technical SEO | | EDUCATEAthletes0 -
Client error 404
I have an 404 error but what does that mean? I go to the site and click on the link to exampleX.com there is no problem. What can it be? The error message http://www.example.com/www.example.com/exampleX.html
Technical SEO | | mato0 -
Status code 404??!
among other things... I'm getting this error: http://worldvoicestudio.com/blog/"http://worldvoicestudio.com/" http://worldvoicestudio.com/blog/"http://worldvoicestudio.com/" Any ideas on how to fix this? many thanks!!
Technical SEO | | malexandro0 -
Very, very confusing behaviour with 301s. Help needed!
Hi SEOMoz gang! Been a long timer reader and hangerouter here but now i need to pick your brains. I've been working on two websites in the last few days which are showing very strange behaviour with 301 redirects. Site A This site is an ecommerce stie stocking over 900 products and 000's of motor parts. The old site was turned off in Feb 2011 when we built them a new one. The old site had terrible problems with canonical URLs where every search could/would generate a unique ID e.g. domain.com/results.aspx?product=1234. When you have 000's of products and Google can find them it is a big problem. Or was. We launche the new site and 301'd all of the old results pages over to the new product pages and deleted the old results.aspx. The results.aspx page didn't index or get shown for months. Then about two months again we found some certain conditions which would mean we wouldn't get the right 301 working so had to put the results.aspx page back in place. If it found the product, it 301'd, if it didn't it redirected to the sitemap.aspx page. We found recently that some bizarre scenerio actually caused the results.aspx page to 200 rather than 301 or 404. Problem. We found this last week after our 404 count in GWMT went up to nearly 90k. This was still odd as the results.aspx format was of the OLD site rather than the new. The old URLs should have been forgetten about after several months but started appearing again! When we saw the 404 count get so high last week, we decided to take severe action and 301 everything which hit the results.aspx page to the home page. No problem we thought. When we got into the office on Monday, most of our product pages had been dropped from the top 20 placing they had (there were nearly 400 rankings lost) and on some phrases the old results.aspx pages started to show up in there place!! Can anyone think why old pages, some of which have been 301'd over to new pages for nearly 6 months would start to rank? Even when the page didn't exist for several months? Surely if they are 301's then after a while they should start to get lost in the index? Site B This site moved domain a few weeks ago. Traffic has been lost on some phrases but this was mainly due to old blog articles not being carried forward (what i'll call noisy traffic which was picked up by accident and had bad on page stats). No major loss in traffic on this one but again bizarre errors in GWMT. This time pages which haven't been in existence for several YEARS are showing up as 404s in GWMT. The only place they are still noted anywhere is in the redirect table on our old site. The new site went live and all of the pages which were in Googles index and in OpenSiteExplorer were handled in a new 301 table. The old 301s we thought we didn't need to worry about as they had been going from old page to new page for several years and we assumed the old page had delisted. We couldn't see it anywhere in any index. So... my question here is why would some old pages which have been 301'ing for years now show up as 404s on my new domain? I've been doing SEO on and off for seven years so think i know most things about how google works but this is baffling. It seems that two different sites have failed to prevent old pages from cropping up which were 301d for either months or years. Does anyone has any thoughts as to why this might the case. Thanks in advance. Andy Adido
Technical SEO | | Adido-1053990 -
Are there any SEO implications if a page does two 301s and then a 304?
Curious to see if this is a positive or negative thing for SEO...or even perhaps, neutral. h9SZz
Technical SEO | | RodrigoStockebrand0