Many more 404s being reported in GWT than MA
-
Hi
I have been submitting MA crawl reports to my client's developers since a new site migration went live, instructing them to set up 301s for any 404s still being reported, which they have now done (despite my instructing them not to go live until all old URLs were mapped and 301'd to a new replacement page, or to the home page if there is no replacement).
1) When I look at GWT crawl errors there has been a spike since going live, with 660 'page not found' errors being reported compared to 11 404s in MA. Could there be a lag in GWT reporting, so that they have actually already been dealt with but GWT has not yet updated this report, and the MA report is more accurate? Should I wait and see, or mark them as fixed and see if they return tomorrow, or tell dev to investigate immediately? I have checked some sample links and they are going to 404-type pages, so I presume they are still broken and an urgent issue dev must fix immediately?
2) Approximately how long does it take for page authority to be transferred via a 301 redirect to a new page? I see some category pages that had good PA and have been 301'd to new category URLs are now showing a PA of just 1!
Cheers
Dan
-
Many Thanks !!
-
If the two sites are structured differently and do not share ANY significant content then you don't need the rel="alternate" hreflang tag.
Good luck Dan!
-
Thanks so much for having a look Everett !
I did tell dev to fix all URLs (many have been, but I see many remain), i.e. not to use upper case and to remove full stops and spaces, and that the redev stage would be a good opportunity to fix those along with all the 301 redirects etc. I also received further advice about that from Tom Roberts on this thread: http://moz.com/community/q/duplicate-title-tags-being-caused-by-upper-case-and-lower-case-version-of-urls All of which I have forwarded on to dev.
The client was also told to add 100% unique descriptive content at the same time as adding products, but I will remind them.
Re the .com & .co.uk & hreflang: these sites are managed by different teams, the content is significantly different I think, and the sites are structured quite differently. Would that still require hreflang annotation?
Thanks again Everett !!
All Best
Dan
-
Hello Dan,
Thank you for following up with the private message. I'll keep the domain name out but will post here so others can learn from it.
I checked the 404 error page and it does return a 404 response in the HTTP header, which is good. However, I did notice a few other issues while on the site...
I think you should block the following directories in your robots.txt file:
/search/
/stockist/
/ukstockist/
/services/form/
I also noticed several product pages with absolutely no product detail descriptions (e.g. /happy-tee-pink), which is not a good signal to send to Google, and not a good user experience. You should be creative and write custom, 100% unique content for these pages.
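Before handing the robots.txt suggestion above to a developer, it can be sanity-checked offline. This is only a minimal sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line, the `Disallow` layout, and the `is_crawlable` helper are assumptions for illustration, not the site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt built from the four directories listed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /stockist/
Disallow: /ukstockist/
Disallow: /services/form/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the assumed rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/search/?q=tee", "/happy-tee-pink", "/Stockist/North East"):
        print(path, "->", "allowed" if is_crawlable(path) else "blocked")
```

One thing this makes visible: rule paths are matched case-sensitively here, so a rule for /stockist/ does not cover a URL that still starts with /Stockist/. It may be worth lower-casing the URLs before relying on these rules.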
I noticed there is both a .com and a .co.uk site with no use of rel="alternate" hreflang annotations.
You should stop using capital letters in the URLs. Everything should be lower-case. Also, don't use spaces in the URL. Use a dash or underscore instead (e.g. "/Stockist/North East" should be "/stockist/north-east").
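The lower-case and no-spaces rule above could be sketched roughly like this. It is a hypothetical helper, not anything the site's platform provides; `normalize_path` and the choice of a dash rather than an underscore are assumptions.

```python
import re


def normalize_path(path: str) -> str:
    """Lower-case every path segment and replace runs of whitespace with a dash."""
    cleaned = []
    for segment in path.split("/"):
        segment = segment.strip().lower()
        segment = re.sub(r"\s+", "-", segment)
        cleaned.append(segment)
    return "/".join(cleaned)


if __name__ == "__main__":
    print(normalize_path("/Stockist/North East"))
```

Any URL that changes under this rule would also need a 301 from the old form to the new one, otherwise the rename just creates more 404s.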
Google has been executing JavaScript more and more lately and this is causing a rise in 404s, among other errors, across the web. Google Webmaster Tools will tell you exactly which page caused the error and, if applicable, which page was linking to it. This would help diagnose the issue so I suggest looking into that and going from there.
Good luck!
-
Hey Everett
No worries at all really appreciate your input !
I can't share publicly: whilst I know the Moz community is a highly ethical bunch of lovely people, this is a new and very precious client of mine that I can't afford to risk losing should anyone see this and try to pitch for the business. As unlikely as that is, I can't risk the possibility, but I have PM'd you just in case you're happy to have a look.
I started this thread after looking at the reports in GWT. I have told dev to 301 any old pages to their new equivalents, or to the home page if there is none, and then to apply code as per Altec's advice to batch process 301s for any remaining. The errors are increasing by approx 150 new 'page not founds' each day, so something is clearly going on, but hopefully if dev deploy all of the above that should reverse the ever-growing spike, do you think?
I have told them that once they have done that they should mark all as fixed in GWT and then just attend to any that recur afterwards on a case-by-case basis.
Any further help or advice much appreciated, especially if the above is unlikely to resolve it?
All Very Best
Dan
-
Hello Dan,
I'm sorry, but I can't provide more actionable advice without seeing the site. If you're willing to share the site I can have a look at what the URLs are doing. But if you really want to know what's causing the 404 errors, Google Webmaster Tools has a report you can look at, or you can check log files.
-
Thanks Altec
There are only 2 soft 404s being reported.
Also I'm asking them to redirect any 404 to the nearest equivalent page, or to the home page if there is no similar new page. Any others, such as those occurring due to parameters, will be dealt with by following your code advice.
I take it that doing both of the above will avoid any soft 404s?
Cheers
Dan
-
Dan-
Yep, you are right on. If you add the above type of code below all of the specific page redirects, it will "catch" all the remaining URLs with random parameters in the URL and redirect them to the home page. It's very similar code for a Windows server. I'm confident your dev should know this, but if for some reason they don't I can message you the code snippets you would need for the Windows server.
Also, Everett Sizemore brought up a great point below. If you are redirecting all 404 pages to this one custom "404 landing page", that is incorrect. That is called a "soft 404" error. These will show up in your WMT account as well, under a section called "soft 404s".
-
Thanks so much for taking the time to comment Everett, I really appreciate it !!
I just tested that and can confirm that entering a nonsense URL including the client domain resolves to a 404 page.
Since that's the case, can I now ignore this bit:
"If you redirect the user agent to another URL before returning the status code they (e.g. Googlebot) will never know that the original URL they were trying to access has been "removed" and will assume it has just been redirected".
However, 404s are growing by the hundreds every day, so is that likely a result of the below?
"Furthermore, as the parameter following the "?aspxerrorpath=..." changes each time you will have more and more 404 URLs instead of just the one."
Any further help/advice/clarification re the above much appreciated, since I ultimately want to take the aspects of this thread that still apply into an email to dev to help them resolve this asap.
Very Many Thanks
Dan
-
Thanks again Altec!
Yes, my client has just migrated to a new site and, typically, Google launched Panda 4 straight after, so I'm not sure what's causing what at the moment and suppose I should wait another week or two before drawing any conclusions.
However, page not found errors have spiked and continue to grow at a rate of knots.
The client's site is on a Windows server so I don't think that Apache code would do. I don't suppose you know what it would be for Windows, would you? Or I take it this is something the client's developers should know, in which case I will simply instruct them to add the relevant code?
I have told dev to redirect ALL old content pages to the new equivalent page, so if they add the above type of code after that, it should deal with any remaining non-content page not founds?
Thanks again Altec!!
-
Hello Dan,
Google is constantly crawling the web and updating their data almost in realtime whereas Moz crawls and updates in batches. The Moz data should catch up after the next refresh, which I think happens about every month.
Something in your example caught my attention. The URL included a path indicating a custom 404 page: "Info/FileNotFound.aspx?aspxerrorpath=". Let's say I go to your site and type in some nonsense URL like "domain.com/123456.aspx". Will I get redirected to this /info/FileNotFound.aspx page? If so, that is not ideal. You want the URL I was attempting to access to return the 404 response so it can be removed from the index. If you redirect the user agent to another URL before returning the status code, they (e.g. Googlebot) will never know that the original URL they were trying to access has been "removed" and will assume it has just been redirected. Furthermore, as the parameter following the "?aspxerrorpath=..." changes each time you will have more and more 404 URLs instead of just the one.
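To make the distinction above concrete, here is a rough sketch of how the responses for a nonsense URL could be classified once you have recorded them with a header checker or a crawler that does not follow redirects. The `classify_missing_page` function, its labels, and the example.com URLs are all hypothetical. Only the aspxerrorpath parameter comes from the thread.

```python
from urllib.parse import parse_qs, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_missing_page(hops):
    """Classify how a site answers a request for a page that does not exist.

    hops is the ordered list of (status_code, url) pairs observed,
    starting with the nonsense URL that was requested.
    """
    first_status, _ = hops[0]
    _, final_url = hops[-1]
    if first_status in (404, 410):
        # The requested URL itself reports that it is gone: the ideal case.
        return "hard 404"
    if first_status in REDIRECT_CODES:
        query = parse_qs(urlsplit(final_url).query, keep_blank_values=True)
        if "aspxerrorpath" in query:
            # The crawler is sent to the shared error page instead.
            return "redirected to error page"
        return "redirected elsewhere"
    if first_status == 200:
        return "soft 404"
    return "other"


if __name__ == "__main__":
    ideal = [(404, "http://example.com/123456.aspx")]
    not_ideal = [
        (302, "http://example.com/123456.aspx"),
        (404, "http://example.com/Info/FileNotFound.aspx?aspxerrorpath=/123456.aspx"),
    ]
    print(classify_missing_page(ideal))
    print(classify_missing_page(not_ideal))
```

In the second case every distinct nonsense URL produces a new FileNotFound.aspx URL with a different aspxerrorpath value, which is one plausible reason for an error list that keeps growing.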
-
No problem Dan.
There is definitely no harm in ticking off the errors as "fixed" and then seeing which ones return. If they return, I'm betting it is one of two things. The first is that you still have a live link somewhere on the new site pointing to the page. The easiest way to test this is to click on one of the pages in WMT and hit the "linked from" button. That shows you where Google is getting sent to the page from. The second is that the pages have an "external" link pointing to them. If that's the case you want to be doubly sure to redirect them so you get the link credit!
I'm not 100% positive on why Moz is not showing the errors as quickly, but Moz does a "deep" crawl of the web and then does a lot of computing. WMT is literally just crawling your site and spitting out anything it finds to your account. That is a much easier job for WMT, hence it is pretty quick to report site errors to you. The errors in your Moz account will most likely update when Moz pushes their next Mozscape index update (this will also be useful for you because Moz will have better data about "external" links than WMT).
From the error you sent me it looks like you are going through a move of a site from .aspx to .php or .html. When you change languages like this you can end up with a lot of gnarly 404 pages, especially from search pages that have variables in the URL. If you have a large amount of these errors, so many that you feel you are getting a temporary "penalty" for too many site errors, you can write a quick snippet of code in the httpd.conf file on your Apache server to fix these old useless pages. (If you're not on an Apache server this can still be achieved, the code would just be different.)
Make sure to put this snippet below all of your other individual page redirects:
# redirect all old dead .aspx pages to the new homepage
RedirectMatch 301 ^/(.*)\.aspx$ /
You should still make your best effort to redirect each old page to its corresponding "new" page. However, if you have thousands to redirect, and many are just pages with search parameters in the URL, there comes a point where you want to just hit all the remaining pages in one fell swoop.
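The reason the catch-all should sit below the individual redirects can be illustrated with a small sketch. This is not how Apache is implemented. It only mimics "specific rules first, catch-all last", and the paths in `SPECIFIC_REDIRECTS` are made up for the example.

```python
import re

# Hypothetical one-to-one redirects from old pages to their new equivalents.
SPECIFIC_REDIRECTS = {
    "/old-category.aspx": "/new-category",
    "/about-us.aspx": "/about",
}

# Catch-all for any remaining .aspx path. The dot is escaped so that it
# only matches a literal ".aspx" ending.
CATCH_ALL = re.compile(r"^/(.*)\.aspx$")


def resolve_redirect(path):
    """Return the 301 target for path, or None if no redirect applies."""
    if path in SPECIFIC_REDIRECTS:
        return SPECIFIC_REDIRECTS[path]
    if CATCH_ALL.match(path):
        return "/"
    return None


if __name__ == "__main__":
    for path in ("/old-category.aspx", "/search-results.aspx", "/new-category"):
        print(path, "->", resolve_redirect(path))
```

If the catch-all were checked first, /old-category.aspx would go to the home page instead of its new equivalent, which is the situation the ordering advice is meant to avoid.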
I'm not certain how long it takes for the 301 to fully "pass" PageRank in Google's eyes. I think it is safe to assume that once the old page no longer returns a 404, and it is no longer showing up in Google searches, the PR has been passed. Moz Page Authority should reflect it once Moz picks up the change, but Moz authority is only "similar" to PR. It's not an exact science. Moz should pick up the change easily though, it will just take them a little longer.
-
Many Thanks Altec !
Any harm in ticking them all off and seeing which return? That should be more thorough than doing a sample, shouldn't it? Also, dev say that many 404s refer to a really old version of the website from before they were involved, so they doubt these are still really there... although the spike only occurred after the new site went live, so I'm slightly dubious about that response from them.
The list has also grown from 660 to 715 in 24 hours so it looks like more are being discovered!
Any ideas why Moz Analytics is not discovering and reporting these in crawl error reports (saying only 3 404s remain, compared to GWT saying 715 page not founds)?
Also, what about pages starting with the below being reported as 404: do they matter?
Info/FileNotFound.aspx?aspxerrorpath=
Any ideas re my second question: approximately how long does G take to pass authority via a 301 from an old page to its replacement? And does Moz Page Authority reflect this in its score once G has passed it?
Sorry for more questions, any feedback much appreciated.
Many Thanks
Dan
-
Dan-
It sounds like you have already spot checked a few of the pages and they are still returning errors. In my experience WMT is very quick to re-crawl pages you have marked as fixed to see if they truly are fixed. For example, if you mark 100 pages as fixed and 24 hours later 95 of the pages come back, it has always been the case for me that those 95 pages still had an issue and were not "fixed".
I would pull up the error report in WMT again, spot check a few of the examples and see if the pages still error out. If they do, I would export that list from WMT, send it to your dev and ask them to fix all redirection errors.
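Before sending the exported list on, it may help to group the URLs by the kind of problem they look like, so the dev can see which fix covers which bucket. The sketch below is only illustrative. The bucket names, the `bucket_url` and `summarise` helpers, and the sample URLs are assumptions, and the export is assumed to have been read into a plain list of URL strings.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def bucket_url(url):
    """Assign an exported 'not found' URL to a rough problem category."""
    parts = urlsplit(url)
    if "aspxerrorpath" in parse_qs(parts.query, keep_blank_values=True):
        return "error-page URL"
    if parts.path.lower().endswith(".aspx"):
        return "legacy .aspx page"
    if parts.path != parts.path.lower() or " " in parts.path or "%20" in parts.path:
        return "upper case or spaces"
    return "other"


def summarise(urls):
    """Count how many exported URLs fall into each bucket."""
    return Counter(bucket_url(url) for url in urls)


if __name__ == "__main__":
    sample = [
        "Info/FileNotFound.aspx?aspxerrorpath=",
        "/old-page.aspx",
        "/Stockist/North East",
        "/some-missing-page",
    ]
    for name, count in summarise(sample).items():
        print(f"{name}: {count}")
```

Each bucket roughly maps to a different fix discussed in this thread: the error-page redirect, the .aspx catch-all, and the URL clean-up.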