September's Mozscape Update Broke; We're Building a New Index
-
Hey gang,
I hate to write to you all again with more bad news, but such is life. Our big data team produced an index this week but, upon analysis, found that our crawlers had encountered a massive number of non-200 URLs, which meant this index was not only smaller but also weirdly biased. PA and DA scores were way off, coverage of the right URLs went haywire, and the metrics we use to gauge quality told us this index simply was not good enough to launch. Thus, we're in the process of rebuilding an index as fast as possible, but this takes at minimum 19-20 days, and may take as long as 30 days.
This sucks. There's no excuse. We need to do better, and we owe all of you and all of the folks who use Mozscape better, more reliable updates. I'm embarrassed and so is the team. We all want to deliver the best product, but we continue to find problems we didn't account for, and have to go back and build systems in our software to look for them.
In the spirit of transparency (not as an excuse), the problem appears to be a large number of new subdomains that found their way into our crawlers and exposed us to robots.txt fetches that timed out and stalled our crawlers. In addition, some new portions of the link graph we crawled exposed us to websites/pages that we need to find ways to exclude, as these abuse our metrics for prioritizing crawls (aka PageRank, much like Google, but they're obviously much more sophisticated and experienced with this) and bias us toward junky stuff, which keeps us from getting to the good stuff we need.
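For readers wondering how a slow robots.txt can stall a crawler, here is a minimal Python sketch of one common guard: a hard timeout on the fetch, with the host deferred rather than waited on. This is not Moz's crawler code. The function names, the 5-second default, and the "defer" policy are all assumptions made for illustration.

```python
import urllib.request
import urllib.robotparser


def fetch_robots_txt(url, timeout_s=5.0, opener=urllib.request.urlopen):
    """Return the robots.txt body, or None if the fetch fails or times out.

    timeout_s is a hypothetical budget; a real crawler would tune it.
    """
    try:
        with opener(url, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, socket timeouts and connection resets are all OSError
        # subclasses, so one slow host cannot block the worker.
        return None


def crawl_decision(url, user_agent, robots_body):
    """Return "fetch", "skip", or "defer" for a URL (hypothetical policy)."""
    if robots_body is None:
        # No rules available: requeue the host instead of stalling on it.
        return "defer"
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_body.splitlines())
    return "fetch" if rules.can_fetch(user_agent, url) else "skip"
```

The `opener` parameter exists only so the fetch can be swapped for a stub when testing; by default it is the standard library's `urlopen`.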
We have dozens of ideas to fix this, and we've managed to fix problems like this in the past (prior issues like .cn domains overwhelming our index, link wheels and webspam holes, etc. plagued us and have been addressed, but every couple of indices it seems we face a new challenge like this). Our biggest issue is one of monitoring and processing times. We don't see what's in a web index until it's finished processing, which means we don't know if we're building a good index until it's done. It's a lot of work to rebuild the processing system so there can be visibility at checkpoints, but that appears to be necessary right now. Unfortunately, it takes time away from building the new, realtime version of our index (which is what we really want to finish and launch!). Such is the frustration of trying to tweak an old system while simultaneously working on a new, better one. Tradeoffs have to be made.
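To make "visibility at checkpoints" concrete, the sketch below tallies HTTP status codes while a crawl is running and flags the run once the share of non-200 responses crosses a limit. It is only an illustration of the idea: the class name, the 20% threshold, and the 1000-fetch minimum sample are assumptions, not values from Moz's pipeline.

```python
from collections import Counter


class CrawlHealth:
    """Running tally of HTTP status codes seen during a crawl."""

    def __init__(self, max_non_200_share=0.2, min_sample=1000):
        # Both thresholds are hypothetical defaults for this sketch.
        self.max_non_200_share = max_non_200_share
        self.min_sample = min_sample
        self.statuses = Counter()

    def record(self, status_code):
        self.statuses[status_code] += 1

    def non_200_share(self):
        total = sum(self.statuses.values())
        if total == 0:
            return 0.0
        return 1.0 - self.statuses[200] / total

    def checkpoint(self):
        """Summarize the crawl so far; unhealthy only once enough data exists."""
        total = sum(self.statuses.values())
        share = self.non_200_share()
        enough_data = total >= self.min_sample
        healthy = (not enough_data) or share <= self.max_non_200_share
        return {"fetched": total, "non_200_share": share, "healthy": healthy}
```

Calling `checkpoint()` at regular intervals would surface a skewed crawl days before final processing, which is the kind of early signal the post says is currently missing.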
For now, we're prioritizing fixing the old Mozscape system, getting a new index out as soon as possible, and then working to improve visibility and our crawl rules.
I'm happy to answer any and all questions, and you have my deep, regretful apologies for once again letting you down. We will continue to do everything in our power to improve and fix these ongoing problems.
-
I hope we might actually have that 11/17 index out a little bit early. We've made a lot of fixes and optimizations, and, fingers crossed, it looks (so far) like it's making a difference in terms of speed to index processing completion.
-
Gotcha - makes a lot more sense now. Moz's DA/PA is clearly the gold standard that most in the industry rely on, but I didn't realize the extent of the processing required to make that happen. Even more props to Moz for the BHAG of taking on such a complex task all these years.
-
The same story here. Quite honestly, I think that last index was very much messed up, probably due to the broken index before that. So I gave up on it and am simply waiting for the next scheduled one, on 11/17.
-
It's been about a month since the last update on this, and I'm just curious if there's any news on the progress. I'm still seeing the same results in OSE that I've been seeing for the last couple of months, so it appears it's not fixed yet, but is there any indication of when it might be?
-
Thanks for clarifying!
-
Sometimes yes. Sometimes, we don't know until we reach the last stages of processing whether it's going to finish or take longer. We're trying to get better at benchmarking along the way, too, and I'll talk to the team about what we can do to improve our metrics as an index run is compiling.
-
Thanks!
I have been noticing for quite some time that last-minute changes in update release dates are becoming "normal". Is there a way you guys can announce those date changes earlier than on the expected release date itself?
-
It didn't break, but it is taking longer to process than we hoped. Very frustrating, but we have a plan that, starting in a few more weeks, should get us to much more consistent index releases (and better quality ones, too).
-
Hello, Rand. I just noticed that yesterday the new update was scheduled for October 8th. And just now it says October 14th! What's going on? I hope it didn't break again...
-
Hi Lehia
Crawl reports are separate from our Mozscape indexes. Also, any delays with our index only impact the ability to access new data. With your crawl reports, I have a suspicion the URLs with the 404s are ones with trailing slashes, e.g. domain.com/ and not domain.com.
If not, send us your account info and some examples at help@moz.com and we can take a look!
-
Hey Rand, is this why my crawl reports are saying that I have some 404 client errors on pages where I can't see any issues? Or is this another issue that I'm incurring?
Thanks in advance
-
Thanks Rand for the update. We have hired a full-time marketing manager and he has been working hard the past month; I know he's excited to see the new results. "Putty & Paint does not a NEW Boat make." Fixing is a painstaking reality compared to building. Moz is great, so we will wait.
-
Hi Joe - fair question.
The basic story is - what the other link indices do (Ahrefs and Majestic) is unprocessed link crawling and serving. That's hard, but not really a problem for us. We do it fairly easily inside the "Just Discovered Links" tab. The problem is really with our metrics, which is what makes us unique and, IMO, uniquely useful.
But metrics like MozRank, MozTrust, Spam Score, Page Authority, Domain Authority, etc. require processing - meaning all the links need to be loaded into a series of high-powered machines and iterated on, à la the PageRank patent paper (although there are obviously other kinds of ways we do this for other kinds of metrics). Therein lies the rub. It's really, really hard to do this - it takes lots of smart computer science folks, requires tons of powerful machines, and takes a LONG time (17 days+ of processing at minimum to get all our metrics into API-shippable format). And, in the case where things break, what's worse is that it's very hard to stop and restart without losing work, and very hard to check our work by looking at how processing is going while it's running.
This has been the weakness and big challenge of Mozscape the last few years, and why we've been trying to build a new, realtime version of the index that can process these metrics through newer, more sophisticated, predictive systems. It's been a huge struggle for us, but we're doing our best to improve and get back to a consistent, good place while we finish that new version.
tl;dr Moz's index isn't like others due to our metrics, which take lots of weird/different types of work, hence buying/partnering w/ other indices wouldn't make much sense at the moment.
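As a rough illustration of what "loaded ... and iterated on" means, here is a textbook PageRank power iteration in Python. It is not MozRank or any Moz algorithm; the damping factor of 0.85 and the fixed iteration count are conventional assumptions, and the graph is a toy dictionary rather than a web-scale link index.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Pages with no outlinks spread their rank evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

Every pass touches every link in the graph, and each pass depends on the previous one, which is why this style of metric needs the whole index loaded and is hard to inspect or restart midway.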
-
Talk about reading everyone's mind... I should point out, though, that Rand mentioned above that Moz was working on a new realtime tool like the ones we have seen elsewhere. I think a little patience might solve everyone's problems.
-
Thanks for the transparency as usual. A question I've always been wondering:
Moz seems to have much more stature, clout, and maybe funding compared to many other SEO software companies based around the world. And of course you offer more of a suite of products rather than just focusing on Open Site Explorer. But to me one of the most important SEO tools is the backlink explorer that companies offer, and it seems like OSE, although one of the first, lags compared to a few others. I've read that OSE isn't looking to just grab all the links, but only the most important ones. It seems, though, that there have been lots of technical challenges, and I can't help but think that there are other companies that have already solved their indexing challenges or are a few steps ahead of OSE.
Would Moz ever go out and buy a pretty good backlink explorer company like Ahrefs or Majestic or some other upstart that's solved that piece of the puzzle? Combining that new technology that's solved the indexing part with your DA algorithm seems like a match made in heaven. I'm sure you guys considered this years ago internally, but it's a question I've always pondered...
-
Two potential solutions for you - 1) watch "Just Discovered Links" in Open Site Explorer - that tab will still be showing all the links we find, just without the metrics. And 2) Check out Fresh Web Explorer - it will only show you links from blogs, news sites, and other things that have feeds, but it's one of the sources I pay attention to most, and you can set up good alerts, too.
-
And they would have gotten away with it too if it weren't for those meddling kids and their pesky subdomains
-
I did notice no new links added to a number of projects in the last 2 months and I was wondering what went wrong. Thanks for clearing up the issue with this post. We look forward to the resolution.
-
Yeah - the new links you see via "just discovered" will take longer to be in the main index and impact metrics like MozRank, Page Authority, Domain Authority, etc. It's not that they're not picked up or not searched, but that they don't yet impact the metrics.
And yes - will check out the other question now!
-
Hi Will - that's not entirely how I'd frame it. Mozscape's metrics will slowly, over time, degrade in their ability to predict rankings, but it's not as though exactly 31 days after the last update, all the metrics or data is useless. We've had delays before of 60-90+ days (embarrassing I know) and the metrics and link data still applied in those instances, though correlations did slowly get worse.
The best way I can put it is - our index's data won't be as good as it normally is for the next 20-30 days, though it's better now than it will be in 10 days and was better 10 days ago than it is today. It's a gradual decline as the web's link structure changes shape and as new sites and pages come into Google's index that we don't account for.
-
Webmasters' love of sub-domains... shake fist!