Mozscape API Updates (Non-updates!) - becoming a joke!
-
This is the third month in succession that the Mozscape index has been delayed. My clients and I are losing patience with this, as I am sure many others are.
Just what do you suppose we tell clients waiting for that data? We have incomplete and sometimes skewed metrics to report on, and delays that keep getting pushed back further, with nothing but the usual 'we are working on it' and 'bear with us'.
It's becoming obvious you fudged the index update back in January (see the discussion here, with some kind of explanation finally from Rand: https://moz.com/community/q/is-everybody-seeing-da-pa-drops-after-last-moz-api-update), and it seems you have been fumbling around ever since trying to fix it, with data all over the place, shifting DA scores and missing links from campaign data.
Your developers should be working around the clock to fix this, because this is a big part of what you're selling in your service, and as SEOs and marketers we rely on that data for client retention and satisfaction. Will you refund us all if we lose clients over this?! I don't think so!
With reports already sent out at the beginning of the month with incomplete data, I told clients the index would refresh on April 10th, as stated on the API updates page, only to see it fudged again on the day of release, with the index being rolled back to the previous one. So again I have to tell clients there will be more delays, with no certainty that it will even be refreshed when you say it will. It's becoming a joke, really!
-
Hey Matt - I can get into some of the nitty-gritty details on this.
Basically - we've been having trouble of all kinds with Mozscape, and while our team has indeed been working around the clock, the reality is that it's an old, clunky, hard-to-understand system that needs to be replaced entirely. That work is also going on but, as you might imagine, has a separate team on it, which means the Mozscape team's bandwidth is split.
Mozscape has crawling trouble - we've had issues with our own crawler design, specifically with spam that's fooled our crawlers (it's designed to fool Google, obviously, but it has caught us too) and biased our index. We also had an issue where code that helped us recrawl important pages was commented out, and other issues (along with a couple of longtime engineering departures) kept that invisible to us for a good few months (even with it fixed, it will take an index or two to get back to normal). We've had other issues with hardware and bandwidth restrictions, with team changes, with unintentionally excluding important sites and important pages on sites due to erroneous changes on our end, and with robots.txt interpretation mistakes. You name it. It's been pretty frustrating because it's never a single issue coming up again and again, but rather new issues each time. The team currently on the Mozscape project is relatively new - we had almost complete turnover on that team in the last year (a combination of voluntary and not) - so there's a lot of ramp-up, trying to understand what things do, fixing old problems, etc. I'm sure as an engineer you're familiar with those types of challenges, especially when the documentation isn't pristine.
IMO, those are crappy excuses. We should be better. We will be better. I don't provide them to pardon our shitty quality over the last few months, but because you said you wanted detail, and I do love transparency.
I think we're going to have a tough slog until the new index system comes out (likely this Fall). I'm keeping my fingers crossed that we can repair each new problem and that few others arise, but the past 6 months have made me wary of overpromising and under-delivering.
BTW - it is true that the ML model means there's lots of DA flux, as the goal is to be as accurate as possible with Google's changes; so if we see a site with certain types of inputs matching patterns of sites that don't rank as well, that DA will drop. Given that Google's rankings fluctuate all the time, that our crawlers fluctuate a lot (more than they should, as noted above), and that the link graph changes constantly, a lot of flux in DA is to be expected. That said, the new model will have DA refreshed daily, rather than monthly, and will also have history, as well as a way to dig in and see which inputs weigh heavily in DA and how those have changed. I think all of that will help make these shifts vastly more transparent, even if they continue to be large (which they should, so long as Google's own flux is high).
One thing I am working on with the team - a different kind of score, called something like "domain visibility" or "rankings visibility" that tracks how visible a site's pages are in a large set of Google rankings. I think that score might be more what clients are seeking in terms of their overall performance in Google, vs. their performance in the link graph and how their links might be counted/correlated with higher/lower rankings.
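To illustrate the concept (purely a sketch, not the actual formula - the keywords, weighting, and scale here are hypothetical):

```python
# Hypothetical rankings: keyword -> position of the site's best page
# (None = not ranking in the tracked results).
rankings = {"best running shoes": 3, "trail shoes": None, "running socks": 8}

def visibility(rankings: dict, top_n: int = 10) -> float:
    """Score in [0, 100]: 100 means ranking #1 for every tracked keyword."""
    weights = [
        (top_n - pos + 1) / top_n if pos is not None else 0.0
        for pos in rankings.values()
    ]
    return 100 * sum(weights) / len(weights)

print(f"{visibility(rankings):.1f}")  # (0.8 + 0.0 + 0.3) / 3 * 100 = 36.7
```

The idea is that the score moves when your actual Google visibility moves, rather than when the link graph or our crawl does.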
-
Hi Lisa, we've also experienced many of the same frustrations Greg mentions, but I mainly wanted to respond to your comments about not being able to compare Domain Authority over time. Given that this metric is positioned as measuring a website's overall ability to rank, it shouldn't be unexpected that people want to see how their score evolves over time. Even in your own software, the change in Domain Authority since the last update appears as one of the most prominent items on the Dashboard, and you also show a graph charting how it changes over time. My question is: since you clearly understand that customers want to compare how Domain Authority evolves over time on a consistent scale, why not at least attempt to normalize it? I am also a machine learning engineer, so the explanation "this is based upon machine learning so it will fluctuate unpredictably" makes no sense to me. You could normalize your inputs based upon certain characteristics of the population, or you could use a representative basket of websites to normalize the outputs. From what I've seen, even just normalizing based on the size of your index hugely improves the consistency. It wouldn't need to be perfect to be a huge improvement over the current situation.
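To illustrate the kind of output normalization I mean, here's a minimal sketch, assuming you can track a fixed basket of reference sites across index updates (the sites and scores are hypothetical):

```python
from statistics import mean, stdev

# Hypothetical DA scores for a fixed basket of reference sites,
# recorded at two different index updates.
basket_previous = {"site-a.com": 62, "site-b.com": 48, "site-c.com": 71, "site-d.com": 35}
basket_current = {"site-a.com": 58, "site-b.com": 44, "site-c.com": 66, "site-d.com": 31}

def normalize(score: float, basket: dict) -> float:
    """Express a raw score as standard deviations from the basket mean,
    putting scores from different index updates on a common scale."""
    values = list(basket.values())
    return (score - mean(values)) / stdev(values)

# A site that "dropped" from DA 50 to DA 46 between updates...
print(round(normalize(50, basket_previous), 2))  # -0.25
print(round(normalize(46, basket_current), 2))   # -0.24
# ...barely moved relative to the basket: most of the drop was index-wide.
```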
-
Thanks for your reply, Lisa.
Please understand my point is not to get a discount or a refund, or to cancel my subscription. It's to air my grievances (in a place where they can be understood in context, more so than in a tweet) and to make you aware of how this looks for us as the marketers in the middle.
I would far prefer you and the teams there to be able to fix the issues you're having and be on time with the updates, than migrate away from Moz - that's the last thing I want.
I'm aware the other tools and services are functioning fine, but of course with DA and link data missing there's a huge gap in our reporting to clients, and the frequent delays don't do anything for confidence.
Thank you for the detailed reply - which, in reality, could have been a good way to calm our nerves earlier (as a product support blog post or an email update to customers) rather than just relying on the Mozscape API update page saying 'we are having some problems with launching the new index'. That kind of transparency is, after all, part of the TAGFEE ethos that has made Moz a success.
I sincerely hope you can iron out the problems there, so we can all be confident in the data we are reporting on, and in the schedule for index refreshes and everything else.
Greg
-
Hi Greg,
My name is Lisa and I'm a member of the support team here at Moz.
I want to thank you for getting in touch here and on Twitter to tell us about your concerns and the impact that these failures have on you as a customer and on your clients.
You're right - the last few updates have sucked. They've not been up to the standard that we hope to provide as a company and they haven't been what we want for you or your clients.
And you're right again when you say that our apologies and "we're working on it" messages haven't been enough to give you confidence that we really are doing the best we can to be better.
I apologise for the issues you've had with your reports and their incomplete data. I wish I could offer you some assurances, but no decisions have been made yet and I don't want to make you any false promises.
Behind the scenes, the engineering teams have been doing a lot of work to build a better and more reliable crawler, shore up the infrastructure used to store and return the index data, and proactively spot problems and prevent them from reaching our customers.
Some of this work has caused problems of its own - changing the infrastructure, for instance, made the index fail to upload correctly several times at the end of February - and some of it has not yet been completed.
You mentioned that we 'fudged the index update back in January' and to some extent that is true. We collected a lot of data from sites that were junk and had no value. Since then, we've been reviewing which sites are included in the index and working to strike a balance so that valuable sites are crawled but spam sites and subdomains are blocked. A lot of this work must be done manually.
Another problem we have, with Domain Authority in particular, is that we know there's a mismatch between how it works and how our customers apply it.
Although it is a score between 0 and 100, Domain Authority is a relative metric. It depends on the ranking model data we have (about which types of sites are likely to rank) and the data we've collected in the index, not just for one site but for all of them. A change to the link data we've collected for your site, or for other sites, or a change to the ranking model can dramatically affect the Domain Authority score of any site. This data should not be considered in a vacuum but used comparatively.
Domain Authority scores have always been and will always be expected to fluctuate. The best way to use a site's Domain Authority is to compare it to competitor sites and measure the distance between them. Using it as a standard score out of 100 is likely to cause anger and frustration when it drops, even though drops and rises are both part of the nature of Domain Authority.
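To make that concrete, here's a minimal sketch of that comparative approach (the sites and scores below are hypothetical):

```python
# Hypothetical DA scores from three consecutive index updates.
history = [
    {"mysite.com": 42, "competitor.com": 49},  # March
    {"mysite.com": 38, "competitor.com": 45},  # April (index-wide dip)
    {"mysite.com": 40, "competitor.com": 48},  # May
]

for update in history:
    gap = update["mysite.com"] - update["competitor.com"]
    print(f"mysite.com DA: {update['mysite.com']}, gap to competitor: {gap:+d}")

# The raw score swings by 4 points across updates, but the gap stays
# around -7: the site's relative position barely moved.
```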
Everyone here is invested in making the index better, and we all want to make you and your clients happy. We'd like to provide round-the-clock coverage to solve these problems, but that isn't possible for us. We have a small engineering team based in Seattle, and we use their time as efficiently as possible to allow them to do their best work.
We do feel that our tools have value beyond the link index and Domain Authority data - that's why we offer that data for free via MozBar, Open Site Explorer and our API - but I would understand if you feel that the tools are not meeting your needs while there are problems and delays with this data. I'd be happy to help you cancel your subscription and offer a refund on your last payment if this is the case - just reach out to me via help@moz.com.
Related Questions
-
How can I get "Date First Seen","Date Last Seen" and "Date Lost" from the API?
"Date First Seen","Date Last Seen" and "Date Lost" are columns in the CSV exported from LinkExplorer's Inbound Links page. How do I get that data from the API?
API | StevePoul
-
Can the API give the same data as the UI?
What I mean is, I'm not that interested in counts. What I'd like to figure out is how to get the API to give me what the UI gives me, so that for a given target domain I can download a report with the following fields:
URL
Title
Anchor Text
Spam Score
PA
DA
Linking Domains to Page
Target URL
Link Type
Link State
Date First Seen
Date Last Seen
Date Lost
HTTP Status Code
Links to Page
Outbound Domains from Page
Outbound Links from Page
Operator profile
That's essentially the same as a CSV report generated by clicking on the "Export CSV" button. I could, I suppose, get someone to write something in Selenium to enter a domain, click on the button and hang around on the notifications page for the results, but I'd really rather not.
API | StevePoul
-
Need help understanding API
I know what information I need to pull... I know I need APIs to do it... I just don't know how to pull it or where. I have tools like Screaming Frog, Scrapebox, SEMRush, Moz, Majestic, etc. I need to find out how to type in a query and pull the top 10 ranking specs like DA, PA, Root Domains, Word Count, Trust Flow, etc. Here is a screenshot of info I manually pulled: https://screencast.com/t/H1q5XccR8 (I can't hyperlink it... it's giving me an error). How do I auto-pull this info?
API | LindsayE
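A hedged sketch of one way to start on the Moz side of that list: the Mozscape URL Metrics endpoint takes a bit-flag Cols parameter plus signed-URL authentication (an HMAC-SHA1 of your AccessID and an expiry timestamp). The flag values below are taken from the Mozscape Cols documentation as best I recall and should be verified against the API reference; Word Count isn't a Mozscape field, and Trust Flow is a Majestic metric, so those would have to come from other sources.

```python
import base64
import hashlib
import hmac
import time

import requests

ACCESS_ID = "member-xxxxxxxx"  # your Mozscape credentials
SECRET_KEY = "your-secret-key"

# Bit flags for the Cols parameter (verify against the Mozscape docs):
PAGE_AUTHORITY = 34359738368
DOMAIN_AUTHORITY = 68719476736

def url_metrics(target: str) -> dict:
    """Fetch PA/DA for one target from the Mozscape URL Metrics endpoint."""
    expires = int(time.time()) + 300
    # Signature = base64(HMAC-SHA1("<AccessID>\n<Expires>", secret key)).
    to_sign = f"{ACCESS_ID}\n{expires}".encode()
    signature = base64.b64encode(
        hmac.new(SECRET_KEY.encode(), to_sign, hashlib.sha1).digest()
    ).decode()
    resp = requests.get(
        f"https://lsapi.seomoz.com/linkscape/url-metrics/{target}",
        params={
            "Cols": PAGE_AUTHORITY + DOMAIN_AUTHORITY,
            "AccessID": ACCESS_ID,
            "Expires": expires,
            "Signature": signature,
        },
    )
    resp.raise_for_status()
    return resp.json()

# e.g. loop over the top-10 domains you scraped for a query
# (full URLs would need to be URL-encoded into the path):
for url in ["example.com", "example.org"]:
    print(url, url_metrics(url))
```
-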
Still not got any index update data.
Is anyone finding that they haven't got the results of the update yet? I have tried some competitors and they are not updated either.
API | AHC_SEO
-
Two days since the supposed update
And still no update. Less a question, more a comment. Feeling a bit of deja vu here.
API | pfrance
-
Can the API Filter Links with Certain Anchor Text?
I am trying to get all links that have certain strings in their anchor text. I am using the Python library: https://github.com/seomoz/SEOmozAPISamples/blob/master/python/lsapi.py
Looking at the documentation, it says I can get the normalized anchor text by using the bit flag 8 for the LinkCols value: https://moz.com/help/guides/moz-api/mozscape/api-reference/link-metrics
So I tried this: links = l.links('example.com', scope='page_to_domain', sort='domain_authority', filters=['external'], sourceCols = lsapi.UMCols.url, linkCols=8)
But it doesn't return the expected 'lnt' response field or anything similar to the anchor text. How do I get the anchor text on the source URLs? I also tried 10 for the linkCols value, to get all the bit flags in the lf field as well as the anchor text. In both instances (and even with different variations of targetCols & sourceCols), these are all the fields that are returned: 'lrid', 'lsrc', 'luuu', 'uu', 'luupa', 'ltgt'
API | nbyloff
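A hedged sketch of something to try: call the links endpoint directly with requests, so you can see exactly which parameters go over the wire. The flag 8 for normalized anchor text comes from the docs quoted in the question; whether your API plan includes anchor-text columns is a separate variable - I believe lower tiers historically omitted those fields silently. Parameter names follow the Mozscape links endpoint reference; the SourceCols value of 4 (the URL column, assumed to match lsapi.UMCols.url) is an assumption to verify.

```python
import base64
import hashlib
import hmac
import time

import requests

ACCESS_ID = "member-xxxxxxxx"
SECRET_KEY = "your-secret-key"

def signed_params() -> dict:
    """Mozscape signed-URL auth: base64(HMAC-SHA1 of AccessID + newline + Expires)."""
    expires = int(time.time()) + 300
    to_sign = f"{ACCESS_ID}\n{expires}".encode()
    signature = base64.b64encode(
        hmac.new(SECRET_KEY.encode(), to_sign, hashlib.sha1).digest()
    ).decode()
    return {"AccessID": ACCESS_ID, "Expires": expires, "Signature": signature}

resp = requests.get(
    "https://lsapi.seomoz.com/linkscape/links/example.com",
    params={
        "Scope": "page_to_domain",
        "Sort": "domain_authority",
        "Filter": "external",
        "SourceCols": 4,  # URL column (assumed to match lsapi.UMCols.url)
        "LinkCols": 8,    # normalized anchor text ('lnt'), per the docs above
        "Limit": 25,
        **signed_params(),
    },
)
resp.raise_for_status()
for link in resp.json():
    # 'lnt' should carry the normalized anchor text if your plan returns it.
    print(link.get("luuu"), "->", link.get("lnt"))
```
-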
Mozscape Index
Hello: There was a Mozscape Index update scheduled for 9/8/2015, and now it got pushed back to October 8, 2015. There seem to be a lot of delays with the Mozscape Index. Is this something we should expect? Updates every 2 months instead of every month? Thanks!
API | sderuyter
-
August 3rd Mozscape Index Update (our largest index, but nearly a month late)
Update 5:27pm 8/4 - the data in Open Site Explorer is up-to-date, as are the API and MozBar. Moz Analytics campaigns are currently loading in the new data, and all campaigns should be fully up-to-date by 4-10pm tomorrow (8/5). However, your campaign may have the new data much earlier, as it depends on where that campaign falls in the update ordering.
Hey gang, I wanted to provide some transparency into the latest index update, as well as give some information about our plans going forward with future indices.
The Good News: This index, now that it's delivered, is pretty impressive.
- Mozscape's August index is 407 billion URLs in size, nearly 100 billion (~25%) bigger than our last record index size.
- We indexed 2.18 trillion links for the first time ever (the prior record was 1.54 trillion).
- Correlations for Page Authority have gone up from 0.319 to 0.333 in the latest index, suggesting that we're getting a slightly more accurate representation of Google's use of links in rankings from this data (DA correlations remain constant at 0.185).
- Our hit ratio for URLs in Google's SERPs has gone up considerably, from 69.97% in our previous index to 78.66% in the August update. This indicates we are crawling and indexing more of what Google shows in the search results (a good benchmark for us). Note that a large portion of what's missing will be things published in the last 30-60 days while we were processing the index (after crawling had stopped).
The Bad News: August's index was late by ~25 days. We know that reliable, consistent, on-time Mozscape updates are critically important to everyone who uses Moz's products. We've been working hard for years to get these to a better place, but have struggled mightily. Our latest string of failures was completely new to the team - a bunch of problems and issues we've never seen before (some due to the index size, but many due to odd things like a massive group of what appear to be spam domains using the Palau TLD extension clogging up crawl/processing, large chunks of pages we crawled with tens of thousands of links which slow down the MozRank calculations, etc.). While there's no excuse for delays, and we don't want to pass these off as such, we do want to be transparent about why we were so late.
Our future plans include scaling back the index sizes a bit, dealing with the issues around spam domains, large link-list pages, and some of the odd patterns we see in .pl and .cn domains, and taking one extra person from the Big Data team off of work on the new index system (which will be much larger and real-time rather than updated every 30 days) to help with Mozscape indices. We believe these efforts, and the new monitoring systems we've got, will help us get better at producing high quality, consistent indices.
The question everyone always asks: Why did my PA/DA change?! There are tons of reasons why these can change, and they don't necessarily mean anything bad about your site, your SEO efforts, or whether your links are helping you rank. PA and DA are predictive, correlated metrics that say nothing about how you're actually performing. They merely map better than most metrics to Google's global rankings across large SERP sets (but not necessarily your SERPs, which is what you should care about).
That said, here are some of the reasons PA/DA do shift:
- The domains/pages with the highest PA/DA scores gain even faster than most of the domains below them, making it harder each index to get higher scores (since PA/DA are on a logarithmic scale, this is smoothed out somewhat - it would be much worse on a conventional scale, e.g. Facebook.com 100, everyone else 0.0003).
- Google's ranking algorithm introduces new elements, changes, and modifies what it cares about.
- Moz crawls a set of the web that does or doesn't include the pages that are more likely to point to a given domain than another. Although our crawl tends to be representative, if you've got lots of links from deep pages on less popular domains in a part of the web far from the mainstream, we may not consistently crawl those well (or we could overcrawl your sector because it recently received powerful links from the center of the web).
My advice, as always, is to use PA/DA as relative scores. If your scores are falling, but your competitors' are falling more, that's not a bad thing. If your scores are rising, but your competitors' are rising faster, they're probably gaining ground on you. And if you're talking about score changes in the 1-4 point range, that's not necessarily anything but noise. PA/DA scores often shift 1-4 points up or down in a new index, so don't sweat it!
Let me know if you've got more questions and I'll do my best to answer. You can also refer to the API update page here: https://moz.com/products/api/updates
API | randfish