August 3rd Mozscape Index Update (our largest index, but nearly a month late)
-
Update 5:27pm 8/4 - the data in Open Site Explorer is up-to-date, as is the API and Mozbar. Moz Analytics campaigns are currently loading in the new data, and all campaigns should be fully up-to-date by 4-10pm tomorrow (8/5). However, your campaign may have the new data much earlier as it depends on where that campaign falls in the update ordering.
Hey gang,
I wanted to provide some transparency into the latest index update, as well as give some information about our plans going forward with future indices.
The Good News: This index, now that it's delivered, is pretty impressive.
- Mozscape's August index is 407 Billion URLs in size, nearly 100 Billion (~25%) bigger than our last record index size. We indexed 2.18 trillion links for the first time ever (prior record was 1.54 trillion).
- Correlations for Page Authority have gone up from 0.319 to 0.333 in the latest index, suggesting that we're getting a slightly more accurate representation of Google's use of links in rankings from this data (DA correlations remain constant at 0.185)
- Our hit ratio for URLs in Google's SERPs has gone up considerably, from 69.97% in our previous index to 78.66% in the August update. This indicates we are crawling and indexing more of what Google shows in the search results (a good benchmark for us). Note that a large portion of what's missing will be things published in the last 30-60 days while we were processing the index (after crawling had stopped).
The Bad News: August's index was late by ~25 days.
We know that reliable, consistent, on-time Mozscape updates are critically important to everyone who uses Moz's products. We've been working hard for years to get these to a better place, but have struggled mightily. Our latest string of failures was completely new to the team: a bunch of problems we've never seen before. Some were due to the index size, but many were due to odd things like a massive group of what appear to be spam domains using the Palau TLD extension clogging up crawl/processing, large chunks of pages we crawled with tens of thousands of links, which slow down the MozRank calculations, etc. While there's no excuse for delays, and we don't want to pass these off as such, we do want to be transparent about why we were so late.
Our future plans include scaling back the index sizes a bit, dealing with the issues around spam domains, large link-list pages, and some of the odd patterns we see in .pl and .cn domains, and taking one extra person from the Big Data team off of work on the new index system (which will be much larger and real-time rather than updated every 30 days) to help with Mozscape indices. We believe these efforts, and the new monitoring systems we've got, will help us get better at producing high-quality, consistent indices.
Question everyone always asks: Why did my PA/DA change?!
There are tons of reasons why these can change, and they don't necessarily mean anything bad about your site, your SEO efforts, or whether your links are helping you rank. PA and DA are predictive, correlated metrics that say nothing about how you're actually performing. They merely map better than most metrics to Google's global rankings across large SERP sets (but not necessarily your SERPs, which is what you should care about).
That said, here are some of the reasons PA/DA do shift:
- The domains/pages with the highest PA/DA scores gain even faster than most of the domains below them, making it harder each index to get higher scores (since PA/DA are on a logarithmic scale, this is smoothed out somewhat - it would be much worse on a conventional scale, e.g. Facebook.com 100, everyone else 0.0003).
- Google's ranking algorithm changes: it introduces new elements, modifies what it cares about, etc.
- Moz crawls a set of the web that does or doesn't include the pages that are more likely to point to a given domain than another. Although our crawl tends to be representative, if you've got lots of links from deep pages on less popular domains in a part of the web far from the mainstream, we may not consistently crawl those well (or, we could overcrawl your sector because it recently received powerful links from the center of the web).
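To make the logarithmic-scale point in the first bullet concrete, here is a toy sketch. It is not Moz's actual PA/DA formula (that isn't described in this post); the raw "link strength" numbers, the site sizes, and both scoring functions are made up purely for illustration.

```python
import math

def linear_score(raw, raw_max, top=100.0):
    """Scale raw linearly so that raw_max maps to top."""
    return top * raw / raw_max

def log_score(raw, raw_max, top=100.0):
    """Scale log(1 + raw) so that raw_max maps to top."""
    return top * math.log1p(raw) / math.log1p(raw_max)

# Hypothetical raw link-strength values (assumptions, not Moz data).
biggest = 1_000_000_000
small = 3_000

# On a conventional (linear) scale the small site is squashed toward zero.
print(linear_score(small, biggest))

# On a log scale the same site lands well above zero.
print(log_score(small, biggest))

# If the biggest site doubles while the small site stands still,
# the small site's log score slips even though nothing about it changed.
print(log_score(small, 2 * biggest))
```

With these made-up inputs the linear score works out to about 0.0003, while the log score sits in the high 30s and drops by roughly a point when the top site doubles. That is the "harder each index to get higher scores" effect in miniature.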
My advice, as always, is to use PA/DA as relative scores. If your scores are falling, but your competitors' are falling more, that's not a bad thing. If your scores are rising, but your competitors' are rising faster, they're probably gaining ground on you. And, if you're talking about score changes in the 1-4 points range, that's not necessarily anything but noise. PA/DA scores often shift 1-4 points up or down in a new index so don't sweat it!
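If you want to apply the "relative scores" advice to your own numbers, a minimal sketch might look like this. The function name, the example domains, and the idea of applying the 4-point noise band to the gap between you and your competitors are all assumptions for illustration, not anything Moz provides.

```python
from statistics import mean

NOISE_BAND = 4  # points; the post treats 1-4 point shifts as possible noise

def compare_to_competitors(mine, competitors, noise_band=NOISE_BAND):
    """Compare your score change with the average competitor change.

    mine: (old_score, new_score)
    competitors: dict of name -> (old_score, new_score)
    Returns (my_delta, avg_competitor_delta, verdict).
    """
    my_delta = mine[1] - mine[0]
    comp_delta = mean(new - old for old, new in competitors.values())
    gap = my_delta - comp_delta
    if abs(gap) <= noise_band:
        verdict = "within noise"
    elif gap > 0:
        verdict = "gaining ground"
    else:
        verdict = "losing ground"
    return my_delta, comp_delta, verdict

# Hypothetical scores: you fell 3 points, both competitors fell 8.
result = compare_to_competitors(
    (30, 27),
    {"rival-a.example": (40, 32), "rival-b.example": (35, 27)},
)
print(result)
```

In this made-up case your score dropped, but the competitors dropped further, so the verdict is "gaining ground". Swap in your own before/after scores from OSE to try it.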
Let me know if you've got more questions and I'll do my best to answer. You can also refer to the API update page here: https://moz.com/products/api/updates
-
Rand, I've emailed you. Thx
-
Where are you seeing that? In OSE? Or in Moz Analytics? In Moz Analytics, it's possible that it's still cached, and will be updating (a few thousand campaigns each hour, so not too long until all of them are done), but in OSE, that data should absolutely be new. If not, can you send an email to me - rand at moz dot com - with your sites, and I'll ask the Big Data team to look into it.
-
Hi Rand, I'm still seeing 9 June in my campaigns and no updated data....or missing data. Not fixed here yet.
-
Yup - I'm seeing the same thing. Have let our engineers know - hopefully they can sort it out and fix it soon.
-
Rand, I'm seeing some seriously weird data on many of our sites. Crazy Euro links that go nowhere...that definitely aren't meant to be there, and link totals that don't add up.
-
I'm seeing some odd ones too that appear not to have updated. Pinging the team as it shouldn't usually take this long for data to update.
-
Update:
some of the sites we are tracking have data in them but it's still from 9 June. The rest are showing incorrect / corrupted links or no links at all.
Conclusion: there is something seriously wrong with the MozScape update for us.
-
Hey Sticky! It takes about 24-48 hours for new index information to reach Moz campaigns any time a new Mozscape index is released. By checking your domain directly in OSE (moz.com/researchtools/ose) you will be able to see your data, and more, before campaigns are updated. This time it may be slightly delayed because we are also building the monthly data for all campaigns, which we run on the 1st of each month. Index updates are rarely released near the beginning of the month, so they normally don't interfere with regular campaign updates.
Hope this helps and let me know if you have any questions!
-
It is working on most sites, but a few I have just checked have changed, i.e. one started at 27, was 32 five hours ago, and is now 30! So I might give it another 24 hours to settle down.
-
Glad it is working for you. I'm still seeing last Index, and in some cases no data.
-
I had to refresh a few pages, a few times, but all the data has come through now. Every website is up, though a few by only 4, but I am still hopeful that is not noise but the result of hard work.
-
I'm wondering why I can't see the updated MozScape data in my account? It still says next index 9 June and the data still appears to be old (and / or incomplete). Any advice?