Sounds great, Mike! Just send them over and I'll take a look!
Thanks,
Carin
Hey Mike,
I'm sorry you're so frustrated with the issues in the index lately - I know it's an inconvenience, but I assure you, the team has been working all hours to work out these kinks!
In fact, after many nights and weekends sacrificed, we're looking at probably shipping our next release early. The bugs will be much less evident in this next index, as the stale crawl data is dropped from the index.
I know that doesn't help you out right now. Can you send me some details on the corruption you're seeing? A full OSE link with all the parameters would be perfect as well as a CSV, if you have one. If you don't feel comfortable posting in Q&A, please email me at carin@seomoz.org.
This sounds like the same bug we saw emerge in this index, and have since fixed, but I want to make sure that is the case.
Again, I'm really sorry for the inconvenience and frustration this is causing - we are working hard at ironing out these final issues!
Thanks,
Carin
This just came to our attention yesterday and our engineers have been investigating over the weekend. It appears to be fallout from the parsing bug that caused the initial delay of this index launch.
We're still investigating, but we do have another index in the works, with the parsing bug no longer present. We hope to have this ready in the next two weeks. In the meantime, we're looking into how we can remedy this current anchor text portion.
If you would like to read more about the parsing bug, Phil provided a great explanation in the forum article here.
Sorry for the inconvenience this will cause - we're looking into ways to remedy this as soon as we can!
Thanks,
Carin
Yep, David is correct - that call is only available with a paid API plan. If you are interested in a paid plan, check out the different tiers on our Mozscape API page.
Thanks!
Carin
No problem! I'm so sorry for the inconvenience!
I just pushed the remaining pending reports through, so I think you should be set, but if you continue to run into any problems, just let me know!
Hey there!
Haha - we ran into a problem on Monday night with one of the machines falling over causing a huge backlog to pile up. We were able to get things back on track yesterday and churn through the backed up reports, but with the index launch yesterday, we're seeing a bit of a backlog again this morning.
We are getting a monster machine up right now to speed through this! Once things calm down you should see these come through. It looks like you have about 8 pending - I'll keep an eye on them to make sure they go through!
Thanks,
Carin
Hey there!
The Top 500 list is compiled from our Mozscape (formerly Linkscape) link data gathered by our crawlers, but, unfortunately, we don't crawl Facebook since its pages are https.
Adding the ability to crawl https is on our road map, however!
Thanks,
Carin
Hey Ravi,
Sorry for the delayed response - I wanted to follow up with the engineers to see if they had any suggestions for you.
They agreed the Limit parameter set to 1,000 might be too large to process. Have you tried adjusting that to 300 or even 500? Do you see better success at a lower limit?
Our system will time out at about 60 seconds, so I'm not sure if the hanging is on our end. If dropping the limit size doesn't help, you might want to think about ending the request after about a minute. Sometimes requests that are too long will time out but work fine on a retry, since some data will be cached from the previous request.
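In case it's useful, here's a minimal Python sketch of that cancel-and-retry pattern - the RequestTimeout exception and the endpoint details are placeholders for whatever HTTP library and call you're using:

```python
class RequestTimeout(Exception):
    """Stand-in for your HTTP library's timeout error."""

def fetch_with_retry(do_request, retries=2):
    """Run `do_request` (a zero-arg callable that makes the API call
    with a ~60 second client-side timeout), retrying on a timeout.

    A retry often succeeds because some data from the first attempt
    is already cached from the previous request.
    """
    last_exc = None
    for attempt in range(retries + 1):
        try:
            return do_request()
        except RequestTimeout as exc:
            last_exc = exc  # the call hung past the timeout; try again
    raise last_exc

# Hypothetical usage with the requests library and a lower Limit:
# fetch_with_retry(lambda: requests.get(api_url,
#     params={"Limit": 300}, timeout=60).json())
```

Pairing a lower Limit with a retry like this usually sorts out the hanging on your end.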
I hope this information is helpful, but let me know if you're still experiencing issues!
Thanks,
Carin
Hey there!
Just want to make sure I'm understanding what you're trying to do - basically you're hoping to use jQuery to send requests to the API and then fetch the JSON results?
What type of queries are you sending the API? What would the API query look like?
Also, we do have the API Help Forums to post in or search as well - not sure if you've explored these pages, but there could be some helpful information for you there as well!
Thanks!
Carin
Hey! This is an issue I haven't heard of before - would you be able to provide any more information, like an example query to the API and some of the pages you are seeing hang?
Thanks!
Carin
Hey Ravi,
If you are a Free API user, that is probably the fastest way you're going to be able to pull that data. However, if you're using the Site Intelligence API, you would be able to get those calls from the URL-metrics API call, which is much more efficient and a lot faster.
You can see the Free versus Paid list here: http://apiwiki.seomoz.org/w/page/13991153/URL Metrics API
Internal links: Links (uid) - External Links (ueid)
External links: ueid
Follow links: External Links (ueid) - Juice-Passing Links (ujid)
Nofollow links: Links (uid) - Juice-Passing Links (ujid)
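If it's useful, here's how that arithmetic might look in code - a minimal Python sketch assuming you've parsed the URL-metrics JSON response into a dict keyed by those response fields:

```python
def derived_link_counts(metrics):
    """Compute the link breakdowns above from a parsed URL-metrics
    response (a dict with fields uid, ueid, ujid)."""
    uid = metrics["uid"]    # Links
    ueid = metrics["ueid"]  # External Links
    ujid = metrics["ujid"]  # Juice-Passing Links
    return {
        "internal": uid - ueid,   # Links - External Links
        "external": ueid,
        "follow": ueid - ujid,    # External Links - Juice-Passing Links
        "nofollow": uid - ujid,   # Links - Juice-Passing Links
    }
```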
I'm sorry I can't be more help! Let me know if you have more questions,
Carin
Hey Ravi,
The links call is pretty intensive and does take some time to process requests, but there could be a way to optimize your call. Would you be able to post the final URL you are sending to the API (eliminating your secret key)?
Also, how many requests are you sending in at a time? Are you sending these in as batch requests or single requests?
Thanks!
Carin
Hey Max,
*imr is the "raw" internal MozRank value. The "pretty" value that most people use for their applications is *imp, which has been fit to a nice 0-10 scale.
*imr isn't an official data point in the API, meaning we reserve the right to change it as necessary, which is why we don't have it documented. However, we still return the value if you know the endpoint.
Hope that helps! Let me know if you have more questions!
Thanks,
Carin
Hey!
Page title is missing from the API in this example because we only saw the link and did not crawl the page. You can tell from the value returned in the "us" column, which indicates whether or not we crawled the page. A "0" means that we did not crawl the page.
The Keyword Difficulty tool pulls some metrics from the API, but is able to get the page title while searching the keyword. The tool is not querying the API for the title.
Unfortunately, in this case, there won't be a way to get the page title for this URL.
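To illustrate, here's a tiny Python sketch of that check - I'm assuming the response rows are parsed into dicts, and the "ut" title field in the comment is just for illustration:

```python
def was_crawled(row):
    """True if we actually crawled the page.

    The "us" column holds the status we recorded for the URL;
    a 0 means we only saw links to the page and never crawled it,
    so fields like the page title will be missing.
    """
    return int(row.get("us", 0)) != 0

# e.g. only read titles for rows we actually crawled:
# titles = [row["ut"] for row in rows if was_crawled(row)]
```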
I hope that helps, but let me know if you have more questions!
Thanks,
Carin
Hey Shinya,
I'm guessing from your question that you are trying the above API query and receiving an error (401 Unauthorized - unauthorized api 'links/page_to_page.domain_authority').
The reason is that the Scope and Sort parameters you are using are incompatible. The request you are sending is asking for pages on the same domain to be sorted by Domain Authority, but they will all have the same Domain Authority.
However, if you try something like this, it should work to sort the results by Page Authority. Let me know if you are still seeing the results you are hoping for!
http://lsapi.seomoz.com/linkscape/links/www.seomoz.org/blog?Scope=page_to_page&Sort=page_authority
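For reference, here's one way to build that query in Python - the authentication parameters are left out here; you'd add them per the API docs:

```python
from urllib.parse import urlencode

base = "http://lsapi.seomoz.com/linkscape/links/www.seomoz.org/blog"
params = {
    "Scope": "page_to_page",   # page-to-page links
    "Sort": "page_authority",  # sort by Page Authority, not Domain Authority
    # plus your authentication parameters per the API docs
}
query_url = base + "?" + urlencode(params)
```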
Thanks,
Carin
Hey!
That sounds like odd behavior and I don't think I've heard of that happening before. I'd love to dig a bit deeper to see what's going on.
Would you be able to send me the pages you are searching? I assume you are experiencing this in Open Site Explorer?
If you would prefer not to post the URLs in this forum, feel free to email me directly at carin@seomoz.org!
Thanks,
Carin
Hey Lawrence,
The changes made to the PA model do not treat 301 redirects any differently - we haven't made any updates to how we handle redirects. However, if you've seen a drop in your links, we probably didn't crawl the pages redirecting to you. This would definitely affect your PA score.
I hope this helps, but let me know if you have more questions! If you want to find out more information on the new PA model, Matt Peters, the engineer who worked on the new model, has a great blog post here:
http://www.seomoz.org/blog/introducing-seomoz-updated-page-authority-and-domain-authority
Thanks,
Carin
Hey Zack,
Sorry to hear you're still having problems - we've seen an improvement on most sites at this point. Would you want to send me info on the site you're searching and any filters you are using?
If you don't feel comfortable posting that info on this thread, feel free to email me directly: carin@seomoz.org.
Thanks!
Carin
Haha! Sorry to dispel the mystery with boring reality! I did like your conspiracy theory though - much more exciting!!
Carin
Sure! Glad it was helpful!
Please post any questions you guys still have and I'll answer as best I can
Thanks!!
Carin
Hey Lee,
Sha just gave me the heads up about this thread so I wanted to jump in and see if I can clarify what's going on with these downloadable links.
We made some improvements to the Linkscape crawler to make it fresher, crawl deeper, and crawl more diverse domains - however, the deeper crawling ended up bringing to light a bug we had in the crawler. Once we started crawling deeply into websites, we started encountering more downloadable files, which our crawler had no idea what to do with. It treated each file as a link, so it crawled it, but when it tried to associate the file with a domain, it didn't know how to handle it properly, which ended up causing weird associations with domains the crawler had previously crawled.
We have been able to implement a few fixes, but, unfortunately, they take a bit of time to propagate through into the index - a full month to crawl and several weeks to process.
There were two solutions we found after investigating this problem. First, don't count binary files as links - this has been done and should be part of our next index scheduled to launch 10/18. This should address about 70% of the issue. Second, update the crawler to disregard downloadable files if it does encounter them. This update was just recently deployed to our crawlers and still needs about a month to propagate and go through processing. The effects of this fix probably won't be seen for another two index updates.
I hope this helps clear up some of the confusion going on here - most likely the weird "phantom" links you're seeing are a result of this bug we discovered in our updated crawler. If you're still seeing odd behavior after the next index update scheduled 10/18, please email the Customer Service team! We love the feedback, as it helps us make our crawler even better!
Thanks,
Carin
Thanks Ryan for the great answer! We do have the new social features in Open Site Explorer that display the Facebook shares, collected from their FQL API.
We are also in development of a new tool in the PRO app offering Social Analytics metrics. Here is Rand's blog post about it!
Hope that helps, but let me know if you have any more questions!
Thanks,
Carin
For sure! I figured it's a question that has crossed people's minds before (I know it has for me!) so if you see anyone else wondering, I wanted you to have something to point them to
Thanks!
Carin
Hey! I just saw this post - not sure if you filed a ticket with the Customer Service team, but I wanted to see if I can help explain what's going on.
I wanted to make sure I had a full handle on what was going on, so I talked with one of the Open Site Explorer developers. My understanding is that Open Site Explorer will automatically redirect when Linkscape has identified the URL as a redirect, but if Open Site Explorer requests data from Linkscape and is returned a 301, then the alert message will show up and Open Site Explorer will show the redirected URL's metrics.
There are two main reasons this could happen. First, the redirect wasn't in place when we crawled the page, so it wasn't recorded as a redirect. Second, when Open Site Explorer requests Linkscape metrics for a given URL, Linkscape does not explicitly say whether the URL is a redirect - it only tells us if the searched form is canonical. If the searched form does not match the canonical, we assume it redirects to the canonical.
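As a rough sketch of that second case (the function and argument names here are my own, purely for illustration):

```python
def resolve_display(searched_url, canonical_url):
    """Decide what Open Site Explorer shows for a searched URL.

    Linkscape doesn't explicitly flag redirects - it only tells us the
    canonical form of the searched URL. If the searched form differs
    from the canonical, we assume it redirects, so we show the canonical
    URL's metrics along with the redirect notice.
    """
    if searched_url != canonical_url:
        return canonical_url, True   # (url whose metrics we show, alert?)
    return searched_url, False
```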
I hope that helps explain a bit, but let me know if you have more questions! The customer service team is awesome and they will be able to help you out as well!
Thanks,
Carin
Hey PSV,
Sorry I'm jumping in so late to the game on this! The response from the Customer Service team by Kenny below is correct - with the adjustments we've made to the new crawler, we're going less deep this crawl than we did in the July crawl. This is because we're still trying to make tweaks to get the domain diversity back up to what it was before. It looks like you had a major change from July to August, which is strange - I haven't seen such a drastic change pointed out by any other users.
If I could get the actual domain, I can look into this for you. Is this discrepancy being reported by Open Site Explorer or are you querying the API directly?
It's hard to speculate without knowing which domain you're referring to, but you may see a decrease in inbound links if you had a lot of links from a high MozRank page. If you had a large number of links on one domain and we didn't crawl that domain as deeply in this past index as we did the index before (when we went MUCH deeper), many of the links deeper down in the domain won't be in this latest index.
Please let me know which domain you're referring to and I can try to track down some information for you!
Thanks,
Carin
Hey guys,
Yep, Keri is correct, unfortunately. We found a bug in our July index with our new crawlers - they were crawling binary files as if they were links and, since those are not normal links, the crawler couldn't handle them very well.
We have made some updates to our crawling so it will go deeper into sites. The reason for these odd inbound links from high-authority sites is that the crawler is reaching much deeper into sites, where there are more download (i.e. binary) links. The first issue is that the crawler is counting binary files as links, but the larger issue is that the crawler doesn't really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains. This is why you're seeing inbound links to pages that don't really exist.
There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles these file types. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly, however, this update will need a few more weeks to propagate. The fix for this issue probably won’t be seen for another update, meaning late September. Our improvements should catch most of the issues, but there still could be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!
The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are in the process of working through that issue right now. We're doing everything we can to resolve this bug as we know it is alarming to see these phantom inbound links.
Thanks for your patience!
Carin
Hey! Same message to you! If you want me to look into your account, feel free to email me at carin@seomoz.org!
Just provide some info on what URL you are looking for, what tab you're requesting from and if you have any filters applied and we'll look into why these certain ones are having issues.
Thanks for your patience!!!
Carin
Hey! I need some more info from you and I can look into it - feel free to email me at carin@seomoz.org.
You're not being difficult - I understand the frustration
Can you provide your account information in an email to me, as well as any information regarding which tab you're requesting from, what URL you are querying, and whether you were applying any filters?
Thanks!
Carin
Hey! Can you give me some more information so I can dig into this? Looks like the last batch of 100 reports is crunching through now, but I'd still like to find out why you're not getting yours.
Can you send me some info on the request, like the site searched, any filters used, and which tab you requested from?
Thanks!
Carin
Hey guys!
I wanted to jump in here and give you all the latest on the CSV download issue. We were finally able to clear through the backlog of CSV reports about 7 pm PST last night; however, there were about 3,000 jobs that were still in a finalizing status and just hanging. We were able to work up a quick fix to get these last remaining reports out today. The fix was such a great idea, we've decided to make it a permanent feature in Open Site Explorer!
Since the new launch of OSE, we've had reports of users requesting reports that end up hanging during high peak times. The fix we added today will help in those scenarios by re-queueing the report if it is hanging for a long period of time.
I'm hoping this helped get the last of the missing reports out, but please let me know if you guys are still seeing pending or hanging requests.
Thanks!
Carin
Hey John,
We have made some updates to our crawling so it will go deeper into sites, and this is the first launch including the new metrics. We've discovered a bug, however, in the updated crawler. The first issue is that the crawler is counting binary files as links, but the larger issue is that the crawler doesn't really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains. This is probably why you're seeing your competitor have these .edu links - they're probably incorrectly associated with their site.
There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles these file types. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly, however, this update will need about a month to propagate. The fix for this issue probably won’t be seen for two more updates, meaning late September. Our improvements should catch most of the issues, but there still could be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!
The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are in the process of working through that issue right now. We’re doing everything we can to resolve this bug as we know it is alarming to see these “questionable” links associated to either your site or your competitors sites.
I hope this helps answer your questions around these .edu links, but let me know if you have any more questions!
Thanks,
Carin
Hey Zack,
Thanks so much for understanding! We are doing everything we can to get the bug resolved. Binary files are the downloadable files you see as links - .pdf, .exe, .img, etc.
I'm really sorry, but we don't have a URL to the old OSE. I saw Steven's response as a workaround - is that possible or are there too many file types to filter out?
Our crawlers that provide the metrics to OSE are always crawling, but will take about a month for our fix to propagate through to all the pages we crawl. Once we have removed these links from our crawlers, then we'll have to process the metrics. This is why it's looking like late September for the fix to show up.
I really appreciate your patience and understanding, we're doing everything we can to fix it!!
Thanks,
Carin
Hey Zack, I saw the ticket you filed was answered by Aaron, but I just wanted to follow up with you as well. We have made some really exciting changes to the crawler, but, unfortunately, there is a pretty obvious bug as well...
The reason for the "questionable" links coming from the Internet Wild West is that the crawler is reaching much deeper into sites, where there are more download (i.e. binary) links. The first issue is that the crawler is counting binary files as links, but the larger issue is that the crawler doesn't really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains. This is probably what you're seeing with all the crazy links from China and Russia which don't actually link to the site you're researching.
There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles these file types. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly; however, this update will need about a month to propagate. The fix for this issue probably won't be seen for two more updates, meaning late September. Our improvements should catch most of the issues, but there still could be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!
The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are in the process of working through that issue right now. We're doing everything we can to resolve this bug as we know it is alarming to see these "questionable" links associated with your sites.
I hope this helps and thanks so much for being patient :)
Thanks,
Carin
Hey guys,
The issue you are seeing is due to the new OSE update. We have done some updating with our crawler and this index represents the newest version - sadly, with a few bugs... We are looking into this issue and hope to have it resolved as soon as possible!
The newest version of our crawler is built to be fresher, but it is also going much deeper into high MozRank pages. This bug has probably always existed, but has never been obvious since we weren't crawling as deep into domains where there are more download links. We are currently looking into fixing this so these won't be counted as inbound links.
I'm so sorry for the inconvenience - once we get this new version of the crawler dialed and smoothed out, it will be providing you guys a much fresher and higher quality index!
There is another thread regarding this topic, so check it out if you want more information on what is going on with this index.
Thanks,
Carin
Hey guys!
Keri is right - we have done some updating with our crawler and this index represents the newest version - unfortunately with a few hiccups. People seem to be seeing two issues with this new index - link counts and domain authorities are going up or down considerably and there is an increase of "questionable" inbound links.
Both issues are due to the same root cause: our new crawler is built to be fresher, but it is going deeper into domains, and, unfortunately not visiting as many domains. Domains with a high MozRank are getting crawled deeper, but domains with middle to lower MozRanks are not getting crawled.
Our top priority now is to get the domain diversity back up to or better than that of our last update as was originally designed. It's fixable and we will be focusing all efforts on this.
Previous crawling worked by selecting a list of the top MozRank URLs (around 10B) and then crawling one page from each of them. Now we are crawling links as we discover them, and crawling high MozRank sites daily, weekly, or monthly. The advantage of the new crawlers is that we are crawling all the time, so we will have fresher data, and as links are added we are much more likely to discover these deeper links. The new crawl had 59B URLs, a lot more than the previous 42B; however, more of these links are from the same domains.
The reason for the "questionable" links is due to the fact that the crawler is reaching deeper into the domains where there are more download links. We are currently looking into fixing this so these won't be counted as links. We'll let you know as soon as that issue is resolved!
We are really sorry for the inconvenience. Once we have this new crawler dialed it will provide much fresher and higher quality data!!
Thanks,
Carin