Hey guys,
Yep, Keri is correct. Unfortunately, we found a bug in our July index involving our new crawlers: they were crawling binary files as if they were links and, since these are not normal links, the crawler couldn't handle them very well.
We have made some updates to our crawling so that it goes deeper into sites. These odd inbound links from high-authority sites appear because the crawler is now reaching much deeper into sites, where there are more download (i.e. binary) links. The first issue is that the crawler is counting a binary file as a link, but the larger issue is that the crawler doesn’t really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains, which is why you’re seeing inbound links to pages that don’t really exist.
There are two steps to addressing this issue: changing how the crawler sees these file types, and then fixing how the crawler handles them. We have already made improvements to our algorithm so that the majority of these files will be handled correctly; however, this update will need a few more weeks to propagate. The fix probably won’t show up for another update, meaning late September. Our improvements should catch most of the issues, but there could still be a few cases we haven't addressed. If you run into one, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!
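For anyone curious what "changing how the crawler sees these file types" can look like in general, here is a rough sketch. It is not the actual crawler code: the extension list and the function names `is_binary_link` and `crawlable_links` are made up for illustration, and a real crawler would likely also check the response's content type rather than rely on the URL alone.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical set of extensions treated as binary downloads (illustrative only).
BINARY_EXTENSIONS = {".zip", ".exe", ".dmg", ".iso", ".gz", ".tar"}

def is_binary_link(url: str) -> bool:
    """Guess from the URL path whether a link points at a binary download."""
    path = urlparse(url).path                   # drops the query string and fragment
    ext = posixpath.splitext(path)[1].lower()   # e.g. "/files/setup.EXE" -> ".exe"
    return ext in BINARY_EXTENSIONS

def crawlable_links(urls):
    """Keep only the links that should be followed and counted as normal links."""
    return [u for u in urls if not is_binary_link(u)]
```

With a filter like this in place, a download URL such as `http://example.com/files/setup.exe` would be skipped instead of being recorded as an inbound link.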
The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are working through that issue right now. We’re doing everything we can to resolve this bug, because we know it is alarming to see these phantom inbound links.
Thanks for your patience!
Carin