A suggestion to help with linkscape crawling and data processing
-
Since you guys are understandably struggling with crawling and processing the sheer number of URLs and links, I came up with this idea:
In a similar way to how SETI@Home (is that still a thing? Google says yes: http://setiathome.ssl.berkeley.edu/) works, could SEOmoz use distributed computing amongst SEO moz users to help with the data processing? Would people be happy to offer up their idle processor time and (optionally) internet connections to get more accurate, broader data?
Are there enough users of the data to make distributed computing worthwhile?
Perhaps those who crunched the most data each month could receive moz points or a free month of Pro.
I have submitted this as a suggestion here:
http://seomoz.zendesk.com/entries/20458998-crowd-source-linkscape-data-processing-and-crawling-in-a-similar-way-to-seti-home -
Sean - I share Rand' sentiments, thanks so much for the suggestion!
We have considered distributed crawling in the past (or even distributed rank checking because then it would be in that user's locale) but there are a whole different set of challenges. For example, you have to handle all the edge cases: what if a user's computer isn't on, or loses connectivity, what if we crawl too fast and the user gets blocked from a site, how do you write all that data securely?
Of course all of these concerns can be overcome, but right now we feel like we have a good handle on the problems, and it will be much faster for us to just fix what we have
Although, I know all of us are so appreciative of the ideas and support, and we will have something really great soon!
-
Thanks a ton Sean! We have considered distributed computing as a way to help crawl, index, process, etc. It's so flattering and humbling to hear that you'd be willing to help out and that the community would, too
For now, we believe we can get to the index size/quality/freshness using our hosted system, but the engineering team will certainly be encouraged to hear that folks in our community might contribute to this. Distributed systems present their own challenges, and we'd have to write that code from scratch, but if we find that we can't do what we want with our existing network, we might reach out.
BTW - I wanted to let folks know that the team here does feel very confident that come December/January, we're going to be producing indices that reach exceptional quality bars. The problems we face are largely known, and we now have the team and the solutions to tackle it, so we're pretty excited.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Strange "?offset" URL found with content crawl issues
I recently recieved a slew of content crawl issues via Moz for URL's that I have never seen before For example:
Moz Pro | | HannahPalamara
Standard URL: https://skilldirector.com/news,
Newly identified URL: https://skilldirector.com/news?offset=1469542207800&category=Competency+Management). Does anyone know where the URL comes from and how to fix it?0 -
Crawl Errors and Notices drop to zero
Hi all, After setting up a campaign in Moz the crawl is successful and it showed the Errors and Warnings in crawl diagnostics (each one had about 40-50), but after a few days the number dropped to zero. Only the "notices" seems to stay normal, with a slight drop since the campaign set up, but not dropping to zero. I set this campaign up in a colleague's account and the same thing happened shortly after set up. I didn't find any Q&A already posted so any insight is appreciated!
Moz Pro | | Vanessa120 -
The crawl report shows a lot of 404 errors
They are inactive products, and I can't find any active links to these product pages. How can I tell where the crawler found the links?
Moz Pro | | shopwcs0 -
Why does SEOMoz only crawl 1 page of my site?
My site is: www.thetravelingdutchman.com. It has quite a few pages, but for some reason SEOMoz only crawls one. Please advise. Thanks, Jasper
Moz Pro | | Japking0 -
Campaign crawl re - schedule
Hello, On the last crawl of a website of mine, seomoz pointed out about 1500 errors (ouch!) on my site. I have made some corrections and i just want to see if they are at the right way but the next crawl is in a week. Is there any way so i can force a crawl before the scheduled date? Thanks!
Moz Pro | | Tz_Seo0 -
Campaign status stock in status "Next Crawl in Progress!
Has anyone else had an issue laetly where the campaign status was stock in _ **"Next Crawl in Progress!" **_? One of our campaigns has been in this status for the page 2 1/2 days and this has not happened in the past as there are only 597 pages for this campaign to crawl. I send a help ticket request to the SEOMOZ team but was wondering if this is an isolated issue or if other community members have also experienced it? Thanks.
Moz Pro | | DRTBA0 -
Crawl Rate for Lower Page Authority Websites
Hi,At thumbtack.com we get tons of links from low (or no) page authority websites, and I'm wondering what the crawl rate of those links looks like. I know Google pulls in the web at an astonishing rate, but I'd imagine they aren't re-crawling lower PA very frequently.Are they discovering these links a week after they're posted? A month? More? I spent a while looking around for histograms of actual crawl rates and found surprisingly little. I'd love to see average crawl rate by Domain or Page Authority if that exists anywhere.
Moz Pro | | Thumbtack
Thanks!-MichaelP.S. Here are some random examples of the types of pages with inbound links I'm talking about. Normally we wouldn't spend too much time thinking about these, but there's just so many of them we can't ignore it!- http://www.majestic-cleaners.webs.com/- http://domchieraphotography.blogspot.com/- http://charlottepiano.musicteachershelper.com/- http://pin-upgirlphotography.vpweb.com/default.html- http://jfaithful.weebly.com/0