A suggestion to help with Linkscape crawling and data processing
-
Since you guys are understandably struggling with crawling and processing the sheer number of URLs and links, I came up with this idea:
In a similar way to how SETI@Home works (is that still a thing? Google says yes: http://setiathome.ssl.berkeley.edu/), could SEOmoz use distributed computing amongst SEOmoz users to help with the data processing? Would people be happy to offer up their idle processor time and (optionally) internet connections to get more accurate, broader data?
Are there enough users of the data to make distributed computing worthwhile?
Perhaps those who crunched the most data each month could receive moz points or a free month of Pro.
I have submitted this as a suggestion here:
http://seomoz.zendesk.com/entries/20458998-crowd-source-linkscape-data-processing-and-crawling-in-a-similar-way-to-seti-home
-
Sean - I share Rand's sentiments, thanks so much for the suggestion!
We have considered distributed crawling in the past (or even distributed rank checking, because then it would run in that user's locale), but there is a whole different set of challenges. For example, you have to handle all the edge cases: What if a user's computer isn't on, or loses connectivity? What if we crawl too fast and the user gets blocked from a site? How do you write all that data securely?
Of course, all of these concerns can be overcome, but right now we feel like we have a good handle on the problems, and it will be much faster for us to just fix what we have.
That said, I know all of us are so appreciative of the ideas and support, and we will have something really great soon!
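To make a couple of the edge cases above concrete, here is a rough sketch of how a coordinator might cope with volunteers that drop offline and with crawling a single site too fast. It is only an illustration under stated assumptions, not how Linkscape works: `CrawlCoordinator`, its methods, the lease length, and the per-host delay are all hypothetical names and values, and the question of writing data securely is not addressed at all.

```python
import time
from collections import deque
from urllib.parse import urlparse


class CrawlCoordinator:
    """Hypothetical coordinator: leases URLs to volunteers and re-queues expired leases."""

    def __init__(self, urls, lease_seconds=300.0, min_host_delay=10.0,
                 clock=time.monotonic):
        self.pending = deque(urls)
        self.leases = {}        # url -> (worker_id, lease expiry time)
        self.next_allowed = {}  # (worker_id, host) -> earliest next fetch time
        self.done = {}          # url -> submitted result
        self.lease_seconds = lease_seconds    # assumed value, tune as needed
        self.min_host_delay = min_host_delay  # assumed value, tune as needed
        self.clock = clock

    def _reclaim_expired(self):
        # A volunteer that is switched off or offline never reports back,
        # so its lease simply expires and the URL returns to the queue.
        now = self.clock()
        for url, (_, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[url]
                self.pending.append(url)

    def checkout(self, worker_id):
        """Return a URL this worker may fetch now, or None if nothing is eligible."""
        self._reclaim_expired()
        now = self.clock()
        for _ in range(len(self.pending)):
            url = self.pending.popleft()
            key = (worker_id, urlparse(url).netloc)
            if self.next_allowed.get(key, float("-inf")) <= now:
                self.leases[url] = (worker_id, now + self.lease_seconds)
                # Space out requests from one volunteer to one host so the
                # volunteer's IP is less likely to get blocked.
                self.next_allowed[key] = now + self.min_host_delay
                return url
            self.pending.append(url)  # too soon for this host; rotate to the back
        return None

    def submit(self, worker_id, url, result):
        """Accept a result only from the worker that currently holds the lease."""
        lease = self.leases.get(url)
        if lease is None or lease[0] != worker_id:
            return False
        del self.leases[url]
        self.done[url] = result
        return True
```

A volunteer client would call `checkout`, fetch the page, and call `submit`; a result that arrives after the lease has expired is rejected, because the URL may already have been handed to someone else.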
-
Thanks a ton, Sean! We have considered distributed computing as a way to help crawl, index, process, etc. It's so flattering and humbling to hear that you'd be willing to help out and that the community would, too.
For now, we believe we can get to the index size/quality/freshness using our hosted system, but the engineering team will certainly be encouraged to hear that folks in our community might contribute to this. Distributed systems present their own challenges, and we'd have to write that code from scratch, but if we find that we can't do what we want with our existing network, we might reach out.
BTW - I wanted to let folks know that the team here does feel very confident that come December/January, we're going to be producing indices that reach exceptional quality bars. The problems we face are largely known, and we now have the team and the solutions to tackle them, so we're pretty excited.