A suggestion to help with Linkscape crawling and data processing
-
Since you guys are understandably struggling to crawl and process the sheer number of URLs and links, I came up with this idea:
In a similar way to how SETI@home works (is that still a thing? Google says yes: http://setiathome.ssl.berkeley.edu/), could SEOmoz use distributed computing among SEOmoz users to help with the data processing? Would people be happy to offer up their idle processor time and (optionally) internet connections to get more accurate, broader data?
Are there enough users of the data to make distributed computing worthwhile?
Perhaps those who crunched the most data each month could receive moz points or a free month of Pro.
I have submitted this as a suggestion here:
http://seomoz.zendesk.com/entries/20458998-crowd-source-linkscape-data-processing-and-crawling-in-a-similar-way-to-seti-home -
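To make the idea concrete, here's a toy sketch of the SETI@home-style loop a volunteer client might run. Everything here is hypothetical: the callables `fetch_task`, `do_work`, and `report` stand in for whatever protocol a real coordinator would speak, and none of this reflects anything SEOmoz has actually built.

```python
def volunteer_loop(fetch_task, do_work, report, max_tasks=None):
    """SETI@home-style volunteer loop (illustrative only).

    Repeatedly pulls a unit of work from a coordinator, processes it
    with the volunteer's idle CPU, and sends the result back. The three
    callables are placeholders for the coordinator's real network API.
    """
    completed = 0
    while max_tasks is None or completed < max_tasks:
        task = fetch_task()
        if task is None:            # coordinator has no work right now
            break
        report(task, do_work(task))
        completed += 1
    return completed
```

In a real deployment, `fetch_task` and `report` would be HTTP calls to a central coordinator, and `do_work` would fetch and parse a page with the volunteer's connection.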
Sean - I share Rand's sentiments; thanks so much for the suggestion!
We have considered distributed crawling in the past (and even distributed rank checking, since queries would then run in each user's locale), but it comes with a whole different set of challenges. For example, you have to handle all the edge cases: what if a user's computer isn't on, or loses connectivity? What if we crawl too fast and the user gets blocked from a site? How do you write all that data back securely?
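For a sense of how distributed systems typically handle the "node goes offline" edge case (a generic sketch, not SEOmoz's design): the coordinator can lease each URL to a node for a limited time and reclaim it if the node disappears, and accept results only from the node that currently holds the lease.

```python
import time


class LeaseQueue:
    """Toy work queue for distributed crawling: each URL is leased to a
    volunteer node for a fixed time; if the node goes offline or stalls,
    the lease expires and the URL is handed to someone else."""

    def __init__(self, lease_seconds=300):
        self.lease_seconds = lease_seconds
        self.pending = []        # URLs waiting to be crawled
        self.leased = {}         # url -> (node_id, lease_expiry)
        self.done = {}           # url -> crawl result

    def add(self, url):
        self.pending.append(url)

    def checkout(self, node_id, now=None):
        """Hand the next URL to a node, reclaiming expired leases first."""
        now = time.time() if now is None else now
        for url, (owner, expiry) in list(self.leased.items()):
            if expiry <= now:                 # node died or stalled
                del self.leased[url]
                self.pending.append(url)      # re-queue for another node
        if not self.pending:
            return None
        url = self.pending.pop(0)
        self.leased[url] = (node_id, now + self.lease_seconds)
        return url

    def submit(self, node_id, url, result, now=None):
        """Accept a result only if this node still holds a live lease."""
        now = time.time() if now is None else now
        owner, expiry = self.leased.get(url, (None, 0))
        if owner == node_id and expiry > now:
            del self.leased[url]
            self.done[url] = result
            return True
        return False                          # stale or spoofed submission
```

Per-domain rate limiting and verifying results against a second node's crawl would layer on top of this, but the lease mechanism alone covers the "computer isn't on / loses connectivity" cases.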
Of course, all of these concerns can be overcome, but right now we feel we have a good handle on the problems, and it will be much faster for us to just fix what we have.
That said, all of us are so appreciative of the ideas and support, and we will have something really great soon!
-
Thanks a ton, Sean! We have considered distributed computing as a way to help crawl, index, process, etc. It's so flattering and humbling to hear that you'd be willing to help out, and that the community would, too.
For now, we believe we can get to the index size, quality, and freshness we want using our hosted system, but the engineering team will certainly be encouraged to hear that folks in our community might contribute to this. Distributed systems present their own challenges, and we'd have to write that code from scratch, but if we find that we can't do what we want with our existing network, we might reach out.
BTW - I wanted to let folks know that the team here feels very confident that come December/January, we're going to be producing indices that reach exceptional quality bars. The problems we face are largely known, and we now have the team and the solutions to tackle them, so we're pretty excited.