[flud-devel] [p2p-hackers] announcing Allmydata-Tahoe v0.3

Alen Peacock alenlpeacock at gmail.com
Tue Jun 12 21:27:17 PDT 2007


On 6/11/07, John Bäckstrand <sandos at home.se> wrote:
>
> This is very interesting. I have been looking for software that does
> this (the friends-backup use-case, that is) for a long time, but I never
> found anything that did what I wanted it to.
>
> It is still far from a perfect match though:

  Sounds like you are looking for something more like CrashPlan
(http://www.crashplan.com), perhaps?


> ... I guess it's possible to set up a
> private cloud though. I was imagining a very simple system where I
> specify for each file stored how much availability I want it to have,
> minimum, and then just store it on that amount of nodes, no fancy FEC
> nor DHT at all. A good question of course is what happens when nodes go
> offline, but not a huge problem if you are actually using this together
> with a set of close friends.

  Theoretically, the simple replication you are talking about is just
FEC with an outrageous expansion factor and/or lowered reliability
(http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasure_iptps.pdf).

  I believe Tahoe uses an expansion factor of 4x (correct me if I'm
wrong zooko).  Suppose you have 8 friends who are willing to back up
your files.  For the same amount of space and bandwidth, you could
either use FEC and store bits of your file+parity on all 8 nodes, or
you could choose 4 nodes and store a complete copy on each.  In the
latter case, even when half of your 8 friends are online, you may not
be able to retrieve your file (if they are the 4 friends who hold no
copy).  In the former, you'll be able to recover your file even if
only 2 (any two!) of those friends are online.  If you are worried about
reliability and performance, the FEC route chosen by Tahoe seems
clearly better.
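
  To put rough numbers on that comparison, here is a minimal sketch.
It assumes each friend's node is online independently with the same
probability p (a simplification of mine, not a measurement), and it
reads the 4x expansion as a 2-of-8 code, which is inferred from the
example above rather than a confirmed Tahoe setting.  The function
names are hypothetical.

```python
from math import comb


def replication_availability(p: float, copies: int) -> float:
    """Probability that at least one of `copies` full replicas is online."""
    return 1.0 - (1.0 - p) ** copies


def erasure_availability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n shares are online (any k suffice)."""
    return sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Both layouts cost 4x the file size: 4 full copies, or 8 shares of 1/2 each.
    for p in (0.5, 0.7, 0.9):
        rep = replication_availability(p, copies=4)
        fec = erasure_availability(p, n=8, k=2)
        print(f"p={p:.1f}  4 full copies: {rep:.6f}  2-of-8 FEC: {fec:.6f}")
```

  In this toy model the ordering depends on p: the 2-of-8 layout comes
out ahead once nodes are online more than roughly a third of the time,
and the gap widens as nodes get more reliable.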


> ... I only care about having a few (2-10) mostly-trusted nodes,
> and not a whole lot about a DHT with the entire world which seems
> to be the point here: I feel both reliability and foremost performance
> will be much better in a smaller set of nodes with better connectivity.

  When you say "mostly-trusted nodes," what does that mean?  Do the
nodes have to belong to individuals who you personally know?  What if
you could find reliable nodes that are controlled by strangers, and
make them part of the set of nodes that you perform backup to?  Could
that really be any worse?  I mean, my best friend's internet
connection might be flaky, my mom's computer might be susceptible to
viruses, my computer at work might be squirreled away behind a
firewall, my brother might be prone to turn his computer off in the
evenings, etc.  Is it really any better to trust those computers than
it would be to find computers controlled by strangers who have
*demonstrably* reliable operation, and then harness enough of these so
that you are virtually guaranteed to be able to recover your data?

  The only way to determine reliability is to measure it directly. In
flŭd backup (http://www.flud.org), each node uses a localized trust
metric to determine reliability, and learns to prefer demonstrably
reliable nodes over time
(http://www.flud.org/wiki/index.php/LocalizedTrust).  Additionally,
flŭd treats storage resources as a type of currency, creating an
economic incentive for fairness and symmetry
(http://www.flud.org/wiki/index.php/Architecture#Storage_Layer).  I
believe that Tahoe uses some of these same techniques, but since I am
not intimately familiar, I'll let the Tahoe peeps address that.
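
  As a rough illustration of "learning to prefer demonstrably reliable
nodes," here is a minimal sketch.  It is NOT flŭd's actual trust
metric (see the LocalizedTrust page for that); it is just an
exponential moving average over observed successes and failures, and
the class name, method names, and parameters are all hypothetical.

```python
from typing import Dict, List


class PeerReliability:
    """Hypothetical local score per peer: moving average of observed outcomes."""

    def __init__(self, alpha: float = 0.2, initial: float = 0.5) -> None:
        self.alpha = alpha        # weight given to the newest observation
        self.initial = initial    # starting score for a peer we know nothing about
        self.scores: Dict[str, float] = {}

    def record(self, peer: str, success: bool) -> None:
        """Fold one locally observed outcome (e.g. a store or retrieve attempt) in."""
        old = self.scores.get(peer, self.initial)
        outcome = 1.0 if success else 0.0
        self.scores[peer] = (1.0 - self.alpha) * old + self.alpha * outcome

    def preferred(self, count: int) -> List[str]:
        """Return up to `count` peers, highest score first."""
        ranked = sorted(self.scores, key=lambda peer: self.scores[peer], reverse=True)
        return ranked[:count]


if __name__ == "__main__":
    tracker = PeerReliability()
    for _ in range(5):
        tracker.record("stranger-with-good-uptime", True)
        tracker.record("brother-off-in-evenings", False)
    print(tracker.preferred(1))
```

  The point is only that the score comes from what this node has seen
for itself, so a stranger who keeps answering ends up ranked above a
friend who doesn't.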

  There's one more minus to using computers from people that you know:
they often exhibit poor geographic diversity.  It's a tired example, I
know, but if you happened to live on the Gulf Coast in 2005, and were
backing up mostly to other computers in the New Orleans region, then
chances are that even an aggressive FEC scheme might not have helped
you...

Alen

