[flud-devel] [p2p-hackers] announcing Allmydata-Tahoe v0.3

Alen Peacock alenlpeacock at gmail.com
Tue Jun 19 19:50:19 PDT 2007


  Sorry for the long lag in answering, but you do make some excellent
points that I'd like to respond to.


On 6/15/07, Ludovic Courtès <ludovic.courtes at laas.fr> wrote:
> Hi,
>
> "Alen Peacock" <alenlpeacock at gmail.com> writes:
>
> >  You are right regarding availability and coding, but in flŭd, where
> > nodes are free to choose trading partners based on locally observed
> > reliability, this is taken into account.
>
> However, at the time you choose partners, you haven't made any local
> observations yet, by definition.  Or do you use other users'
> observations as recommendations, as in reputation systems?

  At the time of initial partner choice, it is true that we haven't
yet made any observations.  One principle I've tried to stay close to
is that nodes should make decisions based on /locally/ observable
input -- I'm a fan of behavior-based approaches
(http://en.wikipedia.org/wiki/Behavior-based_robotics and
http://flud.org/blog/2006/02/23/out-of-control/).  This doesn't
entirely rule out reputation or gossip systems, but I do prefer to
avoid those solutions as long as there is a reasonable and simpler
answer that can use purely local information.

  In the case of flud backup, I think there is an easier answer.
Since the initial backup of one's data over today's broadband
connections can take many days, there is plenty of observation data
that a node will have access to before its first backup completes.
After initial backup completes, the node will continue monitoring its
storing partners and engaging new ones as necessary.  This means that
it may take some time for a node to stabilize its list of good storage
partners, but this doesn't seem problematic due to the redundancy that
erasure coding gives us.  And since nodes will be constantly
monitoring their stored files with VERIFY ops, they will be able to
ferret out unreliable nodes fairly quickly, re-storing just the chunks
of data that are lost to these.
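
  As a rough illustration of the local-observation idea above, here is
a minimal sketch of a per-partner reliability table fed by the outcomes
of VERIFY ops.  It is not flud's actual code: the class names, the
smoothing formula, the 0.8 threshold and the minimum of 5 observations
are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PartnerRecord:
    """Locally observed VERIFY outcomes for one storage partner (hypothetical)."""
    successes: int = 0
    failures: int = 0

    def reliability(self) -> float:
        # Laplace-smoothed success rate: a partner with no observations
        # scores 0.5 instead of dividing by zero.
        return (self.successes + 1) / (self.successes + self.failures + 2)


@dataclass
class PartnerTable:
    """Purely local bookkeeping; no reputation or gossip input is used."""
    threshold: float = 0.8      # assumed cutoff for "unreliable"
    min_observations: int = 5   # assumed grace period for new partners
    records: dict = field(default_factory=dict)

    def observe(self, node_id: str, verified: bool) -> None:
        """Record the outcome of one VERIFY op against node_id."""
        rec = self.records.setdefault(node_id, PartnerRecord())
        if verified:
            rec.successes += 1
        else:
            rec.failures += 1

    def unreliable(self) -> list:
        """Partners observed often enough and scoring below the threshold."""
        return sorted(
            node_id
            for node_id, rec in self.records.items()
            if rec.successes + rec.failures >= self.min_observations
            and rec.reliability() < self.threshold
        )
```

  A node would call observe() after each VERIFY and periodically
re-store, on new partners, the chunks held by anything returned from
unreliable().  The minimum-observation rule is one way to avoid
dropping a brand new partner after a single failed check.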

  Of course, this basic scheme could later be augmented with
reputation or gossip systems.  In the future, it is my hope that a
diverse set of strategies is implemented among flud nodes.  One of
those strategies may very well involve reputations.


> > Nodes who are actively participating in a backup network have a
> > self-interest in remaining connected (or reachable) /continuously/.
>
> In practice, it could be the case that you don't want to leave your
> machine on 24/7, though.  Ideally, you'd like backup to somehow occur
> when the machine is on, but you'd prefer not to leave it on all day long
> "just" for the sake of backup.
>
> Now, I agree that this is hardly achievable in practice...

  I agree entirely, and eventually, flud will support less reliable
nodes.  We'll likely do that by simply allowing nodes to try to find
an affinity for other nodes with reliability similar to their own
(nodes with much better reliability would be unlikely to enter into
long-term storage relationships with these, so their best bet is to
find the best nodes that will) and then adjusting the encoding scheme
to make up for the lowered reliability.  The details are a bit more
complicated than that, but that's the general idea.
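
  To make the "adjust the encoding scheme" step concrete, here is a
small sketch of the arithmetic, assuming a k-of-n erasure code (any k
of the n stored blocks recover the file) and assuming each storing node
is reachable independently with the same probability p.  Both are
simplifications, and the function names are made up for the example;
this is not how flud itself picks its parameters.

```python
from math import comb


def availability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent nodes are reachable,
    when each node is reachable with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def blocks_needed(k: int, p: float, target: float, max_n: int = 1000) -> int:
    """Smallest n such that a k-of-n code meets the target availability."""
    for n in range(k, max_n + 1):
        if availability(k, n, p) >= target:
            return n
    raise ValueError("target availability not reachable within max_n blocks")


if __name__ == "__main__":
    # Compare how many blocks a 20-of-n code needs at two node reliabilities.
    for p in (0.95, 0.6):
        print(p, blocks_needed(20, p, 0.999))
```

  The point of the comparison is only the trend: the lower the
per-node reliability, the more blocks (and so the more storage and
bandwidth) are needed to reach the same file availability.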

  Another (simpler) option is to allow less reliable nodes to proxy
through a more reliable node, e.g. the one computer in my home that
stays up all the time could proxy the backups for the other two
machines which have limited uptime.


> > In flŭd, there is an extra incentive for remaining available; a node
> > which does not remain available consistently will have a very hard
> > time finding partners willing to trade storage and bandwidth
> > resources.
>
> Right, but this might impede liveliness in a practical, large-scale open
> deployment.

  flud is targeted, initially, at computers that are already
connected 24x7.  I'm trying hard to remain committed to that narrow
niche until we've got it working beautifully there; then we can look
at implementing the infrastructure necessary to allow less available
systems to work as well.


> FWIW, I also think that cooperative backup in closed networks, e.g.,
> among a group of pals, is likely to be more easily deployed and perhaps
> more trustworthy as well.  And if you choose backup peers in a
> close-enough time zone, each other's machine may be up at roughly the
> same time.  ;-)

  flud actually has supported this mode of operation since the
beginning, with secure private networks among pals
(http://www.flud.org/wiki/index.php/Architecture#Node_Identity).  But
I'm not convinced that this actually gives you more reliable or
trustworthy operation.  I'm not even sure it makes deployment easier.
Maybe that's just because not enough of my own friends are early
adopters of flud, and even fewer have the type of systems that would
provide good symmetry to my own (as my node looks for trading
partners).  It seems to me that opening myself up to the entire pool
of available flud nodes would give me more choices, and allow me to
find nodes that are both more reliable than those of my friends and
more geographically diverse.

  But, there definitely may be scenarios where private flud networks
would be desirable, e.g., inside a business with a decent number of
nodes (ignoring, for the moment, the catastrophic loss that would be
caused by fire/flood/theft if all backups were performed inside a
single office).  I don't envision ever removing support for private
flud networks.

Alen

