[flud-devel] private flud network and flud goals

Alen Peacock alenlpeacock at gmail.com
Wed Sep 5 11:30:56 PDT 2007


On 9/5/07, Stuart Langridge <sil at kryogenix.org> wrote:
>
> > This corresponds fairly closely
> > with the "group name" idea you mention -- as long as members know the
> > group name, they can participate in the group.
>
> ...and they know how to tunnel ports through their firewall. This is the
> big problem; the server isn't in my design because I want
> centralisation, it's there because the moment you mention "open port
> 4242 on your firewall and forward it to your flud server" you have lost,
> in my opinion. I was pretty serious about the "just start the program"
> thing.

  Yes, that's one caveat I didn't mention, and it is a big one.
Currently, we don't do any NAT traversal or hole punching, so users
will have to open up ports.  But I do plan on implementing such at
some point, likely through a STUNT approach (or, failing that, by
dropping down to UDP and using STUN/ICE, which can apparently solve
this issue for >90% of users behind firewalls/NATs today).  We could
also investigate using UPnP (as BitTorrent and others do), but I'm
more leery of that approach.

  At any rate, this will be fixed one day, because you are absolutely
right -- it's too much trouble for the average user.  For now, the
type of user who would be interested in running flud will need to be
capable of setting up port forwarding etc.

  (STUN/STUNT does require semi-centralized relay servers, btw, but
these can be distributed as part of the generic flud node codebase).


> Note that you can't just provide a "private group ID" to
> potential members and have them type it in, either, without a server
> that dictates which flud nodes that private group ID pertains to. I'm
> thinking here of a private group ID being something like
> "langridge-family", not a block of code listing lots of flud nodes! My
> use case is:
>
> 1. I say to my dad: install flud using Add/Remove Programs in Ubuntu
> 2. I say to my brother: download flud.exe from flud.org and run it
> 3. I say to both of them: run flud and type "langridge-family" and the
> password "ourSekr1tPassword" in the box
> 4. That's it.
>
> No port forwarding, I don't have to know which IP addresses their
> machines are on, we're not on the same network (we live hundreds of
> miles apart). If it can be that easy it's a huge win. I can't see how it
> can be made this easy without a central server, not to store the backups
> but to (a) coordinate access between different nodes, so you can just
> refer to a private flud network by name and (b) to proxy connections
> between two firewalled nodes.

  In order to join a flud network, you'll need to know the address of
at least one other node in that network.  Currently, this does need to
be entered manually.  This adds a step 3.5 to your list.

  In the future, we can use something very simple like distributed
gwebcaches to get nodes introduced to one another.

  Additionally, in the future, nodes will use discovery methods to
find others on their local network (Bonjour-style) as step #1, then
query gwebcaches for contacts as step #2.  Bonjour-style discovery
doesn't help your use case, but it does facilitate private flud
networks on, for example, a company LAN.

  Currently, the groupID is the secret password, and although you can
certainly use "langridge-family," it would be better to choose
something unguessable (groupIDs are hashed into the SHA-256 space,
which is unguessable, but only as unguessable as the input).
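
  To make the "only as unguessable as the input" point concrete, here
is a minimal Python sketch.  It is not flud's code: group_key() and
dictionary_attack() are hypothetical names, and it assumes the groupID
is hashed with plain SHA-256 and nothing else.

```python
import hashlib
import secrets
from typing import Optional, Sequence


def group_key(group_id: str) -> str:
    """Hash a groupID into the SHA-256 space (hypothetical helper)."""
    return hashlib.sha256(group_id.encode("utf-8")).hexdigest()


def dictionary_attack(target_key: str, guesses: Sequence[str]) -> Optional[str]:
    """Return the guess whose hash matches target_key, or None."""
    for guess in guesses:
        if group_key(guess) == target_key:
            return guess
    return None


guesses = ["smith-family", "langridge-family"]

# A memorable groupID falls to a short list of guesses.
weak_key = group_key("langridge-family")
found = dictionary_attack(weak_key, guesses)

# A randomly generated groupID (32 random bytes) does not.
strong_id = secrets.token_urlsafe(32)
not_found = dictionary_attack(group_key(strong_id), guesses)
```

  The hash output always looks random, but anyone who can guess the
input can recompute it, which is why a generated groupID is safer than
a memorable one.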

  For private flud networks, I lean towards the following:

1. You install flud and issue invitations to join your private flud
network from the GUI, which results in an email being sent.
2. The recipient of the email is told where they can download flud,
and given a block of text to cut-n-paste into the GUI on first run.
The block of text is opaque-looking, but contains at least the
groupID, and the IP address and nodeID of the sending node.
3. That's it!
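
  The block of text in step 2 could be sketched roughly as below.
The format is purely an assumption for illustration (base64 over
JSON); make_invitation() and read_invitation() are hypothetical names,
and the port field is an addition that the scheme above doesn't
mention.

```python
import base64
import json


def make_invitation(group_id: str, ip: str, port: int, node_id: str) -> str:
    """Pack the invitation fields into an opaque-looking text block."""
    payload = json.dumps(
        {"groupID": group_id, "ip": ip, "port": port, "nodeID": node_id},
        sort_keys=True,
    )
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def read_invitation(block: str) -> dict:
    """Recover the fields from a pasted block; tolerates stray whitespace."""
    raw = base64.urlsafe_b64decode(block.strip().encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# Example values are made up (192.0.2.10 is a documentation address).
block = make_invitation("s3kr1t-group-id", "192.0.2.10", 4242, "abc123")
fields = read_invitation("  " + block + "\n")
```

  Note that base64 only makes the block look opaque; it does not
hide anything.  Since the groupID is the secret, the email itself
would be as sensitive as the password.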

  If the recipient's node can't connect directly to the IP address
sent in email, it connects to the flud network at large (via
Bonjour/gwebcaches) and does a lookup for the nodeID, which should
allow it to connect even if the IP address changes.
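
  That fallback order can be sketched as follows.  This is only an
outline under assumptions: connect_to_inviter() is a hypothetical
name, and try_connect / lookup_node stand in for whatever the real
connection and nodeID-lookup calls turn out to be.

```python
from typing import Callable, Optional


def connect_to_inviter(
    invite: dict,
    try_connect: Callable[[str], bool],
    lookup_node: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the address that worked, or None if the inviter is unreachable."""
    # Step 1: try the address that came in the invitation email.
    if try_connect(invite["ip"]):
        return invite["ip"]
    # Step 2: ask the network at large where this nodeID lives now.
    current = lookup_node(invite["nodeID"])
    if current is not None and try_connect(current):
        return current
    return None


# Example with stand-in callables: the emailed address has gone stale.
reachable = {"198.51.100.7"}
directory = {"abc123": "198.51.100.7"}
invite = {"ip": "192.0.2.10", "nodeID": "abc123"}
address = connect_to_inviter(invite, reachable.__contains__, directory.get)
```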

  This needs to be thought out a bit more thoroughly (especially bits
about disjoint flud networks, non-transitivity, etc), but I think the
basic scheme is workable, despite what I'm about to say about private
flud networks in the next section, below.

  Do you think that would be easy enough for most users?


> > flud uses symmetric storage relationships among peers to enforce
> > fairness.  This corresponds to your statement "if you want to back up
> > N megabytes you have to offer 3N megabytes of space to the group."
> > [*2]
>
> There is a bit of a risk here, which I haven't managed to think of a
> solution to: imagine a private flud network between Alice, Bob, and Dr
> Evil. Alice wants to back up 1MB of data, Bob wants to back up 2MB of
> data, and Dr Evil wants to back up his BitTorrent downloads folder with
> 800GB of downloaded episodes of 60 Minutes and Santa Barbara. Alice and
> Bob will presumably have to allocate something like 150GB of disc space
> for backing up the rest of the network, even though each of them only
> want to back up a couple of Word documents and that's it. I can't think
> of a way around this other than to have Alice and Bob nail Dr Evil's
> head to a tree if he tries a trick like this.
>
> Note also that Dr Evil will take about nine hundred years to ship all
> his data over the network to be backed up, which might be a problem.

  All storage resources in flud must be traded symmetrically.  In this
scenario, Dr. Evil will fail to find enough storage resources unless
Alice and Bob have decided to be very generous, or have some
inclination for reserving vast amounts of future storage resources for
themselves (both unlikely, because by default, flud nodes won't do
either).  So while Alice and Bob will find plenty of offers for
trading storage resources and back up their data without problems, Dr.
Evil will get frustrated.
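
  A back-of-the-envelope version of that argument, as a Python
sketch.  It assumes the default behaviour described above (each peer
offers only as much space as it wants to back up itself) and ignores
redundancy overhead; max_backup() is a hypothetical name, and the
result is an upper bound, not a model of flud's actual trading.

```python
MB = 10**6
GB = 10**9


def max_backup(node: str, wants: dict) -> int:
    """Upper bound on bytes `node` can place under strictly symmetric trades.

    Each peer is assumed to trade away only as much space as it wants
    to consume itself (no generosity, no reserving for the future).
    """
    offered_by_peers = sum(w for name, w in wants.items() if name != node)
    return min(wants[node], offered_by_peers)


wants = {"alice": 1 * MB, "bob": 2 * MB, "dr_evil": 800 * GB}
placed = {name: max_backup(name, wants) for name in wants}
# Alice and Bob can place everything they want; Dr. Evil is capped at
# the 3 MB that Alice and Bob are jointly willing to trade.
```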

  The solution is to not use private flud networks at all, but instead
use the (not yet existent) public flud network.

  This is understandably a hard argument for many to swallow, but I'll
continue to make it: nodes are likely to find better trading partners
in a large anonymous network than they are in a small
private-but-trusted network.

  Even though you may trust all the individuals in your private
network, that doesn't mean that the computers running their flud
nodes are reliable.  They may have connectivity issues because
their ISP sucks, or they might have viruses, or they might have crappy
hardware and insufficient extra storage/bandwidth/cpu with which to
service requests that *your* node issues.  Or, as you illustrated,
they may simply have asymmetric trading requirements.

  Trusting anonymous strangers seems scary, I know.  But this is what
the flud protocol was designed for: to enforce symmetry, to verify
that data that is claimed to be stored is really stored, etc.  And
it's what the localized trust system was built to monitor: that nodes
are reliable and don't cheat.  Combined with the tolerance for
failures that erasure coding gives us, nodes will be able to detect
cheaters/unreliable nodes and move data to more reliable locations
before any data is lost.  And over time, a node will anneal into a
state where it is trading mainly with reliable nodes (as perceived
locally).
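
  To give a feel for the tolerance erasure coding buys, here is a
small sketch.  It assumes a generic k-of-n code (a file becomes n
blocks, any k of which suffice to rebuild it) and that each block is
retrievable independently with probability p; the numbers are made up
and survival_probability() is a hypothetical name, not anything from
flud.

```python
from math import comb


def survival_probability(k: int, n: int, p: float) -> float:
    """P(at least k of n blocks are retrievable), each independently with prob. p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative parameters only: 40 blocks, any 20 rebuild the file,
# and each holder is reachable 80% of the time.
coded = survival_probability(20, 40, 0.8)

# For comparison: needing every one of the 40 blocks.
uncoded = survival_probability(40, 40, 0.8)
```

  The gap between the two numbers is the slack a node has to notice
unreliable partners and move blocks elsewhere before the file becomes
unrecoverable.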

  I think that storing data to the public flud network will be a
better proposition for most users than setting up private flud
networks among known entities, but I do plan on supporting both for the
foreseeable future.

  As for the time it would take Dr. Evil to back up 800GB, that's an
issue that can only be solved by fatter pipes. [*1]  Either that, or
convince Dr. Evil that his Santa Barbara penchant is destructive to
his soul :)
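
  For a rough sense of scale, the arithmetic looks like this.  The
1 Mbit/s upstream rate is an assumed figure, not a measurement, and
the sketch ignores protocol and redundancy overhead, which would only
make things worse.

```python
def upload_days(size_bytes: float, upstream_bits_per_sec: float) -> float:
    """Days needed to push size_bytes through a link of the given speed."""
    seconds = size_bytes * 8 / upstream_bits_per_sec
    return seconds / 86_400


GB = 10**9
days = upload_days(800 * GB, 1_000_000)  # assumed 1 Mbit/s upstream
print(f"{days:.0f} days")  # prints "74 days"
```

  Not quite nine hundred years, but more than two months of
saturating the uplink around the clock.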


> > Now, the caveats:  [..cut..]
> >
> > - currently, there is no differential backup for changed files. [..cut..]
>
> Yeah. This sort of thing is not that much of a problem if you're backing
> up small stuff like config files. It's also not much of an issue for
> large multimedia collections, since a photo or mp3 or movie don't
> *change* much once they're created. Differential backup within files
> isn't hugely important, IMO, as long as flud doesn't back up files it's
> already got, which is already sorted.

  Right.  For most files, this isn't a big deal.  But for db-type
files, it can be pretty painful, especially as the db grows (Outlook,
for example, stores all email in a db, which can grow to be many GBs
in size).  That's a problem that I think we have a good fix for, but
it is not urgent.

  Alen

[*1: or by using techniques like those used by crashplan
(www.crashplan.com), where you can carry your laptop/hard drive to a
partner's node and do a local copy to get things started off. This
would conflict with flud's design goals, but is a very pragmatic
approach]



