[flud-devel] DHT justifications (was: DHT Performance? Design?)

Alen Peacock alenlpeacock at gmail.com
Thu Nov 1 22:08:16 PDT 2007


On Nov 1, 2007 6:03 AM, Bill Broadley <bill at broadley.org> wrote:
>
> Before I discuss flud, I'll explain my thoughts on my preliminary design.  I
> was envisioning 2 kinds of peers, those with contracts, and those without.

  The characteristics of your peers with/without contracts form a nice
list, and align quite closely with those of flud nodes.  In a lot of
ways, a flud node will have the same partnering/non-partnering node
dichotomy.  Most of the attributes you list for the system as a whole
are also shared by flud.  I think you've got a really solid starting
point.


> How does the DHT help convergent storage?  Filesystem data blocks are only
> stored on contracted peers... right?

  Let's say that a user in Ireland backs up the file '4 - Bubbly.mp3'
(today's #1 Amazon.mp3 download) to the system for the first time.
Now another user in Taiwan wants to back up the same file, but wants to
take advantage of convergent storage.  How does their client discover
if the file is already in the system?  They have to consult some sort
of global mapping of file storage keys to find the location[s].  There
are a number of ways to accomplish that using traditional centralized
services, but this is exactly what DHTs were designed to do in a
decentralized environment: map keys to values.
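
  For concreteness, here is a minimal sketch of that lookup in Python
(the dht object and its get() are stand-ins, not flud's actual API):

    import hashlib

    def file_key(path):
        # The convergent-storage key is just the content hash of the file.
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        return h.hexdigest()

    def find_existing_copies(dht, path):
        # If someone, anywhere, has already stored this file, the DHT maps
        # its content hash to the storing node[s]; a miss means we store
        # the file ourselves.
        return dht.get(file_key(path))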

  In your design, you aren't offering global convergent storage,
correct?  So the DHT doesn't fulfill this need for you.  In flud it
does.

  There are certainly benefits to not doing global convergent storage
(besides not needing a dht), and I'm sure you have thought of some of
them.  But there are also diminishing returns on doing convergent
storage at all as the size of a clique becomes smaller.  You can
counteract this by trying to form cliques via system similarity, but
that has its own disadvantages, such as susceptibility to common
vulnerabilities (contrast the Phoenix Recovery System to Pastiche).
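
  To put a rough number on those diminishing returns: if each file you
back up is, independently, already held by any given node with
probability p, the chance of finding an existing copy in a clique of n
nodes is 1 - (1-p)^n.  (Toy model with invented numbers, not measured
flud data.)

    def dedup_chance(p, n):
        # Probability that at least one of n nodes already stores the file,
        # assuming each holds it independently with probability p.
        return 1 - (1 - p) ** n

    # e.g. with p = 0.001:
    #   dedup_chance(0.001, 50)     -> ~0.05
    #   dedup_chance(0.001, 1000)   -> ~0.63
    #   dedup_chance(0.001, 100000) -> ~1.00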


> Maybe a PGP-like web of trust would work.  I build 3 peers myself.  I add
> 3 friends/colleagues.  Then I configure my trust to rely on my contract
> peers' opinions of anyone I don't know.  If you make my organization happy
> I offer you better terms; if you misbehave, I inflict damage.  This would
> lead to incestuous groups of peers, but I'm not sure that is a bad thing.
> Say 100 peers all gossip about each other, offer each other great terms,
> and don't connect with a single other peer.  Is the entire swarm diminished
> in any way?  Are those 100 peers any more vulnerable?

  A web of trust might work.  I'm suspicious of how easy it seems to
game trust-webs/gossip/etc. systems, but find those techniques
alluring nonetheless.

  Are those 100 peers more vulnerable?  It depends.  If the 100 peers
lack geographic or system diversity, I'd say yes.  But if they have
enough diversity, I agree, their xenophobic tendencies are probably
fine, and it seems likely from your description that they would
interact with other nodes to some degree anyway.

  In flud, there is a similar concept of private flud networks,
wherein nodes interact exclusively with other nodes that share a
common secret (or maybe at some point, Shamir secret shares).  I'm not
a big fan of that mode of operation, because I think that there are
big advantages to choosing interactions from a pool of as many peers
as possible, relying on trust metrics to let a node determine who is a
reliable trader and who is not, much as you described.  It's there
mainly for entities that might want complete control over their backup
resources.


> Interesting, I hadn't thought of that attack.  Seems pretty risky.  Say
> PeerA and PeerB have a contract for 10 days for 1GB, and PeerA stores
> 20GB of erasure-coded data on 20 peers (including PeerB).  Then PeerA has a
> crash and declares it to all contractual peers.  I'd expect that a (small)
> file containing all contracts would already have been copied to all peers
> (it is of course valuable data).  Additionally you need the block metadata
> stored somewhere (DHT or not).  Seems like if PeerB deletes 90% of the
> contracts, the signed proof of said contracts means it could easily end up
> with a terrible reputation that could be externally verified, i.e. signed
> contract + inconsistent signed claim that said contract doesn't exist.

  Right.  In both strategies (DHT | copies of contracts to peers), you
have to store some metadata on nodes other than the ones where you
store data, in order to be able to audit these contractual agreements
in the case of a catastrophic loss.  The DHT approach still seems very
simple and straightforward to me, largely because it formalizes where
records are going to be stored/found in the protocol itself.  If you
are going to instead place copies of contracts on secondary nodes, you
have to decide on 1) a [deterministic?] strategy for placement, 2) a
strategy for replication, 3) how often you rewrite these meta-contract
files (once per 'complete' backup session? once per file stored? once
per day? etc.), and 4) a strategy for finding those secondary copies of
records such that you can locate them when needed.  I imagine you've
already settled on an overall scheme for
this, and will discover how to tweak the details as you go.
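
  To illustrate what I mean by the DHT formalizing this: with a DHT,
all four of those decisions are answered by the overlay itself.  A toy
stand-in, not flud's actual calls:

    class ToyDHT(object):
        # Stand-in for the overlay.  A real DHT decides which k nodes hold
        # each key, replicates the value to them, and routes gets to them.
        def __init__(self):
            self.table = {}
        def put(self, key, value):
            self.table[key] = value
        def get(self, key):
            return self.table.get(key)

    dht = ToyDHT()
    dht.put('contract:abc123', {'peer': 'nodeB', 'bytes': 10**9})
    print(dht.get('contract:abc123'))   # placement, replication, and
                                        # lookup all came with the overlay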


> With each backup copy a file containing all your contracts to all your peers.

  Where 'all your peers' > 12?

  This file experiences a high amount of churn, because it needs to be
overwritten each time a backup occurs, right?  E.g., suppose you've
backed up your 350k files, and now have 350k*replication_factor
contracts, which have been copied to all your peers in the contract
file.  Now, during the next backup, 10 files are modified/added -- do
you have to push the file again with all (350k+10)*replication_factor
entries to all peers?
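
  Back-of-the-envelope (the per-entry size is a guess on my part):

    files = 350000
    replication_factor = 3     # assumed
    bytes_per_entry = 100      # assumed: node ID + key + signature

    manifest = files * replication_factor * bytes_per_entry
    print(manifest / 1e6)      # ~105 MB re-pushed to every peer, even on
                               # a day when only 10 files changed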

  Maybe you can address this by having one contract file per file
backed up, and pushing it to all your peers?  Wouldn't it be convenient
to do this with a generic key-value storage system?  Maybe a smaller
DHT formed just of your peers?  But then, that's sort of tricky,
because unless your peers only back up to peers that you back up to,
their DHTs will contain peers that don't exist in yours, and yours in
theirs...

  Maybe you can push only a subset of the contracts to each of the
other peers, such that you have full coverage + redundancy?

  Or maybe you simply live with the overhead of pushing that contract
manifest file to all peers each time, even on days when you've only
backed up one changed file?
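
  The subset idea (two paragraphs up) could be as simple as hashing
each contract record to pick which r of your N peers receive it, so
every record lands on r peers while each peer only holds about r/N of
the records.  A sketch, with invented names:

    import hashlib

    def subset_for(contract_key, peers, r=2):
        # Deterministically choose r of the N peers to hold this contract
        # record; anyone who knows the peer list can recompute the same
        # assignment later when the record needs to be found.
        digest = int(hashlib.sha256(contract_key.encode()).hexdigest(), 16)
        start = digest % len(peers)
        return [peers[(start + i) % len(peers)] for i in range(r)]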


> I guess (at least so far) that there should be 2 alternate designs:
> #1 all files and metadata are on contracted peers
> #2 all files and metadata are stored in dht.

  Yes, those are valid alternate designs.  When we did ABS, we built a
DHT-like system that ended up looking a lot like #1 in implementation.
That had its shortcomings, but was sufficient for a LAN-scale p2p
backup system.

  #2 seems challenging.  A DHT that could store arbitrary amounts of
data in arbitrary forms would be ripe for all sorts of abuse.  And I
imagine performance would also be very challenging; every put/get op
could potentially consume resources (disk/bandwidth) for an extended
amount of time.


> Pure
> store is completely useless without metadata.  Pure metadata is useless
> without data.

  Absolutely true.


> Hmm, this might be the piece I'm missing.  So if ubuntu gutsy i386 /bin/i386
> /bin/login has a sha256 of foo.  I can say get_dht(file_sha:foo) and get back
> a nodeID of a node that stores it for me at no cost to myself?

  You can say get_dht(sha:foo) and get back the block metadata record
that tells you where the blocks of that file are stored.  You can then
contract with those nodes (if they are willing) to "store" those
blocks for you (without actually xferring them), do verify ops on
them, and/or re-store blocks to new nodes and update the block
metadata record.
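
  Roughly like this, with illustrative names (holder.store/verify here
stand in for the real storage-layer ops, not the actual flud calls):

    def claim_existing_blocks(dht, my_id, file_sha, new_challenge):
        # Look up the block metadata record keyed by the file's content hash.
        meta = dht.get(file_sha)
        if meta is None:
            return None          # first copy in the system; store normally
        for block_id, holders in meta['blocks'].items():
            for holder in holders:
                # Ask the holder to add us to the block's reference list
                # ("store" without transferring data), then verify that the
                # holder really has the block.
                if holder.store(block_id, owner=my_id, data=None) and \
                   holder.verify(block_id, challenge=new_challenge()):
                    break
            else:
                # No holder accepted: re-store this block to a new node and
                # update the metadata record (omitted here).
                pass
        return meta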


>  Do contracts
> only apply to storage not already existing in the DHT?  How can I be assured
> that data that is in the DHT today will be in the DHT tomorrow without a
> contract?

  You still need a contract.  A node that stores popular content must
be compensated for the increase in bandwidth/ops that each new storing
node imposes, and all blocks are stored with reference lists for
accounting purposes, including garbage collection (so if a node doesn't
contract for its blocks, it may find them deleted out from under it).
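
  The reference list is just per-block owner accounting; something like
this (sketch only, not flud's actual structures):

    class StoredBlock(object):
        # A block this node is holding, plus the list of owners who have
        # contracted to keep it around (the reference list).
        def __init__(self, block_id, data):
            self.block_id = block_id
            self.data = data
            self.owners = set()

        def add_owner(self, node_id):
            self.owners.add(node_id)

        def remove_owner(self, node_id):
            self.owners.discard(node_id)
            return len(self.owners) == 0   # True: nobody pays for it, so
                                           # it is garbage and can be deleted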


> Dunno, I guess eventually.  Say said attacker has at least 1 new $3k
> server with a 10mbit internet connection and 8 CPUs.  They do a few DHT
> queries to map out the peers around the ID they are interested in,
> say an MP3 of a popular song.  They pick 8 IDs, sha256 the IDs, and see
> if they are closer than the existing peers.  With 1000 clients is
> that going to take long?  Or are you assuming millions?  Not that
> millions is completely unreasonable.  If 8.5 M people are on skype at
> this very moment presumably enjoying free phone calls... why not 1M
> for free backups.

  Generating random sha256s is useless in flud without having the
corresponding public/private keypair from which the sha256 nodeID is
supposed to be generated[*1] (public/private key generation is not
cheap).  Finding partial hash collisions a la
Hashcash/Herbivore-Join-Op is even more expensive.

  And yes, when the network is small, the attack is easier. But when
the network is small, it isn't a very appealing target for attack.
And when it is large, it is a difficult target to attack.  Kademlia's
preference for older nodes also helps here, especially since nodes in
a backup DHT would be incentivized to stay online [nearly permanently]
instead of constantly churning (as in filesharing apps).

  [*1] All of the storage-layer ops in flud require the initiator to
do a challenge-response using the private key.
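
  In sketch form (the sign/verify callables stand in for whatever
signature scheme the node's keypair uses; this is not flud's actual
key-handling code):

    import hashlib, os

    def node_id(public_key_bytes):
        # A nodeID is the sha256 of the node's public key, so claiming an
        # arbitrary ID means generating a keypair that hashes to it.
        return hashlib.sha256(public_key_bytes).hexdigest()

    def passes_challenge(public_key_bytes, sign, verify):
        # Storage-layer ops start with a challenge-response: sign a fresh
        # nonce with the private key behind the claimed nodeID.
        nonce = os.urandom(32)
        return verify(public_key_bytes, nonce, sign(nonce))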


> So if I'm an average user (350k files, 150GB disk) it looks like 7GB would
> be stored in the DHT?  So some user with a 100MB homedir on ADSL might end
> up storing many GB of DHT data?  Seems like that might reward someone for
> writing a flud-selfish that used ONLY the DHT, never attempted a contract,
> and stored EVERYTHING in the DHT under the guise of tracking metadata for
> fictitious contracts.  I did find a p2p-hackers thread (I think started by
> you) that mentioned various DHT protective measures; I'll track it down and
> read it.

  Yeah, there is a thread (or two) somewhere on this.  Basic summary
is that it isn't that hard to prevent this type of DHT abuse by
requiring *all* data in the DHT to be stored transparently.  This
allows nodes to immediately reject data that doesn't conform to the
strict format of correct data layout, and also allows nodes to audit
the DHT data to confirm that it actually points to valid block data.
Violating any of this results in getting blacklisted and having the
DHT records purged.

  In other words, flud's DHT is not a generic key-data store.  It is a
highly constrained and auditable key-value store.
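
  Concretely, a node can refuse anything that doesn't parse as a
well-formed metadata record, and later spot-check that the listed
holders really have the blocks.  Field names below are illustrative,
not the actual flud record format:

    def looks_like_metadata(value):
        # Reject any DHT value that is not a well-formed block-metadata
        # record before storing it.
        if not isinstance(value, dict):
            return False
        blocks = value.get('blocks')
        if not isinstance(blocks, dict) or not blocks:
            return False
        for block_id, holders in blocks.items():
            if not (isinstance(block_id, str) and len(block_id) == 64):
                return False
            if not holders or not all(isinstance(h, str) for h in holders):
                return False
        return True

    def audit(record, verify_op):
        # Spot-check that the record points at real block data by running a
        # verify op against one of the listed holders; records that keep
        # failing get purged and their publisher blacklisted.
        block_id, holders = next(iter(record['blocks'].items()))
        return verify_op(holders[0], block_id)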

  A node would get much better and higher-throughput data storage
resources by simply storing files according to the protocol, so this
type of attack is only useful to an irrational malicious node anyway.


> Indeed, I'm quite excited that flud has started and these discussions have
> forced me to reevaluate all the ideas I've had in this area.  All the while
> the number of man hours between my current design state and a working
> distributed client are shrinking because of improved design.  Even with those
> improvements it seems like a better return on investment to help flud as
> the opportunities arise.

  This has helped me reevaluate, too.  I am of course prone to defend
the existing design or at least rationalize the decisions made to
arrive at it, but that's all for the sake of vetting it.  I am very
intrigued by the areas where your thoughts differ from mine, and I
think you've hit on some very promising ideas which I'd very much like
to see tried out.  I'm also pleased to see that we have *a lot* of
overlap in how we think this should be accomplished -- that's very
nice validation.

  Alen



