[flud-devel] DHT justifications (was: DHT Performance? Design?)

Bill Broadley bill at broadley.org
Thu Nov 1 05:03:20 PDT 2007


Ah, I'm digesting your comments below, especially the three types of metadata,
where each is stored, and the implications of particularly devious peers.  I
have great respect for BitTorrent's tit-for-tat bandwidth exchange.

I also wanted to mention that erasure codes are of course ideal for adding
redundancy to a file in a form a computer can use to correct
errors/omissions, while compression is a great way to reduce overhead by
removing redundancy that is in a form that can't be used.  I haven't heard
compression mentioned, but it seems a natural fit.  Especially since the
usual first recommendation before you encrypt something is to increase the
entropy per bit (and decrease the CPU required for encryption) with compression.
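A quick stdlib sketch of that point: compressing redundant data both shrinks it and pushes its entropy per byte toward the 8-bit maximum, so the cipher has less (and denser) input to work on.  (The sample text and helper name are mine, purely illustrative.)

```python
import zlib
from collections import Counter
from math import log2

def entropy_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * log2(c / n) for c in counts.values())

# Highly redundant plaintext: low entropy per byte.
plaintext = b"all work and no play makes jack a dull boy. " * 200
compressed = zlib.compress(plaintext, level=9)

print(f"original:   {len(plaintext):6d} bytes, "
      f"{entropy_per_byte(plaintext):.2f} bits/byte")
print(f"compressed: {len(compressed):6d} bytes, "
      f"{entropy_per_byte(compressed):.2f} bits/byte")
```

The compressed output is both far smaller and much closer to random-looking, which is exactly what you want feeding an encryptor.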

Before I discuss flud, I'll explain my thoughts on my preliminary design.  I
was envisioning 2 kinds of peers: those with contracts, and those without.

Peers with contracts:
* You regularly connect to
* You monitor and keep history of dealings with them
* You track uptime, response to previous challenges
* You might even manually introduce some peers, stacking the odds in
   your favor
* You might pick peers specifically because of network topology.  Maybe
   something related to hop count.  I.e. you don't want them in the same
   building, but you don't want them on another continent either.
* The confidence you have in these peers would influence the redundancy you
   are happy with (regardless of how trustworthy random peers are).
* Ideally these are long-term relationships that will help reduce your
   vulnerability to some hacker group, business entity, or government
   flooding the network with evil peers.

Peers without contracts:
* Have an arbitrary churn rate, subject to flash crowds (e.g. a slashdot
   posting)
* Connect via DHT for help finding contracts
* Potentially share blacklists and gossip about reputations
* Potentially aid in directly connecting NAT/IP-masqueraded peers
* Potentially donate storage and bandwidth for low-security files.  E.g.
   I have /usr/local/bin/google-earth, I don't care if folks know I have it,
   and I'll send you the bytes you ask for even without a contract.  But I
   won't promise I'll have it tomorrow.
* At any given point a large number of DHT members could be evil bots; the
   current number of DHT-enabled Storm bots at any one time is in the
   500k-1M range (with 1-5M in a week?)

So my intent was:
* allow disaster recovery with only your private/public keys and your
   contracted peers.
* Contracted peers are somewhat of a club, society, fraternity or religion.
   Interested in the members success and protection to a greater degree than
   other non-members.  Often involve initiation ceremonies, membership dues,
   probation periods, internships, licenses, and/or have laws.  Typically the
   main threat is expulsion (and the corresponding lack of benefits).  So sure
   you donate food to charities, and stack sandbags to help keep back the
   floods when a non-member is in need.  But for members you offer to pay their
   medical bills, help rebuild their house or provide them housing in a time
   of need.  Of course the flip side is the expectation of more help from your
   organization if you are in a time of need.
* membership with contracted peers increases the odds of short term return
   of favors.  It's relatively selfish.  It also insulates you from random
   folks that are selfish and/or abusive.
* Membership in any of the mentioned organizations tends to make a fairly
   closed society, with a low churn rate (because of the high cost of entry),
   and members tend to gossip about each other.  This makes the penalties for
   violating norms higher, you can get away with less, which tends to set the
   standard of behavior higher than in the society at large.
* non-contracted peers are somewhat anonymous, not monitored, and interesting
   mostly for the potential to become members.
* DHTs are hard to trust.  You could ask who your peers were, or where a
   file was, and even if you could check the result for correctness, you
   don't know whom to blame if it's wrong.  It could be the DHT peer that
   answered... or it could be anyone else with a malicious dht_put (unless
   the result was signed by you in the first place).

Now that I write this all out, my conclusion is this: the entire project is
about getting your filesystem back.  100% of that filesystem is stored on
your contracted peers (right?).  You HAVE to depend on your peers (or a
percentage of them).  Depending on anything else just lowers your reliability.
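To make that dependence concrete, here is a quick sketch (with illustrative numbers, not flud's actual parameters) of how restore probability comes entirely from contracted-peer availability, assuming any k of n erasure-coded blocks suffice to rebuild a file and peers fail independently:

```python
from math import comb

def restore_probability(n: int, k: int, p: float) -> float:
    """P(at least k of n independently-available peers are up)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical parameters: 20 erasure-coded blocks, any 10 suffice.
for p in (0.5, 0.8, 0.95):
    print(f"peer availability {p:.2f} -> "
          f"restore probability {restore_probability(20, 10, p):.6f}")
```

The point is that redundancy buys you a lot, but only against the measured reliability of the peers you actually contracted with; nothing outside those contracts enters the calculation.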

Alen Peacock wrote:
> On Oct 31, 2007 12:32 AM, Bill Broadley <bill at broadley.org> wrote:
>> Hrm, so what makes the metadata in the DHT a good idea?
> 
>   This is driven mainly by the goal to do convergent storage

How does the DHT help convergent storage?  Filesystem data blocks are only 
stored on contracted peers... right?

> in an
> internet-scale backup system, where another major goal is complete
> decentralization.

I heartily agree on the decentralization.

> If these are goals (as they are in flud), I know of
> no better mechanism than a DHT for allowing nodes to find data in a
> content-addressable manner.  I'll expand on this a bit below:

Ok.

>>  Storing potentially
>> fairly large (on average one peers metadata*12?) amount of data from folks you
>> don't have a vest interest in (nor they you).
> 
>   Ah, but rational nodes do have self-interest, and risk getting
> blacklisted if they misbehave.  If blacklisting proves insufficient,
> we can add more layers of trust/reputation/rumors/epidemics for the
> DHT layer.

Well, if you do not have any contracts you don't have anything to lose.
Which is easier to monitor and has a higher cost on default: acting as a
perfect DHT node, or your performance on contracts?  Which is the bigger
carrot: a perfect DHT reputation or a perfect contract record?  Which is the
bigger penalty?  It seems like any negative reputation can easily be escaped:
blow away your keys and restart.  But having 20 peers who are willing to
trade substantial storage for long periods of time is really worth something,
and reinitialization (and the loss of contracts) has a significant cost.

>>  Disaster recovery can be done via more direct methods.
> 
>   I admit that the method you described for allowing a node to
> discover its trading peers through those nodes' verification ops is
> very reasonable.  But there is one other goal that drives flud design
> which advises against it: attack resistence.

I heartily agree.

>  I subscribe to the view
> that one of p2p's greatest architectural strengths is that a
> decentralized application, if successful, will spawn many
> implementations, written by many different parties (ala gnutella,
> bittorrent, eD2k, etc), and any one of these implementations, if
> acting rationally, could/should seek maximum benefit for their
> clients, even to the detriment of other nodes.  Such actions are
> sometimes called rational attacks
> [http://www.flud.org/wiki/RelatedPapers#A_Taxonomy_of_Rational_Attacks]

I did want to mention that while the Sybil attack observation is true
(centrally controlled IDs prevent an attacker from having an arbitrary number
of identities), keep in mind that IP address allocation (cost and scarcity)
and routing greatly help to minimize this problem.  So while a p2p backup
system doesn't need a central login server, if you don't trust the DHT much
(using it just for finding contracts and the like) it's easy to protect
yourself.  A first step would be to ensure that, regardless of peer IDs, none
of your contract peers share an IP address.  Even better would be that they
are on different subnets.  Even better would be that they do not share the
last few hops with any other peer.  Even better would be to verify that each
IP block is owned by a different entity.  So while a determined attacker
might be able to get a huge number of clients to join the DHT, you can still
pick peers with maximum diversity according to many externally verifiable
measures.
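A minimal sketch of the first couple of those steps, using only the subnet check (the function name and the /24 prefix choice are mine, purely illustrative; the richer checks like shared last hops or IP-block ownership would need traceroute/whois data):

```python
import ipaddress

def diverse_peers(candidates, max_peers, prefix=24):
    """Greedily pick peers so that no two share the same /prefix subnet.
    A crude stand-in for the richer diversity checks described above."""
    chosen, seen_subnets = [], set()
    for ip in candidates:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        if net not in seen_subnets:
            chosen.append(ip)
            seen_subnets.add(net)
        if len(chosen) == max_peers:
            break
    return chosen

peers = ["192.0.2.10", "192.0.2.200", "198.51.100.7", "203.0.113.5"]
# 192.0.2.10 and 192.0.2.200 share a /24, so only the first is kept.
print(diverse_peers(peers, max_peers=3))
```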

Maybe a PGP-like web of trust would work.  I bootstrap 3 peers myself, add
3 friends/colleagues, then configure my client to trust my contract peers'
opinions about anyone I don't know.  If you make my organization happy I
offer you better terms; if you misbehave, I inflict damage.  This would lead
to incestuous groups of peers, but I'm not sure that is a bad thing.  Say
100 peers all gossip about each other, offer each other great terms, and
don't connect with a single other peer.  Is the entire swarm diminished in
any way?  Are those 100 peers any more vulnerable?

> and the best defense against them is to provide sufficient
> punishments/rewards so that a rational player would never attempt
> them.  The goal is to arrive at something similar to a Nash
> equilibrium.

Agreed.

>   For a node which has experienced a complete and catastrophic loss,
> discovering its old trading peers via those peers' verification ops
> seems vulnerable to the rational attack where a peer notices that one
> of its trading peers has begun failing verify ops, infers that the
> target node must have lost data, and then, realizing that the target
> no longer has anything to offer it (since it has lost not only its own
> data but the data of its partners), decides to purge the data that was
> stored for it.  Best of all for the rational attacker, the target has
> no way to know how much data has been purged; it only knows that the
> attacker was storing at least some data (because it tried some verify
> ops). If the attacker does something smart like, say, only purging 90%
> of that node's contracts, it may well appear that the attacker is
> functioning perfectly and has not lost data.

Interesting, I hadn't thought of that attack.  It seems pretty risky.  Say
PeerA and PeerB have a contract for 10 days for 1GB, and PeerA stores 20GB
of erasure-coded data on 20 peers (including PeerB).  Then PeerA has a crash
and declares it to all contracted peers.  I'd expect that a (small) file
containing all contracts would already have been copied to all peers (it is,
of course, valuable data).  Additionally you need the block metadata stored
somewhere (DHT or not).  It seems that if PeerB deletes 90% of the
contracts, then given signed proof of those contracts it could easily end up
with a terrible reputation that can be externally verified, i.e. a signed
contract plus an inconsistent signed claim that said contract doesn't exist.

A more devious node might just claim a disk failure of its own as soon as a
peer fails.  Although that would not provide much advantage, because your
peers are just going to start storing immediately and burn more of your bandwidth.

>   Of course there are a number of countermeasures you can take that
> don't involve using a DHT, but can you think of one that is simpler
> and less brittle?  Or more resilient?

With each backup, copy a file containing all your contracts to all your
peers.  Disk failures happen and aren't grounds for a contract violation,
but lying about contracts is a serious violation and significantly decreases
your reputation.  It seems perilous for a node to lie about a contract.

I'm not sure I see a way for a rational client to get significantly better
value.  Maybe once it looks like a peer is doing a full restore, declare your
own disk failure.  Although that is going to result in the use of bandwidth
as the contract peer refills their contracted storage.

>  In flud, there is no way for a
> storing node to discover the sK for the file that a stored block
> belongs to, and thus no way for it to figure out which nodes in the
> DHT are storing the block metadata.  So even if it wanted to collude
> to purge that data as in the attack outlined above, it would not be
> able to figure out how to find the nodes that it needed to collude
> with.  If a node tries to purge data unilaterally, it will be
> discovered.
> 
>   This might sound overly paranoid, but that's the fun; starting out
> with the assumption that peers might be malicious or misbehave (by
> running software not written by the creator) is for me one of the
> hallmark differences between distributed systems and decentralized
> ones.

Indeed.  In my opinion, of all the systems to be paranoid about, backups are
#1.


>> Not like the metadata does you much good if your
>> peers reject your contracts/block stores.
> 
>   Block storage operations in flud occur before the block metadata is
> stored to the DHT.

Er, I didn't mean the order.  I just meant that regardless of the cool,
robust, elegant, efficient, uncensorable storage for metadata, it's all
just so many useless bits without the storage blocks.  If the contracted
peers all deny you access to the storage blocks, you are doomed.

I guess (at least so far) that there should be 2 alternate designs:
#1 all files and metadata are on contracted peers
#2 all files and metadata are stored in the DHT

It just doesn't seem useful to split metadata (all 3 kinds) from data.  A
pure data store is completely useless without metadata, and pure metadata is
useless without data.  Although I still have this haunting feeling that I'm
missing some advantage of metadata in the DHT helping convergent storage
somehow.

>   Sorry, I think I've confused things with my references to
> 'metadata'.  In flud, there are three types of metadata:
> 
> 1) filesystem metadata (filename, perms, ownership, timestamps, EAs,
> etc).  This is erasure coded and stored along with blocks in the block
> storage layer
> 
> 2) block metadata, which contains the names of blocks belonging to a
> file (sha256) and where they are stored (nodeIDs).  This is stored in
> the DHT.
> 
> 3) master metadata.  This is the list of filekeys (sK) to filenames,
> with backup time.  This is occasionally stored as a regular file, with
> a special DHT entry indicating its location.

Very helpful.
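To check my understanding, here is a rough sketch of those three kinds as Python structures.  The field names are my own guesses from your description, not flud's actual on-disk format:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FilesystemMetadata:     # (1) erasure-coded and stored with the blocks
    filename: str
    perms: int
    owner: str
    mtime: float

@dataclass
class BlockMetadata:          # (2) stored in the DHT, keyed by sK
    block_hashes: List[str]   # sha256 of each erasure-coded block
    node_ids: List[str]       # sha256(public_key) of the storing nodes

@dataclass
class MasterMetadata:         # (3) a regular file, located via a DHT entry
    filekeys: Dict[str, str]  # sK -> filename
    backup_time: float
```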

> 
>   It provides a performance benefit, both for discovering if a file is
> already stored to the system

Hmm, this might be the piece I'm missing.  So if Ubuntu Gutsy i386's
/bin/login has a sha256 of foo, I can say get_dht(file_sha:foo) and get back
a nodeID of a node that stores it for me at no cost to myself?  Do contracts
only apply to storage not already existing in the DHT?  How can I be assured
that data that is in the DHT today will be in the DHT tomorrow without a
contract?

>, but also for finding files without doing
> global queries to all nodes

You mean files that I haven't contracted with a node to store?

>, or even to all nodes with which a peer
> has entered into trading contracts (if such a list is available).
> But, you are right that since flud can operate correctly without it,
> it is redundant.

I'm all for decentralized, robust, attack-resistant, paranoid, etc.  But the
cost of the redundancy (in bandwidth, CPU, or storage) should be offset by a
benefit, and I'm unclear on what that benefit is.  Utilizing "free" storage
without a contract might be it.

>   I think this is a very reasonable approach, but I wonder what your
> thoughts are re: malicious/misbehaving nodes during attempted restore?

You shouldn't trust a peer to keep the only copy of your contract.  Just as
with paper, each party should keep a secure signed copy.  In this case the
obvious solution for disaster recovery is to store contracts with at least
several other peers (if not all peers).  A contract should basically be a
statement "X storage in exchange for Y storage for Z hours, starting at this
time", signed by the offerer and then signed by the offeree.
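A sketch of that sign-then-countersign structure.  HMAC with a per-party secret stands in here for real public-key signatures (which a production system would use instead); all keys and field names are made up for illustration:

```python
import hashlib
import hmac
import json

def sign(secret: bytes, payload: bytes) -> str:
    """HMAC-SHA256 as a stand-in for a real public-key signature."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

offerer_key, offeree_key = b"offerer-secret", b"offeree-secret"

contract = {
    "offerer_gb": 1, "offeree_gb": 1,   # X storage in exchange for Y storage
    "duration_hours": 240,              # Z hours
    "start": 1193900000,                # agreed start time
}
payload = json.dumps(contract, sort_keys=True).encode()

# The offerer signs first; the offeree countersigns payload + signature,
# binding both parties to the same terms.
sig1 = sign(offerer_key, payload)
sig2 = sign(offeree_key, payload + sig1.encode())

# Any peer holding a copy can later verify both signatures, making a
# "that contract doesn't exist" claim externally falsifiable.
assert hmac.compare_digest(sig1, sign(offerer_key, payload))
assert hmac.compare_digest(sig2, sign(offeree_key, payload + sig1.encode()))
print("contract verified")
```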

I guess some small additional protection would be to never challenge, but
always restore.  That way you exercise the same code paths, and slightly
reduce the information leaked about what you are doing, i.e. is it a
reputation check or an actual restore?  Especially since grabbing a single
file isn't unusual for a backup system.  Of course, if you ask for a file
every 0.01 seconds, it's kind of obvious after the first few hundred that
you may well be asking for all files.  But even then, assuming the client
can't get a copy of your mutually signed contract seems rather risky.  If
at that moment you claim you don't have X GB (to save yourself from
uploading), you are likely to get X GB of downloads in the very near future.
The opportunity to use the X GB in the meantime (before the peer starts
uploading to you) seems of little practical value.


>   You can't choose your nodeID in flud: it is generated as the sha256
> hash of your public key.  And there is a hashcash component that
> requires a modest amount of computation every time period (probably
> bi-monthly), to discourage pre-compute attacks.  So unless you have an
> extraordinary amount of computational resources to do a very large
> number of key-generation / hash / hash-collision-searches, targetting
> parts of the DHT ID space will be rather futile.

Dunno, I guess eventually.  Say said attacker has at least one new $3k
server with a 10-Mbit internet connection and 8 CPUs.  They do a few DHT
queries to map out the peers around the ID they are interested in, say an
MP3 of a popular song.  They pick 8 candidate keys at a time, sha256 them
into IDs, and see if they are closer than the existing peers.  With 1000
clients, is that going to take long?  Or are you assuming millions?  Not
that millions is completely unreasonable.  If 8.5M people are on Skype at
this very moment, presumably enjoying free phone calls... why not 1M for
free backups.
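A toy version of that search, assuming Kademlia's XOR metric for closeness and random keys standing in for generated keypairs (the target, network size, and attempt count are all made up for illustration):

```python
import hashlib
import os

def node_id(public_key: bytes) -> int:
    """flud-style nodeID: sha256 of the public key, as an integer."""
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia closeness: smaller XOR means closer in ID space."""
    return a ^ b

target = node_id(b"popular-song.mp3")
# 1000 honest peers at random positions in the ID space.
honest_ids = [node_id(os.urandom(32)) for _ in range(1000)]
closest_honest = min(xor_distance(target, i) for i in honest_ids)

# Brute force: how often does a fresh random key land closer to the
# target than the closest of those 1000 honest peers?
attempts, hits = 100_000, 0
for _ in range(attempts):
    if xor_distance(target, node_id(os.urandom(32))) < closest_honest:
        hits += 1
print(f"{hits} of {attempts} random keys beat the closest of 1000 peers")
```

With random IDs, roughly one try in a thousand beats the closest of 1000 peers, so a small network falls quickly; with millions of nodes the expected work scales up proportionally, and the hashcash requirement adds a recurring cost on top.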

> 
>> I think the only piece misssing is the cost/benefit to
>> putting the metadata again in the DHT.
> 
>   The DHT storage layer in flud ends up taking up only a small
> fraction of storage resources.

Hrm, some guesstimates.  So for the DHT there's a replication factor of 12
(currently)?

Each file has 20 blocks?

Block metadata =
   sha256 = 32 bytes
   nodeID = sha256(public_key) = 32 bytes
   seq = 1 byte (I'm block N of the 20 erasure-coded blocks)

Per file =
   filename = max_pathlen, say 512 chars?
   timestamp = 4 bytes?
   filekey = sha256(filename) = 32 bytes

So master metadata = 12 (replication) x 350k (files) x ( 512  + 4 + 32) = 
2194MB = 2GB

Block metadata = 350k (files) * 20 (erasure coded blocks per file) * (32+32+1) 
  * 12 (replication) = 5207MB
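Those two estimates are easy to re-check in a few lines (the numbers are straight from the guesses above):

```python
# Back-of-the-envelope DHT storage estimate, per the guesses above.
replication = 12
files = 350_000
blocks_per_file = 20

# Master metadata: filename (512) + timestamp (4) + filekey (32) per file.
master_bytes = replication * files * (512 + 4 + 32)

# Block metadata: sha256 (32) + nodeID (32) + sequence byte per block.
block_bytes = replication * files * blocks_per_file * (32 + 32 + 1)

MB = 1024 * 1024
print(f"master metadata: {master_bytes / MB:.0f} MB")
print(f"block metadata:  {block_bytes / MB:.0f} MB")
print(f"total:           {(master_bytes + block_bytes) / MB / 1024:.1f} GB")
```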

So if I'm an average user (350k files, 150GB disk), it looks like 7GB would
be stored in the DHT?  So some user with a 100MB homedir on ADSL might end
up storing many GB of DHT data?  It seems like that might reward someone to
write a flud-selfish client that used ONLY the DHT, never attempted a
contract, and stored EVERYTHING in the DHT under the guise of tracking
metadata for fictitious contracts.  I did find a p2p-hackers thread (I think
started by you) that mentioned various DHT protective measures; I'll track
it down and read it.

>  Performance doesn't seem to be a
> problem, especially given the ability to do DHT ops in parallel as per
> kademlia's design, and to overlap them with other operations to the
> block storage layer.  But you have absolutely convinced me that I need
> to do some serious performance evaluation of this before I can change
> that to a statement of fact.

Nothing like hard data, but at least the storage requirements should be
easy to calculate.

>   I'm no stranger to being wrong, and won't hesitate to drop the DHT

I can't imagine a DHT not being useful.  I don't really see any other
practical way to track all users, do peer introductions, bypass NAT/IP
masquerading, search for contract peers, find the IP addresses of peers,
track reputation, etc. without a central server (evil).

Metadata, on the other hand, is less obvious to me, but I'm seeing
additional signs in this message that some other things are going on with
the DHT, metadata, and storage provided outside of contracts.

> if it ends up being a problem and I can find another technique that
> can replace it.  That's the great thing about being able to call flud
> "experimental" software :)

Indeed, I'm quite excited that flud has started, and these discussions have
forced me to reevaluate all the ideas I've had in this area.  All the while,
the number of man-hours between my current design state and a working
distributed client keeps shrinking because of improved design.  Even with
those improvements, it seems like a better return on investment to help flud
as the opportunities arise.
