[flud-devel] DHT justifications (was: DHT Performance? Design?)

Alen Peacock alenlpeacock at gmail.com
Sun Nov 4 00:20:37 PDT 2007

On Nov 2, 2007 5:03 AM, Bill Broadley <bill at broadley.org> wrote:
> Thoughts on peers being optionally generous?  Allowing non-contracted
> with downloads for blocks they stored?

  I lean heavily towards only allowing nodes that own content to
download it.  This prevents the system from devolving into a generic
filesharing application, where users can pass around sK to share
files.  It also prevents malicious parties from downloading content
that they don't have posession of in the first place.  In order to
keep things simple, I also lean towards symmetric trading
relationships (1byte for 1byte), but there is nothing that would
prevent a node from offering disproportionate trades, and that might
be something that a node with popular content would want to do.

> A few things I can think of with flud:
> * Tons of small contracts could be painful.  Say someone need a block you
>    has stored, it's 5MB, they offer a 5MB contract.  Now you want to store
>    6MB... now you ask for a new contract?  Seems like you could have
>    substantial storage overhead with tons of different size pieces (on average
>    1/20th of the erasure coded  filesize)

  True, a contract-per-file granularity is probably too fine-grained.
In flud, "contracts" are very temporary, and should be replaced by
Samsara-style unforgeable claims before expiring, i.e, contract
lifetime should usually be measured in hours or less.  I am also in
agreement that nodes should contract for more-than-current-file-size,
and that there should be some sort of [dynamic?] floor value for
contract/claim size.

> * Selfish peers will try to store the most popular blocks so they can get
>    tons of contracts for only the cost of storing a block once.  Not entirely
>    sure if this is a bad thing or a good thing.  I'm leaning towards bad.
>    Peers might only say accept contracts to store a file they think they
>    can make a substantial profit on.

  The storer chooses where to store data in flud, so selfish peers
can't elect to do so themselves, except through collusion.  Two other
things to keep in mind here: 1) storing popular content causes extra
bandwidth and IO costs to the storer, since all the requesting nodes
do verify ops constantly, and the frequency of restore increases, and
2) storing nodes can choose to store popular content elsewhere than
where it is already stored, so even if a node colludes to be the first
storer of some popular file, it doesn't mean that nodes won't learn to
avoid it as a storer later.  Some variant of #2 allows the system to
avoid hotspots in the storage layer.

> The 2 I can think of is limitations of files per second that can
> be queried from a DHT, and the ratio of metadata in the DHT vs file size.

  Mainly because we can pipeline things with lots of ops happening
asynchronously, I'm not too worried about the files-per-sec issue (but
again, I need to get some perf numbers).
  The ratio of meta/file seems quite acceptable, at least on synthetic
workloads I've been pushing through flud (I'll have to dig up some
numbers... thought I'd posted some to the list in late '05 / early
'06).  Of course, this falls apart a bit if you push pathological
cases, such as a user who backs up 1GB of data composed entirely of 1B

> I've made significant progress on understanding the flud DHT... although
> I was curious.  Say you find 10MB on a peer that you want to insure stays
> around.  You propose a contract with that peer.  Does that contract say:
> Any 10MB I want for 1 week in exchange for any 10MB you want?

  Yes.  Except that, as I mentioned above, a contract as long as 1
week in flud seems a bit long -- the contract only needs to exist for
as long as it takes a node to backup some data (if it has some ready)
or an unforgeable claim that can be replaced with valid data later (if
it does not yet have data ready).  Once data is backed up, the storing
node is expected to keep it around forever.

  Well, that's not exactly true... It will keep it around forever
where "forever" has some decay, according to frequency of
verify/store/retrieve operations.  A block that hasn't been 'touched'
in a very long time can be deleted with some probability.  This is
okay, because flud is not an archive service, it is a mirror of data
backed up from another computer, and that other computer has to
maintain the backup copy through these occasional 'pings' or expect it
to be eventually purged.

  There is also an explicit 'delete' op, some variant of which is used
to obtain another temporary contract when a node wishes to exchange a
claim for real data.

> Say 60 contracts to 60 peers.  I'm not sure how flud saves the peers (on
> average) disk space, I get how the DHT finds you a peer with the storage, but
> how does the contract reflect savings in the finding of a already stored file?

  Disk space is one scarce resource.  The other is bandwidth.
Convergent storage saves the requestor the latter, and the storer the
former.  Figuring out which is more scarce is difficult, because it
varies among users.  I would venture to say that today, for most
dsl/cable users, upload bandwidth might be even more constrained than
free disk space (certainly true in my own case).  If someone can save
the 2 days in upload time that it takes to store their mp3 collection,
that's an attractive deal.

  Convergent storage can also be useful for a user's unique content,
in the case where they are imaging a new machine but with all the
content copied from the old.  Or giving the backup software a second
try (maybe after an OS upgrade, etc.).  To tell you the truth. I think
this is actually where convergent storage comes in most handy, and
where users end up seeing most *actual* benefit.

> Or the much more suitable have a DHT for
> contracted peers.  Hrm, even uglier, since my peers can have contracts
> with peers I don't know there can't be a dht of only the peers I have
> contracts with.

  Occasionally, I get the crazy feeling that some sort of namespaced
DHT scheme could work for these types of cases.  And then I disgard
the idea because it seems unworkable.  I'm not aware of anyone
pursuing it, could be an interesting diversion...

> Hrm, maybe that's not so bad:
> [clip algo summary]
> Seems reasonable.  Basically all contracted peers are queried for the slices,
> any slices found are registered.  Any left over slices are stored on peers
> until they all fill up.  After that new peers are found.

  I really can't think of anything wrong with this approach at all
right now.  I'm eager to hear how it works in practice, problems you
encounter along the way, etc.

> Seems like with a large flud swam (say 100k peers)
> for slice in 2000 :
>      peer=dht_get(slice.sha256())

should be 'block_metadata=dht_get(slice.sha256())'

>      #wait approximately 5-10 hops each the latency of crossing 0.5 of the
>      #internet


>      if (peer) # someone has the block

should be 'if (block_metadata)' and if so, the node can decide to
store its blocks on the already present nodes, or store to others of
its own choosing.  if block_metadata doesn't exist, it can only do the
former, of course.

and then, before the next chunk of code, we use that list of nodes as 'peer'

>          if (peer.contract(peer)) # do we have a contract?
>              if (peer.contract.spaceleft(peer)>slice.size #is it big enough?
>                  register_slice(slice,peer)
>          else # no contract
>              if (request_contract(peer,slice.size) # can we have one
>                 if (peer.contract.spaceleft(peer)>slice.size) # we have one
>                    register_slice(slice,peer) #register
>                    slice=null

instead of checking for contracts, we are checking our own claim
accounting to see if we can trade in existing claims, or we are asking
for new trading relationships (short term contracts that will almost
immediately be turned into storage/claims).

>      else
>          peer = list_of_contracted_peers
>          while (slice not null and peer not null)
>             if (peer.contract.spaceleft(peer)>slice.size) # have contract
>                  if (store_slice(slice,peer)) # store it
>                      slice=null
>             peer=peer.next #no contract = try next peer
>          while (slice) # peer list exhausted, still have slice.
>              if (peer == null)
>                  peer=get_new_peer()
>              if (request_contract(peer,slice.size)
>                  if (store_slice(slice,peer)
>                      slice=null

the replacement for this block is handled by my last paragraph.

> How many bytes does a contract take?  A few integers, a time stamp, 2
> nodeIDs, and then a double signature (one per peer) with 2048 bit public keys?

contracts are soft state and short-lived.  claims take up the amount
of disk storage traded for, with a few extra bytes that contain the
owner's nodeID.

> Agreed, the missing pieces definitely had me thinking, why the hell would
> you do that.... but now I see at least some of the reasons.  Unfortunately now
> I have 2 different ideas without a clear winner.  Time for a long dog walk
> to ponder.

  My 2 cents: stick with your general design and tweak the details as
you go.  I really would like to see where you end up with this.  Of
course, if you do come to the conclusion that flud's design is the
best possible design in the universe EVAR (well, or "mostly right,
anyhow"), I'd much rather you pitch in and help out in speeding up its
progression than start from scratch ;)

  Speaking of starting from scratch, if we end up having at least some
components in common, that'd be really cool too...


More information about the flud-devel mailing list