[flud-devel] DHT justifications (was: DHT Performance? Design?)

Bill Broadley bill at broadley.org
Wed Nov 14 00:16:00 PST 2007


Alen Peacock wrote:
> http://www.flud.org/wiki/Architecture#Versioning

Ah, reading that I noticed a reference to ABS (which I just read a fair bit
of), thanks.  IMO, even without clever delta compression techniques, providing
what looks like a large number of snapshots is fairly cheap.

As for your advantages of the mirror scheme:
#1 simplicity - can't argue there, though the gain seems pretty small compared
          to all of flud.
#2 preservation of resource trade symmetry - I don't get this one; it doesn't
          seem practical to attack, and if the attack succeeds your reputation
          is going to suffer.  If I send you 1 million encrypted blocks and
          challenge 100 a day, which ones do you delete?  Even with perfect
          analysis you are unlikely to score high enough to deny me old
          versions without damaging your reputation (and making me pick a new
          peer); see the rough numbers after this list.
#3 storage consumption - certainly I'd want a peer to decide how many (if
          any) versions it wants to keep.
#4 decoupling - agreed, although I'm not sure whether that should be different
          layers in flud or different layers outside of flud.

I guess it's a question of scope.  If you edited your bookmarks, source code,
documents, or email last night and flud ran at midnight, and then today you
realized that a user error, OS error, application error, disk error, or
something else led to a corrupted file being backed up, I'd be very unhappy to
hear that I couldn't get my work back and that I should have used a version
control system for my bookmarks, source code, documents, email, or whatever.

All backup systems I know of (I'm most familiar with backuppc and amanda)
allow recovery from this kind of scenario.  The most common mirror I know of
is RAID, which usually says right at the top of its documentation "THIS IS NOT
A REPLACEMENT FOR BACKUPS".  Granted, flud is much more robust than a disk
RAID (because of the coding and the geographic diversity), but still many of
the failure modes are the same (like user error).

Seems like 99% of what is needed for versioning is already present: a record
of the metadata and the file<->block mapping.  Mostly it's a matter of
changing:
  #1 /home/user/src/project/foo.c has been changed or created since last backup
  #2 encrypt and erasure code foo.c
  #3 distribute to peers
  #4 if (file at path changed)   # i.e. not a new file at that path
  #5      delete old /home/user/src/project/foo.c

To:
  #1 /home/user/src/project/foo.c has changed
  #2 encrypt and erasure code foo.c
  #3 distribute to peers
  #4 x = number of already backed up versions of path (/home/user/src/project/foo.c)
  #5 if (x > policy.desired_versions)
         delete oldest /home/user/src/project/foo.c
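A minimal self-contained sketch of that second flow, assuming a per-path list
of version records; the names here (metadata, DESIRED_VERSIONS,
record_version) are my own illustration, not flud's actual API:

  # Sketch of steps #4/#5 above: keep the last N versions of each path and
  # expire the oldest.  'metadata' stands in for flud's real metadata store.
  from collections import defaultdict

  DESIRED_VERSIONS = 7                  # made-up policy value
  metadata = defaultdict(list)          # path -> list of block-id lists

  def record_version(path, block_ids):
      """Record a newly backed-up version; return block ids that expired."""
      versions = metadata[path]
      versions.append(block_ids)
      expired = []
      while len(versions) > DESIRED_VERSIONS:
          expired.extend(versions.pop(0))   # oldest version falls off
      return expired                        # hand these to garbage collection

The peers holding the expired blocks don't need to be told right away; as with
the ZFS-style approach below, the blocks can simply be reclaimed by garbage
collection later.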

BTW, ZFS is designed to be copy on write, and they make a case that
maintaining versions is actually easier than not.  It's much like convergent
storage, actually (since you never update a block in place, the same contents
always have the same checksum).  It doesn't seem that great an idea to me for
a general-purpose filesystem, since every file update changes a block and the
change then propagates up the tree to /.  Not something I want to do with
every write, chmod, or close.  It does seem pretty attractive for backups,
though, since I'd imagine most people would run the backup once a day,
distribute blocks to peers, then update the metadata and send those updates
to peers.  Then leave it to garbage collection to free up the blocks for
files that are too old and/or have too many replications.
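
To make the convergent-storage analogy concrete, here is a tiny illustration
(my own sketch, not ZFS or flud code): blocks are keyed by their content
checksum and written exactly once, so identical contents are stored once, and
freeing old versions is just dropping references and letting unreferenced
blocks be collected.

  # Illustration of write-once, content-addressed blocks in the convergent-
  # storage style described above.  Not ZFS's or flud's actual format.
  import hashlib

  blocks = {}        # checksum -> block contents (never updated in place)
  refcount = {}      # checksum -> number of file versions referencing it

  def put_block(data):
      key = hashlib.sha256(data).hexdigest()
      if key not in blocks:              # same contents, same checksum:
          blocks[key] = data             # stored exactly once
      refcount[key] = refcount.get(key, 0) + 1
      return key

  def release_block(key):
      refcount[key] -= 1
      if refcount[key] == 0:             # nothing references it any more:
          del blocks[key]                # garbage collection reclaims it
          del refcount[key]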

Worth a quick view, in particular pages 7 and 8:
http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf



