[flud-devel] filesystem analysis proposal

Bill Broadley bill at broadley.org
Wed Nov 14 20:30:20 PST 2007

Alen Peacock wrote:
> Bill, I think this is an excellent idea, and would be very useful to
> folks working on a variety of different projects.  It would certainly
> be invaluable to flud.  I'm guessing tahoe might be interested, too
> (although they might already have statistics gathered from allmydata
> users).  We could ping a couple of other mailing lists, too.  I can't
> imagine that anyone working on any sort of distributed file store
> would not be game for submitting some numbers.

Cool.  The only thing I'm not sure how to approach is measuring the size of
the rsync differences.  The ABS paper mentions (in 4.1.1) generating an rsync
basis file, which is a compact, hash-based representation of a file that
supports fast block-level comparisons between two versions of a file.  Is that
available via some flag (I browsed the rsync manpage without finding anything
obvious), or is that maybe a librsync function?
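
To make the question concrete, here is a rough Python sketch of what I
understand a basis file to contain: one weak checksum plus one strong hash per
fixed-size block.  It is not rsync's or librsync's actual format or API.  The
block size, the use of Adler-32 and MD5, and the function names signature() and
literal_bytes() are all assumptions for illustration, and the weak checksum is
recomputed at every offset instead of being rolled, which is slow but short.

```python
import hashlib
import zlib

BLOCK_SIZE = 2048  # assumed block size, not taken from rsync or the ABS paper


def signature(data, block_size=BLOCK_SIZE):
    """Build a basis-file-like table: weak checksum -> set of strong hashes."""
    sig = {}
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        weak = zlib.adler32(block)            # cheap filter
        strong = hashlib.md5(block).digest()  # confirms a weak match
        sig.setdefault(weak, set()).add(strong)
    return sig


def literal_bytes(sig, new_data, block_size=BLOCK_SIZE):
    """Count bytes of new_data not covered by any block in sig.

    This is only a rough stand-in for the size of an rsync difference:
    it ignores the cost of the copy instructions themselves.
    """
    pos = 0
    literal = 0
    while pos < len(new_data):
        window = new_data[pos:pos + block_size]
        strongs = sig.get(zlib.adler32(window))
        if strongs and hashlib.md5(window).digest() in strongs:
            pos += len(window)   # whole block already known to the receiver
        else:
            literal += 1         # this byte would have to be sent
            pos += 1
    return literal
```

The idea would be that a client keeps only signature(old_version) around and
later reports literal_bytes(sig, new_version) as the estimated change size, so
the old file contents never need to be stored or uploaded.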

> I'm not aware of any published studies since the 1999/2000
> Bolosky/Douceur papers, and I agree that things have likely changed in
> the last 8 years.
> Let me know what I can do to help you out with this.

Except for the basis file, I think I can handle it.  I have a host that
can act as the server to crunch the resulting checksums and summaries.
The rest of the code should be pretty similar to other tools I've written.

