[flud-devel] storage overhead estimates and measurements

Alen Peacock alenlpeacock at gmail.com
Mon Mar 6 19:47:23 PST 2006


I'm posting these notes, taken in December, regarding storage overhead.
This is a bit of a doozy for an initial post to the dev mailing list, but I
wanted to get this archived in a reasonable place:

--------------------------------------------------------------
Dec 4, 2005
Overhead per file = coding overhead + dht metadata overhead + master file
overhead

coding overhead = (k+m)/k * (original file size + pad)
dht metadata overhead = file metadata size * replication
file metadata size currently = 4581 bytes
master file overhead per file currently = 71 bytes

So, for the famous nrpy.pdf file, whose original size is 323038 bytes, we
get
file + pad = 323200
coding overhead = (20+20)/20 * 323200 = 646400
dht metadata overhead = 4581 * 5 = 22905

so total consumed flud storage for this file = 646400 + 22905 + 71 =
669376, for a total storage consumption of 2.07 times the original file
(not bad!).

For smaller files the picture gets less rosy.
For a simple Python script of 23 bytes, we get
file + pad = 240
coding overhead = 2 * 240 = 480

so total consumed flud storage for this file = 480 + 22905 + 67 =
23452, or 1019.65 times the original file.  This is very bad.  Very very
bad.
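
Here's a quick Python sketch of the arithmetic above (the constants are the
ones from these notes; flud_consumption is just an illustrative name, not
flud's actual API, and the padded size is passed in directly since the
padding rule isn't spelled out here):

# rough sketch of the per-file overhead estimate, not flud's actual code
K, M = 20, 20              # coding parameters: k data blocks + m parity blocks
FILE_METADATA = 4581       # dht metadata per file, bytes
REPLICATION = 5            # dht metadata replication factor

def flud_consumption(original_size, padded_size, master_overhead=71):
    coding = (K + M) * padded_size // K      # (k+m)/k * (file + pad)
    dht = FILE_METADATA * REPLICATION        # replicated dht metadata
    total = coding + dht + master_overhead
    return total, total / float(original_size)

print(flud_consumption(323038, 323200))      # nrpy.pdf:     669376, ~2.07x
print(flud_consumption(23, 240, 67))         # tiny script:  23452, ~1019.65x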

So, if we want to stop the pain at, oh, say 3x consumption, we have to
do something else with files smaller than some size.  For 3x, a rough
estimate of that size is (rough because we'll ignore pad):

3x = 2x + 22905 + 71
3x = 2x + 22976
x = 22976

(with padding, this original file size comes out to 3.01x experimentally).
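
The same estimate, written as a tiny Python helper (my generalization of the
algebra above, ignoring padding as before; the function name is just
illustrative):

def small_file_threshold(fixed_overhead=22905 + 71, coding_ratio=2.0,
                         target_ratio=3.0):
    # files smaller than this blow past the target consumption ratio
    return fixed_overhead / (target_ratio - coding_ratio)

print(small_file_threshold())   # ~22976 bytes, i.e. roughly 23K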

Which means that if we want to preserve the characteristic of 'no file
consumes more than 3x its original size,' we have to do something
special for files smaller than 23K.

Ideas:
 - Aggregate all files < 23K in a single directory, and treat as one file.
 - Don't code files < 23K.  Store X replicas instead.  In order for
this to have the same type of recoverability guarantee as larger
files, X would need to be more than 3, but certainly not more than k+m
(40), which is a huge improvement over 1000x.  Except that such a
scheme would require about the same amount of metadata, so we'd still
be at around 1000x.
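
As a rough sanity check on that second idea, here's a sketch comparing the
two recoverability guarantees (my own back-of-the-envelope model, assuming
independent node failures with probability p; not anything flud computes):

from math import comb

def loss_prob_coded(p, k=20, m=20):
    # a coded file is lost only if more than m of the k+m holders fail
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(m + 1, n + 1))

def loss_prob_replicated(p, x):
    # a plainly replicated file is lost only if all x holders fail
    return p**x

p = 0.1
print(loss_prob_coded(p))                                # vanishingly small
print([loss_prob_replicated(p, x) for x in (3, 5, 10)])  # 1e-3, 1e-5, 1e-10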

--------------------------------------------------------------
Dec 4, 2005
> Ideas:
>   - Aggregate all files < 23K in a single directory, and treat as one file.
>   - Don't code files < 23K.  Store X replicas instead.  In order for
> this to have the same type of recoverability guarantee as larger
> files, X would need to be more than 3, but certainly not more than k+m
> (40), which is a huge improvement over 1000x.  Except that such a
> scheme would require about the same amount of metadata, so we'd still
> be at around 1000x.

  - Use the same X hosts for replication of all small files.  This
way, only one part of the metadata needs to be large, and anytime a
small file is stored/retrieved, the same list of X hosts can be shared
(since the host list makes up the majority of the 23K dht metadata
overhead).  Each file would still need around 550 bytes of
file-specific metadata (path, ownership, timestamps, encryption key,
etc), but the rest of the 23K would be amortized over all small files.
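
To put rough numbers on that (my sketch; the 4581-byte metadata size and 5x
replication are from above, but the split between the ~550-byte
file-specific piece and the shared host list is my assumption):

SHARED_HOST_LIST = 4581 - 550   # assumed size of the host-list portion, bytes
PER_FILE_META = 550             # file-specific metadata, bytes
REPLICATION = 5

def metadata_per_small_file(num_small_files):
    # the shared host list is stored once (times replication); the
    # file-specific piece is paid for every small file
    shared = SHARED_HOST_LIST * REPLICATION
    per_file = PER_FILE_META * REPLICATION
    return shared / float(num_small_files) + per_file

for n in (1, 10, 100, 1000):
    print(n, int(metadata_per_small_file(n)))   # ~22905 down to ~2770 bytes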

--------------------------------------------------------------
Dec 5, 2005
> > Ideas:
> >   - Aggregate all files < 23K in a single directory, and treat as one file.

  After doing a preliminary study on my own home directory, this
option is out -- of all the dirs/subdirs in ~alen, the majority don't
have > 23K bytes even when all the small files in that directory are
combined together.  So this only helps in a fraction of the cases.

--------------------------------------------------------------
Dec 5, 2005
>   After doing a preliminary study on my own home directory, this
> option is out -- of all the dirs/subdirs in ~alen, the majority don't
> have > 23K bytes even when all the small files in that directory are
> combined together.  So this only helps in a fraction of the cases.

  As an aside, the total consumption of small files in my home dir is
just over 1MB.  I have a rather old, large dir structure, but I'd
guess this is not too far from typical.  Could assume concatenation of all
files to some granularity, but this has its own problems: any single
file update requires uploading the whole shebang; what is the proper
granularity?  etc.

--------------------------------------------------------------
Dec 6, 2005
>   As an aside, the total consumption of small files in my home dir is
> just over 1MB.  I have a rather old, large dir structure, but I'd
> guess this is somewhat typical.  Could assume concatenation of all
> files to some granularity, but this has its own problems: any single
> file update requires uploading the whole shebang.  what is the proper
> granularity?  etc.

 I need to correct this:  actual size of all small files combined is
36.44 MB.  If I assume a coding overhead of about 100 bytes and dht
metadata overhead of 22976 (as before), total consumed for the lot,
stored individually just like big files, is 137.76 MB, for an average
inefficiency of only 3.8x!  This does vary with the mix of small files.
For example, in my public_html/icentris directory, it is 12x.  In
/flud, it is 5.37x.  In /public_html, it is 5.37x.  In /.gconf it is
28.52x (73K turns into just over 2MB -- lots of directories with
single files; other similar directories like .gnome have similar
ratios).

I need to verify all this experimentally under the current setup, but it
does seem like the cost isn't prohibitive.  I suspect my files tend
toward the worst case, and an extra 2MB for all my data doesn't
seem too bad.

--------------------------------------------------------------
Dec 6, 2005
Experimentally verified: a directory with 6 small files has a du-reported
size of 136K.  After backing up, 400K total is consumed by dht metadata,
and around 1.95MB by the data, yielding 2.3MB consumed total, for an
inefficiency ratio of 17.29x.  This is using 'du' to measure -- adding
up filesizes as reported by ls gives a slightly better ratio (due to
ext3 min block size of 8k, I believe).
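
A quick way to see the difference (my own measuring script, not part of
flud): sum the apparent file sizes that ls reports vs. the blocks actually
allocated on disk, which is what du counts.

import os, sys

def sizes(root):
    apparent = allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512   # st_blocks is in 512-byte units
    return apparent, allocated

print(sizes(sys.argv[1]))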

Indeed.  All of the stored segments take up 8k, even though they are
smaller than that.  Taking a raw total of actual bytes consumed
results in only 315K.  Using the same measurement, the original
files took up 89.5K, for an inefficiency ratio of 3.52x!
Certainly acceptable.  One flaw: 5 of the original small files were on
the large end of the small file scale.  Only one was tiny (8 bytes).
I need to try this again on another dir with more smallish files.

(this was on ~/ABS/client/.deps, which looks like this:
-rw-rw-r--  1 alen alen 16809 Oct  7 14:36 CodedBlocks.Po
-rw-rw-r--  1 alen alen 18577 Oct  7 14:37 Coder.Po
-rw-rw-r--  1 alen alen 18519 Oct  7 14:38 Decoder.Po
-rw-rw-r--  1 alen alen     8 Oct  7 14:36 fileCoder.Po
-rw-rw-r--  1 alen alen 19155 Oct  8 12:24 fileDecoder.Po
-rw-rw-r--  1 alen alen 18611 Oct  7 14:38 testCoder.Po
)

> I need to try this again on another dir with more smallish files.

 Okay, here's the dir:

/home/alen/BitTorrent-4.0.4/images/:
total 92
-rw-r--r--  1 alen alen 7406 Aug 17 12:24 bittorrent.ico
-rw-r--r--  1 alen alen 1046 Aug 17 12:24 broken.png
-rw-r--r--  1 alen alen 1076 Aug 17 12:24 finished.png
-rw-r--r--  1 alen alen 1162 Aug 17 12:24 info.png
drwxr-xr-x  2 alen alen 4096 Dec  7 10:20 logo
-rw-r--r--  1 alen alen 1668 Aug 17 12:24 paused.png
-rw-r--r--  1 alen alen  450 Aug 17 12:24 pause.png
-rw-r--r--  1 alen alen  759 Aug 17 12:24 play.png
-rw-r--r--  1 alen alen 1650 Aug 17 12:24 queued.png
-rw-r--r--  1 alen alen  315 Aug 17 12:24 remove.png
-rw-r--r--  1 alen alen  799 Aug 17 12:24 running.png

/home/alen/BitTorrent-4.0.4/images/logo:
total 12
-rw-r--r--  1 alen alen 4326 Aug 17 12:24 bittorrent_32.png


'du' shows 28849 bytes original; manual addition of the 'ls' listings
shows 20657 bytes.

After the store, dht metadata consumes 251955 bytes (du and manual agree).
Coded blocks take up 45600 bytes.

Total is therefore 297555 bytes, for 14.4x inefficiency. (270.4K extra
was needed to store 20.17K).

This is much worse than the other experiment, but still within an
acceptable range for an initial release.  Paying 15x for small files
shouldn't be a problem, unless the user has lots & lots of small
files.

--------------------------------------------------------------
Dec 6, 2005
So, analyzing my home dir, which is made up of 6832 files and takes
up about 432MB: 4968 of those files are small (73%).  But the small
files only take up about a tenth of the total, at roughly 37MB.  Predicted
small file overhead is 5.06x, large file overhead 2.10x, for a total
combined inefficiency of 2.35x.  I need to verify this experimentally (but
these predicted numbers were calibrated against the above experiments, so
they should be fairly close).

--------------------------------------------------------------
Jan, 2006
The current implementation eases minimum block size requirements by
aggregating 'store' requests to the same host.  This is done, currently,
by tar'ing the files together on the client side, and leaving them tar'ed
once received on the server side.  This only kicks in for files whose size
is less than a certain threshold.  Retrieve and Verify ops on the server
side were modified to search into these tar files.  The tar files are
named after their destination nodeID on the client, and renamed for
the originating nodeID on the server.  (This wouldn't be necessary for
filesystems that handle small files more efficiently, but we've got to
plan for the lowest common denominator.)
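
A minimal sketch of the aggregation step (hypothetical names, not flud's
actual API; the 23K threshold is the break-even figure from above):

import io, os, tarfile

SMALL_FILE_THRESHOLD = 23 * 1024   # aggregate anything under ~23K

def bundle_small_stores(dest_node_id, paths):
    # tar together all pending small-file stores destined for one node;
    # the tarball is named after the destination nodeID, and the receiving
    # host renames it for the originating nodeID
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in paths:
            if os.path.getsize(path) < SMALL_FILE_THRESHOLD:
                tar.add(path, arcname=path.lstrip("/"))
    return "%s.tar" % dest_node_id, buf.getvalue()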