[flud-devel] opaque data and non-auditability
alenlpeacock at gmail.com
Thu Jun 7 20:56:43 PDT 2007
I've been really worrying about a potential problem regarding opaque
metadata stored in flŭd's DHT for the past week or so, and wanted to
write down an explanation of the problem and the solution that I'm
As you may know, one of flŭd's most important characteristics is that
it doesn't allow nodes to unfairly consume resources, or to cheat the
system so that they can store significantly more data than they
provide. This attribute is HUGELY important in a completely
decentralized system -- without it, the entire network can be hijacked
and abused, and if such abuse becomes widespread, the network becomes
useless for its intended purpose.
As explained elsewhere, flŭd's
metadata layer is implemented with a DHT, and all the data stored
therein is transparent and auditable (providing the aforementioned
important characteristic) *EXCEPT* for the encrypted
filesystem-specific metadata. This always bothered me a little bit,
because there is no way for anyone other than the owner of this
information to verify that the opaque portion is really constrained to
its intended purpose. The DHT is particularly vulnerable here,
because storage consumed by a node in the DHT is not debited against
that node. But it didn't seem like a huge problem; if we simply
limited the opaque portion by length, the attack vector becomes
narrower and harder for someone to abuse.
But now I realize that this isn't true. For one, an attacker could
store many, many bogus files and thus provide many records with
transparent bits that would verify and audit properly, but then use
the very small opaque field for nefarious purposes. Perhaps that's
not so bad -- such a user would have to pay for their abuse by
providing resources in proportion to those stored files. But what if
the attacker instead targeted already-stored files, using popular CAS
(SIS) files already stored by other users, simply appending opaque
fields to existing records? Such an attack would cost very little to
mount, and would be very hard to detect.
But even worse is the fact that since these records are content
addressable, a single user may have many files that map to the same
CAS key, meaning that we /have to/ allow a single node to have many
opaque fields for any single CAS key. A user who backs up many empty
or 1-byte files, for example, will have all of those files stored
under a common key, but each individual file metadata (filename,
ownership, etc) needs to be preserved. And that metadata is
sensitive; it must remain opaque.
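
To make the collision concrete, here is a minimal sketch. The record layout (`records`), the helper names, and the use of SHA-256 as the content hash are assumptions for illustration only, not flŭd's actual code or wire format. It shows two empty files from the same node landing under one CAS key while each keeps its own opaque metadata blob.

```python
import hashlib
from collections import defaultdict

def cas_key(content: bytes) -> str:
    """Content-addressed key: a hash of the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()

# Hypothetical DHT record layout: CAS key -> node id -> list of opaque blobs.
records = defaultdict(lambda: defaultdict(list))

def store_metadata(node_id: str, content: bytes, opaque: bytes) -> str:
    """Append one opaque (encrypted) metadata blob under the content's CAS key."""
    key = cas_key(content)
    records[key][node_id].append(opaque)
    return key

# Two distinct empty files owned by the same node share a single CAS key,
# but their filenames, ownership, etc. differ, so both blobs must be kept.
k1 = store_metadata("nodeA", b"", b"<encrypted metadata for file 1>")
k2 = store_metadata("nodeA", b"", b"<encrypted metadata for file 2>")
```

Because the DHT cannot tell these blobs apart from junk, any per-key cap on opaque fields would either break legitimate backups like this one or leave the door open to abuse.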
So, I've started warming up to the idea of moving the opaque data out
of the metadata layer and storing it alongside each block of the file.
This means that we will consume an extra ~400 bytes (more or less
depending on the metadata) for each file block that we store.
That's a hefty price for small files, but it seems reasonable for
larger files. I'll have to redo my 'storage expansion factor' figures
with a reasonable data set to see just how much of an impact this will have.
That is the big negative.
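
As a rough back-of-the-envelope sketch of that cost (not the real 'storage expansion factor' analysis): the function name and the block count of 40 are assumptions, the latter borrowed from the "block 5 of 40" example rather than from any confirmed coding parameter, and the ~400 byte figure is the estimate above. Overhead from the coding itself is ignored.

```python
def opaque_overhead_ratio(file_size: int, num_blocks: int, opaque_bytes: int = 400) -> float:
    """Extra bytes of per-block opaque metadata, as a fraction of the file size."""
    if file_size <= 0 or num_blocks <= 0:
        raise ValueError("file_size and num_blocks must be positive")
    return (num_blocks * opaque_bytes) / file_size

# Compare a small file with a large one, both split into 40 blocks (assumed).
small = opaque_overhead_ratio(1024, 40)               # 1 KiB file
large = opaque_overhead_ratio(10 * 1024 * 1024, 40)   # 10 MiB file
print(f"1 KiB file:  metadata is {small:.2f}x the file size")
print(f"10 MiB file: metadata is {large:.4%} of the file size")
```

Under these assumptions the metadata dwarfs a 1 KiB file but is noise for a 10 MiB one, which is the small-file/large-file split described above.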
The big positive is that we actually get some nice features out of the
system by doing this:
#1 - We close this attack vector; /all/ metadata-layer data becomes
transparent and verifiable
#2 - DHT storage resources are reduced (nice because we don't count
DHT consumption against a node)
#3 - We gain some extra recoverability. We'll be able to completely
recover files even if the DHT fails, simply by querying nodes for "all
the data that belongs to me." (This is something I've been meaning to
finalize anyway, and it means we can also store "I am block 5 of 40"
info along with the file block. i.e., this is the long-promised "all
file metadata will also be replicated outside of the DHT" feature
that allows us to recover all data even if we lose the DHT records and
the master metadata record).
#4 - We are already storing reference lists for file data blocks --
this isn't so hard to tack onto that data structure.
#5 - Accountability. Nodes that want to store lots and lots of
filesystem metadata, including extended attributes etc., can do so
easily and fairly. Consuming extra storage this way is directly
debited from the available storage a node can consume. Likewise,
nodes that are frugal with their opaque metadata will be rewarded with
#6 - This makes the opaque field much easier to expand for other
filesystems and operating systems in the future.
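
A minimal sketch of how #3, #4 and #5 might fit together on a storing node. Every name here (`StorageNode`, `store`, `usage`, `blocks_for`) is hypothetical, and the charging policy (each referencing owner is debited for the block data plus its own opaque metadata) is an assumption rather than flŭd's actual accounting rule.

```python
from dataclasses import dataclass, field

@dataclass
class StorageNode:
    # Block data is stored once per key (single-instance storage).
    data: dict = field(default_factory=dict)   # block key -> bytes
    # Reference list per block: (owner id, opaque metadata) pairs.
    refs: dict = field(default_factory=dict)   # block key -> list of tuples

    def store(self, key: str, owner: str, block: bytes, opaque: bytes) -> None:
        """Store a block (if new) and add the owner's opaque metadata to its reference list."""
        self.data.setdefault(key, block)
        self.refs.setdefault(key, []).append((owner, opaque))

    def usage(self, owner: str) -> int:
        """Bytes debited to an owner: block data plus that owner's opaque metadata (assumed policy)."""
        return sum(len(self.data[key]) + len(opaque)
                   for key, entries in self.refs.items()
                   for ref_owner, opaque in entries
                   if ref_owner == owner)

    def blocks_for(self, owner: str) -> list:
        """Answer "all the data that belongs to me" without consulting the DHT."""
        return [(key, self.data[key], opaque)
                for key, entries in self.refs.items()
                for ref_owner, opaque in entries
                if ref_owner == owner]

node = StorageNode()
node.store("k1", "alice", b"x" * 1000, b"<encrypted: block 5 of 40, ...>")
node.store("k1", "bob", b"x" * 1000, b"<encrypted: bob's metadata>")
print(node.usage("alice"), len(node.blocks_for("alice")))
```

The point of the sketch is only that verbose opaque metadata shows up directly in `usage`, and that the same reference list doubles as the recovery index.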
I think those benefits outweigh the costs, but will do an analysis
with numbers when I get the chance.