DHT Ramblings, Part Three

The last two posts mulled over a vaguely defined problem: a set of issues and unclear design requirements. They have something to do with preserving data, making sure that preservation is robust against various failure modes, and making this cheap (free? as in libre?), easy, automatic, and beneficial to private individuals with modest abilities and resources. So, basically, people like me. This is a continuation of that exhibition of yoga poses. These musings may seem simplistic, but that’s because I have no clear idea of what the problem is.

Let’s ponder the generic offering of DHT, in any of multiple forms: as IPFS, as BitTorrent, as GNUnet. Perhaps more distantly, in the form of Scuttlebutt. So, for starters, there’s a hash. Any “reasonably-sized” blob of data, a few bytes to a few megabytes, maybe even gigabytes, if you insist, can be hashed to produce a “universally unique ID” for it. It can even be done in a “cryptographically secure” fashion, so that it’s effectively impossible to alter the content and still get the same hash. Yes, SHA-1 is broken, mostly, kind of almost-ish, with hard work, fakery and deception which hopefully the victim won’t notice. (SHA-256, as far as anyone has publicly shown, is not.) It’s the exception that proves the rule: forging a hash is hard, err, impossible.

This is useful for “content addressable storage”: if you want a song called XYZZY, and if someone publishes an index saying “song XYZZY has a hash of ‘abc-200-more-hexadecimal-digits’”, and if someone else advertises that they have content that hashes down to that hash, and if they are willing to deliver it to you, then you have the ability to verify that the content you received really does have that hash. It may not be song XYZZY, but it does have that hash. That’s a lot of ifs.
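To make the verification step concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash, and the index, the song bytes and the function names are all made up for illustration; no real DHT client is being quoted here.

```python
import hashlib

def content_id(blob: bytes) -> str:
    """Return the SHA-256 hex digest of a blob, used here as its content address."""
    return hashlib.sha256(blob).hexdigest()

def verify(blob: bytes, expected_id: str) -> bool:
    """Check that the bytes we received really hash to the ID we asked for."""
    return content_id(blob) == expected_id

# Hypothetical published index: human-readable name -> hash.
song = b"pretend these bytes are song XYZZY"
index = {"song XYZZY": content_id(song)}

# What some stranger on the network hands back to us.
received_ok = song
received_tampered = song + b"!"

print(verify(received_ok, index["song XYZZY"]))
print(verify(received_tampered, index["song XYZZY"]))
```

Note what this does and does not prove: a passing check says the bytes match the hash in the index. Whether the index told the truth about what that hash is a song of remains one of the ifs.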

DHTs provide one interesting piece of technology in support of the above: they provide IP address routing tables for hashes. That is, you can ask any participant of a DHT scheme a question: “who has content associated with hash ‘abc’?” and the participant can reply either “I do”, or it can reply “I don’t, but the following list of IP addrs might”. And that’s it. That’s all there is.
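The question-and-referral exchange can be sketched as a toy, in-memory lookup. This is only an illustration under loud assumptions: the Node class, the addresses and the lookup function are invented here, and each node simply refers you to every peer it knows. Real DHTs (Kademlia-style ones, for example) choose referrals by how close a peer's ID is to the hash, which this sketch does not attempt.

```python
from collections import deque

class Node:
    """A toy DHT participant: an address, some stored hashes, some known peers."""
    def __init__(self, addr):
        self.addr = addr
        self.store = set()   # hashes of content this node holds
        self.peers = []      # other Node objects this node knows about

    def ask(self, h):
        """Answer 'who has hash h?' with either 'I do' or a list of referrals."""
        if h in self.store:
            return ("I do", [])
        return ("I don't", [p.addr for p in self.peers])

def lookup(network, start_addr, h, max_queries=20):
    """Follow referrals from start_addr until some node says 'I do', or give up."""
    queue = deque([start_addr])
    seen = {start_addr}
    queries = 0
    while queue and queries < max_queries:
        addr = queue.popleft()
        queries += 1
        answer, referrals = network[addr].ask(h)
        if answer == "I do":
            return addr
        for r in referrals:
            if r not in seen:   # never ask the same address twice
                seen.add(r)
                queue.append(r)
    return None

# Three hypothetical nodes in a chain; only the last one holds hash 'abc'.
a, b, c = Node("10.0.0.1"), Node("10.0.0.2"), Node("10.0.0.3")
a.peers = [b]
b.peers = [c]
c.store.add("abc")
network = {n.addr: n for n in (a, b, c)}

print(lookup(network, "10.0.0.1", "abc"))
```

The result of a lookup is just an address. Fetching the content, and checking that it hashes to what was asked for, is a separate step.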

What can you do with this? Well, build file-sharing networks. But this is old news: everyone knows this.

Ipfs.tech is forthright about this. They state: “Our peer-to-peer content delivery network is built around the innovation of content addressing: store, retrieve, and locate data based on the fingerprint of its actual content rather than its name or location.” That’s it. That’s all there is.

Hmmm.

Footnote: After more review, I’ve come to the conclusion that IPFS is a crypto-scam. Compare it to, say, the Wikipedia article on BitTorrent: it’s clear that BitTorrent is stable, mature, well-thought-out, debugged. Compare IPFS to Ceph, or, better yet, to searches for “Ceph vs. mdadm”, “Ceph vs. NFS” or “Ceph after power outage.” It’s clear that Ceph is stable, mature, well-thought-out, debugged, and, dare I say it, a reasonable candidate for a small office/home network, at least if you’re technically proficient. I’m still not clear how well Ceph responds to power outages, and to people randomly rebooting computers on a small network. But it’s … well, it’s plausible. Might actually solve some of the annoyances of small networks. We’ll see.
