The Symptomatic Filesystem (SFS)

The Symptomatic Filesystem is a new design for a large-scale distributed network file service. The name reflects the fact that the physical location of any given file in the service is unknown at all times, and that users only interact with symptoms of files, rendering the files themselves safe from both malicious attack and network congestion.

Introduction

Motivation

SFS is motivated by a number of scenarios, none of which modern network filesystems can adequately handle:

The Government of Evilland

The Government of Evilland is very evil, and Joe Average (an Evilland expatriate) wants the whole world to know it. Joe publishes a "web page" or similar resource outlining the evil things that the Government of Evilland does. The Government of Evilland promptly sets their Evil Internet Goon Squad on the task of finding and eliminating the network server serving this page. They do so by any number of known attack patterns -- none of them sophisticated. They attempt to flood the TCP stack, take down nearby routers, send harassing emails to the ISP, etc., until the page goes away.

The Slashdot Effect (a.k.a. "Flash Crowds")

Something amazingly cool is published on the internet, and within minutes news of its existence has multiplied through the geometry of email forwarding and IRC to literally millions of interested onlookers. Each of them loads the resource, even if just to see what all the fuss is about. The site the resource is published on is completely swamped, and falls over from the strain of trying to satisfy all the requests. Nobody gets to see it at all.

Poorly Planned Acts of Nature

You implement a security system which relies on the availability of public or private crypto keys. You encrypt everything, and as soon as you finish, the one disk which is holding the crypto keys crashes. You're sunk.

The Wandering Eccentric

A very smart person, known and loved by many a computer user, wanders the globe sleeping on various people's couches and taking leftovers from their fridges in exchange for little gems of advice which come from the twisted recesses of her genius mind. She is not affiliated with any institution or even a reliable ISP, since she has no permanent address for billing or anything. Nonetheless she is regularly in front of a terminal, wanting to check her mail, edit her files, have emacs start up just the right way, etc. She does not trust any one person to store her files, as they may suddenly become unavailable, so she makes do with floppy disks, rsync, and a variety of obscure FTP servers she knows nobody will look at.

For SFS to be successful, it must address these motivating cases in a satisfactory manner.

Rough Outline

In its idealized state, the SFS model is an array of exactly 2^128 files (figure 1), each of which may be as large or as small as desired. The files, which have unique locations in the array, also possess unique names. No two files can have the same name, but aside from that, names may be arbitrary binary strings of any sort. Files can only be accessed by name. There is no intrinsic browsing, indexing, or hierarchical directory structure implied on top of SFS -- that is left to a higher-level protocol.

...
6eb9 adb8 4e14 cdf2 a51f 325c 4730 2979    "Fried Chicken"
6eb9 adb8 4e14 cdf2 a51f 325c 4730 297A    "David and Goliath"
6eb9 adb8 4e14 cdf2 a51f 325c 4730 297B    "Linux Beer Steins"
...
Figure 1: the simplified SFS

The array is distributed amongst a set of server hosts. Each server holds some, but not all, of the files. A user process interacting with the servers need only find one server in the set to locate any file in the SFS array by name. There can be many distinct, disconnected SFS arrays (for instance, a single organization may want one internally), as SFS is self-organizing and does not depend on any external authority or administrative structure such as DNS, IP numbers, or ASNs. The mechanisms for accessing files by name and for distributing files between hosts are the remaining interesting details of this paper.


Accessing Files Based on their Name

Names and Numbers

As stated above, file names are free-form binary strings, as are their contents. Names are stored along with files at their locations in SFS (to uniquely determine files), and files can be protected by cryptographic features such as public keys, so that they can only be altered by clients possessing the proper credentials. The numerical location for a file is arrived at by applying a secure cryptographic hash function to the file name and keeping 128 bits of the output (SHA-1, for instance, produces a 160-bit digest, of which the first 128 bits would be kept). The result is a 128-bit integer, which is the file's index in the SFS. The likelihood of a hash collision is negligible; in any case, multiple files can be stored at the same location, given sufficient storage on the host, so long as they have different names.

If we let the function char *sha(char *c) be an implementation of this (truncated) hash, and consider the function's iterates starting from a given name nm, we get a well-defined infinite sequence of 128-bit integers (figure 2).

sha(nm) 6eb9adb84e14cdf2a51f325c47302979
sha(sha(nm)) ea1673a5cfac98bb31a116b104163db3
sha(sha(sha(nm))) 46f92615dc993376dd6db7ea354c7b49
sha( ... sha(nm) ... ) some other integer here
Figure 2: Iterated Hashing

We will call this sequence of integers the Hash Chain of nm. It is extremely unlikely that any two files published by the same individual will have the same hash chain, even if their names differ only by a single bit. Cryptographic hash algorithms are designed to come up with very different output given even slightly different input.
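
To make the construction concrete, here is a minimal sketch of hash-chain computation in C. It assumes OpenSSL's SHA1() is available (link with -lcrypto), keeps only the first 128 bits of each 160-bit digest as the SFS address, and hashes the raw 16-byte address for subsequent links; the helper names are illustrative rather than part of any existing implementation, and the printed values depend on exactly which representation is hashed, so they will not match figure 2 (which is itself only illustrative).

/*
 * Sketch: compute the first few links of a name's hash chain.
 * Assumes OpenSSL's SHA1(); each 160-bit digest is truncated to the
 * first 128 bits, which serve as the SFS address.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

#define SFS_ADDR_BYTES 16                      /* 128-bit SFS address */

/* Hash an arbitrary byte string and keep the first 128 bits. */
static void sfs_hash(const unsigned char *in, size_t len,
                     unsigned char out[SFS_ADDR_BYTES])
{
    unsigned char digest[SHA_DIGEST_LENGTH];   /* 20 bytes from SHA-1 */
    SHA1(in, len, digest);
    memcpy(out, digest, SFS_ADDR_BYTES);
}

int main(void)
{
    const char *nm = "Fried Chicken";          /* the file name */
    unsigned char addr[SFS_ADDR_BYTES];
    int link, i;

    /* First link of the chain: the file's home address, sha(nm). */
    sfs_hash((const unsigned char *) nm, strlen(nm), addr);

    /* Each further link is the hash of the previous link. */
    for (link = 0; link < 4; link++) {
        printf("link %d: ", link);
        for (i = 0; i < SFS_ADDR_BYTES; i++)
            printf("%02x", addr[i]);
        printf("\n");
        sfs_hash(addr, SFS_ADDR_BYTES, addr);
    }
    return 0;
}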

The purpose of computing a hash chain in the first place is that it gives not only the primary or "home" address of a file in SFS, but also a sequence of "backup" hosts which a client process may wish to query if the primary is unreachable. In fact, as we shall see, failure of any given server in the hash chain is of little consequence to the overall likelihood of retrieval.

Where are the files?

The motivating cases above are difficult to address primarily because any illegitimate user wishing to disrupt file service need only masquerade as a legitimate user long enough to locate the server with a file they dislike, and then commence an attack on that server (uniquely determined by IP address). SFS solves this scenario in an unorthodox way: rather than trying to resist attack, when a file server S even suspects it is under any form of attack, it immediately falls over, dead as a doornail. The only thing S needs to ensure is that its files have been safely mirrored elsewhere before it cuts off service. While this mirroring activity could in principle be driven by the attack itself, things are much simpler and more secure if all file servers simply follow an algorithmic rule of thumb: in order for S to even begin providing file service for a given file named N, S must first secure a backup location somewhere along the hash chain for N, and only once the backup acknowledges that it has a copy can S begin to service requests.

This simple rule, plus a little careful thought about the sequencing of communication, puts the attacker in a very difficult position. Before commencing any given attack (which takes at least some resources), they must learn of the file's location. But in order to learn of a file's location, they must receive a positive reply from some server in the hash chain, at which point they know the file will already have been backed up to some unknown position elsewhere in the chain. The only reliable form of attack involves hitting the entire hash chain, which (since it is theoretically infinite and sparsely distributed throughout the address space) becomes increasingly difficult as the size of the set of file servers grows. Thus, any attack (or failure, which is modeled as an attack) is always slightly too late to cause any damage to the overall system.
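
Here is a tiny sketch of this "backup before serve" rule, with the network exchange reduced to a simulation stub; secure_backup() is a hypothetical stand-in for asking a host further along the hash chain to hold a copy, not part of any real implementation.

/* Sketch: a file is only marked "serving" once a backup has acknowledged. */
#include <stdbool.h>
#include <stdio.h>

struct sfs_file {
    const char *name;    /* free-form file name */
    bool        serving; /* may requests for this file be answered? */
};

/* Simulation stand-in for securing a copy at some chain position and
 * waiting for that host's acknowledgement. */
static bool secure_backup(const struct sfs_file *f, int chain_position)
{
    printf("backing up \"%s\" at chain position %d\n", f->name, chain_position);
    return true;   /* a real server would return the remote host's answer */
}

/* A server begins answering requests for a file only once some backup
 * along the file's hash chain has acknowledged holding a copy. */
static void begin_service(struct sfs_file *f)
{
    int pos;
    for (pos = 1; !f->serving; pos++)
        if (secure_backup(f, pos))
            f->serving = true;
}

int main(void)
{
    struct sfs_file f = { "Fried Chicken", false };
    begin_service(&f);
    printf("now serving \"%s\"\n", f.name);
    return 0;
}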

Redundancy and Caching

In order to best capitalize on the concept of hash chains, the file retrieval protocol is closely connected to the chains themselves. A request for a file is sent to any host (chosen at random) in the hash chain, which forwards the request "up" the chain towards the first hash iterate, host by host. If any host in the chain is unavailable, it is skipped, and the message winds its way upwards. When the message reaches a host which is willing to satisfy the request, the response is bundled in another message and sent back down the chain, addressed to the member who initiated the request. That host may cache the response if the request was read-only and a special "loose security" flag is set on the file, but (like all servers) the caching host must ensure that it has a backup before it actually serves any further requests. It then becomes a "mirror" for the original file, which helps balance the load. Requests which enter the chain "below" the caching host will be served from its cache; requests which enter the chain "above" it will be served by the initial image.

If someone has attacked a server which was serving the requested file, the request will make it to the head of the hash chain (the lowest-numbered iterate which is still online), be rewritten as a failure message, and work its way back down the chain, passing through all the hosts in between. In this case, one of two things will happen:

  1. The failure message will have passed by a host on the chain which was a backup for the original file. This "backup host" now knows that someone or some act of nature has crashed the "primary host". It will then (sometime in the near future) quietly make its own backup somewhere else in the chain. It will do nothing else until it hears another request. It has now assumed primary service for the file.

  2. The failure message did not pass such a backup host. Nothing happens.

The client, upon receiving a failure message, simply tries again at some other random position in the hash chain. If it wants to be methodical in its search, the client can restrict its random search to servers "further down" the hash chain. In all likelihood, the file will re-appear some time in the future if the client continues this course of action.
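
Here is a rough sketch of the client's side of this retry behaviour. It reuses the truncated SHA-1 hashing from before, and a hypothetical sfs_query() stands in for routing a request into the chain; in this simulation it simply pretends the first two chain positions are unreachable, so the output demonstrates the "try again further down" rule rather than any real network behaviour.

/* Sketch: compute a hash chain and retry requests further down on failure. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/sha.h>

#define SFS_ADDR_BYTES 16
#define CHAIN_DEPTH    32          /* how far down the chain we will look */

/* Hash an arbitrary byte string and keep the first 128 bits. */
static void sfs_hash(const unsigned char *in, size_t len,
                     unsigned char out[SFS_ADDR_BYTES])
{
    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA1(in, len, digest);
    memcpy(out, digest, SFS_ADDR_BYTES);
}

/* Precompute the first CHAIN_DEPTH links of a name's hash chain. */
static void build_chain(const char *nm,
                        unsigned char chain[CHAIN_DEPTH][SFS_ADDR_BYTES])
{
    int i;
    sfs_hash((const unsigned char *) nm, strlen(nm), chain[0]);
    for (i = 1; i < CHAIN_DEPTH; i++)
        sfs_hash(chain[i - 1], SFS_ADDR_BYTES, chain[i]);
}

/* Stand-in for the real network request; pretends that chain positions
 * 0 and 1 are unreachable and everything further down succeeds. */
static int sfs_query(const unsigned char addr[SFS_ADDR_BYTES], int position)
{
    (void) addr;
    return position >= 2;
}

int main(void)
{
    unsigned char chain[CHAIN_DEPTH][SFS_ADDR_BYTES];
    int pos;

    build_chain("Fried Chicken", chain);

    /* Start at a random chain position; on failure, restrict the search
     * to positions further down the chain, as the text suggests. */
    pos = rand() % CHAIN_DEPTH;
    while (!sfs_query(chain[pos], pos)) {
        printf("position %d failed, trying further down\n", pos);
        pos = pos + 1 + rand() % (CHAIN_DEPTH - pos);
        if (pos >= CHAIN_DEPTH) {
            printf("gave up for now; retry later\n");
            return 1;
        }
    }
    printf("request satisfied at chain position %d\n", pos);
    return 0;
}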


Who Stores What

Navigation

Assume for a moment that the idealized SFS of figure 1 is already distributed in some uniformly random way amongst a large number of hosts on the internet. The numeric file addresses served by each host do not correspond in any predictable way to the host's IP address; thus there is clearly a problem of even finding the IP address of any host in a given hash chain. SFS solves this problem by ensuring that each host in the network has a special kind of map of the 128-bit SFS address space; a client need only find one host and make use of that host's map in order to find all other hosts in the network.

Since the address space of SFS is quite large, it is infeasible to store all 2^128 addresses in a flat file, or even to store a list of all active servers in a given network in a flat file. However, we would like SFS to be self-organizing and free of central failure points, so a tree-structured addressing system like IP or the DNS is not a viable option. What we choose instead is a model in which each server has a "home address" which is the center of the numerical range it is serving, and it maps out the space numerically close by; the map becomes more and more sparse and inaccurate as the numerical distance between the mapping host and the mapped host increases. The implementation is mathematically quite simple: each host stores an array with exactly 255 slots, indexed from -127 to 127 (figure 3). Each slot S holds the IP address of some host which serves an SFS address A whose distance from H satisfies ((int) log2(|A - H|)) == |S|, with the sign of S matching the sign of A - H, where H is the home address of the mapping host. We refer to this construct as a log map.

slot      host
 127      sfs.snark-hunter.com
 ...
   1      sfs.chicken-cow.net
   0      localhost
  -1      sfs.bazooka.com
 ...
-127      sfs.flimsy.edu
Figure 3: A Log Map

If all hosts in the SFS have log maps, then a client can perform a reasonably efficient binary search of the address space. The client simply asks any nearby server for a file with address A, and (assuming the server does not belong to the file's hash chain, which would be remarkably lucky) the server takes log2 of the distance from its home address H to A, truncates it to a signed integer I, jumps to slot I of its log map, and forwards the request to that host. Thus even in the worst case, where each file is on a separate server and there are 2^128 servers, it takes at most 128 network hops to find a server in the correct hash chain. In practice, the search should terminate much more rapidly.

For the purposes of navigation and numerical distribution, the SFS address space should be considered circular; that is, for any address A, we have that A + 2^128 == A - 2^128 == A.
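
The following sketch shows one way the log map and a single forwarding hop might look, assuming a compiler with the unsigned __int128 extension (GCC or Clang) for 128-bit addresses. The struct layout and helper names are assumptions made for illustration; the host names are taken from figure 3, and the circular signed-distance convention follows the paragraphs above.

/* Sketch: a log map, the slot computation, and one forwarding hop. */
#include <stdio.h>

typedef unsigned __int128 sfs_addr;     /* a 128-bit SFS address */

#define MAP_SLOTS 255                   /* slots -127 .. 127 */

struct log_map {
    sfs_addr    home;                   /* H, this host's home address */
    const char *host[MAP_SLOTS];        /* host[S + 127] covers slot S */
};

/* Truncated base-2 logarithm of a nonzero 128-bit value. */
static int ilog2_128(sfs_addr x)
{
    int n = -1;
    while (x) { x >>= 1; n++; }
    return n;
}

/* Signed slot index for an address A as seen from home address H.
 * The space is circular, so distances above 2^127 are treated as
 * negative offsets in the other direction. */
static int map_slot(sfs_addr home, sfs_addr a)
{
    sfs_addr d = a - home;              /* wraps modulo 2^128 */
    if (d == 0)
        return 0;
    if (d >> 127)                       /* "negative" half of the circle */
        return -ilog2_128(-d);
    return ilog2_128(d);
}

/* One forwarding hop: look up the host whose slot contains the target. */
static const char *next_hop(const struct log_map *m, sfs_addr target)
{
    return m->host[map_slot(m->home, target) + 127];
}

int main(void)
{
    struct log_map m = { 0 };
    int s;

    m.home = ((sfs_addr) 0x6eb9adb84e14cdf2ULL << 64) | 0xa51f325c47302979ULL;
    for (s = 0; s < MAP_SLOTS; s++)
        m.host[s] = "localhost";                  /* a lone server points at itself */
    m.host[127 + 127] = "sfs.snark-hunter.com";   /* slot  127 */
    m.host[1   + 127] = "sfs.chicken-cow.net";    /* slot    1 */
    m.host[-1  + 127] = "sfs.bazooka.com";        /* slot   -1 */

    printf("forward to %s\n", next_hop(&m, m.home + 3));   /* slot  1 */
    printf("forward to %s\n", next_hop(&m, m.home - 2));   /* slot -1 */
    return 0;
}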

Growth and Shrinkage

A network of servers (and the associated maps) is not a static set of data. Servers will join and leave the network. Failure will be common. Relocation will be common. These sorts of things have to be easily accommodated by the SFS protocol.

Growth

A set of SFS servers begins with a single server A, whose map has "home address" H and 255 entries, all pointing back to itself. It is the only server in the network. This server can be solicited by any other new server B which wants to join the network. A then inserts B's IP address (or DNS name) into slot 127 of its map. This effectively cuts the address space served by A in half. A then assigns B, as its "home address", an address chosen at random from the range covered by slot 127 (an offset of roughly 2^127 from H), and transmits its map (consisting of all-A entries) to B. B then inserts H into its own map at the natural (-127th) slot. This completes a "split" of the address space. When A next receives a solicitation, it will assign the new server an address at an offset of roughly -2^127 from H, then 2^126, -2^126, 2^125, and so on, until it has split off every entry in the map.

Each time it assigns an address to a new server, A transmits the entire 255-entry map to the new host. Since the "perspective" of the map changes between one host and another, many entries in the transmitted map will have the same truncated logarithmic offset when viewed from the new host's home address. Only the first entry in a given slot of the new host's map is kept; additional entries are discarded. Likewise, A will not assign more than one new address from a given slot. Once a server has assigned a non-local entry to a slot, the slot is consumed and not available for further splitting. If A receives further solicitations it will pass them on to hosts in its map. All servers follow the same protocol for splitting.

Since it is possible, as the space becomes more densely populated, that two different servers may attempt to hand out the same address, a server wishing to assign an address to a soliciting host must first query the address and confirm that no server currently considers it a "home address" or is serving a file with that exact numerical address. So long as the chosen number is "free" in this respect, it can be arbitrarily assigned.
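
The following is a sketch of one split, under the same unsigned __int128 assumption as the previous sketch. To keep the signed-offset arithmetic unambiguous it splits off slot 126 rather than the slot 127 the text begins with, rand() stands in for a proper random source, the collision check described above is reduced to a comment, and the strings stand in for IP addresses or DNS names; none of this is a definitive protocol, only an illustration of how a transmitted map is re-slotted from the new host's perspective.

/* Sketch: server A assigns new server B a home address and B adopts A's map. */
#include <stdio.h>
#include <stdlib.h>

typedef unsigned __int128 sfs_addr;      /* a 128-bit SFS address */

#define MAP_SLOTS 255                    /* slots -127 .. 127 */

struct log_map {
    sfs_addr    home;                    /* this host's home address */
    const char *host[MAP_SLOTS];         /* host[S + 127] covers slot S */
    sfs_addr    addr[MAP_SLOTS];         /* an address that host serves */
};

/* Truncated base-2 logarithm of a nonzero 128-bit value. */
static int ilog2_128(sfs_addr x)
{
    int n = -1;
    while (x) { x >>= 1; n++; }
    return n;
}

/* Signed slot index for address a as seen from home (circular space). */
static int map_slot(sfs_addr home, sfs_addr a)
{
    sfs_addr d = a - home;               /* wraps modulo 2^128 */
    if (d == 0) return 0;
    return (d >> 127) ? -ilog2_128(-d) : ilog2_128(d);
}

/* A assigns the soliciting host a home address inside the range of one of
 * its still-unsplit slots (here a positive slot k).  A real server would
 * first query the candidate address to confirm no host already claims it,
 * and would draw uniformly from the whole slot range; rand() gives only a
 * small jitter here. */
static sfs_addr assign_home(const struct log_map *a, int k)
{
    sfs_addr offset = ((sfs_addr) 1 << k)
                    + (sfs_addr) rand() % ((sfs_addr) 1 << k);
    return a->home + offset;             /* lands in slot k, as seen from A */
}

/* B rebuilds the transmitted map from its own perspective: entries are
 * re-slotted relative to B's home, the first entry per slot wins, and A
 * itself goes into its natural slot. */
static void adopt_map(struct log_map *b, const struct log_map *a)
{
    int s, t;
    for (s = 0; s < MAP_SLOTS; s++)
        b->host[s] = NULL;
    for (s = 0; s < MAP_SLOTS; s++) {
        t = map_slot(b->home, a->addr[s]) + 127;
        if (a->host[s] && !b->host[t]) {
            b->host[t] = a->host[s];
            b->addr[t] = a->addr[s];
        }
    }
    t = map_slot(b->home, a->home) + 127;
    b->host[t] = "A";
    b->addr[t] = a->home;
}

int main(void)
{
    struct log_map a = { 0 }, b = { 0 };
    int s;

    /* A starts alone: every slot points back at itself. */
    a.home = 12345;                      /* arbitrary home address */
    for (s = 0; s < MAP_SLOTS; s++) { a.host[s] = "A"; a.addr[s] = a.home; }

    /* B solicits; A splits off slot 126 and hands B a home address there. */
    b.home = assign_home(&a, 126);
    a.host[126 + 127] = "B";
    a.addr[126 + 127] = b.home;

    adopt_map(&b, &a);
    printf("B's home is at slot %d of A's map\n", map_slot(a.home, b.home));
    printf("A sits at slot %d of B's map\n", map_slot(b.home, a.home));
    return 0;
}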

Shrinkage

While temporary failures of individual hosts may be recovered from reasonably well by the redundancy inherent in the hash chains, it is possible that a host with a large quantity of the address space may simply drop off the network. A careful reading of the above splitting protocol shows, however, that such a server will eventually have all of its addresses re-assigned at random by its neighbours, as it will not be online to stake a claim on any such addresses. It may be desirable, in addition, to have a neighbour take over service for a failed server; this special type of solicitation is a simple variation on the original randomized solicitation, and allows for easy recovery of a large set of addresses, if space is short.


Summary

SFS is a novel design for a simplified, distributed filesystem with high reliability and security. It is not necessarily the fastest or most easily administered filesystem (from the point of view of conventional, centralized control of resources), but it may serve a useful role in many wide area applications such as publishing, file sharing, archival, and multi-user collaboration. I would like to see a sort of volunteer "distributed.net of filesystems" arise from people's spare hard disk space, for instance. Whether SFS scales down to the size of a single workgroup or cluster remains to be seen in implementation.

The design is mostly my own work (Graydon Hoare) but I received useful insight from Justin Wells and Laurie Harper, and some inspiration from work by Ian Goldberg. It stems from an earlier idea I'd hoped to implement for key management, which is filed away as an internet draft. Adam Back helpfully pointed me towards this proposal from Ted Anderson, who clearly has more of a clue than I do. I also discussed this sort of stuff with Ian Clarke in email for a while, but he seemed totally uninterested in making a filesystem (with things like the ability to update data, or even ensure it stays online if it's not popular); nonetheless it seems he's made enough people keen on "p2p" systems that it doesn't really matter which cryptosystem you use underneath anymore.

Implementation

No implementation of this filesystem currently exists. It is a concept only. As such, well, I don't know if it's meaningful to copyright something when I really mean to be patenting it, but I want it to be clear that at least this document is copyright (C) 1999 Graydon Hoare, and licensed to all parties under the terms of the GNU GPL v 2.0+. I don't know if that really holds any water in court, but if you implement this, patent it, and then try to sue me when I implement it, I will never ever come to your birthday party. You're also more than welcome to implement it under GPL.

I'm hoping to do an implementation of SFS if I get enough interest from others and a spare few months to put it together. If you are interested in implementing it (or perhaps contracting me to implement it), please contact me.