 When a system administrator is first asked to provide a reliable,
redundant means of protecting critical data on a server, RAID
is usually the first term that comes to mind.  In fact, RAID is just 
one part of an overall data availability architecture.  RAID,
and some of the complementary storage technologies, are reviewed
below.
RAID, short for Redundant Array of Inexpensive Disks, is a method whereby information is spread across several disks, using techniques such as disk striping (RAID level 0) and disk mirroring (RAID level 1) to achieve redundancy, lower latency and/or higher bandwidth for reading and/or writing, and recoverability from hard-disk crashes. More than six different types of RAID configurations have been defined. A brief introduction can be found in Mike Neuffer's What Is RAID? page.
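To make striping and mirroring concrete, here is a minimal sketch of how a logical block number could be mapped onto physical disks. It assumes a simplified layout in which chunks rotate round-robin across the disks; the function names and the chunk_blocks parameter are illustrative only, and this is not the actual on-disk layout used by any particular RAID implementation.

```python
def stripe_location(logical_block, num_disks, chunk_blocks=1):
    """RAID 0 sketch: map a logical block to (disk index, block on that disk)."""
    chunk = logical_block // chunk_blocks      # which chunk holds the block
    offset = logical_block % chunk_blocks      # position inside that chunk
    disk = chunk % num_disks                   # chunks rotate across the disks
    block_on_disk = (chunk // num_disks) * chunk_blocks + offset
    return disk, block_on_disk


def mirror_locations(logical_block, num_disks):
    """RAID 1 sketch: every disk holds a copy at the same block number."""
    return [(disk, logical_block) for disk in range(num_disks)]
```

With striping, consecutive chunks land on different disks, so large transfers can proceed in parallel but no block has a second copy. With mirroring, every block exists on every disk, which is where the redundancy comes from.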
Over time, bad blocks can accumulate, and, from personal experience, can do so as fast as one a day. Once a block is bad, data cannot be (reliably) read from it. Bad blocks are not uncommon: all brand new disk drives leave the factory with hundreds (if not thousands) of bad blocks on them. The hard drive electronics can detect a bad block, and automatically reassign in its place a new, good block from elsewhere on the disk. All subsequent accesses to that block by the operating system are automatically and transparently handled by the disk drive. This feature is both good and bad. As blocks slowly fail on the drive, they are automatically handled until, one day, the bad-block lookup table on the hard drive is full. At this point, bad blocks become painfully visible to the operating system: Linux will grind to a near halt, while spewing dma_intr: status=0x51 { DriveReady SeekComplete UnrecoverableError } messages.
Using RAID can mitigate the effect of bad blocks. A Linux md-based software RAID array can be forced to run a check/repair sequence by writing the appropriate command to /sys/block/mdX/md/sync_action (see RAID Administration commands, and also below, for details). During repairs, if a disk drive reports a read error, the RAID array will attempt to obtain a good copy of the data from another disk, and then write the good copy onto the failing disk. Assuming the disk has spare blocks for bad-block relocation, this should trigger the bad-block relocation mechanism of the disk. If the disk no longer has spare blocks, then syslog error messages should provide adequate warning that a hard drive needs to be replaced. In short, RAID can protect against bad blocks, provided that the disk drive firmware is correctly detecting and reporting bad blocks. For the case of general data corruption, discussed below, this need not be the case.
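As a rough sketch of the check/repair step described above, the following writes the command to sync_action and reads back the result. It assumes the md sysfs attributes sync_action and mismatch_cnt, an example array name such as md0, and root privileges; the helper names and the sysfs_root parameter are my own additions (the latter only so that the code can be exercised against a scratch directory).

```python
from pathlib import Path


def md_attr(array, name, sysfs_root="/sys/block"):
    """Path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs_root) / array / "md" / name


def start_scrub(array, action="check", sysfs_root="/sys/block"):
    """Ask md to start a 'check' or 'repair' pass over the array."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    md_attr(array, "sync_action", sysfs_root).write_text(action + "\n")


def scrub_state(array, sysfs_root="/sys/block"):
    """Return (current sync_action, mismatch_cnt) as reported by md."""
    action = md_attr(array, "sync_action", sysfs_root).read_text().strip()
    mismatches = int(md_attr(array, "mismatch_cnt", sysfs_root).read_text())
    return action, mismatches
```

A typical use would be to call start_scrub("md0") from a periodic job, then poll scrub_state("md0") until the action returns to idle and inspect the mismatch count.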
If some random "soft" error created in RAM is written to disk, it becomes enshrined and persistent, unless some step is taken to repair it. Random bit-flips on-disk are, by definition, persistent. As these random errors accumulate, they can render a system unusable, and permanently spoil data. Unfortunately, regular data backups do little to avoid this kind of corruption: most likely, one is backing up corrupted data.
Despite this being a common disk failure mode, there is very little (almost nothing) that can be done about it, at least on current Linux with current open source tools. RAID, even in theory, does not address this problem, nor does file system journaling. At this point, I am aware of only a small number of options:
For a three-disk RAID-1 system, in principle, one could be more clever, and let the multiple disks "vote" about the correct data. This is not done. In other words, RAID-1 will do "the right thing" only if the disk drive itself reports a read error. If the disk drive is silently returning bad data, it's luck of the draw as to whether the bad data will be propagated.
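The "voting" idea can be sketched in a few lines. To be clear, this is a hypothetical illustration of what such a scheme could look like, not something the md driver does; the function name and the choice to raise an error when there is no majority are assumptions of mine.

```python
from collections import Counter


def vote(copies):
    """Return (winning block, indices of the disks that disagree).

    `copies` holds the same logical block as read from each mirror.
    Raises ValueError when no strict majority exists.
    """
    counts = Counter(copies)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority among mirror copies")
    dissenters = [i for i, block in enumerate(copies) if block != winner]
    return winner, dissenters
```

With three mirrors, a single silently corrupted copy is outvoted and can be identified. With only two mirrors, a disagreement can be detected but not resolved, which is why the sketch refuses to pick a winner in that case.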
Woe is I! Over the last 15 years, I've retired over 25 hard drives with bad blocks, while managing a small stable of four or five servers and home computers. This works out to a failure rate of less than one every three years per machine, but, multiplied by the number of machines, this adds up. Most recently, I installed a brand new WDC SATA drive, only to discover weeks later that it was silently corrupting my data, and at a phenomenal rate: dozens of files a day. It took weeks of crazy system instability before I realized what was going on: the drive was really, really cheap! Defective from the factory! The silent part of the corruption was particularly disturbing: at no point did any Linux system component tell me quite what was really happening. Yet, this could have been avoided. Woe is I.
This lack of data error detection and data error correction options for Linux prompts the following wish-list:
Similarly, RAID-5 stores parity bits, but does not actually use them for data integrity checks. RAID-6 stores a second set of parity bits, but does not use these in an ECC-like fashion. A "simple" modification of RAID-6 could, in principle, store ECC bits, and then use these for recovering from a bad block.
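The limits of plain parity are easy to see in a small sketch. The code below assumes simple byte-wise XOR parity over equally sized blocks, as in the textbook description of RAID-5; the function names are illustrative and this is not the md driver's code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_ok(data_blocks, parity):
    """True when the stored parity matches the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

Note what each function needs to know. rebuild works only when the caller already knows which block is missing, for example because the drive reported a read error. parity_ok can reveal that a stripe is inconsistent, but it cannot say which block is the bad one, which is why parity alone does not act as an error-correcting code.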
Currently, all file system integrity tools, such as tripwire, AIDE or FCheck, are aimed at intrusion detection, and not at data decay. This means that all file changes are assumed to be malicious, even if they were initiated by the user. This makes them impractical for day-to-day operations on a normal file system, where user-driven actions cause many files to be added, modified and removed regularly. They are also inappropriate for triggering bad block replacement mechanisms, since unchanged files are never physically moved about the disk. (In most drives, physically writing a disk block will trigger the bad block replacement algorithms in the disk firmware; simply reading a block will not.)
A core assumption of such a file-system integrity checker is that on-disk data corruption is far more frequent than data corruption due to spontaneous bit flips in RAM or other system components. If corruption in other system components were common, then the likelihood of false positives would increase: that is, good on-disk data would be mis-identified as bad.
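A decay-oriented checker might look something like the sketch below. It rests on a heuristic that is purely my assumption: a file whose contents changed while its modification time and size stayed the same was probably not changed by a user. The function names are hypothetical, and no existing tool is claimed to work this way.

```python
import hashlib
import os


def snapshot(paths):
    """Record (sha256, mtime_ns, size) for each file."""
    result = {}
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        info = os.stat(path)
        result[path] = (digest, info.st_mtime_ns, info.st_size)
    return result


def suspected_decay(old, new):
    """Files whose content changed although mtime and size did not."""
    suspects = []
    for path, (digest, mtime, size) in old.items():
        if path not in new:
            continue  # removed, presumably by a user; not treated as decay
        new_digest, new_mtime, new_size = new[path]
        if new_digest != digest and (new_mtime, new_size) == (mtime, size):
            suspects.append(path)
    return sorted(suspects)
```

Files that a user added, edited or removed between snapshots are ignored, so ordinary day-to-day activity does not raise alarms. A real tool would also need to rewrite suspect blocks from a good copy, which this sketch does not attempt.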
See also:
The basic Linux Software RAID implementation is provided by the md (multi-disk) driver, which has been around since the late 1990s. Features of the md driver include:
Because these boxes appear as a single drive to the host operating system, yet are composed of multiple SCSI disks, they are sometimes known as SCSI-to-SCSI boxes. Outboard boxes are usually the most reliable RAID solutions, although they are usually the most expensive (e.g. some of the cheaper offerings from IBM are in the twenty-thousand dollar ballpark). The high end of this technology is frequently called 'SAN' for 'Storage Area Network', and features cable lengths that stretch to kilometers, and the ability for a large number of host CPUs to access one array.
Both SCSI-to-SCSI and EIDE-to-EIDE converters are available. Because these are converters, they appear as ordinary hard drives to the operating system, and do not require any special drivers. Most such converters seem to support only RAID 0 (striping) and 1 (mirroring), apparently due to size and cabling restrictions.
The principal advantages of inboard converters are price, reliability, ease-of-use, and in some cases, performance. Disadvantages are usually the lack of RAID-5 support, lack of hot-plug capabilities, and the lack of dual-ended operation.
If the RAID disk controller has a modern, high-speed DSP/controller on board, and a sufficient amount of cache memory, it can outperform software RAID, especially on a heavily loaded system. However, using an old controller on a modern, fast 2-way or 4-way SMP machine may easily prove to be a performance bottleneck as compared to a pure software-RAID solution. Some of the performance figures below provide additional insight into this claim.
There are a number of journaled file systems available for Linux. These include:
These different systems have different performance profiles and differ significantly in features and functions. There are many articles on the web which compare these. Note that some of these articles may be out-of-date with respect to features, performance or reputed bugs.
The benefit of LVM is that you can add and remove hard drives, and move data from one hard drive to another without disrupting the system or other users. Thus, LVM is ideal for administering servers to which disks are constantly being added, removed or simply moved around to accommodate new users, new applications or just provide more space for the data. If you have only one or two disks, the effort to learn LVM may outweigh any administrative benefits that you gain.
Linux LVM and Linux Software RAID can be used together, although neither layer knows about the other, and some of the advantages of LVM seem to be lost as a result. The usual way of using RAID with LVM is as follows:
Another serious drawback of this RAID+LVM combo is that neither Linux Software RAID (MD) nor LVM has any sort of bad-block replacement mechanism. If (or rather, when) disks start manifesting bad blocks, one is up a creek without a paddle.
The package contains low level utilities including sgdskfl to load disk firmware, sgmode to get and set mode pages, sgdefects to read defect lists, and sgdiag to perform format and other test functions.
This web page is remarkable because it also provides a nice cross-reference to other diagnostic and monitoring tools.
Also, the RAID Reviews page contains some product reviews and performance benchmarks, circa 1998, that were originally a part of this web page. It is obsolete and unmaintained.