GNU/Linux Pet Peeves

or

How to Hate Your Computer as Much as I Do

by Linas Vepstas <linas@linas.org>

After years of administering a variety of GNU/Linux systems, both server and desktop, I decided I've had enough. That's it. I hate having to deal with all the bullcrap! Below follows a list of my pet peeves, all of the things that are wrong with Linux and really need to be fixed. I am writing them down here because there's one thing I hate even more: arguing with idiots on mailing lists who don't even understand the problem you are trying to report, before they start telling you that it's not a problem. Arghhhhh! So I'm just gonna blow off some steam here, and hope that maybe someday these things will get fixed. And before you tell me that it's time I tried running Windows Server 2003, go read the last section, about why PC's suck. GNU/Linux, warts and all, is still better than Windows.

The /var Tirade

Either the FHS/LSB doesn't mandate a place where archive files should be kept, or package owners don't follow it. For example, mailman (at least on Debian) puts mailing list archives into /var/lib/mailman/archive, list configuration files into /var/lib/mailman/lists, and assorted crap in other subdirectories. Who knows where in the world it puts subscriber lists. I want to back up the archives, and the conf files, and the subscriber lists. I don't want to back up temporary queues. Mixing together temporary, transient files with permanent archives in the /var directory makes it hard to divine the correct backup strategy. This makes my backup scripts insanely complicated, error-prone, and of course ... I didn't realize they were broken until I needed the data. Grrr..

You may ask, "Why didn't you just save yourself a headache and back up the temp data, too?" Answer: Because my backup strategy is to never delete files. If I start backing up temp files, I would soon have a huge collection of crap files named "/var/spool/whizbang/queue/Bxxjzckue.218" that are totally and completely un-needed and unwanted and just chewing up backup space. The backup would grow without bound. The rational solution to this is to have mailman clearly distinguish temp working files, which need to go into /var/tmp or /var/spool or some other directory that is not backed up. But archives and subscriber lists need to go into directories which are backed up. To a (much) lesser degree, postfix is guilty of the same bad behavior.
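To make the complaint concrete, here is a minimal Python sketch of the kind of include/exclude logic my backup scripts end up needing. It is only an illustration: the function names (is_under, wanted, files_to_back_up) are made up for this example, and the KEEP and SKIP lists contain just the paths mentioned above, not a complete or authoritative layout for mailman or anything else. Wherever the subscriber lists live would have to be added to KEEP by hand.

```python
import os

# Assumed layout, taken from the examples in the text; adjust for your system.
# The subscriber lists are deliberately absent: their location is unknown here.
KEEP = ["/var/lib/mailman/archive", "/var/lib/mailman/lists"]
SKIP = ["/var/spool", "/var/tmp", "/var/lock"]


def is_under(path, root):
    """True if path is root itself or lies somewhere beneath it."""
    path = os.path.normpath(path)
    root = os.path.normpath(root)
    return path == root or path.startswith(root.rstrip(os.sep) + os.sep)


def wanted(path, keep=KEEP, skip=SKIP):
    """Decide whether one file belongs in the backup."""
    if any(is_under(path, s) for s in skip):
        return False
    return any(is_under(path, k) for k in keep)


def files_to_back_up(top, keep=KEEP, skip=SKIP):
    """Walk top and yield every file that passes wanted()."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune transient directories so we never descend into them.
        dirnames[:] = [
            d for d in dirnames
            if not any(is_under(os.path.join(dirpath, d), s) for s in skip)
        ]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if wanted(full, keep, skip):
                yield full
```

Calling files_to_back_up("/var") would yield only files beneath the KEEP directories. Note what this implies: every package that scatters permanent data around /var means another hand-maintained entry in a list like this, and a forgotten entry is exactly the kind of breakage you only discover when you need the data.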

Ideally, the /var filesystem shouldn't contain both temporary, transient files (such as /var/lock) and permanent, archivable files (such as /var/www). It's stupid to mix transient and permanent files in the same file system, for several reasons. Besides the backup problem, there is also the disk-reliability and disk-performance problem. If a file system is getting a lot of activity, it is more likely to experience hard-drive failure. I want to isolate that activity so that when the hard-drive failure occurs, it only takes down my spool area, and not my precious permanent data. I want my transient spool area on a different partition, or even better, a different disk than /etc, /home, /usr. When the hard drive head is seeking to the next temporary file, I want it to scrub over a very limited area of the hard drive, where the other temporary files are stored. I don't want it flying back and forth over my permanent data. It's like putting a precious Ming vase on a high shelf in a small shack at the end of an airport runway. Sure, airplanes almost never crash. But living in the flightpath is still a bad idea. Secondly, if my server needs to do a lot of spooling, I want to put that file system on something high-performance, maybe a striped array, rather than a slow-but-safe RAID-5 array. Mixing up transient and permanent files on /var makes it hard to draw this distinction. I shouldn't have to be a SuperHero SysOp to get it right. It should be possible for any WindowsLamer to get a reasonable Unix server config, right out of the box, that wouldn't make a pro Unix admin gag.
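A quick way to see whether a box has this problem is to compare the device IDs that the kernel reports for each directory. The sketch below is a rough check, not a diagnosis: the helper names (same_filesystem, shared_with_spool) are hypothetical, and a matching st_dev only tells you that two paths sit on the same mounted file system. It says nothing about whether two different file systems share one physical disk, which is the other half of the worry.

```python
import os


def same_filesystem(a, b):
    """True if both paths live on the same mounted file system."""
    return os.stat(a).st_dev == os.stat(b).st_dev


def shared_with_spool(spool, precious):
    """Return the precious paths that share a file system with spool.

    Paths that do not exist on this machine are silently skipped.
    """
    return [
        p for p in precious
        if os.path.exists(p) and same_filesystem(spool, p)
    ]
```

For example, shared_with_spool("/var/spool", ["/etc", "/home", "/usr", "/var/www"]) lists whichever of those directories would go down together with the spool area if that one file system were lost. On a default single-partition install, expect to see all of them.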

Right now, the parts of /var that must be backed up are interspersed, like swiss cheese, with other bits that must not be backed up. In the olden days of Unix, one never-ever backed up /var: by definition, all files in it were temporary, perishable. /var was like /tmp, but one step above it: it consisted entirely of transient files, and unlike /tmp, it just wasn't world writable. And then some damn fool decided that it was a good idea to put permanent files into /var. So now we have total idiocies like /var/www and /var/lib. That is just plain wrong. And packages like mailman, which install themselves into /var. Which is just plain insane. I dunno. The same FHS that explains that /var is for transient data speaks in the same breath of putting non-transient data into /var/lib. Conclusion: FHS is psychotic? Arghhhh! (April 2003)

The IDE/fsck/RAID Tirade

The current Linux storage subsystem works just fine if you have very reliable hardware, and never make any changes. But woe be to the sysadmin who needs to enlarge, change, modify or repair. In that case, it reveals itself to be fragile, undependable and unreliable. Storage subsystem maintenance in Linux is a nightmare. Below follows a list of problems that I've experienced, and some ideas on steps that could be taken.

Storage Management

The current relationship between the mount command, the storage subsystem, RAID, and LILO is an unholy mess.

Wintel PC's Suck

Yes, they do. When you are fighting down in the trenches, PC's just aren't worth the crap they're made out of. Below are a few reasons why. Disclaimer: I am currently employed by a manufacturer of very high-end, very expensive, non-wintel computer hardware. Many PC users simply do not understand why these 'other' kinds of computers cost a factor of ten more than a PC does, for just about the same MIPS and FLOPS, speeds and feeds, RAM and disk.

I worked with a fool who thought it would be cool to build a terabyte file server out of $5K of commodity PC parts. Yeah, right, nothing worked. Three out of ten hard drives arrived DOA. One hot-plug drive tray was DOA, and looked like it was used, returned as faulty by a previous customer, and shipped again as new. The IDE cables were too short. The on-board Ethernet controller hung after transferring about 1GB of data. There were PCI bus errors until I re-flashed a newer BIOS onto the main-board. The driver for the IDE subsystem was proprietary, not open source, and installed itself in a weird place, which was later blown away during a subsequent install. Piece of honking, blowing junk. There is a reason that IBM/Dell/Compaq PC Servers cost $50K and not $5K. That's because they actually work, out of the box. There's a reason that SGI/Sun/IBM/HP Unix servers cost $500K and not $50K. It's because they are actually reliable, and deal with faults and failures in a predictable, recoverable way. Oh, and you actually get service.

I think the litany below might help PC owners to understand the high cost of servers from SGI, Sun, HP and IBM. For example, did you know that the gate oxide on the IBM Power4 CPU is four times thicker than that on the Intel Pentium/Xeon/Celeron, etc. CPU's? The thicker oxide makes the CPU one hundred times less likely to fail. It also makes the CPU run slower, since the gate cannot slew as fast; it cannot source/sink large currents the way that a thin-oxide gate can. The CPU chip clock cannot be run at the higher frequencies seen in the Intel chips. Net/net: the Power4 made a tradeoff between raw performance and reliability, and picked reliability over performance. Now ain't that counter to the currents in the PC world?



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included at the URL http://www.linas.org/fdl.html, the web page titled "GNU Free Documentation License".