This little image is a closeup of the top of one of the Arnold Tongues (phase-locked regions occurring at frequencies that are Farey numbers), specifically, the one that appears on the iterated circle map. There are many more pictures in my Art Gallery.

Open Letter to Debian and Ubuntu Developers

25 May 2015 - Updated 24 Jan 2017 - Updated 4 Feb 2017

There used to be a time when Linux was a joy to use. Now, it's a headache-inducing slog through the bowels of the operating system. You have to be a brain surgeon and a rocket scientist. You have to work overtime. You have to have the patience of a saint and the perseverance of a boxer. I am so tired of this. I'm exhausted.

Update: 4 February 2017 - Systemd fucking rulez

cgroups/cgfs.c:lxc_cgroupfs_create:901 - Could not find writable mount point for cgroup hierarchy 12 while trying to create cgroup.

So, after an apt-get update; apt-get upgrade on Ubuntu 14.04 Trusty, my LXC containers stopped booting, with the above error message. It took me maybe 6 hours, with a dinner break, to find and fix the issue. The solution was simple, it turned out -- cgroups were not being mounted and my hacky solution was to copy /usr/bin/cgroupfs-mount from another system and run it by hand. Bingo, LXC containers work again.
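A minimal sketch (not the cgroupfs-mount script itself) of how one might check for this condition before spending six hours on it. It compares the controllers the kernel lists in /proc/cgroups against the cgroup filesystems listed in /proc/mounts and reports any enabled controller that has no mount. The function names are made up for illustration, and it assumes the legacy cgroup v1 layout; on a unified cgroup v2 system it would report every controller as unmounted.

```python
def enabled_controllers(proc_cgroups_text):
    """Return the names of enabled controllers from /proc/cgroups content."""
    names = []
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split()
        # Columns: subsys_name, hierarchy, num_cgroups, enabled
        if len(fields) >= 4 and fields[3] == "1":
            names.append(fields[0])
    return names


def mounted_controllers(proc_mounts_text):
    """Return the set of mount options seen on cgroup (v1) mounts.

    On cgroup v1, the controller names appear among the mount options,
    alongside generic options such as rw or relatime.
    """
    mounted = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            mounted.update(fields[3].split(","))
    return mounted


def unmounted_controllers(proc_cgroups_text, proc_mounts_text):
    """Return enabled controllers that have no cgroup v1 mount."""
    mounted = mounted_controllers(proc_mounts_text)
    return [n for n in enabled_controllers(proc_cgroups_text) if n not in mounted]


if __name__ == "__main__":
    try:
        with open("/proc/cgroups") as f:
            cgroups_text = f.read()
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    except OSError as err:
        print("cannot read /proc:", err)
    else:
        for name in unmounted_controllers(cgroups_text, mounts_text):
            print("no cgroup mount for controller:", name)
```

If this prints anything on a cgroup v1 box, the hierarchies need mounting before LXC will start.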

What happened? Well, the interwebs claim that ... who knows. It's got something to do with systemd. The Ubuntu LTS maintainers apparently don't bother testing their code before pushing it any more ... and they broke LXC. WTF. OK, yes, Ubuntu jumped the shark many years ago; but the new hero, the savior and solution to all our problems has not yet appeared.

Seriously: operating systems for servers are supposed to be stable. The apt-get update; apt-get upgrade is not supposed to break working systems. WTF.

Update: 24 January 2017 - FUCK YOU SYSTEMD

systemd-udevd[120]: renamed network interface eth0 to p1p1

Why can't systemd just boot my machine without fucking with the network interfaces!? Why is networking so goddamned difficult with systemd? Why can't it just get out of the way and let the networking subsystem do its thing? I just want to boot my machine, I don't want to search online help to figure out why my system doesn't boot anymore because systemd renamed eth0 to p1p1, which then causes `ifup eth0` to fail. It's just frickin the integrated ethernet port on the motherboard! Quit trying to fuck with it! FUCK YOU SYSTEMD!
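For what it's worth, here is a minimal sketch of a check for this failure mode: it lists the interface names that have an `iface` stanza in /etc/network/interfaces but that the kernel does not currently know about (as listed in /sys/class/net). The function names are hypothetical, and it assumes a plain ifupdown setup with no `source` includes; it only detects the mismatch, it does not fix the renaming.

```python
import os


def configured_interfaces(interfaces_text):
    """Return interface names that have an 'iface' stanza, in file order."""
    names = []
    for line in interfaces_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = stripped.split()
        # Stanza header looks like: iface <name> <family> <method>
        if len(fields) >= 2 and fields[0] == "iface" and fields[1] not in names:
            names.append(fields[1])
    return names


def missing_interfaces(interfaces_text, present):
    """Return configured names that are absent from the 'present' names."""
    present = set(present)
    return [n for n in configured_interfaces(interfaces_text) if n not in present]


if __name__ == "__main__":
    try:
        with open("/etc/network/interfaces") as f:
            text = f.read()
        present = os.listdir("/sys/class/net")
    except OSError as err:
        print("cannot inspect network configuration:", err)
    else:
        for name in missing_interfaces(text, present):
            print("configured but not present (renamed?):", name)
```

On the machine above, this would have flagged eth0, since the kernel only knew the port as p1p1.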

And now back to the original 25 May 2015 rant.

I have 5 Linux boxes I regularly maintain; two are webservers. You're looking at one now. The Linux kernel has a fantastic uptime -- a year, two years without reboots. But then there is the inevitable power outage during a thunderstorm. And then, at least one or two of my machines won't boot afterwards. It's been like this for 5 or 6 or 7 years now, and frankly I'm beyond getting tired of it. I'm beyond having enough. What are the Kübler-Ross stages of grief? Denial, anger, bargaining, depression, acceptance? I used to want to punch, well, I don't know who, maybe Kay Sievers, maybe Lennart Poettering, or someone, anyone, in the face, for all my trouble and my pain. The pain is still there. I think this open letter is a manifestation of the "bargaining" stage. What do I have to do, what price can I pay, to have a system that boots?

It's never the same thing twice in a row. Many years ago, it was udev and dbus. You had to do rocket surgery to get udev-based systems to boot. That eventually sorted itself out, but for a while, I lost back-to-back 12 hour days fighting udev. Then it was plymouth. Or it was upstart. Why were such utterly broken and buggy systems like plymouth and upstart foisted on the world? Things with names like libdevmapper should not crash. And then there is systemd, which, as far as I can tell, is a brick shithouse where the laws of gravity don't hold. I understand the natural urge to design something newer than sysvinit, but how about testing it a bit more? I have 5 different computers, and on any given random reboot, 1 out of 5 of these won't boot. That's a 20% failure rate. It's been a 20% failure rate for over 6 years now.

Exactly how much system testing is needed to push the failure rate to less than 1-out-of-5? Is it really that hard to test software before you ship it? Especially system software related to booting!? If systemd plans to take over the world, it should at least work, instead of failing. Stop killing init. Stop failing to find the root file system. Stop running fsck on file systems that are already mounted r/w. Do you have any idea how hard it is to try to edit plymouth or upstart files from busybox, hoping that maybe this time, all will be OK? To boot rescue images over and over and over and over, tracing a problem through a maze of subsystems, following clues, only to find, two days later, that it was Colonel Mustard, err, systemd that did it in the kitchen, with a candlestick? I mean, I have a really rather high IQ (just look at the web page below), and I have patience that is perhaps unmatched. And I find this stuff challenging. Let's get real: sysvinit was simple and easy-to-use by comparison, and it worked flawlessly. Between 1995 and 2009, I never once had a boot problem. Sure, there were times when I could not watch youtube videos ... but then Ubuntu came along and solved even that problem. For a while, it was Heaven on Earth.

Do you have any idea how shameful it is to tell your various bosses how great Linux is, and then have to dissemble and obfuscate, because you can't bear to tell them the reason you did no work for the last 10 days was because your Linux box didn't boot? To say "no thanks" when your boss offers to buy you a new laptop?

And it's not just the low-level stuff, either. There's also the nuttiness known as gnome-shell and unity. Which crash or hang or draw garbage on your screen. And when they do work, they're unusable, from the day-to-day usability perspective. This wasn't a problem with gnome2. Gnome2 rocked. It was excellent. Why did you take something that worked really really well, and replace it with a borken, unusable mess? What happened, Gnome and UI developers? What were you thinking? In the grips of what madness? In what design universe is it OK to list 100 apps, whose names I don't recognize, in alphabetical order? Whoever your design and usability hero is, I am pretty sure they would not approve of this.

It's spreading, too. Like cancer. Before 2013, web browsers worked flawlessly. Now, both mozilla firefox and google chrome are almost unusable. Why, oh why, can't I watch youtube videos on firefox? Why does Chrome have to crash whenever I visit adware-infested websites? What's wrong with the concept of a web browser that doesn't crash? Why does googling my error messages bring up web forums with six thousand posts of people saying "me too, I have this same problem"? When you have umpteen tens of thousands of users with the exact same symptoms, why do you continue to blame the user?

I can understand temporary insanity and mass hysteria. It usually passes. I can wait a year or two or three. Or maybe four. Or more. But a trifecta of the Linux boot, the Linux desktop, and the Linux web-browser? What software crisis do we live in, that so many things can be going so badly, so consistently, for so long? It's one thing to blame Lennart Poettering for creating buggy, mal-designed, untested software. But why are the Gnome developers creating unusable user interfaces at the same time? And what does any of this have to do with the web browser?

I'm not sure it's limited to Linux, either. Read the trade press, everyone belly-aches about the incompatible, fragmented Android universe. And, well, obviously, Microsoft Windows has been a cesspool for decades; it was the #1 reason why I switched to Linux in the first place. Duhh. But why has Linux morphed into all of the worst parts of Microsoft Windows, and none of the best parts? We are all Microsoft Windows, now.

What's at the root of this? Sure, it's some combination of programmer hubris, lack of system test, inexperienced and callous coders. Overwhelmed coders with a 10-year-long backlog of reported, unfixed bugs. Perhaps some fatigue and depression in the ranks of the Debian and Ubuntu package maintainer community. Perhaps it is a political problem: the older, more experienced developers have failed to teach, to guide the younger developers. Perhaps we've hit a fundamental complexity limit: there are too many possible combinations of hardware and software. I fear we have hit a wall in the ability to communally develop software; the community is not working. All bugs are no longer shallow. Or maybe it has something to do with capitalism and corporate profitability. Some malaise presaging the singularity. I don't know. What's the root cause of this train wreck?

We need to figure out what is going wrong, not just at the technical level, but at the social and political level, that is allowing major distros to ship buggy and incomplete and broken software, oblivious to the terrible condition it is in, uncaring and uninterested in fixing it, or perhaps unable to fix it, and unable to see a way forward. But we have to move forward. We need to find a way out of this mess. It cannot continue like this.

Yesterday, there was another thunderstorm, another power outage. Today, I spent the last 11 hours trying to make my other webserver boot. No matter how I twist and turn, I get a "can't mount root filesystem" or "killing init". It's supposed to be a holiday weekend. I'm not being paid to run these servers. Why can't I just have a system that boots?

-- Linas Vepstas 25 May 2015 Austin TX