Cheapest and Hardest: Diskless Nodes


Diskless nodes, you say? But, but, but, how do they boot? What do they run? How do they work?

I'm glad you asked.

Actually, in all parts of the Unixoid computing universe but that part occupied by the Wintel-Macintoad industrial conspiracy, diskless operation via NFS has been a standard feature in widespread operation for longer than I've been managing unixoid systems (some fifteen years, with some years of DOS before that). Sun Microsystems, in particular, for years sold workstations designed to operate as diskless computers. They didn't have a disk. They didn't want a disk. They didn't need a disk. And all of this was on a far older, far slower network than we enjoy today, at a time when 16 megabytes was a lot of memory.

Nowadays, with 100 megabit per second switched networks as the minimal beowulf standard (in most cases), 500+ bogomip servers, and memory available at less than $1 per megabyte, diskless configuration works, and works well. It saves you at least the cost of one hard disk per node, which is (these days) approximately $100. Across a cluster, that might buy you six nodes for the cost of five, or better. Diskless nodes can make good economic sense.
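To see where a figure like ``six for the cost of five'' can come from, here is a small arithmetic sketch. The $100 disk price is from the text; the $600 price for a complete node with a disk is purely an assumption chosen for illustration, as are the function and variable names.

```python
def nodes_for_budget(budget, node_cost):
    """Number of whole nodes a budget buys (integer dollars)."""
    return budget // node_cost


DISK_COST = 100        # from the text: roughly $100 per hard disk
NODE_WITH_DISK = 600   # ASSUMPTION: illustrative node price, not from the text
NODE_DISKLESS = NODE_WITH_DISK - DISK_COST

# Budget that buys exactly five nodes with disks.
budget = 5 * NODE_WITH_DISK

print(nodes_for_budget(budget, NODE_WITH_DISK))  # 5
print(nodes_for_budget(budget, NODE_DISKLESS))   # 6
```

With a more expensive node the disk is a smaller fraction of the price and the gain shrinks accordingly, so run the numbers with your own quotes.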

The linux kernel is perfectly capable of diskless operation, and has been for several major revisions now. There are only one or two things that make diskless operation more difficult than it really should be, and it is these things that make this the second best way to run a beowulf for most people (presuming that most people reading this book are not, or at any rate are not yet, linux gurus).

One of these things is that (conspiracy or not) ``personal computer'' makers (as opposed to ``Unix workstation'' makers like Sun and SGI) generally don't install BIOSes that are capable of booting over a network device. This, in turn, is at least partly because the network device in a PC is generally made by a third party and requires a driver that doesn't live in the BIOS, because there is no uniform network device API. This leaves one with a hideous chicken and egg sort of problem. To get a device driver for the network card into a system, one needs a disk. If one already had the driver loaded, one could boot diskless (and load a network device driver without a disk).

Oops.

There are solutions to this, of course. One can get network cards whose BIOS is aware of a diskless boot protocol (one developed to support diskless boots in Unix workstations) and can bootstrap both the kernel and the required device drivers with no disk at all. Some of the ways of obtaining the requisite hardware are indicated in the Hardware appendix. This, however, is a bit tricky, and there is a different way (I wouldn't quite say a better way) that is ``directly'' supported on a standard PC with over the counter parts.

That is to boot from a floppy drive (which is very cheap, generally costing less than $20) and use a standard cheap network card, but skip the hard disk altogether. Because this is the cheapest and most robust solution, this is the one we will develop in detail below. Let's start with the new and improved hardware list:

A floppy drive.
A generic SVGA card (I usually get $30 S3-Virge cards).
Your NICs.

where again you may or may not be able to skip the SVGA card (depending on how hard you want to work and whether you want independent access to the nodes).

It is just about this moment of your life that you should pause and read the Diskless Howto, which is reasonably current and does a far more detailed job of describing diskless operation than I'm going to give you here. I will just outline the key elements:

Build a boot floppy with the kernel and required network support for your distribution. The floppy will likely be a lilo floppy, and will have a whole little set of parameters that are to be passed to the kernel being booted. These parameters tell the kernel who the system is supposed to be (basically permitting it to configure its primary network interface) and where it should look for its root directory.
On the server, build a root directory for the node and export it to the node. There is a huge range of ways to go about this. Some give every node a very large independent root. Others share one root among all nodes. Still others give each node a small independent root, but mount the ``big'' directories (typically /usr and /home) from a common server export.
Develop a way of ``cloning'' the node files or directories on the server. Again, there are some very clever ways of going about doing this, from the simple but wasteful to the transcendentally clever and cheap but awesomely complex.
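As a rough illustration of the first step, the parameters handed to the kernel from the boot floppy can be assembled as below. This is a minimal sketch, not a tested recipe: the `root=/dev/nfs`, `nfsroot=` and `ip=` parameter names and field order follow the kernel's NFS-root documentation, while the function name, the addresses, the hostname and the export path are all made up for the example. Check the Diskless Howto and your kernel's documentation before relying on any of it.

```python
def nfsroot_cmdline(client_ip, server_ip, root_dir, netmask, hostname,
                    gateway_ip="", device="eth0", autoconf="off"):
    """Build a kernel command line for an NFS-root boot.

    Field order for ip= is:
    client:server:gateway:netmask:hostname:device:autoconf
    An empty gateway field is left blank between the colons.
    """
    ip_arg = ":".join([client_ip, server_ip, gateway_ip,
                       netmask, hostname, device, autoconf])
    return f"root=/dev/nfs nfsroot={server_ip}:{root_dir} ip={ip_arg}"


if __name__ == "__main__":
    # Hypothetical node "b01" on a private 192.168.1.0/24 network.
    print(nfsroot_cmdline("192.168.1.11", "192.168.1.1",
                          "/export/nodes/b01", "255.255.255.0", "b01"))
```

On a lilo floppy a string like this would typically go into the `append=` line of the boot entry. Setting the last field to an autoconfiguration protocol instead of `off` is what allows one generic floppy to serve every node, at the price of running the matching service on the server.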

Note that this approach gives you essentially all the scaling advantages of the best method. You have a script titled something like ``makenode'' that you run on the server. It either clones a root and modifies the requisite files inside or clones the requisite files on a common root. It builds you a boot floppy, either customized to the node or generic. You pop the boot floppy into the floppy drive, power it up, and - Instant Node.
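To make the ``makenode'' idea concrete, here is a minimal sketch of the ``clone a root and modify the requisite files'' variant. Everything in it is an assumption for illustration: the function names, the directory layout, the choice of `etc/hostname` as the one file to personalize, and the export options. It copies ordinary files and symlinks only. A real root contains device nodes and ownership information that need a root-privileged archive copy, and writing the floppy is left out entirely.

```python
import shutil
import tempfile
from pathlib import Path


def make_node(template_root, nodes_dir, hostname):
    """Clone template_root to nodes_dir/hostname and personalize it."""
    node_root = Path(nodes_dir) / hostname
    # symlinks=True copies symbolic links as links instead of following them.
    shutil.copytree(Path(template_root), node_root, symlinks=True)
    etc = node_root / "etc"
    etc.mkdir(exist_ok=True)
    # ASSUMPTION: the node reads its name from etc/hostname.
    (etc / "hostname").write_text(hostname + "\n")
    return node_root


def exports_line(node_root, hostname):
    """Line for the server's /etc/exports giving the node its own root."""
    return f"{node_root} {hostname}(rw,no_root_squash)"


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real root.
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp) / "template"
        (template / "etc").mkdir(parents=True)
        (template / "etc" / "fstab").write_text("# placeholder\n")
        root = make_node(template, Path(tmp) / "nodes", "b01")
        print(exports_line(root, "b01"))
```

This is the ``simple but wasteful'' end of the range: every node gets a full copy. The cleverer schemes share most of the tree and clone only the handful of files that differ per node.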

Not only that, but if a node goes down, a replacement can be brought up by popping the aforementioned floppy into the floppy drive and booting. There are also loads of Clever Tricks (tm) that one can play - a system with a hard drive running WinXX by day can be rebooted into a beowulf node by night by popping in the floppy. A diskless WinXX node can be built (not that anyone sane would ever want one) by installing a diskless linux node and running e.g. VMware. To say there is no backup burden is an understatement - the nodes have no disks to back up, and all that must be backed up on the server is either a single shared image (plus a handful of node-specific files) or a single image that is cloned to make the node roots, since the script can reconstruct the node roots from this one image.

With all of this going for it, why is this method number two?

For two reasons. First of all, as you'll discover when you attempt to set it up, it is a bit tricky. You really need to know what you are doing to make diskless operation work. I used a diskless boot to ``bootstrap'' a cloning install to disk for our beowulf (back when I was still using slackware) and it took a lot of work and learning for me to do, in spite of my having run Sun SLCs and ELCs diskless for years. The Suns were easy in comparison, as they had boot PROMs that knew how to boot diskless a priori.

Second, your diskless system will probably have no swap, and will need a lot of memory to ensure stable, fast operation. In fact, you'll likely need to spend most of what you saved on a hard disk on extra memory unless you were already planning to build a node with a lot of memory (more than 128 MB, say). I'd advise adding at least 64 MB more than you planned on just to hold the operating system and your code. Linux is smart - it will use this extra pad of memory for buffering and caching libraries, disk files and so forth, which greatly reduces the impact of mounting everything from NFS. With or without this pad, if you ever exhaust physical memory your system will die the aforementioned ugly death, probably right in the middle of your calculation.
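The trade described here is easy to put numbers on. The sketch below uses the figures quoted in this section (about $100 for a disk, 64 MB of extra memory, memory at no more than $1 per megabyte). The $0.50 price in the second line is a hypothetical cheaper quote, and the function name is made up for the example.

```python
def net_saving(disk_cost, extra_mb, price_per_mb):
    """Dollars left per node after spending the disk money on extra memory."""
    return disk_cost - extra_mb * price_per_mb


# Upper-bound memory price from the text: $1 per megabyte.
print(net_saving(100, 64, 1.0))  # 36.0

# ASSUMPTION: a hypothetical cheaper memory price of $0.50 per megabyte.
print(net_saving(100, 64, 0.5))  # 68.0
```

At $1 per megabyte roughly two thirds of the disk money goes straight back into memory, which is the sense in which you spend ``most of what you saved''.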

Note that I do not advise turning extra memory into a ramdisk and going through some arcane ritual to load it with a large root. Linux turns it into a ``ramdisk'' for you, in all the senses that matter, and does a far more optimal job than you are likely to be able to do, while also being able to use the memory in other ways (like to avoid running out) in the event that an application needs it for a short time.

This wraps up our general discussion of beowulf installation approaches, in the order that you are likely to tackle them as a novice. Clearly there are lots of things to learn, and the best way to learn them is by doing. I rather wish that a lot of these were fully encapsulated in scripts and so forth for the novice, but thus far this hasn't happened. I make a stab at it in a few cases in the Software Appendix, but in other cases we all await contributions.

This by no means exhausts our need for clever tricks or advice on how to configure things. The next chapter is full of some (I hope) sound and sensible advice on the best way to set up certain aspects of the networking and so forth. First, however, we promised to address maintenance.


Robert G. Brown 2004-05-24