Computer Cluster

Heterogeneous Beowulf cluster from spare hardware

In my lab at Duke University, we had a lot of old computers from prior research projects that were no longer being used. I volunteered to put them together into a cluster for the lab to use for computationally-intensive tasks. I didn’t know anything about cluster computing before this project, so it was a great experience learning how to put together and use a computer cluster.

If you’re new to cluster computing and are interested in setting up your own small computer cluster, the following overview may be helpful.

Hardware & Network

The cluster has seven x86-64 desktop computers of varying age with a range of processors and memory capacities. They are all connected with a single 8-port unmanaged network switch that is connected to Duke’s network. This is a photograph of the cluster:

Photograph of the computer cluster: seven desktop computers of different types on the floor of the lab, connected with a single network switch. Image © 2016 Jim Turner and licensed under CC BY‑SA 4.0.

Six of the computers (dsg01, dsg03, dsg04, …, dsg07) are compute nodes, and the remaining one (dsg02) is the login node, SLURM controller, and file server. This is the network topology:

Network topology of the cluster and users: the nodes are connected to a single switch, which is connected to Duke's network; that network is separated from the Internet by a firewall, and users connect to it directly or, if they are elsewhere on the Internet, through Duke's VPN. Image © 2016 Jim Turner and licensed under CC BY‑SA 4.0.

Software

The hardest part of setting up the cluster was figuring out what software to use and how to configure it. Since I was unfamiliar with cluster computing, I strongly favored projects with good documentation that were fairly easy to set up. I decided on the following:

  • Debian stable for the OS. It’s free software, is reliable, and has long-term support. The Debian project also works very hard to minimize changes to Debian stable, which reduces the work required to administer the cluster.

  • Gluster for the shared file system (for users’ home directories). The Gluster documentation is pretty good, so I found it easier to set up than the alternatives. It’s also a distributed file system, so if I need to add more storage capacity or speed up transfer rates in the future, I can add more storage nodes. (A rough sketch of the volume setup follows this list.)

  • SLURM for scheduling and resource management. It is straightforward to set up [1], provides all the functionality I need, and is popular.

  • MPICH as the MPI implementation. Supposedly, it automatically detects and integrates with SLURM, but I haven’t tested this myself.

  • MUNGE for hosts to authenticate each other (needed for SLURM). This is easy to set up.

  • Ganglia for historical performance monitoring. The documentation wiki appears to no longer exist, but the Ganglia quick start was sufficient to set it up.

  • Sphinx to build the documentation for the cluster (hosted on the head node).

  • Apache as the web server on the head node to host the documentation and Ganglia. Debian makes setting up Apache very easy.

  • OpenSSH for users to connect to the cluster and transfer files with SFTP. I also set up passwordless (key-based) authentication for all users between hosts for MPICH.
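
For a sense of what the Gluster setup involves, here is a rough sketch of creating and mounting a shared volume for home directories. The hostnames match the nodes described above, but the volume name (home) and brick path (/srv/gluster/home) are placeholders chosen for illustration, not necessarily what the cluster actually uses:

    # On the file server (dsg02): create and start a single-brick volume
    # for home directories. /srv/gluster/home is a hypothetical brick path.
    gluster volume create home dsg02:/srv/gluster/home
    gluster volume start home

    # On every node: mount the volume over /home with the GlusterFS client
    # (or add an equivalent glusterfs entry to /etc/fstab).
    mount -t glusterfs dsg02:/home /home

Because Gluster is distributed, adding capacity later should just be a matter of gluster peer probe and gluster volume add-brick rather than rebuilding the file system.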

I installed additional software for users to develop and run their programs, including:

  • Miniconda for the Python environment because it’s the easiest way to get up-to-date Python packages on Debian stable.
  • GNU Compiler Collection (GCC) for the C/C++/Fortran environment.
  • GNU Octave as a free alternative to MATLAB.
  • MATLAB, because the other researchers in my lab use it.

Usage

If you’re unfamiliar with computer clusters, it’s helpful to know how they work from the user’s perspective. This is how the small cluster I built is set up:

The user has access to his/her home directory and the /tmp directory on each node. The user’s home directory is shared across the nodes with Gluster, so all programs and input/output files in the user’s home directory are available on all nodes. To run a job on the cluster:

  1. The user transfers his/her program and input data to the login node with SFTP.

  2. The user SSHes into the cluster’s login node. He/she can run inexpensive tasks on the login node, such as compiling small programs. However, for computationally-intensive tasks, the user should submit a job with SLURM to run on the compute nodes.

  3. On the login node, the user can use the following SLURM commands:

    • srun to run a single job and wait for it to complete,
    • salloc to allocate resources (primarily for an interactive job), or
    • sbatch to schedule a batch job for execution (see the example batch script after these steps).

  4. When the necessary resources (i.e. processors and memory) become available, SLURM starts the job on the available compute nodes.

  5. The user can cancel the job with scancel or check its status with squeue.

  6. If the user submitted a batch job, SLURM saves the standard output and standard error from the job to the specified location (typically the user would specify files in his/her home directory). The program being run can also save output files itself to the user’s home directory, because the user’s home directory is transparently synchronized between the nodes with Gluster.

  7. When the job is complete, the user can download the output files from the login node with SFTP.
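
To make the batch workflow concrete, here is a minimal example. The script name (run_sim.sh), program (./sim), and resource requests are made up for illustration; the #SBATCH options themselves are standard SLURM options:

    #!/bin/bash
    # run_sim.sh -- example batch job script (hypothetical program and sizes)
    #SBATCH --job-name=sim          # name shown by squeue
    #SBATCH --ntasks=4              # number of processors to allocate
    #SBATCH --mem-per-cpu=1G        # memory per allocated processor
    #SBATCH --time=02:00:00         # wall-clock time limit
    #SBATCH --output=sim-%j.out     # file for stdout/stderr (%j = job ID)

    # srun launches the program on the allocated resources. Any files the
    # program writes to the home directory are visible on all nodes via Gluster.
    srun ./sim input.dat

The user then submits and monitors the job from the login node:

    sbatch run_sim.sh    # submit the job; prints the job ID
    squeue -u $USER      # check its status
    scancel 1234         # cancel job 1234 if necessary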

Configuration Management & Testing

One of my goals was to automate the installation and configuration of the cluster as much as possible in order to simplify maintenance and enable version control of the configuration. For installation and configuration, I’m using:

  • Ansible for configuration management. Ansible is relatively simple to set up, is extensible, and works well enough for my needs.

  • Git for version control of the configuration.

  • Debian preseeding for the initial installation of the OS. Unfortunately, preseeding is not well documented, but I successfully based the template on this example and the partman-auto documentation.

  • Jinja for generating the preseed files by filling a template with variables parsed from Ansible.

  • GNU Make for automating the build process of the configuration and test images.

Since users could be running jobs on the cluster, I needed a way to test changes without interfering with the actual cluster. I’m using the following additional software to test the configuration with a network of virtual machines on my laptop (a sketch of the test loop follows the list):

  • Packer to build clean Debian virtual machine images with the preseed files.
  • Vagrant to start and provision the virtual machines with Ansible.
  • VirtualBox to run the virtual machines.
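
Roughly, the test loop looks like the following; the Packer template name (debian-base.json) is a hypothetical stand-in rather than a listing of the actual repository:

    # Build a clean Debian base image by booting the installer with the
    # generated preseed file (template name is hypothetical).
    packer build debian-base.json

    # Bring up the virtual machines in VirtualBox and provision them with
    # the Ansible playbooks defined in the Vagrantfile.
    vagrant up --provision

    # Re-run provisioning after changing the configuration.
    vagrant provision

    # Tear the virtual cluster down when finished.
    vagrant destroy -f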

Documentation & Sustainability

One of my goals when building the cluster was to make it sustainable after I leave Duke. As a result, I automated as much of the configuration as possible and documented everything. I’m using Sphinx for documentation, and I’m keeping the configuration and documentation on Duke’s GitLab instance.
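
For anyone unfamiliar with Sphinx, building the documentation is only a couple of commands. The output path below is an assumption about how the HTML ends up where Apache can serve it, not necessarily how this cluster is laid out:

    # One-time setup: generate conf.py, index.rst, and a Makefile in docs/.
    sphinx-quickstart docs

    # Build the HTML pages (output goes to docs/_build/html by default).
    make -C docs html

    # Copy the result somewhere Apache serves from (path is an example).
    cp -r docs/_build/html/* /var/www/html/docs/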

Other Resources

If you’d like to set up your own small cluster, the following resources may be helpful:


  [1] To generate an initial configuration, use one of the configuration builders, which are available at /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html and /usr/share/doc/slurmctld/slurm-wlm-configurator.html once you have slurmctld installed. Look at the man page for slurm.conf(5) for more information about the options.
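
Once the configurator has produced a slurm.conf, putting it into service is short. The paths and service names below are the Debian defaults as far as I know, so double-check them against your installed packages:

    # Copy the generated file into place (Debian's slurm-wlm packages
    # read their configuration from /etc/slurm-llnl).
    cp slurm.conf /etc/slurm-llnl/slurm.conf

    # The same file must be distributed to every node, then:
    systemctl restart slurmctld   # on the controller node (dsg02)
    systemctl restart slurmd      # on each compute node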