Bootstrapping the Bootstrapper

Much of our recent work has involved running Python applications at scale. While Python itself has pretty mediocre performance, it does make a convenient language for directing operations that use high-performance implementations. Numpy is a good example of this paradigm: the user might write high-level Python code, try things out interactively, and keep everything in a Jupyter Notebook. But Numpy is implemented using highly-optimized C code to carry out the actual operations, so users can work in a friendly Python environment while still taking advantage of fast compiled code. One can take a similar approach for remote and parallel execution: Parsl allows users to write their applications using decorated Python functions. Behind the scenes, the runtime can send these functions to nodes in a cluster, cloud, etc. to execute and bring the results back. The user doesn't have to know that their Python function actually ran on a supercomputer, and the runtime uses futures to make it seem like a simple asynchronous function call.

To make that seamless execution work, we need to be able to run user-provided Python code on remote nodes. Those nodes, however, are often not set up the same way as the node where the application is running; maybe the node only has Python 2 but the app needs Python 3, maybe some libraries are missing, or maybe Python isn't installed at all. Any of these differences would prevent the user's code from running. To seamlessly send code and data back and forth, we need a consistent Python environment on all the nodes. In a cluster, a shared filesystem can help with this, with all nodes working from a common Python installation. If a shared filesystem isn't available, we need some other means to set up a common Python environment. We could simply require that users manually configure all nodes, but this is tedious and error-prone. It also breaks the abstraction: a user should be able to write and run Python code on their laptop, flip a switch, and run that same code on a cluster. We need a way to bootstrap a working Python environment on remote nodes. Our goal, then, will be to take a packaged Python environment (created using Conda and conda-pack) and get it up and running on nodes where dependencies (or even Python itself) may not be available.

But what are the exact requirements for our packaging setup? A conda-pack environment includes Python code, as well as some shell scripts. Python packages with embedded C libraries are supported, so do we need to include anything for those? And the whole package needs to be extracted before use: at a minimum, we'll need tar to be available and to support compression (conda-pack generates .tar.gz files by default). Which brings us to the main question of this piece: what do we need to bootstrap the bootstrapper? The absolute minimum would be a kernel and nothing else, but knowing a bit about conda-pack and Python applications, we'll clearly need more than that.
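(As a peek ahead: once we've built the packed environment later in this post, one way to confirm that it really does contain Python code, shell scripts, and shared libraries is just to list the tarball; a rough sketch:)

$ tar -tzf bootstrap.tar.gz | grep -E 'bin/activate|bin/python|\.so' | head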

So, what exactly do we need on nodes to be able to execute Python in a packaged environment? Rather than enumerating a list of Linux distros that work, let's take a fun approach: start from nothing and figure out the minimal requirements. We can use Linux namespace magic (the same stuff that container systems like Docker are built on) to create a sandbox containing only a kernel and an init process. Then we can add in the exact pieces required to get to a working Python setup.

The Plan

For a minimum environment, our sandbox should contain as close to nothing as possible. With a container system like Docker, the details of setting up a sandboxed environment are handled automatically. That's a bad fit here, though; Docker isn't set up for starting from scratch (i.e. a completely empty environment), and even minimal pre-configured containers like Alpine Linux come with a lot of stuff that's necessary for a normal, working system. We don't need most of the features of a full container runtime, and this isn't going to be a normal, working system!

So instead, we'll hand-roll our own minimal container environment. Our first goal will be to get a new (empty) root filesystem and a shell in our "container". We could just copy in the shell itself (and later on we'll do just that), but we won't get very far with only /bin/sh. Some capabilities are provided by the shell itself, but a lot of the basic functionality used in shell scripts is actually provided by external programs (e.g. echo, printf, true, false, and the [ used in conditionals are all real executables). We'll need the shell itself plus a set of basic utilities for something resembling a sane system. We don't really want to pull in a full GNU stack, as one of our goals is to figure out if there are any weird GNU-related dependencies.

Fortunately for us, a solution already exists. BusyBox is a single-binary toolkit that provides the basic functionality we need. It's often used in routers, low-memory systems, or other minimal environments as it consists of one statically linked executable, with symlinks for everything else.

After we have a minimal working sandbox, we'll try unpacking and running the environment. The sandbox won't even include libc to start, so we'll certainly see errors. But those errors will tell us about the hidden dependencies we're looking for. We can incrementally copy the necessary bits from the host system into the sandbox until we have a working Python environment. Doing this for a normal application would be quite a headache (ever use strace to watch dynamic library loading? Even simple programs do a lot behind the scenes). But this is a bootstrap package, and given that conda-pack environments do run successfully across distros, those requirements must be fairly limited.
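If you haven't tried that, a quick way to watch library loading on the host system is something along these lines (a rough sketch; newer glibc opens libraries with the openat syscall, older setups may use open):

[x@localhost ~]$ strace -e trace=openat ls 2>&1 | grep '\.so'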

Preparation

We'll start from the ground up: an empty directory.

[x@localhost ~]$ mkdir /tmp/sandbox && cd /tmp/sandbox

Since we're planning to copy in pieces from the host system, the sandbox will need to look at least somewhat like a real system. First, let's add a few directories and symlinks. I'm going to skip past the individual mkdir and ln commands here and go straight to the finished result (a sketch of equivalent commands appears below, after discussing the layout).

[x@localhost sandbox]$ tree .
.
├── bin -> usr/bin
├── dev
├── lib -> usr/lib
├── lib64 -> usr/lib
├── proc
├── sbin -> usr/bin
├── sys
├── tmp
└── usr
    ├── bin
    ├── lib
    └── sbin -> bin

The symlinks here are a little weird. If you look at the Filesystem Hierarchy Standard, a standard Linux filesystem needs /bin, /sbin, /usr/bin, /usr/sbin, /lib, /lib64, /usr/lib, and /usr/lib64, among other things. Since we're planning to pull in pieces from the host system, some of these locations are hard-coded, so we can't just leave them out. (We are leaving out a whole lot of other pieces like /etc and /home, but we can get by without those. This is not a normal system, remember!) My host system uses Arch Linux, which along with several other distros merged the various system directories. Arch only uses /usr/bin and /usr/lib; all the rest are just symlinks to those two.

I also included a few other things. Some libc implementations use special device files like /dev/null, and bash isn't happy if it can't mess around with the terminal, so we'll plan to just bring /dev over from the host system. It's possible to get by without all the devices, but we're not going for security or resource control, so isolating devices is more trouble than it's worth. Likewise, a lot of basic functionality involves looking around in /proc to get info on running processes, or digging around in /sys to get info on the system, CPU, etc. We'll just bring those over from the host system as well. I also added /tmp, since lots of things expect to be able to stash stuff somewhere. Since everything will be running as one user, there's no need to set the sticky bit and so on.
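For reference, commands along these lines (run from inside /tmp/sandbox) would produce the layout shown above:

[x@localhost sandbox]$ mkdir -p usr/bin usr/lib dev proc sys tmp
[x@localhost sandbox]$ ln -s usr/bin bin
[x@localhost sandbox]$ ln -s usr/bin sbin
[x@localhost sandbox]$ ln -s usr/lib lib
[x@localhost sandbox]$ ln -s usr/lib lib64
[x@localhost sandbox]$ ln -s bin usr/sbin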

The last bit of setup is to add BusyBox. There's no installation required; you can just download a binary from the website and start using it. Since busybox is statically linked, we don't even need to add libc.

[x@localhost sandbox]$ wget -O bin/busybox https://busybox.net/downloads/binaries/1.30.0-i686/busybox
[x@localhost sandbox]$ chmod +x bin/busybox

We now have a usable sandbox!

Container Setup

On newer kernels, we can take advantage of unprivileged user namespaces to do our container setup without being root. The first step will be to call unshare to set up the required namespaces. We aren't looking for full system isolation, so we only need a new mount namespace (for binding directories from the host system) and a new user namespace (so we can become pseudo-root and use privileged operations like chroot). If you're running an older kernel or can't use namespace tricks, you can skip the unshare step and do the rest as actual root. Just be careful not to overwrite anything outside the sandbox and break your system.

[x@localhost sandbox]$ unshare --user --map-root-user --mount sh
sh-5.0#

In this shell we're running as pseudo-root and can make changes to the filesystem layout! Let's grab those special directories from the host system.

sh-5.0# mount --rbind /dev ./dev
sh-5.0# mount --rbind /proc ./proc
sh-5.0# mount --rbind /sys ./sys

These sandbox directories now have the same contents as the host system.

sh-5.0# ls -l ./proc/self/cwd
lrwxrwxrwx 1 root root 0 May 26 16:00 ./proc/self/cwd -> /tmp/sandbox

If we wanted to do stuff with the network, run graphical applications, etc., there would be extra steps to make things work (see here for more), but this is sufficient for our purposes. Let's enter the sandbox!

sh-5.0# env -i PATH=/usr/bin chroot /tmp/sandbox/ /usr/bin/busybox sh
/ #

We clear the environment to make sure the host system's config doesn't leak into the sandboxed shell. Note that things might not work correctly if $PATH isn't set, so I added a generic default. So now in this BusyBox shell, / is our sandbox directory. Let's look around:

/ # ls -l
sh: ls: not found
/ #

Uh oh.... We don't have ls in the sandbox. We can still invoke BusyBox's implementation:

/ # busybox ls -l /bin/
total 956
-rwxr-xr-x    1 0        0           975004 Jan  1  2019 busybox

Looks like our sandbox is indeed empty. To get all the other executables needed, we can have BusyBox make symlinks for us.

/ # /usr/bin/busybox --install -s
/ # ls -l /bin/ls
lrwxrwxrwx    1 0        0               12 May 26 20:33 /bin/ls -> /usr/bin/busybox

We now have a full set of shell utilities as symlinks to busybox, giving us a basic shell environment with nothing in the sandbox but BusyBox itself. That gives us a sandboxed (container-lite) environment where we can try things out. The next step will be to bring in the packed Conda environment and see what happens!

Preparing the Python Environment

We're using Conda and conda-pack to manage Python environments, so let's set up a test Python environment. Use a different shell for this section, since you'll need the host system and your usual rc files. (In this example, I'll actually be using a different machine running RHEL.) You'll need Conda installed and set up (see here for instructions), and you'll also need conda-pack available in your base environment.
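If conda-pack isn't already installed, one way to get it (assuming you use the conda-forge channel) is:

(base) -bash-4.2$ conda install -y -n base -c conda-forge conda-pack

With that in place, we need a Conda environment to pack up; a Python interpreter plus Numpy should work.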

(base) -bash-4.2$ conda create -y -p /tmp/bootstrap python=3.6 numpy

Since the target system (sandbox) doesn't have much of anything, we need to be sure to include Python itself as part of the environment. I also chose Numpy as it includes C extensions, which should give conda-pack some more work. We're now ready to create a self-contained package for our new Python environment.

(base) -bash-4.2$ conda pack -p /tmp/bootstrap

This gives us a package in bootstrap.tar.gz. Now let's copy it into the sandbox!

(base) -bash-4.2$ cp bootstrap.tar.gz /tmp/sandbox

You could also prepare the Conda environment on another machine (as I did here), in which case you would probably use scp instead.
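For example (the user and hostname here are just placeholders):

(base) -bash-4.2$ scp bootstrap.tar.gz user@sandbox-host:/tmp/sandbox/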

Getting a Shell

Back in the previous shell (chrooted into the sandbox), we now extract the tarball package. (You can imagine the conda-pack parts happening on the head node or on your laptop, while this part is on the worker nodes that might have little or nothing installed.) Fortunately for us, BusyBox includes an implementation of both tar and gzip, so we can extract as usual.

/ # mkdir /bootstrap
/ # tar -C /bootstrap -xzf bootstrap.tar.gz

This will give us a full Conda environment under /bootstrap .

/ # ls -l /bootstrap
total 0
drwxr-xr-x    2 0        0             1460 May 26 21:43 bin
drwxr-xr-x    2 0        0               80 May 26 21:43 compiler_compat
drwxr-xr-x    2 0        0              620 May 26 21:43 conda-meta
drwxr-xr-x    9 0        0             2240 May 26 21:43 include
drwxr-xr-x   15 0        0             2820 May 26 21:43 lib
drwxr-xr-x   10 0        0              200 May 26 21:43 share
drwxr-xr-x    3 0        0              180 May 26 21:43 ssl
drwxr-xr-x    3 0        0               60 May 26 21:43 x86_64-conda_cos6-linux-gnu

We now need to activate the environment to use it. From the conda-pack docs, we need to run the activate script included in the package to fix up prefixes and such. But that won't work yet....

/ # source /bootstrap/bin/activate
Unrecognized shell.

Looks like conda-pack is trying to detect the shell. From the source code, bash and dash should be supported; I initially tried using BusyBox sh and adding dash to the sandbox. Unfortunately, conda-pack still didn't detect dash. We'll therefore skip that digression and start by adding bash to the sandbox. Let's do the obvious thing: copy the bash executable from the host system! From another shell:

[x@localhost ~]$ cp /usr/bin/bash /tmp/sandbox/usr/bin/

Now, let's try starting bash in the container.

/ # bash 
sh: bash: not found

Hmmm.... the executable is definitely there.... This error message isn't very helpful....

The issue here is that bash is a dynamically linked executable. We haven't included ld.so in the sandbox, so the execve() syscall fails with ENOENT because the requested interpreter (the dynamic linker) doesn't exist, and BusyBox reports all it knows: not found. There's no separate errno to distinguish a missing dynamic linker from a missing executable. (For more info on how programs get run, see here.) We can verify this using ldd on the host machine to show the libraries bash links against.

[x@localhost ~]$ ldd /usr/bin/bash
linux-vdso.so.1 (0x00007fff9d1ac000)
libreadline.so.8 => /usr/lib/libreadline.so.8 (0x00007fde76bb8000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fde76bb2000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fde769eb000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007fde7697a000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fde76d3e000)

Of particular interest:

  • linux-vdso.so.1 isn't a real library. Certain syscalls (e.g. getting the time) don't really need the kernel's privileges, yet making a real syscall imposes significant overhead due to the user/kernel context switch. Linux therefore provides some basic info in a special memory area that's mapped into every process. Thus, to get the time, a call to libc's clock_gettime() can simply read from the VDSO, no syscall required.
  • /lib64/ld-linux-x86-64.so.2 is the dynamic linker. This is a hard-coded absolute path (which is one of the difficulties in making portable ELF files). We can take this directly from the host system to enable dynamically linked executables. (This is the reason we needed to duplicate the filesystem layout from the host system. Hard-coded paths are always trouble....)
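One way to see that hard-coded interpreter path is to ask readelf (on the host system):

[x@localhost ~]$ readelf -l /usr/bin/bash | grep interpreter
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]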

So now let's copy the ld.so from the host system into the sandbox.

[x@localhost ~]$ cp /lib64/ld-linux-x86-64.so.2 /tmp/sandbox/lib64/

Now we can try running bash again in the container.

/ # bash
bash: error while loading shared libraries: libreadline.so.8: cannot open shared object file: No such file or directory

Now the dynamic linker is working, so we get more detailed error messages. As expected, we need the other libraries ldd listed. So as a first pass, let's copy in those libraries. On Linux, a library is often installed as several files: the real shared object, the soname symlinks ld.so resolves at runtime, development symlinks, and sometimes static .a archives. We don't want to sort through all of that, so let's just dereference symlinks and copy everything matching the libraries ldd gave.

[x@localhost ~]$ cp /usr/lib/libreadline.* /tmp/sandbox/usr/lib/
[x@localhost ~]$ cp /usr/lib/libdl.* /tmp/sandbox/usr/lib/
[x@localhost ~]$ cp /usr/lib/libc.* /tmp/sandbox/usr/lib/
[x@localhost ~]$ cp /usr/lib/libncursesw.* /tmp/sandbox/usr/lib/
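With only a handful of libraries the manual copies above are fine; for anything bigger, a one-liner along these lines (a sketch, run on the host) would copy whatever ldd reports, dereferencing symlinks as it goes:

[x@localhost ~]$ ldd /usr/bin/bash | grep -o '/[^ ]*' | xargs -I{} cp -L {} /tmp/sandbox/usr/lib/

This also picks up the dynamic linker and drops it into usr/lib, which works here because lib64 is a symlink to usr/lib in our sandbox.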

We now have some basic libraries available in the sandbox. Pretty much every program (including Python itself) will need some of these libraries, anyway. Now let's try bash again in the sandbox.

/ # bash
bash-5.0#

Looks like bash is working in the sandbox!

Activating the Packaged Environment

Now we can try running the activate script from within our freshly copied bash shell.

bash-5.0# source /bootstrap/bin/activate
(bootstrap) bash-5.0#

So far so good. Now let's try running python, which should already work even before running conda-unpack.

(bootstrap) bash-5.0# python
python: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory

Looks like we're missing some dependencies for Python. Let's use ldd on the Python that conda-pack provided. From the host system:

[x@localhost ~]$ ldd /tmp/sandbox/bootstrap/bin/python
linux-vdso.so.1 (0x00007ffee4ffd000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f6421635000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f642146e000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f6421468000)
libutil.so.1 => /usr/lib/libutil.so.1 (0x00007f6421463000)
librt.so.1 => /usr/lib/librt.so.1 (0x00007f6421458000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007f6421313000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f64219ee000)

Looks like we need a few more libraries for Python.

[x@localhost ~]$ cp /usr/lib/libpthread.* /tmp/sandbox/usr/lib/
[x@localhost ~]$ cp /usr/lib/libutil.* /tmp/sandbox/usr/lib/
[x@localhost ~]$ cp /usr/lib/librt.* /tmp/sandbox/usr/lib/
[x@localhost ~]$ cp /usr/lib/libm.* /tmp/sandbox/usr/lib/

OK, let's try this again.

(bootstrap) bash-5.0# python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Success! We have a Python interpreter running in our sandbox. Since that works, we can exit out of the Python interpreter and try unpacking the environment.

(bootstrap) bash-5.0# conda-unpack

No errors here. If all went well, we should now have Python plus any libraries available in the environment (Numpy in this example). Let's start Python again and see if Numpy works.

(bootstrap) bash-5.0# python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.version.version
'1.18.5'

It works! We've transferred a Python interpreter and libraries, including both Python and C components, from a host system using a different distro into our minimal sandbox. So it would seem the absolute minimum required for using conda-pack in a container is:

  • basic shell utilities ( busybox )
  • bash
  • ld-linux-x86-64.so.2
  • libc
  • libdl
  • libm
  • libncursesw
  • libpthread
  • libreadline
  • librt
  • libutil

Everything else can be provided via the tarball that conda-pack produced. Our sandbox is still much smaller than any fully-functional Linux container, and we now know the exact set of requirements for bootstrapping Python environments.
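Putting the recipe together, here is a rough sketch of a staging script. The library paths and BusyBox URL come from the walkthrough above and assume an Arch-like host; adjust them for your distro, and note that the unshare/mount/chroot steps from earlier still apply before you can actually use the sandbox.

#!/bin/sh
# Sketch: stage the minimal pieces needed to unpack and run a conda-pack tarball.
set -eu
TARGET=/tmp/sandbox

# Filesystem skeleton (same layout as shown earlier).
mkdir -p "$TARGET"/usr/bin "$TARGET"/usr/lib "$TARGET"/dev "$TARGET"/proc "$TARGET"/sys "$TARGET"/tmp
ln -sfn usr/bin "$TARGET"/bin
ln -sfn usr/bin "$TARGET"/sbin
ln -sfn usr/lib "$TARGET"/lib
ln -sfn usr/lib "$TARGET"/lib64
ln -sfn bin     "$TARGET"/usr/sbin

# Statically linked BusyBox provides the basic shell utilities.
wget -O "$TARGET"/usr/bin/busybox https://busybox.net/downloads/binaries/1.30.0-i686/busybox
chmod +x "$TARGET"/usr/bin/busybox

# bash, the dynamic linker, and the libraries bash and Python need.
# (lib64 -> usr/lib in the sandbox, so ld.so ends up at /lib64/... as the ELF header expects.)
cp /usr/bin/bash "$TARGET"/usr/bin/
cp -L /lib64/ld-linux-x86-64.so.2 "$TARGET"/usr/lib/
for lib in libc libdl libm libncursesw libpthread libreadline librt libutil; do
    cp -L /usr/lib/"$lib".* "$TARGET"/usr/lib/
done

After that, the packed environment tarball just needs to be dropped into the sandbox and extracted with BusyBox tar as above.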

Epilogue: Why did that work?

Given the pieces taken from different Linux distros and the general austerity of the sandbox environment, I was surprised when things more or less just worked. Much of the troubleshooting was getting Bash working and adding a few basic libraries, which for real users would be handled by a package manager. Once the base libraries were in place, Python and conda-unpack just started right up.

The requirements to start a Python interpreter were fairly small: a few C libraries from the host system (several of which were already required by bash), and the dynamic linker. Much of the rest of Python's functionality is implemented in Python, so we could get by with only a few pieces from the host system. Despite creating the environment on a machine with a different distro than where it executed, we could simply copy over the files. The dependencies shown by ldd happened to have matching sonames, so despite different minor versions across distros the dynamic linker could figure things out. The core libraries Python links against provide a pretty stable ABI, so libraries providing these sonames should generally be available on any modern machine. The dynamic linker itself is another potential source of trouble: the absolute path is hard-coded into the ELF header. Since this is such a critical component, distros are careful to ensure compatibility. Hence the weird symlinks Arch uses to make the filesystem follow convention. The authors of these core dynamic linking/libc components take great pains to make things as stable as possible, which allowed us to relocate the environment to another host without too much trouble.
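For the curious, the soname a library advertises can be checked with readelf on the host; for example:

[x@localhost ~]$ readelf -d /usr/lib/libc.so.6 | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libc.so.6]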

Python itself does quite a bit to make itself relocatable. Rather than relying on hard-coded paths to its libraries (which would break on copying to a new directory), Python has some complicated logic to figure out where it's currently running, and finds its base libraries by searching nearby directories for landmarks. Thus right after un-tarring the Conda environment, Python can find its base installation based on the location of its executable. The main reason we needed bash was for the activate script, which sets $PATH and plays with the shell prompt. If we use an absolute path (/bootstrap/bin/python in the sandbox), it's possible to start Python directly from the BusyBox sh, straight out of the tarball.
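For instance, back in the BusyBox sh inside the sandbox, something like this should work straight out of the tarball and report the relocated prefix:

/ # /bootstrap/bin/python -c 'import sys; print(sys.prefix)'
/bootstrap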

The real work of relocating the installation falls to conda-unpack . This script is generated with instructions on how to fix up all the hard-coded paths scattered around the environment. As it turns out, conda-unpack is itself written in Python. Thus since the base Python installation is naturally relocatable, it can serve as a fixed starting point for bootstrapping everything else.



