“A nearly impenetrable thicket of geekitude…”

Docker and LXC

Posted on February 9, 2024 at 20:51

My long-standing virtualisation platform of choice has been VMware’s; for many years I’ve had a couple of machines running ESXi to host my virtual machines, and I use their Fusion product on my Macs. With the acquisition of that business by Broadcom and the ongoing “business transformation”, though, I’ve moved the server side of things over to Proxmox Virtual Environment.

Proxmox is a more open virtualisation product built around Debian Linux. It is a little less featureful than the VMware suite in some ways, but it has what I need in my very not-Enterprise use case, and I’m less concerned that a new “business transformation” might suddenly compromise my access to it.

As well as running virtual machines, Proxmox has a very usable interface to Linux containers (LXC), a technology I’d been aware of from my use of Docker but hadn’t previously had the opportunity to look into in its own right. Having a new tool for the toolbox is always a joy, though, so I’ve been digging into this other form of virtualisation recently.

Linux containers are built on the same Linux kernel technologies that underlie Docker, Podman and Kubernetes: cgroups, namespaces and an isolated filesystem that (usually) looks a lot like a Linux installation. There are differences, though:

  • In an “application” container (Docker, etc.), setting up networking is the job of the container engine, not the container. In a “system” container (LXC) the container sees a raw (virtual) network interface and is responsible for its setup, in the same way as a virtual machine would be.
  • In an application container, things like environment variables are passed in to the container’s entrypoint, which is often the only thing that runs. In a system container there are fewer assumptions: you can do whatever you like, but very little is done for you.
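Whichever engine is in charge, the kernel’s view is much the same: a process dropped into its own set of namespaces and cgroups. As a quick illustration (a throwaway sketch, not part of the sessions that follow), each fresh container reports a different set of namespace identifiers for its PID 1, which is a large part of what the isolation actually consists of:

# PID 1's namespaces, as seen from inside a throwaway container; run this
# twice and the identifiers (the numbers in brackets) come out different,
# because each container gets namespaces of its own.
% docker run --rm alpine:latest ls -l /proc/1/ns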

Simple container

As an example, let’s look inside a simple application container:

% docker run --rm -it alpine:latest
/ # ps gax
PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
    7 root      0:00 ps gax
/ # ip a
...
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 65535 ...
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
/ # df -h /
Filesystem                Size      Used Available Use% Mounted on
overlay                  58.4G     22.4G     32.9G  41% /

Here, the process with PID 1 is the container’s entrypoint, the shell. An Ethernet interface is ready for use, and I have 33GB of free space on my root filesystem. Those are all lies, of course, but they are the kind of lies a containerised application needs to hear.

What about LXC? A container is just a bunch of files invoked in some kind of context, so maybe we can just export that bunch of files and fire it up in Proxmox as a system container? Turns out that is exactly true:

% docker container create --name example1 alpine:latest
% docker container export --output example1.tar example1
% docker container rm example1
% gzip example1.tar
% scp example1.tar.gz root@pve02:/var/lib/vz/template/cache/

pve02 is one of my Proxmox nodes; /var/lib/vz/template/cache/ is the directory where Proxmox stores the templates from which you can create containers using its graphical user interface.
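You don’t even have to click through the UI: the node’s pct tool will create and start a container from the same template. A rough sketch, in which the container ID, storage and bridge names are just examples from my own setup:

# On the Proxmox node: create and start an unprivileged container from the uploaded template
root@pve02:~# pct create 106 local:vztmpl/example1.tar.gz \
      --hostname example1 --unprivileged 1 \
      --storage local-lvm --net0 name=eth0,bridge=vmbr0
root@pve02:~# pct start 106

Either way, creating a container from the template that used to be a Docker image and firing it up gives us a login prompt, after which we can have a poke around: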

example1:~# ps gax
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/init
   25 root      0:00 /bin/login -- root
   26 root      0:00 /sbin/getty 38400 tty2
   27 root      0:00 -ash
   28 root      0:00 ps gax
example1:~# ip a
...
2: eth0@if482: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 ...
    link/ether bc:24:11:fa:87:94 brd ff:ff:ff:ff:ff:ff
example1:~# df -h /
Filesystem                Size      Used Available Use% Mounted on
/dev/mapper/pve-vm--106--disk--0
                          7.8G      7.7M      7.4G   0% /

The lies are different this time. PID 1 is now an explicit init process created by the environment, and that has spun up the login process which leads to our shell. There’s a network interface, but it’s unconfigured. The root filesystem is a mounted virtual disk now, and a tiny 8MB of that is in use by our Alpine Linux userspace.

At this point, because this is Alpine Linux and even the tiniest container bundles BusyBox, you can create an appropriate /etc/network/interfaces, type ifup eth0 and be off to the races. Things will be a little trickier with images based on other distributions, but the principle still applies.
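For the record, the file in question can be as small as this (DHCP here; a static stanza works the same way):

# /etc/network/interfaces -- bring eth0 up via DHCP when "ifup eth0" is run
auto eth0
iface eth0 inet dhcp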

Complex container

In the above, I glossed over the fact that although I got a shell out of both container technologies, the mechanism was entirely different. That’s more obvious if the Docker container is supposed to be doing something other than running a shell, which will almost always be the case. The technique above will always give you a shell (if it works at all); the original image’s ENTRYPOINT, CMD, ENV and so on are all ignored.
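If you do want to recreate that behaviour by hand, the settings being thrown away are easy enough to read back out of the image first; something along these lines (for whichever image you’re converting) shows what the Docker engine would have run, and with what environment:

# The entrypoint, default command and environment are all recorded in the image config
% docker image inspect alpine:latest \
    --format 'entrypoint: {{.Config.Entrypoint}}  cmd: {{.Config.Cmd}}  env: {{.Config.Env}}'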

To see what can be done in real use cases, let’s move on to my motivating example for looking into this at all. netboot.xyz is a neat thing to have hosted on your network if you’re trying out different Linux distributions or just installing a lot of virtual machines. You can deploy it as a Docker container, but it’s better for it to have its own independent IP address so that you can put that into your DHCP server’s configuration. Deploying it as an LXC container would be ideal, but that’s not something the project provides.
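As an aside, “putting that into your DHCP server’s configuration” just means pointing PXE clients at the container’s address. If dnsmasq happens to be your DHCP server, a rough sketch covering BIOS and 64-bit UEFI clients, using the standard netboot.xyz boot file names and the static address the container is given later in this post, would be:

# dnsmasq: x86-64 UEFI clients get the EFI binary, everything else the BIOS one
dhcp-match=set:efi64,option:client-arch,7
dhcp-boot=tag:efi64,netboot.xyz.efi,,192.168.117.1
dhcp-boot=tag:!efi64,netboot.xyz.kpxe,,192.168.117.1

None of that changes the fact that the project only publishes Docker images, though.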

Instead, we can take the Docker image, add some missing components to turn that into a fully functional LXC template using docker buildx, then deploy a container from that template. Here’s the Dockerfile:

FROM netbootxyz/netbootxyz:latest

RUN apk add --no-cache openrc ifupdown-ng

ADD netbootxyz /etc/init.d/
RUN rc-update add netbootxyz default

This Dockerfile installs two packages: openrc is Alpine’s init system, and ifupdown-ng is a more capable network interface configuration tool than the one BusyBox provides; it’s there to provide compatibility with the network settings configured through Proxmox’s LXC UI.

We’re also adding a netbootxyz service to the system’s default run level. The init script (which needs to be executable) looks like this:

#!/sbin/openrc-run

command="/start.sh"
command_background="true"
pidfile="/run/netbootxyz.pid"

export TFTPD_OPTS=
export NGINX_PORT=80
export WEB_APP_PORT=3000

depend() {
        need net
}

The export statements provide the environment variables required by the container image’s /start.sh entrypoint, which we’re defining here as a service that depends on the availability of networking. Because netbootxyz is part of the default run level, OpenRC will bring the network up before starting it whenever the system boots.
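Once a container built from the resulting template is up and running (as below), OpenRC’s usual tooling will confirm that the service registered and started:

# inside the running container:
rc-update show default          # lists the services in the default run level
rc-service netbootxyz status    # reports "started" once the service is up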

Building the new template is simple:

% docker buildx build --output type=tar,dest=example2.tar .
% gzip example2.tar
% scp example2.tar.gz root@pve02:/var/lib/vz/template/cache/

Deploying this template from the Proxmox UI and providing a static IP address results in the following:

example2:~# ps gax
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/init
  348 root      0:00 {start.sh} /bin/bash /start.sh
  357 root      0:00 /sbin/getty 38400 tty2
  384 root      0:00 {supervisord} /usr/bin/python3 /usr/bin/supervisord ...
  386 root      0:00 /usr/sbin/syslog-ng --foreground --no-caps
  387 root      0:00 nginx: master process /usr/sbin/nginx ...
  388 nbxyz     0:00 /usr/bin/node app.js
  389 root      0:00 /usr/sbin/in.tftpd -Lvvv --user nbxyz --secure /config/menus
  391 nbxyz     0:00 nginx: worker process
  392 nbxyz     0:00 nginx: worker process
  393 nbxyz     0:00 nginx: worker process
  394 nbxyz     0:00 nginx: worker process
  406 root      0:00 tail -f /var/log/messages
  412 root      0:00 /bin/login -- root
  413 root      0:00 -ash
  414 root      0:00 ps gax
example2:~# ip a
...
2: eth0@if486: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 ...
    link/ether bc:24:11:82:48:d1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.117.1/24 scope global eth0
       valid_lft forever preferred_lft forever
...
example2:~# df -h /
Filesystem                Size      Used Available Use% Mounted on
/dev/mapper/pve-vm--106--disk--0
                          7.8G    182.3M      7.2G   2% /

You can see from this that the network interface has been configured with the supplied static address, and that the full netboot.xyz stack is running:

  • PID 348 is the service startup script, running as a daemon.
  • That has invoked a supervisord process (PID 384) to start the component services:
    • syslog-ng
    • tftpd
    • a Node.js application for the user interface
    • nginx and its worker processes to serve asset requests

Conclusions

Containers are variations on a bunch of files in an isolated runtime context. It’s possible — and even fairly straightforward — to convert a container image intended for one runtime for use in another by tweaking the set of files in use; they’re just files. docker buildx is, pretty much by definition, one way of performing manipulations of this kind.

Proxmox is a pretty nice platform on which to deploy both virtual machines and system containers (application containers too, with the proviso that Proxmox’s recommendation is to run your application container runtime inside a virtual machine). The resulting containers are tiny, start very quickly and are easy to move around a Proxmox cluster if needed, without dragging a virtual machine around with them.
