“A nearly impenetrable thicket of geekitude…”

Docker-in-Docker and host resolution errors

Posted on December 7, 2022 at 10:37

This article documents a particularly niche issue I ran into which took a while to debug and resolve. I’ve posted it for the benefit of search engines, so that the next person to run into these exact conditions might save a few hours. You’re welcome.

This is probably interesting only to someone in my exact situation, so most people should find some cat pictures to look at instead. If you’ve been directed here by a search engine because you’re tearing your hair out, though, read on.

The very niche situation I alluded to breaks down to this:

  • A private GitLab instance, running a continuous integration job.
  • The job runs docker build on something based on a recent Linux base image (in my case, Rocky Linux 9).
  • The runner is executing on a slightly older Linux distribution (in my case, Ubuntu 20.04 LTS).
  • We’re using Docker-in-Docker mode to avoid giving the job access to the host’s Docker socket.

The failure looks like this:

---> Running in faab08d66352
+ dnf -y install httpd mod_ssl
Rocky Linux 9 - BaseOS                          0.0  B/s |   0  B     00:00    
Errors during downloading metadata for repository 'baseos':
  - Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9 [getaddrinfo() thread failed to start]

You might think that the most likely cause of an issue like this would be DNS. After all, it’s always DNS, right? Not in this case (and it wasn’t IPv4 vs. IPv6 either, which was my other initial thought). Search engines brought up suggestions to restart my router. I rolled my eyes.

Eventually I happened across “ubuntu:21.10 and fedora:35 do not work on the latest Docker (20.10.9)” by Akihiro Suda, one of the maintainers of the various components used by Docker. He explains the root of the problem, which is a change in user-mode behaviour in the glibc shipped with newer Linux distributions (Fedora 35, Red Hat Enterprise Linux 9 and derivatives like Rocky Linux 9, Ubuntu 22.04): it now creates threads using the new clone3 system call. The way older versions of Docker (20.10.9 and earlier) handle clone3 means that it’s treated as a permissions violation (EPERM) rather than a “not implemented” call (ENOSYS), which would have resulted in fallback behaviour appropriate to the host operating system. That is why the curl error above complains that a thread failed to start rather than about anything to do with name resolution.

As an aside, I think this is only the second time I’ve hit any kind of kernel mismatch problem while using Docker. That’s an impressive compatibility record given that I’ve been using it for eight years now.

Akihiro’s recommendation:

The right solution is to upgrade Docker to 20.10.10 or later.

This didn’t help me because all my Docker nodes were already running 20.10.21. However, I saw that the problem was only showing up on those nodes running Ubuntu 20.04 LTS: the node running Ubuntu 22.04 LTS wasn’t seeing the problem, so I felt I was probably on the right track.

The wrinkle in my case turned out to be that my GitLab continuous integration pipeline included the following:

image: docker:20.10.6
services:
  - docker:20.10.6-dind

This is configuring a “helper service” containing its own copy of the Docker daemon for use in the build. This “Docker in Docker” arrangement allows conventional docker build operations to be performed within the CI job without granting the build container access to the host’s Docker daemon. (The build does still need to be run “privileged”, which is still slightly scary, but those are the tools available.)

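I deduce that the Docker daemon inside the dind container interprets errant system calls in the same way as “normal” Docker does, so pinning the helper service to 20.10.6 meant the builds were running under a daemon that predates the 20.10.10 fix, even though the host daemons were all on 20.10.21. This is a little surprising when you think about all the turtles being stacked on top of each other here, but also makes sense: how else could it work?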

Updating the dind helper service to a more recent version of course addresses the original issue. One option is to use docker:20.10-dind to get maintenance updates without the exciting functional changes that might greet a user of the latest tag.
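
As a minimal sketch of that change, the relevant part of .gitlab-ci.yml might look something like the following. The job name build-image, the image name my-image and the script lines are hypothetical placeholders, not taken from the pipeline described above. Moving the client image to docker:20.10 alongside the helper service is also an assumption made for tidiness; it is the dind daemon's version that matters for this problem.

```yaml
# Track the 20.10 maintenance line rather than pinning 20.10.6, so that the
# dind helper service runs a daemon newer than 20.10.10.
image: docker:20.10
services:
  - docker:20.10-dind

# Hypothetical job, for illustration only.
build-image:
  script:
    # Print client and daemon versions so the job log records what ran.
    - docker version
    - docker build -t my-image .
```

The docker version line is there only so that the job log shows which daemon version the helper service is actually running, which makes it easy to confirm that it is 20.10.10 or later.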
