
Pulling container images on a node that has no internet

Published: May 16, 2026
oci, container-registry, kubernetes, containerd, alpine, openrc, homelab, supply-chain, devsecops, debugging

Fifth in the KMQ series. Second in the homelab infrastructure subseries that runs alongside it.

Previous in KMQ: the broker from primitives, the ring buffer and backpressure, the ack sidecar retry loop, the TLS sidecar loop.

Previous in infra: a private OCI registry and a pull-through cache.

The infra post set up a private registry on a LAN host and a cache for three public registries on the segment router. Both were exercised from a workstation with full internet access. This post extends the same substrate to a Kubernetes worker that has none.

What this is

A two-segment cluster. The control plane sits in the outer segment with internet access. The worker sits in the inner segment behind a default-deny firewall rule. That rule is the point: anything scheduled to that worker, including anything compromised, cannot reach the public internet.

The worker still runs pods that reference public registries. registry.k8s.io/coredns/coredns:v1.11.3. docker.io/library/alpine:3.21. The pods do not know about the segmentation, and the kubelet does not need to know either [1]. It asks the container runtime to pull. The bytes appear.

Five components are doing distinct jobs behind that sentence. Each one can be reasoned about and verified on its own, which is what makes the rest of this post short [2].

The components

Inside the segment: the worker; a private OCI registry on the same LAN over plain HTTP, conformant with the OCI Distribution Specification [3]; a DNS resolver on the segment router answering homelab names; and a containerd mirror configuration on the worker, under /etc/containerd/certs.d/, that translates each public registry hostname to the private one [4].
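On disk that mirror configuration is one directory per upstream hostname, each holding a single hosts.toml. A sketch of the layout on the worker, showing only the two hostnames this post pulls from (others are added the same way):

/etc/containerd/certs.d/
├── docker.io/
│   └── hosts.toml
└── registry.k8s.io/
    └── hosts.toml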

Outside the segment: the control plane has internet egress and an OCI client that pre-pushes upstream images into the private registry [5]. The push is automated and runs once per Kubernetes version bump, driven by kubeadm config images list. It populates seven images: API server, controller manager, scheduler, proxy, CoreDNS, pause and etcd. The same loop with a different list populates upstream tooling images the cluster will need.
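A minimal sketch of that loop, assuming crane as the OCI client [5] and the plain-HTTP private registry from the infra post; registry.home.arpa:5000 is the hostname used later in this post, and --insecure is assumed here because the destination is not TLS:

# Mirror every image the current kubeadm release wants into the private registry.
for img in $(kubeadm config images list); do
  crane copy --insecure "$img" "registry.home.arpa:5000/${img#*/}"
done

The parameter expansion strips only the upstream hostname, so the repository path the worker will later request through the mirror stays identical.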

What stays manual is the push of internally-built artifacts (the broker, the producer, the test consumer). Those go in after each local build, by hand for now.

What a pull looks like

The kubelet asks containerd for registry.k8s.io/coredns/coredns:v1.11.3. Containerd reads /etc/containerd/certs.d/registry.k8s.io/hosts.toml, finds a mirror pointing to http://registry.home.arpa:5000, opens the connection, fetches the manifest, fetches the blobs and reports success. The pod moves through ContainerCreating to Running.
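For concreteness, a hosts.toml that would produce exactly that flow, filled in with the hostnames above (same shape as the template in the cookbook below; server names the real upstream and acts as the fallback, the [host] block is the local mirror):

# /etc/containerd/certs.d/registry.k8s.io/hosts.toml
server = "https://registry.k8s.io"

[host."http://registry.home.arpa:5000"]
  capabilities = ["pull", "resolve"]
  skip_verify = true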

The hostname registry.k8s.io is never resolved to a public IP from the worker. The firewall is never asked to allow public egress. The bytes never leave the segment.

Three layers have to work for that flow: worker-to-registry TCP routability, DNS resolution of the local hostname, and containerd correctly applying its config. Each is verifiable on its own. When the assembly fails, the move is to isolate the layer that broke before changing anything else [6].
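One command per layer, run from the worker, is enough to tell them apart. A sketch, with the hostnames above, and with nc and dig standing in for whatever equivalents the worker image actually ships:

# Layer 1: TCP routability, worker to registry
nc -vz registry.home.arpa 5000

# Layer 2: DNS resolution of the local hostname against the segment resolver
dig +short registry.home.arpa

# Layer 3: containerd applying its configuration (mirror directory active)
containerd config dump | grep config_path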

Four dead ends on the way

The cookbook below is the version after the dust settled. The path there was four dead ends. Each is a recognisable shape worth writing down.

The pull-through cache did not help. The first instinct was to point the worker at the same cache built for the outer segment. The cache opens an HTTPS connection to registry.k8s.io upstream, which replies with a 307 redirect to a regional CDN backend on a public address. The cache cannot follow the redirect on the worker’s behalf. The worker cannot reach the CDN either. The error from kubectl describe pod named the CDN’s public IP: dial tcp 34.96.108.209:443: connect: connection refused. The IP was the clue. The cache was an inert participant, not a culprit.
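The redirect itself is easy to observe from any host that does have egress. An illustration rather than a transcript of the session; the image path is the one from this post and the Location target varies by region:

# The upstream's answer is a redirect to a public backend,
# which is the hop the isolated worker can never make.
curl -sI https://registry.k8s.io/v2/coredns/coredns/manifests/v1.11.3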

The registry rejected the manifest, not the blobs. The pre-push from the control plane went blob by blob and then failed: MANIFEST_INVALID: manifest invalid; map[...mediaType:application/vnd.docker.distribution.manifest.v2+json]. The private registry follows the OCI Image Specification strictly. To accept Docker V2 schema manifests (which is what registry.k8s.io serves through Google Artifact Registry), it needs http.compat = ["docker2s2"] in its config [7]. Adding the line and restarting the registry let the next run of the loop succeed against existing blobs and put the manifests in. One footnote: the registry converts Docker manifests to OCI on upload, which changes the digest. Pull-by-tag works. Pull-by-original-digest does not.
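Where that line lives, as a minimal sketch of the registry config (zot, per its configuration reference [7]; the port matches the mirror URL above, everything else is illustrative):

{
  "distSpecVersion": "1.1.1",
  "storage": { "rootDirectory": "/var/lib/zot" },
  "http": {
    "address": "0.0.0.0",
    "port": "5000",
    "compat": ["docker2s2"]
  }
}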

The mirror configuration was being ignored entirely. With the pre-push done, the worker still failed to pull, with the same upstream IP in the error. The clue was upstream of the runtime, in the config loader: containerd config dump | grep WARN reported, for every CRI section in the file, “Ignoring unknown key”. The runtime had moved to config schema v3. The file was written for v2 plugin paths (io.containerd.grpc.v1.cri). The new paths are io.containerd.cri.v1.images for the image plugin and io.containerd.cri.v1.runtime for the runtime plugin [8]. Transcribing the same physical settings into the new structure made the runtime parse them.
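The specific stanza this post needed, sketched in both schemas (containerd's own config.toml; only the registry setting is shown, other CRI settings move analogously [8]):

# Config schema v2: the v3 loader skips this and reports "Ignoring unknown key".
# [plugins."io.containerd.grpc.v1.cri".registry]
#   config_path = "/etc/containerd/certs.d"

# Config schema v3: the same setting, now under the images plugin.
version = 3

[plugins."io.containerd.cri.v1.images".registry]
  config_path = "/etc/containerd/certs.d"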

The mirror pointed at the right host, with one wrong word. With v3 in place, containerd config dump showed config_path active and /etc/containerd/certs.d/registry.k8s.io/hosts.toml was being read. The mirror still failed, again with the upstream IP in the error. The hosts.toml had override_path = true. That field tells the runtime not to prepend /v2/ to upstream requests, intended for registries that put their API root elsewhere. The private registry follows the OCI spec, so it does have /v2/. With override_path = true the runtime made requests against URLs the registry returned 404 for, then fell back to the configured server value, which was the unreachable public host. Removing one line fixed it.

The thread through all four: don't think, look [2]. Each fix came from reading what the system reported, in order, layer by layer. The longest single delay was on the third one, where the smoking gun (WARN lines on every CRI key) was visible from the first command, and we initially overlooked it.

Cookbook: adding a new upstream

Three steps. From a host with internet access, pre-push the images:

crane copy <upstream>/<image>:<tag> <private-registry>/<image>:<tag>
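One wrinkle worth naming: the private registry here serves plain HTTP and crane defaults to TLS, so in this setup the copy likely needs --insecure. A filled-in example using the CoreDNS image from earlier:

crane copy --insecure registry.k8s.io/coredns/coredns:v1.11.3 registry.home.arpa:5000/coredns/coredns:v1.11.3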

On the worker, create /etc/containerd/certs.d/<upstream-hostname>/hosts.toml:

server = "https://<upstream-hostname>"

[host."http://<private-registry-hostname>:5000"]
  capabilities = ["pull", "resolve"]
  skip_verify = true

Containerd reads hosts.toml on each pull. No restart needed.

To verify before scheduling pods, issue a pull through the same code path the kubelet uses:

ctr -n k8s.io images pull --hosts-dir /etc/containerd/certs.d <upstream-hostname>/<image>:<tag>

If this succeeds, the kubelet will too. If it fails, the error message identifies which of the three pieces above is missing or wrong. Reproduce before isolating; isolate before changing [2].

What is queued

Pushing internally-built artifacts is not yet automated. That is queued behind the Ansible role for the rest of the homelab.

The private registry serves plain HTTP. Adding TLS through the local CA is queued behind certificate lifecycle automation.

The image catalog is not pruned. Old manifest references accumulate. Cleanup belongs in the same place as rotation.

What this enables matters more than what it is. With image pulls reliable on a default-deny segment, every other piece that depends on container scheduling becomes possible there: cluster DNS, the broker pods, anything the cluster might host next. The constraint that defines the segment is preserved.


References

[1] Kubernetes documentation, “Images”. Pull policy, registry resolution, image identifiers. The kubelet’s behaviour around image pulls is described here. https://kubernetes.io/docs/concepts/containers/images/

[2] David J. Agans, Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems, 2002. The two rules that did the work in this session were rule 3 (“Quit Thinking and Look”) and rule 4 (“Divide and Conquer”). Every fix above came from running a command, reading the output and isolating one layer. The book is short and worth keeping at hand. https://debuggingrules.com/

[3] Open Container Initiative Distribution Specification. Endpoint definitions for /v2/, manifests and blobs; the protocol the private registry speaks. https://github.com/opencontainers/distribution-spec/blob/main/spec.md

[4] containerd, “Registry Host Configuration”. The hosts.toml schema: server, [host."..."], capabilities, override_path, skip_verify. This is the single file that wires public hostnames to private mirrors. https://github.com/containerd/containerd/blob/main/docs/hosts.md

[5] crane copy documentation in the go-containerregistry project. Source-to-destination mirroring of OCI artifacts without intermediate disk. The loop on the control plane is one line per image. https://github.com/google/go-containerregistry/blob/main/cmd/crane/doc/crane_copy.md

[6] W. Richard Stevens, TCP/IP Illustrated, Volume 1: The Protocols, 2nd ed., 2011. Chapter 2 on tracing and layered diagnosis. The principle (“never assume the layers below are working before measuring them”) generalises directly from network stacks to container runtime stacks: routability, DNS, runtime config in this case. https://www.kohala.com/start/tcpipiv1.html

[7] zot configuration reference. http.compat = ["docker2s2"] accepts Docker V2 Schema 2 manifests; the registry converts them to OCI on upload, which changes the manifest digest. https://zotregistry.dev/v2.1.15/admin-guide/admin-configuration/

[8] containerd CRI plugin documentation. Config schema v3 splits the legacy io.containerd.grpc.v1.cri plugin into io.containerd.cri.v1.images (sandbox image, registry, config_path) and io.containerd.cri.v1.runtime (runtimes, cgroup driver). Sections under the old path are silently ignored with a WARN on startup. https://github.com/containerd/containerd/blob/main/docs/cri/config.md