Overview
oci2bin converts any Docker (OCI) container image into a single, self-contained executable. It requires no Docker daemon, no container runtime, and no installation on the target machine, because the binary is statically linked. Just copy the file over and run it.
The output is a polyglot file: it is simultaneously a valid ELF64 executable and a valid POSIX tar archive, and that archive is itself an OCI image.
How It Works
The Polyglot Format
The output is a polyglot file — valid as both an ELF64 executable and a POSIX tar archive simultaneously. This works because the two formats’ magic bytes don’t collide: ELF magic (7f 45 4c 46) lives at byte 0, while tar’s ustar magic sits at byte 257. The 64-byte ELF header fits entirely within the tar filename field (bytes 0–99), and the remaining tar header fields fill bytes 64–511 without touching the ELF magic. See sysfatal’s blog post for more on the ELF+TAR polyglot technique.
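A minimal Python sketch of why the two magic values can coexist. The helper name `looks_like_elf_tar_polyglot` and the synthetic 512-byte block are illustrative assumptions, not part of oci2bin; only the magic values and their offsets come from the two formats.

```python
ELF_MAGIC = b"\x7fELF"   # bytes 0-3 of every ELF file
USTAR_MAGIC = b"ustar"   # bytes 257-261 of a POSIX (ustar) tar header
USTAR_OFFSET = 257


def looks_like_elf_tar_polyglot(header: bytes) -> bool:
    """Return True if the first 512 bytes carry both magic values."""
    if len(header) < 512:
        return False
    has_elf = header[:4] == ELF_MAGIC
    has_tar = header[USTAR_OFFSET:USTAR_OFFSET + 5] == USTAR_MAGIC
    return has_elf and has_tar


# Synthetic block: ELF magic inside the tar name field, ustar magic at its
# fixed offset, everything else left as NUL bytes.
block = bytearray(512)
block[0:4] = ELF_MAGIC
block[USTAR_OFFSET:USTAR_OFFSET + 5] = USTAR_MAGIC

print(looks_like_elf_tar_polyglot(bytes(block)))
```

This only checks the magic bytes; a real tar reader also validates the header checksum, which the sketch does not compute.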
The file layout:
[0-63] ELF64 header (embedded in tar's filename field)
[64-511] Remaining tar header (ustar magic at byte 257)
[512-4095] NUL padding (page-aligns the loader for mmap)
[4096-~75K] Loader binary (statically linked C)
[~75K-end] OCI image tar (manifest.json, config, layer tarballs)
[EOF] Metadata block (image name, digest, timestamp)
The build_polyglot.py script parses the loader’s ELF to extract entry points and program headers, then builds a synthetic ELF64 header and wraps it in a tar header structure. The loader’s program header offsets are shifted by PAGE_SIZE (4096 bytes) to account for the tar header prefix, while keeping virtual addresses unchanged — maintaining the p_offset % PAGE_SIZE == p_vaddr % PAGE_SIZE invariant required for mmap.
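The offset shift can be sketched in Python as follows. This is not the code from build_polyglot.py; `shift_phdr` is a hypothetical helper, and the sample segment values are made up. The field layout is the standard little-endian Elf64_Phdr.

```python
import struct

PAGE_SIZE = 4096
# Elf64_Phdr fields: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (little-endian, 56 bytes in total).
PHDR_FORMAT = "<IIQQQQQQ"


def shift_phdr(raw: bytes, shift: int = PAGE_SIZE) -> bytes:
    """Move a segment `shift` bytes later in the file; addresses stay put."""
    if shift % PAGE_SIZE != 0:
        raise ValueError("shift must be a multiple of the page size")
    fields = list(struct.unpack(PHDR_FORMAT, raw))
    p_offset, p_vaddr = fields[2], fields[3]
    if p_offset % PAGE_SIZE != p_vaddr % PAGE_SIZE:
        raise ValueError("segment is not mmap-compatible to begin with")
    # Adding a whole number of pages keeps p_offset % PAGE_SIZE unchanged,
    # so the congruence with p_vaddr still holds afterwards.
    fields[2] = p_offset + shift
    return struct.pack(PHDR_FORMAT, *fields)


# Made-up PT_LOAD segment (type 1, flags R+X = 5).
original = struct.pack(PHDR_FORMAT, 1, 5, 0x1000, 0x401000, 0x401000,
                       0x2000, 0x2000, PAGE_SIZE)
shifted = struct.unpack(PHDR_FORMAT, shift_phdr(original))
print(hex(shifted[2]), hex(shifted[3]))
```

Because the shift is exactly one page, no virtual address needs to change for the kernel to accept the mapping.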
The loader binary contains placeholder marker values that are patched at build time:
- OCI_DATA_OFFSET: patched with the byte offset where the OCI tar begins
- OCI_DATA_SIZE: patched with the OCI tar’s size in bytes
- OCI_PATCHED: set to confirm patching succeeded
This is how the loader knows where to find the embedded image inside itself.
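A rough Python sketch of this kind of build-time patching. It assumes each placeholder is a unique 8-byte sentinel that gets overwritten with a little-endian 64-bit value; the sentinel bytes and the `patch_u64` helper are hypothetical, since the real marker encoding is not described here.

```python
import struct

# Hypothetical sentinels standing in for the real placeholder values.
OFFSET_SENTINEL = b"OCIOFFS!"
SIZE_SENTINEL = b"OCISIZE!"


def patch_u64(blob: bytearray, sentinel: bytes, value: int) -> None:
    """Overwrite the single occurrence of `sentinel` with `value` (LE u64)."""
    if len(sentinel) != 8:
        raise ValueError("sentinel must be exactly 8 bytes")
    first = blob.find(sentinel)
    if first < 0:
        raise ValueError("sentinel not found")
    if blob.find(sentinel, first + 1) >= 0:
        raise ValueError("sentinel is ambiguous")
    blob[first:first + 8] = struct.pack("<Q", value)


# Stand-in for the compiled loader: padding around the two placeholders.
loader = bytearray(b"\x00" * 16 + OFFSET_SENTINEL + b"\x00" * 8
                   + SIZE_SENTINEL + b"\x00" * 16)
patch_u64(loader, OFFSET_SENTINEL, 4096 + len(loader))
patch_u64(loader, SIZE_SENTINEL, 123456)

print(struct.unpack_from("<Q", loader, 16)[0])
```

Rejecting a missing or duplicated sentinel is a design choice of the sketch: it makes a silent mis-patch impossible.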
The Loader
When you execute the binary, the loader (~2800 lines of statically linked C, no dependencies) runs the following sequence:
- Reads /proc/self/exe to find its own path
- Verifies the OCI_PATCHED marker, then seeks to OCI_DATA_OFFSET and extracts the embedded OCI tar to a temp directory (mkdtemp, mode 0700)
- Parses manifest.json from the extracted OCI tar to get the layer list and config
- Extracts each image layer in order into a rootfs/, validating every path against traversal attacks (rejects .. components and absolute paths)
- Patches the rootfs for single-UID namespace operation:
  - Rewrites /etc/passwd and /etc/group: maps all UIDs/GIDs to 0 (except nobody at 65534)
  - Writes /etc/apt/apt.conf.d/99oci2bin to disable apt’s sandbox (which would fail under UID remapping)
  - Replaces /etc/resolv.conf with the host’s actual resolver config (the chroot can’t follow symlinks outside the rootfs)
- Enters the user namespace (CLONE_NEWUSER) and maps container UID/GID 0 to the invoking user’s real UID/GID; no real root privileges are gained
- Enters mount, PID, and UTS namespaces (CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS), and optionally the network namespace (CLONE_NEWNET with --net none)
- Forks: the child becomes PID 1 in the new PID namespace, the parent waits and cleans up the temp directory on exit
- Child sets up bind mounts for volumes, secrets, SSH agent forwarding, and optionally an overlayfs for --read-only mode
- Chroots into the rootfs, mounts /proc, /tmp, /dev (with device nodes: null, zero, random, urandom, tty), applies seccomp filter and capability restrictions
- Execs the entrypoint
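The manifest-parsing step in the sequence above can be sketched in Python. This assumes the `docker save` layout, where manifest.json is a JSON array whose entries carry `Config`, `RepoTags`, and `Layers` keys; the `read_manifest` helper and the sample file names are illustrative, and the real loader does this in C.

```python
import json


def read_manifest(manifest_text: str) -> tuple[str, list[str]]:
    """Return (config path, ordered layer paths) for the first image entry."""
    entries = json.loads(manifest_text)
    if not isinstance(entries, list) or not entries:
        raise ValueError("manifest.json must be a non-empty JSON array")
    first = entries[0]
    # Layers are listed bottom-up, so extracting them in order lets later
    # layers overwrite files from earlier ones.
    return first["Config"], list(first["Layers"])


# Illustrative manifest; real paths are content-addressed.
sample = json.dumps([{
    "Config": "abc123.json",
    "RepoTags": ["alpine:latest"],
    "Layers": ["layer0/layer.tar", "layer1/layer.tar"],
}])

config, layers = read_manifest(sample)
print(config, layers)
```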
All rootless. No daemon, no suid helper. The only dependency on the target system is tar.
Security
Seccomp-BPF filtering. The loader installs a BPF filter (with architecture validation) that blocks 16 dangerous syscalls by default: kexec_load, kexec_file_load, reboot, syslog, perf_event_open, bpf, add_key, request_key, keyctl, userfaultfd, pivot_root, ptrace, process_vm_readv, process_vm_writev, init_module, and finit_module. The filter is applied with SECCOMP_FILTER_FLAG_TSYNC to cover all threads, falling back to prctl(PR_SET_SECCOMP) if that is unavailable. The filter can be disabled with --no-seccomp.
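To make the shape of such a filter concrete, here is a Python sketch that builds a deny-list program and evaluates it with a tiny interpreter instead of installing it. It is not the loader's filter: it covers only x86_64 and four of the sixteen syscalls, and the choices to kill on an architecture mismatch and to return EPERM for blocked calls are assumptions. The opcode and return-value constants are the standard ones from the Linux headers.

```python
import struct

LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS
JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K
RET_K = 0x06      # BPF_RET | BPF_K

AUDIT_ARCH_X86_64 = 0xC000003E
RET_KILL_PROCESS = 0x80000000
RET_ERRNO = 0x00050000
RET_ALLOW = 0x7FFF0000
EPERM = 1

# x86_64 syscall numbers for a subset of the blocked list.
BLOCKED = {"ptrace": 101, "reboot": 169, "kexec_load": 246, "bpf": 321}


def build_filter(blocked_numbers):
    """Return a list of (code, jt, jf, k) instructions."""
    prog = [
        (LD_W_ABS, 0, 0, 4),                  # load seccomp_data.arch
        (JEQ_K, 1, 0, AUDIT_ARCH_X86_64),     # expected arch: skip the kill
        (RET_K, 0, 0, RET_KILL_PROCESS),
        (LD_W_ABS, 0, 0, 0),                  # load seccomp_data.nr
    ]
    for nr in blocked_numbers:
        prog.append((JEQ_K, 0, 1, nr))        # no match: skip the deny
        prog.append((RET_K, 0, 0, RET_ERRNO | EPERM))
    prog.append((RET_K, 0, 0, RET_ALLOW))
    return prog


def encode(prog) -> bytes:
    """Pack as little-endian struct sock_filter entries (8 bytes each)."""
    return b"".join(struct.pack("<HBBI", *insn) for insn in prog)


def simulate(prog, arch: int, nr: int) -> int:
    """Interpret the three opcodes used above and return the verdict."""
    data = {0: nr, 4: arch}
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == LD_W_ABS:
            acc, pc = data[k], pc + 1
        elif code == JEQ_K:
            pc += 1 + (jt if acc == k else jf)
        else:
            return k


prog = build_filter(BLOCKED.values())
print(hex(simulate(prog, AUDIT_ARCH_X86_64, BLOCKED["ptrace"])))
print(hex(simulate(prog, AUDIT_ARCH_X86_64, 0)))  # syscall 0 is read
```

Checking the architecture first matters because syscall numbers differ between ABIs; without it, a deny list written for one ABI could be bypassed through another.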
Privilege escalation prevention. PR_SET_NO_NEW_PRIVS is set before exec, preventing privilege gains through setuid binaries or file capabilities. The user namespace maps only a single UID/GID — container root is just the invoking user on the host.
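The single-ID mapping can be illustrated with the text an unprivileged process writes into its namespace files. This Python sketch only builds the strings and does not touch /proc; the helper names are hypothetical. The `setgroups` entry reflects the kernel rule that an unprivileged process must write `deny` there before it may write `gid_map`.

```python
def id_map_line(host_id: int, container_id: int = 0) -> str:
    """One mapping line: <id inside namespace> <id outside> <range length>."""
    return f"{container_id} {host_id} 1\n"


def namespace_id_files(host_uid: int, host_gid: int) -> list[tuple[str, str]]:
    """Return (path, content) pairs in an order the kernel accepts."""
    return [
        ("/proc/self/uid_map", id_map_line(host_uid)),
        ("/proc/self/setgroups", "deny"),
        ("/proc/self/gid_map", id_map_line(host_gid)),
    ]


for path, content in namespace_id_files(1000, 1000):
    print(path, repr(content))
```

A range length of 1 is what makes container root equivalent to exactly one host user and nothing more.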
Capability management. Linux capabilities can be dropped and selectively re-added (--cap-drop all --cap-add NET_BIND_SERVICE). The loader manages the permitted set, inheritable set, ambient set, and the bounding set — dropping from the bounding set is permanent for the process lifetime.
Tar extraction safety. Layers are extracted with --no-same-permissions --no-same-owner, stripping setuid/setgid bits and ignoring the original file ownership. Every layer path is validated to reject .. components and absolute paths before extraction.
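The path check described here amounts to a few lines. The following Python sketch shows the idea; `is_safe_member_path` is a hypothetical name, and the real check lives in the C loader and may cover more cases.

```python
def is_safe_member_path(name: str) -> bool:
    """Reject empty names, absolute paths, and any '..' path component."""
    if not name or name.startswith("/"):
        return False
    return ".." not in name.split("/")


for candidate in ["etc/passwd", "../etc/passwd", "a/../../b",
                  "/etc/shadow", "a/..b/c"]:
    print(candidate, is_safe_member_path(candidate))
```

Comparing whole components rather than searching for the substring `..` keeps legitimate names such as `..b` working.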
Symlink attack prevention. Secret files, SSH agent sockets, and device mounts are created with O_CREAT | O_EXCL, which fails atomically if the path already exists — preventing a malicious image layer from planting symlinks that redirect mounts outside the rootfs.
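The effect of O_CREAT | O_EXCL on a planted symlink is easy to demonstrate. This Python sketch uses a throwaway directory; `create_exclusive` is a hypothetical helper, not the loader's code.

```python
import os
import tempfile


def create_exclusive(path: str) -> bool:
    """Create `path` only if nothing, not even a symlink, is already there."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as root:
    planted = os.path.join(root, "secret")
    # Simulates a link planted by an image layer; the target need not exist.
    os.symlink("/nonexistent/target", planted)
    clean = os.path.join(root, "other")
    print(create_exclusive(planted), create_exclusive(clean))
```

With both flags set, the kernel refuses to follow a symlink in the final path component, so the planted link is never dereferenced.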
Temp directory isolation. All extraction happens in a mkdtemp-created directory (mode 0700), cleaned up by the parent process on exit.
Installation
Build Dependencies
- gcc – C compiler
- glibc-static – static C library headers (Debian: libc6-dev)
- python3 – build scripts (stdlib only, no pip packages)
- docker – to pull/save images (skopeo support is next)
Install
git clone https://github.com/latedeployment/oci2bin
cd oci2bin
make
make install PREFIX=/usr/local
The loader is compiled automatically on first invocation if not already built.
Usage
Basic
# Convert an image to an executable
oci2bin alpine:latest
# This produces ./alpine_latest
./alpine_latest
# Override the entrypoint
./alpine_latest /bin/sh
That’s it. alpine_latest is a standalone binary you can scp to any Linux box and run.
Build-Time Options
# Cross-compile for aarch64
oci2bin --arch aarch64 alpine:latest
# Strip docs, man pages, locales, apt caches from the image
oci2bin --strip myapp:latest
# Inject files at build time
oci2bin --add-file config.yml:/etc/app/config.yml myapp:latest
# Inject an entire directory
oci2bin --add-dir certs:/etc/ssl/certs myapp:latest
# Merge additional layers on top
oci2bin --layer debugtools:latest myapp:latest
# Cache builds (keyed by image digest)
oci2bin --cache --strip myapp:latest
Runtime Flags
The generated binary accepts Docker-like flags:
# Environment variables
./myapp -e DB_HOST=localhost -e DB_PORT=5432
./myapp --env-file .env
# Volumes
./myapp -v /host/data:/data
./myapp -v $(pwd)/logs:/var/log/app
# Secrets (read-only mounts)
./myapp --secret api_key.txt:/run/secrets/api_key
# Forward SSH agent
./myapp --ssh-agent
# Networking
./myapp --net host # default, shares host network
./myapp --net none # isolated, no network
# Read-only rootfs (overlayfs copy-on-write)
./myapp --read-only
# Run as specific user
./myapp --user 1000:1000
# Resource limits
./myapp --ulimit nofile=1024 --ulimit nproc=512
# Capabilities
./myapp --cap-drop all --cap-add NET_BIND_SERVICE
# Device access
./myapp --device /dev/sda:/dev/sda
# Detach (fork to background)
./myapp -d
# Init process (reaps zombies, forwards signals)
./myapp --init
# Separate flags from args
./myapp -- -v # the -v here is passed to the entrypoint, not oci2bin
Subcommands
# Inspect embedded metadata
oci2bin inspect ./myapp
# List cached binaries
oci2bin list
oci2bin list --json
# Prune outdated cache entries
oci2bin prune
oci2bin prune --dry-run
# Diff two binaries' filesystems
oci2bin diff ./myapp_v1 ./myapp_v2
Use Cases
- Ship a Service Without Docker
- Air-Gapped Environments
- Reproducible Dev Environments
- Lock Down a Container
The Polymorphic Binary Trick
The same file works in three ways:
# As an executable
./myapp
# As a tar archive
tar tf ./myapp
# As a Docker image
docker load < ./myapp
docker run myapp:latest
So if you ever do need Docker again, the binary is a valid image you can docker load right back.