Tech Study Guide
Containerization, OCI, and VMs
Container fundamentals, OCI image/runtime/distribution standards, Linux namespaces, cgroups, overlay layers, bridge networking, macOS and Windows container behavior, KVM, hypervisors, and VM/container tradeoffs.
Containers are isolated processes, not tiny full computers. On Linux, container runtimes combine kernel features such as namespaces, cgroups, capabilities, seccomp, filesystem mounts, and networking to make one or more processes see a constrained view of the host. Virtual machines isolate a whole guest operating system behind virtual hardware and a hypervisor.
Command Examples
docker info
docker image inspect <image>
docker container inspect <container>
cat /proc/<pid>/status
cat /proc/<pid>/cgroup
readlink /proc/<pid>/ns/*
cat /proc/<pid>/mountinfo
lsns -p <pid>
docker network inspect bridge
ip link show type bridge
bridge link
nft list ruleset
Example output and meaning:
| Command | Example output | What it does |
|---|---|---|
| `cat /proc/<pid>/cgroup` | `0::/system.slice/docker-<id>.scope` or container cgroup paths. | Shows which cgroup owns the process and its resource accounting boundary. |
| `readlink /proc/<pid>/ns/*` | Namespace IDs such as `net:[4026532741]` and `mnt:[4026532738]`. | Proves which namespaces differ from the host or another process. |
| `docker network inspect bridge` | Bridge subnet, gateway, containers, and options. | Shows how container IPs connect to the host bridge and NAT path. |
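The namespace IDs in the table can be compared mechanically. The following is a minimal sketch, assuming link targets in the `kind:[inode]` form shown above; the helper names and the sample inode values for the host process are hypothetical.

```python
import re

# Link targets under /proc/<pid>/ns look like "net:[4026532741]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")

def parse_ns_link(link: str) -> tuple[str, int]:
    """Split a link target such as 'net:[4026532741]' into (kind, inode)."""
    match = NS_LINK.match(link.strip())
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match["kind"], int(match["inode"])

def differing_namespaces(a: list[str], b: list[str]) -> set[str]:
    """Return the namespace kinds whose inode differs between two processes."""
    left = dict(parse_ns_link(item) for item in a)
    right = dict(parse_ns_link(item) for item in b)
    return {kind for kind in left.keys() & right.keys() if left[kind] != right[kind]}

# Hypothetical readlink output for a host process and a container process.
host = ["net:[4026531840]", "mnt:[4026531841]", "uts:[4026531838]"]
container = ["net:[4026532741]", "mnt:[4026532738]", "uts:[4026531838]"]
print(sorted(differing_namespaces(host, container)))
```

Two processes that share an inode for a namespace kind are in the same namespace of that kind, which is why the `uts` entry is not reported here.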
VM vs Container
| Feature | Container | Virtual Machine |
|---|---|---|
| Kernel | Shares the host kernel or a VM-provided kernel. | Runs its own guest kernel. |
| Boundary | Process isolation plus kernel-enforced resource and namespace boundaries. | Hardware virtualization boundary around a guest OS. |
| Startup | Usually fast: start processes and set isolation. | Usually slower: boot or resume a guest OS. |
| Image | Application filesystem layers plus metadata. | Disk image with OS, bootloader, kernel, and userspace. |
| Density | High, because containers share one kernel. | Lower, because each VM carries an OS. |
| Compatibility | Must match kernel family and CPU architecture expectations. | Can run different OS kernels supported by the hypervisor. |
| Security model | Strong but still shares kernel attack surface unless nested in a VM. | Stronger kernel boundary, more overhead. |
The practical rule: use containers to package and isolate applications that can share a kernel. Use VMs when you need a different kernel, stronger tenant isolation, kernel modules, full boot behavior, or OS-level compatibility.
How Linux Containers Work
Linux containers are possible because several kernel features compose:
| Feature | What It Does |
|---|---|
| Mount namespace | Gives a process its own view of mounted filesystems. |
| PID namespace | Gives a process tree its own PID view, often making the app PID 1 inside the container. |
| Network namespace | Gives isolated interfaces, routes, sockets, firewall state, and loopback. |
| IPC namespace | Isolates System V IPC and POSIX message queues. |
| UTS namespace | Isolates hostname and domain name. |
| User namespace | Maps container users to different host users, enabling safer rootless behavior. |
| cgroups | Account and limit CPU, memory, I/O, PIDs, and other resources. |
| Capabilities | Split root privileges into smaller privileges such as network admin or raw sockets. |
| seccomp | Filters syscalls available to a process. |
| LSMs | SELinux, AppArmor, and similar systems add mandatory access-control policy. |
| Overlay filesystem | Combines read-only image layers with a writable upper layer. |
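To make the user namespace row concrete, here is a small sketch of how an ID map translates container IDs to host IDs. It assumes the three-column format of `/proc/<pid>/uid_map` (ID inside the namespace, ID outside, range length); the function names and the sample mapping are illustrative, not taken from any runtime.

```python
def parse_id_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map or gid_map content into (inside, outside, length) entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = map(int, fields)
            entries.append((inside, outside, length))
    return entries

def to_host_id(container_id: int, id_map: list[tuple[int, int, int]]):
    """Translate an ID seen inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in id_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None

# Hypothetical rootless mapping: container UID 0 is host UID 100000.
sample_map = parse_id_map("0 100000 65536\n")
print(to_host_id(0, sample_map), to_host_id(1000, sample_map), to_host_id(70000, sample_map))
```

With a mapping like this, a process that is UID 0 inside the container is an unprivileged UID on the host, which is the basis of the safer rootless behavior mentioned in the table.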
A runtime starts a container roughly like this:
- Prepare the root filesystem from image layers.
- Create namespaces for the process.
- Configure mounts, bind mounts, and the working directory.
- Configure cgroups for CPU, memory, pids, and I/O limits.
- Drop capabilities, set seccomp/AppArmor/SELinux policy, and set user mappings.
- Configure networking, often with veth pairs, bridges, NAT, or CNI.
- `exec` the configured process.
The result feels like a small machine because pathnames, process IDs, network interfaces, users, and resource limits look local. It is still one or more host processes enforced by the kernel.
From Image to Running Process
A container image is stored data. It does not run. The kernel runs a normal process whose root filesystem, namespace membership, credentials, resource controls, and mounts were prepared by container tooling.
The path from registry image to process usually looks like this:
- A client asks for a tag such as `app:1.2.3`.
- The registry returns an OCI image index or manifest. Multi-architecture images use an index to point at per-platform manifests such as `linux/amd64` or `linux/arm64`.
- The runtime downloads content-addressed blobs: image config and compressed layer blobs.
- Layers are verified by digest, decompressed, and unpacked into runtime storage.
- The runtime creates a mounted root filesystem, often through `overlayfs`.
- The runtime writes or derives an OCI runtime bundle: a root filesystem plus `config.json`.
- An OCI runtime such as `runc` or `crun` creates namespaces, joins cgroups, sets mounts, applies security policy, and `exec`s the configured command.
flowchart TB
Ref[Image reference: app:1.2.3] --> Resolve[Resolve tag to digest]
Resolve --> Manifest[Fetch OCI index or manifest]
Manifest --> Blobs[Download config and layer blobs]
Blobs --> Verify[Verify content digests]
Verify --> Unpack[Unpack layers into snapshotter storage]
Unpack --> Rootfs[Create overlayfs root filesystem]
Rootfs --> Bundle[Build OCI runtime bundle and config.json]
Bundle --> Runtime[runc / crun]
Runtime --> Kernel[clone / unshare / setns / cgroups / mounts / seccomp]
Kernel --> Exec[exec container process]
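The "fetch OCI index or manifest" step can be sketched as a lookup over the index's `manifests` list. This is a simplified illustration, assuming an index shaped like the OCI image index format; the digests are placeholders and `select_manifest` is a hypothetical helper, not a function from any client library.

```python
from typing import Optional

def select_manifest(index: dict, os_name: str, arch: str) -> Optional[str]:
    """Return the digest of the first manifest matching the requested platform."""
    for descriptor in index.get("manifests", []):
        platform = descriptor.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return descriptor["digest"]
    return None  # no manifest published for this platform

# Placeholder index with two per-platform manifests.
sample_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "size": 1234,
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "size": 1234,
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}
print(select_manifest(sample_index, "linux", "arm64"))
```

Real clients also consider fields such as the platform variant; this sketch matches only on OS and architecture.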
Useful inspection commands:
docker image inspect nginx:latest
docker image history nginx:latest
docker container inspect <container> --format '{{.State.Pid}}'
findmnt -T /var/lib/docker
findmnt -o TARGET,SOURCE,FSTYPE,OPTIONS | grep overlay
readlink /proc/<pid>/root
cat /proc/<pid>/mountinfo
The important mental model: container images describe a filesystem and process configuration. The kernel only sees mounted filesystems, tasks, credentials, namespaces, cgroups, sockets, and security labels.
What the Kernel Enforces
The Linux kernel does not know that an OCI image tag exists. Runtime software translates image metadata into kernel operations.
| Runtime Intent | Kernel Mechanism |
|---|---|
| Isolate process IDs | clone, unshare, setns, and PID namespaces. |
| Give a private filesystem view | Mount namespaces, bind mounts, overlay mounts, pivot_root, and sometimes chroot. |
| Limit memory, CPU, IO, and process count | cgroup controller files under /sys/fs/cgroup. |
| Reduce root privilege | Linux capabilities, user namespaces, UID/GID maps, and no_new_privs. |
| Restrict syscalls | seccomp filters evaluated on syscall entry. |
| Apply mandatory policy | SELinux, AppArmor, or another Linux Security Module. |
| Isolate network view | Network namespaces, veth devices, routes, bridge ports, and netfilter state. |
Security boundary layering:
flowchart TB
Process[Container process]
Process --> Creds[UID/GID, user namespace, no_new_privs]
Creds --> Caps[Capabilities: narrow privileged operations]
Caps --> Seccomp[seccomp: syscall allow/deny]
Seccomp --> LSM[AppArmor / SELinux labels and policy]
LSM --> Mounts[Mount namespace and readonly/bind mounts]
Mounts --> Cgroup[cgroup CPU, memory, IO, pids, devices]
Cgroup --> Net[Network namespace, veth, bridge, netfilter]
Net --> Kernel[Shared Linux kernel]
This layering is defense in depth, not a single wall. A process can be UID 0 inside a container but still lack CAP_SYS_ADMIN, be blocked by seccomp from a syscall, be denied by AppArmor or SELinux, see only a readonly mount tree, and be capped by cgroup controllers.
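One way to check the "UID 0 but no CAP_SYS_ADMIN" case is to decode the `CapEff` bitmask from `/proc/<pid>/status`. The sketch below assumes the Linux capability bit numbers for three capabilities; the sample mask is a value often reported for Docker's default capability set and should be treated as an assumption, not as what any given container has.

```python
# Bit positions from the Linux capability numbering (subset only).
CAP_BITS = {"CAP_NET_ADMIN": 12, "CAP_NET_RAW": 13, "CAP_SYS_ADMIN": 21}

def has_capability(cap_eff_hex: str, name: str) -> bool:
    """Return True if the named capability bit is set in a CapEff hex string."""
    mask = int(cap_eff_hex, 16)
    return bool((mask >> CAP_BITS[name]) & 1)

# Assumed sample: a restricted container mask and a mask with every low bit set.
restricted = "00000000a80425fb"
unrestricted = "000001ffffffffff"
for cap in CAP_BITS:
    print(cap, has_capability(restricted, cap), has_capability(unrestricted, cap))
```

A process with the restricted mask can open raw sockets but cannot perform operations gated on `CAP_SYS_ADMIN` or `CAP_NET_ADMIN`, even when it runs as UID 0.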
Because the runtime translates image metadata and intent into kernel state, container debugging often leaves the container CLI quickly and moves into /proc, /sys/fs/cgroup, findmnt, lsns, ip, bridge, nft, and ss. Those tools show the kernel state that actually enforces the container boundary.
OCI Standards
The Open Container Initiative defines interoperable container standards. The three practical specs are:
| OCI Spec | Purpose |
|---|---|
| Image Specification | Defines container image layout, manifests, configs, layers, descriptors, and digests. |
| Runtime Specification | Defines how to run an unpacked filesystem bundle with a config.json. |
| Distribution Specification | Defines registry API behavior for pushing and pulling content-addressed images. |
Common runtime chain:
docker or nerdctl CLI
containerd or CRI-O
runc, crun, or another OCI runtime
Linux kernel namespaces, cgroups, mounts, capabilities, seccomp
Kubernetes adds the Container Runtime Interface (CRI) between kubelet and the runtime. kubelet asks a CRI implementation such as containerd or CRI-O to create Pod sandboxes and containers. The low-level OCI runtime still creates the isolated process.
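To give a feel for what the Runtime Specification's `config.json` contains, here is a deliberately incomplete sketch that builds one as a Python dictionary. The field names follow the OCI runtime configuration as I understand it; the version string, environment, and namespace selection are assumptions, and a real bundle (for example one generated by `runc spec`) carries many more settings such as mounts and capabilities.

```python
import json

def minimal_runtime_config(args: list[str], rootfs: str = "rootfs") -> dict:
    """Build a reduced OCI-style runtime config for illustration only."""
    return {
        "ociVersion": "1.0.2",  # assumed spec version
        "process": {
            "args": list(args),
            "cwd": "/",
            "env": ["PATH=/usr/bin:/bin"],
            "user": {"uid": 0, "gid": 0},
        },
        "root": {"path": rootfs, "readonly": True},
        "linux": {
            # Each entry asks the runtime to create a new namespace of that type.
            "namespaces": [
                {"type": kind} for kind in ("pid", "network", "ipc", "uts", "mount")
            ],
        },
    }

config = minimal_runtime_config(["/bin/sh"])
print(json.dumps(config, indent=2))
```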
Images, Layers, and Registries
An image is not one tarball in normal operation. It is a set of content-addressed objects:
- manifest or index,
- image config,
- filesystem layers,
- descriptors with media types, sizes, and digests.
Layers are usually read-only and shared across images when their digests match. A running container gets a writable layer on top. Deleting a container can remove that writable layer, which is why persistent data belongs in volumes, bind mounts, databases, object storage, or another external state path.
Registries store and serve image content. Tags are mutable names. Digests are immutable content identifiers. For production deployment, pinning or recording digests gives stronger evidence of exactly what ran.
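Content addressing means a blob can be checked against its digest after download. A minimal sketch, assuming `sha256` digests written as `sha256:<hex>`; `verify_blob` is a hypothetical helper and the blob bytes are made up.

```python
import hashlib

def verify_blob(data: bytes, expected_digest: str) -> bool:
    """Check blob bytes against a digest string of the form 'sha256:<hex>'."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    return hashlib.sha256(data).hexdigest() == expected_hex

blob = b"example layer bytes"  # stand-in for a downloaded layer
digest = "sha256:" + hashlib.sha256(blob).hexdigest()
print(verify_blob(blob, digest))           # matches the recorded digest
print(verify_blob(blob + b"x", digest))    # any change to the bytes breaks the match
```

A tag can be repointed at different content at any time, while a digest can only ever match the bytes it was computed from.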
Layer Mechanics and overlayfs
Layering is what makes images reusable, cacheable, and cheap to start.
| Term | Meaning |
|---|---|
| Manifest | Points to the image config and layer descriptors for one platform. |
| Index | Points to multiple manifests, usually for different CPU architectures or OS platforms. |
| Config | Stores command, environment, user, working directory, exposed ports, labels, and rootfs diff IDs. |
| Layer blob | A compressed filesystem diff stored by digest in a registry or local content store. |
| diffID | Digest of an uncompressed layer diff. |
| Chain ID | Identifier derived from the ordered sequence of unpacked layer diffs. |
| Writable layer | Per-container upper layer that records changes made after start. |
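The Chain ID row can be illustrated with the recursive rule from the OCI image specification as I read it: the chain ID of the first layer is its diffID, and each later chain ID is the digest of the previous chain ID, a space, and the next diffID. The diffID values below are shortened placeholders.

```python
import hashlib

def chain_ids(diff_ids: list[str]) -> list[str]:
    """Compute one chain ID per layer from an ordered list of diffIDs."""
    chain: list[str] = []
    for diff_id in diff_ids:
        if not chain:
            chain.append(diff_id)  # first layer: chain ID equals its diffID
        else:
            payload = f"{chain[-1]} {diff_id}".encode()
            chain.append("sha256:" + hashlib.sha256(payload).hexdigest())
    return chain

# Placeholder diffIDs; real ones are full sha256 digests of uncompressed layers.
layers = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
for layer, chain_id in zip(layers, chain_ids(layers)):
    print(layer, "->", chain_id)
```

Because each chain ID depends on everything beneath it, the same diff applied on top of a different base yields a different chain ID.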
On Linux Docker installations, the storage driver is often overlay2, which uses kernel overlayfs.
lowerdir = read-only image layers
upperdir = writable per-container layer
workdir = overlayfs working directory
merged = mounted view presented to the container
When a process reads a file, overlayfs searches from the top layer down through lower layers. When a process writes to a file that exists in a lower layer, overlayfs performs copy-up: it copies the file into the upperdir and modifies that copy. When a process deletes a file from a lower layer, overlayfs records a whiteout in the upper layer so the file disappears from the merged view.
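The lookup, copy-up, and whiteout rules above can be modeled with plain dictionaries. This is a toy model for study purposes, not how overlayfs is implemented: layers are dicts from path to bytes, directories and whiteouts inside lower layers are ignored, and `ToyOverlay` is an invented name.

```python
WHITEOUT = object()  # marker stored in the upper layer to hide a lower file

class ToyOverlay:
    def __init__(self, lower_layers: list[dict]):
        self.lowers = list(lower_layers)  # read-only layers, lowest first
        self.upper: dict = {}             # writable per-container layer

    def read(self, path: str) -> bytes:
        if path in self.upper:
            value = self.upper[path]
            if value is WHITEOUT:
                raise FileNotFoundError(path)
            return value
        for layer in reversed(self.lowers):  # search from the top layer down
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path: str, data: bytes) -> None:
        # Copy-up: the whole file lands in the upper layer, then it is modified.
        self.upper[path] = self.read(path) + data

    def delete(self, path: str) -> None:
        self.read(path)               # raises if the file is not visible
        self.upper[path] = WHITEOUT   # lower bytes stay, the merged view hides them

base = {"/etc/motd": b"hello"}
fs = ToyOverlay([base])
fs.append("/etc/motd", b"!")
print(fs.read("/etc/motd"), base["/etc/motd"])
```

The lower layer is never modified, which is what lets many containers share the same image layers.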
Practical consequences:
- Rewriting a large file from a lower layer can copy the whole file into the writable layer.
- Deleting a file from an earlier image layer does not remove the bytes from that earlier layer; it hides the file in later layers.
- Volumes and bind mounts bypass the image writable layer at their mount paths.
- Image layer count, build cache order, and package manager cleanup affect size and rebuild speed.
- Overlay semantics can matter for databases and write-heavy workloads; persistent database data should live on a real volume or host filesystem chosen for that workload.
Example checks:
docker inspect <container> --format '{{json .GraphDriver.Data}}'
findmnt -t overlay
du -sh /var/lib/docker/overlay2/* 2>/dev/null
cgroups and Resource Control
cgroups account for and limit resource use. They do not make a process believe it has a private CPU or private memory bus. Namespaces change what a process can see; cgroups control what it can consume.
Most modern distributions use cgroup v2, a unified hierarchy under /sys/fs/cgroup. systemd, container runtimes, and Kubernetes all place processes into cgroup paths and write controller files.
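On a cgroup v2 host, the `0::` line in `/proc/<pid>/cgroup` gives the path relative to the unified hierarchy, so the controller files can be located from it. A small sketch, assuming the hierarchy is mounted at `/sys/fs/cgroup`; the scope name in the sample is made up.

```python
from pathlib import PurePosixPath
from typing import Optional

def cgroup_v2_dir(proc_cgroup_text: str, mount: str = "/sys/fs/cgroup") -> Optional[PurePosixPath]:
    """Map the cgroup v2 entry of /proc/<pid>/cgroup to a directory under the mount."""
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, controllers, path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":  # the unified (v2) entry
            return PurePosixPath(mount) / path.lstrip("/")
    return None  # no v2 entry, for example on a pure cgroup v1 host

sample = "0::/system.slice/docker-abc123.scope\n"  # hypothetical scope name
directory = cgroup_v2_dir(sample)
print(directory / "cpu.max")
```

Inside a container with its own cgroup namespace the entry is often just `0::/`, which maps to the mount point itself.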
flowchart TB
Root[/sys/fs/cgroup]
Root --> System[system.slice]
Root --> User[user.slice]
Root --> Kube[kubepods.slice]
Kube --> QoS[Guaranteed / Burstable / BestEffort]
QoS --> Pod[Pod cgroup]
Pod --> Container[container process cgroup]
Container --> Files[cpu.max, cpu.stat, memory.max, memory.high, memory.events]
| cgroup v2 File | Meaning |
|---|---|
| `cgroup.controllers` | Controllers available to this cgroup; `cgroup.subtree_control` selects which are enabled for its children. |
| `cgroup.procs` | Processes in this cgroup. |
| `cpu.max` | CPU quota and period in microseconds. `max 100000` means no quota with a 100 ms period. |
| `cpu.stat` | CPU usage and throttling counters. |
| `memory.max` | Hard memory limit. |
| `memory.current` | Current charged memory. |
| `memory.events` | Counters for `low`, `high`, `max`, `oom`, and `oom_kill` events. |
| `pids.max` | Maximum number of processes or threads. |
| `io.max` | Per-device I/O throttling limits. |
| `memory.high` | Throttle/reclaim threshold before the hard limit; useful for pressure management. |
| `memory.oom.group` | Controls whether the cgroup should be killed as a group on OOM. |
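The `cpu.max` format is easy to misread, so here is a short sketch that turns it into an effective CPU count. It assumes the two-field "quota period" form described in the table; `cpu_limit_cores` is a hypothetical helper.

```python
from typing import Optional

def cpu_limit_cores(cpu_max: str) -> Optional[float]:
    """Convert cpu.max content into a CPU limit in cores, or None for no quota."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # unlimited: only scheduling weight applies
    return int(quota) / int(period)

for content in ("max 100000", "50000 100000", "200000 100000"):
    print(content, "->", cpu_limit_cores(content))
```

A quota of 50000 with a period of 100000 allows 50 ms of CPU time in every 100 ms window, which is half a core on average.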
Useful checks:
cat /proc/<pid>/cgroup
systemd-cgls
systemd-cgtop
cat /sys/fs/cgroup/<path>/cpu.max
cat /sys/fs/cgroup/<path>/cpu.stat
cat /sys/fs/cgroup/<path>/memory.current
cat /sys/fs/cgroup/<path>/memory.high
cat /sys/fs/cgroup/<path>/memory.max
cat /sys/fs/cgroup/<path>/memory.events
cat /sys/fs/cgroup/<path>/pids.max
Common cgroup surprises:
- CPU quota throttling can make a process slow even when host CPU looks idle.
- Memory limits include more than application heap: page cache, tmpfs, some kernel memory, and allocator fragmentation can matter.
- `pids.max` counts threads too, so highly threaded applications can hit PID limits.
- cgroup OOM kills are local to the cgroup and may happen while the host still has free memory.
- systemd services and containers are both cgroup-managed, so host unit limits can stack with container limits.
Kubernetes Resource Mapping
Kubernetes requests are scheduling signals. Limits become runtime enforcement through cgroups. That distinction is the source of many “node has free CPU/RAM but my Pod is slow or killed” incidents.
| Kubernetes Setting | Linux Behavior |
|---|---|
| CPU request | Scheduler placement and relative CPU weight; not a hard guarantee under all contention. |
| CPU limit | CFS quota in cpu.max, visible as throttling in cpu.stat. |
| Memory request | Scheduler placement and QoS classification input. |
| Memory limit | Hard cgroup memory limit, usually memory.max; hitting it can trigger cgroup OOM. |
| Ephemeral storage limit | Kubelet accounting and eviction behavior, not the same as a cgroup memory limit. |
| Pod QoS | Guaranteed, Burstable, or BestEffort, affecting eviction priority and cgroup placement. |
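The CPU limit row can be worked through numerically. This sketch assumes the commonly described conversion of quota = millicores x period / 1000 with a 100 ms period; the kubelet applies extra rules (such as a minimum quota) that are not modeled, and both function names are invented for the example.

```python
def millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m', '2', or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)

def cpu_max_for_limit(quantity: str, period_us: int = 100_000) -> str:
    """Return the cpu.max content expected for a CPU limit under the stated assumption."""
    quota_us = millicores(quantity) * period_us // 1000
    return f"{quota_us} {period_us}"

for limit in ("500m", "2", "0.5"):
    print(limit, "->", cpu_max_for_limit(limit))
```

Comparing this expected value with the `cpu.max` file inside the container is a quick way to confirm that a limit reached the kernel.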
Practical checks from a container:
cat /proc/self/cgroup
cat /sys/fs/cgroup/cpu.max
cat /sys/fs/cgroup/cpu.stat
cat /sys/fs/cgroup/memory.current
cat /sys/fs/cgroup/memory.max
cat /sys/fs/cgroup/memory.high
cat /sys/fs/cgroup/memory.events
cat /proc/pressure/cpu
cat /proc/pressure/memory
Interpreting the evidence:
- `nr_throttled` and `throttled_usec` in `cpu.stat` rising means CPU limit throttling, even if the node has idle CPU at other times.
- `memory.events` `high` rising means the cgroup crossed `memory.high` and reclaim/throttling pressure occurred.
- `memory.events` `oom` or `oom_kill` rising means allocation failed at the cgroup boundary.
- PSI shows time lost to CPU, memory, or IO pressure and is often more useful than a single utilization percentage.
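Since the counters are cumulative, the useful signal is the change between two samples. The following sketch diffs two `cpu.stat` readings; the sample numbers are invented, and the function names are hypothetical.

```python
def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse 'key value' lines as used by cpu.stat and memory.events."""
    return {
        key: int(value)
        for key, value in (line.split() for line in text.splitlines() if line.strip())
    }

def throttling_delta(before: str, after: str) -> dict[str, float]:
    """Summarize CPU throttling between two cpu.stat samples."""
    b, a = parse_flat_keyed(before), parse_flat_keyed(after)
    periods = a["nr_periods"] - b["nr_periods"]
    throttled = a["nr_throttled"] - b["nr_throttled"]
    return {
        "throttled_periods": throttled,
        "throttled_usec": a["throttled_usec"] - b["throttled_usec"],
        # Share of enforcement periods in which the cgroup ran out of quota.
        "throttled_ratio": throttled / periods if periods else 0.0,
    }

# Invented samples taken some seconds apart.
before = "usage_usec 1000000\nnr_periods 100\nnr_throttled 10\nthrottled_usec 50000\n"
after = "usage_usec 6000000\nnr_periods 200\nnr_throttled 60\nthrottled_usec 450000\n"
print(throttling_delta(before, after))
```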
For Kubernetes incidents, compare Pod metrics with cgroup files, kubelet eviction events, and node pressure. A Java heap sized only from the memory limit can still OOM because native memory, thread stacks, direct buffers, page cache, tmpfs, and sidecars also count.
cgroup v2 Labs
CPU throttling lab:
cat /sys/fs/cgroup/cpu.max
cat /sys/fs/cgroup/cpu.stat
yes > /dev/null &
sleep 10
cat /sys/fs/cgroup/cpu.stat
kill $!  # stop the background load generator
If nr_throttled and throttled_usec rise, the cgroup hit its CPU quota. The host can still show idle CPU, because the quota caps this cgroup's CPU time in each period no matter how much capacity is free elsewhere.
Memory pressure lab:
cat /sys/fs/cgroup/memory.current
cat /sys/fs/cgroup/memory.high
cat /sys/fs/cgroup/memory.max
cat /sys/fs/cgroup/memory.events
cat /proc/pressure/memory
Watch memory.events before and after a controlled allocation test. high indicates reclaim/throttling pressure. max, oom, or oom_kill indicate the hard boundary was reached.
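The before-and-after comparison can be scripted. A minimal sketch, assuming `memory.events` content in "key value" lines; the sample counters are invented and `memory_findings` is a hypothetical helper.

```python
def parse_events(text: str) -> dict[str, int]:
    """Parse memory.events content into a counter dictionary."""
    return {
        key: int(value)
        for key, value in (line.split() for line in text.splitlines() if line.strip())
    }

def memory_findings(before: str, after: str) -> list[str]:
    """Describe which memory boundaries were hit between two samples."""
    b, a = parse_events(before), parse_events(after)
    rose = lambda key: a.get(key, 0) - b.get(key, 0) > 0
    findings = []
    if rose("high"):
        findings.append("crossed memory.high: reclaim/throttling pressure")
    if rose("max"):
        findings.append("reached memory.max: hard limit hit")
    if rose("oom") or rose("oom_kill"):
        findings.append("OOM at the cgroup boundary")
    return findings

# Invented samples from before and after an allocation test.
before = "low 0\nhigh 3\nmax 0\noom 0\noom_kill 0\n"
after = "low 0\nhigh 9\nmax 2\noom 1\noom_kill 1\n"
for finding in memory_findings(before, after):
    print(finding)
```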
Kubernetes mapping check:
kubectl describe pod <pod> | grep -A5 -E 'Requests|Limits'
kubectl exec <pod> -- cat /sys/fs/cgroup/cpu.max
kubectl exec <pod> -- cat /sys/fs/cgroup/cpu.stat
kubectl exec <pod> -- cat /sys/fs/cgroup/memory.max
kubectl exec <pod> -- cat /sys/fs/cgroup/memory.events
This confirms the limit the runtime actually applied instead of relying only on the manifest.
Namespaces and PID 1
Namespaces provide isolated views of kernel resources. A process can be in the host mount namespace but a container network namespace, or in a new PID namespace but the host cgroup hierarchy. These features compose independently.
readlink /proc/<pid>/ns/pid
readlink /proc/<pid>/ns/mnt
readlink /proc/<pid>/ns/net
readlink /proc/1/ns/net
lsns -p <pid>
nsenter -t <pid> -m -p -n -- ps aux
nsenter -t <pid> -n -- ip addr
Inside a PID namespace, the first process becomes PID 1 for that namespace. PID 1 has special responsibilities:
- reap zombie child processes,
- handle `SIGTERM` and other shutdown signals correctly,
- forward signals to child processes when it is a wrapper script or supervisor.
If an application was never designed to be PID 1, use a small init such as tini, Docker --init, or a container entrypoint that forwards signals and waits correctly.
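The PID 1 duties listed above can be sketched in a few lines. This is an illustrative toy, not a replacement for `tini`: it assumes a POSIX system, must run in the main thread because it installs signal handlers, and `run_as_init` is an invented name.

```python
import os
import signal
import subprocess
import sys

def run_as_init(argv: list[str]) -> int:
    """Start argv, forward stop signals to it, reap children, return its exit code."""
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        try:
            os.kill(child.pid, signum)  # pass the signal on to the application
        except ProcessLookupError:
            pass  # the child already exited

    forwarded = (signal.SIGTERM, signal.SIGINT)
    previous = {signum: signal.signal(signum, forward) for signum in forwarded}
    try:
        while True:
            # Waiting on -1 also reaps orphaned descendants adopted by this process.
            pid, status = os.waitpid(-1, 0)
            if pid == child.pid:
                child.returncode = os.waitstatus_to_exitcode(status)
                return child.returncode
    finally:
        for signum, handler in previous.items():
            if handler is not None:
                signal.signal(signum, handler)  # restore the earlier handlers

print(run_as_init([sys.executable, "-c", "raise SystemExit(3)"]))
```

The key design point is the `waitpid(-1, ...)` loop: a wrapper that waits only for its direct child leaves adopted zombies behind.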
Container Networking on Linux
The default Linux container network model uses the same primitives an administrator can create by hand: network namespaces, veth pairs, Linux bridges, routes, netfilter NAT, and conntrack.
| Primitive | Role |
|---|---|
| Network namespace | Gives the container its own interfaces, addresses, routes, sockets, and firewall view. |
| veth pair | A virtual Ethernet cable: one end in the container netns, one end on the host. |
| Linux bridge | Software switch that connects veth peers and often has the host gateway IP. |
| IPAM | Allocates container IPs from a bridge subnet. |
| netfilter/nftables/iptables | Applies filtering, DNAT for published ports, and SNAT/MASQUERADE for egress. |
| conntrack | Remembers translated flows so return traffic is mapped back correctly. |
For Docker’s default bridge path, the container sees something like eth0 and a default route through the bridge gateway. The host sees the peer veth attached to a bridge such as docker0.
container process
container eth0
veth pair
host bridge docker0 or br-...
host routing table
netfilter SNAT/MASQUERADE
physical or virtual NIC
A Linux bridge is a Layer 2 software switch. NAT is separate. The bridge learns MAC addresses and forwards Ethernet frames between ports. netfilter handles address translation and firewall decisions around that path.
Bridge Networks and Packet Paths
Default bridge network:
- Docker usually creates `docker0`.
- Containers get addresses from the bridge subnet.
- The host bridge address is normally the default gateway for containers.
- Container-to-container communication on the same bridge can stay local to the bridge.
- Egress to outside networks commonly uses MASQUERADE so external peers see the host address.
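Address assignment on a bridge subnet can be sketched with the standard `ipaddress` module. This toy allocator assumes the first usable address is the gateway and hands out the rest in order; the subnet is one commonly seen for Docker's default bridge but is an assumption here, and `BridgeIpam` is an invented class that does not model release or reuse.

```python
import ipaddress

class BridgeIpam:
    """Toy IPAM: first host address is the gateway, the rest go to containers."""

    def __init__(self, subnet: str):
        self.network = ipaddress.ip_network(subnet)
        self._free = iter(self.network.hosts())
        self.gateway = next(self._free)  # the bridge's own address
        self.leases: dict[str, ipaddress.IPv4Address] = {}

    def allocate(self, container: str) -> ipaddress.IPv4Address:
        if container not in self.leases:
            self.leases[container] = next(self._free)
        return self.leases[container]

ipam = BridgeIpam("172.17.0.0/16")  # assumed default bridge subnet
print(ipam.gateway, ipam.allocate("web"), ipam.allocate("db"))
```

Each container would then use the gateway address as its default route, matching the list above.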
User-defined bridge networks:
- get their own bridge device, subnet, and rules,
- provide better service-name DNS behavior than the legacy default bridge,
- isolate groups of containers from other bridge networks unless routing or rules allow traffic.
Published port path:
client -> host_ip:published_port
netfilter PREROUTING or OUTPUT DNAT
container_ip:container_port
bridge
veth
container process
Egress path:
container_ip:ephemeral_port -> remote_ip:remote_port
veth -> bridge -> host route
POSTROUTING SNAT/MASQUERADE
remote sees host_ip:translated_port
return traffic matched by conntrack
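The published port path and the role of conntrack can be modeled in a few lines. This is a conceptual toy, not netfilter: endpoints are `(ip, port)` tuples, only DNAT for published ports is modeled, and `ToyNat` with its methods is invented for the example. The addresses come from documentation ranges.

```python
class ToyNat:
    def __init__(self, host_ip: str):
        self.host_ip = host_ip
        self.dnat_rules: dict = {}  # published host port -> container endpoint
        self.conntrack: dict = {}   # (container endpoint, client) -> original destination

    def publish(self, host_port: int, container_ip: str, container_port: int) -> None:
        self.dnat_rules[host_port] = (container_ip, container_port)

    def inbound(self, src: tuple, dst: tuple) -> tuple:
        """Rewrite client -> host_ip:published_port to client -> container endpoint."""
        dst_ip, dst_port = dst
        if dst_ip != self.host_ip or dst_port not in self.dnat_rules:
            return src, dst  # no matching rule: packet is left unchanged
        new_dst = self.dnat_rules[dst_port]
        self.conntrack[(new_dst, src)] = dst  # remember how to undo the translation
        return src, new_dst

    def reply(self, src: tuple, dst: tuple) -> tuple:
        """Rewrite the container's reply so the client sees the address it contacted."""
        original = self.conntrack.get((src, dst))
        return (original or src), dst

nat = ToyNat("192.0.2.10")
nat.publish(8080, "172.17.0.2", 80)
client = ("203.0.113.5", 40000)
print(nat.inbound(client, ("192.0.2.10", 8080)))
print(nat.reply(("172.17.0.2", 80), client))
```

Without the remembered flow, the reply would leave with the container's private address as its source and the client would not recognize it.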
Important bridge and NAT details:
- `docker0` is not present for every runtime or every Docker network; user-defined bridges are often named `br-<id>`.
- Hairpin NAT may be needed when a container or host reaches a service through the host’s published port and the traffic loops back to the same bridge.
- MTU mismatches are common when bridges, VXLAN, VPNs, or cloud networks add encapsulation overhead.
- Docker Desktop networking has a Linux VM boundary; the bridge and iptables rules live inside that VM, not directly on macOS.
- Containers inherit generated DNS configuration. User-defined Docker bridges commonly provide an embedded DNS resolver for container names.
- Host access is platform-specific. `host.docker.internal` exists on Docker Desktop and can be configured on Linux, but it is not a universal kernel feature.
Networking inspection examples:
docker network ls
docker network inspect bridge
ip link show type bridge
bridge link
ip addr show docker0
ip route
nft list ruleset
iptables -t nat -S
conntrack -L 2>/dev/null | head
nsenter -t <pid> -n -- ip addr
nsenter -t <pid> -n -- ip route
nsenter -t <pid> -n -- ss -tulpen
tcpdump -ni any host <container-ip>
Other Container Network Modes
| Mode | What Changes | Common Use |
|---|---|---|
| Bridge | Container has its own netns connected through veth and a Linux bridge. | Default single-host application networking. |
| Host | Container shares the host network namespace. | Low overhead or software that must bind host interfaces directly. |
| None | Container gets no external interface beyond loopback. | Batch jobs, manual networking, or high isolation. |
| macvlan | Container gets a MAC address on the physical L2 network. | Legacy apps that must appear as first-class LAN hosts. |
| ipvlan | Similar goal to macvlan with different L2/L3 behavior and fewer MAC scaling issues. | Dense networks where switch MAC table pressure matters. |
| Overlay | Encapsulates container traffic across hosts, commonly with VXLAN. | Multi-host container platforms and some Kubernetes CNIs. |
Kubernetes uses CNI plugins rather than Docker’s bridge driver as the main abstraction. Depending on the plugin, Pod traffic may use Linux bridges, veth pairs, routing, VXLAN, Geneve, BGP, eBPF, cloud VPC interfaces, or some combination. The primitives are still Linux networking primitives plus plugin-specific control logic.
Linux, macOS, and Windows
| Host | What Actually Runs |
|---|---|
| Linux host with Docker Engine/containerd | Linux containers run directly as isolated Linux processes on the host kernel. |
| macOS with Docker Desktop | Linux containers run inside a Linux VM managed by Docker Desktop. macOS does not provide a Linux kernel for them directly. |
| Windows with Docker Desktop and WSL 2 | Linux containers run inside a WSL 2 Linux VM/backend. |
| Windows Server containers, process isolation | Windows containers share the Windows host kernel with process isolation. |
| Windows containers, Hyper-V isolation | Each container runs inside a small utility VM and gets a stronger kernel boundary. |
This explains common confusion: a Linux container image expects Linux kernel interfaces. On macOS and Windows developer laptops, Linux containers work because a Linux VM is present under the container tooling. Native Windows containers are a separate world with Windows base images and Windows isolation modes.
KVM and Hypervisors
A hypervisor runs virtual machines by presenting virtual hardware to guest operating systems.
| Term | Meaning |
|---|---|
| Type 1 hypervisor | Runs close to hardware, such as Hyper-V, ESXi, or KVM when the Linux kernel acts as the hypervisor. |
| Type 2 hypervisor | Runs as an application on a host OS, though modern boundaries can be blurry. |
| KVM | Linux kernel virtualization support that turns Linux into a hypervisor for hardware-assisted VMs. |
| QEMU | User-space emulator and device model commonly paired with KVM acceleration. |
| Firecracker/Kata-style isolation | Uses lightweight VMs to give container-like workflows stronger VM boundaries. |
VMs and containers are not enemies. They are often stacked:
- Docker Desktop runs Linux containers inside a VM on macOS and Windows.
- Kubernetes nodes may be VMs running containers.
- Kata Containers and similar systems run each container or Pod in a lightweight VM.
- Windows Hyper-V isolation runs containers inside optimized VMs.
Why Containers Are Not Full Security Boundaries By Default
Containers can be strong isolation, but defaults and host integration matter. Risk increases with:
- privileged containers,
- host PID, network, IPC, or mount namespaces,
- broad bind mounts such as `/`, `/var/run/docker.sock`, or host credentials,
- added capabilities such as `SYS_ADMIN`,
- disabled seccomp/AppArmor/SELinux profiles,
- running as root without user namespace isolation,
- writable host paths shared with untrusted workloads.
Security improves with least privilege, rootless containers, read-only filesystems, minimal capabilities, seccomp, LSM policy, signed images, SBOMs, digest pinning, and separate nodes or VMs for untrusted tenants.
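Several of the risk factors above can be spotted in container metadata. The sketch below scans a dictionary shaped like the `HostConfig` section of `docker inspect` output as I understand it; the field names are assumptions to verify against your Docker version, the checks are far from exhaustive, and `risky_settings` is a hypothetical helper.

```python
def risky_settings(inspect: dict) -> list[str]:
    """Return human-readable findings for a few well-known risky settings."""
    host = inspect.get("HostConfig", {})
    findings = []
    if host.get("Privileged"):
        findings.append("privileged container")
    for key, label in (("PidMode", "PID"), ("NetworkMode", "network"), ("IpcMode", "IPC")):
        if host.get(key) == "host":
            findings.append(f"shares host {label} namespace")
    for cap in host.get("CapAdd") or []:
        if cap.upper().removeprefix("CAP_") == "SYS_ADMIN":
            findings.append("added capability SYS_ADMIN")
    for bind in host.get("Binds") or []:
        source = bind.split(":", 1)[0]
        if source in ("/", "/var/run/docker.sock"):
            findings.append(f"broad bind mount: {source}")
    for option in host.get("SecurityOpt") or []:
        if option.replace(":", "=") == "seccomp=unconfined":
            findings.append("seccomp disabled")
    return findings

# Invented example of an over-privileged container.
sample = {"HostConfig": {
    "Privileged": False,
    "PidMode": "host",
    "NetworkMode": "bridge",
    "CapAdd": ["SYS_ADMIN"],
    "Binds": ["/var/run/docker.sock:/var/run/docker.sock"],
    "SecurityOpt": ["seccomp=unconfined"],
}}
for finding in risky_settings(sample):
    print(finding)
```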
Operational Differences
| Symptom | Container Lens | VM Lens |
|---|---|---|
| Out of memory | cgroup memory limit, tmpfs, page cache, application heap. | Guest memory size, ballooning, host pressure, swap. |
| High CPU latency | cgroup quota throttling, host CPU contention, noisy neighbor container. | vCPU scheduling, steal time, host oversubscription. |
| Disk path missing | mount namespace, bind mount, volume driver, overlay layer. | Guest disk attachment, virtio/SCSI/NVMe device, filesystem mount. |
| Network unreachable | network namespace, veth, bridge, NAT, CNI, policy. | virtual NIC, hypervisor switch, guest firewall, host routing. |
| PID 1 behavior | App may need init/reaping and signal handling. | OS init system handles normal service lifecycle. |
Debugging Flow
- Identify whether the workload is a process on Linux, a Linux container in a VM, a native Windows container, or a full VM.
- Find the real host process with `docker inspect`, `crictl inspect`, or runtime tools.
- Inspect namespaces with `lsns`, `/proc/<pid>/ns`, and `/proc/<pid>/mountinfo`.
- Inspect cgroups with `/proc/<pid>/cgroup` and cgroup controller files.
- Check capabilities, seccomp, AppArmor/SELinux labels, and user mappings.
- For macOS and Windows Linux containers, remember the Linux VM boundary when debugging files, ports, memory, and disk usage.
- For VMs, inspect hypervisor, guest kernel, virtual devices, and host resource pressure.
Study Cards
What is a Linux container?
One or more isolated host processes using namespaces, cgroups, mounts, capabilities, seccomp, and related kernel features.
What is the biggest VM versus container difference?
A VM runs its own guest kernel; a container usually shares the host or VM-provided kernel.
What does OCI standardize?
Image format, runtime bundle behavior, and registry distribution behavior for interoperable containers.
Why do Linux containers run on macOS?
Tooling such as Docker Desktop runs them inside a Linux virtual machine because macOS does not provide a Linux kernel.
What do cgroups provide for containers?
Hierarchical accounting and limits for resources such as CPU, memory, I/O, and PIDs.
What do namespaces provide for containers?
Isolated views of resources such as mounts, PIDs, network interfaces, IPC, hostname, and users.
What does overlayfs provide for containers?
A merged filesystem view made from read-only lower image layers plus a writable upper layer.
What is copy-up in overlayfs?
When a lower-layer file is changed, overlayfs copies it into the writable upper layer and modifies that copy.
What does a Linux bridge do for containers?
It acts like a software switch connecting host-side veth peers for containers on the same bridge network.
What usually implements Docker published ports on Linux?
Netfilter DNAT rules translate host ports to container addresses, with conntrack tracking the flow.
What is KVM?
Linux kernel virtualization support that lets Linux act as a hypervisor for hardware-assisted virtual machines.