Foundational Study Review

This page is the gap-review map for the study guide. It does not replace the topic pages; it checks whether the guide teaches the mental models a person needs before they debug production systems.

For concrete manifests, configs, SQL, scripts, and command patterns, use the practical examples embedded in each topic. For example: packet-capture labs belong with Packet Path; DNS resolver checks belong with DNS Resolution and Caching; Kubernetes Service checks belong with Services and EndpointSlices; and host resource examples belong with Containerization, OCI, and VMs.

Review Criteria

A topic is not solid just because it lists commands. Each topic should cover:

Criterion | What Good Coverage Includes
Mental model | The objects, actors, boundaries, and state transitions.
Data path | How real work moves through the system.
Control path | Which controller, daemon, protocol, or operator changes state.
Failure modes | Common ways the system fails and how symptoms appear.
Observability | Commands, logs, metrics, events, and status fields that prove state.
Safety | What not to do during incidents, especially with data, identity, and distributed state.
Cross-links | Neighboring concepts that explain why a local symptom may have a remote cause.

Topic Coverage Matrix

Area | Must-Know Concepts | Primary Pages
Linux | Processes, syscalls, files, mounts, memory, cgroups, namespaces, systemd, storage, boot, users, scheduled jobs, backups. | Linux, Processes and Threads, Containerization, OCI, and VMs, Backup and File Transfer
Networking | Encapsulation, L2/L3/L4/L7 boundaries, route lookup, neighbor lookup, NAT, conntrack, MTU, TLS, proxies, load balancers. | Networking, Packet Path, NAT Gateways and NAT, Certificates and HTTPS
DNS | Stub resolvers, recursive resolvers, authoritative servers, delegation, glue, TTLs, negative caching, DNSSEC, split-horizon views. | DNS, Resolution and Caching, Authoritative Zones, DNSSEC and Privacy
Kubernetes | API machinery, reconciliation, controllers, scheduling, kubelet, CRI/CNI/CSI, probes, Services, DNS, storage, upgrades, operators. | Kubernetes, Core Concepts, Networking, Storage and Upgrades
Identity | Authentication vs authorization, IdPs, sessions, cookies, OAuth, OIDC, SAML, JWT validation, JWKS rotation, token lifetime. | Identity and Access, IdP, SAML, JWT, OAuth, and OIDC
Databases | Data modeling, indexes, transactions, isolation, locking, WAL, backups, replication, pooling, sharding, snapshots, restore drills. | Databases, PostgreSQL, PostgreSQL Operations and HA, OpenSearch
Ceph | RADOS, OSDs, MON quorum, MGR, pools, PGs, CRUSH, replication, erasure coding, scrub, recovery, fullness, Rook integration. | Ceph, Rook-Ceph
Istio | Control plane, data plane, Envoy, xDS, sidecar mode, ambient mode, mTLS, identity, routing, policy, telemetry. | Istio, Istio Service Mesh
Troubleshooting | Incident framing, evidence preservation, layers, dependency isolation, retries, timeouts, rollback, recovery, post-incident learning. | Troubleshooting and Error Handling

Big 101 Gaps To Keep Closed

These are the gaps that most often prevent a solid understanding:

  • Confusing data plane and control plane. A controller may be healthy while packets fail, or packets may keep flowing while the API is down.
  • Treating DNS as one lookup. Real DNS includes local host files, NSS, stub resolvers, recursive caches, authoritative delegation, negative caching, and sometimes split-horizon policy.
  • Treating Kubernetes as a command runner. The API server records desired state, then independent controllers reconcile actual state toward it over time.
  • Treating TLS certificates as only files. Trust depends on chain building, hostname validation, EKU/key usage, SNI, time, revocation policy, and client CA stores.
  • Treating JWT decoding as validation. Claims are untrusted until issuer, audience, signature, algorithm, expiry, and policy are checked.
  • Treating RAID, replication, snapshots, and backups as interchangeable. Each protects against different failures and has different restore behavior.
  • Treating retries as harmless. Retries need timeouts, backoff, jitter, idempotency, and a total budget.
  • Treating distributed storage as a local disk. Ceph, PostgreSQL HA, OpenSearch, and Kubernetes storage all have quorum, placement, and recovery mechanics.
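
The retry bullet above names five ingredients: timeouts, backoff, jitter, idempotency, and a total budget. The following is a minimal Python sketch of how they might fit together, not a reference implementation. The function name `call_with_retries`, its parameters, and every default value are illustrative assumptions; the caller is assumed to pass an operation that is safe to run more than once.

```python
import random
import time


class RetryBudgetExceeded(Exception):
    """Raised when the attempts or the total time budget run out."""


def call_with_retries(op, *, max_attempts=4, attempt_timeout=1.0,
                      base_delay=0.1, max_delay=2.0, total_budget=5.0,
                      sleep=time.sleep, clock=time.monotonic,
                      rng=random.random):
    """Call op(timeout) until it succeeds or the retry budget is spent.

    op must be idempotent: it may run several times for one logical request.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    deadline = clock() + total_budget
    last_error = None
    for attempt in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            # The per-attempt timeout never exceeds what is left of the budget.
            return op(min(attempt_timeout, remaining))
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
        if attempt == max_attempts - 1:
            break
        # Exponential backoff, capped, with full jitter so that many clients
        # failing at once do not retry in lockstep.
        cap = min(max_delay, base_delay * (2 ** attempt))
        delay = rng() * cap
        remaining = deadline - clock()
        if remaining <= 0:
            break
        sleep(min(delay, remaining))
    raise RetryBudgetExceeded(f"gave up; last error: {last_error!r}") from last_error
```

Only errors that are plausibly transient (`TimeoutError`, `ConnectionError`) are retried in this sketch; anything else propagates immediately, because retrying a validation error or a permission failure only adds load.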

Review Pass By Area

Use this as a periodic checklist when adding new pages:

  1. Linux: Can the reader explain what the kernel sees: tasks, FDs, pages, mounts, namespaces, cgroups, sockets, and devices?
  2. Networking: Can the reader trace one request through name resolution, route lookup, neighbor lookup, NAT, TCP or UDP, TLS, proxying, and response path?
  3. DNS: Can the reader distinguish authoritative truth from recursive cache and local resolver behavior?
  4. Kubernetes: Can the reader name the controller responsible for the next state transition and where that controller reports failure?
  5. Identity: Can the reader separate browser session, ID token, access token, refresh token, and local authorization decision?
  6. Databases: Can the reader connect query latency and correctness to transactions, locks, indexes, WAL, storage, replication, and backups?
  7. Ceph: Can the reader explain how CRUSH maps objects to OSDs and why fullness or degraded PGs change client behavior?
  8. Istio: Can the reader separate Kubernetes Service selection from mesh routing, mTLS, authorization policy, and Envoy/xDS config?
  9. Troubleshooting: Can the reader preserve evidence, narrow scope, test one layer at a time, and choose a safe rollback or recovery action?

Study Cards

Question

What makes a study topic complete enough for operations?

Answer

It explains the mental model, data path, control path, failure modes, observability, safety boundaries, and neighboring concepts.

Question

Why separate data plane from control plane?

Answer

The system that forwards traffic or IO can fail independently from the system that configures it.
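
A toy Python model can make this independence visible. It is a sketch under simplifying assumptions, not a model of any real product: the class names `ControlPlane` and `DataPlane`, the `available` flag, and the version counter are all hypothetical.

```python
class DataPlane:
    """Forwards requests using the last routing table it was given."""

    def __init__(self):
        self.routes = {}
        self.version = 0

    def apply(self, version, routes):
        self.version = version
        self.routes = dict(routes)  # keep a local copy of the config

    def forward(self, host):
        backend = self.routes.get(host)
        if backend is None:
            raise LookupError(f"no route for {host}")
        return backend


class ControlPlane:
    """Holds desired state and pushes it to data planes."""

    def __init__(self):
        self.desired = {}
        self.version = 0
        self.available = True

    def set_route(self, host, backend):
        if not self.available:
            raise ConnectionError("control plane unavailable")
        self.desired[host] = backend
        self.version += 1

    def push(self, data_plane):
        if not self.available:
            raise ConnectionError("control plane unavailable")
        data_plane.apply(self.version, self.desired)


if __name__ == "__main__":
    cp, dp = ControlPlane(), DataPlane()
    cp.set_route("app.example", "10.0.0.5:8080")
    cp.push(dp)
    cp.available = False  # control plane outage
    # Traffic still flows on the last applied config.
    print(dp.forward("app.example"))
```

The reverse failure also falls out of the model: if the control plane accepts a new route but the push never reaches the data plane, the control plane looks healthy while `forward` fails for that host. Comparing `cp.version` with `dp.version` is the kind of status field that exposes the drift.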

Question

What is the danger of relying only on commands?

Answer

Commands show evidence, but without a model of ownership and state transitions it is easy to misread symptoms.

Question

Why are backups, snapshots, RAID, and replication different?

Answer

They protect against different failures, and only backups plus tested restores prove recoverability from deletion or corruption.
