Index
Tags
A compact map of topics by tag. Use this as the site grows past a simple folder tree.
adapters
administration
Time, Hostname, and Identity
Time synchronization, timezone, hostname, machine-id, DNS identity, certificates, logs, Kerberos sensitivity, and Ubuntu host identity operations.
Users, Permissions, and sudo
Linux identities, groups, file modes, ownership, sudo, service users, locked accounts, ACLs, and permission troubleshooting on Ubuntu.
advanced
Advanced Agents
Advanced agent engineering with ReAct, planning, reflection, tool routing, multi-agent systems, durable execution, memory, sandboxing, approval workflows, trajectory evaluation, and failure recovery.
Advanced Fine-Tuning
Advanced fine-tuning with full tuning, LoRA, QLoRA, adapter composition, multi-adapter serving, dataset mixing, long-context tuning, packing, catastrophic forgetting, and safety-preserving releases.
Advanced Inference and vLLM
Advanced LLM inference with vLLM, PagedAttention, continuous batching, disaggregated prefill, chunked prefill, speculative decoding, prefix caching, KV-cache math, tensor and pipeline parallelism, quantized serving, LoRA serving, and autoscaling.
Advanced ML Architectures
Advanced model architectures including Mixture of Experts, retrieval-augmented models, state space models, diffusion transformers, sparse attention, long-context architectures, and encoder-only, decoder-only, and encoder-decoder tradeoffs.
Advanced ML Observability
Advanced ML observability with drift detection, data quality monitoring, online evals, human feedback loops, trace schemas, prompt/retrieval/model/tool observability, safety monitoring, cost monitoring, and incident review templates.
Advanced RAG
Advanced retrieval-augmented generation with hybrid retrieval, query planning, multi-hop retrieval, reranker architectures, GraphRAG, context compression, citation verification, agentic RAG, retrieval evals, and RAG security.
ML Evaluation Mastery
Advanced ML evaluation with unit evals, golden sets, model-graded evals, human evals, pairwise preference evals, safety evals, regression gates, slice metrics, statistical confidence, and contamination controls.
agents
Advanced Agents
Advanced agent engineering with ReAct, planning, reflection, tool routing, multi-agent systems, durable execution, memory, sandboxing, approval workflows, trajectory evaluation, and failure recovery.
ML Agents and Tool Use
Agentic ML systems with planning loops, tools, state, memory, idempotency, guardrails, human approval, and evaluation.
ai
alignment
LLM Training Lifecycle
LLM lifecycle from pretraining and continued pretraining to SFT, RLHF, DPO, RLAIF, preference data, synthetic data, data filtering, and stage-specific evaluation.
ML Alignment and Evaluation
Alignment, evaluation, RLHF, preference tuning, DPO, safety policy, red teaming, regression gates, calibration, and monitoring.
amd
architecture
architectures
attention
audio
authentication
automation
ML Agents and Tool Use
Agentic ML systems with planning loops, tools, state, memory, idempotency, guardrails, human approval, and evaluation.
Network Boot and Automated Provisioning
PXE, UEFI HTTP Boot, iPXE, DHCP, TFTP, automated Linux installs, Ubuntu autoinstall, cloud-init NoCloud, Kickstart, bare-metal provisioning, and troubleshooting.
Scheduled Automation
Cron jobs, anacron, systemd timers, user timers, environment pitfalls, idempotent jobs, logs, locking, and automation troubleshooting.
backup
backups
benchmarking
bgp
block-devices
boot
Linux Boot and Userspace
Firmware, bootloader, kernel command line, initramfs, root filesystem handoff, PID 1, services, and boot troubleshooting.
Linux Package and Boot Recovery
Debian and Ubuntu recovery runbooks for broken packages, bad kernels, initramfs failures, GRUB rescue, emergency mode, and rollback.
Network Boot and Automated Provisioning
PXE, UEFI HTTP Boot, iPXE, DHCP, TFTP, automated Linux installs, Ubuntu autoinstall, cloud-init NoCloud, Kickstart, bare-metal provisioning, and troubleshooting.
caching
capacity
ceph
Ceph
Ceph distributed storage architecture, RADOS, OSDs, monitors, managers, pools, placement groups, CRUSH, CephFS, RBD, RGW, and operations.
Ceph Block, File, and Object Interfaces
RBD, CephFS, and RGW mental models, command checks, snapshots, mirroring, MDS behavior, bucket gateways, and interface selection.
Ceph Operations and Recovery
Ceph health triage, OSD failure response, recovery and backfill, scrub repair, maintenance flags, full ratios, and safe operational runbooks.
Ceph Performance and Capacity
Ceph performance model, capacity planning, BlueStore, OSD latency, network design, client behavior, benchmarks, and saturation troubleshooting.
Ceph RADOS, CRUSH, and Placement
RADOS objects, pools, placement groups, CRUSH rules, acting sets, PG autoscaling, and placement troubleshooting.
Ceph Storage Examples
Practical Ceph examples for cluster health, replicated pool creation, and RBD image operations.
Rook-Ceph
Rook-Ceph architecture, CephCluster CRD, Kubernetes storage classes, CSI, toolbox usage, upgrades, and troubleshooting.
cephfs
certificates
cgroups
cicd
classical-ml
cloud
Cloud Networking
Cloud networking fundamentals for VPCs, subnets, route tables, security groups, private endpoints, peering, transit gateways, NAT, load balancers, and overlapping CIDRs.
NAT Gateways and Network Address Translation
Deep operational guide to NAT, NAT gateways, SNAT, DNAT, PAT, masquerade, conntrack, ephemeral ports, cloud egress, hairpin NAT, and NAT troubleshooting.
cloudnativepg
CloudNativePG
How CloudNativePG runs PostgreSQL in Kubernetes, including clusters, services, failover, operator scaling, operator upgrades, backups, and operational cautions.
PostgreSQL Zero-Downtime Upgrades on Kubernetes
Zero-downtime and low-downtime PostgreSQL upgrade strategies on Kubernetes, including CloudNativePG rolling updates, major upgrades, logical replication cutovers, PgBouncer draining, and ordinary PostgreSQL runbooks.
cni
containers
Containerization, OCI, and VMs
Container fundamentals, OCI image/runtime/distribution standards, Linux namespaces, cgroups, overlay layers, bridge networking, macOS and Windows container behavior, KVM, hypervisors, and VM/container tradeoffs.
Linux Mount Namespaces and Propagation
Mount namespaces, bind mounts, shared subtree propagation, chroot, pivot_root, overlay mounts, tmpfs, container mount debugging, and /proc mount views.
Linux Security Controls
Linux security primitives for capabilities, seccomp, AppArmor, SELinux, PAM, auditd, sudoers, file capabilities, setuid, and container boundaries.
Network Namespaces and Virtual Networking
Linux network namespaces, veth pairs, bridges, macvlan, ipvlan, tun/tap, VXLAN, overlays, and CNI-style packet paths.
control-loops
coredns
cron
crush
csi
data
databases
CloudNativePG
How CloudNativePG runs PostgreSQL in Kubernetes, including clusters, services, failover, operator scaling, operator upgrades, backups, and operational cautions.
Database and Search Examples
Practical PostgreSQL, PgBouncer, and OpenSearch examples for queries, pooling, and cluster inspection.
Databases
Database fundamentals, modeling, indexes, transactions, replication, and operations.
OpenSearch Operations, Replication, Sharding, and HA
OpenSearch operations guide covering shards, replicas, cross-cluster replication, allocation awareness, snapshots, cluster-manager failover, and HA runbooks.
PgBouncer
How PgBouncer works, pooling modes, sizing, HA placement, failover behavior, prepared statements, session state, observability, and operational runbooks.
PostgreSQL
PostgreSQL architecture, MVCC, WAL, query planning, indexes, vacuum, backups, replication, and operations.
PostgreSQL Operations, HA, Replication, and Recovery
Operational PostgreSQL guide covering HA managers, failover, physical and logical replication, sharding, backups, restore drills, WAL, high CPU, high RAM, and incident checks.
Storage Drives, RAID, and Database Performance
SSD, HDD, NVMe, RAID 0/1/5/6/10, RAID performance, disk-failure recovery, LVM RAID, and PostgreSQL and Elasticsearch storage tradeoffs.
debian
debugging
deep-learning
deployment
device-mapper
devices
dhcp
distributed-systems
dns
Authoritative DNS and Zones
Zones, delegations, NS records, SOA serials, glue, apex constraints, wildcard behavior, and authoritative debugging.
DNS
Name resolution flow, DNS records, delegation, caching, negative answers, and operational debugging.
DNS Examples
Practical examples for DNS zones and resolver debugging.
DNS Records, Responses, and Transport
DNS record types, response codes, EDNS, UDP and TCP transport, truncation, reverse DNS, mail records, and operational debugging.
DNS Resolution and Caching
How resolver lookup paths, caches, TTLs, negative answers, CNAME chains, and client search behavior shape DNS outcomes.
DNSSEC and DNS Privacy
DNSSEC validation, chain of trust, DS/DNSKEY, signing failures, DNS over TLS, DNS over HTTPS, and what DNS privacy does not solve.
Domain Controllers and Directory DNS
How Active Directory domain controllers depend on DNS SRV records, Kerberos, LDAP, Global Catalog, replication, and time.
IPv6 Operations
Operational IPv6 fundamentals: SLAAC, DHCPv6, Router Advertisements, NDP, link-local addresses, privacy addresses, dual-stack, NAT64, DNS AAAA, and firewalling.
Kubernetes DNS and CoreDNS
Kubernetes DNS names, CoreDNS, kube-dns compatibility, Pod resolv.conf, dnsPolicy, search paths, ndots, forwarding, and DNS troubleshooting.
Kubernetes ExternalDNS
How ExternalDNS watches Kubernetes resources and reconciles public or private DNS records through provider APIs.
Kubernetes Networking
Pod networking, Services, CoreDNS, kube-dns history, EndpointSlices, NetworkPolicy, Ingress, Gateway API, and load balancers.
NATS, DNS, and Kubernetes Networking
How NATS uses Kubernetes DNS, Services, StatefulSets, headless Services, route gossip, advertise addresses, NetworkPolicy, and CoreDNS during clustering and client reconnects.
resolv.conf
Linux resolver configuration, nameservers, search domains, ndots, timeouts, attempts, rotation, and systemd-resolved behavior.
dnssec
ebpf
egress
elasticsearch
embeddings
engines
envoy
Istio Observability and Troubleshooting
Istio metrics, access logs, traces, proxy sync, xDS inspection, response flags, common 503 and policy failures, and practical debugging flows.
Istio Service Mesh
Service mesh architecture, request paths, Envoy sidecars, ambient ztunnel and waypoint proxies, mTLS, policy, telemetry, and troubleshooting.
Istio Traffic Management
Istio VirtualService, DestinationRule, subsets, retries, timeouts, fault injection, traffic splits, ServiceEntry, and routing troubleshooting.
evaluation
ML 101 Foundations
Beginner ML foundations: what ML is, when not to use it, datasets, features, labels, splits, training, inference, metrics, overfitting, and bias/variance.
ML Alignment and Evaluation
Alignment, evaluation, RLHF, preference tuning, DPO, safety policy, red teaming, regression gates, calibration, and monitoring.
ML Evaluation Mastery
Advanced ML evaluation with unit evals, golden sets, model-graded evals, human evals, pairwise preference evals, safety evals, regression gates, slice metrics, statistical confidence, and contamination controls.
ML Evaluation and CI/CD
ML evaluation harnesses, golden sets, regression gates, human review, release scorecards, canaries, and CI/CD for model, prompt, retrieval, and tool changes.
ML Explainability
Explainability for ML systems: interpretable models, feature attribution, saliency, counterfactuals, LIME, SHAP, attention caveats, audits, and model cards.
examples
Ceph Storage Examples
Practical Ceph examples for cluster health, replicated pool creation, and RBD image operations.
DNS Examples
Practical examples for DNS zones and resolver debugging.
Database and Search Examples
Practical PostgreSQL, PgBouncer, and OpenSearch examples for queries, pooling, and cluster inspection.
Identity Examples
Practical OAuth, OIDC, JWKS, and JWT examples for identity troubleshooting and API validation.
Istio Service Mesh Examples
Practical Istio examples for VirtualService traffic splitting, DestinationRule subsets, and AuthorizationPolicy.
Kubernetes Examples
Practical Kubernetes examples for Deployments, Services, PodDisruptionBudgets, NetworkPolicy, and StatefulSet storage.
Linux Operations Examples
Practical Linux examples for systemd services, timers, rsync snapshot backups, and nftables host firewalls.
Networking TLS and mTLS Examples
Practical networking examples for certificate inspection and mTLS requests.
Troubleshooting Examples
Practical troubleshooting examples for repeatable incident capture and evidence preservation.
explainability
feature-store
filesystems
Linux Filesystems and IO
VFS, inodes, dentries, page cache, block devices, mounts, filesystem checks, IO schedulers, and storage troubleshooting.
Linux Mounts and fstab
Linux mounts, findmnt, /etc/fstab, UUIDs, systemd mount units, automounts, bind mounts, mount options, and boot-safe storage configuration.
ext4, XFS, and Filesystem Repair
ext4 and XFS operations, mkfs, fsck, xfs_repair, online growth, inode usage, journals, quotas, discard, and safe repair workflows.
finetuning
Advanced Fine-Tuning
Advanced fine-tuning with full tuning, LoRA, QLoRA, adapter composition, multi-adapter serving, dataset mixing, long-context tuning, packing, catastrophic forgetting, and safety-preserving releases.
Fine-Tuning and LoRA
Fine-tuning, instruction tuning, parameter-efficient adaptation, LoRA adapters, data quality, evaluation, merge behavior, and rollback.
firewall
Firewalls, iptables, and Netfilter
Linux firewall fundamentals: netfilter hooks, nftables, iptables, chains, tables, conntrack, NAT, rule ordering, logging, persistence, and troubleshooting.
Routing, NAT, and Firewalls
Routing tables, policy routing, NAT, connection tracking, firewall state, asymmetric paths, and load-balancer edge cases.
firewalls
foundations
fundamentals
gateway
gateway-api
glossary
governance
gpu
Linux
Linux fundamentals for developers and operators: processes, fork/exec, memory, CPU, kernel params, caches, paging, GPUs, and troubleshooting.
Linux GPU Drivers
Linux GPU driver stacks for AMD and NVIDIA, including kernel modules, firmware, Mesa, ROCm, CUDA, nvidia-smi, Secure Boot, containers, and troubleshooting.
ML Accelerators: GPU and TPU
GPU and TPU fundamentals for ML workloads: parallelism, memory, precision, batching, utilization, data movement, and troubleshooting.
ML Performance Engineering
ML performance engineering with training-loop profiling, GPU utilization, memory bandwidth, activation checkpointing, FlashAttention, fused kernels, distributed training bottlenecks, inference throughput tuning, and cost/performance tradeoffs.
high-availability
OpenSearch Operations, Replication, Sharding, and HA
OpenSearch operations guide covering shards, replicas, cross-cluster replication, allocation awareness, snapshots, cluster-manager failover, and HA runbooks.
PostgreSQL Operations, HA, Replication, and Recovery
Operational PostgreSQL guide covering HA managers, failover, physical and logical replication, sharding, backups, restore drills, WAL, high CPU, high RAM, and incident checks.
PostgreSQL Zero-Downtime Upgrades on Kubernetes
Zero-downtime and low-downtime PostgreSQL upgrade strategies on Kubernetes, including CloudNativePG rolling updates, major upgrades, logical replication cutovers, PgBouncer draining, and ordinary PostgreSQL runbooks.
http
Forward and Reverse Proxies
Forward proxies, reverse proxies, CONNECT tunnels, transparent proxies, TLS interception, proxy headers, egress control, caching, environment variables, and troubleshooting.
HTTP and Proxy Debugging
Practical HTTP and proxy debugging with curl, headers, redirects, keepalive, chunking, HTTP/2, gRPC, WebSockets, proxy variables, and timeout alignment.
TCP, TLS, and HTTP
Transport and application handshake layers: TCP state, retransmits, flow control, congestion control, TLS SNI/certificates, and HTTP behavior.
https
identity
Domain Controllers and Directory DNS
How Active Directory domain controllers depend on DNS SRV records, Kerberos, LDAP, Global Catalog, replication, and time.
IdP, SAML, JWT, OAuth, and OIDC
Identity provider basics, SAML assertions, OAuth 2.0 authorization, OpenID Connect authentication, JWT structure, validation checks, and troubleshooting.
Identity Examples
Practical OAuth, OIDC, JWKS, and JWT examples for identity troubleshooting and API validation.
Identity and Access
Identity-provider, federation, SSO, SAML, OAuth, OpenID Connect, JWT, token validation, and access-control study guide.
incident-response
incidents
inference
Advanced Inference and vLLM
Advanced LLM inference with vLLM, PagedAttention, continuous batching, disaggregated prefill, chunked prefill, speculative decoding, prefix caching, KV-cache math, tensor and pipeline parallelism, quantized serving, LoRA serving, and autoscaling.
Inference Benchmarking
Benchmark methodology for LLM inference with warmup, prompt/output distributions, concurrency sweeps, TTFT, ITL, throughput, quality gates, and benchmark anti-patterns.
Inference Engine Comparison
Comparison guide for vLLM, TensorRT-LLM, Hugging Face TGI, llama.cpp, SGLang, Ollama, OpenAI-compatible APIs, quantization, adapters, and migration checks.
Inference Runbooks
Operational runbooks for LLM inference incidents: wrong output, high TTFT, slow inter-token latency, OOM, low throughput, queue buildup, cache pressure, tokenizer mismatch, and runtime regressions.
LLM Inference Systems
Production LLM inference systems: model files, weights, memory, inference engines, vLLM, TensorRT-LLM, TGI, llama.cpp, SGLang, Ollama, KV cache, PagedAttention, batching, quantization, routing, performance tests, and runbooks.
Long-Context Serving
Long-context LLM serving with RoPE scaling, sliding-window attention, sink tokens, lost-in-the-middle behavior, KV-cache growth, prompt budgeting, and long-context evals.
ML Performance Engineering
ML performance engineering with training-loop profiling, GPU utilization, memory bandwidth, activation checkpointing, FlashAttention, fused kernels, distributed training bottlenecks, inference throughput tuning, and cost/performance tradeoffs.
ML Serving, Inference, and vLLM
Production ML inference with model servers, batching, streaming, KV cache, autoscaling, canaries, inference optimization, and deep vLLM operations.
MoE Inference
Mixture-of-Experts inference with active vs total parameters, expert routing, expert parallelism, all-to-all communication, load balance, capacity, KV cache, and serving tradeoffs.
Model Memory Math
Practical LLM memory math for weights, activations, KV cache, optimizer state, model loading, quantization, model sizes, and serving capacity planning.
Quantized Serving
Quantized LLM serving with weight quantization, activation quantization, KV-cache quantization, GGUF, AWQ, GPTQ, FP8, quality gates, and rollout checks.
Tokenizer and Chat Template Compatibility
Tokenizer, chat template, special token, stop sequence, tool schema, and generation config compatibility for reliable LLM inference.
vLLM Operations
Operational guide for vLLM serving with scheduler concepts, PagedAttention, KV cache, prefix caching, speculative decoding, parallelism, metrics, flags, and runbooks.
infrastructure
ingress
Istio Gateways, Ingress, and Egress
Istio ingress gateways, egress gateways, Kubernetes Gateway API, TLS termination and passthrough, SNI, external services, and gateway troubleshooting.
Kubernetes Ingress, Gateway, and Load Balancers
Ingress controllers, Gateway API, Service LoadBalancers, health checks, TLS, SNI, source IP, externalTrafficPolicy, DNS, and edge troubleshooting.
Kubernetes Networking
Pod networking, Services, CoreDNS, kube-dns history, EndpointSlices, NetworkPolicy, Ingress, Gateway API, and load balancers.
interpretability
io
ipc
ipsec
iptables
ipv6
istio
Istio
Istio service mesh fundamentals: control plane, data plane, sidecar mode, ambient mode, traffic management, security, observability, and operations.
Istio Gateways, Ingress, and Egress
Istio ingress gateways, egress gateways, Kubernetes Gateway API, TLS termination and passthrough, SNI, external services, and gateway troubleshooting.
Istio Observability and Troubleshooting
Istio metrics, access logs, traces, proxy sync, xDS inspection, response flags, common 503 and policy failures, and practical debugging flows.
Istio Security, mTLS, and Policy
Istio workload identity, mTLS, PeerAuthentication, AuthorizationPolicy, RequestAuthentication, JWT validation, trust domains, and policy debugging.
Istio Service Mesh
Service mesh architecture, request paths, Envoy sidecars, ambient ztunnel and waypoint proxies, mTLS, policy, telemetry, and troubleshooting.
Istio Service Mesh Examples
Practical Istio examples for VirtualService traffic splitting, DestinationRule subsets, and AuthorizationPolicy.
Istio Traffic Management
Istio VirtualService, DestinationRule, subsets, retries, timeouts, fault injection, traffic splits, ServiceEntry, and routing troubleshooting.
Istio Zero Downtime Upgrades on Kubernetes
Operational runbook for low-risk Istio upgrades on Kubernetes using canary control planes, revisions, revision tags, workload restarts, gateway canaries, rollback, PDBs, and validation gates.
jwt
kernel
kubernetes
CloudNativePG
How CloudNativePG runs PostgreSQL in Kubernetes, including clusters, services, failover, operator scaling, operator upgrades, backups, and operational cautions.
Core Concepts
Kubernetes object model, controllers, scheduling, node agents, and reconciliation.
Cross-Layer Incident Runbooks
Operational runbooks that trace application symptoms through DNS, TCP, TLS, proxies, Linux hosts, Kubernetes Services, CNI, NAT, and cloud networking.
Glossary
Search-focused short definitions for operational terms across Linux, networking, Kubernetes, DNS, TLS, and performance.
Incident Entry Points
Symptom-first entry points for timeouts, resets, DNS that works on the node but not in Pods, large payload hangs, and intermittent 5xx failures.
Istio
Istio service mesh fundamentals: control plane, data plane, sidecar mode, ambient mode, traffic management, security, observability, and operations.
Istio Traffic Management
Istio VirtualService, DestinationRule, subsets, retries, timeouts, fault injection, traffic splits, ServiceEntry, and routing troubleshooting.
Istio Zero Downtime Upgrades on Kubernetes
Operational runbook for low-risk Istio upgrades on Kubernetes using canary control planes, revisions, revision tags, workload restarts, gateway canaries, rollback, PDBs, and validation gates.
Kubernetes
Cluster architecture, reconciliation, networking, storage, and operations for modern Kubernetes.
Kubernetes DNS and CoreDNS
Kubernetes DNS names, CoreDNS, kube-dns compatibility, Pod resolv.conf, dnsPolicy, search paths, ndots, forwarding, and DNS troubleshooting.
Kubernetes Examples
Practical Kubernetes examples for Deployments, Services, PodDisruptionBudgets, NetworkPolicy, and StatefulSet storage.
Kubernetes ExternalDNS
How ExternalDNS watches Kubernetes resources and reconciles public or private DNS records through provider APIs.
Kubernetes Ingress, Gateway, and Load Balancers
Ingress controllers, Gateway API, Service LoadBalancers, health checks, TLS, SNI, source IP, externalTrafficPolicy, DNS, and edge troubleshooting.
Kubernetes NetworkPolicy
NetworkPolicy isolation, ingress and egress rules, podSelector, namespaceSelector, ipBlock, default deny, DNS egress, CNI enforcement, and policy debugging.
Kubernetes Networking
Pod networking, Services, CoreDNS, kube-dns history, EndpointSlices, NetworkPolicy, Ingress, Gateway API, and load balancers.
Kubernetes Pod Networking and CNI
Kubernetes network model, Pod IPs, CNI plugins, routes, overlays, eBPF, MTU, hostNetwork, DNS path, and node-level debugging.
Kubernetes Services and EndpointSlices
Service types, selectors, ClusterIP, NodePort, LoadBalancer, headless Services, EndpointSlices, kube-proxy, readiness, and service debugging.
Kubernetes Storage and Upgrades
Persistent storage, CSI, StatefulSets, volume lifecycle, and safe Kubernetes upgrade practices.
NATS, DNS, and Kubernetes Networking
How NATS uses Kubernetes DNS, Services, StatefulSets, headless Services, route gossip, advertise addresses, NetworkPolicy, and CoreDNS during clustering and client reconnects.
Network Namespaces and Virtual Networking
Linux network namespaces, veth pairs, bridges, macvlan, ipvlan, tun/tap, VXLAN, overlays, and CNI-style packet paths.
PostgreSQL Zero-Downtime Upgrades on Kubernetes
Zero-downtime and low-downtime PostgreSQL upgrade strategies on Kubernetes, including CloudNativePG rolling updates, major upgrades, logical replication cutovers, PgBouncer draining, and ordinary PostgreSQL runbooks.
Request Path
A master walkthrough of one request through DNS, client behavior, TCP, TLS, proxies, Kubernetes Service routing, CNI, Pod, application, database, and the response path.
Rook-Ceph
Rook-Ceph architecture, CephCluster CRD, Kubernetes storage classes, CSI, toolbox usage, upgrades, and troubleshooting.
Troubleshooting
First checks and failure paths for common Kubernetes incidents.
linux
Containerization, OCI, and VMs
Container fundamentals, OCI image/runtime/distribution standards, Linux namespaces, cgroups, overlay layers, bridge networking, macOS and Windows container behavior, KVM, hypervisors, and VM/container tradeoffs.
Cross-Layer Incident Runbooks
Operational runbooks that trace application symptoms through DNS, TCP, TLS, proxies, Linux hosts, Kubernetes Services, CNI, NAT, and cloud networking.
Cross-Topic Study Paths
Cross-topic paths for debugging a 504, Pod networking, Linux performance, TLS and mTLS, and cloud egress.
Debian and Ubuntu Operations
Ubuntu-first Linux operations: apt, dpkg, systemd, journald, netplan, ufw, CA certificates, logs, services, and package troubleshooting.
Glossary
Search-focused short definitions for operational terms across Linux, networking, Kubernetes, DNS, TLS, and performance.
Incident Entry Points
Symptom-first entry points for timeouts, resets, DNS that works on the node but not in Pods, large payload hangs, and intermittent 5xx failures.
Linux
Linux fundamentals for developers and operators: processes, fork/exec, memory, CPU, kernel params, caches, paging, GPUs, and troubleshooting.
Linux Backup and File Transfer
Linux backup and transfer notes covering rsync, scp, snapshot backups, consistency, restore testing, exclusions, permissions, and automation.
Linux Block Devices and Partitioning
Linux block devices, NVMe and SCSI naming, partitions, GPT, UUIDs, labels, udev, stable device paths, and safe disk identification.
Linux Boot and Userspace
Firmware, bootloader, kernel command line, initramfs, root filesystem handoff, PID 1, services, and boot troubleshooting.
Linux Filesystems and IO
VFS, inodes, dentries, page cache, block devices, mounts, filesystem checks, IO schedulers, and storage troubleshooting.
Linux GPU Drivers
Linux GPU driver stacks for AMD and NVIDIA, including kernel modules, firmware, Mesa, ROCm, CUDA, nvidia-smi, Secure Boot, containers, and troubleshooting.
Linux Kernel Modules and Devices
Linux kernel modules, built-in drivers, modprobe, module aliases, parameters, blacklists, sysfs, devtmpfs, udev, device nodes, initramfs, Secure Boot, DKMS, and troubleshooting.
Linux Kernel Network Performance
Linux kernel packet processing, NAPI, softirq, NIC queues, RPS, RFS, XPS, offloads, qdisc, drops, and practical network performance troubleshooting.
Linux LVM
Logical Volume Manager fundamentals, advanced operations, resizing, thin provisioning, snapshots, migration, and troubleshooting.
Linux Memory Pressure and OOM
Linux memory troubleshooting for RSS, VSZ, page cache, slab, THP, NUMA, swap, cgroup memory, PSI, and OOM killer behavior.
Linux Mount Namespaces and Propagation
Mount namespaces, bind mounts, shared subtree propagation, chroot, pivot_root, overlay mounts, tmpfs, container mount debugging, and /proc mount views.
Linux Mounts and fstab
Linux mounts, findmnt, /etc/fstab, UUIDs, systemd mount units, automounts, bind mounts, mount options, and boot-safe storage configuration.
Linux Network Stack
Linux sockets, routes, neighbor tables, netfilter, conntrack, qdisc, NIC queues, namespaces, and network troubleshooting.
Linux Operations Examples
Practical Linux examples for systemd services, timers, rsync snapshot backups, and nftables host firewalls.
Linux Package and Boot Recovery
Debian and Ubuntu recovery runbooks for broken packages, bad kernels, initramfs failures, GRUB rescue, emergency mode, and rollback.
Linux Performance Triage Runbooks
Symptom-driven Linux runbooks for high CPU, high load, memory pressure, disk latency, softirq saturation, file descriptor exhaustion, and cgroup throttling.
Linux Processes and Threads
Linux processes, threads, tasks, PIDs, TIDs, fork, exec, wait, zombies, signals, scheduling, cgroups, namespaces, and process troubleshooting.
Linux Security Controls
Linux security primitives for capabilities, seccomp, AppArmor, SELinux, PAM, auditd, sudoers, file capabilities, setuid, and container boundaries.
Linux Sockets and IPC
Linux sockets, TCP, UDP, Unix domain sockets, socket files, listen queues, buffers, file descriptors, pipes, signals, shared memory, and IPC troubleshooting.
Linux Storage Health and Performance
Linux storage observability, iostat, SMART, NVMe health, kernel I/O errors, discard, queue depth, latency, saturation, and failure response.
Linux System Call Debugging
Linux syscall debugging with strace, errno, blocking calls, file and socket syscalls, poll, epoll, futexes, and production triage boundaries.
Linux TCP Kernel Tuning
Linux TCP listen queues, SYN backlog, socket buffers, ephemeral ports, TIME_WAIT, keepalives, conntrack pressure, and safe sysctl tuning.
Linux eBPF and Tracing
Practical Linux eBPF tracing with tracepoints, kprobes, uprobes, bpftrace, BCC, tc, XDP, and production-safe observability boundaries.
Logs and Observability
Linux logs, journald, syslog, dmesg, service status, boot logs, log rotation, metrics, and incident evidence collection.
Network Boot and Automated Provisioning
PXE, UEFI HTTP Boot, iPXE, DHCP, TFTP, automated Linux installs, Ubuntu autoinstall, cloud-init NoCloud, Kickstart, bare-metal provisioning, and troubleshooting.
Network Namespaces and Virtual Networking
Linux network namespaces, veth pairs, bridges, macvlan, ipvlan, tun/tap, VXLAN, overlays, and CNI-style packet paths.
Packet Path
How packets move through local sockets, routing, neighbor lookup, firewalls, conntrack, qdisc, NIC queues, and return paths.
RAID, Multipath, and Device Mapper
Linux md RAID, device mapper, dm-crypt/LUKS, multipath, NVMe multipath, WWIDs, path failover, and layered storage troubleshooting.
Request Path
A master walkthrough of one request through DNS, client behavior, TCP, TLS, proxies, Kubernetes Service routing, CNI, Pod, application, database, and the response path.
SSH Access
OpenSSH server administration on Ubuntu: sshd config, keys, authentication, host keys, logs, access control, and troubleshooting.
Scheduled Automation
Cron jobs, anacron, systemd timers, user timers, environment pitfalls, idempotent jobs, logs, locking, and automation troubleshooting.
Storage Drives, RAID, and Database Performance
SSD, HDD, NVMe, RAID 0/1/5/6/10, RAID performance, disk-failure recovery, LVM RAID, and PostgreSQL and Elasticsearch storage tradeoffs.
Switching, VLANs, and Hosts
Layer 2 switching, MAC tables, VLAN access and trunk ports, Linux bridges, 802.1Q tagging, and /etc/hosts name overrides.
TCP and Sockets
TCP state, sockets, listen queues, receive/send buffers, TIME_WAIT, retransmits, keepalive, ephemeral ports, and Linux observability.
Time, Hostname, and Identity
Time synchronization, timezone, hostname, machine-id, DNS identity, certificates, logs, Kerberos sensitivity, and Ubuntu host identity operations.
Users, Permissions, and sudo
Linux identities, groups, file modes, ownership, sudo, service users, locked accounts, ACLs, and permission troubleshooting on Ubuntu.
Zero-Trust Networking
Network security patterns that combine host firewall policy, nftables, mTLS, certificate rotation, SNI, ALPN, service identity, SSH hardening, auditd, SELinux, AppArmor, and real traffic debugging.
ext4, XFS, and Filesystem Repair
ext4 and XFS operations, mkfs, fsck, xfs_repair, online growth, inode usage, journals, quotas, discard, and safe repair workflows.
resolv.conf
Linux resolver configuration, nameservers, search domains, ndots, timeouts, attempts, rotation, and systemd-resolved behavior.
systemd
systemd units, services, targets, dependencies, timers, journals, cgroups, resource controls, and troubleshooting.
systemd Networking
systemd-networkd, systemd-resolved, .network, .netdev, .link files, network targets, wait-online behavior, and operational troubleshooting.
systemd Socket Activation and Network Services
systemd socket activation, ListenStream, ListenDatagram, Accept modes, service dependencies, restart behavior, resource controls, and network sandboxing.
llm
LLM Inference Systems
Production LLM inference systems: model files, weights, memory, inference engines, vLLM, TensorRT-LLM, TGI, llama.cpp, SGLang, Ollama, KV cache, PagedAttention, batching, quantization, routing, performance tests, and runbooks.
LLM Training Lifecycle
LLM lifecycle from pretraining and continued pretraining to SFT, RLHF, DPO, RLAIF, preference data, synthetic data, data filtering, and stage-specific evaluation.
ML Prompt Operations
Prompt engineering operations with templates, versioning, structured outputs, injection resistance, system/developer/user boundaries, evals, and rollback.
Transformer Internals
Transformer internals covering tokenization, embeddings, positional encodings, RoPE, self-attention, multi-head attention, MLP blocks, LayerNorm/RMSNorm, decoder-only models, KV cache, context windows, and MoE basics.
load-balancing
logs
long-context
lora
Advanced Fine-Tuning
Advanced fine-tuning with full tuning, LoRA, QLoRA, adapter composition, multi-adapter serving, dataset mixing, long-context tuning, packing, catastrophic forgetting, and safety-preserving releases.
Fine-Tuning and LoRA
Fine-tuning, instruction tuning, parameter-efficient adaptation, LoRA adapters, data quality, evaluation, merge behavior, and rollback.
lvm
machine-learning
Advanced Agents
Advanced agent engineering with ReAct, planning, reflection, tool routing, multi-agent systems, durable execution, memory, sandboxing, approval workflows, trajectory evaluation, and failure recovery.
Advanced Fine-Tuning
Advanced fine-tuning with full tuning, LoRA, QLoRA, adapter composition, multi-adapter serving, dataset mixing, long-context tuning, packing, catastrophic forgetting, and safety-preserving releases.
Advanced Inference and vLLM
Advanced LLM inference with vLLM, PagedAttention, continuous batching, disaggregated prefill, chunked prefill, speculative decoding, prefix caching, KV-cache math, tensor and pipeline parallelism, quantized serving, LoRA serving, and autoscaling.
Advanced ML Architectures
Advanced model architectures including Mixture of Experts, retrieval-augmented models, state space models, diffusion transformers, sparse attention, long-context architectures, and encoder-only, decoder-only, and encoder-decoder tradeoffs.
Advanced ML Observability
Advanced ML observability with drift detection, data quality monitoring, online evals, human feedback loops, trace schemas, prompt/retrieval/model/tool observability, safety monitoring, cost monitoring, and incident review templates.
Advanced RAG
Advanced retrieval-augmented generation with hybrid retrieval, query planning, multi-hop retrieval, reranker architectures, GraphRAG, context compression, citation verification, agentic RAG, retrieval evals, and RAG security.
Classical ML
Classical machine learning with linear models, logistic regression, trees, random forests, gradient boosting, SVMs, k-means, feature engineering, and tabular workflows.
Deep Learning Fundamentals
Deep learning fundamentals: neural networks, layers, activations, embeddings, CNNs, RNNs, transformers, optimizers, normalization, regularization, initialization, and training stability.
Fine-Tuning and LoRA
Fine-tuning, instruction tuning, parameter-efficient adaptation, LoRA adapters, data quality, evaluation, merge behavior, and rollback.
Inference Benchmarking
Benchmark methodology for LLM inference with warmup, prompt/output distributions, concurrency sweeps, TTFT, ITL, throughput, quality gates, and benchmark anti-patterns.
Inference Engine Comparison
Comparison guide for vLLM, TensorRT-LLM, Hugging Face TGI, llama.cpp, SGLang, Ollama, OpenAI-compatible APIs, quantization, adapters, and migration checks.
Inference Runbooks
Operational runbooks for LLM inference incidents: wrong output, high TTFT, slow inter-token latency, OOM, low throughput, queue buildup, cache pressure, tokenizer mismatch, and runtime regressions.
LLM Inference Systems
Production LLM inference systems: model files, weights, memory, inference engines, vLLM, TensorRT-LLM, TGI, llama.cpp, SGLang, Ollama, KV cache, PagedAttention, batching, quantization, routing, performance tests, and runbooks.
LLM Training Lifecycle
LLM lifecycle from pretraining and continued pretraining to SFT, RLHF, DPO, RLAIF, preference data, synthetic data, data filtering, and stage-specific evaluation.
Long-Context Serving
Long-context LLM serving with RoPE scaling, sliding-window attention, sink tokens, lost-in-the-middle behavior, KV-cache growth, prompt budgeting, and long-context evals.
ML 101 Foundations
Beginner ML foundations: what ML is, when not to use it, datasets, features, labels, splits, training, inference, metrics, overfitting, and bias/variance.
ML Accelerators: GPU and TPU
GPU and TPU fundamentals for ML workloads: parallelism, memory, precision, batching, utilization, data movement, and troubleshooting.
ML Agents and Tool Use
Agentic ML systems with planning loops, tools, state, memory, idempotency, guardrails, human approval, and evaluation.
ML Alignment and Evaluation
Alignment, evaluation, RLHF, preference tuning, DPO, safety policy, red teaming, regression gates, calibration, and monitoring.
ML Data Pipelines and Feature Stores
ML data engineering with dataset versioning, feature stores, lineage, data contracts, train/serve skew, point-in-time correctness, privacy, and reproducibility.
ML Evaluation Mastery
Advanced ML evaluation with unit evals, golden sets, model-graded evals, human evals, pairwise preference evals, safety evals, regression gates, slice metrics, statistical confidence, and contamination controls.
ML Evaluation and CI/CD
ML evaluation harnesses, golden sets, regression gates, human review, release scorecards, canaries, and CI/CD for model, prompt, retrieval, and tool changes.
ML Explainability
Explainability for ML systems: interpretable models, feature attribution, saliency, counterfactuals, LIME, SHAP, attention caveats, audits, and model cards.
ML Models, Types, and Weights
Model families, model types, parameters, weights, checkpoints, tokenizers, embeddings, transformers, diffusion models, and inference boundaries.
ML Observability and Incident Response
ML observability, metrics, traces, prompt and retrieval logging, drift monitoring, feedback loops, and incident response runbooks.
ML Performance Engineering
ML performance engineering with training-loop profiling, GPU utilization, memory bandwidth, activation checkpointing, FlashAttention, fused kernels, distributed training bottlenecks, inference throughput tuning, and cost/performance tradeoffs.
ML Prompt Operations
Prompt engineering operations with templates, versioning, structured outputs, injection resistance, system/developer/user boundaries, evals, and rollback.
ML Security Threats
Advanced ML security threats: prompt injection, jailbreaks, data poisoning, model extraction, membership inference, training data leakage, secrets in prompts, supply chain risk, tenant isolation, and secure RAG.
ML Security and Privacy
ML security and privacy risks including prompt injection, data exfiltration, model supply chain, poisoned retrieval, tenant isolation, PII, retention, and audit logging.
ML Serving, Inference, and vLLM
Production ML inference with model servers, batching, streaming, KV cache, autoscaling, canaries, inference optimization, and deep vLLM operations.
MLOps Systems
MLOps systems with model registries, feature stores, training pipelines, artifact versioning, reproducibility, batch and online inference, canaries, shadow evaluation, rollback, and cost governance.
Machine Learning
Machine learning systems study guide from ML 101 through advanced concepts: math, classical ML, deep learning, transformers, LLM training, fine-tuning, RAG, agents, serving, vLLM, MLOps, security, evaluation, performance, and governance.
Math for ML
Practical math for machine learning: vectors, matrices, tensors, dot products, cosine similarity, gradients, softmax, cross entropy, KL divergence, probability, and attention intuition.
MoE Inference
Mixture-of-Experts inference with active vs total parameters, expert routing, expert parallelism, all-to-all communication, load balance, capacity, KV cache, and serving tradeoffs.
Model Memory Math
Practical LLM memory math for weights, activations, KV cache, optimizer state, model loading, quantization, model sizes, and serving capacity planning.
Multimodal ML
Multimodal ML with vision, audio, speech-to-text, text-to-speech, vision-language models, image embeddings, OCR pipelines, multimodal RAG, and video model basics.
PyTorch Fundamentals
PyTorch fundamentals for tensors, modules, autograd, losses, optimizers, dataloaders, training loops, mixed precision, and checkpointing.
Quantized Serving
Quantized LLM serving with weight quantization, activation quantization, KV-cache quantization, GGUF, AWQ, GPTQ, FP8, quality gates, and rollout checks.
Responsible AI and Governance
Responsible AI governance with model cards, dataset cards, risk classification, human oversight, audit trails, fairness checks, red-team programs, release approval, incident response, and regulatory documentation.
Retrieval-Augmented Generation
RAG systems with embeddings, chunking, vector search, retrieval, reranking, context assembly, citations, freshness, and evaluation.
Tokenizer and Chat Template Compatibility
Tokenizer, chat template, special token, stop sequence, tool schema, and generation config compatibility for reliable LLM inference.
Transformer Internals
Transformer internals covering tokenization, embeddings, positional encodings, RoPE, self-attention, multi-head attention, MLP blocks, LayerNorm/RMSNorm, decoder-only models, KV cache, context windows, and MoE basics.
vLLM Operations
Operational guide for vLLM serving with scheduler concepts, PagedAttention, KV cache, prefix caching, speculative decoding, parallelism, metrics, flags, and runbooks.
math
memory
Linux Memory Pressure and OOM
Linux memory troubleshooting for RSS, VSZ, page cache, slab, THP, NUMA, swap, cgroup memory, PSI, and OOM killer behavior.
Model Memory Math
Practical LLM memory math for weights, activations, KV cache, optimizer state, model loading, quantization, model sizes, and serving capacity planning.
ml-101
mlops
models
ML Models, Types, and Weights
Model families, model types, parameters, weights, checkpoints, tokenizers, embeddings, transformers, diffusion models, and inference boundaries.
Machine Learning
Machine learning systems study guide from ML 101 through advanced concepts: math, classical ML, deep learning, transformers, LLM training, fine-tuning, RAG, agents, serving, vLLM, MLOps, security, evaluation, performance, and governance.
moe
monitoring
Advanced ML Observability
Advanced ML observability with drift detection, data quality monitoring, online evals, human feedback loops, trace schemas, prompt/retrieval/model/tool observability, safety monitoring, cost monitoring, and incident review templates.
ML Observability and Incident Response
ML observability, metrics, traces, prompt and retrieval logging, drift monitoring, feedback loops, and incident response runbooks.
mounts
mtls
Istio Security, mTLS, and Policy
Istio workload identity, mTLS, PeerAuthentication, AuthorizationPolicy, RequestAuthentication, JWT validation, trust domains, and policy debugging.
Istio Service Mesh
Service mesh architecture, request paths, Envoy sidecars, ambient ztunnel and waypoint proxies, mTLS, policy, telemetry, and troubleshooting.
Networking TLS and mTLS Examples
Practical networking examples for certificate inspection and mTLS requests.
mtu
multimodal
namespaces
Containerization, OCI, and VMs
Container fundamentals, OCI image/runtime/distribution standards, Linux namespaces, cgroups, overlay layers, bridge networking, macOS and Windows container behavior, KVM, hypervisors, and VM/container tradeoffs.
Linux Mount Namespaces and Propagation
Mount namespaces, bind mounts, shared subtree propagation, chroot, pivot_root, overlay mounts, tmpfs, container mount debugging, and /proc mount views.
nat
Cloud Networking
Cloud networking fundamentals for VPCs, subnets, route tables, security groups, private endpoints, peering, transit gateways, NAT, load balancers, and overlapping CIDRs.
NAT Gateways and Network Address Translation
Deep operational guide to NAT, NAT gateways, SNAT, DNAT, PAT, masquerade, conntrack, ephemeral ports, cloud egress, hairpin NAT, and NAT troubleshooting.
Routing, NAT, and Firewalls
Routing tables, policy routing, NAT, connection tracking, firewall state, asymmetric paths, and load-balancer edge cases.
nats
networking
BGP and Dynamic Routing
BGP fundamentals for operators: ASNs, peering, route advertisement, route selection, communities, ECMP, anycast, and failure modes.
Certificates and HTTPS
TLS certificates, HTTPS handshakes, chains of trust, SANs, SNI, OCSP, CA stores, and Ubuntu certificate operations.
Cloud Networking
Cloud networking fundamentals for VPCs, subnets, route tables, security groups, private endpoints, peering, transit gateways, NAT, load balancers, and overlapping CIDRs.
Cross-Layer Incident Runbooks
Operational runbooks that trace application symptoms through DNS, TCP, TLS, proxies, Linux hosts, Kubernetes Services, CNI, NAT, and cloud networking.
Cross-Topic Study Paths
Cross-topic paths for debugging a 504, Pod networking, Linux performance, TLS and mTLS, and cloud egress.
DHCP, Routers, and Switches
DHCP-focused networking guide covering DORA, leases, options, default gateways, DNS options, relay agents, helper addresses, Option 82, VLANs, switches, routers, DHCP snooping, DHCPv6, and troubleshooting.
DNS
Name resolution flow, DNS records, delegation, caching, negative answers, and operational debugging.
Datacenter L2/L3 Operations
Datacenter switching and routing concepts for host operators: LACP, MLAG, STP, ARP/NDP instability, EVPN/VXLAN, ECMP, anycast, route dampening, and fabric incident evidence.
Domain Controllers and Directory DNS
How Active Directory domain controllers depend on DNS SRV records, Kerberos, LDAP, Global Catalog, replication, and time.
Firewalls, iptables, and Netfilter
Linux firewall fundamentals: netfilter hooks, nftables, iptables, chains, tables, conntrack, NAT, rule ordering, logging, persistence, and troubleshooting.
Forward and Reverse Proxies
Forward proxies, reverse proxies, CONNECT tunnels, transparent proxies, TLS interception, proxy headers, egress control, caching, environment variables, and troubleshooting.
Glossary
Search-focused short definitions for operational terms across Linux, networking, Kubernetes, DNS, TLS, and performance.
HTTP and Proxy Debugging
Practical HTTP and proxy debugging with curl, headers, redirects, keepalive, chunking, HTTP/2, gRPC, WebSockets, proxy variables, and timeout alignment.
ICMP, MTU, and Path Testing
ICMP, ping, traceroute, path MTU discovery, fragmentation, tunnel overhead, packet loss, jitter, and practical path testing.
IP Addressing and Subnetting
IPv4 and IPv6 addressing, CIDR, subnets, default gateways, private address space, source selection, and route troubleshooting.
IPv6 Operations
Operational IPv6 fundamentals: SLAAC, DHCPv6, Router Advertisements, NDP, link-local addresses, privacy addresses, dual-stack, NAT64, DNS AAAA, and firewalling.
Incident Entry Points
Symptom-first entry points for timeouts, resets, DNS that works on the node but not in Pods, large payload hangs, and intermittent 5xx failures.
Istio
Istio service mesh fundamentals: control plane, data plane, sidecar mode, ambient mode, traffic management, security, observability, and operations.
Kubernetes ExternalDNS
How ExternalDNS watches Kubernetes resources and reconciles public or private DNS records through provider APIs.
Kubernetes Ingress, Gateway, and Load Balancers
Ingress controllers, Gateway API, Service LoadBalancers, health checks, TLS, SNI, source IP, externalTrafficPolicy, DNS, and edge troubleshooting.
Kubernetes NetworkPolicy
NetworkPolicy isolation, ingress and egress rules, podSelector, namespaceSelector, ipBlock, default deny, DNS egress, CNI enforcement, and policy debugging.
Kubernetes Networking
Pod networking, Services, CoreDNS, kube-dns history, EndpointSlices, NetworkPolicy, Ingress, Gateway API, and load balancers.
Kubernetes Pod Networking and CNI
Kubernetes network model, Pod IPs, CNI plugins, routes, overlays, eBPF, MTU, hostNetwork, DNS path, and node-level debugging.
Kubernetes Services and EndpointSlices
Service types, selectors, ClusterIP, NodePort, LoadBalancer, headless Services, EndpointSlices, kube-proxy, readiness, and service debugging.
Linux Kernel Network Performance
Linux kernel packet processing, NAPI, softirq, NIC queues, RPS, RFS, XPS, offloads, qdisc, drops, and practical network performance troubleshooting.
Linux Network Stack
Linux sockets, routes, neighbor tables, netfilter, conntrack, qdisc, NIC queues, namespaces, and network troubleshooting.
Linux TCP Kernel Tuning
Linux TCP listen queues, SYN backlog, socket buffers, ephemeral ports, TIME_WAIT, keepalives, conntrack pressure, and safe sysctl tuning.
Load Balancers and Proxies
L4 and L7 load balancing, reverse proxies, health checks, TLS termination, source IP preservation, PROXY protocol, headers, and timeout failures.
NAT Gateways and Network Address Translation
Deep operational guide to NAT, NAT gateways, SNAT, DNAT, PAT, masquerade, conntrack, ephemeral ports, cloud egress, hairpin NAT, and NAT troubleshooting.
NATS, DNS, and Kubernetes Networking
How NATS uses Kubernetes DNS, Services, StatefulSets, headless Services, route gossip, advertise addresses, NetworkPolicy, and CoreDNS during clustering and client reconnects.
Network Boot and Automated Provisioning
PXE, UEFI HTTP Boot, iPXE, DHCP, TFTP, automated Linux installs, Ubuntu autoinstall, cloud-init NoCloud, Kickstart, bare-metal provisioning, and troubleshooting.
Network Namespaces and Virtual Networking
Linux network namespaces, veth pairs, bridges, macvlan, ipvlan, tun/tap, VXLAN, overlays, and CNI-style packet paths.
Networking
Practical systems networking: OSI layers, NICs, ARP, IP routing, TCP/UDP, DNS, TLS, and packet debugging.
Networking TLS and mTLS Examples
Practical networking examples for certificate inspection and mTLS requests.
Packet Capture and Analysis
Practical tcpdump and Wireshark packet analysis for filters, handshakes, retransmits, resets, DNS, TLS SNI, VLAN tags, MTU, and rolling captures.
Packet Path
How packets move through local sockets, routing, neighbor lookup, firewalls, conntrack, qdisc, NIC queues, and return paths.
PgBouncer
How PgBouncer works, pooling modes, sizing, HA placement, failover behavior, prepared statements, session state, observability, and operational runbooks.
Request Path
A master walkthrough of one request through DNS, client behavior, TCP, TLS, proxies, Kubernetes Service routing, CNI, Pod, application, database, and the response path.
Resilience, Timeouts, and Draining
Practical network resilience patterns: timeout budgets, retries, backoff, connection pooling, keepalive, idle timeouts, load balancer draining, rolling restarts, DNS TTLs, and zero-downtime deploy behavior.
Routing, NAT, and Firewalls
Routing tables, policy routing, NAT, connection tracking, firewall state, asymmetric paths, and load-balancer edge cases.
Switching, VLANs, and Hosts
Layer 2 switching, MAC tables, VLAN access and trunk ports, Linux bridges, 802.1Q tagging, and /etc/hosts name overrides.
TCP and Sockets
TCP state, sockets, listen queues, receive/send buffers, TIME_WAIT, retransmits, keepalive, ephemeral ports, and Linux observability.
TCP, TLS, and HTTP
Transport and application handshake layers: TCP state, retransmits, flow control, congestion control, TLS SNI/certificates, and HTTP behavior.
UDP, QUIC, and Connectionless Traffic
UDP behavior, DNS and DHCP traffic, QUIC, connection tracking, timeouts, packet loss, retries, and operational debugging.
VPNs and IPsec Tunnels
VPN fundamentals, IPsec tunnel mode, IKEv2, ESP, NAT-T, traffic selectors, route-based and policy-based tunnels, split tunneling, MTU, and operational troubleshooting.
Zero-Trust Networking
Network security patterns that combine host firewall policy, nftables, mTLS, certificate rotation, SNI, ALPN, service identity, SSH hardening, auditd, SELinux, AppArmor, and real traffic debugging.
systemd Networking
systemd-networkd, systemd-resolved, .network, .netdev, .link files, network targets, wait-online behavior, and operational troubleshooting.
systemd Socket Activation and Network Services
systemd socket activation, ListenStream, ListenDatagram, Accept modes, service dependencies, restart behavior, resource controls, and network sandboxing.
networkpolicy
neural-networks
nvidia
oauth
observability
Advanced ML Observability
Advanced ML observability with drift detection, data quality monitoring, online evals, human feedback loops, trace schemas, prompt/retrieval/model/tool observability, safety monitoring, cost monitoring, and incident review templates.
Istio Observability and Troubleshooting
Istio metrics, access logs, traces, proxy sync, xDS inspection, response flags, common 503 and policy failures, and practical debugging flows.
Linux eBPF and Tracing
Practical Linux eBPF tracing with tracepoints, kprobes, uprobes, bpftrace, BCC, tc, XDP, and production-safe observability boundaries.
Logs and Observability
Linux logs, journald, syslog, dmesg, service status, boot logs, log rotation, metrics, and incident evidence collection.
ML Observability and Incident Response
ML observability, metrics, traces, prompt and retrieval logging, drift monitoring, feedback loops, and incident response runbooks.
oci
oidc
oom
opensearch
Database and Search Examples
Practical PostgreSQL, PgBouncer, and OpenSearch examples for queries, pooling, and cluster inspection.
OpenSearch Operations, Replication, Sharding, and HA
OpenSearch operations guide covering shards, replicas, cross-cluster replication, allocation awareness, snapshots, cluster-manager failover, and HA runbooks.
operating-systems
operations
Ceph Operations and Recovery
Ceph health triage, OSD failure response, recovery and backfill, scrub repair, maintenance flags, full ratios, and safe operational runbooks.
DNS Records, Responses, and Transport
DNS record types, response codes, EDNS, UDP and TCP transport, truncation, reverse DNS, mail records, and operational debugging.
Databases
Database fundamentals, modeling, indexes, transactions, replication, and operations.
Debian and Ubuntu Operations
Ubuntu-first Linux operations: apt, dpkg, systemd, journald, netplan, ufw, CA certificates, logs, services, and package troubleshooting.
Foundational Study Review
A cross-topic checklist for reviewing whether the guide covers the core mental models behind Linux, networking, DNS, Kubernetes, identity, databases, Ceph, Istio, and troubleshooting.
Inference Runbooks
Operational runbooks for LLM inference incidents: wrong output, high TTFT, slow inter-token latency, OOM, low throughput, queue buildup, cache pressure, tokenizer mismatch, and runtime regressions.
Istio Zero Downtime Upgrades on Kubernetes
Operational runbook for low-risk Istio upgrades on Kubernetes using canary control planes, revisions, revision tags, workload restarts, gateway canaries, rollback, PDBs, and validation gates.
ML Prompt Operations
Prompt engineering operations with templates, versioning, structured outputs, injection resistance, system/developer/user boundaries, evals, and rollback.
Machine Learning
Machine learning systems study guide from ML 101 through advanced concepts: math, classical ML, deep learning, transformers, LLM training, fine-tuning, RAG, agents, serving, vLLM, MLOps, security, evaluation, performance, and governance.
Troubleshooting
First checks and failure paths for common Kubernetes incidents.
Troubleshooting Examples
Practical troubleshooting examples for repeatable incident capture and evidence preservation.
Troubleshooting and Error Handling
Cross-technology troubleshooting and error handling for Linux, networking, DNS, Kubernetes, Istio, Ceph, PostgreSQL, and distributed systems operations.
vLLM Operations
Operational guide for vLLM serving with scheduler concepts, PagedAttention, KV cache, prefix caching, speculative decoding, parallelism, metrics, flags, and runbooks.
optimization
orchestration
packet-analysis
Packet Capture and Analysis
Practical tcpdump and Wireshark packet analysis for filters, handshakes, retransmits, resets, DNS, TLS SNI, VLAN tags, MTU, and rolling captures.
Packet Path
How packets move through local sockets, routing, neighbor lookup, firewalls, conntrack, qdisc, NIC queues, and return paths.
performance
Ceph Performance and Capacity
Ceph performance model, capacity planning, BlueStore, OSD latency, network design, client behavior, benchmarks, and saturation troubleshooting.
Inference Benchmarking
Benchmark methodology for LLM inference with warmup, prompt/output distributions, concurrency sweeps, TTFT, ITL, throughput, quality gates, and benchmark anti-patterns.
Linux Kernel Network Performance
Linux kernel packet processing, NAPI, softirq, NIC queues, RPS, RFS, XPS, offloads, qdisc, drops, and practical network performance troubleshooting.
Linux Memory Pressure and OOM
Linux memory troubleshooting for RSS, VSZ, page cache, slab, THP, NUMA, swap, cgroup memory, PSI, and OOM killer behavior.
Linux Performance Triage Runbooks
Symptom-driven Linux runbooks for high CPU, high load, memory pressure, disk latency, softirq saturation, file descriptor exhaustion, and cgroup throttling.
Linux Storage Health and Performance
Linux storage observability, iostat, SMART, NVMe health, kernel I/O errors, discard, queue depth, latency, saturation, and failure response.
ML Accelerators: GPU and TPU
GPU and TPU fundamentals for ML workloads: parallelism, memory, precision, batching, utilization, data movement, and troubleshooting.
ML Performance Engineering
ML performance engineering with training-loop profiling, GPU utilization, memory bandwidth, activation checkpointing, FlashAttention, fused kernels, distributed training bottlenecks, inference throughput tuning, and cost/performance tradeoffs.
PostgreSQL
PostgreSQL architecture, MVCC, WAL, query planning, indexes, vacuum, backups, replication, and operations.
permissions
pgbouncer
pipelines
postgres
CloudNativePG
How CloudNativePG runs PostgreSQL in Kubernetes, including clusters, services, failover, operator scaling, operator upgrades, backups, and operational cautions.
Database and Search Examples
Practical PostgreSQL, PgBouncer, and OpenSearch examples for queries, pooling, and cluster inspection.
PgBouncer
How PgBouncer works, pooling modes, sizing, HA placement, failover behavior, prepared statements, session state, observability, and operational runbooks.
PostgreSQL
PostgreSQL architecture, MVCC, WAL, query planning, indexes, vacuum, backups, replication, and operations.
PostgreSQL Operations, HA, Replication, and Recovery
Operational PostgreSQL guide covering HA managers, failover, physical and logical replication, sharding, backups, restore drills, WAL, high CPU, high RAM, and incident checks.
PostgreSQL Zero-Downtime Upgrades on Kubernetes
Zero-downtime and low-downtime PostgreSQL upgrade strategies on Kubernetes, including CloudNativePG rolling updates, major upgrades, logical replication cutovers, PgBouncer draining, and ordinary PostgreSQL runbooks.
Storage Drives, RAID, and Database Performance
SSD, HDD, NVMe, RAID 0/1/5/6/10, RAID performance, disk-failure recovery, LVM RAID, and PostgreSQL and Elasticsearch storage tradeoffs.
privacy
DNSSEC and DNS Privacy
DNSSEC validation, chain of trust, DS/DNSKEY, signing failures, DNS over TLS, DNS over HTTPS, and what privacy does not solve.
ML Security Threats
Advanced ML security threats: prompt injection, jailbreaks, data poisoning, model extraction, membership inference, training data leakage, secrets in prompts, supply chain risk, tenant isolation, and secure RAG.
ML Security and Privacy
ML security and privacy risks including prompt injection, data exfiltration, model supply chain, poisoned retrieval, tenant isolation, PII, retention, and audit logging.
processes
prompts
ML Prompt Operations
Prompt engineering operations with templates, versioning, structured outputs, injection resistance, system/developer/user boundaries, evals, and rollback.
Tokenizer and Chat Template Compatibility
Tokenizer, chat template, special token, stop sequence, tool schema, and generation config compatibility for reliable LLM inference.
provisioning
proxy
Forward and Reverse Proxies
Forward proxies, reverse proxies, CONNECT tunnels, transparent proxies, TLS interception, proxy headers, egress control, caching, environment variables, and troubleshooting.
HTTP and Proxy Debugging
Practical HTTP and proxy debugging with curl, headers, redirects, keepalive, chunking, HTTP/2, gRPC, WebSockets, proxy variables, and timeout alignment.
pytorch
quantization
rados
rag
Advanced RAG
Advanced retrieval-augmented generation with hybrid retrieval, query planning, multi-hop retrieval, reranker architectures, GraphRAG, context compression, citation verification, agentic RAG, retrieval evals, and RAG security.
Retrieval-Augmented Generation
RAG systems with embeddings, chunking, vector search, retrieval, reranking, context assembly, citations, freshness, and evaluation.
raid
RAID, Multipath, and Device Mapper
Linux md RAID, device mapper, dm-crypt/LUKS, multipath, NVMe multipath, WWIDs, path failover, and layered storage troubleshooting.
Storage Drives, RAID, and Database Performance
SSD, HDD, NVMe, RAID 0/1/5/6/10, RAID performance, disk-failure recovery, LVM RAID, and PostgreSQL and Elasticsearch storage tradeoffs.
rbd
recovery
Ceph Operations and Recovery
Ceph health triage, OSD failure response, recovery and backfill, scrub repair, maintenance flags, full ratios, and safe operational runbooks.
Linux Package and Boot Recovery
Debian and Ubuntu recovery runbooks for broken packages, bad kernels, initramfs failures, GRUB rescue, emergency mode, and rollback.
release
reliability
replication
OpenSearch Operations, Replication, Sharding, and HA
OpenSearch operations guide covering shards, replicas, cross-cluster replication, allocation awareness, snapshots, cluster-manager failover, and HA runbooks.
PostgreSQL Operations, HA, Replication, and Recovery
Operational PostgreSQL guide covering HA managers, failover, physical and logical replication, sharding, backups, restore drills, WAL, high CPU, high RAM, and incident checks.
resilience
resolv-conf
responsible-ai
retrieval
rgw
rook
routing
BGP and Dynamic Routing
BGP fundamentals for operators: ASNs, peering, route advertisement, route selection, communities, ECMP, anycast, and failure modes.
Cloud Networking
Cloud networking fundamentals for VPCs, subnets, route tables, security groups, private endpoints, peering, transit gateways, NAT, load balancers, and overlapping CIDRs.
DHCP, Routers, and Switches
DHCP-focused networking guide covering DORA, leases, options, default gateways, DNS options, relay agents, helper addresses, Option 82, VLANs, switches, routers, DHCP snooping, DHCPv6, and troubleshooting.
Datacenter L2/L3 Operations
Datacenter switching and routing concepts for host operators: LACP, MLAG, STP, ARP/NDP instability, EVPN/VXLAN, ECMP, anycast, route dampening, and fabric incident evidence.
IP Addressing and Subnetting
IPv4 and IPv6 addressing, CIDR, subnets, default gateways, private address space, source selection, and route troubleshooting.
Routing, NAT, and Firewalls
Routing tables, policy routing, NAT, connection tracking, firewall state, asymmetric paths, and load-balancer edge cases.
rsync
runbooks
Inference Runbooks
Operational runbooks for LLM inference incidents: wrong output, high TTFT, slow inter-token latency, OOM, low throughput, queue buildup, cache pressure, tokenizer mismatch, and runtime regressions.
Linux Performance Triage Runbooks
Symptom-driven Linux runbooks for high CPU, high load, memory pressure, disk latency, softirq saturation, file descriptor exhaustion, and cgroup throttling.
Scenario Labs
Operational labs for practicing cross-layer debugging with symptoms, evidence, checks, and answers.
safety
ML Alignment and Evaluation
Alignment, evaluation, RLHF, preference tuning, DPO, safety policy, red teaming, regression gates, calibration, and monitoring.
ML Security and Privacy
ML security and privacy risks including prompt injection, data exfiltration, model supply chain, poisoned retrieval, tenant isolation, PII, retention, and audit logging.
Responsible AI and Governance
Responsible AI governance with model cards, dataset cards, risk classification, human oversight, audit trails, fairness checks, red-team programs, release approval, incident response, and regulatory documentation.
saml
scikit-learn
scp
search
OpenSearch Operations, Replication, Sharding, and HA
OpenSearch operations guide covering shards, replicas, cross-cluster replication, allocation awareness, snapshots, cluster-manager failover, and HA runbooks.
Retrieval-Augmented Generation
RAG systems with embeddings, chunking, vector search, retrieval, reranking, context assembly, citations, freshness, and evaluation.
security
IdP, SAML, JWT, OAuth, and OIDC
Identity provider basics, SAML assertions, OAuth 2.0 authorization, OpenID Connect authentication, JWT structure, validation checks, and troubleshooting.
Identity and Access
Study guide to identity providers, federation, SSO, SAML, OAuth, OpenID Connect, JWT, token validation, and access control.
Istio Security, mTLS, and Policy
Istio workload identity, mTLS, PeerAuthentication, AuthorizationPolicy, RequestAuthentication, JWT validation, trust domains, and policy debugging.
Kubernetes NetworkPolicy
NetworkPolicy isolation, ingress and egress rules, podSelector, namespaceSelector, ipBlock, default deny, DNS egress, CNI enforcement, and policy debugging.
Linux Security Controls
Linux security primitives for capabilities, seccomp, AppArmor, SELinux, PAM, auditd, sudoers, file capabilities, setuid, and container boundaries.
ML Security Threats
Advanced ML security threats: prompt injection, jailbreaks, data poisoning, model extraction, membership inference, training data leakage, secrets in prompts, supply chain risk, tenant isolation, and secure RAG.
ML Security and Privacy
ML security and privacy risks including prompt injection, data exfiltration, model supply chain, poisoned retrieval, tenant isolation, PII, retention, and audit logging.
SSH Access
OpenSSH server administration on Ubuntu: sshd config, keys, authentication, host keys, logs, access control, and troubleshooting.
Users, Permissions, and sudo
Linux identities, groups, file modes, ownership, sudo, service users, locked accounts, ACLs, and permission troubleshooting on Ubuntu.
Zero-Trust Networking
Network security patterns that combine host firewall policy, nftables, mTLS, certificate rotation, SNI, ALPN, service identity, SSH hardening, auditd, SELinux, AppArmor, and real traffic debugging.
service-mesh
Istio
Istio service mesh fundamentals: control plane, data plane, sidecar mode, ambient mode, traffic management, security, observability, and operations.
Istio Service Mesh
Service mesh architecture, request paths, Envoy sidecars, ambient ztunnel and waypoint proxies, mTLS, policy, telemetry, and troubleshooting.
Istio Service Mesh Examples
Practical Istio examples for VirtualService traffic splitting, DestinationRule subsets, and AuthorizationPolicy.
services
serving
Inference Engine Comparison
Comparison guide for vLLM, TensorRT-LLM, Hugging Face TGI, llama.cpp, SGLang, Ollama, OpenAI-compatible APIs, quantization, adapters, and migration checks.
LLM Inference Systems
Production LLM inference systems: model files, weights, memory, inference engines, vLLM, TensorRT-LLM, TGI, llama.cpp, SGLang, Ollama, KV cache, PagedAttention, batching, quantization, routing, performance tests, and runbooks.
ML Serving, Inference, and vLLM
Production ML inference with model servers, batching, streaming, KV cache, autoscaling, canaries, inference optimization, and deep vLLM operations.
Quantized Serving
Quantized LLM serving with weight quantization, activation quantization, KV-cache quantization, GGUF, AWQ, GPTQ, FP8, quality gates, and rollout checks.
sockets
Linux Sockets and IPC
Linux sockets, TCP, UDP, Unix domain sockets, socket files, listen queues, buffers, file descriptors, pipes, signals, shared memory, and IPC troubleshooting.
TCP and Sockets
TCP state, sockets, listen queues, receive/send buffers, TIME_WAIT, retransmits, keepalive, ephemeral ports, and Linux observability.
ssh
storage
Ceph
Ceph distributed storage architecture, RADOS, OSDs, monitors, managers, pools, placement groups, CRUSH, CephFS, RBD, RGW, and operations.
Ceph Performance and Capacity
Ceph performance model, capacity planning, BlueStore, OSD latency, network design, client behavior, benchmarks, and saturation troubleshooting.
Ceph RADOS, CRUSH, and Placement
RADOS objects, pools, placement groups, CRUSH rules, acting sets, PG autoscaling, and placement troubleshooting.
Ceph Storage Examples
Practical Ceph examples for cluster health, replicated pool creation, and RBD image operations.
Databases
Database fundamentals, modeling, indexes, transactions, replication, and operations.
Kubernetes Examples
Practical Kubernetes examples for Deployments, Services, PodDisruptionBudgets, NetworkPolicy, and StatefulSet storage.
Kubernetes Storage and Upgrades
Persistent storage, CSI, StatefulSets, volume lifecycle, and safe Kubernetes upgrade practices.
Linux Backup and File Transfer
Linux backup and transfer notes covering rsync, scp, snapshot backups, consistency, restore testing, exclusions, permissions, and automation.
Linux Block Devices and Partitioning
Linux block devices, NVMe and SCSI naming, partitions, GPT, UUIDs, labels, udev, stable device paths, and safe disk identification.
Linux Filesystems and IO
VFS, inodes, dentries, page cache, block devices, mounts, filesystem checks, IO schedulers, and storage troubleshooting.
Linux LVM
Logical Volume Manager fundamentals, advanced operations, resizing, thin provisioning, snapshots, migration, and troubleshooting.
Linux Mounts and fstab
Linux mounts, findmnt, /etc/fstab, UUIDs, systemd mount units, automounts, bind mounts, mount options, and boot-safe storage configuration.
Linux Storage Health and Performance
Linux storage observability, iostat, SMART, NVMe health, kernel I/O errors, discard, queue depth, latency, saturation, and failure response.
RAID, Multipath, and Device Mapper
Linux md RAID, device mapper, dm-crypt/LUKS, multipath, NVMe multipath, WWIDs, path failover, and layered storage troubleshooting.
Rook-Ceph
Rook-Ceph architecture, CephCluster CRD, Kubernetes storage classes, CSI, toolbox usage, upgrades, and troubleshooting.
Storage Drives, RAID, and Database Performance
SSD, HDD, NVMe, RAID 0/1/5/6/10, RAID performance, disk-failure recovery, LVM RAID, and PostgreSQL and Elasticsearch storage tradeoffs.
ext4, XFS, and Filesystem Repair
ext4 and XFS operations, mkfs, fsck, xfs_repair, online growth, inode usage, journals, quotas, discard, and safe repair workflows.
study
Cross-Topic Study Paths
Cross-topic paths for debugging a 504, Pod networking, Linux performance, TLS and mTLS, and cloud egress.
Foundational Study Review
A cross-topic checklist for reviewing whether the guide covers the core mental models behind Linux, networking, DNS, Kubernetes, identity, databases, Ceph, Istio, and troubleshooting.
Scenario Labs
Operational labs for practicing cross-layer debugging with symptoms, evidence, checks, and answers.
study-system
switching
DHCP, Routers, and Switches
DHCP-focused networking guide covering DORA, leases, options, default gateways, DNS options, relay agents, helper addresses, Option 82, VLANs, switches, routers, DHCP snooping, DHCPv6, and troubleshooting.
Datacenter L2/L3 Operations
Datacenter switching and routing concepts for host operators: LACP, MLAG, STP, ARP/NDP instability, EVPN/VXLAN, ECMP, anycast, route dampening, and fabric incident evidence.
Switching, VLANs, and Hosts
Layer 2 switching, MAC tables, VLAN access and trunk ports, Linux bridges, 802.1Q tagging, and /etc/hosts name overrides.
syscalls
systemd
Linux Boot and Userspace
Firmware, bootloader, kernel command line, initramfs, root filesystem handoff, PID 1, services, and boot troubleshooting.
Linux Mounts and fstab
Linux mounts, findmnt, /etc/fstab, UUIDs, systemd mount units, automounts, bind mounts, mount options, and boot-safe storage configuration.
Linux Operations Examples
Practical Linux examples for systemd services, timers, rsync snapshot backups, and nftables host firewalls.
Scheduled Automation
Cron jobs, anacron, systemd timers, user timers, environment pitfalls, idempotent jobs, logs, locking, and automation troubleshooting.
systemd
systemd units, services, targets, dependencies, timers, journals, cgroups, resource controls, and troubleshooting.
systemd Networking
systemd-networkd, systemd-resolved, .network, .netdev, .link files, network targets, wait-online behavior, and operational troubleshooting.
systemd Socket Activation and Network Services
systemd socket activation, ListenStream, ListenDatagram, Accept modes, service dependencies, restart behavior, resource controls, and network sandboxing.
systems
tabular
tcp
Linux TCP Kernel Tuning
Linux TCP listen queues, SYN backlog, socket buffers, ephemeral ports, TIME_WAIT, keepalives, conntrack pressure, and safe sysctl tuning.
TCP and Sockets
TCP state, sockets, listen queues, receive/send buffers, TIME_WAIT, retransmits, keepalive, ephemeral ports, and Linux observability.
TCP, TLS, and HTTP
Transport and application handshake layers: TCP state, retransmits, flow control, congestion control, TLS SNI/certificates, and HTTP behavior.
tcpdump
tensors
Math for ML
Practical math for machine learning: vectors, matrices, tensors, dot products, cosine similarity, gradients, softmax, cross entropy, KL divergence, probability, and attention intuition.
PyTorch Fundamentals
PyTorch fundamentals for tensors, modules, autograd, losses, optimizers, dataloaders, training loops, mixed precision, and checkpointing.
testing
threads
threats
time
tls
Certificates and HTTPS
TLS certificates, HTTPS handshakes, chains of trust, SANs, SNI, OCSP, CA stores, and Ubuntu certificate operations.
HTTP and Proxy Debugging
Practical HTTP and proxy debugging with curl, headers, redirects, keepalive, chunking, HTTP/2, gRPC, WebSockets, proxy variables, and timeout alignment.
Networking TLS and mTLS Examples
Practical networking examples for certificate inspection and mTLS requests.
TCP, TLS, and HTTP
Transport and application handshake layers: TCP state, retransmits, flow control, congestion control, TLS SNI/certificates, and HTTP behavior.
Zero-Trust Networking
Network security patterns that combine host firewall policy, nftables, mTLS, certificate rotation, SNI, ALPN, service identity, SSH hardening, auditd, SELinux, AppArmor, and real traffic debugging.
tokenizer
tools
Advanced Agents
Advanced agent engineering with ReAct, planning, reflection, tool routing, multi-agent systems, durable execution, memory, sandboxing, approval workflows, trajectory evaluation, and failure recovery.
ML Agents and Tool Use
Agentic ML systems with planning loops, tools, state, memory, idempotency, guardrails, human approval, and evaluation.
tpu
tracing
traffic-management
training
Deep Learning Fundamentals
Deep learning fundamentals: neural networks, layers, activations, embeddings, CNNs, RNNs, transformers, optimizers, normalization, regularization, initialization, and training stability.
LLM Training Lifecycle
LLM lifecycle from pretraining and continued pretraining to SFT, RLHF, DPO, RLAIF, preference data, synthetic data, data filtering, and stage-specific evaluation.
PyTorch Fundamentals
PyTorch fundamentals for tensors, modules, autograd, losses, optimizers, dataloaders, training loops, mixed precision, and checkpointing.
transformers
Advanced ML Architectures
Advanced model architectures including Mixture of Experts, retrieval-augmented models, state space models, diffusion transformers, sparse attention, long-context architectures, and encoder-only, decoder-only, and encoder-decoder tradeoffs.
Long-Context Serving
Long-context LLM serving with RoPE scaling, sliding-window attention, sink tokens, lost-in-the-middle behavior, KV-cache growth, prompt budgeting, and long-context evals.
ML Models, Types, and Weights
Model families, model types, parameters, weights, checkpoints, tokenizers, embeddings, transformers, diffusion models, and inference boundaries.
MoE Inference
Mixture-of-Experts inference with active vs total parameters, expert routing, expert parallelism, all-to-all communication, load balance, capacity, KV cache, and serving tradeoffs.
Transformer Internals
Transformer internals covering tokenization, embeddings, positional encodings, RoPE, self-attention, multi-head attention, MLP blocks, LayerNorm/RMSNorm, decoder-only models, KV cache, context windows, and MoE basics.
troubleshooting
BGP and Dynamic Routing
BGP fundamentals for operators: ASNs, peering, route advertisement, route selection, communities, ECMP, anycast, and failure modes.
Ceph
Ceph distributed storage architecture, RADOS, OSDs, monitors, managers, pools, placement groups, CRUSH, CephFS, RBD, RGW, and operations.
Ceph Operations and Recovery
Ceph health triage, OSD failure response, recovery and backfill, scrub repair, maintenance flags, full ratios, and safe operational runbooks.
Cloud Networking
Cloud networking fundamentals for VPCs, subnets, route tables, security groups, private endpoints, peering, transit gateways, NAT, load balancers, and overlapping CIDRs.
Cross-Layer Incident Runbooks
Operational runbooks that trace application symptoms through DNS, TCP, TLS, proxies, Linux hosts, Kubernetes Services, CNI, NAT, and cloud networking.
Cross-Topic Study Paths
Cross-topic paths for debugging a 504, Pod networking, Linux performance, TLS and mTLS, and cloud egress.
DHCP, Routers, and Switches
DHCP-focused networking guide covering DORA, leases, options, default gateways, DNS options, relay agents, helper addresses, Option 82, VLANs, switches, routers, DHCP snooping, DHCPv6, and troubleshooting.
DNS
Name resolution flow, DNS records, delegation, caching, negative answers, and operational debugging.
DNS Records, Responses, and Transport
DNS record types, response codes, EDNS, UDP and TCP transport, truncation, reverse DNS, mail records, and operational debugging.
DNS Resolution and Caching
How resolver lookup paths, caches, TTLs, negative answers, CNAME chains, and client search behavior shape DNS outcomes.
Datacenter L2/L3 Operations
Datacenter switching and routing concepts for host operators: LACP, MLAG, STP, ARP/NDP instability, EVPN/VXLAN, ECMP, anycast, route dampening, and fabric incident evidence.
Firewalls, iptables, and Netfilter
Linux firewall fundamentals: netfilter hooks, nftables, iptables, chains, tables, conntrack, NAT, rule ordering, logging, persistence, and troubleshooting.
Forward and Reverse Proxies
Forward proxies, reverse proxies, CONNECT tunnels, transparent proxies, TLS interception, proxy headers, egress control, caching, environment variables, and troubleshooting.
Foundational Study Review
A cross-topic checklist for reviewing whether the guide covers the core mental models behind Linux, networking, DNS, Kubernetes, identity, databases, Ceph, Istio, and troubleshooting.
HTTP and Proxy Debugging
Practical HTTP and proxy debugging with curl, headers, redirects, keepalive, chunking, HTTP/2, gRPC, WebSockets, proxy variables, and timeout alignment.
ICMP, MTU, and Path Testing
ICMP, ping, traceroute, path MTU discovery, fragmentation, tunnel overhead, packet loss, jitter, and practical path testing.
IP Addressing and Subnetting
IPv4 and IPv6 addressing, CIDR, subnets, default gateways, private address space, source selection, and route troubleshooting.
IPv6 Operations
Operational IPv6 fundamentals: SLAAC, DHCPv6, Router Advertisements, NDP, link-local addresses, privacy addresses, dual-stack, NAT64, DNS AAAA, and firewalling.
Incident Entry Points
Symptom-first entry points for timeouts, resets, DNS that works on the node but not in Pods, large-payload hangs, and intermittent 5xx failures.
Istio Observability and Troubleshooting
Istio metrics, access logs, traces, proxy sync, xDS inspection, response flags, common 503 and policy failures, and practical debugging flows.
Kubernetes DNS and CoreDNS
Kubernetes DNS names, CoreDNS, kube-dns compatibility, Pod resolv.conf, dnsPolicy, search paths, ndots, forwarding, and DNS troubleshooting.
Linux
Linux fundamentals for developers and operators: processes, fork/exec, memory, CPU, kernel params, caches, paging, GPUs, and troubleshooting.
Linux Block Devices and Partitioning
Linux block devices, NVMe and SCSI naming, partitions, GPT, UUIDs, labels, udev, stable device paths, and safe disk identification.
Linux GPU Drivers
Linux GPU driver stacks for AMD and NVIDIA, including kernel modules, firmware, Mesa, ROCm, CUDA, nvidia-smi, Secure Boot, containers, and troubleshooting.
Linux Kernel Modules and Devices
Linux kernel modules, built-in drivers, modprobe, module aliases, parameters, blacklists, sysfs, devtmpfs, udev, device nodes, initramfs, Secure Boot, DKMS, and troubleshooting.
Linux Kernel Network Performance
Linux kernel packet processing, NAPI, softirq, NIC queues, RPS, RFS, XPS, offloads, qdisc, drops, and practical network performance troubleshooting.
Linux LVM
Logical Volume Manager fundamentals, advanced operations, resizing, thin provisioning, snapshots, migration, and troubleshooting.
Linux Memory Pressure and OOM
Linux memory troubleshooting for RSS, VSZ, page cache, slab, THP, NUMA, swap, cgroup memory, PSI, and OOM killer behavior.
Linux Network Stack
Linux sockets, routes, neighbor tables, netfilter, conntrack, qdisc, NIC queues, namespaces, and network troubleshooting.
Linux Package and Boot Recovery
Debian and Ubuntu recovery runbooks for broken packages, bad kernels, initramfs failures, GRUB rescue, emergency mode, and rollback.
Linux Performance Triage Runbooks
Symptom-driven Linux runbooks for high CPU, high load, memory pressure, disk latency, softirq saturation, file descriptor exhaustion, and cgroup throttling.
Linux Processes and Threads
Linux processes, threads, tasks, PIDs, TIDs, fork, exec, wait, zombies, signals, scheduling, cgroups, namespaces, and process troubleshooting.
Linux Sockets and IPC
Linux sockets, TCP, UDP, Unix domain sockets, socket files, listen queues, buffers, file descriptors, pipes, signals, shared memory, and IPC troubleshooting.
Linux Storage Health and Performance
Linux storage observability, iostat, SMART, NVMe health, kernel I/O errors, discard, queue depth, latency, saturation, and failure response.
Linux System Call Debugging
Linux syscall debugging with strace, errno, blocking calls, file and socket syscalls, poll, epoll, futexes, and production triage boundaries.
Linux TCP Kernel Tuning
Linux TCP listen queues, SYN backlog, socket buffers, ephemeral ports, TIME_WAIT, keepalives, conntrack pressure, and safe sysctl tuning.
Linux eBPF and Tracing
Practical Linux eBPF tracing with tracepoints, kprobes, uprobes, bpftrace, BCC, tc, XDP, and production-safe observability boundaries.
Load Balancers and Proxies
L4 and L7 load balancing, reverse proxies, health checks, TLS termination, source IP preservation, PROXY protocol, headers, and timeout failures.
Logs and Observability
Linux logs, journald, syslog, dmesg, service status, boot logs, log rotation, metrics, and incident evidence collection.
NAT Gateways and Network Address Translation
Deep operational guide to NAT, NAT gateways, SNAT, DNAT, PAT, masquerade, conntrack, ephemeral ports, cloud egress, hairpin NAT, and NAT troubleshooting.
Network Namespaces and Virtual Networking
Linux network namespaces, veth pairs, bridges, macvlan, ipvlan, tun/tap, VXLAN, overlays, and CNI-style packet paths.
Networking
Practical systems networking: OSI layers, NICs, ARP, IP routing, TCP/UDP, DNS, TLS, and packet debugging.
Packet Capture and Analysis
Practical tcpdump and Wireshark packet analysis for filters, handshakes, retransmits, resets, DNS, TLS SNI, VLAN tags, MTU, and rolling captures.
PostgreSQL Operations, HA, Replication, and Recovery
Operational PostgreSQL guide covering HA managers, failover, physical and logical replication, sharding, backups, restore drills, WAL, high CPU, high RAM, and incident checks.
Request Path
A master walkthrough of one request through DNS, client behavior, TCP, TLS, proxies, Kubernetes Service routing, CNI, Pod, application, database, and the response path.
Resilience, Timeouts, and Draining
Practical network resilience patterns: timeout budgets, retries, backoff, connection pooling, keepalive, idle timeouts, load balancer draining, rolling restarts, DNS TTLs, and zero-downtime deploy behavior.
Scenario Labs
Operational labs for practicing cross-layer debugging with symptoms, evidence, checks, and answers.
Troubleshooting
First checks and failure paths for common Kubernetes incidents.
Troubleshooting Examples
Practical troubleshooting examples for repeatable incident capture and evidence preservation.
Troubleshooting and Error Handling
Cross-technology troubleshooting and error handling for Linux, networking, DNS, Kubernetes, Istio, Ceph, PostgreSQL, and distributed systems operations.
UDP, QUIC, and Connectionless Traffic
UDP behavior, DNS and DHCP traffic, QUIC, connection tracking, timeouts, packet loss, retries, and operational debugging.
VPNs and IPsec Tunnels
VPN fundamentals, IPsec tunnel mode, IKEv2, ESP, NAT-T, traffic selectors, route-based and policy-based tunnels, split tunneling, MTU, and operational troubleshooting.
ext4, XFS, and Filesystem Repair
ext4 and XFS operations, mkfs, fsck, xfs_repair, online growth, inode usage, journals, quotas, discard, and safe repair workflows.
resolv.conf
Linux resolver configuration, nameservers, search domains, ndots, timeouts, attempts, rotation, and systemd-resolved behavior.
systemd
systemd units, services, targets, dependencies, timers, journals, cgroups, resource controls, and troubleshooting.
systemd Networking
systemd-networkd, systemd-resolved, .network, .netdev, .link files, network targets, wait-online behavior, and operational troubleshooting.
systemd Socket Activation and Network Services
systemd socket activation, ListenStream, ListenDatagram, Accept modes, service dependencies, restart behavior, resource controls, and network sandboxing.
ubuntu
Debian and Ubuntu Operations
Ubuntu-first Linux operations: apt, dpkg, systemd, journald, netplan, ufw, CA certificates, logs, services, and package troubleshooting.
Linux Package and Boot Recovery
Debian and Ubuntu recovery runbooks for broken packages, bad kernels, initramfs failures, GRUB rescue, emergency mode, and rollback.
SSH Access
OpenSSH server administration on Ubuntu: sshd config, keys, authentication, host keys, logs, access control, and troubleshooting.
Time, Hostname, and Identity
Time synchronization, timezone, hostname, machine-id, DNS identity, certificates, logs, Kerberos sensitivity, and Ubuntu host identity operations.
Users, Permissions, and sudo
Linux identities, groups, file modes, ownership, sudo, service users, locked accounts, ACLs, and permission troubleshooting on Ubuntu.
udp
upgrades
Istio Zero Downtime Upgrades on Kubernetes
Operational runbook for low-risk Istio upgrades on Kubernetes using canary control planes, revisions, revision tags, workload restarts, gateway canaries, rollback, PDBs, and validation gates.
Kubernetes Storage and Upgrades
Persistent storage, CSI, StatefulSets, volume lifecycle, and safe Kubernetes upgrade practices.
PostgreSQL Zero-Downtime Upgrades on Kubernetes
Zero-downtime and low-downtime PostgreSQL upgrade strategies on Kubernetes, including CloudNativePG rolling updates, major upgrades, logical replication cutovers, PgBouncer draining, and ordinary PostgreSQL runbooks.
virtualization
vision
vlan
vllm
Advanced Inference and vLLM
Advanced LLM inference with vLLM, PagedAttention, continuous batching, disaggregated prefill, chunked prefill, speculative decoding, prefix caching, KV-cache math, tensor and pipeline parallelism, quantized serving, LoRA serving, and autoscaling.
LLM Inference Systems
Production LLM inference systems: model files, weights, memory, inference engines, vLLM, TensorRT-LLM, TGI, llama.cpp, SGLang, Ollama, KV cache, PagedAttention, batching, quantization, routing, performance tests, and runbooks.
ML Serving, Inference, and vLLM
Production ML inference with model servers, batching, streaming, KV cache, autoscaling, canaries, inference optimization, and deep vLLM operations.
vLLM Operations
Operational guide for vLLM serving with scheduler concepts, PagedAttention, KV cache, prefix caching, speculative decoding, parallelism, metrics, flags, and runbooks.
vpn
weights
ML Models, Types, and Weights
Model families, model types, parameters, weights, checkpoints, tokenizers, embeddings, transformers, diffusion models, and inference boundaries.
Model Memory Math
Practical LLM memory math for weights, activations, KV cache, optimizer state, model loading, quantization, model sizes, and serving capacity planning.