Machine Learning

Machine learning systems turn data, model architectures, weights, training objectives, hardware, evaluation, and deployment constraints into predictions or generated outputs. The engineering challenge goes beyond training a model: it includes choosing the right model type, managing data and weights, measuring behavior, operating accelerators, adapting models safely, and explaining failures.

Critical Subtopics

| Topic | Why It Matters |
| --- | --- |
| ML 101 Foundations | Covers what ML is, when not to use it, datasets, features, labels, splits, training, inference, metrics, overfitting, and bias/variance. |
| Math for ML | Covers vectors, matrices, tensors, dot products, cosine similarity, gradients, probability, softmax, cross entropy, KL divergence, and attention intuition. |
| Classical ML | Covers linear/logistic regression, trees, random forests, gradient boosting, SVMs, k-means, feature engineering, and tabular workflows. |
| Deep Learning Fundamentals | Covers neural networks, activations, embeddings, CNNs, RNNs, transformers, optimizers, normalization, regularization, and training stability. |
| Models, Types, and Weights | Covers supervised, unsupervised, self-supervised, generative, discriminative, transformer, embedding, and diffusion model shapes plus what weights represent. |
| Transformer Internals | Covers tokenization, embeddings, positional encodings, RoPE, self-attention, MLP blocks, normalization, decoder-only models, KV cache, context windows, and MoE basics. |
| LLM Training Lifecycle | Covers pretraining, continued pretraining, SFT, RLHF, DPO, RLAIF, preference data, synthetic data, filtering, and stage-specific evaluation. |
| Accelerators: GPU and TPU | Covers GPU and TPU execution models, memory, batching, precision, utilization, and accelerator troubleshooting. |
| PyTorch Fundamentals | Covers tensors, modules, autograd, losses, optimizers, dataloaders, training loops, mixed precision, and checkpointing. |
| Fine-Tuning and LoRA | Covers full fine-tuning, parameter-efficient tuning, LoRA adapters, data preparation, evaluation, and rollback. |
| Advanced Fine-Tuning | Covers QLoRA, adapter composition, multi-adapter serving, dataset mixing, packing, long-context tuning, forgetting, and safety-preserving releases. |
| Retrieval-Augmented Generation | Covers embeddings, chunking, vector indexes, retrieval, reranking, context assembly, citations, and freshness boundaries. |
| Advanced RAG | Covers hybrid retrieval, query planning, multi-hop retrieval, rerankers, GraphRAG, context compression, citation verification, agentic RAG, retrieval evals, and secure RAG. |
| Agents and Tool Use | Covers planning loops, tool calling, memory, state, guardrails, idempotency, and agent evaluation. |
| Advanced Agents | Covers ReAct, planning, reflection, tool routing, multi-agent systems, durable execution, memory, sandboxing, approvals, trajectory evaluation, and failure recovery. |
| Multimodal ML | Covers vision, audio, speech-to-text, text-to-speech, vision-language models, image embeddings, OCR pipelines, multimodal RAG, and video basics. |
| Serving, Inference, and vLLM | Covers model serving, prefill/decode, batching, streaming, KV cache, vLLM, PagedAttention, prefix caching, speculative decoding, canaries, and inference optimization. |
| LLM Inference Systems | Covers model files, weights, memory, inference engines, vLLM, TensorRT-LLM, TGI, llama.cpp, SGLang, Ollama, KV cache, PagedAttention, quantization, routing, performance tests, and runbooks. |
| Model Memory Math | Covers weight memory, KV-cache sizing, MHA/GQA/MQA differences, model loading, cold start, and serving capacity worksheets. |
| Tokenizer and Chat Template Compatibility | Covers tokenizer IDs, chat templates, special tokens, stop behavior, tool schemas, and migration debugging. |
| Inference Benchmarking | Covers benchmark design, warmup, traffic mixes, TTFT, ITL, throughput, cost, quality gates, and anti-patterns. |
| Quantized Serving | Covers AWQ, GPTQ, bitsandbytes, GGUF, FP8, INT8/INT4, KV-cache quantization, quality gates, and rollout checks. |
| Inference Engine Comparison | Covers vLLM, TensorRT-LLM, TGI, llama.cpp, SGLang, Ollama, feature matrices, and migration checks. |
| vLLM Operations | Covers vLLM scheduler concepts, flags, metrics, PagedAttention, prefix caching, speculative decoding, parallelism, and incidents. |
| MoE Inference | Covers active vs total parameters, expert routing, expert parallelism, all-to-all communication, load balance, and serving tradeoffs. |
| Long-Context Serving | Covers RoPE scaling, sliding windows, sink tokens, lost-in-the-middle behavior, prompt budgets, KV-cache growth, and evals. |
| Inference Runbooks | Covers wrong output, high TTFT, slow ITL, OOM, queue buildup, low throughput, cache pressure, and release regressions. |
| Advanced Inference and vLLM | Covers PagedAttention, continuous batching, disaggregated prefill, chunked prefill, speculative decoding, prefix caching, KV-cache math, parallelism, quantized serving, LoRA serving, and autoscaling. |
| Observability and Incident Response | Covers serving metrics, traces, prompt and retrieval logging, drift, quality monitoring, feedback loops, and ML incident runbooks. |
| Advanced ML Observability | Covers drift detection, data quality monitoring, online evals, feedback loops, trace schemas, prompt/retrieval/model/tool observability, safety monitoring, cost monitoring, and incident templates. |
| Data Pipelines and Feature Stores | Covers dataset versioning, lineage, data contracts, feature stores, point-in-time correctness, train/serve skew, and privacy gates. |
| MLOps Systems | Covers model registries, feature stores, training pipelines, artifact versioning, reproducibility, batch/online inference, canaries, shadow evaluation, rollback, and cost governance. |
| Evaluation and CI/CD | Covers eval harnesses, golden sets, regression gates, human review, release scorecards, canaries, and behavioral CI/CD. |
| Evaluation Mastery | Covers unit evals, golden sets, model-graded evals, human evals, preference evals, safety evals, regression gates, slice metrics, statistical confidence, and contamination controls. |
| Security and Privacy | Covers prompt injection, data exfiltration, model supply chain, poisoned retrieval, tenant isolation, PII, retention, and audit logging. |
| ML Security Threats | Covers prompt injection, jailbreaks, data poisoning, model extraction, membership inference, training data leakage, secrets in prompts, supply chain risk, tenant isolation, and secure RAG. |
| Prompt Operations | Covers prompt templates, versioning, structured outputs, prompt injection resistance, evals, generation config, and rollback. |
| Advanced ML Architectures | Covers Mixture of Experts, retrieval-augmented models, state space models, diffusion transformers, sparse attention, long-context architectures, and encoder/decoder tradeoffs. |
| ML Performance Engineering | Covers training-loop profiling, GPU utilization, memory bandwidth, activation checkpointing, FlashAttention, fused kernels, distributed bottlenecks, inference throughput tuning, and cost/performance tradeoffs. |
| Alignment and Evaluation | Covers preference tuning, RLHF/DPO concepts, safety policy, eval sets, red teaming, and regression gates. |
| Explainability | Covers feature attribution, saliency, counterfactuals, interpretable models, LIME/SHAP, attention caveats, and model cards. |
| Responsible AI and Governance | Covers model cards, dataset cards, risk classification, human oversight, audit trails, fairness checks, red-team programs, release approval, incidents, and regulatory documentation. |

Core Mental Model

An ML system has several separable layers:

| Layer | Key Question |
| --- | --- |
| Data | What examples, labels, documents, features, and feedback does the system learn from or retrieve? |
| Model | What architecture maps inputs to outputs? |
| Weights | What learned parameters encode behavior after training or adaptation? |
| Objective | What loss, reward, or preference signal is optimized? |
| Hardware | What accelerator, memory, precision, and parallelism constraints shape runtime? |
| Evaluation | What tests prove the system works and stays inside boundaries? |
| Product loop | How are user feedback, drift, monitoring, and rollback handled? |

Separate these layers during design reviews and incidents. A bad answer might be a retrieval failure, model limitation, prompt issue, data drift, unsafe tool action, bad fine-tune, accelerator pressure, or evaluation gap.

ML 100 Gap Closure Matrix

The ML section tracks 100 concrete gaps that should be understood before operating ML systems in production. Each identifier below is addressed in its topic page, so this overview stays navigational while the details live with the relevant material.

| Range | Topic Page | Gap Area |
| --- | --- | --- |
| ML-GAP-001 through ML-GAP-012 | Models, Types, and Weights | Data splits, label quality, leakage, objectives, optimizers, decoding, tokenization, embeddings, calibration, distillation, quantization, and model cards. |
| ML-GAP-013 through ML-GAP-023 | Accelerators: GPU and TPU | Memory math, tensor cores, mixed precision, checkpointing, transfer bottlenecks, batches, distributed topology, collectives, fragmentation, profiling, and serving utilization. |
| ML-GAP-024 through ML-GAP-035 | PyTorch Fundamentals | Datasets, reproducibility, autograd lifetime, accumulation, clipping, optimizer state, schedules, AMP, DDP, resume state, compile/export, and evaluation-mode hazards. |
| ML-GAP-036 through ML-GAP-047 | Fine-Tuning and LoRA | SFT formats, chat templates, LoRA modules, rank, QLoRA, packing, forgetting, overfit, merge risk, eval contamination, safety regression, and rollback compatibility. |
| ML-GAP-048 through ML-GAP-060 | Retrieval-Augmented Generation | Ingestion, chunk boundaries, metadata, hybrid search, query rewriting, rerankers, ACLs, freshness, citations, context budget, hallucination triage, observability, and cost/latency. |
| ML-GAP-061 through ML-GAP-072 | Agents and Tool Use | Tool contracts, authorization, idempotency, state machines, approvals, sandboxes, retries, timeouts, parallel calls, memory poisoning, trajectory evals, and replay. |
| ML-GAP-073 through ML-GAP-086 | Alignment and Evaluation | Eval governance, golden sets, metric slices, release gates, preference labels, RLHF/DPO boundaries, red-team taxonomies, severity rubrics, abstention, fairness, monitoring, drift, canaries, and incidents. |
| ML-GAP-087 through ML-GAP-100 | Explainability | Method selection, SHAP baselines, LIME perturbations, counterfactuals, saliency, attention caveats, embedding neighborhoods, probes, attribution stability, debugging, user risks, documentation, explanation drift, and evidence. |
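
The range-to-page mapping in the matrix can be expressed as a small lookup. The following is a minimal sketch, not part of any existing tooling: the names `GAP_RANGES`, `topic_for_gap`, and `covered_numbers` are hypothetical, and the ranges are copied from the matrix.

```python
# Hypothetical helper: resolve a gap identifier such as "ML-GAP-048"
# to the topic page that owns it. Ranges are copied from the matrix.
GAP_RANGES = [
    (1, 12, "Models, Types, and Weights"),
    (13, 23, "Accelerators: GPU and TPU"),
    (24, 35, "PyTorch Fundamentals"),
    (36, 47, "Fine-Tuning and LoRA"),
    (48, 60, "Retrieval-Augmented Generation"),
    (61, 72, "Agents and Tool Use"),
    (73, 86, "Alignment and Evaluation"),
    (87, 100, "Explainability"),
]

PREFIX = "ML-GAP-"


def topic_for_gap(gap_id: str) -> str:
    """Return the topic page for an identifier like 'ML-GAP-048'."""
    if not gap_id.startswith(PREFIX):
        raise ValueError(f"not a gap identifier: {gap_id!r}")
    number = int(gap_id[len(PREFIX):])
    for first, last, topic in GAP_RANGES:
        if first <= number <= last:
            return topic
    raise ValueError(f"gap number out of range: {number}")


def covered_numbers() -> list:
    """List every gap number covered by the ranges, in order."""
    numbers = []
    for first, last, _ in GAP_RANGES:
        numbers.extend(range(first, last + 1))
    return numbers


print(topic_for_gap("ML-GAP-048"))
# Sanity check: the ranges cover 1..100 with no gaps or overlaps.
print(covered_numbers() == list(range(1, 101)))
```

The coverage check is the useful part: if a range is edited later, it flags a missing or doubly assigned identifier.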

Command Examples

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
nvidia-smi
python -c "import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu')"

Example output and meaning:

| Command | Example output | What it does |
| --- | --- | --- |
| First python -c command (version and CUDA check) | The installed PyTorch version string, then True or False for CUDA availability; an ImportError if PyTorch is not installed. | Confirms that PyTorch imports and whether it can see a CUDA device. |
| nvidia-smi | Driver version, the highest CUDA version the driver supports, and per-GPU utilization, memory use, and running processes. | Shows accelerator visibility and load at the driver level. It does not report model-serving capacity or latency. |
| Second python -c command (device name) | The name of GPU 0 as PyTorch reports it, or cpu when CUDA is unavailable. | Identifies which accelerator PyTorch would use. |

These checks prove local PyTorch import, CUDA visibility, and GPU identity. They do not prove model correctness.
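
The same checks can be gathered into one script that reports rather than fails when PyTorch or the NVIDIA driver is missing. This is a minimal sketch: the function name `check_environment` and the report keys are hypothetical, while the PyTorch calls are the ones used in the commands above.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Collect the signals from the shell checks above without raising."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
        "device_name": "cpu",
        # Only checks that the binary is on PATH; it does not run it.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["torch_version"] = torch.__version__
            report["cuda_available"] = torch.cuda.is_available()
            if report["cuda_available"]:
                report["device_name"] = torch.cuda.get_device_name(0)
        except Exception as exc:  # a broken install should be reported, not fatal
            report["error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

As with the individual commands, a clean report only shows that the environment is visible to PyTorch; it says nothing about model correctness.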

Study Path

Stage Pages
101 ML 101 Foundations, Math for ML, Classical ML.
Core Deep Learning Fundamentals, Models/Weights, PyTorch, Transformer Internals, Accelerators.
LLM LLM Training Lifecycle, Fine-Tuning and LoRA, Advanced Fine-Tuning, RAG, Advanced RAG, Agents, Advanced Agents, Multimodal ML.
Production Serving and vLLM, LLM Inference Systems, Model Memory Math, Tokenizer and Chat Template Compatibility, Inference Benchmarking, Quantized Serving, Inference Engine Comparison, vLLM Operations, MoE Inference, Long-Context Serving, Inference Runbooks, Advanced Inference and vLLM, Observability, Advanced Observability, Data Pipelines and Feature Stores, MLOps Systems, Prompt Operations.
Advanced Evaluation Mastery, ML Security Threats, Advanced Architectures, Performance Engineering, Alignment, Explainability, Responsible AI and Governance.

Learn the stages in order, but use production gates early. Data leakage, weak evals, unsafe prompts, or unbounded serving cost can invalidate a model regardless of how advanced the architecture is.

Study Cards

Question

What are model weights?

Answer

Learned numeric parameters that encode how a model maps inputs to outputs after training or adaptation.
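
To make "learned numeric parameters" concrete, here is a minimal sketch in plain Python: a model with two weights, `w` and `b`, fitted by gradient descent on a toy dataset. The function name `fit_line`, the data, the learning rate, and the step count are illustrative assumptions, not taken from the topic pages.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # the weights start with no knowledge of the data
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w * x + b - y) ** 2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")  # values close to w=2, b=1
```

After training, the pair `(w, b)` is everything the model knows: the input-to-output mapping is stored entirely in those numbers, which is the same role weights play in a much larger network.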

Question

Why separate data, model, weights, objective, hardware, and evaluation?

Answer

Each layer can cause different failures and needs different debugging evidence.

Question

When is RAG preferable to fine-tuning?

Answer

When answers need fresh, inspectable, source-grounded knowledge rather than changed model behavior.

Question

Why are agents harder to evaluate than single model calls?

Answer

They include state, tool effects, multi-step decisions, and failure recovery paths.

Question

Why separate prefill and decode in LLM inference?

Answer

Prefill and decode have different latency, memory, batching, and KV-cache behavior.
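
The KV-cache part of this answer can be made concrete with a sizing sketch. Prefill writes cache entries for every prompt token in one pass, while decode adds one token's worth per step and rereads the whole cache each time. The sketch below uses the standard per-token estimate (keys plus values, for every layer and KV head). The class name `AttentionShape` and the model dimensions are hypothetical, chosen only for the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical model dimensions; not taken from any specific model."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_element: int  # 2 for FP16/BF16


def kv_cache_bytes_per_token(shape: AttentionShape) -> int:
    # Factor 2: one key vector and one value vector per layer and KV head.
    return (2 * shape.num_layers * shape.num_kv_heads
            * shape.head_dim * shape.bytes_per_element)


def kv_cache_bytes(shape: AttentionShape, tokens: int, batch_size: int = 1) -> int:
    """Total cache size once `tokens` positions are filled for each sequence."""
    return kv_cache_bytes_per_token(shape) * tokens * batch_size


shape = AttentionShape(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_element=2)

per_token = kv_cache_bytes_per_token(shape)
full_context = kv_cache_bytes(shape, tokens=4096)

print(per_token)                  # 131072 bytes, i.e. 128 KiB per token
print(full_context / 2**20)       # 512.0 MiB for one 4096-token sequence
```

Under these assumed dimensions, a 4096-token prompt fills 512 MiB during prefill, and each decode step grows the cache by another 128 KiB per sequence, which is why batch size and context length are constrained by memory as well as compute.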

Question

Why treat prompts as production code?

Answer

Prompts control task behavior, safety boundaries, tool use, output format, and retrieval usage.

References