Advanced ML Architectures

Architecture choices determine what a model can represent and how expensive it is to train and serve. Advanced architectures usually give up simplicity in exchange for scale, longer context, sparse computation, or support for additional modalities.

Architecture Map

Architecture	Strength	Tradeoff
Encoder-only	Understanding and classification.	Not natural for autoregressive generation.
Decoder-only	Text/code generation and chat.	Prompt/context cost grows with sequence length.
Encoder-decoder	Translation and conditional generation.	More serving complexity.
Mixture of Experts	Many parameters with sparse activation.	Routing and load balancing.
Retrieval-augmented models	External memory or evidence.	Retriever and generator coupling.
State space models	Long-sequence efficiency.	Less mature ecosystem; task fit varies.
Diffusion transformers	Generative media.	Iterative sampling cost and safety review.
Sparse attention	Longer context at lower cost.	Needs specialized kernels; may lose quality versus dense attention.
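The masking difference behind the encoder-only and decoder-only rows, plus one simple form of sparse attention, can be sketched in a few lines. This is an illustrative toy, not any library's API: the function names are hypothetical, and the sliding-window pattern is only one example of sparse attention. Counting the query-key pairs each mask allows gives a rough sense of why dense attention cost grows with the square of sequence length while a fixed window grows linearly.

```python
# mask[i][j] is True when query position i may attend to key position j.

def bidirectional_mask(n):
    """Encoder-only style: every token sees every other token."""
    return [[True] * n for _ in range(n)]

def causal_mask(n):
    """Decoder-only style: token i sees only positions 0..i, which is what
    makes left-to-right (autoregressive) generation possible."""
    return [[j <= i for j in range(n)] for i in range(n)]

def sliding_window_mask(n, window):
    """One simple sparse pattern (assumed for illustration): causal, but each
    token sees only the most recent `window` positions, including itself."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]

def attended_pairs(mask):
    """Number of query-key scores to compute; a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)

if __name__ == "__main__":
    n = 8
    for name, m in [("bidirectional", bidirectional_mask(n)),
                    ("causal", causal_mask(n)),
                    ("sliding window (w=3)", sliding_window_mask(n, 3))]:
        print(f"{name:22s} pairs={attended_pairs(m)}")
```

For n = 8 this reports 64 pairs for bidirectional attention, 36 for causal, and 21 for a window of 3. Doubling n roughly quadruples the first two counts but only doubles the windowed one.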

Long-Context Design

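Long-context architectures change where the bottleneck sits. More context can improve recall, but it also increases prefill latency, KV-cache memory, and evaluation cost, and it raises the risk of lost-in-the-middle behavior, where a model uses information near the start or end of a long prompt more reliably than information in the middle.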
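The KV-cache term can be estimated with simple arithmetic. The sketch below assumes a standard transformer decoder that caches one key vector and one value vector per layer, per KV head, per token, stored in fp16/bf16. The model shape is hypothetical and chosen only to keep the numbers readable; quantized caches, paged allocation, or architectures without a full KV cache will differ.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV-cache size for a standard transformer decoder.

    The factor 2 covers the key and the value stored for every layer,
    KV head, token, and sequence in the batch. bytes_per_elem=2 assumes
    fp16/bf16 cache entries.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

GIB = 1024 ** 3

if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)
    for seq_len in (4_096, 32_768, 131_072):
        gib = kv_cache_bytes(seq_len=seq_len, **cfg) / GIB
        print(f"{seq_len:>7} tokens -> {gib:.1f} GiB per sequence")
```

With this shape each sequence needs 0.5 GiB at 4K tokens, 4.0 GiB at 32K, and 16.0 GiB at 128K. Cache memory grows linearly with context and is multiplied again by batch size, which is why long context limits how many requests fit on one accelerator.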

MoE Operations

MoE models need:

expert capacity planning,
router stability checks,
load-balance metrics,
expert-parallel serving support,
per-slice quality evaluation.
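Capacity planning, routing, and load-balance metrics can be illustrated with a toy top-k router. This is a simplified sketch under stated assumptions: the capacity formula follows the common "capacity factor" convention, tokens are admitted first come, first served, gate weights and auxiliary load-balancing losses are omitted, and all function names are hypothetical.

```python
import math

def expert_capacity(num_tokens, num_experts, k, capacity_factor=1.25):
    """Per-expert token slots: an even share of the k * num_tokens
    assignments, padded by a capacity factor (assumed convention)."""
    return math.ceil(capacity_factor * num_tokens * k / num_experts)

def route_top_k(router_logits, k, capacity):
    """Send each token to its k highest-scoring experts.

    Ranking raw logits gives the same order as ranking softmax
    probabilities, so no softmax is needed just to pick experts.
    Assignments beyond an expert's capacity are dropped; in many
    implementations such a token then passes through the residual path.
    Returns (assignments, dropped): expert id -> token ids, and a drop count.
    """
    num_experts = len(router_logits[0])
    assignments = {e: [] for e in range(num_experts)}
    dropped = 0
    for tok, logits in enumerate(router_logits):
        top = sorted(range(num_experts), key=lambda e: logits[e], reverse=True)[:k]
        for e in top:
            if len(assignments[e]) < capacity:
                assignments[e].append(tok)
            else:
                dropped += 1  # expert is full: this assignment is lost
    return assignments, dropped

def load_imbalance(assignments):
    """Max expert load divided by mean load; 1.0 means perfectly balanced."""
    loads = [len(toks) for toks in assignments.values()]
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0

if __name__ == "__main__":
    # Toy batch: 4 of 6 tokens prefer expert 0.
    logits = [[2.0, 0.1, 0.0]] * 4 + [[0.0, 2.0, 0.1], [0.1, 0.0, 2.0]]
    cap = expert_capacity(num_tokens=6, num_experts=3, k=1)
    assignments, dropped = route_top_k(logits, k=1, capacity=cap)
    print("capacity:", cap)
    print("assignments:", assignments)
    print("dropped:", dropped, "imbalance:", round(load_imbalance(assignments), 2))
```

In this example the capacity is 3, so the fourth token that prefers expert 0 is dropped, and the max-to-mean load ratio comes out at 1.8. Tracking dropped assignments and that ratio per batch is one simple way to watch router stability and load balance.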

Practical Lab: Architecture Decision Record

task:
candidate_architectures:
context_needed:
latency_target:
training_data:
serving_hardware:
eval_slices:
rollback_plan:
chosen_architecture:
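If the record is kept alongside code, it can be checked mechanically before review. A minimal sketch, assuming a plain Python dataclass whose fields copy the template above; the class and method names are hypothetical, and the only rules enforced are that every field is filled in and that the chosen architecture was one of the candidates.

```python
from dataclasses import dataclass, field, fields

@dataclass
class ArchitectureDecisionRecord:
    """Field names mirror the lab template; the checks are illustrative."""
    task: str = ""
    candidate_architectures: list[str] = field(default_factory=list)
    context_needed: str = ""
    latency_target: str = ""
    training_data: str = ""
    serving_hardware: str = ""
    eval_slices: list[str] = field(default_factory=list)
    rollback_plan: str = ""
    chosen_architecture: str = ""

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in template order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def problems(self) -> list[str]:
        """Empty fields, plus a choice that was never listed as a candidate."""
        issues = [f"missing: {name}" for name in self.missing_fields()]
        if (self.chosen_architecture
                and self.chosen_architecture not in self.candidate_architectures):
            issues.append("chosen_architecture is not among candidate_architectures")
        return issues

if __name__ == "__main__":
    # Hypothetical, partially filled record.
    adr = ArchitectureDecisionRecord(
        task="classify support tickets",
        candidate_architectures=["Encoder-only", "Decoder-only"],
        latency_target="p95 < 50 ms",
        chosen_architecture="Encoder-only",
    )
    for issue in adr.problems():
        print(issue)
```

Here the record still lacks context_needed, training_data, serving_hardware, eval_slices, and rollback_plan, so it would be flagged before review.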

Study Cards

Question

What is a Mixture-of-Experts model?

Answer

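A model whose layers (typically the feed-forward blocks) contain multiple experts; a learned router sends each token to a small subset of experts, often one or two, instead of activating all parameters for every token.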

Question

Why can long context hurt operations?

Answer

It increases prefill latency, KV-cache memory, cost, and evaluation complexity.

Question

When are encoder-only models useful?

Answer

For understanding tasks such as classification, ranking, extraction, and embeddings.
