Tech Study Guide
Math for ML
Practical math for machine learning: vectors, matrices, tensors, dot products, cosine similarity, gradients, softmax, cross entropy, KL divergence, probability, and attention intuition.
ML math is mostly about representing data as numbers, measuring error, and updating parameters to reduce that error. You do not need to derive every theorem to operate ML systems, but you do need to recognize what the math controls.
Command Examples
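import numpy as np
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 1.0])
# Dot product: 1*1 + 2*0 + 3*1
print(float(a @ b))
# Cosine similarity: dot product divided by the product of the two vector lengths
print(float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))))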
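Example output and meaning:
| Command | Example output | What it does |
|---|---|---|
| `a @ b` | `4.0` | Dot product: multiplies matching entries and sums them (1·1 + 2·0 + 3·1). |
| `(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))` | about `0.756` | Cosine similarity: the dot product rescaled by both vector lengths, so only direction matters. |
Both lines print a numeric score, so the example produces measurable output instead of silent success.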
This computes a dot product and a cosine similarity, the same basic operations behind linear models (a weighted sum of input features) and embedding search (ranking stored vectors by similarity to a query vector).
Core Objects
| Object | Meaning |
|---|---|
| Scalar | One number. |
| Vector | Ordered list of numbers. |
| Matrix | 2D table of numbers. |
| Tensor | N-dimensional numeric array. |
| Dot product | Weighted similarity or projection. |
| Norm | Size or length of a vector. |
| Gradient | Vector of partial derivatives; points in the direction in which a function increases fastest. |
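The objects in the table map directly onto NumPy arrays of different shapes. The following is a minimal sketch; the specific values and the function `f` are made up for illustration. It also checks a gradient numerically with a central finite difference, which is one common way to sanity-check a derivative.

```python
import numpy as np

scalar = np.float64(3.0)                 # one number, shape ()
vector = np.array([3.0, 4.0])            # ordered list of numbers, shape (2,)
matrix = np.array([[1.0, 0.0],
                   [0.0, 2.0]])          # 2D table of numbers, shape (2, 2)
tensor = np.zeros((2, 3, 4))             # N-dimensional array, here N = 3

dot = float(vector @ np.array([1.0, 0.0]))   # 3*1 + 4*0
norm = float(np.linalg.norm(vector))         # sqrt(3^2 + 4^2)

# Illustrative function: f(v) = v . v, whose exact gradient is 2v.
def f(v):
    return float(v @ v)

# Central finite difference along each coordinate axis.
eps = 1e-6
numeric_grad = np.array([
    (f(vector + eps * e) - f(vector - eps * e)) / (2 * eps)
    for e in np.eye(2)
])

print(scalar.shape, vector.shape, matrix.shape, tensor.shape)
print(dot, norm, numeric_grad)
```

The numeric gradient should agree closely with `2 * vector`, which is the analytic gradient of this particular `f`.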
Probability and Scores
| Concept | Why It Matters |
|---|---|
| Probability | Represents uncertainty or expected frequency. |
| Logit | Raw model score before probability normalization. |
| Softmax | Converts logits into a probability distribution. |
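| Cross entropy | Loss that grows as the probability assigned to the correct class shrinks, so confident wrong predictions are penalized most. |
| KL divergence | Measures how far one distribution is from a reference distribution; it is not symmetric. |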
| Calibration | Checks whether predicted confidence matches observed correctness. |
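A short sketch of how logits, softmax, cross entropy, and KL divergence fit together. The logits are hypothetical, and `kl_divergence` assumes every probability is strictly positive; a production version would need to handle zeros.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def cross_entropy(probs, true_index):
    # Negative log-probability assigned to the correct class.
    return float(-np.log(probs[true_index]))

def kl_divergence(p, q):
    # KL(p || q) = sum of p * log(p / q); assumes all entries are > 0.
    return float(np.sum(p * np.log(p / q)))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
probs = softmax(logits)

loss_if_right = cross_entropy(probs, 0)   # true class is the top-scoring one
loss_if_wrong = cross_entropy(probs, 2)   # true class got the lowest score

uniform = np.full(3, 1.0 / 3.0)
print(probs, probs.sum())
print(loss_if_right, loss_if_wrong)
print(kl_divergence(probs, uniform), kl_divergence(uniform, probs))
```

The loss is larger when the true class received the lowest score, and the two KL values differ because swapping the arguments changes the result.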
Gradient Descent
Training repeatedly:
- runs the model,
- computes a loss,
- computes gradients,
- updates parameters opposite the gradient,
- evaluates on held-out data.
The learning rate controls update size. Too small is slow; too large can diverge.
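The loop above can be sketched on a one-feature linear model. Everything here is an illustrative assumption: the synthetic data (y = 3x + 1 plus noise), the mean squared error loss, the learning rate of 0.1, and the 150/50 train/held-out split. For brevity, the held-out loss is computed once before and once after training rather than at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 3x + 1 plus small noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=200)
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def mse(w, b, xs, ys):
    # Mean squared error of the model w * x + b.
    return float(np.mean((w * xs + b - ys) ** 2))

w, b = 0.0, 0.0
learning_rate = 0.1   # hypothetical value; too large a rate can make the loss grow

initial_val_loss = mse(w, b, x_val, y_val)
for step in range(500):
    pred = w * x_train + b                           # run the model
    error = pred - y_train                           # loss is mean(error ** 2)
    grad_w = float(np.mean(2.0 * error * x_train))   # d(loss)/dw
    grad_b = float(np.mean(2.0 * error))             # d(loss)/db
    w -= learning_rate * grad_w                      # step opposite the gradient
    b -= learning_rate * grad_b

final_val_loss = mse(w, b, x_val, y_val)             # evaluate on held-out data
print(w, b, initial_val_loss, final_val_loss)
```

With these settings the parameters should end up near 3 and 1. Changing `learning_rate` to a much larger value is a quick way to see divergence on this toy problem.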
Attention Intuition
Attention is weighted retrieval inside a model layer. Queries ask what a token needs, keys describe what other tokens offer, and values carry the information that gets mixed.
| Term | Operational Meaning |
|---|---|
| Query | What this position is looking for. |
| Key | What another position can match on. |
| Value | Information copied or mixed if matched. |
| Attention weights | Normalized scores over positions. |
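The query/key/value roles can be sketched as single-head scaled dot-product attention. This is one common formulation, swapped in here for illustration; the text above does not specify the scaling by the square root of the key dimension, and the random inputs, sizes, and lack of masking or learned projections are all simplifying assumptions.

```python
import numpy as np

def softmax_rows(scores):
    # Row-wise softmax; subtracting each row's max keeps exp() stable.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how well each query matches each key
    weights = softmax_rows(scores)       # normalized scores over positions
    return weights @ V, weights          # weighted mix of the values

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 4, 8, 5              # hypothetical sizes
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_v))

output, weights = attention(Q, K, V)
print(weights.shape, output.shape)
print(weights.sum(axis=-1))
```

Each row of `weights` sums to 1, so each output position is a weighted average of the value vectors.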
Practical Lab: Cosine Similarity
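import numpy as np

# Toy 3-dimensional "embeddings" for three documents.
docs = {
    "postgres": np.array([0.9, 0.1, 0.0]),
    "kubernetes": np.array([0.1, 0.8, 0.1]),
    "gpu": np.array([0.0, 0.2, 0.9]),
}
query = np.array([0.8, 0.2, 0.0])

def cosine(a, b):
    # Dot product normalized by both vector lengths; ranges from -1 to 1.
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query, best match first.
# reverse=True is an argument to sorted(), not print().
ranked = sorted(((cosine(query, v), k) for k, v in docs.items()), reverse=True)
print(ranked)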
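This toy example shows why embeddings and distance metrics are system contracts: the ranking depends on both the vectors and the similarity function, so whatever writes the index and whatever queries it must agree on the embedding model and the metric.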
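To make the point about distance metrics concrete, here is a small sketch in which cosine similarity and the raw dot product pick different top results for the same query. The two document vectors and their names are made up for illustration: one is well aligned with the query but short, the other is less aligned but much longer.

```python
import numpy as np

def cosine(a, b):
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1.0, 0.0])
candidates = {
    "aligned_short": np.array([0.9, 0.1]),   # nearly the same direction, small norm
    "offaxis_long": np.array([3.0, 3.0]),    # 45 degrees off, large norm
}

# Pick the best candidate under each scoring function.
best_by_cosine = max(candidates, key=lambda k: cosine(query, candidates[k]))
best_by_dot = max(candidates, key=lambda k: float(query @ candidates[k]))

print(best_by_cosine, best_by_dot)
```

Cosine similarity ignores vector length while the dot product rewards it, so an index built with one metric in mind can return different results when queried with the other.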
Study Cards
What does a dot product measure in ML?
A weighted alignment between two vectors, often used for scoring or similarity.
What is a gradient?
A direction and magnitude showing how changing parameters changes the loss.
Why does softmax matter?
It turns raw logits into a probability distribution over choices.