Math for ML

ML math is mostly about representing data as numbers, measuring error, and updating parameters to reduce that error. You do not need to derive every theorem to operate ML systems, but you do need to recognize what the math controls.

Command Examples

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 1.0])
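# Dot product: sum of elementwise products.
print(float(a @ b))
# Cosine similarity: dot product divided by the product of the two vector norms.
print(float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))))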

Example output and meaning:

Command	Example output	What it does
`print(float(a @ b))`	`4.0`	Prints the dot product: 1·1 + 2·0 + 3·1.
`print(float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))))`	about `0.756`	Prints the cosine similarity: the dot product divided by the product of the two vector lengths.

This computes a dot product and a cosine similarity, the same basic operations behind linear models (a weighted sum of features) and embedding search (ranking by vector similarity).

Core Objects

Object	Meaning
Scalar	One number.
Vector	Ordered list of numbers.
Matrix	2D table of numbers.
Tensor	N-dimensional numeric array.
Dot product	Weighted similarity or projection.
Norm	Size or length of a vector.
Gradient	Vector of partial derivatives; points in the direction in which a function increases fastest.
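
The objects in the table can be sketched with NumPy arrays. This is a minimal illustration, assuming NumPy arrays stand in for tensors; the variable names and values are made up for the example.

```python
import numpy as np

scalar = np.float64(3.0)              # one number, shape ()
vector = np.array([3.0, 4.0])         # ordered list of numbers, shape (2,)
matrix = np.array([[1.0, 0.0],
                   [0.0, 2.0]])       # 2D table of numbers, shape (2, 2)
tensor = np.zeros((2, 3, 4))          # N-dimensional array, shape (2, 3, 4)

norm = float(np.linalg.norm(vector))  # Euclidean length: sqrt(3^2 + 4^2)
projected = matrix @ vector           # one dot product per matrix row

print(scalar.shape, vector.shape, matrix.shape, tensor.shape)
print(norm)
print(projected)
```

Checking shapes like this is usually the first step when a model call fails: most shape errors come from mixing up which axis is the batch, the sequence, or the feature dimension.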

Probability and Scores

Concept	Why It Matters
Probability	Represents uncertainty or expected frequency.
Logit	Raw model score before probability normalization.
Softmax	Converts logits into a probability distribution.
Cross entropy	Penalizes confident wrong predictions.
KL divergence	Measures distribution mismatch.
Calibration	Checks whether predicted confidence matches observed correctness.
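
As a rough sketch of how logits, softmax, cross entropy, and KL divergence fit together, the example below uses made-up logits for three classes. The helper names `softmax`, `cross_entropy`, and `kl_divergence` are defined here for illustration and are not taken from any library.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def cross_entropy(probs, target_index):
    # Negative log-probability assigned to the correct class.
    return float(-np.log(probs[target_index]))

def kl_divergence(p, q):
    # KL(p || q); assumes every entry of p and q is strictly positive.
    return float(np.sum(p * np.log(p / q)))

logits = np.array([2.0, 1.0, 0.1])   # illustrative raw scores
probs = softmax(logits)
uniform = np.full(3, 1.0 / 3.0)

print(probs, probs.sum())
print(cross_entropy(probs, 0))       # correct class is the favored one: lower loss
print(cross_entropy(probs, 2))       # correct class is the least favored: higher loss
print(kl_divergence(probs, uniform))
```

The loss is larger when the correct class is one the model assigned little probability to, which is the "penalizes confident wrong predictions" behavior from the table.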

Gradient Descent

Training repeatedly:

runs the model,
computes a loss,
computes gradients,
updates parameters opposite the gradient,
evaluates on held-out data.

The learning rate controls update size. Too small is slow; too large can diverge.
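
The loop above can be sketched on a one-feature linear model. Everything here is an assumption made for illustration: the synthetic data (y = 3x + 1 plus noise), the mean squared error loss, the learning rate of 0.1, and the 500 steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration: y = 3x + 1 plus a little noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.05, size=200)
x_train, y_train = x[:160], y[:160]
x_val, y_val = x[160:], y[160:]               # held-out split

def mse(w, b, xs, ys):
    # Mean squared error of the linear model w * x + b.
    return float(np.mean((w * xs + b - ys) ** 2))

w, b = 0.0, 0.0                               # parameters
learning_rate = 0.1                           # assumed value

initial_val_loss = mse(w, b, x_val, y_val)
for step in range(500):
    error = w * x_train + b - y_train         # run the model, get residuals
    grad_w = 2.0 * np.mean(error * x_train)   # d(loss)/dw
    grad_b = 2.0 * np.mean(error)             # d(loss)/db
    w -= learning_rate * grad_w               # step opposite the gradient
    b -= learning_rate * grad_b

final_val_loss = mse(w, b, x_val, y_val)      # evaluate on held-out data
print(w, b, initial_val_loss, final_val_loss)
```

With these settings the parameters should end up close to 3 and 1. Raising the learning rate far enough makes each step overshoot the minimum, which is the divergence case mentioned above.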

Attention Intuition

Attention is weighted retrieval inside a model layer. Queries ask what a token needs, keys describe what other tokens offer, and values carry the information that gets mixed.

Term	Operational Meaning
Query	What this position is looking for.
Key	What another position can match on.
Value	Information copied or mixed if matched.
Attention weights	Normalized scores over positions.
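
A minimal single-head sketch of the table, assuming the common scaled dot-product form (scores divided by the square root of the key dimension). The scaling, the random inputs, and the sizes are assumptions for illustration; real layers also apply learned projections, masking, and multiple heads.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax, shifted by the row max for numerical stability.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

def attention(queries, keys, values):
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # query-key match scores
    weights = softmax(scores)                 # normalized over positions
    return weights @ values, weights          # values mixed by weight

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 4, 8, 8                   # illustrative sizes
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_v))

output, weights = attention(Q, K, V)
print(output.shape)                           # one output row per position
print(weights.sum(axis=-1))                   # each row of weights sums to 1
```

Each output row is a weighted average of the value rows, so a position with a single dominant weight effectively copies one value, while flatter weights blend several.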

Practical Lab: Cosine Similarity

import numpy as np

docs = {
    "postgres": np.array([0.9, 0.1, 0.0]),
    "kubernetes": np.array([0.1, 0.8, 0.1]),
    "gpu": np.array([0.0, 0.2, 0.9]),
}
query = np.array([0.8, 0.2, 0.0])

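def cosine(a, b):
    # Dot product normalized by both vector lengths; ranges from -1 to 1.
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query, highest first.
print(sorted(((cosine(query, v), k) for k, v in docs.items()), reverse=True))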

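The query vector points in almost the same direction as the postgres vector, so postgres scores highest. That ranking only means something if the query and the documents are embedded by the same model and compared with the same metric, which is why embeddings and distance metrics are system contracts.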

Study Cards

Question

What does a dot product measure in ML?

Answer

A weighted alignment between two vectors, often used for scoring or similarity.

Question

What is a gradient?

Answer

A direction and magnitude showing how changing parameters changes the loss.

Question

Why does softmax matter?

Answer

It turns raw logits into a probability distribution over choices.
