Load Balancers and Proxies

Load balancers and proxies are network devices, application infrastructure, and failure domains at the same time. They can fix availability problems, hide backend churn, terminate TLS, change source addresses, retry requests, and introduce their own timeout and protocol behavior.

Command Examples

curl -v https://example.com/
curl -vk --resolve example.com:443:198.51.100.10 https://example.com/
openssl s_client -connect example.com:443 -servername example.com
dig example.com A
ss -tan state established
tcpdump -nn -i any host 198.51.100.10

Example output and meaning:

| Command | Example output | What it does |
| --- | --- | --- |
| curl -v https://example.com/ | HTTP status, headers, timing, JSON payload, or TLS/proxy error. | Separates reachability, TLS, proxy, and application behavior. |
| curl -vk --resolve example.com:443:198.51.100.10 https://example.com/ | HTTP status, headers, timing, JSON payload, or TLS/proxy error. | Separates the same layers for one specific address: --resolve sends the request to 198.51.100.10 without consulting DNS, and -k skips certificate verification. |
| YAML or text capture | Concrete IDs, states, counters, versions, rows, or error strings. | Turns the example from a command list into evidence for the next debugging step. |

L4 Versus L7

Layer 4 load balancers make decisions mostly from IP addresses, ports, and connection metadata. Layer 7 proxies understand application protocols such as HTTP and can route on Host, path, headers, methods, cookies, or protocol-specific data.

The distinction matters because a TCP connection can succeed while an L7 proxy rejects the request for Host, SNI, HTTP version, header size, body size, authentication, or route configuration.
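The difference in available inputs can be sketched in a few lines. This is an illustrative model only, not how any particular load balancer is implemented; the backend addresses, pool names, and route table are hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical backends and routes, for illustration only.
L4_BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
L7_ROUTES = [
    # (host, path prefix, pool name); first match wins.
    ("api.example.com", "/v1/", "api-v1"),
    ("api.example.com", "/", "api-default"),
    ("www.example.com", "/", "web"),
]


def pick_l4(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """Choose a backend from connection metadata only; the payload is never read."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(L4_BACKENDS)
    return L4_BACKENDS[index]


def pick_l7(host: str, path: str) -> Optional[str]:
    """Choose a pool from HTTP request data; None models a proxy-level rejection."""
    host = host.lower()
    for route_host, prefix, pool in L7_ROUTES:
        if host == route_host and path.startswith(prefix):
            return pool
    return None  # TCP succeeded, but no route matches this Host and path
```

In this model a request for an unknown Host still gets a backend from pick_l4, while pick_l7 returns None, which mirrors the case where the TCP connection succeeds and the proxy rejects the request.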

Health Checks

Health checks are only as good as what they check. A TCP check proves a port accepted a connection. An HTTP check proves one configured path returned an expected result. Neither automatically proves database connectivity, authorization, cache health, or the exact user path.

Common mistakes:

  • check path differs from real traffic path,
  • health check ignores Host or SNI requirements,
  • backend passes shallow checks while dependencies fail,
  • intervals and thresholds create slow failover,
  • all backends fail because the health check itself is broken.
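To see how intervals and thresholds turn into failover delay, here is a minimal sketch of a consecutive-result health tracker. The names fall and rise and the timing model are assumptions made for illustration; real load balancers differ in how they count and schedule checks.

```python
class HealthTracker:
    """Flip state only after enough consecutive results that disagree with it."""

    def __init__(self, fall: int = 3, rise: int = 2, healthy: bool = True):
        self.fall = fall        # consecutive failures needed to mark unhealthy
        self.rise = rise        # consecutive successes needed to mark healthy
        self.healthy = healthy
        self._streak = 0        # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.rise if check_passed else self.fall
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


def worst_case_detection_seconds(interval: float, timeout: float, fall: int) -> float:
    """Rough upper bound, assuming checks start every `interval` seconds,
    the backend breaks just after a check was sent, and each failing check
    takes up to `timeout` seconds to be declared failed."""
    return interval * fall + timeout
```

Under these assumptions, a 10 second interval, a 2 second timeout, and a threshold of 3 failures leave a broken backend in rotation for up to about 32 seconds.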

Source IP and Headers

Proxies often replace the original source IP, so the backend sees the proxy's address instead of the client's. Applications may need the X-Forwarded-For or Forwarded header, or the PROXY protocol, to recover client identity. Trust those headers only from known proxies; clients can forge them if the edge does not sanitize input.
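One common way to apply the trust rule is to read X-Forwarded-For from right to left and stop at the first address that is not a known proxy. The sketch below assumes a hypothetical trusted-proxy list and a comma-separated header value; it is not tied to any specific framework.

```python
import ipaddress

# Hypothetical proxy ranges; replace with the networks your own proxies use.
TRUSTED_PROXIES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]


def is_trusted(addr: str) -> bool:
    """Return True if addr parses as an IP inside a trusted proxy range."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(peer_addr: str, xff_header: str = "") -> str:
    """Recover the client address, honoring the header only from trusted peers."""
    if not is_trusted(peer_addr):
        return peer_addr  # direct or unknown peer: ignore the header entirely
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    for hop in reversed(hops):  # rightmost entries were added by our own proxies
        if is_trusted(hop):
            continue
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            return peer_addr  # malformed entry: stop trusting the header
        return hop  # first address not added by a trusted proxy
    return peer_addr
```

Entries to the left of the returned address may have been supplied by the client, which is why they are never consulted.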

TLS Termination and SNI

TLS can terminate at the edge proxy, pass through to backends, or re-encrypt between proxy and backend. SNI lets a proxy choose a certificate or route before HTTP is visible. A mismatch among the DNS name, SNI, certificate SAN, and backend Host routing is a common outage pattern.
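The name alignment can be checked mechanically once the values are collected. The sketch below uses a simplified matching rule (a wildcard only as the whole leftmost label, covering exactly one label) and hypothetical inputs; it is not a full certificate validator and does not replace a TLS library's hostname verification.

```python
from typing import List


def san_matches(hostname: str, san: str) -> bool:
    """Simplified SAN match: exact name, or '*.' covering exactly one left label."""
    hostname = hostname.lower().rstrip(".")
    san = san.lower().rstrip(".")
    if san.startswith("*."):
        suffix = san[1:]  # e.g. ".example.com"
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and "." + tail == suffix
    return hostname == san


def find_mismatches(dns_name: str, sni: str, cert_sans: List[str],
                    backend_hosts: List[str]) -> List[str]:
    """List the places where the names used along the path disagree."""
    problems = []
    if dns_name.lower() != sni.lower():
        problems.append("SNI differs from the DNS name the client resolved")
    if not any(san_matches(sni, san) for san in cert_sans):
        problems.append("no certificate SAN covers the SNI name")
    if sni.lower() not in {h.lower() for h in backend_hosts}:
        problems.append("backend has no Host route for the SNI name")
    return problems
```

For example, a certificate that only lists *.example.com does not cover the bare name example.com under this rule, so a client connecting to the apex name would see a certificate error even though DNS and routing are correct.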

Timeout Budget

Every hop has timeouts:

  • client connect and read timeout,
  • load balancer idle timeout,
  • proxy upstream timeout,
  • backend server timeout,
  • database or dependency timeout,
  • keepalive reuse limits.

Timeouts should form a deliberate budget. If an outer timeout is shorter than an inner timeout, clients may see failures while backends keep working on abandoned requests.

For deeper zero-downtime behavior, see Resilience, Timeouts, and Draining. The short version is: connection pools, retries, keepalive, load-balancer idle timeout, backend request timeout, and deployment drain timeout must agree.
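The nesting rule for request timeouts can be checked with a small helper. This is a sketch under the assumption that hops are listed from the outermost caller to the innermost dependency and that all values are comparable request timeouts in seconds; the hop names and numbers are hypothetical.

```python
from typing import List, Tuple


def find_budget_violations(hops: List[Tuple[str, float]]) -> List[str]:
    """Report each outer hop whose timeout is shorter than the hop it calls.

    `hops` is ordered outermost (client) to innermost (dependency).
    """
    violations = []
    for (outer_name, outer_t), (inner_name, inner_t) in zip(hops, hops[1:]):
        if outer_t < inner_t:
            violations.append(
                f"{outer_name} gives up after {outer_t}s while "
                f"{inner_name} may keep working for {inner_t}s"
            )
    return violations


# Hypothetical budget: the client is the only hop that gives up too early.
example_hops = [
    ("client", 10.0),
    ("edge proxy", 30.0),
    ("backend", 20.0),
    ("database", 5.0),
]
for line in find_budget_violations(example_hops):
    print(line)
```

In the example, the client abandons the request after 10 seconds while the edge proxy is still allowed to wait 30 seconds, which is the abandoned-work case described above.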

Draining and Rolling Restarts

A backend leaving service should stop receiving new traffic before it exits. In Kubernetes that usually means readiness fails first, EndpointSlices update, the load balancer or proxy stops choosing the endpoint, and the process keeps serving in-flight requests until they complete. The process must exit within terminationGracePeriodSeconds; after that, the kubelet sends SIGKILL.

Common failure modes:

  • readiness stays true until the process exits,
  • preStop sleeps but the app keeps accepting new work,
  • load balancer deregistration delay is longer than pod grace period,
  • clients reuse pooled connections to a backend that is already draining,
  • retries repeat unsafe writes during rollout.

Draining timeline:

sequenceDiagram
  participant Orchestrator
  participant Backend
  participant Service as Service/LB Registry
  participant Proxy
  participant Client

  Orchestrator->>Backend: SIGTERM or drain signal
  Backend->>Backend: Stop accepting new work / fail readiness
  Service->>Proxy: Remove endpoint after propagation delay
  Client->>Proxy: Existing request or reused connection
  Proxy->>Backend: In-flight request completes during grace period
  Backend-->>Proxy: Response
  Backend->>Orchestrator: Exit before grace period ends

Drain budget checklist:

| Timer | Must Be Coordinated With |
| --- | --- |
| Kubernetes terminationGracePeriodSeconds | App shutdown time, in-flight request maximum, sidecar drain time. |
| preStop hook | Readiness failure and endpoint propagation delay, not a substitute for app drain logic. |
| Load balancer deregistration delay | Pod grace period and backend keepalive behavior. |
| Proxy upstream idle timeout | Client keepalive and backend server keepalive. |
| Client retry budget | Idempotency and total request deadline. |
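Two of the checklist relationships can be expressed as simple inequalities. The sketch below is an assumption-laden model: the field names are hypothetical, and it relies on the Kubernetes behavior that time spent in the preStop hook counts against the termination grace period.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrainTimers:
    """Hypothetical timer values for one workload, all in seconds."""
    termination_grace_period_s: float
    pre_stop_sleep_s: float
    max_in_flight_request_s: float
    lb_deregistration_delay_s: float


def check_drain_budget(t: DrainTimers) -> List[str]:
    """Return a list of drain budget problems; an empty list means the timers agree."""
    problems = []
    # preStop time is spent inside the grace period, so both must fit together.
    needed = t.pre_stop_sleep_s + t.max_in_flight_request_s
    if needed > t.termination_grace_period_s:
        problems.append(
            f"preStop plus longest request needs {needed}s, "
            f"grace period is {t.termination_grace_period_s}s"
        )
    # A longer deregistration delay means the LB may still use a pod that is gone.
    if t.lb_deregistration_delay_s > t.termination_grace_period_s:
        problems.append(
            f"deregistration delay {t.lb_deregistration_delay_s}s exceeds "
            f"grace period {t.termination_grace_period_s}s"
        )
    return problems
```

For instance, DrainTimers(30, 10, 15, 20) passes both checks, while DrainTimers(30, 20, 15, 60) fails both.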

Troubleshooting Flow

  1. Test DNS, TCP, TLS, and HTTP separately.
  2. Compare direct backend behavior with load-balanced behavior.
  3. Verify health checks use realistic Host, SNI, path, and protocol settings.
  4. Inspect source IP preservation and forwarding headers.
  5. Check idle, connect, read, and upstream timeouts at every hop.
  6. Check whether retries are safe for non-idempotent requests.
  7. Capture on client side, proxy side, and backend side when possible.
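For step 6, a retry decision can be reduced to three questions: is the request safe to repeat, are attempts left, and is there still time in the overall deadline. The sketch below treats the HTTP methods defined as idempotent in RFC 9110 as retryable; the idempotency-key flag is an assumption that only holds if the backend actually deduplicates on such a key, and the parameter names are hypothetical.

```python
# Methods that HTTP defines as idempotent (RFC 9110).
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}


def is_retry_safe(method: str, has_idempotency_key: bool = False) -> bool:
    """A request is safe to repeat if its method is idempotent, or if the
    application provides its own deduplication (assumed, not verified here)."""
    return method.upper() in IDEMPOTENT_METHODS or has_idempotency_key


def should_retry(method: str, attempts_made: int, elapsed_s: float,
                 deadline_s: float, max_attempts: int = 3,
                 has_idempotency_key: bool = False) -> bool:
    """Retry only when it is safe, attempts remain, and the deadline has not passed."""
    return (
        is_retry_safe(method, has_idempotency_key)
        and attempts_made < max_attempts
        and elapsed_s < deadline_s
    )
```

A plain POST is never retried in this model, which is the conservative choice when a rollout may have already applied the first write.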

Study Cards

Question

What can an L7 proxy route on?

Answer

Application data such as HTTP Host, path, headers, methods, cookies, or protocol-specific fields.

Question

Why are shallow health checks risky?

Answer

They can mark a backend healthy even when dependencies or the real request path are broken.

Question

Why should forwarding headers be trusted only from known proxies?

Answer

Clients can forge headers such as X-Forwarded-For unless the edge proxy sanitizes them.

Question

Why fail readiness before shutdown?

Answer

It gives Services, proxies, and load balancers time to stop sending new requests before the process exits.
