Same toy problem: F(x) = x², y = 9.5. We solve it three ways — posterior geometry, Metropolis-Hastings by hand, then live interactive sampling.
We have a nonlinear forward model F(x) = x², one observation y = 9.5, measurement noise σ_ε = 0.5, and a Gaussian prior x_a = 2, σ_a = 2.
From Bayes' theorem:

P(x|y) = P(y|x) P(x) / Z  ∝  exp(−(y − F(x))² / (2σ_ε²)) · exp(−(x − x_a)² / (2σ_a²))
This is not Gaussian in x because x appears as x² inside the exponential. We cannot evaluate the normalising constant Z = ∫P(y|x)P(x)dx analytically. This is exactly where MCMC becomes necessary.
We define the cost function J(x) = −2 log P(x|y) (up to an additive constant), which we minimise in OE but only evaluate in MCMC:

J(x) = (y − F(x))² / σ_ε² + (x − x_a)² / σ_a²
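In code this is one line per term. A minimal sketch of the setup (`cost_J` is our name for it, not a standard one):

```python
# Problem setup from the text
F = lambda x: x**2           # forward model F(x) = x²
y, sigma_eps = 9.5, 0.5      # observation and measurement noise
x_a, sigma_a = 2.0, 2.0      # prior mean and prior width

def cost_J(x):
    """J(x) = −2 log P(x|y), up to an additive constant."""
    return ((y - F(x)) / sigma_eps)**2 + ((x - x_a) / sigma_a)**2
```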
A Markov chain is a sequence of random variables x_0, x_1, x_2, ... where the next state depends only on the current state, not the full history:

P(x_{i+1} | x_i, x_{i−1}, ..., x_0) = P(x_{i+1} | x_i)
The transition kernel T(x_{i+1} | x_i) defines how we move from one state to the next. The magic of MCMC: we design T so that the chain's stationary distribution equals our posterior P(x|y).
The condition that guarantees this is called detailed balance:

P(x) T(x′|x) = P(x′) T(x|x′)   for every pair of states x, x′
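Detailed balance can be checked numerically for the kernel we are about to build. A quick sanity check (not a proof), reusing `cost_J` from the sketch above; for distinct states the Metropolis kernel density is proposal × acceptance:

```python
import numpy as np

def post(x):
    """Unnormalised posterior exp(−J/2): Z is irrelevant for this check."""
    return np.exp(-0.5 * cost_J(x))

def T(x_from, x_to, sigma_prop=1.0):
    """Metropolis kernel density for a move x_from → x_to (x_to ≠ x_from)."""
    q = np.exp(-0.5 * ((x_to - x_from) / sigma_prop)**2)  # symmetric proposal (constant factor dropped)
    return q * min(1.0, post(x_to) / post(x_from))        # × acceptance probability

a, b = 2.9, 3.2                        # two arbitrary states
print(np.isclose(post(a) * T(a, b),    # P(x) T(x′|x)
                 post(b) * T(b, a)))   # P(x′) T(x|x′)  → prints True
```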
We use a symmetric Gaussian proposal, so q(x*|x_i) = q(x_i|x*) and the proposal densities cancel in the acceptance ratio. Each step of the sampler then has three parts:

1. Propose:
   x* ~ N(x_i, σ_prop²)

2. Compute the acceptance probability:
   α = min(1, P(x*|y) / P(x_i|y))
     = min(1, exp(−J(x*)/2) / exp(−J(x_i)/2))
     = min(1, exp(−[J(x*) − J(x_i)] / 2))   ← only the DIFFERENCE of J is needed; Z cancels!
3. Accept/Reject:
   u ~ Uniform(0, 1)
   if u ≤ α:  x_{i+1} = x*    (accept)
   else:      x_{i+1} = x_i   (reject: stay put)
We start the chain at x_0 = 0.0 with proposal width σ_prop = 1.0, pre-fixing the random draws for reproducibility.
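Putting the three steps together, a minimal NumPy sketch of the sampler (the name `metropolis`, the chain length, and the seed are our choices, not fixed by the text; `cost_J` is the sketch from above):

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed: the pre-fixed random draws

def metropolis(n_steps=50_000, x0=0.0, sigma_prop=1.0):
    """Metropolis-Hastings with a symmetric Gaussian proposal."""
    chain = np.empty(n_steps)
    x, n_accept = x0, 0
    for i in range(n_steps):
        x_star = x + sigma_prop * rng.standard_normal()   # 1. propose
        dJ = cost_J(x_star) - cost_J(x)                   # 2. only the difference of J
        if dJ <= 0 or rng.uniform() < np.exp(-0.5 * dJ):  # 3. accept with probability α
            x, n_accept = x_star, n_accept + 1
        chain[i] = x                                      # on rejection: repeat x
    return chain, n_accept / n_steps

chain, acc_rate = metropolis()
burn = len(chain) // 5            # discard the first ~20% as burn-in
print(f"acceptance rate ≈ {acc_rate:.2f}")
print(f"posterior mean ± sd ≈ {chain[burn:].mean():.2f} ± {chain[burn:].std():.2f}")
```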
| Diagnostic | What it measures | Good value | Formula / method |
|---|---|---|---|
| Trace plot | Whether the chain is mixing well | Looks like a "hairy caterpillar": no drifts or sticking | Plot x_i vs i |
| Acceptance rate | Fraction of proposals accepted | ~23–44% | N_accepted / N_total |
| Autocorrelation τ | How many steps until samples are independent | τ as small as possible | τ = 1 + 2Σ ρ_k |
| Effective sample size | Number of "independent" samples | ESS > 200 | ESS = N / τ |
| Gelman-Rubin R̂ | Do multiple chains agree? (between-chain vs within-chain variance) | R̂ < 1.01 | R̂ = √(V̂/W), V̂ = pooled variance, W = mean within-chain variance |
| Burn-in | Initial transient before chain reaches typical set | Discard first ~20% | Visual inspection of trace plot |
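Of these, R̂ is the only one that needs multiple chains. A sketch of the classic formula from the table (modern libraries typically use a rank-normalised split-R̂, which differs in detail):

```python
import numpy as np

def gelman_rubin(chains):
    """Classic R̂ for m chains of equal length n (one chain per row)."""
    chains = np.asarray(chains)
    n = chains.shape[1]
    W = chains.var(axis=1, ddof=1).mean()     # W: mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)   # B: between-chain variance
    V_hat = (n - 1) / n * W + B / n           # V̂: pooled variance estimate
    return np.sqrt(V_hat / W)                 # want R̂ < 1.01

# e.g. gelman_rubin([chain_1, chain_2, chain_3, chain_4]) with chains
# started from dispersed initial points x_0.
```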
── Autocorrelation at lag k ──────────────────────────────────────
ρ_k = Σᵢ (x_i − x̄)(x_{i+k} − x̄) / Σᵢ (x_i − x̄)²
── Integrated autocorrelation time ──────────────────────────────
τ = 1 + 2 Σ_{k=1}^{∞} ρ_k ← sum until ρ_k ≈ 0
── Effective sample size ─────────────────────────────────────────
ESS = N / τ ← independent draws equivalent
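Both quantities are a few lines of NumPy. A sketch (`tau_and_ess` is our name; the 0.05 cut-off for "ρ_k ≈ 0" is one common heuristic among several):

```python
import numpy as np

def tau_and_ess(chain, max_lag=1_000):
    """Integrated autocorrelation time τ and effective sample size N/τ."""
    x = chain - chain.mean()
    denom = np.dot(x, x)                       # Σ (x_i − x̄)²
    tau = 1.0
    for k in range(1, min(max_lag, len(x))):
        rho_k = np.dot(x[:-k], x[k:]) / denom  # ρ_k, autocorrelation at lag k
        if rho_k < 0.05:                       # truncate the sum once ρ_k ≈ 0
            break
        tau += 2.0 * rho_k
    return tau, len(chain) / tau

tau, ess = tau_and_ess(chain[burn:])           # chain, burn from the sampler sketch
print(f"τ ≈ {tau:.1f}, ESS ≈ {ess:.0f}")
```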
Run Metropolis-Hastings on our problem F(x) = x², y = 9.5. Adjust the proposal width and see how it affects mixing and the recovered posterior shape.
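Without the widget, you can reproduce the effect with the sketches above: tiny proposals are accepted almost always but crawl through the posterior (large τ), huge proposals are almost always rejected, and something in between wins.

```python
# Sweep the proposal width, reusing metropolis() and tau_and_ess() from above
for sigma_prop in (0.05, 1.0, 20.0):
    chain, acc = metropolis(sigma_prop=sigma_prop)
    tau, ess = tau_and_ess(chain[len(chain) // 5:])   # drop ~20% burn-in
    print(f"σ_prop = {sigma_prop:>5}: accept = {acc:.2f}, τ ≈ {tau:.0f}, ESS ≈ {ess:.0f}")
```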
| Aspect | Optimal Estimation (OE) | MCMC (M-H) |
|---|---|---|
| Output | Point estimate x̂ + Gaussian covariance Sx | Samples from the true P(x|y) |
| Posterior shape | Always Gaussian (assumption) | Any shape — multimodal, skewed |
| Modes found | Only 1 (nearest to start) | All modes (if chain mixes) |
| Speed | ~5 iterations → seconds | 10⁴–10⁶ samples → minutes |
| Nonlinearity | Linearises F(x) — fails badly if curved | Handles full nonlinearity exactly |
| Key formula | Sx = (KᵀSε⁻¹K + Sa⁻¹)⁻¹ | α = min(1, P(x*|y)/P(x|y)) |
| Used in practice for CO₂ | Operational (millions of soundings/day) | Algorithm development, difficult scenes |