Rough Volatility & Deep Hedging — The Rough Bergomi Model
Classical stochastic-volatility models — Heston, SABR — drive the variance process with Brownian motion. The data disagree: estimate the Hurst exponent of realised volatility and it comes out near \(H \approx 0.1\), far rougher than the \(H = 0.5\) of Brownian motion. That single empirical fact has a sharp consequence for option markets — the at-the-money implied-volatility skew explodes at short maturities as \(T^{H-1/2}\), a near power-law blow-up that no Brownian-driven model can match, since their short-dated skew flattens to a constant. The rough Bergomi model (Bayer, Friz & Gatheral 2016) drives variance with a fractional kernel and reproduces this term structure with only three parameters. This module simulates rough Bergomi from exact fractional Brownian motion (Davies-Harte circulant embedding), verifies the \(T^{H-1/2}\) skew scaling against a Heston benchmark, calibrates \((H, \eta, \rho)\) by simulation-based gradient descent — there is no closed-form price, so the loss is only available through Monte Carlo — and trains an LSTM deep-hedging agent that, once transaction costs are paid, beats both the Black-Scholes delta and a stochastic-vol model delta. A pure-NumPy truncated path signature supplies an alternative rough-path feature map. Built on Python 3.11+ with numpy, scipy, torch, pandas, duckdb, pydantic v2, plotly, streamlit, and typer; packaged with hatchling and tested with pytest (22 offline, seed-42 tests).
%==========%
I. Interactive Dashboard:
The dashboard below runs entirely in the browser via stlite (Streamlit on WebAssembly — no server). The three rough Bergomi parameters sit on sidebar sliders. The four tabs show a simulated variance/spot ensemble that visibly roughens as \(H \to 0\), the ATM-skew term structure computed by Monte Carlo against a Heston benchmark and the SPX \(T^{H-1/2}\) power law, a hedging-error comparison whose spread opens with transaction costs, and the level-2 path signature of the rough vol path. First load downloads Pyodide and may take 20–40 seconds; subsequent loads are cached.
%==========%
II. Project Layout:
rough-bergomi/
├── pyproject.toml # hatchling build, deps, ruff + pytest
├── .env.example # DB_PATH
├── scripts/download_data.py # SPX skew term structure → DuckDB
├── src/rough_bergomi/
│ ├── fbm/
│ │ ├── davies_harte.py # Exact fBM via Wood-Chan circulant embedding
│ │ ├── cholesky.py # Exact O(n^3) fBM fallback
│ │ ├── volterra.py # Riemann-Liouville variance-matched kernel
│ │ └── hurst.py # Absolute-moment Hurst estimator
│ ├── rbergomi/
│ │ └── model.py # Simulation, MC smile, ATM skew term structure
│ ├── calibration/
│ │ └── calibrate.py # Common-random-number least squares
│ ├── hedging/
│ │ └── deep_hedge.py # LSTM deep hedger + BS/model-delta benchmarks
│ ├── signature/
│ │ └── signature.py # Pure-NumPy truncated path signatures
│ ├── data/ # schemas.py, fetchers.py, store.py (DuckDB)
│ ├── report/plots.py # Plotly figures
│ ├── cli.py # Typer CLI: simulate | skew | calibrate | hedge | dashboard
│ └── app.py # Streamlit server-side dashboard
└── tests/ # test_fbm, test_rbergomi, test_calibration, test_hedging, test_signature
%==========%
III. Data Sources:
The calibration target is the SPX ATM skew term structure: for each listed maturity, the slope of Black-Scholes implied volatility in log-moneyness at the money. Empirically this skew scales roughly as \(T^{-0.4}\) — that is \(T^{H-1/2}\) with \(H \approx 0.1\) — and flattens at longer maturities. Live option chains come from Yahoo Finance (the same feed used by the Heston project); for each near expiry the fetcher fits a local linear slope of implied vol against log-moneyness near the money. When offline the pipeline falls back to a documented synthetic SPX-like power-law target so calibration and the dashboard run without credentials.
# data/fetchers.py
def synthetic_spx_skew(maturities=None, *, a=0.18, H=0.1, noise=0.0, seed=42):
"""Synthetic SPX-like ATM skew term structure skew(T) = a * T^(H - 1/2)."""
if maturities is None:
maturities = np.array([1, 2, 3, 6, 9, 12, 18, 24]) / 12.0
skew = a * maturities ** (H - 0.5)
return pd.DataFrame({"maturity": maturities, "atm_skew": skew})
%==========%
IV. Why Volatility Is Rough:
Take a long series of realised variance, form the increments of its logarithm over a lag \(\Delta\), and measure how their \(q\)-th absolute moment scales: \(\mathbb{E}\,|\log v_{t+\Delta} - \log v_t|^q \propto \Delta^{qH}\). Across essentially every liquid index the estimated \(H\) lands near \(0.1\) — volatility is far rougher than the \(H = 0.5\) of a Brownian-driven model (Gatheral, Jaisson & Rosenbaum 2018). The option-market fingerprint of this roughness is the ATM skew term structure \(\psi(T) = |\partial \sigma_{BS}(k, T)/\partial k|_{k=0}\). Empirically \(\psi(T)\) blows up at the short end like a power law; under any model with a Brownian variance driver \(\psi(T)\) instead converges to a finite constant as \(T \to 0\). Rough Bergomi resolves the gap: a fractional kernel with exponent \(H - 1/2 < 0\) makes \(\psi(T) \sim T^{H-1/2}\), reproducing the observed explosion. The table contrasts the two regimes.
| Empirical fact | Brownian SV (Heston) | Rough Bergomi |
|---|---|---|
| Hurst of log-variance | \(H = 0.5\) by construction | \(H \approx 0.1\) (rough) |
| Short-dated ATM skew | Finite constant as \(T \to 0\) | Blows up as \(T^{H-1/2}\) |
| Parameters to fit the term structure | Needs jumps (Bates) for short end | \(H\) alone sets the exponent |
| Variance autocorrelation | Exponential (Markovian) | Slowly decaying (long memory in increments) |
%==========%
V. Exact Fractional Brownian Motion — Davies-Harte (fbm/davies_harte.py):
Fractional Brownian motion \(B^H\) is the centred Gaussian process with \(\mathbb{E}[B^H_t B^H_s] = \tfrac12(|t|^{2H} + |s|^{2H} - |t-s|^{2H})\); its increments (fractional Gaussian noise, fGN) have autocovariance \(\gamma(k) = \tfrac12(|k-1|^{2H} - 2|k|^{2H} + |k+1|^{2H})\). The Davies-Harte / Wood-Chan method samples \(n\) fGN values exactly in \(O(n\log n)\): embed the \(n\)-point Toeplitz autocovariance into a \(2(n-1)\)-circulant, recover its eigenvalues with one real FFT, draw spectrally, and invert. The subtlety — and the bug that silently biases naive implementations — is the spectral draw: the complex Gaussian weights must satisfy the Hermitian symmetry \(W_{m-k} = \overline{W_k}\) (with the \(k=0\) and \(k=m/2\) modes real) so the inverse FFT is real and carries the exact target covariance. Drop the symmetry and the cross-covariances are under-weighted, biasing \(\mathrm{Var}[B^H_1]\) low for every \(H \neq 1/2\); the test suite pins \(\mathrm{Var}[B^H_t] = t^{2H}\) to catch exactly this.
# davies_harte.py — exact fGN, the Hermitian-symmetric spectral draw
def _wood_chan_fgn(n, H, rng):
m = 2 * (n - 1)
gamma = fgn_autocovariance(n, H)
c = np.concatenate([gamma, gamma[-2:0:-1]]) # circulant first row
lam = np.fft.fft(c).real # eigenvalues (>= 0)
W = np.empty(m, dtype=complex)
W[0] = np.sqrt(lam[0]) * rng.standard_normal() # real DC mode
half = m // 2
W[half] = np.sqrt(lam[half]) * rng.standard_normal()
k = np.arange(1, half)
a, b = rng.standard_normal(half - 1), rng.standard_normal(half - 1)
W[k] = np.sqrt(lam[k] / 2.0) * (a + 1j * b)
W[m - k] = np.sqrt(lam[k] / 2.0) * (a - 1j * b) # Hermitian symmetry => real FFT
return (np.fft.fft(W).real / np.sqrt(m))[:n]
%==========%
VI. The Rough Bergomi Model (rbergomi/model.py):
Under the pricing measure the spot is driven by a variance process that is a deterministic transform of the rough Volterra (Riemann-Liouville) process
\[ Y_t = \sqrt{2H}\int_0^t (t-s)^{H-1/2}\,dW_s, \qquad \mathbb{E}[Y_t^2] = t^{2H}. \]The variance and spot are then
\[ v_t = \xi_0 \exp\!\Big(\eta\, Y_t - \tfrac12 \eta^2 t^{2H}\Big), \qquad \frac{dS_t}{S_t} = \sqrt{v_t}\,dZ_t, \quad Z = \rho W + \sqrt{1-\rho^2}\,W^\perp. \]The martingale correction \(-\tfrac12\eta^2 t^{2H}\) makes \(\mathbb{E}[v_t] = \xi_0\), a flat forward-variance curve, and \(W\) is the same Brownian motion that builds \(Y\), so \(\rho\) is the spot-vol correlation. Three parameters shape the smile: \(H\) (roughness, hence the short-end skew exponent), \(\eta\) (vol-of-vol, the smile amplitude), and \(\rho < 0\) (the downside tilt). The discretisation matters: a naive Riemann sum of the singular kernel under-states the variance by \(\sim 20\%\). The module instead uses variance-matched weights — each cell carries exactly its share of the kernel's \(L^2\) energy — so \(\mathrm{Var}[Y_{t_i}] = t_i^{2H}\) holds exactly on the grid while the convolution structure (and the driving noise for the spot correlation) is preserved.
# volterra.py — variance-matched kernel: Var[Y_{t_i}] = t_i^{2H} exactly
m = np.arange(1, n + 1)
g = np.sqrt((m * dt) ** (2.0 * H) - ((m - 1) * dt) ** (2.0 * H))
xi = rng.standard_normal((n_paths, n)) # standardised increments
for p in range(n_paths):
Y[p, 1:] = np.convolve(xi[p], g)[:n] # causal convolution against xi
dW = xi * np.sqrt(dt) # same noise drives the correlated spot
# model.py — variance and correlated spot
var_Y = t ** (2.0 * H)
v = params.xi0 * np.exp(params.eta * Y - 0.5 * params.eta ** 2 * var_Y)
dZ = rho * dW + np.sqrt(1.0 - rho ** 2) * dB
log_incr = -0.5 * v[:, :-1] * dt + np.sqrt(np.maximum(v[:, :-1], 0.0)) * dZ
%==========%
VII. The Skew Term Structure — rBergomi vs Heston:
There is no transform for rough Bergomi option prices, so the smile is built by Monte Carlo: simulate terminal spots, average call/put payoffs across a log-moneyness grid, and invert Black-Scholes to implied vol (puts via parity below the forward to stay in the well-conditioned region). The ATM skew is then a centred difference \(|\sigma(h) - \sigma(-h)|/2h\) at \(k = 0\). Sweeping maturities and fitting \(\log\psi\) against \(\log T\) recovers a slope of about \(-0.43\) at \(H = 0.1\) — the theoretical \(H - 1/2 = -0.40\), within Monte-Carlo error — while the Heston benchmark's skew is bounded as \(T \to 0\). Common random numbers across maturities keep the curve smooth.
# model.py
def atm_skew(params, T, *, h=0.01, n_steps=120, n_paths=60000, rng=None):
"""ATM skew |d sigma_BS / d k| at k = 0 via a centred difference."""
ivs = mc_smile(params, T, np.array([-h, 0.0, h]),
n_steps=n_steps, n_paths=n_paths, rng=rng)
return float(abs((ivs[2] - ivs[0]) / (2.0 * h)))
def power_law_exponent(maturities, skews):
"""Fit skew ~ T^a; a should sit near H - 1/2."""
good = (skews > 0) & np.isfinite(skews)
return float(np.polyfit(np.log(maturities[good]), np.log(skews[good]), 1)[0])
%==========%
VIII. Simulation-Based Calibration (calibration/calibrate.py):
Because option prices are only available through simulation, the calibration objective — the distance between model and target skew term structures — is a Monte-Carlo estimate, and this forces two design choices that distinguish rough-vol calibration from Heston's. First, common random numbers: the RNG seed is fixed so the same Brownian paths are reused at every parameter evaluation. Without CRN the objective is a noisy surface the optimiser cannot descend; with CRN it is smooth in \((H, \eta, \rho)\) and finite-difference gradients are meaningful. Second, derivative-free / finite-difference optimisation: no analytic gradient exists, so the fit is a bounded least-squares with finite-difference Jacobians evaluated under CRN. The forward-variance level \(\xi_0\) is pinned separately, so the three smile-shape parameters are identified by the skew term structure alone.
# calibrate.py
def residuals(x): # x = (H, eta, rho)
model = _model_skews(x, maturities, xi0, n_paths, n_steps_per_year, seed)
return model - target
res = least_squares(residuals, np.array(x0), bounds=(lower, upper),
diff_step=np.array([0.02, 0.05, 0.05]), max_nfev=max_nfev)
def _model_skews(x, maturities, xi0, n_paths, n_steps_per_year, seed):
params = RBergomiParams(H=x[0], eta=x[1], rho=x[2], xi0=xi0)
out = np.empty(len(maturities))
for i, T in enumerate(maturities):
rng = np.random.default_rng(seed) # CRN: identical paths every eval
out[i] = atm_skew(params, float(T), n_paths=n_paths, rng=rng)
return out
%==========%
IX. Deep Hedging (hedging/deep_hedge.py):
We sell one at-the-money European call and hedge it by trading the underlying on a discrete grid, paying proportional transaction costs on each rebalance. Under the deep-hedging paradigm (Buehler et al. 2019) no pricing model is assumed for the hedge: an LSTM maps the observable history — log-moneyness, time-to-maturity, and a causal realised-vol proxy — to the hedge ratio, trained to minimise the variance of the terminal P&L net of costs:
\[ \mathrm{PnL} = p_0 + \sum_t \delta_t (S_{t+1} - S_t) - (S_T - K)^+ - c\sum_t |\delta_t - \delta_{t-1}|\,S_t. \]The friction is the crux. With no costs and frequent rebalancing the constant-vol Black-Scholes delta is already near-optimal and hard to beat — a useful sanity check, not a weakness. Add realistic costs and the picture inverts: a delta hedge churns the position on every rough, autocorrelated vol move and bleeds turnover, whereas the recurrent deep hedger learns to under-react to transient vol and economise on trading. The recurrent state is what lets it exploit volatility persistence a Markovian delta ignores.
# deep_hedge.py — LSTM hedger and the risk objective
class LSTMHedger(torch.nn.Module):
def __init__(self, n_in=3, hidden=32):
super().__init__()
self.lstm = torch.nn.LSTM(n_in, hidden, batch_first=True)
self.head = torch.nn.Linear(hidden, 1)
def forward(self, x):
h, _ = self.lstm(x)
return torch.sigmoid(self.head(h)).squeeze(-1) # delta in (0, 1)
for _ in range(epochs):
deltas = model(feat_t) # (n_paths, n_steps)
turns = torch.abs(torch.diff(torch.cat([zeros, deltas], dim=1), dim=1))
tc = cost * (turns * S_dec_t).sum(dim=1)
pnl = premium + (deltas * dS_t).sum(dim=1) - payoff_t - tc
loss = pnl.var() # minimise hedging risk
loss.backward(); opt.step()
%==========%
X. Hedging Results — Deep vs BS vs Model Delta:
On a held-out set of rough Bergomi paths with 50 bps proportional costs, the trained deep hedge attains the lowest net-P&L variance and the lowest tail risk; the constant-vol Black-Scholes delta is intermediate; and the instantaneous-vol “model delta” — which reacts to every variance move — trades the most and fares worst. The ordering reverses the frictionless case and is the practical argument for learned hedging under rough volatility: the gains come not from predicting the smooth Brownian world better, but from managing turnover in the rough one.
| Strategy | What it trades on | Net P&L variance | 95% CVaR |
|---|---|---|---|
| Deep hedge (LSTM) | Learned function of observable history | Lowest | Lowest |
| BS delta | Constant vol \(\sqrt{\xi_0}\) | Intermediate | Intermediate |
| Model delta | Instantaneous vol \(\sqrt{v_t}\) | Highest (over-trades) | Highest |
The result is reproducible from the package: rbergomi hedge --maturity 0.25 trains the agent and prints the variance and CVaR table; the deterministic test test_deep_hedge_lowers_variance_under_costs asserts the deep hedge beats the model delta and matches or beats the BS delta.
%==========%
XI. Path Signatures as Features (signature/signature.py):
The signature of a path \(X:[0,T]\to\mathbb{R}^d\) is the sequence of iterated integrals \(S(X) = (1, \{\int dX^i\}, \{\int\!\int dX^i dX^j\}, \dots)\), an element of the tensor algebra that characterises the path up to tree-like equivalence. Its universal approximation property (Levin, Lyons & Ni 2013) is the reason it is the natural feature map for rough paths: any continuous functional of the path is approximated arbitrarily well by a linear function of a high-enough truncation of the signature — so a signature front-end lets a simple head capture the nonlinear dependence of the optimal hedge on the rough vol path. The module computes depth-\(m\) truncated signatures in pure NumPy by Chen's identity (each straight segment's signature is a tensor exponential; consecutive segments combine by the tensor product), avoiding the signatory C++ dependency. Two identities anchor the tests: level 1 equals the total increment, and the antisymmetric part of level 2 is the signed Lévy area.
# signature.py — Chen's identity, depth-m truncated signature
def path_signature(path, depth=3):
diffs = np.diff(np.asarray(path, float), axis=0)
sig = _tensor_exp_increment(diffs[0], depth) # exp of first increment
for i in range(1, diffs.shape[0]):
seg = _tensor_exp_increment(diffs[i], depth)
sig = _chen_product(sig, seg, depth) # tensor (Chen) product
return sig # [level1, level2, ..., level_m]
%==========%
XII. CLI — cli.py:
Five subcommands cover simulation through hedging, all sharing the DuckDB store.
# Install
pip install -e ".[dev]"
# Simulate paths and re-estimate the Hurst exponent from a variance path
rbergomi simulate --h 0.1 --eta 1.9 --rho -0.9
# ATM skew term structure and its fitted power-law exponent (~ H - 1/2)
rbergomi skew --h 0.1
# Calibrate (H, eta, rho) to a synthetic SPX skew term structure under CRN
rbergomi calibrate
# Train the LSTM deep hedger; print variance and CVaR vs BS / model delta
rbergomi hedge --maturity 0.25
# Launch the Streamlit server-side dashboard
rbergomi dashboard
| Command | Key options | Output |
|---|---|---|
rbergomi simulate | --h, --eta, --rho, --xi0 | Martingale & variance checks, re-estimated \(H\) |
rbergomi skew | --h, --eta, --rho | Skew by maturity, rBergomi vs Heston, power-law exponent |
rbergomi calibrate | --n-paths, --db | Fitted \((H, \eta, \rho)\), RMSE; stored to DuckDB |
rbergomi hedge | --maturity, --n-paths, --epochs | Net-P&L variance & CVaR: deep vs BS vs model delta |
rbergomi dashboard | — | Launches streamlit run app.py |
%==========%
XIII. Test Suite:
All 22 tests are offline and deterministic (seed 42). The fBM tests pin the defining identities — \(\mathrm{Var}[B^H_t] = t^{2H}\) across \(H \in \{0.1, 0.3, 0.5\}\), the empirical fGN autocovariance against \(\gamma(k)\), Davies-Harte and Cholesky sharing the same variance law, the Volterra process's \(t^{2H}\) variance, and Hurst recovery from a long path. The rough Bergomi tests assert the discounted-spot martingale and \(\mathbb{E}[v_t] = \xi_0\), the downside skew (\(\rho < 0 \Rightarrow\) OTM-put vol exceeds OTM-call vol), the \(T^{H-1/2}\) skew scaling via the fitted exponent, that lower \(H\) is rougher (steeper short-end skew), and that the Heston benchmark stays bounded. Calibration is checked to reduce the objective below the initial guess under CRN; the deep hedge is checked to lower net-P&L variance under costs; and the signature tests verify the level-1 increment identity, the level-2 shuffle (square) identity, and a nonzero Lévy area.
# test_rbergomi.py — the rough-volatility signature
def test_atm_skew_scales_like_T_pow_H_minus_half(params):
mats = np.array([1, 2, 3, 6, 12, 24]) / 12.0
mats, skews = atm_skew_term_structure(params, mats, n_paths=50000)
a = power_law_exponent(mats, skews)
assert a == pytest.approx(params.H - 0.5, abs=0.12) # ~ -0.40 at H = 0.1
assert skews[0] > 2.0 * skews[-1] # short end far steeper