Rough Volatility & Deep Hedging — The Rough Bergomi Model

Classical stochastic-volatility models — Heston, SABR — drive the variance process with Brownian motion. The data disagree: estimate the Hurst exponent of realised volatility and it comes out near \(H \approx 0.1\), far rougher than the \(H = 0.5\) of Brownian motion. That single empirical fact has a sharp consequence for option markets — the at-the-money implied-volatility skew explodes at short maturities as \(T^{H-1/2}\), a near power-law blow-up that no Brownian-driven model can match, since their short-dated skew flattens to a constant. The rough Bergomi model (Bayer, Friz & Gatheral 2016) drives variance with a fractional kernel and reproduces this term structure with only three parameters. This module simulates rough Bergomi from exact fractional Brownian motion (Davies-Harte circulant embedding), verifies the \(T^{H-1/2}\) skew scaling against a Heston benchmark, calibrates \((H, \eta, \rho)\) by simulation-based gradient descent — there is no closed-form price, so the loss is only available through Monte Carlo — and trains an LSTM deep-hedging agent that, once transaction costs are paid, beats both the Black-Scholes delta and a stochastic-vol model delta. A pure-NumPy truncated path signature supplies an alternative rough-path feature map. Built on Python 3.11+ with numpy, scipy, torch, pandas, duckdb, pydantic v2, plotly, streamlit, and typer; packaged with hatchling and tested with pytest (22 offline, seed-42 tests).


%==========%


I. Interactive Dashboard:

The dashboard below runs entirely in the browser via stlite (Streamlit on WebAssembly — no server). The three rough Bergomi parameters sit on sidebar sliders. The four tabs show a simulated variance/spot ensemble that visibly roughens as \(H \to 0\), the ATM-skew term structure computed by Monte Carlo against a Heston benchmark and the SPX \(T^{H-1/2}\) power law, a hedging-error comparison whose spread opens with transaction costs, and the level-2 path signature of the rough vol path. First load downloads Pyodide and may take 20–40 seconds; subsequent loads are cached.


%==========%


II. Project Layout:

rough-bergomi/
├── pyproject.toml                              # hatchling build, deps, ruff + pytest
├── .env.example                                # DB_PATH
├── scripts/download_data.py                    # SPX skew term structure → DuckDB
├── src/rough_bergomi/
│   ├── fbm/
│   │   ├── davies_harte.py                     # Exact fBM via Wood-Chan circulant embedding
│   │   ├── cholesky.py                         # Exact O(n^3) fBM fallback
│   │   ├── volterra.py                         # Riemann-Liouville variance-matched kernel
│   │   └── hurst.py                            # Absolute-moment Hurst estimator
│   ├── rbergomi/
│   │   └── model.py                            # Simulation, MC smile, ATM skew term structure
│   ├── calibration/
│   │   └── calibrate.py                        # Common-random-number least squares
│   ├── hedging/
│   │   └── deep_hedge.py                        # LSTM deep hedger + BS/model-delta benchmarks
│   ├── signature/
│   │   └── signature.py                        # Pure-NumPy truncated path signatures
│   ├── data/                                   # schemas.py, fetchers.py, store.py (DuckDB)
│   ├── report/plots.py                         # Plotly figures
│   ├── cli.py                                  # Typer CLI: simulate | skew | calibrate | hedge | dashboard
│   └── app.py                                  # Streamlit server-side dashboard
└── tests/                                      # test_fbm, test_rbergomi, test_calibration, test_hedging, test_signature
  

%==========%


III. Data Sources:

The calibration target is the SPX ATM skew term structure: for each listed maturity, the slope of Black-Scholes implied volatility in log-moneyness at the money. Empirically this skew scales roughly as \(T^{-0.4}\) — that is \(T^{H-1/2}\) with \(H \approx 0.1\) — and flattens at longer maturities. Live option chains come from Yahoo Finance (the same feed used by the Heston project); for each near expiry the fetcher fits a local linear slope of implied vol against log-moneyness near the money. When offline the pipeline falls back to a documented synthetic SPX-like power-law target so calibration and the dashboard run without credentials.


# data/fetchers.py
def synthetic_spx_skew(maturities=None, *, a=0.18, H=0.1, noise=0.0, seed=42):
    """Synthetic SPX-like ATM skew term structure skew(T) = a * T^(H - 1/2)."""
    if maturities is None:
        maturities = np.array([1, 2, 3, 6, 9, 12, 18, 24]) / 12.0
    skew = a * maturities ** (H - 0.5)
    return pd.DataFrame({"maturity": maturities, "atm_skew": skew})
  

%==========%


IV. Why Volatility Is Rough:

Take a long series of realised variance, form the increments of its logarithm over a lag \(\Delta\), and measure how their \(q\)-th absolute moment scales: \(\mathbb{E}\,|\log v_{t+\Delta} - \log v_t|^q \propto \Delta^{qH}\). Across essentially every liquid index the estimated \(H\) lands near \(0.1\) — volatility is far rougher than the \(H = 0.5\) of a Brownian-driven model (Gatheral, Jaisson & Rosenbaum 2018). The option-market fingerprint of this roughness is the ATM skew term structure \(\psi(T) = |\partial \sigma_{BS}(k, T)/\partial k|_{k=0}\). Empirically \(\psi(T)\) blows up at the short end like a power law; under any model with a Brownian variance driver \(\psi(T)\) instead converges to a finite constant as \(T \to 0\). Rough Bergomi resolves the gap: a fractional kernel with exponent \(H - 1/2 < 0\) makes \(\psi(T) \sim T^{H-1/2}\), reproducing the observed explosion. The table contrasts the two regimes.

Empirical factBrownian SV (Heston)Rough Bergomi
Hurst of log-variance\(H = 0.5\) by construction\(H \approx 0.1\) (rough)
Short-dated ATM skewFinite constant as \(T \to 0\)Blows up as \(T^{H-1/2}\)
Parameters to fit the term structureNeeds jumps (Bates) for short end\(H\) alone sets the exponent
Variance autocorrelationExponential (Markovian)Slowly decaying (long memory in increments)

%==========%


V. Exact Fractional Brownian Motion — Davies-Harte (fbm/davies_harte.py):

Fractional Brownian motion \(B^H\) is the centred Gaussian process with \(\mathbb{E}[B^H_t B^H_s] = \tfrac12(|t|^{2H} + |s|^{2H} - |t-s|^{2H})\); its increments (fractional Gaussian noise, fGN) have autocovariance \(\gamma(k) = \tfrac12(|k-1|^{2H} - 2|k|^{2H} + |k+1|^{2H})\). The Davies-Harte / Wood-Chan method samples \(n\) fGN values exactly in \(O(n\log n)\): embed the \(n\)-point Toeplitz autocovariance into a \(2(n-1)\)-circulant, recover its eigenvalues with one real FFT, draw spectrally, and invert. The subtlety — and the bug that silently biases naive implementations — is the spectral draw: the complex Gaussian weights must satisfy the Hermitian symmetry \(W_{m-k} = \overline{W_k}\) (with the \(k=0\) and \(k=m/2\) modes real) so the inverse FFT is real and carries the exact target covariance. Drop the symmetry and the cross-covariances are under-weighted, biasing \(\mathrm{Var}[B^H_1]\) low for every \(H \neq 1/2\); the test suite pins \(\mathrm{Var}[B^H_t] = t^{2H}\) to catch exactly this.


# davies_harte.py — exact fGN, the Hermitian-symmetric spectral draw
def _wood_chan_fgn(n, H, rng):
    m = 2 * (n - 1)
    gamma = fgn_autocovariance(n, H)
    c = np.concatenate([gamma, gamma[-2:0:-1]])          # circulant first row
    lam = np.fft.fft(c).real                             # eigenvalues (>= 0)
    W = np.empty(m, dtype=complex)
    W[0]      = np.sqrt(lam[0]) * rng.standard_normal()  # real DC mode
    half = m // 2
    W[half]   = np.sqrt(lam[half]) * rng.standard_normal()
    k = np.arange(1, half)
    a, b = rng.standard_normal(half - 1), rng.standard_normal(half - 1)
    W[k]      = np.sqrt(lam[k] / 2.0) * (a + 1j * b)
    W[m - k]  = np.sqrt(lam[k] / 2.0) * (a - 1j * b)      # Hermitian symmetry => real FFT
    return (np.fft.fft(W).real / np.sqrt(m))[:n]
  

%==========%


VI. The Rough Bergomi Model (rbergomi/model.py):

Under the pricing measure the spot is driven by a variance process that is a deterministic transform of the rough Volterra (Riemann-Liouville) process

\[ Y_t = \sqrt{2H}\int_0^t (t-s)^{H-1/2}\,dW_s, \qquad \mathbb{E}[Y_t^2] = t^{2H}. \]

The variance and spot are then

\[ v_t = \xi_0 \exp\!\Big(\eta\, Y_t - \tfrac12 \eta^2 t^{2H}\Big), \qquad \frac{dS_t}{S_t} = \sqrt{v_t}\,dZ_t, \quad Z = \rho W + \sqrt{1-\rho^2}\,W^\perp. \]

The martingale correction \(-\tfrac12\eta^2 t^{2H}\) makes \(\mathbb{E}[v_t] = \xi_0\), a flat forward-variance curve, and \(W\) is the same Brownian motion that builds \(Y\), so \(\rho\) is the spot-vol correlation. Three parameters shape the smile: \(H\) (roughness, hence the short-end skew exponent), \(\eta\) (vol-of-vol, the smile amplitude), and \(\rho < 0\) (the downside tilt). The discretisation matters: a naive Riemann sum of the singular kernel under-states the variance by \(\sim 20\%\). The module instead uses variance-matched weights — each cell carries exactly its share of the kernel's \(L^2\) energy — so \(\mathrm{Var}[Y_{t_i}] = t_i^{2H}\) holds exactly on the grid while the convolution structure (and the driving noise for the spot correlation) is preserved.


# volterra.py — variance-matched kernel: Var[Y_{t_i}] = t_i^{2H} exactly
m  = np.arange(1, n + 1)
g  = np.sqrt((m * dt) ** (2.0 * H) - ((m - 1) * dt) ** (2.0 * H))
xi = rng.standard_normal((n_paths, n))          # standardised increments
for p in range(n_paths):
    Y[p, 1:] = np.convolve(xi[p], g)[:n]        # causal convolution against xi
dW = xi * np.sqrt(dt)                            # same noise drives the correlated spot

# model.py — variance and correlated spot
var_Y = t ** (2.0 * H)
v = params.xi0 * np.exp(params.eta * Y - 0.5 * params.eta ** 2 * var_Y)
dZ = rho * dW + np.sqrt(1.0 - rho ** 2) * dB
log_incr = -0.5 * v[:, :-1] * dt + np.sqrt(np.maximum(v[:, :-1], 0.0)) * dZ
  

%==========%


VII. The Skew Term Structure — rBergomi vs Heston:

There is no transform for rough Bergomi option prices, so the smile is built by Monte Carlo: simulate terminal spots, average call/put payoffs across a log-moneyness grid, and invert Black-Scholes to implied vol (puts via parity below the forward to stay in the well-conditioned region). The ATM skew is then a centred difference \(|\sigma(h) - \sigma(-h)|/2h\) at \(k = 0\). Sweeping maturities and fitting \(\log\psi\) against \(\log T\) recovers a slope of about \(-0.43\) at \(H = 0.1\) — the theoretical \(H - 1/2 = -0.40\), within Monte-Carlo error — while the Heston benchmark's skew is bounded as \(T \to 0\). Common random numbers across maturities keep the curve smooth.


# model.py
def atm_skew(params, T, *, h=0.01, n_steps=120, n_paths=60000, rng=None):
    """ATM skew |d sigma_BS / d k| at k = 0 via a centred difference."""
    ivs = mc_smile(params, T, np.array([-h, 0.0, h]),
                   n_steps=n_steps, n_paths=n_paths, rng=rng)
    return float(abs((ivs[2] - ivs[0]) / (2.0 * h)))

def power_law_exponent(maturities, skews):
    """Fit skew ~ T^a; a should sit near H - 1/2."""
    good = (skews > 0) & np.isfinite(skews)
    return float(np.polyfit(np.log(maturities[good]), np.log(skews[good]), 1)[0])
  

%==========%


VIII. Simulation-Based Calibration (calibration/calibrate.py):

Because option prices are only available through simulation, the calibration objective — the distance between model and target skew term structures — is a Monte-Carlo estimate, and this forces two design choices that distinguish rough-vol calibration from Heston's. First, common random numbers: the RNG seed is fixed so the same Brownian paths are reused at every parameter evaluation. Without CRN the objective is a noisy surface the optimiser cannot descend; with CRN it is smooth in \((H, \eta, \rho)\) and finite-difference gradients are meaningful. Second, derivative-free / finite-difference optimisation: no analytic gradient exists, so the fit is a bounded least-squares with finite-difference Jacobians evaluated under CRN. The forward-variance level \(\xi_0\) is pinned separately, so the three smile-shape parameters are identified by the skew term structure alone.


# calibrate.py
def residuals(x):                                  # x = (H, eta, rho)
    model = _model_skews(x, maturities, xi0, n_paths, n_steps_per_year, seed)
    return model - target

res = least_squares(residuals, np.array(x0), bounds=(lower, upper),
                    diff_step=np.array([0.02, 0.05, 0.05]), max_nfev=max_nfev)

def _model_skews(x, maturities, xi0, n_paths, n_steps_per_year, seed):
    params = RBergomiParams(H=x[0], eta=x[1], rho=x[2], xi0=xi0)
    out = np.empty(len(maturities))
    for i, T in enumerate(maturities):
        rng = np.random.default_rng(seed)          # CRN: identical paths every eval
        out[i] = atm_skew(params, float(T), n_paths=n_paths, rng=rng)
    return out
  

%==========%


IX. Deep Hedging (hedging/deep_hedge.py):

We sell one at-the-money European call and hedge it by trading the underlying on a discrete grid, paying proportional transaction costs on each rebalance. Under the deep-hedging paradigm (Buehler et al. 2019) no pricing model is assumed for the hedge: an LSTM maps the observable history — log-moneyness, time-to-maturity, and a causal realised-vol proxy — to the hedge ratio, trained to minimise the variance of the terminal P&L net of costs:

\[ \mathrm{PnL} = p_0 + \sum_t \delta_t (S_{t+1} - S_t) - (S_T - K)^+ - c\sum_t |\delta_t - \delta_{t-1}|\,S_t. \]

The friction is the crux. With no costs and frequent rebalancing the constant-vol Black-Scholes delta is already near-optimal and hard to beat — a useful sanity check, not a weakness. Add realistic costs and the picture inverts: a delta hedge churns the position on every rough, autocorrelated vol move and bleeds turnover, whereas the recurrent deep hedger learns to under-react to transient vol and economise on trading. The recurrent state is what lets it exploit volatility persistence a Markovian delta ignores.


# deep_hedge.py — LSTM hedger and the risk objective
class LSTMHedger(torch.nn.Module):
    def __init__(self, n_in=3, hidden=32):
        super().__init__()
        self.lstm = torch.nn.LSTM(n_in, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, 1)
    def forward(self, x):
        h, _ = self.lstm(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)   # delta in (0, 1)

for _ in range(epochs):
    deltas = model(feat_t)                                # (n_paths, n_steps)
    turns  = torch.abs(torch.diff(torch.cat([zeros, deltas], dim=1), dim=1))
    tc     = cost * (turns * S_dec_t).sum(dim=1)
    pnl    = premium + (deltas * dS_t).sum(dim=1) - payoff_t - tc
    loss   = pnl.var()                                    # minimise hedging risk
    loss.backward(); opt.step()
  

%==========%


X. Hedging Results — Deep vs BS vs Model Delta:

On a held-out set of rough Bergomi paths with 50 bps proportional costs, the trained deep hedge attains the lowest net-P&L variance and the lowest tail risk; the constant-vol Black-Scholes delta is intermediate; and the instantaneous-vol “model delta” — which reacts to every variance move — trades the most and fares worst. The ordering reverses the frictionless case and is the practical argument for learned hedging under rough volatility: the gains come not from predicting the smooth Brownian world better, but from managing turnover in the rough one.

StrategyWhat it trades onNet P&L variance95% CVaR
Deep hedge (LSTM)Learned function of observable historyLowestLowest
BS deltaConstant vol \(\sqrt{\xi_0}\)IntermediateIntermediate
Model deltaInstantaneous vol \(\sqrt{v_t}\)Highest (over-trades)Highest

The result is reproducible from the package: rbergomi hedge --maturity 0.25 trains the agent and prints the variance and CVaR table; the deterministic test test_deep_hedge_lowers_variance_under_costs asserts the deep hedge beats the model delta and matches or beats the BS delta.


%==========%


XI. Path Signatures as Features (signature/signature.py):

The signature of a path \(X:[0,T]\to\mathbb{R}^d\) is the sequence of iterated integrals \(S(X) = (1, \{\int dX^i\}, \{\int\!\int dX^i dX^j\}, \dots)\), an element of the tensor algebra that characterises the path up to tree-like equivalence. Its universal approximation property (Levin, Lyons & Ni 2013) is the reason it is the natural feature map for rough paths: any continuous functional of the path is approximated arbitrarily well by a linear function of a high-enough truncation of the signature — so a signature front-end lets a simple head capture the nonlinear dependence of the optimal hedge on the rough vol path. The module computes depth-\(m\) truncated signatures in pure NumPy by Chen's identity (each straight segment's signature is a tensor exponential; consecutive segments combine by the tensor product), avoiding the signatory C++ dependency. Two identities anchor the tests: level 1 equals the total increment, and the antisymmetric part of level 2 is the signed Lévy area.


# signature.py — Chen's identity, depth-m truncated signature
def path_signature(path, depth=3):
    diffs = np.diff(np.asarray(path, float), axis=0)
    sig = _tensor_exp_increment(diffs[0], depth)        # exp of first increment
    for i in range(1, diffs.shape[0]):
        seg = _tensor_exp_increment(diffs[i], depth)
        sig = _chen_product(sig, seg, depth)            # tensor (Chen) product
    return sig                                          # [level1, level2, ..., level_m]
  

%==========%


XII. CLI — cli.py:

Five subcommands cover simulation through hedging, all sharing the DuckDB store.


# Install
pip install -e ".[dev]"

# Simulate paths and re-estimate the Hurst exponent from a variance path
rbergomi simulate --h 0.1 --eta 1.9 --rho -0.9

# ATM skew term structure and its fitted power-law exponent (~ H - 1/2)
rbergomi skew --h 0.1

# Calibrate (H, eta, rho) to a synthetic SPX skew term structure under CRN
rbergomi calibrate

# Train the LSTM deep hedger; print variance and CVaR vs BS / model delta
rbergomi hedge --maturity 0.25

# Launch the Streamlit server-side dashboard
rbergomi dashboard
  
CommandKey optionsOutput
rbergomi simulate--h, --eta, --rho, --xi0Martingale & variance checks, re-estimated \(H\)
rbergomi skew--h, --eta, --rhoSkew by maturity, rBergomi vs Heston, power-law exponent
rbergomi calibrate--n-paths, --dbFitted \((H, \eta, \rho)\), RMSE; stored to DuckDB
rbergomi hedge--maturity, --n-paths, --epochsNet-P&L variance & CVaR: deep vs BS vs model delta
rbergomi dashboardLaunches streamlit run app.py

%==========%


XIII. Test Suite:

All 22 tests are offline and deterministic (seed 42). The fBM tests pin the defining identities — \(\mathrm{Var}[B^H_t] = t^{2H}\) across \(H \in \{0.1, 0.3, 0.5\}\), the empirical fGN autocovariance against \(\gamma(k)\), Davies-Harte and Cholesky sharing the same variance law, the Volterra process's \(t^{2H}\) variance, and Hurst recovery from a long path. The rough Bergomi tests assert the discounted-spot martingale and \(\mathbb{E}[v_t] = \xi_0\), the downside skew (\(\rho < 0 \Rightarrow\) OTM-put vol exceeds OTM-call vol), the \(T^{H-1/2}\) skew scaling via the fitted exponent, that lower \(H\) is rougher (steeper short-end skew), and that the Heston benchmark stays bounded. Calibration is checked to reduce the objective below the initial guess under CRN; the deep hedge is checked to lower net-P&L variance under costs; and the signature tests verify the level-1 increment identity, the level-2 shuffle (square) identity, and a nonzero Lévy area.


# test_rbergomi.py — the rough-volatility signature
def test_atm_skew_scales_like_T_pow_H_minus_half(params):
    mats = np.array([1, 2, 3, 6, 12, 24]) / 12.0
    mats, skews = atm_skew_term_structure(params, mats, n_paths=50000)
    a = power_law_exponent(mats, skews)
    assert a == pytest.approx(params.H - 0.5, abs=0.12)   # ~ -0.40 at H = 0.1
    assert skews[0] > 2.0 * skews[-1]                     # short end far steeper