Hierarchical Risk Parity — Allocation Without Return Forecasts:

Mean-variance optimisation is famously error-maximising. The max-Sharpe weights \(w \propto \Sigma^{-1}\mu\) invert a noisy covariance and multiply it by an even noisier return forecast \(\mu\), so tiny estimation errors are amplified into wildly unstable, concentrated, high-turnover portfolios. Hierarchical Risk Parity (Lopez de Prado, 2016) sidesteps the problem by never touching \(\mu\) and never inverting \(\Sigma\). It clusters the covariance matrix into a tree of similar assets, seriates it so related assets sit adjacent, and allocates risk by recursive bisection down the hierarchy.

This project implements HRP from scratch on SciPy, equips it with four covariance estimators (sample, EWMA, Ledoit-Wolf shrinkage, and a from-scratch DCC-GARCH), and asks one honest question over a ten-year monthly walk-forward: does HRP actually beat the simple, stubborn benchmarks (mean-variance, equal-weight, inverse-vol risk parity)? The answer is candid: HRP’s real edge is stability, not raw Sharpe. It cleanly beats naive \(1/N\) and utterly dominates return-forecast MVO, the error-maximising disaster the thesis predicts, but against a strong minimum-variance MVO it is an honest near-tie, winning on robustness and on needing no forecast and no matrix inversion rather than on return.

Built on Python 3.12 with numpy, scipy, scikit-learn, pandas, plotly, streamlit, pydantic v2 and typer; packaged with hatchling and tested with pytest against deterministic seed-42 fixtures.
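To make the error-maximising claim concrete, here is a small two-asset sketch. The volatility, correlation and forecast numbers are illustrative assumptions, not values from the project, and `max_sharpe_weights` is a hypothetical helper written only for this example. It shows how a 0.1 percentage-point change in one forecast turns an even split into a leveraged long/short pair when the two assets are highly correlated.

```python
import numpy as np

def max_sharpe_weights(cov, mu):
    """Unconstrained max-Sharpe direction, rescaled so the weights sum to 1."""
    raw = np.linalg.solve(cov, mu)       # solves cov @ raw = mu, i.e. raw = cov^{-1} mu
    return raw / raw.sum()

# Two assets with 20% volatility and 0.99 correlation (illustrative numbers).
vol, rho = 0.20, 0.99
cov = vol**2 * np.array([[1.0, rho], [rho, 1.0]])

mu_base = np.array([0.050, 0.050])       # identical forecasts
mu_bump = np.array([0.050, 0.051])       # second forecast nudged up by 0.1 percentage points

w_base = max_sharpe_weights(cov, mu_base)
w_bump = max_sharpe_weights(cov, mu_bump)
print(w_base)   # an even split
print(w_bump)   # roughly -0.49 and 1.49: short one asset, levered long the other
```

The weights depend on \(\mu_1 - \rho\mu_2\) and \(\mu_2 - \rho\mu_1\), which are differences of nearly equal numbers when \(\rho\) is close to one, so a small forecast error dominates the result.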

I. Interactive Dashboard:

The dashboard below runs entirely in the browser via stlite (Streamlit on WebAssembly — no server). It reproduces the full pipeline in pure NumPy + SciPy (SciPy’s hierarchical clustering runs natively in Pyodide): a seed-42 multi-asset universe with genuine block-correlation structure, the live single-linkage dendrogram, the HRP weight treemap, cumulative-return and drawdown charts against the benchmarks, and an EWMA half-life slider so you can watch the covariance estimator reshape the cluster tree and the allocation in real time. First load downloads Pyodide and may take 20–40 seconds.

II. Project Layout:

hrp/
├── pyproject.toml                              # Build config, deps, ruff + pytest settings
├── .env.example                                # Optional yfinance / FRED keys (real-data only)
├── dashboard.html                              # Self-contained stlite demo (numpy + scipy HRP)
├── scripts/
│   ├── make_thumbnail.py                       # Real matplotlib thumbnail (equity curves)
│   └── download_data.py                        # Optional yfinance / FRED multi-asset fetch
├── src/hrp/
│   ├── data/
│   │   ├── synthetic.py                        # Seed-42 block-correlation return generator
│   │   ├── schemas.py                          # Pydantic v2 return / weight / metric records
│   │   ├── fetchers.py                         # yfinance / FRED multi-asset fetch (optional)
│   │   └── store.py                            # DuckDB cache (explicit INSERT column lists)
│   ├── covariance/
│   │   ├── sample.py                           # Sample covariance
│   │   ├── ewma.py                             # EWMA online covariance (tunable half-life)
│   │   ├── ledoit_wolf.py                      # Ledoit-Wolf shrinkage — from scratch
│   │   └── dcc_garch.py                        # Univariate GARCH(1,1) MLE + Engle DCC recursion
│   ├── allocation/
│   │   ├── hrp.py                              # Distance, clustering, seriation, recursive bisection
│   │   ├── mvo.py                              # Mean-variance + min-variance (simplex projection)
│   │   └── risk_parity.py                      # Equal-weight + inverse-vol benchmarks
│   ├── backtest/
│   │   ├── walkforward.py                      # No-leak monthly-rebalanced walk-forward
│   │   └── metrics.py                          # Sharpe, MaxDD, turnover, HHI
│   ├── analysis/
│   │   ├── cluster_stability.py                # Cophenetic tree churn per covariance estimator
│   │   └── return_overlay.py                   # Momentum tilt: does forecasting help or hurt?
│   ├── report/plots.py                         # Plotly: dendrogram, treemap, equity, drawdown
│   ├── cli.py                                  # Typer CLI: backtest | estimators | stability | overlay | weights
│   └── app.py                                  # Streamlit server-side dashboard
└── tests/                                      # 51 seed-42 offline tests
  
III. The HRP Algorithm (allocation/hrp.py):

HRP runs in three stages, each needing only the covariance — never a return forecast. (1) Tree clustering: convert the correlation matrix into Lopez de Prado’s distance \(d_{ij} = \sqrt{\tfrac12 (1 - \rho_{ij})}\) and run single-linkage agglomerative clustering on it. (2) Quasi-diagonalisation (seriation): read the leaf order off the dendrogram so similar assets sit adjacent and the covariance’s large entries cluster along the diagonal. (3) Recursive bisection: walk the seriated list top-down, splitting each cluster in two and dividing its weight between the halves in inverse proportion to each half’s inverse-variance portfolio variance:

\[ d_{ij} = \sqrt{\tfrac12\,(1 - \rho_{ij})}, \qquad \tilde V_{\text{cl}} = \tilde w^\top \Sigma_{\text{cl}}\, \tilde w \;\text{ with }\; \tilde w \propto \operatorname{diag}(\Sigma_{\text{cl}})^{-1}, \qquad \alpha = 1 - \frac{\tilde V_{\text{left}}}{\tilde V_{\text{left}} + \tilde V_{\text{right}}}. \]

import numpy as np

# quasi_diagonal_order and cluster_variance are helper functions not shown here.
def hrp_weights(cov, method="single"):
    """HRP weights from a covariance matrix: no return forecast, no matrix inverse."""
    d = np.sqrt(np.diag(cov))                             # per-asset volatilities
    corr = cov / np.outer(d, d)
    sort_ix = quasi_diagonal_order(corr, method=method)   # seriation off the dendrogram
    w = np.ones(cov.shape[0])
    clusters = [sort_ix]
    while clusters:                                       # recursive bisection
        new = []
        for cl in clusters:
            if len(cl) <= 1:                              # a single asset cannot be split further
                continue
            left, right = cl[:len(cl)//2], cl[len(cl)//2:]
            v_l, v_r = cluster_variance(cov, left), cluster_variance(cov, right)
            alpha = 1.0 - v_l / (v_l + v_r)               # inverse-variance split
            for i in left:
                w[i] *= alpha
            for i in right:
                w[i] *= 1.0 - alpha
            new += [left, right]
        clusters = new
    return w / w.sum()                                    # non-negative, sums to 1

The result is a fully invested, long-only book that allocates risk along the asset hierarchy — diversifying across clusters before within them — without ever inverting \(\Sigma\) or estimating a single expected return.
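The listing in this section calls two helpers, `quasi_diagonal_order` and `cluster_variance`, whose bodies are not shown. The following is a minimal sketch of what they might look like, assuming SciPy's `linkage` and `leaves_list` for the tree and its leaf order. The function bodies and the `correlation_distance` helper are assumptions made for illustration, not the project's actual code.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

def correlation_distance(corr):
    """d_ij = sqrt(0.5 * (1 - rho_ij)), clipped to guard against rounding error."""
    return np.sqrt(np.clip(0.5 * (1.0 - np.asarray(corr, dtype=float)), 0.0, 1.0))

def quasi_diagonal_order(corr, method="single"):
    """Leaf order of the dendrogram built on the correlation distance."""
    dist = correlation_distance(corr)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)    # square matrix -> condensed vector
    return leaves_list(linkage(condensed, method=method)).tolist()

def cluster_variance(cov, items):
    """Variance of the inverse-variance portfolio restricted to `items`."""
    sub = cov[np.ix_(items, items)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)

# Toy correlation matrix: assets 0 and 2 form one block, assets 1 and 3 another.
corr = np.array([
    [1.0, 0.1, 0.8, 0.1],
    [0.1, 1.0, 0.1, 0.8],
    [0.8, 0.1, 1.0, 0.1],
    [0.1, 0.8, 0.1, 1.0],
])
order = quasi_diagonal_order(corr)
print(order)                                      # block members end up adjacent
print(cluster_variance(np.diag([1.0, 4.0]), [0, 1]))
```

In the toy matrix the seriation places each block's members next to each other, which is the property recursive bisection relies on when it cuts the ordered list in half.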

IV. Four Covariance Estimators (covariance/):

HRP is only as good as the correlation matrix it clusters, so the static sample covariance is swapped for three estimators and tested head-to-head. EWMA weights recent observations by an exponential half-life, reacting fast but noisily. Ledoit-Wolf shrinks the sample covariance toward a scaled-identity target by the analytically optimal intensity (implemented from scratch and cross-checked against scikit-learn). DCC-GARCH is a lightweight in-house implementation — a univariate GARCH(1,1) fit by maximum likelihood for each asset, then Engle’s dynamic conditional correlation recursion with the two DCC scalars profiled on a small grid — capturing time-varying volatility and correlation while staying positive semi-definite. The honest caveat: DCC-GARCH is responsive but is not a production estimator (~30s on 40 assets); the heavyweight arch package is listed only as an optional extra and is never imported by the package or tests.
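Two of these estimators are short enough to sketch in plain NumPy. The code below is an illustrative version under stated assumptions, not the project's implementation: `ewma_cov` uses a weighted mean and weights that halve every `half_life` rows, and `ledoit_wolf_cov` follows the Ledoit and Wolf (2004) scaled-identity formula. Both function names and the default half-life are hypothetical.

```python
import numpy as np

def ewma_cov(returns, half_life=60.0):
    """Exponentially weighted covariance; the newest row gets the largest weight."""
    x = np.asarray(returns, dtype=float)
    age = np.arange(x.shape[0] - 1, -1, -1)       # 0 for the newest observation
    w = 0.5 ** (age / half_life)
    w /= w.sum()
    xc = x - w @ x                                # subtract the weighted mean
    return (xc * w[:, None]).T @ xc

def ledoit_wolf_cov(returns):
    """Shrink the sample covariance toward mu * I; returns (covariance, intensity)."""
    x = np.asarray(returns, dtype=float)
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / n                             # maximum-likelihood sample covariance
    mu = np.trace(s) / p                          # scale of the identity target
    d2 = np.sum((s - mu * np.eye(p)) ** 2)        # squared distance from sample to target
    # Mean squared Frobenius distance of each outer product from s, divided by n.
    b2_bar = sum(np.sum((np.outer(r, r) - s) ** 2) for r in xc) / n**2
    shrink = min(b2_bar, d2) / d2                 # intensity, guaranteed to lie in [0, 1]
    return shrink * mu * np.eye(p) + (1.0 - shrink) * s, shrink

rng = np.random.default_rng(42)
sample = rng.standard_normal((200, 5))            # synthetic stand-in data
lw_cov, intensity = ledoit_wolf_cov(sample)
ew_cov = ewma_cov(sample, half_life=30.0)
print(intensity, np.linalg.eigvalsh(lw_cov).min(), np.linalg.eigvalsh(ew_cov).min())
```

Shrinking toward \(\mu I\) leaves the trace unchanged and pulls the eigenvalues toward their mean, which is why the shrunk matrix stays well conditioned even when the sample matrix is not.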

V. The Walk-Forward Leaderboard (cli.py backtest):

Seed-42 universe of 40 assets across five classes (Equity / Bond / Commodity / REIT / FX), monthly-rebalanced ten-year walk-forward with no look-ahead, Ledoit-Wolf covariance. Metrics are annualised Sharpe, maximum drawdown, monthly turnover, and weight concentration (HHI):

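| Strategy | Sharpe | Max drawdown | Turnover | HHI |
| --- | --- | --- | --- | --- |
| MVO (minimum-variance) | 1.14 | −4.4% | 0.039 | 0.045 |
| HRP | 1.02 | −4.6% | 0.103 | 0.046 |
| Inverse-vol risk parity | 0.28 | −11.2% | 0.008 | 0.029 |
| Equal-weight (1/N) | −0.04 | −16.8% | 0.000 | 0.025 |
| MVO (mean-variance, return forecasts) | −0.73 | −79% | 0.719 | 0.055 |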

Read top to bottom, the story is the thesis made concrete. The moment MVO is handed a trailing-return \(\mu\) it becomes the error-maximising disaster on the bottom row: Sharpe \(-0.73\), a \(-79\%\) drawdown, and \(72\%\) monthly turnover. HRP, using the same covariance but no forecast, sits near the top with a shallow drawdown (\(-4.6\%\)) and roughly one-seventh of that turnover (0.103 vs 0.719). The candid part: a well-built minimum-variance MVO edges HRP on Sharpe (1.14 vs 1.02) on this universe. HRP’s win over it is robustness and simplicity, not return.
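For readers who want to see how such a leaderboard row could be produced, here is a minimal sketch of a no-leak walk-forward loop together with the four metrics. Everything in it is an assumption made for illustration: the `BacktestResult` container, the 36-month lookback, the plain sample covariance, the zero risk-free rate in the Sharpe ratio, and the one-way turnover convention are not confirmed by the project.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class BacktestResult:                  # hypothetical container, not the project's schema
    returns: np.ndarray                # out-of-sample portfolio return per month
    weights: np.ndarray                # one row of weights per rebalance

def inverse_vol_weights(cov):
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()

def walk_forward(returns, weigher, lookback=36):
    """Each month: estimate on the trailing window, then earn that month's return."""
    x = np.asarray(returns, dtype=float)
    held, realised = [], []
    for t in range(lookback, x.shape[0]):
        window = x[t - lookback:t]                 # rows strictly before t, so no look-ahead
        w = weigher(np.cov(window, rowvar=False))
        held.append(w)
        realised.append(float(w @ x[t]))
    return BacktestResult(np.array(realised), np.array(held))

def annualised_sharpe(r, periods_per_year=12):     # risk-free rate assumed zero
    r = np.asarray(r, dtype=float)
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year))

def max_drawdown(r):
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(r, dtype=float))))
    return float((wealth / np.maximum.accumulate(wealth) - 1.0).min())

def mean_turnover(weights):                        # one-way convention: half the L1 change
    return float(0.5 * np.abs(np.diff(weights, axis=0)).sum(axis=1).mean())

def hhi(w):
    return float(np.sum(np.asarray(w, dtype=float) ** 2))

rng = np.random.default_rng(42)
monthly = rng.normal(0.005, 0.03, size=(120, 5))   # synthetic stand-in, not the project's universe
bt = walk_forward(monthly, inverse_vol_weights)
print(annualised_sharpe(bt.returns), max_drawdown(bt.returns),
      mean_turnover(bt.weights), hhi(bt.weights[-1]))
```

One sanity check ties back to the table: an equal-weight book over 40 assets has HHI \(40 \times (1/40)^2 = 0.025\), which is the value reported in the \(1/N\) row.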

VI. Is the Edge Real? Significance Across Seeds:

A single backtest can flatter any method, so the comparison is repeated across ten independent universes (seeds 42–51) and the per-seed Sharpe differences are tested with a paired \(t\)-test:

| Strategy | Mean Sharpe (10 seeds) | vs HRP |
| --- | --- | --- |
| HRP | 0.76 ± 0.25 | |
| MVO (minimum-variance) | 0.75 ± 0.29 | honest tie |
| Equal-weight (1/N) | 0.41 ± 0.28 | HRP wins, paired-t p = 0.008 |

The result is deliberately un-oversold: HRP beats \(1/N\) significantly (\(p = 0.008\)) with a best-in-class drawdown, and is an honest statistical tie with minimum-variance MVO. \(1/N\) is the notoriously hard-to-beat benchmark (DeMiguel, Garlappi & Uppal 2009); clearing it on both Sharpe and drawdown, while needing no return forecast and no covariance inversion, is exactly the defensible HRP claim — and it stops precisely there.
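The paired test works on per-seed differences, so seed-level luck that lifts or sinks both strategies together cancels out. A minimal sketch follows, using synthetic placeholder Sharpe ratios rather than the project's results. The arrays `sharpe_a` and `sharpe_b` are invented for illustration, and the manual statistic is shown next to SciPy's `ttest_rel` to make the formula explicit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-seed Sharpe ratios for two strategies (placeholders, not the project's numbers).
sharpe_b = rng.normal(0.4, 0.3, size=10)
sharpe_a = sharpe_b + rng.normal(0.3, 0.2, size=10)    # A is built to beat B on average

diff = sharpe_a - sharpe_b
# Paired t statistic: mean difference divided by its standard error.
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))
result = stats.ttest_rel(sharpe_a, sharpe_b)            # two-sided paired t-test
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

With ten seeds the test has nine degrees of freedom, so a tie between two strategies says the data cannot separate them, not that they are proven equal.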

VII. Which Covariance Gives the Most Stable Tree? (analysis/cluster_stability.py):

HRP’s premise is a stable hierarchy: if the cluster tree thrashes every month, so does the allocation. Estimators are ranked by month-to-month cophenetic dendrogram churn (the tree-distance between consecutive linkages):

| Covariance estimator | Cophenetic churn (lower = steadier) |
| --- | --- |
| Ledoit-Wolf shrinkage | 0.016 |
| Sample covariance | 0.017 |
| DCC-GARCH | ~0.027 |
| EWMA | 0.044 |

The robust takeaway: shrinkage gives the steadiest tree and reactive EWMA churns the most (its topology flips in ~25% of months), with DCC-GARCH in between. Under Ledoit-Wolf the HRP topology changes in only ~6.5% of rebalances and the seriation order drifts by a Kendall-tau of just ~0.17 — the hierarchy is genuinely stable, which is exactly what HRP needs to deliver its low-turnover edge.
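The exact churn definition is not spelled out here, so the following is only one plausible reading: the mean absolute change in pairwise cophenetic distance between the single-linkage trees of two consecutive correlation matrices, plus a Kendall tau between the two leaf orders. The function names are hypothetical, and the project may normalise or report these quantities differently (for example as a distance rather than a correlation).

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, leaves_list, linkage
from scipy.spatial.distance import squareform
from scipy.stats import kendalltau

def single_linkage(corr):
    """Single-linkage tree on the distance sqrt(0.5 * (1 - rho))."""
    dist = np.sqrt(np.clip(0.5 * (1.0 - np.asarray(corr, dtype=float)), 0.0, 1.0))
    np.fill_diagonal(dist, 0.0)
    return linkage(squareform(dist, checks=False), method="single")

def cophenetic_churn(corr_prev, corr_next):
    """Mean absolute change in pairwise cophenetic distance between two trees."""
    c_prev = cophenet(single_linkage(corr_prev))   # condensed cophenetic distances
    c_next = cophenet(single_linkage(corr_next))
    return float(np.mean(np.abs(c_prev - c_next)))

def seriation_tau(corr_prev, corr_next):
    """Kendall tau between each asset's position in the two leaf orders."""
    pos_prev = np.argsort(leaves_list(single_linkage(corr_prev)))
    pos_next = np.argsort(leaves_list(single_linkage(corr_next)))
    tau, _ = kendalltau(pos_prev, pos_next)
    return float(tau)

# Two toy months: the block structure swaps from {0,2},{1,3} to {0,1},{2,3}.
corr_a = np.array([[1.0, 0.1, 0.8, 0.1], [0.1, 1.0, 0.1, 0.8],
                   [0.8, 0.1, 1.0, 0.1], [0.1, 0.8, 0.1, 1.0]])
corr_b = np.array([[1.0, 0.8, 0.1, 0.1], [0.8, 1.0, 0.1, 0.1],
                   [0.1, 0.1, 1.0, 0.8], [0.1, 0.1, 0.8, 1.0]])
print(cophenetic_churn(corr_a, corr_a), cophenetic_churn(corr_a, corr_b))
```

An unchanged correlation matrix gives zero churn, while the block swap changes four of the six pairwise tree distances.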

VIII. Does a Return Forecast Help? The Overlay Stress Test (analysis/return_overlay.py):

The natural temptation is to bolt a return signal onto HRP. We tilt the weights by a momentum score, \(w \propto w_{\text{hrp}}\cdot e^{\lambda z}\), and sweep the tilt strength \(\lambda\) from 0 to 4:

| Tilt strength \(\lambda\) | Sharpe | Turnover | Max drawdown |
| --- | --- | --- | --- |
| 0 (pure HRP) | 1.49 | 0.10 | −4.6% |
| 4 (heavy tilt) | −0.47 | 0.69 | −67% |

The degradation is monotonic: every increment of return-forecasting erodes Sharpe and inflates turnover and drawdown. In a market with no built-in return predictability, the overlay does exactly what the thesis warns — it re-introduces the instability HRP was designed to avoid. The precise condition for it to help: the return signal must carry real, persistent, cross-sectional information stronger than the turnover and drawdown its tilt induces. Here, by construction, it does not — and that is the point.
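The tilt formula \(w \propto w_{\text{hrp}}\cdot e^{\lambda z}\) is simple enough to sketch directly. The function name `momentum_tilt` and the example scores are hypothetical. The sketch shows why a heavy tilt concentrates the book: at \(\lambda = 4\) a modest spread in \(z\) is enough to push most of the weight into one asset.

```python
import numpy as np

def momentum_tilt(w_hrp, z, lam):
    """w proportional to w_hrp * exp(lam * z), renormalised to sum to 1."""
    w = np.asarray(w_hrp, dtype=float)
    z = np.asarray(z, dtype=float)
    # Subtracting z.max() avoids overflow and cancels out in the normalisation.
    tilted = w * np.exp(lam * (z - z.max()))
    return tilted / tilted.sum()

w_hrp = np.full(4, 0.25)                    # illustrative starting weights
z = np.array([-1.0, 0.0, 0.5, 1.5])         # illustrative momentum z-scores

for lam in (0.0, 1.0, 4.0):
    w = momentum_tilt(w_hrp, z, lam)
    print(lam, np.round(w, 3), round(float(np.sum(w**2)), 3))   # weights and HHI
```

At \(\lambda = 0\) the weights are returned unchanged. As \(\lambda\) grows, small month-to-month changes in \(z\) move large amounts of weight, which is the mechanism behind the rising turnover in the table.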

IX. CLI (cli.py):

# Install
pip install -e ".[dev]"

# Walk-forward leaderboard: HRP vs MVO vs 1/N vs inverse-vol
hrp backtest

# HRP under sample / EWMA / Ledoit-Wolf covariance
hrp estimators

# Rank covariance estimators by cluster-tree stability
hrp stability --include-dcc

# Return-overlay sweep: does forecasting help or destabilise?
hrp overlay

# Today's HRP weights + seriation order
hrp weights

# Launch the server-side Streamlit dashboard
streamlit run src/hrp/app.py
  
| Command | What it does | Output |
| --- | --- | --- |
| hrp backtest | Monthly walk-forward across all strategies | Sharpe / MaxDD / turnover / HHI leaderboard |
| hrp stability | Cophenetic tree churn per estimator | Estimator stability ranking |
| hrp overlay | Momentum-tilt sweep over \(\lambda\) | Sharpe / turnover / drawdown vs tilt |

X. Test Suite:

Fifty-one tests, fully offline, seed-42. Allocation tests verify the HRP weights are non-negative and sum to one, that quasi-diagonalisation seriates correctly, and — the heart of the thesis — that HRP turns over far less than mean-variance MVO out-of-sample. Covariance tests confirm every estimator is positive semi-definite, that Ledoit-Wolf shrinkage stays in \([0,1]\) and matches scikit-learn, and that DCC correlations stay in \([-1,1]\). Backtest tests check the no-leak rebalance index and the metric formulae; analysis tests cover cophenetic churn and the overlay degradation.

import numpy as np

# hrp_weights, mean_variance_weights and walk_forward are imported from the hrp package;
# cov and returns are pytest fixtures.

def test_hrp_weights_valid(cov):
    w = hrp_weights(cov)
    assert np.all(w >= 0) and abs(w.sum() - 1.0) < 1e-10   # long-only, fully invested

def test_hrp_more_stable_than_mvo(returns):
    bt_hrp = walk_forward(returns, weigher=hrp_weights)
    bt_mvo = walk_forward(returns, weigher=mean_variance_weights)
    assert bt_hrp.turnover < bt_mvo.turnover               # the stability claim, OOS

XI. Configuration & Setup:

cd assets/projects/hrp
python -m venv .venv
.venv\Scripts\Activate.ps1                                # Windows PowerShell (macOS/Linux: source .venv/bin/activate)
pip install -e ".[dev]"
hrp backtest                                              # reproduce the leaderboard
hrp stability --include-dcc                               # covariance-estimator stability
$env:PYTHONPATH = "src"; pytest -q                        # 51 tests, offline (POSIX shells: PYTHONPATH=src pytest -q)
streamlit run src/hrp/app.py
  

No data download is required: the models, tests and dashboard all run on the seed-42 synthetic block-correlation generator with no API keys. The optional scripts/download_data.py and data/fetchers.py build a real 30–50 asset universe (equities, bonds, commodities, REITs, FX) from Yahoo Finance plus a FRED risk-free rate, gated behind keys in .env — never touched by CI.


Team:

Theodosios Dimitrasopoulos, personal project.

Tools & methods:

Python 3.12, NumPy, SciPy (hierarchical clustering), scikit-learn, pandas, Pydantic v2, Typer, rich, Plotly, Streamlit, pytest, ruff, hatchling. Methods: Hierarchical Risk Parity (Lopez de Prado 2016) — correlation-distance single-linkage clustering, quasi-diagonalisation / seriation, and recursive-bisection inverse-variance allocation; covariance estimation via EWMA, Ledoit-Wolf shrinkage (from scratch), and an in-house DCC-GARCH (univariate GARCH(1,1) MLE + Engle dynamic conditional correlation); minimum-variance and mean-variance optimisation by simplex-projected gradient; no-leak monthly walk-forward backtesting with annualised Sharpe, maximum drawdown, turnover and HHI; paired-\(t\) significance across seeds; cophenetic dendrogram-churn cluster-stability analysis; and a momentum return-overlay stress test.