Portfolio Risk Engine:
A command-line tool that ingests position files and market-data CSVs (or fetches them live from Yahoo Finance or Alpha Vantage) and computes a full suite of portfolio risk metrics: rolling volatility, Value at Risk (VaR), Conditional VaR (CVaR), stress tests, VaR backtesting, and factor return decomposition. Built on Python 3.11+ using pandas, scipy, pydantic v2, typer, and rich; packaged with hatchling and tested with pytest against a deterministic 500-day synthetic fixture requiring no network access. The engine supports both offline analysis via pre-downloaded CSVs and live data ingestion, with optional FRED macro overlays.
%==========%
I. Project Layout:
portfolio-risk-engine/
├── pyproject.toml  # Build config, dependencies, ruff + pytest settings
├── .env.example  # API key template
├── data/
│   ├── sample_positions.csv  # 10-position demo file
│   ├── positions_av.csv  # 5-position file matching Alpha Vantage tickers
│   ├── positions_extended.csv  # 27-position file matching yfinance tickers
│   ├── prices_alphavantage.csv  # 100-day price history (Alpha Vantage)
│   ├── prices_yfinance.csv  # 5-year price history (yfinance, 28 tickers)
│   └── fred_macro.csv  # 5-year macro overlays (FRED)
├── scripts/
│   └── download_data.py  # Populates data/ from AV, yfinance, and FRED
├── src/portfolio_risk/
│   ├── data/
│   │   ├── schemas.py  # Pydantic v2: Position, Portfolio, PriceHistory
│   │   ├── fetchers.py  # yfinance, Alpha Vantage, FRED HTTP fetchers
│   │   └── loaders.py  # CSV / list-of-dicts -> validated Portfolio schema
│   ├── portfolio/
│   │   ├── model.py  # Portfolio: weights, P&L, exposure breakdowns
│   │   └── returns.py  # Rolling returns, Sharpe ratio, drawdown
│   ├── risk/
│   │   ├── volatility.py  # Rolling, EWMA, and annualised volatility
│   │   ├── var.py  # Historical + parametric VaR, rolling series
│   │   ├── cvar.py  # Historical + parametric CVaR / Expected Shortfall
│   │   └── stress.py  # Built-in and custom stress scenarios
│   ├── backtest/
│   │   └── breach.py  # VaR breach counting + Kupiec POF chi-squared test
│   ├── factors/
│   │   └── decomposition.py  # Contribution-based attribution by class/sector/geo
│   └── cli.py  # Typer CLI: run | var | stress | backtest
├── tests/
│   ├── conftest.py  # Deterministic 500-day synthetic price fixtures
│   ├── test_portfolio.py  # Weights, P&L, exposure, drawdown invariants
│   ├── test_risk.py  # Vol, VaR, CVaR, backtest correctness
│   └── test_stress.py  # Scenario results, factor decomposition invariants
└── docs/
    └── technical_note.md  # Extended methodology reference
%==========%
II. Data Layer — schemas.py & loaders.py:
All position data flows through Pydantic v2 models. Position holds ticker, quantity, and optional metadata. load_positions() accepts a CSV path or a list of dicts and returns a validated Portfolio schema. Tickers are normalised to uppercase; zero-quantity positions are rejected at validation time.
# schemas.py
from __future__ import annotations

from datetime import date
from typing import Literal

import pandas as pd
from pydantic import BaseModel, field_validator, model_validator

AssetClass = Literal["equity", "etf", "fx", "crypto", "bond", "other"]

class Position(BaseModel):
    ticker: str
    quantity: float
    asset_class: AssetClass = "equity"
    sector: str = "unknown"
    geography: str = "unknown"
    currency: str = "USD"

    @field_validator("ticker")
    @classmethod
    def normalise_ticker(cls, v: str) -> str:
        return v.strip().upper()

    @field_validator("quantity")
    @classmethod
    def non_zero_quantity(cls, v: float) -> float:
        if v == 0:
            raise ValueError("quantity must be non-zero")
        return v

class Portfolio(BaseModel):
    positions: list[Position]
    base_currency: str = "USD"

    @model_validator(mode="after")
    def at_least_one_position(self) -> "Portfolio":
        if not self.positions:
            raise ValueError("portfolio must have at least one position")
        return self

    @property
    def tickers(self) -> list[str]:
        return [p.ticker for p in self.positions]

class PriceHistory(BaseModel):
    prices: pd.DataFrame
    start: date
    end: date
    model_config = {"arbitrary_types_allowed": True}

    @model_validator(mode="after")
    def validate_prices(self) -> "PriceHistory":
        if self.prices.empty:
            raise ValueError("price history is empty")
        if self.prices.isnull().all().any():
            bad = self.prices.columns[self.prices.isnull().all()].tolist()
            raise ValueError(f"all-NaN columns in price history: {bad}")
        return self

    def returns(self, fill: bool = True) -> pd.DataFrame:
        df = self.prices.copy()
        if fill:
            df = df.ffill()
        return df.pct_change().dropna(how="all")
# loaders.py
from __future__ import annotations

from pathlib import Path

import pandas as pd

from .schemas import Portfolio, Position

_REQUIRED_COLS = {"ticker", "quantity"}
_OPTIONAL_COLS = {"asset_class", "sector", "geography", "currency"}

def load_positions(source: str | Path | list[dict]) -> Portfolio:
    if isinstance(source, (str, Path)):
        path = Path(source)
        if not path.exists():
            raise FileNotFoundError(f"Positions file not found: {path}")
        df = pd.read_csv(path)
    elif isinstance(source, list):
        df = pd.DataFrame(source)
    else:
        raise TypeError(f"source must be a file path or list of dicts, got {type(source)}")
    df.columns = [c.strip().lower() for c in df.columns]
    missing = _REQUIRED_COLS - set(df.columns)
    if missing:
        raise ValueError(f"Positions file missing required columns: {missing}")
    allowed = _REQUIRED_COLS | _OPTIONAL_COLS
    positions = [
        # Skip empty (NaN) cells so the schema defaults apply instead of
        # a float NaN being passed to a string field.
        Position(**{k: v for k, v in row.items() if k in allowed and pd.notna(v)})
        for row in df.to_dict(orient="records")
    ]
    return Portfolio(positions=positions)
Sample positions file:
ticker,quantity,asset_class,sector,geography,currency
AAPL,100,equity,technology,us,USD
MSFT,50,equity,technology,us,USD
JPM,75,equity,financials,us,USD
XOM,60,equity,energy,us,USD
AMZN,20,equity,consumer_discretionary,us,USD
BND,500,bond,fixed_income,us,USD
GLD,30,etf,commodities,global,USD
BTC-USD,0.5,crypto,digital_assets,global,USD
%==========%
III. Portfolio Model — model.py & returns.py:
The Portfolio dataclass holds the validated schema alongside the price DataFrame. On construction it computes market values and weights from the latest available price. Weights are static for the object's lifetime — the engine does not rebalance intra-period. Methods cover daily returns, dollar P&L, cumulative P&L, and market-value exposure grouped by any position attribute.
# model.py
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np
import pandas as pd

from portfolio_risk.data.schemas import Portfolio as PortfolioSchema

@dataclass
class Portfolio:
    """Holds price history and position metadata; computes weights and P&L."""

    schema: PortfolioSchema
    prices: pd.DataFrame  # dates × tickers, adjusted close
    weights: pd.Series = field(init=False)
    _latest_prices: pd.Series = field(init=False)

    def __post_init__(self) -> None:
        # Forward-fill so each ticker uses its latest *available* price,
        # even if the final row has a gap.
        self._latest_prices = self.prices.ffill().iloc[-1]
        market_values = pd.Series(
            {p.ticker: p.quantity * self._latest_prices.get(p.ticker, np.nan)
             for p in self.schema.positions}
        )
        self.weights = market_values / market_values.sum()

    def daily_returns(self) -> pd.DataFrame:
        return self.prices.pct_change().dropna(how="all")

    def portfolio_returns(self) -> pd.Series:
        """Weighted daily portfolio return series."""
        rets = self.daily_returns()
        tickers = [p.ticker for p in self.schema.positions if p.ticker in rets.columns]
        # Align weights to the return columns; unpriced tickers contribute 0.
        w = self.weights.reindex(tickers).fillna(0)
        return (rets[tickers] * w).sum(axis=1).rename("portfolio_return")

    def pnl(self) -> pd.DataFrame:
        """Dollar P&L per ticker per day, and total portfolio P&L."""
        price_changes = self.prices.diff()
        quantities = pd.Series(
            {p.ticker: p.quantity for p in self.schema.positions}
        ).reindex(self.prices.columns).fillna(0)
        # Drop the all-NaN first row before adding the total; summing first
        # would give that row a total of 0.0 and it would never be dropped.
        pnl_df = (price_changes * quantities).dropna(how="all")
        pnl_df["total"] = pnl_df.sum(axis=1)
        return pnl_df

    def cumulative_pnl(self) -> pd.Series:
        return self.pnl()["total"].cumsum().rename("cumulative_pnl")

    def exposure_by(self, attribute: str) -> pd.Series:
        """Market-value exposure grouped by asset_class, sector, or geography."""
        result: dict[str, float] = {}
        for pos in self.schema.positions:
            key = getattr(pos, attribute, "unknown")
            mv = pos.quantity * self._latest_prices.get(pos.ticker, 0.0)
            result[key] = result.get(key, 0.0) + mv
        return pd.Series(result, name=f"exposure_by_{attribute}").sort_values(ascending=False)

    def weight_summary(self) -> pd.DataFrame:
        rows = [
            {"ticker": pos.ticker, "quantity": pos.quantity,
             "asset_class": pos.asset_class, "sector": pos.sector,
             "geography": pos.geography,
             "weight_pct": round(self.weights.get(pos.ticker, 0.0) * 100, 2)}
            for pos in self.schema.positions
        ]
        return pd.DataFrame(rows).sort_values("weight_pct", ascending=False)
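As a worked instance of the weighting rule in `__post_init__`, here is a minimal, dependency-free sketch. The helper name `market_value_weights` and the tickers, quantities, and prices are hypothetical illustration values, not part of the package or its sample data.

```python
def market_value_weights(quantities: dict[str, float],
                         latest_prices: dict[str, float]) -> dict[str, float]:
    """w_i = q_i * P_i / sum_j(q_j * P_j), using signed market values."""
    market_values = {t: q * latest_prices[t] for t, q in quantities.items()}
    total = sum(market_values.values())
    if total == 0:
        raise ValueError("net market value is zero; weights are undefined")
    return {t: mv / total for t, mv in market_values.items()}


# Illustrative inputs only: market values are 1000, 2000, and 1000.
weights = market_value_weights(
    {"AAA": 100, "BBB": 50, "CCC": 200},
    {"AAA": 10.0, "BBB": 40.0, "CCC": 5.0},
)
print(weights)
```

Because the weights are fixed from one price snapshot, they drift away from the true intraday weights as prices move; this matches the engine's stated no-rebalancing assumption.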
# returns.py
from __future__ import annotations

import numpy as np
import pandas as pd

_TRADING_DAYS = 252

def compute_rolling_returns(prices: pd.DataFrame, window: int = 21,
                            annualise: bool = True) -> pd.DataFrame:
    daily = prices.pct_change()
    rolling_mean = daily.rolling(window).mean()
    if annualise:
        rolling_mean = rolling_mean * _TRADING_DAYS
    return rolling_mean.dropna(how="all")

def rolling_sharpe(portfolio_returns: pd.Series, window: int = 63,
                   risk_free_rate: float = 0.0, annualise: bool = True) -> pd.Series:
    """Rolling Sharpe ratio (annualised by default)."""
    excess = portfolio_returns - risk_free_rate / _TRADING_DAYS
    sharpe = excess.rolling(window).mean() / excess.rolling(window).std(ddof=1)
    if annualise:
        sharpe = sharpe * np.sqrt(_TRADING_DAYS)
    return sharpe.rename("rolling_sharpe")

def drawdown(portfolio_returns: pd.Series) -> pd.DataFrame:
    """Compute drawdown series and maximum drawdown."""
    cum = (1 + portfolio_returns).cumprod()
    rolling_max = cum.cummax()
    dd = (cum - rolling_max) / rolling_max
    return pd.DataFrame({"cumulative_return": cum, "drawdown": dd, "rolling_max": rolling_max})
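The drawdown calculation can be traced by hand with a pure-Python sketch. The function name `drawdown_series` and the three returns are assumptions chosen for illustration; the logic mirrors the cumulative-product and running-maximum steps in `drawdown()`.

```python
def drawdown_series(returns: list[float]) -> list[float]:
    """Drawdown relative to the running peak of the cumulative return index."""
    cum, peak, out = 1.0, float("-inf"), []
    for r in returns:
        cum *= 1.0 + r          # cumulative growth of 1 unit invested
        peak = max(peak, cum)   # running maximum, like cum.cummax()
        out.append((cum - peak) / peak)
    return out


# +10%, then -20%, then +5%: the index goes 1.10 -> 0.88 -> 0.924.
dd = drawdown_series([0.10, -0.20, 0.05])
print([round(x, 4) for x in dd])
```

The partial recovery on the third day shrinks the drawdown from 20% to 16% but does not clear it, since the index is still below its 1.10 peak.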
%==========%
IV. Risk Metrics:
IV.a — Volatility (volatility.py):
Three estimators are available. Full-sample annualised volatility (\(\sigma \cdot \sqrt{252}\)) serves as a scalar baseline. Rolling window volatility reveals regime changes. EWMA (RiskMetrics, \(\lambda=0.94\)) responds faster to large moves because recent observations receive exponentially higher weight: $$\text{var}_t = \lambda \cdot \text{var}_{t-1} + (1-\lambda) \cdot r_t^2$$ The \(\sqrt{252}\) annualisation factor assumes i.i.d. returns and 252 trading days per year. EWMA relaxes the constant-variance part of that assumption by letting the estimate track volatility clustering, but the annualised figure is still a one-day estimate scaled by \(\sqrt{252}\).
# volatility.py
from __future__ import annotations

import numpy as np
import pandas as pd

_TRADING_DAYS = 252

def rolling_volatility(returns: pd.Series | pd.DataFrame,
                       window: int = 21, annualise: bool = True) -> pd.Series | pd.DataFrame:
    vol = returns.rolling(window).std(ddof=1)
    if annualise:
        vol = vol * np.sqrt(_TRADING_DAYS)
    return vol

def annualised_volatility(returns: pd.Series | pd.DataFrame) -> float | pd.Series:
    """Full-sample annualised volatility."""
    return returns.std(ddof=1) * np.sqrt(_TRADING_DAYS)

def ewma_volatility(returns: pd.Series, lam: float = 0.94,
                    annualise: bool = True) -> pd.Series:
    """Exponentially weighted moving-average (RiskMetrics) volatility.

    λ=0.94 is the RiskMetrics daily decay factor.
    """
    # RiskMetrics recursion on squared returns (zero-mean assumption):
    #   var_t = lam * var_{t-1} + (1 - lam) * r_t**2
    # ewm(...).var() would instead measure dispersion around the EW mean.
    variance = (returns ** 2).ewm(alpha=1 - lam, adjust=False).mean()
    vol = np.sqrt(variance)
    if annualise:
        vol = vol * np.sqrt(_TRADING_DAYS)
    return vol.rename("ewma_vol")

def correlation_matrix(returns: pd.DataFrame, window: int | None = None) -> pd.DataFrame:
    if window is not None:
        return returns.tail(window).corr()
    return returns.corr()
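To make the EWMA recursion concrete, the following dependency-free sketch applies \(\text{var}_t = \lambda \cdot \text{var}_{t-1} + (1-\lambda) \cdot r_t^2\) step by step. The name `ewma_vol_recursive`, the choice to seed the variance with the first squared return, and the return values are assumptions made for illustration.

```python
import math


def ewma_vol_recursive(returns: list[float], lam: float = 0.94) -> list[float]:
    """Daily (non-annualised) EWMA volatility from the RiskMetrics recursion."""
    vols: list[float] = []
    var = None
    for r in returns:
        # Seed with the first squared return (an assumption; other seeds are common).
        var = r * r if var is None else lam * var + (1.0 - lam) * r * r
        vols.append(math.sqrt(var))
    return vols


# Five calm days at 1%, then a 5% shock.
vols = ewma_vol_recursive([0.01] * 5 + [0.05])
print(f"before shock: {vols[4]:.4%}, after shock: {vols[5]:.4%}")
```

A single 5% day lifts the daily estimate from 1% to roughly 1.56%, because the shock enters with weight \(1-\lambda = 0.06\) on its squared value.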
IV.b — Value at Risk (var.py):
VaR is the loss threshold that is not exceeded with probability \(c\) over one trading day: $$P(\text{loss} > \text{VaR}) = 1 - c$$ Historical simulation sorts the past \(N\) daily returns and takes the \((1-c) \cdot N\)-th worst observation. It is non-parametric and reflects whatever fat tails the sample contains, but it is slow to adapt and subject to the ghost effect: a single extreme day influences the estimate for exactly \(N\) days and then drops out abruptly. Parametric (delta-normal) VaR assumes normally distributed returns: $$\text{VaR}_\text{para} = -\!\left(\mu + z \cdot \sigma\right), \quad z = \Phi^{-1}(1-c)$$ At 99%, \(z \approx -2.326\); at 95%, \(z \approx -1.645\). It is fast and smooth but tends to underestimate losses when returns are fat-tailed, by as much as 20–40% in volatile markets. Both methods also have rolling time-series variants, which the backtest command uses internally.
# var.py
from __future__ import annotations

import numpy as np
import pandas as pd
from scipy import stats

def historical_var(portfolio_returns: pd.Series,
                   confidence: float = 0.99, window: int | None = None) -> float:
    data = portfolio_returns.dropna()
    if window is not None:
        data = data.tail(window)
    return float(-np.percentile(data, (1 - confidence) * 100))

def parametric_var(portfolio_returns: pd.Series,
                   confidence: float = 0.99, window: int | None = None) -> float:
    """Delta-normal VaR assuming normally distributed returns."""
    data = portfolio_returns.dropna()
    if window is not None:
        data = data.tail(window)
    mu = data.mean()
    sigma = data.std(ddof=1)
    z = stats.norm.ppf(1 - confidence)
    return float(-(mu + z * sigma))

def rolling_historical_var(portfolio_returns: pd.Series,
                           confidence: float = 0.99, window: int = 252) -> pd.Series:
    def _var(x: pd.Series) -> float:
        return -np.percentile(x.dropna(), (1 - confidence) * 100)

    return (portfolio_returns.rolling(window)
            .apply(_var, raw=False)
            .rename(f"hist_var_{int(confidence*100)}"))

def rolling_parametric_var(portfolio_returns: pd.Series,
                           confidence: float = 0.99, window: int = 252) -> pd.Series:
    z = stats.norm.ppf(1 - confidence)

    def _pvar(x: pd.Series) -> float:
        return -(x.mean() + z * x.std(ddof=1))

    return (portfolio_returns.rolling(window)
            .apply(_pvar, raw=False)
            .rename(f"para_var_{int(confidence*100)}"))
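The two VaR definitions can be compared on a small sample with the standard library alone. This is a sketch under stated assumptions: `hist_var_sketch` and `param_var_sketch` are hypothetical names, the quantile uses linear interpolation between order statistics (the default behaviour of `np.percentile`), and the 21 evenly spaced returns are illustrative rather than market data.

```python
from statistics import NormalDist, mean, stdev


def hist_var_sketch(returns: list[float], confidence: float = 0.99) -> float:
    """Negative of the (1 - c) quantile, linearly interpolated."""
    xs = sorted(returns)
    pos = (1.0 - confidence) * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    quantile = xs[lo] + (pos - lo) * (xs[hi] - xs[lo])
    return -quantile


def param_var_sketch(returns: list[float], confidence: float = 0.99) -> float:
    """Delta-normal VaR: -(mu + z * sigma) with z = Phi^{-1}(1 - c)."""
    z = NormalDist().inv_cdf(1.0 - confidence)
    return -(mean(returns) + z * stdev(returns))


# Illustrative sample: daily returns from -10% to +10% in 1% steps.
sample = [i / 100 for i in range(-10, 11)]
h = hist_var_sketch(sample, 0.95)
p = param_var_sketch(sample, 0.95)
print(f"historical: {h:.4f}, parametric: {p:.4f}")
```

On this uniform-looking sample the parametric figure comes out larger than the historical one; which method is more conservative depends on the shape of the return distribution, so neither ordering should be assumed in general.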
IV.c — Conditional VaR / Expected Shortfall (cvar.py):
CVaR answers a more useful question than VaR: not "what is the loss threshold?" but "when things go badly, how bad on average?" $$\text{CVaR} = \mathbb{E}\!\left[-R \;\middle|\; R \le -\text{VaR}\right]$$ CVaR is a coherent risk measure (sub-additive), meaning it correctly rewards diversification. VaR is not coherent: portfolios exist where \(\text{VaR}(A+B) > \text{VaR}(A) + \text{VaR}(B)\). This deficiency is one reason the Basel Committee's FRTB framework replaced the 99% VaR standard with 97.5% Expected Shortfall. At a given confidence level CVaR \(\geq\) VaR; the gap indicates tail heaviness. The parametric formula under normality, with \(z = \Phi^{-1}(1-c)\) and \(\phi\) the standard normal density, is: $$\text{CVaR}_\text{para} = -\mu + \sigma \cdot \frac{\phi(z)}{1-c}$$
# cvar.py
from __future__ import annotations

import numpy as np
import pandas as pd
from scipy import stats

def historical_cvar(portfolio_returns: pd.Series,
                    confidence: float = 0.99, window: int | None = None) -> float:
    """Historical CVaR: mean of returns in the left tail beyond the VaR quantile."""
    data = portfolio_returns.dropna()
    if window is not None:
        data = data.tail(window)
    cutoff = np.percentile(data, (1 - confidence) * 100)
    tail = data[data <= cutoff]
    if tail.empty:
        return float(-cutoff)
    return float(-tail.mean())

def parametric_cvar(portfolio_returns: pd.Series,
                    confidence: float = 0.99, window: int | None = None) -> float:
    """Parametric CVaR under normality: ES = -μ + σ·φ(z)/(1-c)."""
    data = portfolio_returns.dropna()
    if window is not None:
        data = data.tail(window)
    mu = data.mean()
    sigma = data.std(ddof=1)
    z = stats.norm.ppf(1 - confidence)
    return float(-mu + sigma * stats.norm.pdf(z) / (1 - confidence))

def rolling_historical_cvar(portfolio_returns: pd.Series,
                            confidence: float = 0.99, window: int = 252) -> pd.Series:
    def _cvar(x: pd.Series) -> float:
        cutoff = np.percentile(x, (1 - confidence) * 100)
        tail = x[x <= cutoff]
        return float(-tail.mean()) if not tail.empty else float(-cutoff)

    return (portfolio_returns.rolling(window)
            .apply(_cvar, raw=False)
            .rename(f"cvar_{int(confidence*100)}"))
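The relationship between VaR and CVaR can be checked numerically with a standard-library sketch. The helper names (`left_quantile`, `hist_cvar_sketch`, `param_cvar_sketch`) and the evenly spaced sample are assumptions for illustration; the quantile again uses linear interpolation to stay in line with `np.percentile`.

```python
from statistics import NormalDist, mean, stdev


def left_quantile(returns: list[float], confidence: float) -> float:
    """(1 - c) quantile of returns, linearly interpolated."""
    xs = sorted(returns)
    pos = (1.0 - confidence) * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def hist_cvar_sketch(returns: list[float], confidence: float = 0.99) -> float:
    """Mean loss over returns at or below the VaR cutoff."""
    cutoff = left_quantile(returns, confidence)
    tail = [r for r in returns if r <= cutoff]
    return -mean(tail) if tail else -cutoff


def param_cvar_sketch(returns: list[float], confidence: float = 0.99) -> float:
    """Normal-theory ES: -mu + sigma * phi(z) / (1 - c)."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - confidence)
    return -mean(returns) + stdev(returns) * nd.pdf(z) / (1.0 - confidence)


# Illustrative sample: daily returns from -10% to +10% in 1% steps.
sample = [i / 100 for i in range(-10, 11)]
var_95 = -left_quantile(sample, 0.95)
cvar_95 = hist_cvar_sketch(sample, 0.95)
pcvar_95 = param_cvar_sketch(sample, 0.95)
print(f"VaR: {var_95:.4f}, CVaR: {cvar_95:.4f}, parametric CVaR: {pcvar_95:.4f}")
```

Here the tail contains the two worst returns (−10% and −9%), so the historical CVaR is their average loss of 9.5%, just above the 9% VaR.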
%==========%
V. Stress Testing — stress.py:
Stress scenarios apply deterministic, instantaneous asset-class shocks to current market values. They are not statistical — they are calibrated to specific historical crisis episodes. P&L impact per position is \(\text{shock}_\text{class} \times \text{MV}_i\); the aggregate is expressed as a fraction of total NAV. Custom shock dictionaries can be passed at runtime.
| Scenario | Equity | Bond | FX | Crypto | ETF | Other | Calibration basis |
|---|---|---|---|---|---|---|---|
| rate_shock | −8% | −10% | −1% | −5% | −7% | −4% | +200 bps parallel yield-curve shift |
| equity_selloff | −30% | +5% | −3% | −40% | −25% | −15% | 2008–09 peak-to-trough drawdowns |
| fx_move | −6% | −2% | −10% | −8% | −5% | −3% | USD +10% (DXY-style move) |
| crypto_crash | −5% | +2% | 0% | −50% | −4% | −2% | 2022 crypto deleveraging |
# stress.py
from __future__ import annotations

from dataclasses import dataclass

import pandas as pd

from portfolio_risk.data.schemas import Portfolio as PortfolioSchema

_BUILTIN_SCENARIOS: dict[str, dict[str, float]] = {
    "rate_shock": {"bond": -0.10, "equity": -0.08, "etf": -0.07, "fx": -0.01,
                   "crypto": -0.05, "other": -0.04},
    "equity_selloff": {"equity": -0.30, "etf": -0.25, "bond": 0.05, "fx": -0.03,
                       "crypto": -0.40, "other": -0.15},
    "fx_move": {"equity": -0.06, "etf": -0.05, "bond": -0.02, "fx": -0.10,
                "crypto": -0.08, "other": -0.03},
    "crypto_crash": {"crypto": -0.50, "equity": -0.05, "etf": -0.04, "bond": 0.02,
                     "fx": 0.00, "other": -0.02},
}

@dataclass
class ScenarioResult:
    scenario_name: str
    position_pnl: pd.Series
    total_pnl: float
    total_pnl_pct: float  # fraction of initial portfolio value

def run_stress_scenarios(
    schema: PortfolioSchema,
    prices: pd.DataFrame,
    scenarios: list[str] | None = None,
    custom_shocks: dict[str, dict[str, float]] | None = None,
) -> list[ScenarioResult]:
    all_scenarios = dict(_BUILTIN_SCENARIOS)
    if custom_shocks:
        all_scenarios.update(custom_shocks)
    if scenarios is None:
        scenarios = list(_BUILTIN_SCENARIOS.keys())
    latest = prices.iloc[-1]
    portfolio_value = sum(p.quantity * latest.get(p.ticker, 0.0) for p in schema.positions)
    results: list[ScenarioResult] = []
    for name in scenarios:
        if name not in all_scenarios:
            raise ValueError(f"Unknown scenario '{name}'. Available: {list(all_scenarios)}")
        shocks = all_scenarios[name]
        position_pnl = {
            pos.ticker: pos.quantity * latest.get(pos.ticker, 0.0) * shocks.get(pos.asset_class, 0.0)
            for pos in schema.positions
        }
        total = sum(position_pnl.values())
        results.append(ScenarioResult(
            scenario_name=name,
            position_pnl=pd.Series(position_pnl),
            total_pnl=total,
            total_pnl_pct=total / portfolio_value if portfolio_value != 0 else 0.0,
        ))
    return results
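The per-position arithmetic, \(\text{shock}_\text{class} \times \text{MV}_i\), is easy to verify by hand. The sketch below applies the equity_selloff shocks from the table to a hypothetical three-position book; the tickers and market values are made up for illustration.

```python
# equity_selloff shocks for the three asset classes used below (from the table).
shocks = {"equity": -0.30, "bond": 0.05, "crypto": -0.40}

# Hypothetical book: (ticker, asset_class, market value in USD).
book = [
    ("EQ1", "equity", 60_000.0),
    ("BD1", "bond", 30_000.0),
    ("CR1", "crypto", 10_000.0),
]

# P&L per position is the class shock times the position's market value;
# an asset class with no shock entry is left unchanged (shock of 0).
position_pnl = {ticker: mv * shocks.get(cls, 0.0) for ticker, cls, mv in book}
total_pnl = sum(position_pnl.values())
nav = sum(mv for _, _, mv in book)
total_pnl_pct = total_pnl / nav

print(position_pnl)
print(f"total: {total_pnl:,.0f} USD ({total_pnl_pct:.2%} of NAV)")
```

The bond leg gains 1,500 USD while equity and crypto lose 18,000 and 4,000, for a net loss of 20,500 USD, or 20.5% of NAV. An all-equity book would lose the full 30%, which is why the sample output for the five-equity portfolio reproduces the equity column exactly.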
%==========%
VI. VaR Backtesting — breach.py:
A breach occurs when the realised return on day \(t\) is worse than the negative of the VaR estimated on day \(t-1\): $$\text{breach}_t = \mathbf{1}\!\left\{r_t < -\widehat{\text{VaR}}_{t-1}\right\}$$ The Kupiec Proportion-of-Failures (POF) test formalises calibration via a likelihood-ratio statistic: $$\text{LR} = -2\!\left[x\ln\!\frac{p}{\hat{p}} + (n-x)\ln\!\frac{1-p}{1-\hat{p}}\right]$$ where \(x\) = observed breaches, \(n\) = total observations, \(p = 1-c\) (expected breach probability), \(\hat{p} = x/n\). Under \(H_0\), \(\text{LR} \sim \chi^2(1)\). The test rejects at the 5% significance level when \(\text{LR} > 3.84\) (p-value \(<\) 0.05).
| Confidence | Min observations | Expected breaches in window |
|---|---|---|
| 90% | 100 | 10 |
| 95% | 200 | 10 |
| 99% | 500+ | ~5 |
# breach.py
from __future__ import annotations

from dataclasses import dataclass

import numpy as np
import pandas as pd
from scipy import stats

from portfolio_risk.risk.var import rolling_historical_var, rolling_parametric_var

@dataclass
class BacktestResult:
    method: str
    confidence: float
    n_observations: int
    n_breaches: int
    breach_rate: float
    expected_breach_rate: float
    kupiec_lr_stat: float
    kupiec_p_value: float
    passes_kupiec: bool
    breach_dates: pd.DatetimeIndex

def backtest_var(
    portfolio_returns: pd.Series,
    confidence: float = 0.99,
    window: int = 252,
    method: str = "historical",
) -> BacktestResult:
    """Backtest VaR against realised daily P&L.

    A breach occurs on day t when realised return < -VaR(t-1).
    """
    if method == "historical":
        var_series = rolling_historical_var(portfolio_returns, confidence, window)
    elif method == "parametric":
        var_series = rolling_parametric_var(portfolio_returns, confidence, window)
    else:
        raise ValueError(f"method must be 'historical' or 'parametric', got '{method}'")
    # compare realised return on day t with VaR estimated on day t-1
    var_lag = var_series.shift(1)
    aligned = pd.DataFrame({"return": portfolio_returns, "var": var_lag}).dropna()
    breaches = aligned["return"] < -aligned["var"]
    breach_dates = aligned.index[breaches]
    n, x = len(aligned), int(breaches.sum())
    p = 1 - confidence
    p_hat = x / n if n > 0 else 0.0
    # Kupiec POF log-likelihood ratio (H0: breach probability = 1 - confidence)
    if x == 0:
        lr_stat = -2 * n * np.log(1 - p)
    elif x == n:
        lr_stat = -2 * n * np.log(p)
    else:
        lr_stat = -2 * (x * np.log(p / p_hat) + (n - x) * np.log((1 - p) / (1 - p_hat)))
    p_value = float(1 - stats.chi2.cdf(lr_stat, df=1))
    return BacktestResult(
        method=method, confidence=confidence,
        n_observations=n, n_breaches=x,
        breach_rate=p_hat, expected_breach_rate=p,
        kupiec_lr_stat=float(lr_stat), kupiec_p_value=p_value,
        passes_kupiec=p_value >= 0.05,
        breach_dates=breach_dates,
    )
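The Kupiec statistic can be reproduced without SciPy, since for one degree of freedom the chi-squared survival function reduces to \(\operatorname{erfc}\!\left(\sqrt{\text{LR}/2}\right)\). The sketch below is a standalone illustration, not the package code; `kupiec_pof` is a hypothetical helper name, and the inputs (39 observations, 1 breach, 95% confidence) are taken from the backtest table in the sample output.

```python
import math


def kupiec_pof(n: int, x: int, confidence: float) -> tuple[float, float]:
    """Return (LR statistic, p-value) for x breaches in n observations."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p = 1.0 - confidence          # expected breach probability under H0
    p_hat = x / n                 # observed breach rate
    if x == 0:
        lr = -2.0 * n * math.log(1.0 - p)
    elif x == n:
        lr = -2.0 * n * math.log(p)
    else:
        lr = -2.0 * (x * math.log(p / p_hat)
                     + (n - x) * math.log((1.0 - p) / (1.0 - p_hat)))
    # Chi-squared(1) survival function expressed through erfc.
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


lr, p_value = kupiec_pof(n=39, x=1, confidence=0.95)
print(f"LR = {lr:.4f}, p-value = {p_value:.4f}, pass = {p_value >= 0.05}")
```

The result should land close to the LR of 0.5885 and p-value of 0.4430 shown in the sample output. With only 39 observations the test has little power, so a pass here says more about the sample size than about calibration.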
%==========%
VII. Factor Decomposition — decomposition.py:
Factor decomposition answers: which groups of positions drove portfolio returns? Each position's daily contribution is \(w_i \cdot r_{i,t}\); contributions are summed by the grouping attribute (asset_class, sector, or geography). This is a contribution-to-return analysis at the position level. It is Brinson-style in spirit, but because no benchmark is involved there are no separate allocation or selection effects; it simply shows which asset classes or sectors added to or subtracted from NAV each day. The summary reports cumulative return and annualised volatility per factor group.
# decomposition.py
from __future__ import annotations

import pandas as pd

from portfolio_risk.data.schemas import Portfolio as PortfolioSchema

def factor_decomposition(
    schema: PortfolioSchema,
    returns: pd.DataFrame,
    weights: pd.Series,
    by: str = "asset_class",
) -> pd.DataFrame:
    """Decompose portfolio return contributions by a position attribute.

    Returns DataFrame (dates × factor groups), each cell being the weighted
    return contribution of that group on that day.
    """
    valid_attrs = {"asset_class", "sector", "geography"}
    if by not in valid_attrs:
        raise ValueError(f"'by' must be one of {valid_attrs}, got '{by}'")
    factor_map: dict[str, list[str]] = {}
    for pos in schema.positions:
        factor_map.setdefault(getattr(pos, by, "unknown"), []).append(pos.ticker)
    contributions: dict[str, pd.Series] = {}
    for group, tickers in factor_map.items():
        available = [t for t in tickers if t in returns.columns]
        if not available:
            continue
        w = weights[available].fillna(0)
        if w.sum() == 0:
            continue
        contributions[group] = (returns[available] * w).sum(axis=1)
    if not contributions:
        return pd.DataFrame(index=returns.index)
    result = pd.DataFrame(contributions)
    result["total"] = result.sum(axis=1)
    return result

def factor_attribution_summary(
    schema: PortfolioSchema, returns: pd.DataFrame,
    weights: pd.Series, by: str = "asset_class",
) -> pd.DataFrame:
    """Summarise cumulative return and annualised vol per factor group."""
    decomp = factor_decomposition(schema, returns, weights, by=by)
    rows = []
    for col in decomp.columns:
        s = decomp[col].dropna()
        rows.append({
            "factor": col,
            "cumulative_return_pct": round(((1 + s).prod() - 1) * 100, 2),
            "annualised_vol_pct": round(s.std(ddof=1) * (252**0.5) * 100, 2),
            "n_days": len(s),
        })
    return pd.DataFrame(rows).sort_values("cumulative_return_pct", ascending=False)
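For a single day, the grouping step reduces to summing \(w_i \cdot r_{i,t}\) within each group. The following sketch uses plain dictionaries; the helper name `group_contributions` and the weights, returns, and group labels are hypothetical illustration values.

```python
def group_contributions(weights: dict[str, float],
                        returns: dict[str, float],
                        groups: dict[str, str]) -> dict[str, float]:
    """Sum w_i * r_i within each group for one day."""
    out: dict[str, float] = {}
    for ticker, w in weights.items():
        group = groups.get(ticker, "unknown")
        out[group] = out.get(group, 0.0) + w * returns[ticker]
    return out


# Illustrative one-day inputs.
weights = {"A": 0.5, "B": 0.3, "C": 0.2}
day_returns = {"A": 0.02, "B": -0.01, "C": 0.005}
groups = {"A": "equity", "B": "equity", "C": "bond"}

contrib = group_contributions(weights, day_returns, groups)
portfolio_return = sum(w * day_returns[t] for t, w in weights.items())
print(contrib, portfolio_return)
```

Equity contributes 0.7% and bonds 0.1%, and the two add up to the 0.8% portfolio return. This additivity holds day by day; the compounded cumulative figures in the summary table are not additive across groups in general.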
%==========%
VIII. CLI — cli.py:
The CLI is built with typer and rich. The four subcommands (run, var, stress, backtest) share common options (--positions, --prices-file, --window); run, var, and backtest additionally take --confidence. When --prices-file is omitted, the engine fetches live data and prints the active source.
# cli.py
from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path
from typing import Annotated, Optional

import typer
from dotenv import load_dotenv
from rich.console import Console
from rich.table import Table

load_dotenv()
app = typer.Typer(name="risk-engine",
                  help="Portfolio risk analytics: VaR, CVaR, stress testing, factor decomposition.",
                  add_completion=False)
console = Console()

def _load_portfolio(positions: Path, window: int, prices_file: Optional[Path] = None):
    import pandas as pd
    from portfolio_risk.data.loaders import load_positions
    from portfolio_risk.portfolio.model import Portfolio

    schema = load_positions(positions)
    console.print(f"[bold]Loaded {len(schema.positions)} positions:[/bold] {schema.tickers}")
    if prices_file is not None:
        prices = pd.read_csv(prices_file, index_col=0, parse_dates=True)
        prices.index = pd.to_datetime(prices.index).date
        available = [t for t in schema.tickers if t in prices.columns]
        missing = [t for t in schema.tickers if t not in prices.columns]
        if missing:
            console.print(f"[yellow]Tickers not in prices file (skipped):[/yellow] {missing}")
        prices = prices[available].sort_index()
        from portfolio_risk.data.schemas import Portfolio as PortfolioSchema
        schema = PortfolioSchema(positions=[p for p in schema.positions if p.ticker in available])
        console.print(f"[green]Loaded prices from file:[/green] {len(prices)} days × {len(prices.columns)} tickers")
    else:
        import os
        from portfolio_risk.data.fetchers import fetch_prices

        source = os.getenv("DATA_SOURCE", "yfinance")
        end = date.today()
        start = end - timedelta(days=max(window * 2, 365 * 2))
        console.print(f"[bold]Data source:[/bold] {source} (set DATA_SOURCE in .env to change)")
        with console.status(f"Fetching price data from {source}..."):
            prices = fetch_prices(schema.tickers, start=start, end=end)
    return Portfolio(schema=schema, prices=prices)

@app.command("run")
def cmd_run(positions: Path = Path("data/sample_positions.csv"),
            prices_file: Optional[Path] = None,
            confidence: float = 0.99, window: int = 60) -> None:
    """Full risk report: weights, VaR, CVaR, stress tests, factor decomposition."""
    portfolio = _load_portfolio(positions, window, prices_file)
    port_rets = portfolio.portfolio_returns()
    from portfolio_risk.risk.var import historical_var, parametric_var
    from portfolio_risk.risk.cvar import historical_cvar, parametric_cvar
    from portfolio_risk.risk.volatility import annualised_volatility
    from portfolio_risk.risk.stress import run_stress_scenarios
    from portfolio_risk.factors.decomposition import factor_attribution_summary

    console.rule("[bold blue]Risk Summary")
    t = Table(show_header=True, header_style="bold cyan")
    t.add_column("Metric")
    t.add_column("Value", justify="right")
    t.add_row("Annualised Volatility", f"{annualised_volatility(port_rets):.2%}")
    t.add_row(f"Historical VaR ({int(confidence*100)}%)", f"{historical_var(port_rets, confidence, window):.4%}")
    t.add_row(f"Parametric VaR ({int(confidence*100)}%)", f"{parametric_var(port_rets, confidence, window):.4%}")
    t.add_row(f"Historical CVaR ({int(confidence*100)}%)", f"{historical_cvar(port_rets, confidence, window):.4%}")
    t.add_row(f"Parametric CVaR ({int(confidence*100)}%)", f"{parametric_cvar(port_rets, confidence, window):.4%}")
    console.print(t)
    # ... stress + factor attribution printed with the same rich.Table pattern

@app.command("backtest")
def cmd_backtest(positions: Path = Path("data/sample_positions.csv"),
                 prices_file: Optional[Path] = None,
                 confidence: float = 0.99, window: int = 60,
                 method: str = "historical") -> None:
    """Backtest VaR breach rate and run the Kupiec POF test."""
    portfolio = _load_portfolio(positions, window, prices_file)
    from portfolio_risk.backtest.breach import backtest_var

    result = backtest_var(portfolio.portfolio_returns(), confidence, window, method)
    console.rule(f"[bold blue]VaR Backtest — {method} @ {int(confidence*100)}%")
    t = Table(show_header=True, header_style="bold cyan")
    t.add_column("Metric")
    t.add_column("Value", justify="right")
    t.add_row("Observations", str(result.n_observations))
    t.add_row("Breaches", str(result.n_breaches))
    t.add_row("Observed breach rate", f"{result.breach_rate:.2%}")
    t.add_row("Expected breach rate", f"{result.expected_breach_rate:.2%}")
    t.add_row("Kupiec LR statistic", f"{result.kupiec_lr_stat:.4f}")
    t.add_row("Kupiec p-value", f"{result.kupiec_p_value:.4f}")
    t.add_row("Kupiec test", "[green]PASS[/green]" if result.passes_kupiec else "[red]FAIL[/red]")
    console.print(t)

if __name__ == "__main__":
    app()
Usage:
# Install
pip install -e ".[dev]"
# Full risk report (offline, no network)
risk-engine run --positions data/positions_av.csv \
--prices-file data/prices_alphavantage.csv --confidence 0.99 --window 60
# Historical VaR at 95%
risk-engine var --positions data/positions_av.csv \
--prices-file data/prices_alphavantage.csv --confidence 0.95 --method historical
# All four stress scenarios
risk-engine stress --positions data/positions_av.csv \
--prices-file data/prices_alphavantage.csv
# Parametric backtest over full yfinance history
risk-engine backtest --positions data/positions_extended.csv \
--prices-file data/prices_yfinance.csv --confidence 0.99 --window 252 --method parametric
%==========%
IX. Sample Output:
Loaded 5 positions: ['AAPL', 'MSFT', 'JPM', 'XOM', 'AMZN']
Loaded prices from file: 100 days × 5 tickers
-------------------------------- Risk Summary --------------------------------
| Metric | Value |
|-----------------------+---------|
| Annualised Volatility | 15.55% |
| Historical VaR (99%) | 1.4817% |
| Parametric VaR (99%) | 1.7984% |
| Historical CVaR (99%) | 1.8219% |
| Parametric CVaR (99%) | 2.0879% |
------------------------------ Portfolio Weights -----------------------------
| ticker | quantity | asset_class | sector | geography | weight_pct |
|--------+----------+-------------+------------------------+-----------+------------|
| AAPL | 100.0 | equity | technology | us | 34.57 |
| JPM | 75.0 | equity | financials | us | 25.90 |
| MSFT | 50.0 | equity | technology | us | 23.77 |
| XOM | 60.0 | equity | energy | us | 10.13 |
| AMZN | 20.0 | equity | consumer_discretionary | us | 5.64 |
-------------------------------- Stress Tests --------------------------------
| Scenario | Total P&L (% of NAV) |
|----------------+----------------------|
| rate_shock | -8.00% |
| equity_selloff | -30.00% |
| fx_move | -6.00% |
| crypto_crash | -5.00% |
--------------------- Factor Attribution (asset class) ----------------------
| factor | cumulative_return_pct | annualised_vol_pct | n_days |
|--------+-----------------------+--------------------+--------|
| equity | 5.95 | 15.55 | 99 |
| total | 5.95 | 15.55 | 99 |
--------------------- VaR Backtest — parametric @ 95% ----------------------
| Metric | Value |
|----------------------+--------|
| Observations | 39 |
| Breaches | 1 |
| Observed breach rate | 2.56% |
| Expected breach rate | 5.00% |
| Kupiec LR statistic | 0.5885 |
| Kupiec p-value | 0.4430 |
| Kupiec test | PASS |
Breach dates:
2026-06-03
%==========%
X. Test Suite:
The test suite is entirely offline — a deterministic 500-day random-walk price series (seed 42) is generated in conftest.py and shared across all test classes via pytest fixtures. Tests verify mathematical invariants (weights sum to 1, CVaR ≥ VaR, drawdown ≤ 0) and structural properties (factor totals match sums of parts, rolling series have the correct length).
# conftest.py — shared fixtures
import numpy as np
import pandas as pd
import pytest

@pytest.fixture
def sample_prices(sample_schema) -> pd.DataFrame:
    """Synthetic: 500 trading days of log-normal random-walk prices (seed 42)."""
    rng = np.random.default_rng(42)
    n, tickers = 500, sample_schema.tickers
    dates = pd.bdate_range("2022-01-01", periods=n)
    log_returns = rng.normal(0.0005, 0.012, size=(n, len(tickers)))
    prices = 100 * np.exp(np.cumsum(log_returns, axis=0))
    return pd.DataFrame(prices, index=dates, columns=tickers)

# test_risk.py — invariant tests
from portfolio_risk.factors.decomposition import factor_decomposition
from portfolio_risk.portfolio.returns import drawdown
from portfolio_risk.risk.cvar import historical_cvar
from portfolio_risk.risk.stress import run_stress_scenarios
from portfolio_risk.risk.var import historical_var

class TestVaR:
    def test_higher_confidence_higher_var(self, port_returns):
        assert historical_var(port_returns, 0.99) >= historical_var(port_returns, 0.95)

class TestCVaR:
    def test_cvar_exceeds_var(self, port_returns):
        assert historical_cvar(port_returns, 0.99) >= historical_var(port_returns, 0.99) - 1e-10

class TestPortfolioModel:
    def test_weights_sum_to_one(self, portfolio):
        assert abs(portfolio.weights.sum() - 1.0) < 1e-9

    def test_drawdown_never_positive(self, port_returns):
        assert (drawdown(port_returns)["drawdown"] <= 0).all()

class TestFactorDecomposition:
    def test_total_matches_sum(self, portfolio):
        result = factor_decomposition(
            portfolio.schema, portfolio.daily_returns(), portfolio.weights
        )
        factor_cols = [c for c in result.columns if c != "total"]
        assert (result[factor_cols].sum(axis=1) - result["total"]).abs().max() < 1e-10

class TestStress:
    def test_equity_selloff_is_negative(self, sample_schema, sample_prices):
        result = run_stress_scenarios(sample_schema, sample_prices, ["equity_selloff"])
        assert result[0].total_pnl < 0

    def test_total_pnl_equals_sum_of_positions(self, sample_schema, sample_prices):
        r = run_stress_scenarios(sample_schema, sample_prices, ["rate_shock"])[0]
        assert abs(r.total_pnl - r.position_pnl.sum()) < 1e-6
%==========%
XI. Configuration & Data Sources:
| Variable | Default | Description |
|---|---|---|
| DATA_SOURCE | yfinance | Live-fetch source when --prices-file is omitted (yfinance or alphavantage) |
| ALPHA_VANTAGE_API_KEY | demo | Free key, 25 req/day on the compact endpoint |
| FRED_API_KEY | (none) | Required for macro overlays (rates, CPI, VIX, WTI) |

| File generated | Source | Coverage |
|---|---|---|
| prices_yfinance.csv | Yahoo Finance | 5 years, 28 tickers across equities, ETFs, bonds, crypto, VIX |
| prices_alphavantage.csv | Alpha Vantage | Last 100 trading days, 5 US equities (free-tier compact endpoint) |
| fred_macro.csv | FRED | 5 years, 9 macro series: Fed Funds, 10Y−2Y, HY OAS, VIX, CPI, GDP, unemployment, USD, WTI |
Team:
Theodosios Dimitrasopoulos, personal project.
Tools & methods:
Python 3.11, pandas, NumPy, SciPy (chi-squared test, normal PPF/PDF), Pydantic v2 (data validation & schema enforcement), Typer (CLI), rich (formatted console output), yfinance / Alpha Vantage / FRED (market data ingestion), pytest + pytest-cov, ruff (linting), hatchling (packaging). Risk methodology: historical simulation VaR, delta-normal parametric VaR, historical and parametric CVaR / Expected Shortfall, RiskMetrics EWMA volatility (\(\lambda=0.94\)), Kupiec Proportion-of-Failures likelihood-ratio backtest, Brinson-style position-level factor attribution.