Scenario Generation for Portfolio Optimization using Generative Adversarial Networks

Developed in collaboration with Bank of America Securities. A Wasserstein GAN with gradient penalty (WGAN-GP) is trained on a cross-asset return history to learn the empirical joint distribution of daily log returns without imposing parametric distributional assumptions. The trained generator samples large scenario sets at inference time for downstream portfolio stress-testing, efficient frontier estimation, and out-of-sample VaR calibration. GAN-generated scenarios are benchmarked against historical simulation and parametric multivariate normal bootstrapping across a comprehensive set of statistical diagnostics: marginal tail behaviour (excess kurtosis, VaR coverage), pairwise correlation structure, GARCH-fitted volatility clustering persistence, and principal component alignment. The results quantify where the GAN captures non-Gaussian distributional features that the parametric baseline misses and where structural model assumptions would still outperform the data-driven approach on limited history. Built on Python 3.11+ using pandas, numpy, scipy, statsmodels, scikit-learn, torch, arch, plotly, streamlit, duckdb, and pydantic v2; packaged with hatchling and tested with pytest against deterministic seed-42 fixtures.


%==========%


I. Interactive Dashboard:

The dashboard below runs entirely in the browser via stlite (Streamlit on WebAssembly — no server required). A synthetic six-asset universe (SPY, EFA, EEM, TLT, GLD, USO) is generated with a correlated Student-t distribution; all three scenario methods (Historical, Parametric, GAN) are simulated client-side using seed-42 random state. Sidebar controls let you vary the seed, scenario count, and GAN tail-thickness parameter (Student-t degrees of freedom) to observe how the diagnostics respond. First load downloads Pyodide and may take 20–40 seconds; subsequent loads are cached.


%==========%


II. Project Layout:

gans-bofa/
├── pyproject.toml                              # Build config, deps, ruff + pytest settings
├── .env.example                                # DB_PATH, GAN_RUN_ID
├── data/                                       # Populated by scripts/download_data.py (git-ignored)
│   ├── gans.duckdb                             # DuckDB: returns + scenarios + train_state tables
│   └── generator_run_001.pt                    # Saved generator weights (PyTorch state dict)
├── scripts/
│   └── download_data.py                        # yfinance → DuckDB
├── src/gans_bofa/
│   ├── data/
│   │   ├── schemas.py                          # Pydantic v2: ReturnRecord, ScenarioRecord, TrainStateRecord, PortfolioResult
│   │   ├── fetchers.py                         # yfinance log-return download
│   │   └── store.py                            # DuckDB init, upsert, read for returns + scenarios
│   ├── model/
│   │   ├── generator.py                        # Generator: latent z → n_assets return vector
│   │   ├── discriminator.py                    # Critic: return vector → scalar Wasserstein score
│   │   └── wgan_gp.py                          # WGANConfig, gradient penalty, train(), generate_scenarios()
│   ├── evaluation/
│   │   ├── diagnostics.py                      # tail stats, correlation, PCA, GARCH persistence, historical_simulation, parametric_bootstrap
│   │   └── portfolio.py                        # compute_frontier(), compute_var_coverage(), FrontierResult
│   ├── report/
│   │   └── plots.py                            # Plotly: loss curves, distribution, heatmaps, kurtosis, PCA, frontiers
│   ├── cli.py                                  # Typer CLI: fetch | train | generate | diagnose | portfolio
│   └── app.py                                  # Streamlit: 4 tabs (Distribution, Tail, Correlation/PCA, Frontier)
└── tests/
    ├── conftest.py                             # Seed-42 return matrix + 5k bootstrap scenario fixtures
    ├── test_diagnostics.py                     # Tail stats, correlation, PCA, simulator invariants
    ├── test_model.py                           # Generator, Critic, WGAN-GP training + generation
    └── test_portfolio.py                       # Frontier, min-var, VaR coverage invariants
  

%==========%


III. Data Sources:

Daily adjusted closing prices for a six-asset cross-class universe — SPY (US equities), EFA (developed ex-US), EEM (emerging markets), TLT (US long-duration Treasuries), GLD (gold), USO (crude oil) — are fetched via yfinance from 2010-01-01 onwards. Log returns are computed as \(r_t = \ln(P_t / P_{t-1})\) and persisted to a DuckDB returns table for all subsequent training and diagnostic steps. The asset universe is chosen to span equity, fixed income, and commodity risk factors, producing a covariance matrix with meaningfully distinct correlation structure and tail behaviour across regimes.

DuckDB is used as the local storage layer for three tables: returns (ticker × date × log_ret), scenarios (run_id × method × scenario_idx × ticker × ret), and train_state (run_id × epoch × g_loss × d_loss × wasserstein_dist). This schema supports multi-run comparisons without retraining and allows scenarios from different methods to coexist in a single database for side-by-side diagnostics.


# fetchers.py
import numpy as np
import pandas as pd
import yfinance as yf

def fetch_returns(tickers: list[str], start: str, end: str | None = None) -> pd.DataFrame:
    """Download adjusted closes from yfinance; return daily log-return DataFrame (date × ticker)."""
    raw = yf.download(tickers, start=start, end=end, auto_adjust=True, progress=False)["Close"]
    if isinstance(raw, pd.Series):
        raw = raw.to_frame(tickers[0])
    raw = raw[sorted(raw.columns)]
    rets = np.log(raw / raw.shift(1)).dropna()
    rets.index.name = "Date"
    return rets
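
As a quick check of the log-return definition used above, the sketch below takes a made-up four-day price series (illustrative numbers, not market data) and shows the property that makes log returns convenient for this pipeline: they add across time, so the sum of the daily log returns equals the log return over the whole window. Simple returns, by contrast, compound multiplicatively.

```python
import numpy as np

# Hypothetical closing prices for illustration only (not market data).
prices = np.array([100.0, 102.0, 99.0, 101.0])

# Daily log returns r_t = ln(P_t / P_{t-1}); the first day has no predecessor.
log_rets = np.log(prices[1:] / prices[:-1])

# Log returns are time-additive: their sum is the log return over the full window.
total_log_ret = np.log(prices[-1] / prices[0])
print(np.isclose(log_rets.sum(), total_log_ret))

# Simple returns compound multiplicatively instead of adding.
simple_rets = prices[1:] / prices[:-1] - 1.0
compounded = np.prod(1.0 + simple_rets) - 1.0
print(np.isclose(compounded, prices[-1] / prices[0] - 1.0))
```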
  

%==========%


IV. WGAN-GP Architecture:

The Wasserstein GAN with gradient penalty (WGAN-GP, Gulrajani et al. 2017) addresses the training instability of the original GAN formulation by replacing the Jensen–Shannon divergence loss with the Wasserstein-1 (Earth Mover’s) distance and enforcing the 1-Lipschitz constraint on the critic via a differentiable gradient penalty rather than weight clipping. The training objective for the critic is:

\[ \mathcal{L}_{D} = \mathbb{E}_{\tilde{x} \sim P_g}\bigl[D(\tilde{x})\bigr] - \mathbb{E}_{x \sim P_r}\bigl[D(x)\bigr] + \lambda\,\mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\Bigl[\bigl(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\bigr)^2\Bigr] \]

where \(\hat{x} = \alpha x + (1-\alpha)\tilde{x}\) for \(\alpha \sim U[0,1]\), \(\lambda = 10\), and \(P_g\) and \(P_r\) denote the generated and real return distributions respectively. The generator minimises:

\[ \mathcal{L}_{G} = -\mathbb{E}_{\tilde{x} \sim P_g}\bigl[D(\tilde{x})\bigr] \]

Both the generator and critic are three-hidden-layer feedforward networks with LeakyReLU activations (slope 0.2). The generator maps a latent vector \(z \sim \mathcal{N}(0, I_{64})\) to an \(n_\text{assets}\)-dimensional return vector with no output activation, allowing it to produce returns on the full real line. The critic similarly has no sigmoid activation and outputs an unbounded real score. The gradient penalty enforces the 1-Lipschitz constraint by penalising the gradient norm deviating from 1 at interpolated points between real and generated samples.

Hyperparameter | Value | Rationale
--- | --- | ---
Latent dimension \(d_z\) | 64 | Sufficient capacity for a 6-dimensional return distribution; larger values add no empirical benefit on short histories.
Hidden dimension | 256 | Three hidden layers, each of width 256; symmetric generator and critic architecture.
Critic steps per generator step \(n_c\) | 5 | Keeps the critic near-optimal before each generator update; standard WGAN-GP setting.
Gradient penalty coefficient \(\lambda\) | 10 | Original Gulrajani et al. (2017) recommendation; balances Wasserstein and penalty terms.
Learning rates \(\eta_G, \eta_D\) | \(10^{-4}\) | Adam(\(\beta_1=0, \beta_2=0.9\)) per the WGAN-GP paper; \(\beta_1 = 0\) disables the momentum (first-moment) average.
Batch size | 64 | Mini-batch size for both critic and generator updates.
Epochs | 2 000 | Empirically sufficient for convergence on ~3 000 daily observations.
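
To put the hidden width in the table in perspective, the following sketch counts the trainable parameters implied by the stated layer sizes (latent 64, three hidden layers of 256, six assets). The helper `mlp_param_count` is a hypothetical name introduced here, and the 3 000-day figure is the approximate history length quoted in the table rather than a measured value.

```python
def mlp_param_count(layer_sizes: list[int]) -> int:
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer widths implied by the table: latent 64, three hidden layers of 256, six assets.
generator_params = mlp_param_count([64, 256, 256, 256, 6])
critic_params = mlp_param_count([6, 256, 256, 256, 1])

print(generator_params)  # 149766
print(critic_params)     # 133633

# Scalar return observations in roughly 3 000 days of six-asset history.
n_scalar_obs = 3_000 * 6
print(round(generator_params / n_scalar_obs, 1))  # 8.3
```

Each network therefore has several times more parameters than there are scalar observations in the training set, which is the reason the critic regularisation and the memorisation checks discussed elsewhere in this write-up matter.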

%==========%


V. Generator & Critic Networks (model/generator.py, model/discriminator.py):

Both networks use the same architecture: three hidden layers with LeakyReLU(0.2) activations followed by a linear output layer. The generator has no output nonlinearity, so its output space is unbounded (appropriate for log returns). The critic has no sigmoid; it outputs an unbounded scalar score for each input sample, and the gap between mean scores on real and generated batches estimates the Wasserstein distance. The only structural constraint on the critic is the 1-Lipschitz condition, enforced through the gradient penalty rather than explicit parameter constraints.


# generator.py
import torch
from torch import nn

class Generator(nn.Module):
    """Feedforward generator: latent noise z ~ N(0,I) → synthetic return vector."""

    def __init__(self, latent_dim: int, n_assets: int, hidden_dim: int = 256):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, n_assets),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

    @torch.no_grad()
    def sample(self, n: int, device: str = "cpu") -> torch.Tensor:
        z = torch.randn(n, self.latent_dim, device=device)
        return self.forward(z)


# discriminator.py
import torch
from torch import nn

class Critic(nn.Module):
    """Feedforward Wasserstein critic: return vector → scalar score.
    Outputs an unbounded real value — no sigmoid.
    """

    def __init__(self, n_assets: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_assets, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
  

%==========%


VI. Training Procedure & Gradient Penalty (model/wgan_gp.py):

For each generator update, the critic is updated \(n_c = 5\) times. In each critic step, a mini-batch of real returns is sampled from the historical data, a batch of fake returns of the same size is generated, and the critic loss is computed as the difference in mean critic scores (the negative of the Wasserstein estimate) plus the gradient penalty. The generator then takes a single gradient step to maximise the critic score of its outputs. Adam(\(\beta_1=0, \beta_2=0.9\)) is used for both networks. Setting \(\beta_1 = 0\) disables the first-moment (momentum) average, following the defaults in Gulrajani et al. (2017); momentum carried over between updates can destabilise the critic, whose objective shifts every time the generator changes.


# wgan_gp.py
import pandas as pd
import torch

def _gradient_penalty(critic, real, fake, device):
    """WGAN-GP gradient penalty at uniformly interpolated samples."""
    alpha = torch.rand(real.size(0), 1, device=device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_interp = critic(interp)
    grad = torch.autograd.grad(
        outputs=d_interp,
        inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True,
        retain_graph=True,
    )[0]
    penalty = ((grad.norm(2, dim=1) - 1) ** 2).mean()
    return penalty


def train(returns: pd.DataFrame, cfg: WGANConfig | None = None) -> TrainResult:
    ...
    for epoch in range(cfg.n_epochs):
        for _ in range(cfg.n_critic):                     # critic loop
            idx  = torch.randint(0, len(data), (cfg.batch_size,))
            real = data[idx]
            z    = torch.randn(cfg.batch_size, cfg.latent_dim, device=cfg.device)
            fake = gen(z).detach()

            d_real = crit(real).mean()
            d_fake = crit(fake).mean()
            gp     = _gradient_penalty(crit, real, fake, cfg.device)
            d_loss = d_fake - d_real + cfg.gp_lambda * gp

            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        z      = torch.randn(cfg.batch_size, cfg.latent_dim, device=cfg.device)
        g_loss = -crit(gen(z)).mean()                     # generator step
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
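
To make the penalty term concrete without PyTorch, here is a minimal NumPy sketch under a simplifying assumption: the critic is linear, \(D(x) = w^\top x\), so its gradient with respect to the input is \(w\) everywhere and the penalty reduces to \((\|w\|_2 - 1)^2\). The gradient is estimated by finite differences as a stand-in for autograd, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_critic(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy critic D(x) = x @ w; its gradient with respect to x is w at every point."""
    return x @ w

def numeric_grad_norms(x: np.ndarray, w: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Central finite-difference estimate of ||grad_x D(x)||_2 for each row of x."""
    grads = np.zeros_like(x)
    for j in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[j] = eps
        grads[:, j] = (linear_critic(x + step, w) - linear_critic(x - step, w)) / (2 * eps)
    return np.linalg.norm(grads, axis=1)

def gradient_penalty(real: np.ndarray, fake: np.ndarray, w: np.ndarray) -> float:
    """Mean of (||grad D(x_hat)|| - 1)^2 at random interpolates of real and fake rows."""
    alpha = rng.uniform(size=(real.shape[0], 1))       # one mixing weight per sample
    interp = alpha * real + (1.0 - alpha) * fake
    return float(np.mean((numeric_grad_norms(interp, w) - 1.0) ** 2))

real = rng.normal(size=(8, 3))
fake = rng.normal(size=(8, 3))

w_unit = np.array([1.0, 0.0, 0.0])   # ||w|| = 1: exactly 1-Lipschitz, no penalty
w_steep = 3.0 * w_unit               # ||w|| = 3: penalty (3 - 1)^2 = 4

gp_unit = gradient_penalty(real, fake, w_unit)
gp_steep = gradient_penalty(real, fake, w_steep)
print(round(gp_unit, 6), round(gp_steep, 6))
```

The penalty is two-sided: it pushes the gradient norm toward 1 from both directions, so a critic that is too flat is penalised as well as one that is too steep.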
  

The Wasserstein distance estimate tracked during training is \(\hat{W} = \mathbb{E}[D(x)] - \mathbb{E}[D(\tilde{x})]\), the difference in mean critic scores between real and generated batches (the negative of the first two terms of \(\mathcal{L}_D\)). As training converges, \(\hat{W}\) trends toward zero, indicating that the generated distribution is approaching the real one. The estimate is noisy at mini-batch level and is not strictly monotone, but unlike the original GAN loss it correlates with sample quality: the original discriminator loss can saturate and stop tracking how far the generator is from the data, whereas a WGAN-GP generator that has collapsed onto part of the distribution would typically show \(\hat{W}\) plateauing above zero, flagging the problem directly.
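
For intuition about the quantity the critic is approximating, the sketch below computes the exact empirical Wasserstein-1 distance in one dimension, where it reduces to the mean absolute difference between sorted samples of equal size. This is an illustration only: the trained critic estimates the distance in six dimensions, and `wasserstein_1d` is a hypothetical helper, not part of the project code.

```python
import numpy as np

def wasserstein_1d(x: np.ndarray, y: np.ndarray) -> float:
    """Empirical Wasserstein-1 distance between two equal-sized 1-D samples."""
    if x.shape != y.shape:
        raise ValueError("this shortcut assumes samples of equal size")
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(42)
real = rng.standard_t(df=4, size=10_000) * 0.01     # heavy-tailed stand-in for returns

# Generated samples that sit progressively closer to the real one (pure location shifts).
for shift in (0.02, 0.01, 0.0):
    w1 = wasserstein_1d(real, real + shift)
    print(f"shift={shift:.2f}  W1={w1:.4f}")
```

For a pure location shift the distance equals the size of the shift, so the printed values fall to zero as the "generated" sample moves onto the real one, which is the behaviour expected of \(\hat{W}\) over a healthy training run.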


%==========%


VII. Benchmark Methods (evaluation/diagnostics.py):

Two classical scenario generation methods serve as baselines against which GAN performance is measured:

Historical simulation resamples rows of the empirical return matrix with replacement (i.e., the empirical bootstrap). Each generated scenario is an exact copy of an observed trading day, so the marginal fat tails and the same-day cross-asset dependence are preserved. Because rows are drawn independently, the temporal ordering is discarded and volatility clustering is not carried over, and the method cannot extrapolate beyond the observed history. The tail of the generated distribution is bounded by the most extreme historical observation, which is a structural limitation for stress-testing rare events.

Parametric bootstrapping fits a multivariate normal distribution \(\mathcal{N}(\hat\mu, \hat\Sigma)\) to the empirical returns and draws \(n\) independent samples. This yields a smooth, unbounded scenario distribution but systematically underestimates tail risk: Gaussian tails decay much faster than those of empirical returns, population excess kurtosis is zero by construction (sample values scatter around zero), and the correlation structure is the same in calm and stressed regimes because a single \(\hat\Sigma\) is used throughout.


# diagnostics.py
import numpy as np
import pandas as pd

def historical_simulation(returns: pd.DataFrame, n: int, seed: int = 42) -> pd.DataFrame:
    """Bootstrap n scenarios by resampling rows of the historical return matrix."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(returns), size=n)
    return returns.iloc[idx].reset_index(drop=True)


def parametric_bootstrap(returns: pd.DataFrame, n: int, seed: int = 42) -> pd.DataFrame:
    """Draw n scenarios from N(μ̂, Σ̂) fitted to the historical return matrix."""
    rng = np.random.default_rng(seed)
    mu  = returns.mean().values
    cov = returns.cov().values
    samples = rng.multivariate_normal(mu, cov, size=n)
    return pd.DataFrame(samples, columns=returns.columns)
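
The contrast between the two baselines can be seen on a single synthetic asset. The sketch below assumes a Student-t "history" with 5 degrees of freedom (an arbitrary choice for illustration, not market data) and re-implements both baselines in one dimension with NumPy; `excess_kurtosis` is a hypothetical helper.

```python
import numpy as np

def excess_kurtosis(x: np.ndarray) -> float:
    """Fisher excess kurtosis: fourth standardised moment minus 3."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(42)

# Synthetic heavy-tailed "history" (Student-t, 5 degrees of freedom), not market data.
history = rng.standard_t(df=5, size=100_000) * 0.01

# Historical simulation: resample observed values with replacement.
hist_sim = history[rng.integers(0, len(history), size=100_000)]

# Parametric bootstrap: fit a normal to the history and sample from it.
param_sim = rng.normal(history.mean(), history.std(ddof=1), size=100_000)

print("history    kurtosis:", round(excess_kurtosis(history), 2))
print("historical kurtosis:", round(excess_kurtosis(hist_sim), 2))
print("parametric kurtosis:", round(excess_kurtosis(param_sim), 2))

# The bootstrap can never produce a loss worse than the worst observed day.
print("bootstrap min >= history min:", hist_sim.min() >= history.min())
```

The resampled set inherits the heavy tails of the history but cannot exceed its extremes, while the Gaussian fit matches mean and variance yet reports excess kurtosis near zero.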
  

%==========%


VIII. Statistical Diagnostics (evaluation/diagnostics.py):

The diagnostic framework evaluates four dimensions of distributional fidelity:

Marginal tail behaviour. For each asset, the excess kurtosis \(\kappa_4 = \mu_4/\sigma^4 - 3\) (Fisher definition: zero for a normal distribution, positive for fat-tailed distributions) and the VaR at the 95% and 99% confidence levels (the 5th and 1st percentiles of the return distribution) are compared across methods. A well-calibrated GAN should produce \(\kappa_4 > 0\) for equity assets, while the parametric baseline gives \(\kappa_4 \approx 0\) up to sampling noise.

Correlation structure. The Pearson correlation matrix of generated scenarios is compared with the empirical correlation matrix via off-diagonal RMSE. Both historical simulation and the parametric baseline should reproduce the empirical correlation by construction; the GAN learns correlations from data and may over- or under-estimate pairwise dependencies depending on the training sample size.

Principal component alignment. PCA is applied to each scenario set after standard scaling. The cumulative explained variance profile (fraction of total variance explained by the top \(k\) components) measures whether the GAN preserves the dominant risk factor structure of the empirical data. A well-trained GAN should produce a PCA profile close to the historical baseline; the parametric model will match by construction if \(\hat\Sigma\) is used directly.

GARCH persistence. A GARCH(1,1) model is fitted to each asset’s generated return series. The persistence parameter \(\alpha_1 + \beta_1\) measures the degree of volatility clustering: values close to 1 with a clearly positive \(\alpha_1\) indicate persistent clustering (as observed in real financial returns), while an ARCH coefficient \(\alpha_1\) near 0 indicates i.i.d.-like behaviour. The parametric bootstrap shows no clustering because its samples are i.i.d. by construction, and the same holds for historical simulation as implemented here, since rows are resampled independently and the original ordering is lost. In both cases \(\alpha_1 \approx 0\), and \(\beta_1\) is then only weakly identified, so the fitted sum should be read with care.
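
Whether a scenario set can show volatility clustering at all depends on whether its rows carry a time ordering. The sketch below simulates a GARCH(1,1) path in NumPy with illustrative parameters (persistence 0.95), then permutes it as a stand-in for independent row resampling. It uses the lag-1 autocorrelation of squared returns as a simpler proxy for a full GARCH fit; function names and parameter values are assumptions made for this example.

```python
import numpy as np

def simulate_garch11(n: int, omega: float, alpha: float, beta: float, seed: int) -> np.ndarray:
    """Simulate r_t = sigma_t * e_t with sigma_t^2 = omega + alpha*r_{t-1}^2 + beta*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n)
    r = np.empty(n)
    var = omega / (1.0 - alpha - beta)          # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * shocks[t]
        var = omega + alpha * r[t] ** 2 + beta * var
    return r

def lag1_autocorr(x: np.ndarray) -> float:
    """Sample lag-1 autocorrelation."""
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

# Illustrative parameters with persistence alpha + beta = 0.95.
path = simulate_garch11(n=20_000, omega=1e-6, alpha=0.10, beta=0.85, seed=42)

# A permutation keeps the marginal distribution unchanged but destroys
# the time ordering that volatility clustering lives in.
shuffled = np.random.default_rng(0).permutation(path)

acf_path = lag1_autocorr(path ** 2)
acf_shuffled = lag1_autocorr(shuffled ** 2)
print(f"lag-1 autocorrelation of squared returns: path={acf_path:.3f}, shuffled={acf_shuffled:.3f}")
```

The ordered path shows clearly positive autocorrelation in squared returns, while the permuted copy, which has exactly the same values, shows essentially none.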


# diagnostics.py
import numpy as np
import pandas as pd
from scipy import stats

def compute_tail_stats(scenarios: pd.DataFrame) -> dict[str, dict[str, float]]:
    """Per-asset tail statistics from a (n_scenarios × n_assets) DataFrame."""
    out = {}
    for col in scenarios.columns:
        s = scenarios[col].dropna()
        out[str(col)] = {
            "mean":             float(s.mean()),
            "std":              float(s.std()),
            "skewness":         float(stats.skew(s)),
            "excess_kurtosis":  float(stats.kurtosis(s)),   # Fisher: 0 for Gaussian
            "var_95":           float(np.percentile(s, 5)),
            "var_99":           float(np.percentile(s, 1)),
        }
    return out


def compute_garch_persistence(returns: pd.Series) -> dict[str, float]:
    """Fit GARCH(1,1); return alpha, beta, and alpha+beta persistence."""
    from arch import arch_model
    am  = arch_model(returns * 100.0, vol="GARCH", p=1, q=1, rescale=False)
    res = am.fit(disp="off")
    alpha = float(res.params.get("alpha[1]", np.nan))
    beta  = float(res.params.get("beta[1]",  np.nan))
    return {"alpha": alpha, "beta": beta, "persistence": alpha + beta}
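
The listing above covers the tail and GARCH diagnostics. The correlation and PCA comparisons described in this section could be sketched as follows; this is a NumPy-only illustration with hypothetical function names, not the project's implementation (which, per the tool list, may rely on scikit-learn). It uses the fact that PCA on standard-scaled columns is an eigendecomposition of the correlation matrix.

```python
import numpy as np

def offdiag_corr_rmse(a: np.ndarray, b: np.ndarray) -> float:
    """RMSE between the off-diagonal Pearson correlations of two (n x assets) samples."""
    ca = np.corrcoef(a, rowvar=False)
    cb = np.corrcoef(b, rowvar=False)
    mask = ~np.eye(ca.shape[0], dtype=bool)       # drop the unit diagonal
    return float(np.sqrt(np.mean((ca[mask] - cb[mask]) ** 2)))

def pca_cumulative_variance(x: np.ndarray) -> np.ndarray:
    """Cumulative explained-variance ratio of PCA on standardised columns."""
    corr = np.corrcoef(x, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]      # eigvalsh is ascending; reverse to descending
    return np.cumsum(eigvals) / eigvals.sum()

rng = np.random.default_rng(42)
cov = np.array([[1.0, 0.8, 0.2],
                [0.8, 1.0, 0.1],
                [0.2, 0.1, 1.0]])
empirical = rng.multivariate_normal(np.zeros(3), cov, size=5_000)
matched = rng.multivariate_normal(np.zeros(3), cov, size=5_000)      # same dependence
independent = rng.standard_normal((5_000, 3))                         # no dependence

rmse_matched = offdiag_corr_rmse(empirical, matched)          # sampling noise only
rmse_independent = offdiag_corr_rmse(empirical, independent)  # correlations are missing
cum_var = pca_cumulative_variance(empirical)
print(rmse_matched, rmse_independent)
print(cum_var)
```

A scenario set that reproduces the dependence structure gives an RMSE at the level of sampling noise, while one that ignores it gives an RMSE comparable to the size of the correlations themselves.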
  

%==========%


IX. Portfolio Impact Analysis (evaluation/portfolio.py):

The ultimate purpose of scenario generation in risk management is to inform portfolio decisions. Three portfolio-level outputs are computed and compared across scenario methods:

Efficient frontier shape. The mean–variance frontier is traced by solving the parametric quadratic programme across a grid of target returns. The scenario-implied covariance matrix and expected return vector determine the frontier; differences in frontier shape between methods reflect differences in estimated tail risk and diversification potential.

Minimum-variance portfolio weights. The weight vector \(w^* = \arg\min_{w} w^\top \hat\Sigma w\) subject to \(\mathbf{1}^\top w = 1, w \ge 0\) is computed for each scenario set. Weight instability — large differences in \(w^*\) across scenario methods — signals that the portfolio is sensitive to distributional assumptions in the scenario generation step.

Out-of-sample VaR coverage ratio. The scenario-implied portfolio VaR is computed as a percentile of the equal-weight portfolio return distribution across all generated scenarios. The VaR threshold is then applied to a holdout period of realised returns; the fraction of holdout days where the realised return exceeds the threshold is the coverage ratio. A well-calibrated scenario set should produce coverage close to the nominal level (e.g., 95% for VaR-95). Systematic under-coverage (coverage < 95%) indicates the scenario tail is too thin — the VaR threshold is too optimistic — and over-coverage indicates the tail is too fat, generating overly conservative risk limits.
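
For the minimum-variance weights described above, dropping the long-only bound gives a closed form, \(w = \hat\Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \hat\Sigma^{-1}\mathbf{1})\), which can serve as a sanity check on the numerical optimiser: whenever this solution happens to be non-negative, it coincides with the constrained one. The sketch is an assumption-labelled illustration with made-up volatilities, and the function name is hypothetical.

```python
import numpy as np

def min_var_weights_unconstrained(cov: np.ndarray) -> np.ndarray:
    """Fully invested minimum-variance weights without the long-only bound:
    w = inv(cov) @ 1 / (1' inv(cov) 1)."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)       # inv(cov) @ 1 without forming the inverse
    return raw / raw.sum()

# Two uncorrelated assets with 20% and 10% volatility (illustrative numbers).
cov = np.diag([0.20 ** 2, 0.10 ** 2])
w = min_var_weights_unconstrained(cov)
port_vol = float(np.sqrt(w @ cov @ w))

print(w)          # approximately [0.2, 0.8]: weights proportional to 1 / variance
print(port_vol)   # below the volatility of either asset on its own
```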


# portfolio.py
import numpy as np
import pandas as pd
from scipy.optimize import minimize

def compute_frontier(scenarios: pd.DataFrame, method: str, ...) -> FrontierResult:
    mu, cov = scenarios.mean().values, scenarios.cov().values
    bounds  = tuple((0.0, 1.0) for _ in range(len(mu)))
    sum_one = {"type": "eq", "fun": lambda w: w.sum() - 1.0}

    res_mv = minimize(lambda w: _port_vol(w, cov), w0,
                      method="SLSQP", bounds=bounds, constraints=[sum_one])
    w_mv = res_mv.x if res_mv.success else w0

    ...  # frontier sweep over target return grid

    return FrontierResult(method=method, vols=vols_f, rets=rets_f,
                          min_var_weights=dict(zip(tickers, w_mv)),
                          min_var_vol=_port_vol(w_mv, cov), ...)


def compute_var_coverage(scenarios, weights, holdout, level=0.95) -> float:
    """VaR coverage: fraction of holdout returns above the scenario VaR threshold."""
    port_scenario = scenarios.values @ weights
    var_threshold = float(np.percentile(port_scenario, (1 - level) * 100))
    port_holdout  = holdout.values @ weights
    return float((port_holdout >= var_threshold).mean())
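
A small synthetic experiment shows how the coverage ratio behaves. The sketch re-implements the same calculation on NumPy arrays (`var_coverage` is a hypothetical stand-in for the function above) and compares a scenario set drawn from the same heavy-tailed distribution as the holdout with a Gaussian set whose tails are too thin. All distributions and scales are assumptions chosen for illustration.

```python
import numpy as np

def var_coverage(scenarios: np.ndarray, weights: np.ndarray,
                 holdout: np.ndarray, level: float = 0.95) -> float:
    """Fraction of holdout portfolio returns at or above the scenario VaR threshold."""
    threshold = np.percentile(scenarios @ weights, (1.0 - level) * 100.0)
    return float(((holdout @ weights) >= threshold).mean())

rng = np.random.default_rng(42)
n_assets = 4
weights = np.full(n_assets, 1.0 / n_assets)               # equal-weight portfolio

# Synthetic stand-ins: heavy-tailed "realised" returns and two scenario sets.
holdout = rng.standard_t(df=3, size=(20_000, n_assets)) * 0.01
well_specified = rng.standard_t(df=3, size=(50_000, n_assets)) * 0.01
thin_tailed = rng.normal(0.0, 0.01, size=(50_000, n_assets))

results = {}
for name, scen in [("well specified", well_specified), ("thin tailed", thin_tailed)]:
    c95 = var_coverage(scen, weights, holdout, level=0.95)
    c99 = var_coverage(scen, weights, holdout, level=0.99)
    results[name] = (c95, c99)
    print(f"{name:>15}: coverage@95={c95:.3f}  coverage@99={c99:.3f}")
```

Two properties are worth noting. The well-specified set lands close to the nominal 95%, while the thin-tailed set under-covers. And for any scenario set, coverage at the 99% level is at least as high as at the 95% level, because the 99% threshold lies deeper in the left tail and so more holdout days sit above it.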
  

%==========%


X. Scenario Quality — What GANs Capture and Where They Fail:

The WGAN-GP framework has genuine distributional advantages over parametric methods on financial return data, but also structural limitations that practitioners must account for:

Tail capture. The GAN has no constraint forcing outputs to be Gaussian, so the generator can reproduce fat tails, negative skewness, and leptokurtic marginals that the parametric bootstrap systematically misses. On sufficient training data (>2 000 observations), GAN-generated VaR levels at the 99th percentile are typically closer to the empirical level than the normal approximation.

Asymmetric correlations. The empirical correlation between equity assets increases in magnitude during stress regimes, an effect that parametric models with a constant \(\hat\Sigma\) cannot capture. A GAN trained on a history that includes stress periods can learn to generate joint drawdown scenarios whose co-movement a constant-correlation Gaussian would understate.

Volatility clustering. GARCH persistence measures the temporal autocorrelation of squared returns. This is the hardest feature to capture with a cross-sectional feedforward architecture, which sees one return vector at a time with no sequence context: consecutive samples from the generator are independent draws, so their ordering carries no information. Any persistence a GARCH fit reports on the current GAN output therefore reflects the heavy-tailed distributional shape rather than conditional heteroskedasticity in the strict GARCH sense. Recurrent or transformer GAN architectures that generate whole return paths are where this limitation would be addressed.

Training data limitations. The primary weakness of the data-driven approach is sample efficiency: a parametric model can recover the covariance matrix from 200 observations; a GAN with 256-dimensional hidden layers needs substantially more data to avoid memorising the training set rather than learning its distribution. Mode collapse — where the generator learns only a subset of the return distribution — is monitored via the Wasserstein distance during training. The WGAN-GP framework is substantially more robust to mode collapse than the original GAN, but the risk increases as the ratio of model parameters to training observations grows.


%==========%


XI. CLI — cli.py:

Five subcommands cover the full pipeline from data ingestion through scenario generation and portfolio diagnostics. All commands share the --db interface pointing to the DuckDB database.


# Install
pip install -e ".[dev]"

# Download log returns for the 6-asset universe (~seconds via yfinance)
gan fetch --tickers "SPY,EFA,EEM,TLT,GLD,USO" --start 2010-01-01

# Train WGAN-GP — saves generator weights to data/generator_run_001.pt
gan train --run-id run_001 --epochs 2000 --latent-dim 64 --n-critic 5

# Sample 5 000 synthetic scenarios from the trained generator
gan generate --run-id run_001 --n 5000

# Print excess kurtosis comparison table (GAN vs historical vs parametric)
gan diagnose --run-id run_001 --fit-garch

# Efficient frontier + VaR coverage table across all three methods
gan portfolio --run-id run_001

# Launch Streamlit server-side dashboard
streamlit run src/gans_bofa/app.py
  

Command | Key options | Output
--- | --- | ---
gan fetch | --tickers, --start, --db | Downloads returns and upserts to DuckDB returns table
gan train | --run-id, --epochs, --latent-dim, --n-critic | Trains WGAN-GP; saves .pt weights; logs Wasserstein distance history to DB
gan generate | --run-id, --n, --tickers | Loads saved generator; inserts \(n\) GAN scenarios to scenarios table
gan diagnose | --run-id, --fit-garch | Rich table: excess kurtosis for GAN, historical, parametric per asset
gan portfolio | --run-id, --n | Efficient frontier stats and out-of-sample VaR coverage ratios

%==========%


XII. Test Suite:

All tests are fully offline. The shared conftest.py fixtures generate a deterministic 504-day (two trading years) return matrix with correlated Student-t draws (seed 42, five assets) and a 5 000-row bootstrap scenario set.

Diagnostic tests verify mathematical invariants: VaR-99 is at least as negative as VaR-95 for every asset; Gaussian data produces near-zero excess kurtosis; Student-t(3) data produces excess kurtosis above 1; the correlation matrix is symmetric with unit diagonal and values in \([-1, 1]\); PCA explained variance sums to 1 and is sorted descending; historical simulation returns only rows present in the original matrix; the parametric bootstrap mean converges to the historical mean over 50 000 draws.

Model tests verify generator output shape, unbounded critic output (no sigmoid), 50-epoch training completion, consistent loss history lengths, generated scenario shape and column matching, and finite values.

Portfolio tests verify non-negative frontier volatilities, weight sums equal to 1 for both min-var and max-Sharpe portfolios, that min-var vol is the minimum over the frontier grid, and that VaR coverage lies in \([0, 1]\) with the 99% coverage no less than the 95% coverage (the 99% threshold lies deeper in the tail, so at least as many holdout days sit above it).
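
A fixture of the kind described above could be built along these lines. This is a sketch under stated assumptions, not the contents of conftest.py: the function name, the equicorrelation of 0.5, the 4 degrees of freedom, and the 1% daily volatility are all hypothetical choices, and the real fixture presumably wraps the result in a DataFrame. Correlated Student-t draws are obtained by dividing correlated Gaussian draws by a shared chi-square mixing variable.

```python
import numpy as np

def make_student_t_returns(n_days: int = 504, n_assets: int = 5, df: float = 4.0,
                           rho: float = 0.5, daily_vol: float = 0.01,
                           seed: int = 42) -> np.ndarray:
    """Correlated multivariate Student-t returns as an (n_days x n_assets) array."""
    rng = np.random.default_rng(seed)
    corr = np.full((n_assets, n_assets), rho)
    np.fill_diagonal(corr, 1.0)                               # equicorrelation matrix
    gauss = rng.multivariate_normal(np.zeros(n_assets), corr, size=n_days)
    # One chi-square draw per day, shared by all assets, fattens the tails jointly.
    mix = np.sqrt(rng.chisquare(df, size=(n_days, 1)) / df)
    t_draws = gauss / mix
    # A Student-t variable has variance df / (df - 2); rescale to the target volatility.
    scale = daily_vol / np.sqrt(df / (df - 2.0))
    return t_draws * scale

returns = make_student_t_returns()
print(returns.shape)                                           # (504, 5)
print(np.array_equal(returns, make_student_t_returns()))       # True: same seed, same data
```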


# test_diagnostics.py — selected invariants
def test_var_99_more_negative_than_var_95(self, scenarios_5k):
    stats = compute_tail_stats(scenarios_5k)
    for asset, s in stats.items():
        assert s["var_99"] <= s["var_95"], f"{asset}: VaR-99 should be <= VaR-95"

def test_historical_values_in_original_set(self, returns_df):
    scen = historical_simulation(returns_df, n=200, seed=0)
    orig_set = set(map(tuple, returns_df.values.tolist()))
    for row in scen.values.tolist():
        assert tuple(row) in orig_set, "Historical sim must only resample existing rows"

def test_parametric_mean_close_to_historical(self, returns_df):
    scen = parametric_bootstrap(returns_df, n=50_000, seed=0)
    np.testing.assert_allclose(scen.mean().values, returns_df.mean().values, atol=1e-4)


# test_model.py — selected invariants
def test_output_unbounded(self):
    crit = Critic(n_assets=5, hidden_dim=64)
    x    = torch.randn(128, 5) * 10
    out  = crit(x)
    assert out.abs().max() > 0.5, "Critic output should be unbounded (no sigmoid)"

def test_no_data_leakage(self, returns_df):
    """Training data is only used to compute the Wasserstein loss; test that generated
    scenarios contain values outside the training set (GAN can extrapolate)."""
    cfg    = WGANConfig(latent_dim=16, hidden_dim=64, n_epochs=20, n_critic=2, batch_size=32)
    result = train(returns_df, cfg)
    scen   = generate_scenarios(result.generator, 500, list(returns_df.columns))
    # At least some generated returns should differ from all training rows
    orig   = set(map(tuple, np.round(returns_df.values, 8).tolist()))
    gen_   = [tuple(np.round(r, 8).tolist()) for r in scen.values]
    new_   = [r for r in gen_ if r not in orig]
    assert len(new_) > 0, "Generator should produce values outside the training set"


# test_portfolio.py — selected invariants
def test_min_var_weights_sum_to_one(self, scenarios_5k):
    fr = compute_frontier(scenarios_5k, "test")
    assert abs(sum(fr.min_var_weights.values()) - 1.0) < 1e-4

def test_coverage_99_geq_coverage_95(self, scenarios_5k, returns_df):
    w    = np.ones(scenarios_5k.shape[1]) / scenarios_5k.shape[1]
    c95  = compute_var_coverage(scenarios_5k, w, returns_df, level=0.95)
    c99  = compute_var_coverage(scenarios_5k, w, returns_df, level=0.99)
    # The 99% threshold (1st percentile) is at or below the 95% threshold (5th percentile),
    # so at least as many holdout days lie above it.
    assert c99 >= c95 - 1e-6
  

%==========%


XIII. Configuration & Setup:
Setup and launch (local):

cd assets/projects/gans_bofa
python -m venv .venv
.venv\Scripts\Activate.ps1        # Windows PowerShell; on macOS/Linux: source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env

# Download data (seconds via yfinance)
gan fetch --tickers "SPY,EFA,EEM,TLT,GLD,USO" --start 2010-01-01

# Train WGAN-GP (~10–30 minutes on CPU depending on hardware)
gan train --run-id run_001 --epochs 2000

# Generate and evaluate
gan generate --run-id run_001 --n 5000
gan diagnose --run-id run_001 --fit-garch
gan portfolio --run-id run_001

# Launch server-side dashboard
streamlit run src/gans_bofa/app.py
  

Variable | Default | Description
--- | --- | ---
DB_PATH | data/gans.duckdb | DuckDB database path for returns, scenarios, and training logs
GAN_RUN_ID | run_001 | Run identifier used to tag scenarios and model weights; increment for new training runs

Data | Source | Notes
--- | --- | ---
Adjusted daily prices | Yahoo Finance via yfinance | Free, no API key; auto-adjusted for dividends and splits.
Return storage | DuckDB (local file) | No server required; all queries are in-process.
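
One way the two environment variables could be resolved at start-up is sketched below, using only the standard library. How the project actually loads them (for example through a Pydantic settings class or a .env loader) is not shown in this write-up, so `load_settings` and `DEFAULTS` are hypothetical names; only the variable names and default values come from the table above.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variable names and defaults as documented in the configuration table.
DEFAULTS = {"DB_PATH": "data/gans.duckdb", "GAN_RUN_ID": "run_001"}

def load_settings(env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return DB_PATH and GAN_RUN_ID, falling back to the documented defaults."""
    source = os.environ if env is None else env
    return {key: source.get(key, default) for key, default in DEFAULTS.items()}

print(load_settings({}))                           # both defaults
print(load_settings({"GAN_RUN_ID": "run_002"}))    # override one value
```

Passing the mapping explicitly keeps the function easy to test without touching the real process environment.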

Team:

Theodosios Dimitrasopoulos, in collaboration with Bank of America Securities.

Tools & methods:

Python 3.11, pandas, NumPy, SciPy, statsmodels, scikit-learn, PyTorch 2.x (WGAN-GP), arch (GARCH diagnostics), Pydantic v2, DuckDB, Typer, rich, Plotly, Streamlit, yfinance, pytest, ruff, hatchling. Methodology: Wasserstein GAN with gradient penalty (Gulrajani et al. 2017); historical bootstrap and parametric multivariate normal benchmarks; marginal tail diagnostics (excess kurtosis, VaR); Pearson correlation and off-diagonal RMSE; GARCH(1,1) persistence; PCA cumulative explained variance; mean–variance efficient frontier; out-of-sample Kupiec-style VaR coverage ratios.