Mathematical Framework
IMPORTANT: All empirical analysis in this project is conducted at daily frequency using daily OHLCV bars from Polygon.io. No intraday, tick, or order-book data is used in the current experiment.
Core Mathematical Foundations
1. Cross-Asset Alpha Generation Model
Alpha Decomposition
The total alpha generated by the system can be decomposed as:
\(\(\alpha_t = \alpha_t^{tech} + \alpha_t^{micro} + \alpha_t^{cross} + \epsilon_t\)\)
Where:
- \(\alpha_t^{tech}\): Technical analysis alpha
- \(\alpha_t^{micro}\): Daily microstructure-inspired alpha (computed from daily OHLCV bars, not true intraday data)
- \(\alpha_t^{cross}\): Cross-asset alpha
- \(\epsilon_t\): Idiosyncratic noise
Regime-Conditional Alpha
Alpha generation is conditioned on the market regime:
\(\(\alpha_t = \sum_{k=1}^{K} P(S_t = k | \mathcal{F}_{t-1}) \, \alpha_t^{(k)}\)\)
Where:
- \(S_t\): Market regime at time \(t\)
- \(K\): Number of regimes
- \(P(S_t = k | \mathcal{F}_{t-1})\): Regime probability given the information set \(\mathcal{F}_{t-1}\)
- \(\alpha_t^{(k)}\): Regime-specific alpha
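The regime mixture is a probability-weighted sum, which can be sketched as follows. The function name `regime_weighted_alpha` and all numbers are illustrative assumptions, not values from the project.

```python
def regime_weighted_alpha(regime_probs, regime_alphas):
    """Blend regime-specific alphas using the regime probabilities as weights."""
    if len(regime_probs) != len(regime_alphas):
        raise ValueError("need exactly one alpha per regime")
    if abs(sum(regime_probs) - 1.0) > 1e-9:
        raise ValueError("regime probabilities must sum to 1")
    return sum(p * a for p, a in zip(regime_probs, regime_alphas))


# Hypothetical three-regime example (K = 3)
probs = [0.6, 0.3, 0.1]            # P(S_t = k | F_{t-1})
alphas = [0.002, -0.001, -0.004]   # alpha_t^{(k)}
blended = regime_weighted_alpha(probs, alphas)
```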
2. Hidden Markov Model Framework
State Space Representation
The HMM assumes markets exist in \(K\) unobservable states whose dynamics follow a first-order Markov chain:
\(\(P(S_t = j | S_{t-1} = i) = a_{ij}\)\)
Transition Matrix: \(\(A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} \end{pmatrix}\)\)
Where \(\sum_{j=1}^{K} a_{ij} = 1\) for all \(i\).
Observation Model
Observable variables are generated by regime-dependent Gaussian distributions:
\(\(O_t | S_t = k \sim \mathcal{N}(\mu_k, \Sigma_k)\)\)
Where:
- \(O_t\): Observable feature vector at time \(t\)
- \(\mu_k\): Mean vector for regime \(k\)
- \(\Sigma_k\): Covariance matrix for regime \(k\)
Forward Algorithm
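Compute the joint probability of observing the sequence up to time \(t\) and being in state \(j\) (here \(\alpha_t(j)\) denotes the forward variable, not the alpha of Section 1):
\(\(\alpha_t(j) = P(O_1, \ldots, O_t, S_t = j)\)\)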
Recursion: \(\(\alpha_t(j) = \left[\sum_{i=1}^{K} \alpha_{t-1}(i) a_{ij}\right] b_j(O_t)\)\)
Where \(b_j(O_t)\) is the emission probability of observation \(O_t\) in state \(j\).
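A minimal sketch of the forward recursion follows. It assumes the emission likelihoods \(b_j(O_t)\) are already evaluated and passed in as a table `B[t][j]`, and it introduces an initial state distribution `pi` that the text above does not define. All numbers are made up.

```python
def forward(pi, A, B):
    """Unscaled forward pass.

    pi[j]   : initial state probabilities (an assumption, not defined in the text)
    A[i][j] : transition probability from state i to state j
    B[t][j] : emission likelihood b_j(O_t), precomputed by the caller
    Returns alpha, where alpha[t][j] is the forward variable at step t.
    """
    K = len(pi)
    alpha = [[pi[j] * B[0][j] for j in range(K)]]
    for t in range(1, len(B)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(K)) * B[t][j]
            for j in range(K)
        ])
    return alpha


# Hypothetical two-state example with three observations
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]]
alpha = forward(pi, A, B)
likelihood = sum(alpha[-1])  # probability of the whole observation sequence
```

On long daily histories the unscaled products underflow, so a practical version would rescale each step or work in log space.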
Backward Algorithm
Compute the probability of the future observations up to the final time \(T\), given the current state:
\(\(\beta_t(i) = P(O_{t+1}, \ldots, O_T | S_t = i)\)\)
Recursion: \(\(\beta_t(i) = \sum_{j=1}^{K} a_{ij} b_j(O_{t+1}) \beta_{t+1}(j)\)\)
Viterbi Algorithm
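Find the most likely state sequence, where \(\delta_t(j)\) is the probability of the best path that ends in state \(j\) at time \(t\):
\(\(\delta_t(j) = \max_{s_1, \ldots, s_{t-1}} P(S_1 = s_1, \ldots, S_{t-1} = s_{t-1}, S_t = j, O_1, \ldots, O_t)\)\)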
Recursion: \(\(\delta_t(j) = \max_{1 \leq i \leq K} [\delta_{t-1}(i) a_{ij}] b_j(O_t)\)\)
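The Viterbi recursion can be sketched in log space, which turns the products into sums and avoids underflow. As an assumption for illustration, the emission likelihoods are passed in as a precomputed table `B[t][j]`, and the initial distribution `pi` is a parameter not defined in the text. The numbers are made up.

```python
import math


def _log(x):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(pi, A, B):
    """Return the most likely state path for emission table B[t][j]."""
    K, T = len(pi), len(B)
    delta = [_log(pi[j]) + _log(B[0][j]) for j in range(K)]
    backptr = []  # backptr[t-1][j] = best predecessor of state j at step t
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: delta[i] + _log(A[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[t][j]))
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state
    state = max(range(K), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state example with three observations
path = viterbi(
    pi=[0.6, 0.4],
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.9, 0.2], [0.1, 0.8], [0.2, 0.7]],
)
```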
3. Feature Engineering Mathematics
Technical Features
Multi-Horizon Returns: \(\(r_{t,h} = \frac{P_t - P_{t-h}}{P_{t-h}}\)\)
Realized Volatility: \(\(\sigma_{t,h} = \sqrt{\frac{252}{h} \sum_{i=1}^{h} r_{t-i+1}^2}\)\)
Where \(r_t = r_{t,1}\) is the one-day return and the factor 252 annualizes the daily variance.
Momentum Score: \(\(M_{t,h} = \frac{1}{h} \sum_{i=1}^{h} \text{sign}(r_{t-i+1}) \cdot |r_{t-i+1}|^{\gamma}\)\)
Where \(\gamma\) controls the weighting of large moves.
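The three formulas above can be sketched in plain Python as follows. The function names and the price series are illustrative assumptions, and the realized volatility is annualized with 252 periods per year as in the formula.

```python
import math


def simple_return(prices, h):
    """r_{t,h}: simple return over the last h bars."""
    return (prices[-1] - prices[-1 - h]) / prices[-1 - h]


def daily_returns(prices):
    """One-day simple returns between consecutive closes."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def realized_vol(returns, h, periods_per_year=252):
    """Annualized volatility from the last h squared returns (no demeaning)."""
    window = returns[-h:]
    return math.sqrt(periods_per_year / h * sum(r * r for r in window))


def _sign(r):
    return (r > 0) - (r < 0)


def momentum_score(returns, h, gamma=1.0):
    """Average of sign(r) * |r|**gamma over the last h returns."""
    window = returns[-h:]
    return sum(_sign(r) * abs(r) ** gamma for r in window) / h


closes = [100.0, 102.0, 101.0, 103.0, 104.0]  # made-up daily closes
rets = daily_returns(closes)
r_4 = simple_return(closes, 4)
vol_4 = realized_vol(rets, 4)
mom_4 = momentum_score(rets, 4, gamma=1.0)
```

With \(\gamma = 1\) the momentum score reduces to the mean return over the window; \(\gamma > 1\) gives large moves more weight relative to small ones, and \(\gamma < 1\) gives them less.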
RSI Calculation: \(\(RSI_t = 100 - \frac{100}{1 + RS_t}\)\)
Where: \(\(RS_t = \frac{\text{EMA}(\text{Gains}_t)}{\text{EMA}(\text{Losses}_t)}\)\)
Daily Microstructure-Inspired Features
Note: All features are computed from daily OHLCV bars. No intraday or tick data is used.
VWAP Deviation (computed from daily bars): \(\(D_{t}^{VWAP} = \frac{P_t - VWAP_t}{VWAP_t}\)\)
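Where \(VWAP_t\) is computed from daily OHLCV data: \(\(VWAP_t = \frac{\sum_{i=1}^{n} P_i \cdot V_i}{\sum_{i=1}^{n} V_i}\)\)
Here the sums run over the \(n\) daily bars in the averaging window, and \(P_i\) and \(V_i\) are the price and volume of bar \(i\).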
Volume Z-Score (computed from daily volume): \(\(Z_t^{Vol} = \frac{V_t - \mu_{V,t}}{\sigma_{V,t}}\)\)
Where \(\mu_{V,t}\) and \(\sigma_{V,t}\) are rolling mean and standard deviation of daily volume.
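A minimal sketch of the volume z-score follows. Two choices here are assumptions, since the text does not fix them: the rolling window includes the current day, and the standard deviation is the sample (n-1) estimate. The volumes are made up.

```python
import statistics


def volume_zscore(volumes, window):
    """Z-score of the latest daily volume against a trailing window.

    Assumptions: the window includes the current day, and sigma is the
    sample standard deviation.
    """
    recent = volumes[-window:]
    mu = statistics.fmean(recent)
    sigma = statistics.stdev(recent)
    if sigma == 0:
        return 0.0  # flat volume: no deviation to report
    return (volumes[-1] - mu) / sigma


daily_volume = [100, 100, 100, 100, 200]  # made-up share counts
z = volume_zscore(daily_volume, window=5)
```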
Cross-Asset Features
Rolling Correlation: \(\(\rho_{t,h}(X,Y) = \frac{\sum_{i=1}^{h}(X_{t-i+1} - \bar{X})(Y_{t-i+1} - \bar{Y})}{\sqrt{\sum_{i=1}^{h}(X_{t-i+1} - \bar{X})^2 \sum_{i=1}^{h}(Y_{t-i+1} - \bar{Y})^2}}\)\)
Cross-Asset Volatility Ratio: \(\(VR_{t,h} = \frac{\sigma_{equity,t,h}}{\sigma_{bond,t,h}}\)\)
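The rolling correlation formula can be sketched directly from its definition. The function name and the return series are illustrative assumptions; the means are taken over the same \(h\)-day window as the sums.

```python
import math


def rolling_corr(x, y, h):
    """Pearson correlation of the last h observations of x and y."""
    xs, ys = x[-h:], y[-h:]
    if len(xs) != h or len(ys) != h:
        raise ValueError("both series need at least h observations")
    mx, my = sum(xs) / h, sum(ys) / h
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


equity = [0.010, -0.020, 0.015, 0.005]   # made-up daily returns
bond = [-0.002, 0.004, -0.003, -0.001]   # exactly -0.2 times the equity series
rho = rolling_corr(equity, bond, h=4)
```

In this contrived example the bond series is a negative multiple of the equity series, so the correlation is -1 up to rounding.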
4. Machine Learning Model Mathematics
Random Forest Prediction
For \(B\) trees, the prediction is the average over the individual trees:
\(\(\hat{f}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)\)\)
Where \(T_b(x)\) is the prediction from tree \(b\).
Feature Importance (Gini)
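For feature \(j\) in tree \(T\), the importance sums the weighted impurity decreases over all nodes that split on \(j\):
\(\(I_j(T) = \sum_{t \in T:\, v(t) = j} p(t) \, \Delta_t\)\)
Where:
- \(v(t)\): Feature used to split at node \(t\)
- \(p(t)\): Proportion of samples reaching node \(t\)
- \(\Delta_t\): Impurity decrease at node \(t\)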
Gini Impurity: \(\(G(t) = 1 - \sum_{k=1}^{K} p_k^2(t)\)\)
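As a small worked sketch, the Gini impurity and the weighted impurity decrease \(p(t)\,\Delta_t\) of a single split can be computed as below. The function names and label lists are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of the class labels at one node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_impurity_decrease(parent, left, right, n_total):
    """p(t) * Delta_t for one split, where n_total is the training-set size."""
    n = len(parent)
    delta = (
        gini(parent)
        - len(left) / n * gini(left)
        - len(right) / n * gini(right)
    )
    return n / n_total * delta


# A node holding 4 of 8 training samples that is split perfectly in two
parent = [0, 0, 1, 1]
contribution = weighted_impurity_decrease(parent, [0, 0], [1, 1], n_total=8)
```

Summing such contributions over every node that splits on feature \(j\) gives that feature's importance within the tree.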
SHAP Values
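For feature \(j\) and prediction instance \(x\), the SHAP value is the Shapley value of feature \(j\):
\(\(\phi_j(x) = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[f(S \cup \{j\}) - f(S)\right]\)\)
Where:
- \(F\): Set of all features
- \(S\): Subset of features
- \(f(S)\): Model prediction using feature subset \(S\)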
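For a handful of features the Shapley sum can be evaluated exactly by enumerating subsets, as sketched below. The feature names and the toy `toy_value` function, which stands in for \(f(S)\), are hypothetical. Practical SHAP tooling has to approximate \(f(S)\) for a fitted model, and that step is not shown here.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values by subset enumeration (cost grows as 2**n)."""
    n = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {j}) - value_fn(s))
        phi[j] = total
    return phi


def toy_value(subset):
    """Hypothetical f(S): two main effects plus one interaction term."""
    v = 0.0
    if "momentum" in subset:
        v += 2.0
    if "vol" in subset:
        v += 1.0
    if "momentum" in subset and "vol" in subset:
        v += 1.0
    return v


phi = shapley_values(["momentum", "vol", "volume_z"], toy_value)
```

The interaction term is shared equally between the two features involved, and the values sum to \(f(F) - f(\emptyset)\).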
5. Portfolio Construction Mathematics
Mean-Variance Optimization
Minimize portfolio variance subject to an expected return constraint:
\(\(\min_{w} \; w^T \Sigma w\)\)
Subject to:
- \(w^T \mu = \mu_p\) (target return)
- \(w^T \mathbf{1} = 1\) (fully invested)
- \(|w_i| \leq w_{max}\) (position limits)
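With only the two equality constraints, the problem reduces to a linear system (the KKT conditions), sketched below with NumPy. This sketch deliberately ignores the position limits, which require a quadratic programming solver. The inputs are made-up numbers.

```python
import numpy as np


def min_variance_weights(mu, cov, target_return):
    """Solve min w'Σw s.t. w'μ = target and w'1 = 1 via the KKT system.

    Position limits are NOT enforced in this sketch.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = len(mu)
    ones = np.ones(n)
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * cov      # gradient of w'Σw
    kkt[:n, n] = mu              # multiplier column for the return constraint
    kkt[:n, n + 1] = ones        # multiplier column for the budget constraint
    kkt[n, :n] = mu
    kkt[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [target_return, 1.0]])
    return np.linalg.solve(kkt, rhs)[:n]


# Hypothetical three-asset inputs
mu = [0.08, 0.03, 0.05]
cov = [[0.0400, 0.0060, 0.0100],
       [0.0060, 0.0100, 0.0040],
       [0.0100, 0.0040, 0.0225]]
w = min_variance_weights(mu, cov, target_return=0.05)
```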
Kelly Criterion
Optimal fraction to invest:
\(\(f^* = \frac{bp - q}{b}\)\)
Where:
- \(b\): Odds received on the wager
- \(p\): Probability of winning
- \(q = 1 - p\): Probability of losing
Generalized Kelly for Multiple Assets: \(\(w^* = \Sigma^{-1} \mu\)\)
Where \(\mu\) is expected excess return vector and \(\Sigma\) is covariance matrix.
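Both Kelly formulas are short enough to sketch directly. The single-bet version uses the standard form \(f^* = (bp - q)/b\); the multi-asset version solves \(\Sigma w = \mu\) rather than inverting \(\Sigma\). The function names and inputs are illustrative assumptions.

```python
import numpy as np


def kelly_fraction(b, p):
    """Single-bet Kelly fraction for odds b and win probability p."""
    q = 1.0 - p
    return (b * p - q) / b


def kelly_weights(mu, cov):
    """Multi-asset Kelly weights w* = Σ^{-1} μ, computed with a linear solve."""
    return np.linalg.solve(np.asarray(cov, dtype=float),
                           np.asarray(mu, dtype=float))


f_star = kelly_fraction(b=1.0, p=0.55)          # even-money bet, 55% win rate
w_star = kelly_weights(mu=[0.05, 0.02],
                       cov=[[0.04, 0.0], [0.0, 0.01]])
```

The multi-asset weights are not constrained to sum to one, so they can imply leverage.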
Risk Parity Weights
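Equal risk contribution weights are chosen so that each of the \(N\) assets contributes the same share of portfolio risk:
\(\(RC_i = \frac{\sigma_p}{N} \quad \text{for all } i\)\)
Risk Contribution: \(\(RC_i = w_i \frac{\partial \sigma_p}{\partial w_i} = w_i \frac{(\Sigma w)_i}{\sigma_p}\)\)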
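The risk contribution formula can be checked numerically with a short NumPy sketch. The example uses two uncorrelated assets, for which inverse-volatility weights equalize the contributions; with correlated assets the equal-risk weights generally need a numerical solver. The covariance matrix is made up.

```python
import numpy as np


def risk_contributions(w, cov):
    """RC_i = w_i * (Σ w)_i / σ_p for each asset."""
    w = np.asarray(w, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sigma_p = np.sqrt(w @ cov @ w)
    return w * (cov @ w) / sigma_p


# Two uncorrelated assets with volatilities 20% and 10%
cov = np.diag([0.04, 0.01])
inv_vol = 1.0 / np.sqrt(np.diag(cov))
w = inv_vol / inv_vol.sum()          # weights 1/3 and 2/3
rc = risk_contributions(w, cov)
```

The contributions always sum to the portfolio volatility \(\sigma_p\), which makes a convenient sanity check.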
6. Risk Management Mathematics
Value at Risk (VaR)
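For confidence level \(\alpha\) (for example \(\alpha = 0.95\)), VaR is the loss that is exceeded with probability \(1 - \alpha\):
\(\(VaR_\alpha = -F^{-1}(1 - \alpha)\)\)
Where \(F\) is the cumulative distribution function of portfolio returns.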
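Parametric VaR (assuming normality): \(\(VaR_\alpha = -(\mu + z_\alpha \sigma)\)\)
Where \(z_\alpha = \Phi^{-1}(1 - \alpha)\) is the \((1 - \alpha)\) quantile of the standard normal distribution, which is negative for \(\alpha > 0.5\) (about \(-1.645\) at \(\alpha = 0.95\)).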
Expected Shortfall (Conditional VaR)
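For normal distribution: \(\(ES_\alpha = -\mu + \sigma \frac{\phi(z_\alpha)}{1-\alpha}\)\)
Where \(\phi\) is the standard normal PDF and \(z_\alpha = \Phi^{-1}(1 - \alpha)\) is the standard normal quantile at the tail probability \(1 - \alpha\).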
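The two normal-distribution formulas can be evaluated with the standard library alone. The sketch assumes \(\alpha\) is the confidence level (for example 0.95) and \(z_\alpha = \Phi^{-1}(1 - \alpha)\), so both VaR and ES come out as positive loss figures. The inputs are made up.

```python
from statistics import NormalDist


def parametric_var_es(mu, sigma, alpha=0.95):
    """Normal VaR and ES as positive losses at confidence level alpha."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha)   # negative for alpha > 0.5
    var = -(mu + z * sigma)
    es = -mu + sigma * std_normal.pdf(z) / (1.0 - alpha)
    return var, es


# Hypothetical daily return distribution: zero mean, 1% volatility
var_95, es_95 = parametric_var_es(mu=0.0, sigma=0.01, alpha=0.95)
```

Expected shortfall averages the losses beyond the VaR threshold, so it is larger than VaR at the same confidence level.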
Maximum Drawdown
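\(\(MDD = \max_{t \in [0,T]} \frac{\max_{s \in [0,t]} V_s - V_t}{\max_{s \in [0,t]} V_s}\)\)
Where \(V_t\) is portfolio value at time \(t\) and \(T\) is the end of the evaluation period.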
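Maximum drawdown, taken here as the largest peak-to-trough decline expressed as a fraction of the running peak, can be computed in a single pass. The function name and the equity curve are illustrative assumptions.

```python
def max_drawdown(values):
    """Largest fractional decline from a running peak of the value series."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


equity_curve = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]  # made-up values
mdd = max_drawdown(equity_curve)  # peak 120 to trough 80
```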
7. Performance Attribution Mathematics
Information Ratio
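\(\(IR = \frac{E[R_p - R_b]}{\sigma(R_p - R_b)}\)\)
Where:
- \(R_p\): Portfolio return
- \(R_b\): Benchmark return
- \(\sigma(R_p - R_b)\): Tracking error, the standard deviation of active returns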
Sharpe Ratio
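\(\(SR = \frac{E[R_p] - R_f}{\sigma_p}\)\)
Where \(R_f\) is the risk-free rate and \(\sigma_p\) is the standard deviation of portfolio returns.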
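The information ratio and the Sharpe ratio share one shape, the mean of an excess-return series divided by its standard deviation, so one helper covers both. Annualizing with \(\sqrt{252}\) and using the sample standard deviation are assumptions of this sketch, as are the function names and data.

```python
import math
import statistics


def annualized_ratio(excess_returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled by sqrt(periods_per_year)."""
    mean = statistics.fmean(excess_returns)
    sd = statistics.stdev(excess_returns)
    return mean / sd * math.sqrt(periods_per_year)


def sharpe_ratio(returns, rf_daily=0.0):
    """Sharpe ratio from daily returns and a constant daily risk-free rate."""
    return annualized_ratio([r - rf_daily for r in returns])


def information_ratio(returns, benchmark_returns):
    """Information ratio from daily portfolio and benchmark returns."""
    return annualized_ratio([r - b for r, b in zip(returns, benchmark_returns)])


portfolio = [0.010, -0.005, 0.007, 0.002, -0.001]   # made-up daily returns
benchmark = [0.008, -0.004, 0.003, 0.001, 0.000]
sr = sharpe_ratio(portfolio)
ir = information_ratio(portfolio, benchmark)
```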
Calmar Ratio
\(\(Calmar = \frac{\text{Annualized Return}}{\text{Maximum Drawdown}}\)\)
8. Transaction Cost Mathematics
Market Impact Model (Square Root Law)
\(\(I = \sigma \sqrt{\frac{Q}{V}} \cdot g(\tau)\)\)
Where:
- \(I\): Market impact
- \(\sigma\): Volatility
- \(Q\): Order size
- \(V\): Average daily volume
- \(g(\tau)\): Time function
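A minimal sketch of the square-root law, assuming the form \(I = \sigma \sqrt{Q/V} \cdot g(\tau)\) with no additional scaling constant. Because the text does not specify \(g(\tau)\), it is passed in as a plain multiplier that defaults to 1. All numbers are made up.

```python
import math


def sqrt_impact(sigma, order_size, adv, time_factor=1.0):
    """Square-root market impact: sigma * sqrt(Q / V) * g(tau).

    time_factor stands in for g(tau), whose form is not specified here.
    """
    if adv <= 0:
        raise ValueError("average daily volume must be positive")
    return sigma * math.sqrt(order_size / adv) * time_factor


# Hypothetical order: 1% of average daily volume in a 2% daily-vol stock
impact = sqrt_impact(sigma=0.02, order_size=10_000, adv=1_000_000)
```

Under this form, quadrupling the order size only doubles the estimated impact.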
Implementation Shortfall
9. Statistical Tests
Augmented Dickey-Fuller Test
Test statistic for unit root:
\(\(ADF = \frac{\hat{\gamma}}{SE(\hat{\gamma})}\)\)
From regression: \(\(\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \epsilon_t\)\)
Ljung-Box Test
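Test for autocorrelation up to lag \(h\) in a series of \(n\) observations:
\(\(Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n-k}\)\)
Where \(\hat{\rho}_k\) is the sample autocorrelation at lag \(k\).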
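The Ljung-Box statistic \(Q = n(n+2) \sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)\) is simple enough to compute by hand, as sketched below. The function name and the test series are illustrative assumptions.

```python
def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# A strictly alternating series has strong negative lag-1 autocorrelation
alternating = [1.0, -1.0] * 10
q_stat = ljung_box_q(alternating, max_lag=1)
```

Under the null hypothesis of no autocorrelation, \(Q\) is approximately chi-square distributed with \(h\) degrees of freedom. The value here is far above 3.84, the 5% critical value for one degree of freedom.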
This framework collects the formulas behind the calculations and algorithms used in the Cross-Asset Alpha Engine.