Mathematical Framework

IMPORTANT: All empirical analysis in this project is conducted at daily frequency using daily OHLCV bars from Polygon.io. No intraday, tick, or order-book data is used in the current experiment.

Core Mathematical Foundations

1. Cross-Asset Alpha Generation Model

Alpha Decomposition

The total alpha generated by the system can be decomposed as:

\[\alpha_t = \alpha_t^{tech} + \alpha_t^{micro} + \alpha_t^{cross} + \epsilon_t\]

Where:
- \(\alpha_t^{tech}\): Technical analysis alpha
- \(\alpha_t^{micro}\): Daily microstructure-inspired alpha (computed from daily OHLCV bars, not true intraday data)
- \(\alpha_t^{cross}\): Cross-asset alpha
- \(\epsilon_t\): Idiosyncratic noise

Regime-Conditional Alpha

Alpha generation is conditioned on market regime:

\[\alpha_t = \sum_{k=1}^{K} P(S_t = k | \mathcal{F}_{t-1}) \cdot \alpha_t^{(k)}\]

Where:
- \(S_t\): Market regime at time \(t\)
- \(K\): Number of regimes
- \(P(S_t = k | \mathcal{F}_{t-1})\): Regime probability given the information set available at \(t-1\)
- \(\alpha_t^{(k)}\): Regime-specific alpha

2. Hidden Markov Model Framework

State Space Representation

The HMM assumes markets exist in \(K\) unobservable states with dynamics:

\[P(S_t = j | S_{t-1} = i) = a_{ij}\]

Transition Matrix:

\[A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} \end{pmatrix}\]

Where \(\sum_{j=1}^{K} a_{ij} = 1\) for all \(i\).

Observation Model

Observable variables are generated by regime-dependent distributions:

\[O_t | S_t = k \sim \mathcal{N}(\mu_k, \Sigma_k)\]

Where:
- \(O_t\): Observable feature vector at time \(t\)
- \(\mu_k\): Mean vector for regime \(k\)
- \(\Sigma_k\): Covariance matrix for regime \(k\)

Forward Algorithm

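Compute the joint probability of observing the sequence up to time \(t\) and being in state \(j\) at time \(t\), where \(\lambda\) denotes the HMM parameters. The forward variable \(\alpha_t(j)\) is standard HMM notation and is unrelated to the alpha \(\alpha_t\) of Section 1: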

\[\alpha_t(j) = P(O_1, O_2, \ldots, O_t, S_t = j | \lambda)\]

Recursion:

\[\alpha_t(j) = \left[\sum_{i=1}^{K} \alpha_{t-1}(i) a_{ij}\right] b_j(O_t)\]

Where \(b_j(O_t)\) is the emission probability of observation \(O_t\) in state \(j\).
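The recursion can be sketched in pure Python as follows. This is a minimal illustration, not the project's implementation: the initial state distribution `pi` and the precomputed emission table `B[t][j] = b_j(O_t)` are assumptions, since the text specifies neither. Each step is normalized so that long daily series do not underflow, and the log-likelihood is recovered from the normalizing constants.

```python
import math

def forward(pi, A, B):
    """Scaled forward pass for an HMM.

    pi: initial state probabilities (assumed; not given in the text)
    A:  K x K transition matrix, A[i][j] = a_ij
    B:  T x K emission likelihoods, B[t][j] = b_j(O_t)
    Returns (log-likelihood of the sequence, filtered state probabilities at T).
    """
    K = len(pi)
    # Initialization: alpha_1(j) = pi_j * b_j(O_1)
    alpha = [pi[j] * B[0][j] for j in range(K)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]
    log_lik = math.log(scale)
    for t in range(1, len(B)):
        # Recursion: alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(O_t)
        alpha = [sum(alpha[i] * A[i][j] for i in range(K)) * B[t][j]
                 for j in range(K)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]  # normalize to avoid underflow
        log_lik += math.log(scale)
    return log_lik, alpha
```

The normalized `alpha` at each step is the filtered regime probability, which is the quantity a regime-conditional signal would typically consume.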

Backward Algorithm

Compute the probability of future observations given current state:

\[\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T | S_t = i, \lambda)\]

Recursion:

\[\beta_t(i) = \sum_{j=1}^{K} a_{ij} b_j(O_{t+1}) \beta_{t+1}(j)\]

Viterbi Algorithm

Find the most likely state sequence:

\[\delta_t(j) = \max_{s_1, s_2, \ldots, s_{t-1}} P(s_1, s_2, \ldots, s_{t-1}, s_t = j, O_1, O_2, \ldots, O_t | \lambda)\]

Recursion:

\[\delta_t(j) = \max_{1 \leq i \leq K} [\delta_{t-1}(i) a_{ij}] b_j(O_t)\]
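A hedged sketch of the Viterbi recursion, written in log space so that products of small probabilities become sums. As with any HMM decoder it needs an initial distribution `pi`, which is an assumption here, and it takes emission likelihoods as a precomputed table `B[t][j]`; back-pointers are stored so the best path can be traced from the final state.

```python
import math

def _log(x):
    # log(0) is treated as -infinity so impossible transitions are never chosen
    return math.log(x) if x > 0 else -math.inf

def viterbi(pi, A, B):
    """Most likely state sequence (0-based state indices).

    pi: initial state probabilities (assumed; not given in the text)
    A:  K x K transition matrix, B: T x K emission likelihoods b_j(O_t)
    """
    K = len(pi)
    delta = [_log(pi[j]) + _log(B[0][j]) for j in range(K)]
    backptr = []
    for t in range(1, len(B)):
        prev, delta, ptr = delta, [], []
        for j in range(K):
            # delta_t(j) = max_i [delta_{t-1}(i) + log a_ij] + log b_j(O_t)
            best_i = max(range(K), key=lambda i: prev[i] + _log(A[i][j]))
            ptr.append(best_i)
            delta.append(prev[best_i] + _log(A[best_i][j]) + _log(B[t][j]))
        backptr.append(ptr)
    # Backtrack from the most likely final state
    state = max(range(K), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```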

3. Feature Engineering Mathematics

Technical Features

Multi-Horizon Returns:

\[r_{t,h} = \frac{P_t - P_{t-h}}{P_{t-h}}\]

Realized Volatility:

\[\sigma_{t,h} = \sqrt{\frac{252}{h} \sum_{i=1}^{h} r_{t-i+1}^2}\]

Momentum Score:

\[M_{t,h} = \frac{1}{h} \sum_{i=1}^{h} \text{sign}(r_{t-i+1}) \cdot |r_{t-i+1}|^{\gamma}\]

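Where \(r_t = r_{t,1}\) is the one-day return and \(\gamma\) controls the weighting of large moves: \(\gamma > 1\) emphasizes large moves relative to small ones, and \(\gamma < 1\) dampens them.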
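The three definitions above translate directly into code. The sketch below evaluates each feature at the most recent observation of a plain Python list; the function names and the `periods_per_year` parameter are illustrative, not taken from the project.

```python
import math

def horizon_return(prices, h):
    """r_{t,h}: simple return over the last h bars."""
    return (prices[-1] - prices[-1 - h]) / prices[-1 - h]

def realized_vol(daily_returns, h, periods_per_year=252):
    """Annualized realized volatility from the last h daily returns."""
    window = daily_returns[-h:]
    return math.sqrt(periods_per_year / h * sum(r * r for r in window))

def momentum_score(daily_returns, h, gamma=1.0):
    """Mean of sign(r) * |r|**gamma over the last h daily returns."""
    window = daily_returns[-h:]
    total = 0.0
    for r in window:
        sign = (r > 0) - (r < 0)  # -1, 0 or +1
        total += sign * abs(r) ** gamma
    return total / h
```

With `gamma=1.0` the momentum score reduces to the mean daily return over the window, which is a convenient sanity check.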

RSI Calculation:

\[RSI_t = 100 - \frac{100}{1 + RS_t}\]

Where:

\[RS_t = \frac{\text{EMA}(\text{Gains}_t)}{\text{EMA}(\text{Losses}_t)}\]
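A minimal RSI sketch follows. The text does not say which EMA is used, so this example assumes Wilder-style smoothing with factor `1/n` seeded by the first change, and it returns 50 for a completely flat series; both choices are assumptions made for illustration.

```python
def rsi(prices, n=14):
    """RSI of the latest bar from a list of prices.

    Assumes an EMA with smoothing factor 1/n (Wilder-style) for both the
    gain and the loss series; the source does not specify the EMA variant.
    """
    changes = [b - a for a, b in zip(prices, prices[1:])]
    k = 1.0 / n
    avg_gain = avg_loss = 0.0
    for i, c in enumerate(changes):
        gain, loss = max(c, 0.0), max(-c, 0.0)
        if i == 0:
            avg_gain, avg_loss = gain, loss  # seed the EMAs
        else:
            avg_gain += k * (gain - avg_gain)
            avg_loss += k * (loss - avg_loss)
    if avg_loss == 0.0:
        # No losses: RS is unbounded, so RSI is 100 (or 50 if flat, by assumption)
        return 100.0 if avg_gain > 0.0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```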

Daily Microstructure-Inspired Features

Note: All features are computed from daily OHLCV bars. No intraday or tick data is used.

VWAP Deviation (computed from daily bars):

\[D_{t}^{VWAP} = \frac{P_t - VWAP_t}{VWAP_t}\]

Where \(VWAP_t\) is computed from daily OHLCV data over \(n\) daily bars, with \(P_i\) and \(V_i\) the price and volume of bar \(i\):

\[VWAP_t = \frac{\sum_{i=1}^{n} P_i \cdot V_i}{\sum_{i=1}^{n} V_i}\]

Volume Z-Score (computed from daily volume):

\[Z_t^{Vol} = \frac{V_t - \mu_{V,t}}{\sigma_{V,t}}\]

Where \(\mu_{V,t}\) and \(\sigma_{V,t}\) are rolling mean and standard deviation of daily volume.
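Both features can be sketched on plain lists of daily closes and volumes. Several details are assumptions because the text leaves them open: the closing price stands in for \(P_i\), the window is the trailing `n` bars including the current one, and the sample standard deviation is used for the z-score.

```python
import statistics

def vwap_deviation(closes, volumes, n):
    """Relative deviation of the latest close from the trailing n-bar VWAP.

    Uses the close as the bar price P_i (an assumption).
    """
    p, v = closes[-n:], volumes[-n:]
    vwap = sum(pi * vi for pi, vi in zip(p, v)) / sum(v)
    return (closes[-1] - vwap) / vwap

def volume_zscore(volumes, n):
    """Z-score of the latest volume against the trailing n-bar window.

    The window includes the current bar and uses the sample standard
    deviation (both assumptions).
    """
    window = volumes[-n:]
    mu = statistics.mean(window)
    sd = statistics.stdev(window)
    return (volumes[-1] - mu) / sd
```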

Cross-Asset Features

Rolling Correlation:

\[\rho_{t,h}(X,Y) = \frac{\sum_{i=1}^{h}(X_{t-i+1} - \bar{X})(Y_{t-i+1} - \bar{Y})}{\sqrt{\sum_{i=1}^{h}(X_{t-i+1} - \bar{X})^2 \sum_{i=1}^{h}(Y_{t-i+1} - \bar{Y})^2}}\]

Cross-Asset Volatility Ratio:

\[VR_{t,h} = \frac{\sigma_{equity,t,h}}{\sigma_{bond,t,h}}\]
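As a rough sketch, both cross-asset features can be computed over the trailing `h` observations of two return series. The function names are illustrative, and the volatility ratio reuses the realized-volatility definition from the technical features, which is an assumption about how \(\sigma_{equity,t,h}\) and \(\sigma_{bond,t,h}\) are estimated.

```python
import math

def rolling_corr(x, y, h):
    """Pearson correlation of the last h observations of x and y."""
    xs, ys = x[-h:], y[-h:]
    mx, my = sum(xs) / h, sum(ys) / h
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

def vol_ratio(equity_returns, bond_returns, h):
    """Ratio of h-day realized volatilities (equity over bond)."""
    def realized_vol(returns):
        return math.sqrt(252 / h * sum(r * r for r in returns[-h:]))
    return realized_vol(equity_returns) / realized_vol(bond_returns)
```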

4. Machine Learning Model Mathematics

Random Forest Prediction

For \(B\) trees, the prediction is:

\[\hat{y} = \frac{1}{B} \sum_{b=1}^{B} T_b(x)\]

Where \(T_b(x)\) is the prediction from tree \(b\).

Feature Importance (Gini)

For feature \(j\) in tree \(T\):

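\[I_j(T) = \sum_{t \in T:\, v(t) = j} p(t) \Delta_t\]

Where:
- \(v(t)\): Feature used to split at node \(t\), so the sum runs only over nodes that split on feature \(j\)
- \(p(t)\): Proportion of samples reaching node \(t\)
- \(\Delta_t\): Impurity decrease at node \(t\)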

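Gini Impurity:

\[G(t) = 1 - \sum_{k=1}^{K} p_k^2(t)\]

Where \(p_k(t)\) is the proportion of class \(k\) samples at node \(t\). Here \(K\) is the number of classes, not the number of regimes.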
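The per-node term \(p(t)\Delta_t\) can be illustrated with class counts at a parent node and its two children. This is a sketch of the arithmetic only; the argument names are hypothetical, and a binary split is assumed.

```python
def gini(counts):
    """Gini impurity from a list of per-class sample counts at a node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_impurity_decrease(parent, left, right, n_total):
    """p(t) * Delta_t for one binary split.

    parent, left, right: per-class counts at the node and its children
    n_total: number of samples at the root of the tree
    """
    n_parent, n_left, n_right = sum(parent), sum(left), sum(right)
    decrease = (gini(parent)
                - (n_left / n_parent) * gini(left)
                - (n_right / n_parent) * gini(right))
    return (n_parent / n_total) * decrease
```

Summing this quantity over every node that splits on feature \(j\) gives that feature's importance in one tree.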

SHAP Values

For feature \(j\) and prediction instance \(x\):

\[\phi_j(x) = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!(|F|-|S|-1)!}{|F|!} [f(S \cup \{j\}) - f(S)]\]

Where:
- \(F\): Set of all features
- \(S\): Subset of features
- \(f(S)\): Model prediction using feature subset \(S\)
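For a handful of features the Shapley sum can be evaluated exactly by enumerating subsets, as in the sketch below. The `value` callable stands in for \(f(S)\) and is hypothetical; the cost grows exponentially with the number of features, so this is for illustration only and is not how SHAP libraries compute values for tree models.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by brute-force enumeration.

    features: list of hashable feature names
    value:    callable mapping a frozenset of features to f(S) (hypothetical)
    """
    n = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! depends only on |S|
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi
```

For an additive `value` function each feature's Shapley value equals its own contribution, and in general the values sum to \(f(F) - f(\emptyset)\).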

5. Portfolio Construction Mathematics

Mean-Variance Optimization

Minimize portfolio variance subject to expected return constraint:

\[\min_w \frac{1}{2} w^T \Sigma w\]

Subject to:
- \(w^T \mu = \mu_p\) (target return)
- \(w^T \mathbf{1} = 1\) (fully invested)
- \(|w_i| \leq w_{max}\) (position limits)
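When the position limits are not binding, the problem has only equality constraints and reduces to a linear (KKT) system. The sketch below solves that system with a small Gauss-Jordan routine; it deliberately ignores \(|w_i| \leq w_{max}\), which would require a quadratic programming solver, and all function names are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def min_variance_weights(sigma, mu, target):
    """Minimum-variance weights with w'mu = target and w'1 = 1.

    Position limits are NOT enforced in this sketch.
    Stationarity: Sigma w + l1 * mu + l2 * 1 = 0, plus the two constraints.
    """
    n = len(mu)
    kkt = [list(sigma[i]) + [mu[i], 1.0] for i in range(n)]
    kkt.append(list(mu) + [0.0, 0.0])
    kkt.append([1.0] * n + [0.0, 0.0])
    rhs = [0.0] * n + [target, 1.0]
    return solve_linear(kkt, rhs)[:n]
```

If any resulting weight violates the position limit, the unconstrained solution is no longer optimal and a constrained solver is needed.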

Kelly Criterion

Optimal fraction to invest:

\[f^* = \frac{bp - q}{b}\]

Where:
- \(b\): Odds received on the wager
- \(p\): Probability of winning
- \(q = 1 - p\): Probability of losing

Generalized Kelly for Multiple Assets:

\[w^* = \Sigma^{-1} \mu\]

Where \(\mu\) is expected excess return vector and \(\Sigma\) is covariance matrix.
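A short sketch of both Kelly formulas. The multi-asset version is restricted to two assets so the \(2 \times 2\) inverse can be written out explicitly; that restriction and the function names are choices made for this example, not part of the framework.

```python
def kelly_fraction(b, p):
    """Binary Kelly fraction f* = (b p - q) / b with q = 1 - p."""
    q = 1.0 - p
    return (b * p - q) / b

def kelly_weights_two_assets(mu, sigma):
    """w* = Sigma^{-1} mu for two assets, using the explicit 2x2 inverse.

    mu:    expected excess returns [mu_1, mu_2]
    sigma: covariance matrix [[s11, s12], [s21, s22]]
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [(d * mu[0] - b * mu[1]) / det,
            (-c * mu[0] + a * mu[1]) / det]
```

For example, even odds (`b = 1`) with a 60% win probability give a fraction of 0.2 of capital.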

Risk Parity Weights

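Inverse-volatility weights, which equalize risk contributions when all pairwise correlations are equal (for example, when assets are uncorrelated):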

\[w_i = \frac{1/\sigma_i}{\sum_{j=1}^{N} 1/\sigma_j}\]

Risk Contribution:

\[RC_i = w_i \frac{\partial \sigma_p}{\partial w_i} = w_i \frac{(\Sigma w)_i}{\sigma_p}\]
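The weights and risk contributions can be checked numerically with a few lines of Python. This is a sketch with illustrative names; the covariance matrix is passed as a nested list.

```python
import math

def inverse_vol_weights(sigmas):
    """w_i proportional to 1 / sigma_i, normalized to sum to one."""
    inv = [1.0 / s for s in sigmas]
    total = sum(inv)
    return [x / total for x in inv]

def risk_contributions(w, cov):
    """RC_i = w_i * (Sigma w)_i / sigma_p; the contributions sum to sigma_p."""
    n = len(w)
    marginal = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
    port_vol = math.sqrt(sum(w[i] * marginal[i] for i in range(n)))
    return [w[i] * marginal[i] / port_vol for i in range(n)]
```

Running `risk_contributions` on the inverse-volatility weights with a correlated covariance matrix is a quick way to see how far the simple weights are from equal risk contribution.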

6. Risk Management Mathematics

Value at Risk (VaR)

For tail probability \(\alpha\) (for example, \(\alpha = 0.05\) for a 95% confidence level):

\[VaR_\alpha = -F^{-1}(\alpha)\]

Where \(F\) is the cumulative distribution function of portfolio returns.

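Parametric VaR (assuming normality):

\[VaR_\alpha = -(\mu + z_\alpha \sigma)\]

Where \(z_\alpha = \Phi^{-1}(\alpha)\) is the standard normal quantile, and \(\mu\) and \(\sigma\) are the mean and standard deviation of portfolio returns.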

Expected Shortfall (Conditional VaR)

\[ES_\alpha = -E[R | R \leq -VaR_\alpha]\]

For a normal distribution, with \(\alpha\) the tail probability as in \(VaR_\alpha\):

\[ES_\alpha = -\mu + \sigma \frac{\phi(z_\alpha)}{\alpha}\]

Where \(\phi\) is the standard normal PDF.
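The sketch below computes parametric and historical VaR and ES using only the standard library. It treats \(\alpha\) as the tail probability (for example 0.05), the convention under which \(-(\mu + z_\alpha \sigma)\) is a positive loss, so the normal expected shortfall divides by \(\alpha\). The historical estimator, which averages the worst \(\lfloor \alpha n \rfloor\) returns, is one simple choice among several and is an assumption here.

```python
from statistics import NormalDist

def parametric_var_es(mu, sigma, alpha=0.05):
    """Normal VaR and ES as positive loss numbers; alpha is the tail probability."""
    z = NormalDist().inv_cdf(alpha)  # negative for alpha < 0.5
    var = -(mu + z * sigma)
    es = -mu + sigma * NormalDist().pdf(z) / alpha
    return var, es

def historical_var_es(returns, alpha=0.05):
    """Empirical VaR and ES from the worst floor(alpha * n) returns (at least one)."""
    ordered = sorted(returns)
    k = max(1, int(alpha * len(ordered)))
    tail = ordered[:k]
    return -tail[-1], -sum(tail) / k
```

Expected shortfall is never smaller than VaR at the same level, which makes a useful consistency check on either estimator.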

Maximum Drawdown

\[MDD = \max_{t \in [0,T]} \left[ \max_{s \in [0,t]} V_s - V_t \right]\]

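Where \(V_t\) is portfolio value at time \(t\). As written, this is the drawdown in currency units; dividing each term by the running peak \(\max_{s \in [0,t]} V_s\) gives the percentage drawdown.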
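Maximum drawdown can be computed in a single pass by tracking the running peak. The sketch returns both the absolute drawdown defined above and the drawdown relative to the peak; it assumes strictly positive portfolio values.

```python
def max_drawdown(values):
    """Return (absolute, relative) maximum drawdown of a value series.

    Assumes all portfolio values are strictly positive.
    """
    peak = values[0]
    mdd_abs = mdd_rel = 0.0
    for v in values:
        peak = max(peak, v)                        # running maximum of V_s
        mdd_abs = max(mdd_abs, peak - v)           # largest drop from the peak
        mdd_rel = max(mdd_rel, (peak - v) / peak)  # same, as a fraction of the peak
    return mdd_abs, mdd_rel
```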

7. Performance Attribution Mathematics

Information Ratio

\[IR = \frac{E[R_p - R_b]}{\sigma(R_p - R_b)}\]

Where:
- \(R_p\): Portfolio return
- \(R_b\): Benchmark return

Sharpe Ratio

\[SR = \frac{E[R_p - R_f]}{\sigma(R_p)}\]

Where \(R_f\) is the risk-free rate.

Calmar Ratio

\[CR = \frac{\text{Annualized Return}}{\text{Maximum Drawdown}}\]
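The three ratios can be sketched on per-period return lists as follows. No annualization is applied, the sample standard deviation is used, and the Sharpe denominator follows the definition above, \(\sigma(R_p)\); these are choices made for the example rather than statements about the project's code.

```python
import statistics

def information_ratio(rp, rb):
    """Mean active return divided by its sample standard deviation."""
    active = [p - b for p, b in zip(rp, rb)]
    return statistics.mean(active) / statistics.stdev(active)

def sharpe_ratio(rp, rf=0.0):
    """Mean excess return over a constant per-period rf, divided by sigma(R_p)."""
    excess = [p - rf for p in rp]
    return statistics.mean(excess) / statistics.stdev(rp)

def calmar_ratio(annualized_return, max_drawdown):
    """Annualized return divided by maximum drawdown (both as fractions)."""
    return annualized_return / max_drawdown
```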

8. Transaction Cost Mathematics

Market Impact Model (Square Root Law)

\[I = \sigma \sqrt{\frac{Q}{V}} \cdot g(\tau)\]

Where:
- \(I\): Market impact
- \(\sigma\): Volatility
- \(Q\): Order size
- \(V\): Average daily volume
- \(g(\tau)\): Time function
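A one-function sketch of the square-root law. The form of \(g(\tau)\) is not specified in the text, so the example takes its value as a plain multiplier defaulting to 1; that default is an assumption.

```python
import math

def sqrt_impact(sigma, order_size, adv, time_factor=1.0):
    """Square-root market impact: sigma * sqrt(Q / V) * g(tau).

    time_factor stands in for g(tau); 1.0 is an assumed default.
    """
    return sigma * math.sqrt(order_size / adv) * time_factor
```

For instance, an order of 1% of average daily volume in an asset with 2% daily volatility gives an estimated impact of 0.2%.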

Implementation Shortfall

\[IS = \sum_{i=1}^{N} w_i \left[ (P_i^{exec} - P_i^{decision}) + \frac{1}{2} \sigma_i \sqrt{\frac{Q_i}{V_i}} \right]\]

9. Statistical Tests

Augmented Dickey-Fuller Test

Test statistic for unit root:

\[ADF = \frac{\hat{\gamma}}{\text{SE}(\hat{\gamma})}\]

From regression:

\[\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \epsilon_t\]

Ljung-Box Test

Test for autocorrelation:

\[Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n-k}\]

Where \(\hat{\rho}_k\) is the sample autocorrelation at lag \(k\), \(n\) is the sample size, and \(h\) is the number of lags tested.
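The Q statistic itself is straightforward to compute from a series, as sketched below. The example stops at the statistic and does not compute a p-value, since the chi-square distribution is not available in the Python standard library.

```python
def ljung_box_q(x, h):
    """Ljung-Box Q statistic for lags 1..h of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation at lag k
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q
```

A perfectly alternating series such as `[1, -1, 1, -1, 1, -1]` has lag-1 autocorrelation of \(-5/6\) and yields \(Q = 20/3\) for \(h = 1\).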

The definitions in this section underlie the calculations and algorithms used in the Cross-Asset Alpha Engine.