
Application: Finance 💡

Note: Financial Asset Recommendation using Bandit Algorithm Techniques (Capstone Group 10)

The research paper FAR-Trans: An Investment Dataset for Financial Asset Recommendation (Sanz-Cruzado, Droukas, and McCreadie 2024) provides a public dataset for Financial Asset Recommendation (FAR), composed of a rich collection of user investment transactions.


University of Glasgow & National Bank of Greece (2024) (Sanz-Cruzado, Droukas, and McCreadie 2024)

Leveraging this public dataset, the goal of this research project is three-fold:

The action space \(\mathcal{A}\) consists of the available financial assets:

\[ \mathcal{A} = \{ \text{Asset}_1, \ \text{Asset}_2, \ldots, \ \text{Asset}_K \}, \quad |\mathcal{A}| = K \]

To organize the features across time steps, assets, and feature dimensions, we define a feature tensor \(\mathsf{X}\) as:

\[ \mathsf{X} = \begin{bmatrix} \mathbf{x}_{1,1} & \mathbf{x}_{1,2} & \cdots & \mathbf{x}_{1,|\mathcal{A}|} \\ \mathbf{x}_{2,1} & \mathbf{x}_{2,2} & \cdots & \mathbf{x}_{2,|\mathcal{A}|} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}_{T,1} & \mathbf{x}_{T,2} & \cdots & \mathbf{x}_{T,|\mathcal{A}|} \end{bmatrix} \in \mathbb{R}^{T \times |\mathcal{A}| \times d}, \quad \mathbf{x}_{t,a} \in \mathbb{R}^{1 \times d} \]
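As a minimal sketch, the tensor can be held as a NumPy array of shape `(T, K, d)`, where slicing one time step gives the context for every asset at once. The sizes and the random feature values below are hypothetical placeholders, not FAR-Trans features, and NumPy indices start at 0 whereas the formula indexes from 1.

```python
import numpy as np

# Hypothetical sizes: T time steps, K assets (|A| = K), d features per asset.
T, K, d = 5, 3, 4

rng = np.random.default_rng(0)
# X[t, a] plays the role of the 1 x d feature row x_{t,a}; values are random placeholders.
X = rng.normal(size=(T, K, d))


def context_at(X: np.ndarray, t: int) -> np.ndarray:
    """Return the (K, d) slice of features for time step t (0-indexed)."""
    return X[t]


x_t = context_at(X, 2)
print(X.shape)        # (5, 3, 4)
print(x_t.shape)      # (3, 4)
print(X[2, 1].shape)  # (4,)
```

Keeping time as the leading axis means a bandit agent can read the full context for step \(t\) with a single index.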

The reward \(R_t\) for choosing asset \(A_t\) at time step \(t\) is the relative change in that asset's closing price between consecutive time steps:

\[ R_t = r_t^{A_t} = \frac{\text{Closing Price}[A_t]_t - \text{Closing Price}[A_t]_{t-1}}{\text{Closing Price}[A_t]_{t-1}} \]
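The reward formula above can be sketched with NumPy by computing the relative change for every asset at once and then picking out the entry of the chosen asset. The price table and the chosen arms are made-up illustrative values, and the helper name `relative_returns` is hypothetical.

```python
import numpy as np


def relative_returns(close: np.ndarray) -> np.ndarray:
    """Relative change in closing price between consecutive steps.

    close: shape (T + 1, K); close[t, k] is the closing price of asset k at step t.
    Returns shape (T, K) with row t-1 equal to (close[t] - close[t-1]) / close[t-1].
    """
    return (close[1:] - close[:-1]) / close[:-1]


# Hypothetical closing prices for 3 assets over 4 consecutive steps.
close = np.array([
    [100.0, 50.0, 20.0],
    [110.0, 49.0, 21.0],
    [99.0, 51.0, 21.0],
    [99.0, 52.0, 22.0],
])
r = relative_returns(close)  # shape (3, 3)

# Hypothetical arms A_t chosen at the three reward steps.
chosen = np.array([0, 1, 2])
# Reward of the chosen asset at each step: r_t^{A_t}.
rewards = r[np.arange(len(chosen)), chosen]
```

Here the first chosen asset moves from 100 to 110, so its reward is 0.1; an asset whose price is unchanged earns a reward of 0.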

The empirical regret quantifies the performance gap between the optimal policy and the agent’s policy. Here, at each time step it is the largest relative change in closing price across all assets minus the relative change of the asset the agent actually chose, summed over the horizon \(T\):

\[ \rho_T^\text{empirical} = \sum_{t=1}^{T} \Big( \max_{k \in \{1,\dots,K\}} r_t^k - r_t^{A_t} \Big) \]
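A minimal sketch of the regret computation, assuming the per-step rewards of all \(K\) assets are stored in a `(T, K)` array `r` (so `r[t, k]` stands for \(r_t^k\)) and the chosen arms in an integer array. The function name and the small reward table are hypothetical illustrations.

```python
import numpy as np


def empirical_regret(r: np.ndarray, chosen: np.ndarray) -> float:
    """Cumulative empirical regret over T steps.

    r: shape (T, K), r[t, k] is the reward of asset k at step t.
    chosen: shape (T,), index of the asset A_t picked at each step.
    """
    T = r.shape[0]
    best = r.max(axis=1)                # max_k r_t^k at every step
    received = r[np.arange(T), chosen]  # r_t^{A_t}
    return float(np.sum(best - received))


# Hypothetical rewards for 3 steps and 3 assets.
r = np.array([
    [0.10, -0.02, 0.05],
    [-0.10, 0.04, 0.00],
    [0.00, 0.02, 0.05],
])
chosen = np.array([0, 1, 1])  # optimal at steps 1 and 2, suboptimal at step 3

regret = empirical_regret(r, chosen)  # 0.05 - 0.02 = 0.03
```

Replacing `np.sum` with `np.cumsum` gives the regret curve over time, which is the usual way to compare bandit policies: a policy that learns shows a curve that flattens out.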