3.5 (Optional) Contextual Multi-Armed Bandits (CMAB)
Note: LinUCB
LinUCB is a contextual multi-armed bandit algorithm where the expected reward of arm \(a\) at time \(t\) is modeled as a linear function of its feature vector \(\mathbf{x}_{t,a} \in \mathbb{R}^d\).
Developed at Yahoo! for personalized news article recommendation (Li et al. 2010).
Link to Research Paper
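To make the linear model concrete, here is a minimal sketch, assuming the disjoint variant of LinUCB, in which each arm \(a\) keeps its own coefficient vector \(\boldsymbol{\theta}_a\) and is scored by \(\mathbf{x}_{t,a}^\top \hat{\boldsymbol{\theta}}_a + \alpha \sqrt{\mathbf{x}_{t,a}^\top \mathbf{A}_a^{-1} \mathbf{x}_{t,a}}\). The names `LinUCBArm` and `select_arm`, the default `alpha=1.0`, and the choice to store \(\mathbf{A}_a^{-1}\) directly (updated with the Sherman-Morrison formula) are illustrative assumptions, not taken from the text above.

```python
import math


class LinUCBArm:
    """Per-arm ridge-regression state for a disjoint LinUCB sketch (hypothetical helper)."""

    def __init__(self, d):
        # A_a starts as the d x d identity, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        # b_a accumulates reward-weighted feature vectors.
        self.b = [0.0] * d

    def _A_inv_times(self, x):
        # Matrix-vector product A_a^{-1} x.
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def ucb(self, x, alpha):
        v = self._A_inv_times(x)
        # theta_hat = A^{-1} b; since A^{-1} is symmetric, theta_hat . x = b . (A^{-1} x).
        mean = sum(bi * vi for bi, vi in zip(self.b, v))
        # Exploration bonus: sqrt(x^T A^{-1} x).
        width = math.sqrt(sum(xi * vi for xi, vi in zip(x, v)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update for (A + x x^T)^{-1}.
        v = self._A_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        # b <- b + reward * x
        for i in range(d):
            self.b[i] += reward * x[i]


def select_arm(arms, contexts, alpha=1.0):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm, x in zip(arms, contexts)]
    return max(range(len(arms)), key=scores.__getitem__)
```

At each round, one would call `select_arm` with the current feature vectors, observe the reward for the chosen arm, and pass that reward to the chosen arm's `update`. A larger `alpha` widens the confidence bonus and so favors exploration.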