3.5 (Optional) Contextual Multi-Armed Bandits (CMAB)

What if the best choice depended not just on luck — but on the situation you’re in? 🎭

LinUCB

LinUCB is a contextual multi-armed bandit algorithm where the expected reward of arm \(a\) at time \(t\) is modeled as a linear function of its feature vector \(\mathbf{x}_{t,a} \in \mathbb{R}^d\).

Yahoo! 2010 (Li et al. 2010)
Link to Research Paper