Solving multi-armed bandits: A comparison of epsilon-greedy and Thompson sampling

Conor Mack
12 min readSep 30, 2018

The multi-armed bandit problem

The multi-armed bandit (MAB) is a classic problem in decision science. Effectively, it is a problem of optimal resource allocation under uncertainty. The name is derived from old slot machines that were operated by pulling an arm — they are called bandits because they rob those who…
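To make the comparison in the title concrete, here is a minimal sketch of both strategies on a simulated Bernoulli bandit. The arm payout probabilities, step count, and epsilon value below are illustrative assumptions, not figures from the article: epsilon-greedy exploits the best-looking arm most of the time and explores at random with probability epsilon, while Thompson sampling keeps a Beta posterior per arm and pulls the arm with the highest sampled value.

```python
import random

# Assumed Bernoulli bandit: each arm pays 1 with a fixed probability
# that is unknown to the agent (values chosen for illustration).
TRUE_PROBS = [0.2, 0.5, 0.7]

def pull(arm, rng):
    """Return a 0/1 reward for pulling the given arm."""
    return 1 if rng.random() < TRUE_PROBS[arm] else 0

def epsilon_greedy(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(TRUE_PROBS)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the arm's mean reward.
        values[arm] += (r - values[arm]) / counts[arm]
        total += r
    return total

def thompson_sampling(steps=5000, seed=0):
    rng = random.Random(seed)
    n_arms = len(TRUE_PROBS)
    # Beta(1, 1) prior per arm; alpha tracks successes, beta failures.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total = 0
    for _ in range(steps):
        # Sample a plausible payout rate for each arm, pull the best.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        r = pull(arm, rng)
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total
```

Both agents should concentrate their pulls on the 0.7 arm over time; the difference is that Thompson sampling's exploration shrinks naturally as the posteriors sharpen, whereas epsilon-greedy keeps exploring at a fixed rate.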
