Solving multi-armed bandits: A comparison of epsilon-greedy and Thompson sampling

Conor Mack
12 min readSep 30, 2018

The multi-armed bandit problem

The multi-armed bandit (MAB) is a classic problem in decision science. Effectively, it is a problem of optimal resource allocation under uncertainty. The name is derived from old slot machines that were operated by pulling an arm — they are called bandits because they rob those who…
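To make the comparison in the title concrete, here is a minimal sketch of both strategies on a simulated Bernoulli bandit. The arm payout probabilities, step count, and epsilon value below are illustrative assumptions, not figures from the article: epsilon-greedy exploits the best-looking arm most of the time and explores at random with probability epsilon, while Thompson sampling keeps a Beta posterior per arm and pulls the arm with the highest sampled value.

```python
import random

# Assumed Bernoulli bandit: each arm pays 1 with a fixed probability
# that is unknown to the agent (values chosen for illustration).
TRUE_PROBS = [0.2, 0.5, 0.7]

def pull(arm, rng):
    """Return a 0/1 reward for pulling the given arm."""
    return 1 if rng.random() < TRUE_PROBS[arm] else 0

def epsilon_greedy(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(TRUE_PROBS)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the arm's mean reward.
        values[arm] += (r - values[arm]) / counts[arm]
        total += r
    return total

def thompson_sampling(steps=5000, seed=0):
    rng = random.Random(seed)
    n_arms = len(TRUE_PROBS)
    # Beta(1, 1) prior per arm; alpha tracks successes, beta failures.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total = 0
    for _ in range(steps):
        # Sample a plausible payout rate for each arm, pull the best.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        r = pull(arm, rng)
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total
```

Both agents should concentrate their pulls on the 0.7 arm over time; the difference is that Thompson sampling's exploration shrinks naturally as the posteriors sharpen, whereas epsilon-greedy keeps exploring at a fixed rate.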
