Reward Observing Restless Multi-Armed Bandits


Yoni Nazarathy


University of Queensland


Wed, 23/08/2017 - 4:00pm


RC-3085, The Red Centre, UNSW


Much of operations research has to do with constraints: you want to do your best, but you must operate within certain bounds. A key question in the field is which choice to make so as to maximise reward. You often try to answer this question dynamically over time, sometimes in the face of uncertainty and randomness. One such class of problems arises when you have D assets that evolve randomly over time. Each asset can be in a “good” state or a “bad” state, and you can choose only K of the D assets in each time slot. Which do you choose? Clearly, giving priority to those in the “good” state makes sense. But perhaps you don’t have full information about the states of the assets. What do you do then? In this talk we’ll present different approaches to this problem and several of its variants, and discuss some solution and performance analysis methods. Concepts to be encountered include Markov chains, partially observable Markov decision processes and restless bandits.

School Seminar Series