Authors:
(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;
(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.
2 Setting and 2.1 Models of behaviorally-biased opponents
4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent
4.3 Win-Stay, Lose-Shift Opponent
4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent
5 Generalizing
5.1 Other Behaviorally-Biased Strategies
5.2 Exploiting an Unknown Strategy from a Known Set of Strategies
A Appendix
A.1 Win-Stay Lose-Shift Variant: Tie-Stay
A.2 Follow-the-Leader Variant: Limited History
A.4 Highest Average Payoff Opponent
We assume the opponent initializes each action with an average payoff of 0.
For this opponent, our high-level strategy is to first learn a best response to each of its actions, and then use the well-known ellipsoid algorithm to predict the opponent's actions, playing the best response to each predicted action.
The round preceding such a switch must have been a loss for the opponent: the action's net payoff must have decreased during that round in order for its average payoff to drop from non-negative to negative. We showed that the opponent proceeds through each of the n actions in order as it switches during phase 1, so as long as we can reliably force the opponent to switch actions, we correctly record a best response to each of the first n − 1 actions in phase 1.
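The switching behavior described above can be illustrated with a small simulation. The sketch below is illustrative only and makes assumptions not stated in this excerpt: a rock-paper-scissors-style game in which action `(a+1) % n` beats action `a`, round payoffs of -1 for a loss, ties among average payoffs broken toward the lowest-indexed action, and unplayed actions treated as having the initial average of 0. Under these assumptions, forcing a loss each round drives the current action's average below 0, so the opponent cycles through the actions in order.

```python
class HighestAvgPayoffOpponent:
    """Plays the action with the highest average payoff so far.

    Each action starts with an average payoff of 0 (as assumed in the
    text); ties are broken toward the lowest-indexed action (an
    assumption of this sketch).
    """

    def __init__(self, n):
        self.n = n
        self.total = [0.0] * n  # cumulative payoff per action
        self.count = [0] * n    # times each action was played

    def choose(self):
        avgs = [self.total[i] / self.count[i] if self.count[i] else 0.0
                for i in range(self.n)]
        # max average; ties resolved toward the lowest index
        return max(range(self.n), key=lambda i: (avgs[i], -i))

    def update(self, action, payoff):
        self.total[action] += payoff
        self.count[action] += 1


def force_switches(n, rounds):
    """Beat the opponent every round and record its action sequence."""
    opp = HighestAvgPayoffOpponent(n)
    history = []
    for _ in range(rounds):
        a = opp.choose()
        history.append(a)
        # we play the action that beats a, so the opponent loses (-1)
        opp.update(a, -1.0)
    return history

# With n = 3, the opponent walks through actions 0, 1, 2 in order:
# each loss pushes the current action's average to -1, below the
# still-untouched average of 0 held by the next action.
print(force_switches(3, 3))  # → [0, 1, 2]
```

After all n actions have been driven negative, which action the opponent falls back to depends on the accumulated averages, which is why the argument in the text only guarantees best responses to the first n − 1 actions during phase 1.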
This paper is available on arxiv under CC BY 4.0 DEED license.