Authors:
(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;
(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.
2 Setting and 2.1 Models of behaviorally-biased opponents
4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent
4.3 Win-Stay, Lose-Shift Opponent
4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent
5 Generalizing
5.1 Other Behaviorally-Biased Strategies
5.2 Exploiting an Unknown Strategy from a Known Set of Strategies
A Appendix
A.1 Win-Stay Lose-Shift Variant: Tie-Stay
A.2 Follow-the-Leader Variant: Limited History
A.4 Highest Average Payoff Opponent
Recall that the limited-history variant of the Follow-the-Leader opponent plays the action that would have achieved the highest net payoff against the last r rounds of our play. We assume r > 0: if r = 0, the opponent would simply play the same action every round (the first action in their action ordering) regardless of our play. In that case, their play would reveal no information about best responses, so the best we could do is tie in most rounds by playing the same action as the opponent.
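To make the decision rule concrete, here is a minimal Python sketch of this opponent under our assumptions; the function and variable names are illustrative rather than from the paper, and payoff[j][i] denotes the opponent's (privately known) payoff for playing action j against our action i.

```python
def limited_history_ftl(payoff, our_history, r, action_order):
    """Limited-history Follow-the-Leader opponent (illustrative sketch).

    Plays the action whose total payoff against our last r moves is
    highest, breaking ties by the opponent's fixed action ordering.
    payoff[j][i]: the opponent's payoff for playing j against our i.
    """
    recent = our_history[-r:] if r > 0 else []
    if not recent:
        # With r = 0 (or before any rounds have been played), the opponent
        # just plays the first action in its ordering, revealing nothing.
        return action_order[0]
    best, best_total = None, None
    for j in action_order:  # scanning in order implements the tie-breaking
        total = sum(payoff[j][i] for i in recent)
        if best_total is None or total > best_total:
            best, best_total = j, total
    return best
```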
We again use the ellipsoid algorithm for prediction, in essentially the same way as described for the unlimited-history variant of this opponent in Section 4.4. The only difference is that the linear inequality we observe when a prediction mistake is made is relative to just the past r rounds rather than the entire history.
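As a rough illustration of this step (a sketch under our own assumptions, not the paper's implementation), when the opponent plays an action we did not predict, the counts of our actions over the last r rounds yield one linear inequality on the entries of the unknown payoff matrix, and a standard central-cut ellipsoid update then shrinks the feasible region; tie-breaking details are ignored here, and all names are hypothetical.

```python
import numpy as np

def mistake_constraint(n_actions, our_last_r, predicted, actual):
    """Constraint vector from a prediction mistake (illustrative sketch).

    The unknown is the opponent's payoff matrix M, flattened into a
    vector x. If the opponent chose `actual` instead of our `predicted`
    action, then over our last r moves the net payoff of `actual` was at
    least that of `predicted`:
        sum_i count[i] * (M[actual, i] - M[predicted, i]) >= 0,
    i.e. a . x >= 0 for the vector a built below.
    """
    a = np.zeros((n_actions, n_actions))
    for i in our_last_r:          # our_last_r: our actions in the last r rounds
        a[actual, i] += 1.0
        a[predicted, i] -= 1.0
    return a.ravel()

def central_cut_update(center, shape, a):
    """One central-cut ellipsoid step keeping the halfspace {x : a . x >= 0},
    applied when the current center violates it (a . center < 0)."""
    n = len(center)
    Aa = shape @ a
    step = Aa / np.sqrt(a @ Aa)
    new_center = center + step / (n + 1)
    new_shape = (n * n / (n * n - 1.0)) * (
        shape - (2.0 / (n + 1)) * np.outer(step, step)
    )
    return new_center, new_shape
```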
This paper is available on arXiv under a CC BY 4.0 DEED license.