
Algorithm 8’s Approach to Countering the Highest Average Payoff Opponent

by Algorithmic Bias (dot tech), January 24th, 2025

Too Long; Didn't Read

This section introduces Algorithm 8 for defeating the Highest Average Payoff opponent. By learning a best response to each of the opponent's actions through sequential play, and then using the ellipsoid algorithm to predict the opponent's moves, a player can exploit the opponent's average-based rule. Total losses are bounded by O(4^n + n^4 log(nr)), balancing efficient play with accurate predictions over many rounds.

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

A Appendix

A.1 Win-Stay Lose-Shift Variant: Tie-Stay

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

A.4 Highest Average Payoff Opponent

We assume the opponent initializes each action with an average payoff of 0.


For this opponent, our high-level strategy will be to learn a best response to each action, and then use the well-known ellipsoid algorithm to predict the opponent’s actions while playing best responses to the predicted actions.
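The loop above can be sketched in a few lines. This is a simplified illustration, not the paper's Algorithm 8: here the predictor simply recomputes the opponent's empirical average payoffs and takes the argmax (the paper uses the ellipsoid algorithm for prediction when the opponent's payoffs are not directly known), and the payoff matrix `opp_payoff` and the `best_response` table are hypothetical placeholders.

```python
def highest_avg_action(totals, counts):
    """Opponent's rule: play the action whose average payoff so far is
    highest, where an unplayed action counts as average 0 (ties broken
    by lowest index)."""
    def avg(a):
        return totals[a] / counts[a] if counts[a] else 0.0
    return max(range(len(totals)), key=avg)

def play(rounds, opp_payoff, best_response, n_actions):
    """Each round: predict the opponent's next action, play the stored
    best response to the prediction, then update the opponent's stats.
    opp_payoff[o][m] is the opponent's payoff when it plays o and we
    play m; best_response maps a predicted opponent action to our move."""
    totals = [0.0] * n_actions   # opponent's cumulative payoff per action
    counts = [0] * n_actions     # times the opponent has played each action
    history = []
    for _ in range(rounds):
        predicted = highest_avg_action(totals, counts)  # our prediction
        ours = best_response[predicted]                 # stored best response
        opp = predicted                                 # opponent follows its rule
        totals[opp] += opp_payoff[opp][ours]
        counts[opp] += 1
        history.append((ours, opp))
    return history
```

Because the opponent's rule is deterministic given its payoff history, a correct model of that history yields a correct prediction every round; the paper's extra machinery is needed precisely when that history cannot be tracked exactly.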





The round preceding a switch away from an action must have been a loss for the opponent: its net payoff from that action must have gone down during that round for the action's average payoff to move from non-negative to negative. We showed that the opponent proceeds through each of the n actions in order as it switches during phase 1, so as long as we can reliably force the opponent to switch actions, we correctly record a best response to each of the first n − 1 actions in phase 1.
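The argument above rests on a simple fact about running averages: if an action's average payoff moves from non-negative to negative, the payoff received in the intervening round must itself have been negative. A minimal sketch (the payoff sequence below is hypothetical):

```python
def running_averages(payoffs):
    """Return the running average payoff after each round."""
    avgs, total = [], 0.0
    for i, p in enumerate(payoffs, start=1):
        total += p
        avgs.append(total / i)
    return avgs

# Whenever the average crosses from non-negative to negative, the
# round in between must have carried a negative payoff (a loss):
payoffs = [1.0, 0.0, -4.0]
avgs = running_averages(payoffs)
for prev, cur, p in zip(avgs, avgs[1:], payoffs[1:]):
    if prev >= 0 and cur < 0:
        assert p < 0  # the switch-triggering round was a loss
```

This holds because an average can only decrease in a round whose payoff lies below the current average; if the current average is already at least 0, a payoff that drags it below 0 must be negative.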






This paper is available on arxiv under CC BY 4.0 DEED license.