paint-brush
Ellipsoid Algorithms as a Tool Against Predictable Opponentsby@algorithmicbias

Ellipsoid Algorithms as a Tool Against Predictable Opponents

by Algorithmic Bias (dot tech)January 24th, 2025
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

This section details Algorithm 5 for defeating the Follow-the-Leader opponent by leveraging the ellipsoid algorithm to approximate payoff matrices and predict actions. The strategy ensures nearly all rounds are won after learning best responses in an initial phase. A Limited History variant, which considers only recent play, is also addressed, achieving a polynomial bound on losses with similar techniques.
featured image - Ellipsoid Algorithms as a Tool Against Predictable Opponents
Algorithmic Bias (dot tech) HackerNoon profile picture
0-item

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

A Appendix

A.1 Win-Stay Lose-Shift Variant: Tie-Stay

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

4.4 Follow-the-Leader Opponent

Recall that the Follow-the-Leader opponent plays the best action in retrospect, defined as the action that would have achieved the highest payoff against our entire history of play. For this opponent, our strategy will be to learn a best response to each action, and then use the well-known ellipsoid algorithm to predict the opponent’s actions while playing best responses to the predicted actions.


Using Ellipsoid for Prediction





Note that while the bound on the number of losses and ties we incur is exponential, the runtime can be considered efficient since choosing which action to play in each round is efficient; this is an improvement over the simple general prediction algorithm we considered in section 3, which requires considering an exponential number of game matrices to choose which action to play in a single round. We also consider a limited-history variant of the Follow-the-Leader opponent below, against which we can achieve a polynomial bound on losses and ties.

4.4.1 Variant: Limited History

4.5 Highest Average Payoff Opponent

The Highest Average Payoff opponent plays the action that has achieved the highest average payoff over the times they have played it. We discuss this opponent in A.4.


This paper is available on arxiv under CC BY 4.0 DEED license.