
Ways to Counter Limited-History Opponents with Algorithmic Tools

by Algorithmic Bias (dot tech), January 24th, 2025

Too Long; Didn't Read

This section presents Algorithm 7 for defeating the Limited-History variant of the Follow-the-Leader opponent. By playing each action r times in a row, the player records the opponent's best response to each action, then uses the ellipsoid algorithm to predict and counter the opponent's moves. The strategy wins most rounds, with losses and ties limited to O(n^4 log(nr) + nr), where r is the opponent's history parameter.

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

A Appendix

A.1 Win-Stay Lose-Shift Variant: Tie-Stay

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

A.2 Follow-the-Leader Variant: Limited History

Recall that the limited-history variant of the Follow-the-Leader opponent plays the action that would have achieved the highest net payoff against the last r rounds of our play. Note that we assume r > 0. If r = 0, the opponent would simply play the same action every round (the first action in their action ordering) regardless of our play. In that case, their play would not reveal any information about best responses, so the best we could do is tie in most rounds by playing the same action as the opponent.
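To make this decision rule concrete, here is a minimal sketch in Python of how such an opponent could choose its move. The payoff matrix opp_payoff, the function name, and the tie-breaking list action_ordering are illustrative assumptions, not notation from the paper.

def limited_history_ftl_move(opp_payoff, our_last_r_plays, action_ordering):
    # Hypothetical model of the limited-history Follow-the-Leader opponent:
    # play the action with the highest net payoff against the last r rounds
    # of our play, breaking ties by the opponent's fixed action ordering.
    best_action, best_total = None, float("-inf")
    for a in action_ordering:  # earlier actions in the ordering win ties
        total = sum(opp_payoff[a][ours] for ours in our_last_r_plays)
        if total > best_total:
            best_action, best_total = a, total
    return best_action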


We again use the ellipsoid algorithm for prediction, in essentially the same way as described for the unlimited-history variant of this opponent in Section 4.4. In this case, the inequality we observe when a mistake is made is relative only to the past r rounds of our play.
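As a rough illustration of what such a mistake reveals, the sketch below (our own construction, not the paper's Algorithm 7) encodes the observed inequality as a linear constraint over the opponent's unknown payoff entries, the kind of separating hyperplane an ellipsoid-style learner can consume. The flattened indexing and function name are assumptions made for illustration only.

def mistake_constraint(n, played, predicted, our_last_r_plays):
    # When the opponent plays `played` although we predicted `predicted`,
    # the net payoff of `played` against our last r plays must be at least
    # that of `predicted`. Writing the opponent's unknown payoff matrix as a
    # flat vector u (u[i*n + j] = payoff of their action i vs our action j),
    # this gives one linear inequality c . u >= 0 over only the past r rounds.
    c = [0.0] * (n * n)
    for ours in our_last_r_plays:
        c[played * n + ours] += 1.0      # payoff the observed action earned
        c[predicted * n + ours] -= 1.0   # minus what our predicted action earned
    return c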




This paper is available on arXiv under the CC BY 4.0 DEED license.