
Algorithm 8’s Approach to Countering the Highest Average Payoff Opponent

by Algorithmic Bias (dot tech), January 24th, 2025

Too Long; Didn't Read

This section introduces Algorithm 8 for defeating the Highest Average Payoff opponent. By learning a best response to each of the opponent's actions through sequential play, and then using the ellipsoid algorithm to predict the opponent's moves, a player can exploit the opponent's average-based rule. Total losses are bounded by O(4^n + n^4 log(nr)), balancing efficient play with accurate predictions over many rounds.

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

A Appendix

A.1 Win-Stay Lose-Shift Variant: Tie-Stay

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

A.4 Highest Average Payoff Opponent

We assume the opponent initializes each action with an average payoff of 0.


For this opponent, our high-level strategy will be to learn a best response to each action, and then use the well-known ellipsoid algorithm to predict the opponent’s actions while playing best responses to the predicted actions.
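The loop above can be sketched in a few lines. This is a simplified illustration, not the paper's Algorithm 8: here the predictor simply recomputes the opponent's empirical average payoffs and takes the argmax (the paper uses the ellipsoid algorithm for prediction when the opponent's payoffs are not directly known), and the payoff matrix `opp_payoff` and the `best_response` table are hypothetical placeholders.

```python
def highest_avg_action(totals, counts):
    """Opponent's rule: play the action whose average payoff so far is
    highest, where an unplayed action counts as average 0 (ties broken
    by lowest index)."""
    def avg(a):
        return totals[a] / counts[a] if counts[a] else 0.0
    return max(range(len(totals)), key=avg)

def play(rounds, opp_payoff, best_response, n_actions):
    """Each round: predict the opponent's next action, play the stored
    best response to the prediction, then update the opponent's stats.
    opp_payoff[o][m] is the opponent's payoff when it plays o and we
    play m; best_response maps a predicted opponent action to our move."""
    totals = [0.0] * n_actions   # opponent's cumulative payoff per action
    counts = [0] * n_actions     # times the opponent has played each action
    history = []
    for _ in range(rounds):
        predicted = highest_avg_action(totals, counts)  # our prediction
        ours = best_response[predicted]                 # stored best response
        opp = predicted                                 # opponent follows its rule
        totals[opp] += opp_payoff[opp][ours]
        counts[opp] += 1
        history.append((ours, opp))
    return history
```

Because the opponent's rule is deterministic given its payoff history, a correct model of that history yields a correct prediction every round; the paper's extra machinery is needed precisely when that history cannot be tracked exactly.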





The round preceding a switch away from an action must have been a loss for the opponent: its net payoff from that action must have gone down during that round for the action's average payoff to move from non-negative to negative. We showed that the opponent proceeds through each of the n actions in order as it switches during phase 1, so as long as we can reliably force the opponent to switch actions, we correctly record a best response to each of the first n − 1 actions in phase 1.
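The argument above rests on a simple fact about running averages: if an action's average payoff moves from non-negative to negative, the payoff received in the intervening round must itself have been negative. A minimal sketch (the payoff sequence below is hypothetical):

```python
def running_averages(payoffs):
    """Return the running average payoff after each round."""
    avgs, total = [], 0.0
    for i, p in enumerate(payoffs, start=1):
        total += p
        avgs.append(total / i)
    return avgs

# Whenever the average crosses from non-negative to negative, the
# round in between must have carried a negative payoff (a loss):
payoffs = [1.0, 0.0, -4.0]
avgs = running_averages(payoffs)
for prev, cur, p in zip(avgs, avgs[1:], payoffs[1:]):
    if prev >= 0 and cur < 0:
        assert p < 0  # the switch-triggering round was a loss
```

This holds because an average can only decrease in a round whose payoff lies below the current average; if the current average is already at least 0, a payoff that drags it below 0 must be negative.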






This paper is available on arxiv under CC BY 4.0 DEED license.