Concept: Nash equilibrium
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 4 years ago
Recent work has revealed a new class of “zero-determinant” (ZD) strategies for iterated, two-player games. ZD strategies allow a player to unilaterally enforce a linear relationship between her score and her opponent’s score, and thus to achieve an unusual degree of control over both players' long-term payoffs. Although originally conceived in the context of classical two-player game theory, ZD strategies also have consequences in evolving populations of players. Here, we explore the evolutionary prospects for ZD strategies in the Iterated Prisoner’s Dilemma (IPD). Several recent studies have focused on the evolution of “extortion strategies,” a subset of ZD strategies, and have found them to be unsuccessful in populations. Nevertheless, we identify a different subset of ZD strategies, called “generous ZD strategies,” that forgive defecting opponents but nonetheless dominate in evolving populations. For all but the smallest population sizes, generous ZD strategies are not only robust to being replaced by other strategies but can selectively replace any noncooperative ZD strategy. Generous strategies can be generalized beyond the space of ZD strategies, and they remain robust to invasion. When evolution occurs on the full set of all IPD strategies, selection disproportionately favors these generous strategies. In some regimes, generous strategies outperform even the most successful of the well-known IPD strategies, including win-stay-lose-shift.
Recent theories from complexity science argue that complex dynamics are ubiquitous in social and economic systems. These claims emerge from the analysis of individually simple agents whose collective behavior is surprisingly complicated. However, economists have argued that iterated reasoning (what you think I think you think) will suppress complex dynamics by stabilizing or accelerating convergence to Nash equilibrium. We report stable and efficient periodic behavior in human groups playing the Mod Game, a multi-player game similar to Rock-Paper-Scissors. The game rewards subjects for thinking exactly one step ahead of others in their group. Groups that play this game exhibit cycles that are inconsistent with any fixed-point solution concept. These cycles are driven by a “hopping” behavior that is consistent with other accounts of iterated reasoning: agents are constrained to about two steps of iterated reasoning and learn an additional one-half step with each session. If higher-order reasoning can be complicit in complex emergent dynamics, then cyclic and chaotic patterns may be endogenous features of real-world social and economic systems.
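As a rough illustration of a game that "rewards subjects for thinking exactly one step ahead," the sketch below scores one round of a Mod-Game-like rule. The details are assumptions, not taken from the abstract: choices are integers 0..m-1 arranged on a circle, the default m = 24 and the function name `mod_game_scores` are hypothetical, and each player earns one point for every player whose choice is exactly one below their own (mod m).

```python
from collections import Counter

def mod_game_scores(choices, m=24):
    """Score one round of a Mod-Game-like rule (assumed, see lead-in).

    Each player earns one point per player who picked the number
    exactly one step "behind" their own choice on a circle of size m.
    """
    counts = Counter(c % m for c in choices)
    return [counts[(c - 1) % m] for c in choices]

# If everyone picks the same number, nobody is one step ahead of anyone.
print(mod_game_scores([5, 5, 5, 5]))   # [0, 0, 0, 0]

# A single player who "hops" one step ahead of the group collects
# a point from each of the others, which invites further hopping.
print(mod_game_scores([6, 5, 5, 5]))   # [3, 0, 0, 0]
```

Under this assumed rule, any configuration in which all players agree can be beaten by stepping one ahead, which is one way to see why cyclic rather than fixed-point behavior might arise.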
We study evolutionary game dynamics on structured populations in which individuals take part in several layers of networks of interactions simultaneously. This multiplex of interdependent networks accounts for the different kinds of social ties each individual has. By coupling the evolutionary dynamics of a Prisoner’s Dilemma game in each of the networks, we show that the resilience of cooperative behaviors for extremely large values of the temptation to defect is enhanced by the multiplex structure. Furthermore, this resilience is intrinsically related to a non-trivial organization of cooperation across the network layers, thus providing a new way for cooperation to survive in structured populations.
Humans and animals face decision tasks in an uncertain multi-agent environment where an agent’s strategy may change in time due to the co-adaptation of others’ strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
Zero-determinant strategies are a new class of probabilistic and conditional strategies that are able to unilaterally set the expected payoff of an opponent in iterated plays of the Prisoner’s Dilemma irrespective of the opponent’s strategy (coercive strategies), or else to set the ratio between the player’s and their opponent’s expected payoff (extortionate strategies). Here we show that zero-determinant strategies are at most weakly dominant, are not evolutionarily stable, and will instead evolve into less coercive strategies. We show that zero-determinant strategies with an informational advantage over other players that allows them to recognize each other can be evolutionarily stable (and able to exploit other players). However, such an advantage is bound to be short-lived as opposing strategies evolve to counteract the recognition.
- Proceedings of the National Academy of Sciences of the United States of America
- Published almost 6 years ago
The two-player Iterated Prisoner’s Dilemma game is a model for both sentient and evolutionary behaviors, especially including the emergence of cooperation. It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards. Here, we show that such strategies unexpectedly do exist. In particular, a player X who is witting of these strategies can (i) deterministically set her opponent Y’s score, independently of his strategy or response, or (ii) enforce an extortionate linear relation between her and his scores. Against such a player, an evolutionary player’s best response is to accede to the extortion. Only a player with a theory of mind about his opponent can do better, in which case Iterated Prisoner’s Dilemma is an Ultimatum Game.
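The extortionate relation in (ii) can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1), memory-one strategies given as cooperation probabilities after the outcomes CC, CD, DC, DD (own move first), and an extortion factor χ with scale φ; the function names are hypothetical. It builds X's strategy so that s_X − P = χ(s_Y − P) should hold in the long run, then estimates both long-run scores by iterating the four-state Markov chain.

```python
R, S, T, P = 3.0, 0.0, 5.0, 1.0  # conventional IPD payoffs (assumed)

def extortion_strategy(chi, phi):
    """Memory-one strategy intended to enforce s_X - P = chi * (s_Y - P)."""
    p1 = 1 - phi * (chi - 1) * (R - P)           # after CC
    p2 = 1 - phi * ((P - S) + chi * (T - P))     # after CD
    p3 = phi * ((T - P) + chi * (P - S))         # after DC
    p4 = 0.0                                     # after DD
    return (p1, p2, p3, p4)

def long_run_scores(p, q, iters=5000):
    """Approximate long-run average scores of X (strategy p) and Y (strategy q).

    States are ordered CC, CD, DC, DD from X's point of view; Y sees
    CD and DC swapped. Uses plain power iteration, which assumes the
    chain has a unique stationary distribution.
    """
    y_view = (0, 2, 1, 3)
    v = [0.25] * 4
    for _ in range(iters):
        nxt = [0.0] * 4
        for s in range(4):
            px, qy = p[s], q[y_view[s]]
            nxt[0] += v[s] * px * qy
            nxt[1] += v[s] * px * (1 - qy)
            nxt[2] += v[s] * (1 - px) * qy
            nxt[3] += v[s] * (1 - px) * (1 - qy)
        v = nxt
    s_x = v[0] * R + v[1] * S + v[2] * T + v[3] * P
    s_y = v[0] * R + v[1] * T + v[2] * S + v[3] * P
    return s_x, s_y

chi = 3.0
p = extortion_strategy(chi, phi=1 / 26)          # (11/13, 1/2, 7/26, 0)
for q in [(0.9, 0.3, 0.6, 0.2), (0.5, 0.5, 0.5, 0.5)]:
    s_x, s_y = long_run_scores(p, q)
    # The two printed numbers should agree for any opponent q.
    print(s_x - P, chi * (s_y - P))
```

The opponent strategies used here are arbitrary interior probabilities chosen so that the chain mixes; whatever Y plays, X's surplus over the mutual-defection payoff stays χ times Y's surplus.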
How humans make decisions in non-cooperative strategic interactions remains a major open question. For the fundamental Rock-Paper-Scissors (RPS) model game system, classic Nash equilibrium (NE) theory predicts that players completely randomize their action choices to avoid being exploited, while evolutionary game theory of bounded rationality in general predicts persistent cyclic motions, especially in finite populations. However, because empirical studies have been relatively sparse, it is still controversial which theoretical framework better describes the decision-making of human subjects. Here we observe population-level persistent cyclic motions in a laboratory experiment of the discrete-time iterated RPS game under the traditional random pairwise-matching protocol. This collective behavior contradicts NE theory but is quantitatively explained, without any adjustable parameter, by a microscopic model of win-lose-tie conditional response. Theoretical calculations suggest that if all players adopt the same optimized conditional response strategy, their accumulated payoff will be much higher than the reference value of the NE mixed strategy. Our work demonstrates the feasibility of understanding human competition behaviors from the angle of non-equilibrium statistical physics.
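The NE prediction that players randomize "to avoid being exploited" can be illustrated with a small calculation. This is a sketch under an assumed symmetric payoff scheme (win = +1, loss = −1, tie = 0), which may differ from the payoffs used in the experiment; `expected_payoff` is a hypothetical helper name.

```python
# Row player's payoff; actions ordered Rock, Paper, Scissors.
# Assumed scheme: win = +1, loss = -1, tie = 0.
PAYOFF = [
    [0, -1, 1],   # Rock     vs Rock, Paper, Scissors
    [1, 0, -1],   # Paper
    [-1, 1, 0],   # Scissors
]

def expected_payoff(x, y):
    """Expected payoff to a player mixing with x against an opponent mixing with y."""
    return sum(x[i] * y[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]
rock_heavy = [0.5, 0.25, 0.25]
always_paper = [0.0, 1.0, 0.0]

# The uniform mix earns zero on average against any opponent mix.
print(expected_payoff(uniform, always_paper))
print(expected_payoff(uniform, rock_heavy))

# A biased mix can be exploited: always playing Paper beats a Rock-heavy player.
print(expected_payoff(rock_heavy, always_paper))   # -0.25
```

Under this scheme the uniform mix guarantees an expected payoff of zero regardless of the opponent, which is the "reference value" any alternative strategy would be compared against.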
Coordination in groups faces a sub-optimization problem and theory suggests that some randomness may help to achieve global optima. Here we performed experiments involving a networked colour coordination game in which groups of humans interacted with autonomous software agents (known as bots). Subjects (n = 4,000) were embedded in networks (n = 230) of 20 nodes, to which we sometimes added 3 bots. The bots were programmed with varying levels of behavioural randomness and different geodesic locations. We show that bots acting with small levels of random noise and placed in central locations meaningfully improve the collective performance of human groups, accelerating the median solution time by 55.6%. This is especially the case when the coordination problem is hard. Behavioural randomness worked not only by making the task of humans to whom the bots were connected easier, but also by affecting the gameplay of the humans among themselves and hence creating further cascades of benefit in global coordination in these heterogeneous systems.
In 1964, Bell discovered that quantum mechanics is a nonlocal theory. Three years later, in a seemingly unconnected development, Harsanyi introduced the concept of Bayesian games. Here we show that, in fact, there is a deep connection between Bell nonlocality and Bayesian games, and that the same concepts appear in both fields. This link offers interesting possibilities for Bayesian games, namely of allowing the players to receive advice in the form of nonlocal correlations, for instance using entangled quantum particles or more general no-signalling boxes. This will lead to novel joint strategies, impossible to achieve classically. We characterize games for which nonlocal resources offer a genuine advantage over classical ones. Moreover, some of these strategies represent equilibrium points, leading to the notion of quantum/no-signalling Nash equilibrium. Finally, we describe new types of question in the study of nonlocality, namely the consideration of nonlocal advantage given a set of Bell expressions.
The capacity for strategic thinking about the payoff-relevant actions of conspecifics is not well understood across species. We use game theory to make predictions about choices and temporal dynamics in three abstract competitive situations with chimpanzee participants. Frequencies of chimpanzee choices are extremely close to equilibrium (accurate-guessing) predictions, and shift as payoffs change, just as equilibrium theory predicts. The chimpanzee choices are also closer to the equilibrium prediction, and more responsive to past history and payoff changes, than two samples of human choices from experiments in which humans were also initially uninformed about opponent payoffs and could not communicate verbally. The results are consistent with a tentative interpretation of game theory as explaining evolved behavior, with the additional hypothesis that chimpanzees may retain or practice a specialized capacity to adjust strategy choice during competition to perform at least as well as, or better than, humans have.