Concept: Matching pennies
Humans and animals face decision tasks in an uncertain multi-agent environment where an agent’s strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
In competitive situations, individuals need to adjust their behavioral strategy dynamically in response to their opponent’s behavior. In the present study, we investigated the neural basis of how individuals adjust their strategy during a simple, competitive game of matching pennies. We used entropy as a behavioral index of randomness in decision-making, because maximizing randomness is thought to be an optimal strategy in the game, according to game theory. While undergoing functional magnetic resonance imaging (fMRI), subjects played matching pennies with either a human or computer opponent in each block, although in reality they played the game with the same computer algorithm under both conditions. The winning rate of each block was also manipulated. Both the opponent (human or computer), and the winning rate, independently affected subjects' block-wise entropy during the game. The fMRI results revealed that activity in the bilateral anterior insula was positively correlated with subjects' (not opponent’s) behavioral entropy during the game, which indicates that during an interpersonal competitive game, the anterior insula tracked how uncertain subjects' behavior was, rather than how uncertain subjects felt their opponent’s behavior was. Our results suggest that intuitive or automatic processes based on somatic markers may be a key to optimally adjusting behavioral strategies in competitive situations.
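The entropy index described above can be illustrated with a minimal sketch. It assumes the simplest first-order definition (Shannon entropy of the empirical frequencies of the two choices within a block); the abstract does not state the exact definition the study used, and the function name and sample sequences are hypothetical.

```python
import math
from collections import Counter


def choice_entropy(choices):
    """Shannon entropy (in bits) of the empirical choice frequencies.

    Assumption: first-order entropy over single choices in one block;
    the study's actual index may be defined differently.
    """
    counts = Counter(choices)
    n = len(choices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical blocks of eight matching-pennies choices (H = heads, T = tails).
balanced = ["H", "T", "T", "H", "H", "T", "H", "T"]
biased = ["H"] * 7 + ["T"]

print(choice_entropy(balanced))  # maximal for two options: 1 bit
print(choice_entropy(biased))    # lower: behavior is more predictable
```

With two options the index ranges from 0 bits (always the same choice) to 1 bit (both choices equally frequent), which is the value the mixed-strategy equilibrium of matching pennies would produce on average.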
How humans make decisions in non-cooperative strategic interactions is a big question. For the fundamental Rock-Paper-Scissors (RPS) model game system, classic Nash equilibrium (NE) theory predicts that players completely randomize their action choices to avoid being exploited, while evolutionary game theory of bounded rationality in general predicts persistent cyclic motions, especially in finite populations. However, as empirical studies have been relatively sparse, it is still controversial which theoretical framework is more appropriate to describe the decision-making of human subjects. Here we observe population-level persistent cyclic motions in a laboratory experiment of the discrete-time iterated RPS game under the traditional random pairwise-matching protocol. This collective behavior contradicts NE theory but is quantitatively explained, without any adjustable parameter, by a microscopic model of win-lose-tie conditional response. Theoretical calculations suggest that if all players adopt the same optimized conditional response strategy, their accumulated payoff will be much higher than the reference value of the NE mixed strategy. Our work demonstrates the feasibility of understanding human competition behaviors from the angle of non-equilibrium statistical physics.
It is often assumed that in public goods games, contributors are either strong or weak players and each individual has an equal probability of exhibiting cooperation. It is difficult to explain why the public good is produced by strong individuals in some cooperation systems, and by weak individuals in others. Viewing the asymmetric volunteer’s dilemma game as an evolutionary game, we find that whether the strong or the weak players produce the public good depends on the initial condition (i.e., phenotype or initial strategy of individuals). These different evolutionarily stable strategies (ESS), associated with different initial conditions, can be interpreted as the production modes of public goods of different cooperation systems. A further analysis revealed that the strong player adopts a pure strategy whereas the weak players adopt mixed strategies to produce the public good, and that the probability of volunteering by weak players decreases with increasing group size or decreasing cost-benefit ratio. Our model shows that the defection probability of a “strong” player is greater than that of the “weak” players in the model of Diekmann (1993). This contradicts Selten’s (1980) model that public goods can only be produced by a strong player, is not an evolutionarily stable strategy, and will therefore disappear over evolutionary time. Our public good model with ESS has thus extended previous interpretations that the public good can only be produced by strong players in an asymmetric game.
Cooperation among self-interested players in a social dilemma is fragile and easily interrupted by mistakes. In this work, we study the repeated n-person public-goods game and search for a strategy that forms a cooperative Nash equilibrium in the presence of implementation error with a guarantee that the resulting payoff will be no less than any of the co-players'. By enumerating strategic possibilities for n=3, we show that such a strategy indeed exists when its memory length m equals three. This means that a deterministic strategy can be publicly employed to stabilize cooperation against error while avoiding the risk of being exploited. We furthermore show that, for the general n-person public-goods game, m ≥ n is necessary to satisfy the above criteria.
Game theory describes social behaviors in humans and other biological organisms. By far, the most powerful tool available to game theorists is the concept of a Nash Equilibrium (NE), which is motivated by perfect rationality. NE specifies a strategy for everyone, such that no one would benefit by deviating unilaterally from his/her strategy. Another powerful tool available to game theorists is evolutionary dynamics (ED). Motivated by evolutionary and learning processes, ED specify changes in strategies over time in a population, such that more successful strategies typically become more frequent. A simple game that illustrates interesting ED is the generalized Rock-Paper-Scissors (RPS) game. The RPS game extends the children’s game to situations where winning or losing can matter more or less relative to tying. Here we investigate experimentally three RPS games, where the NE is always to randomize with equal probability, but the evolutionary stability of this strategy changes. Consistent with the prediction of ED, we find that aggregate behavior is far away from NE when it is evolutionarily unstable. Our findings add to the growing literature that demonstrates the predictive validity of ED in large-scale incentivized laboratory experiments with human subjects.
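The claim that equal randomization remains the NE across generalized RPS games can be checked numerically. The sketch below assumes a hypothetical parameterization (payoff `win` for winning, `tie` for tying, `loss` for losing); the abstract does not give the payoffs used in the experiment, and all function names are illustrative.

```python
def rps_payoffs(win, tie, loss):
    """Row player's payoff matrix; actions ordered Rock, Paper, Scissors.

    Assumption: one payoff each for winning, tying and losing.
    """
    return [
        [tie, loss, win],   # Rock vs Rock, Paper, Scissors
        [win, tie, loss],   # Paper vs Rock, Paper, Scissors
        [loss, win, tie],   # Scissors vs Rock, Paper, Scissors
    ]


def expected_payoffs(matrix, opponent_mix):
    """Expected payoff of each pure action against a mixed opponent."""
    return [sum(p * q for p, q in zip(row, opponent_mix)) for row in matrix]


def is_symmetric_nash(matrix, mix, tol=1e-9):
    """True if `mix` is a best response to itself in a symmetric game.

    Every action played with positive probability must earn the
    maximal expected payoff, so no unilateral deviation can help.
    """
    payoffs = expected_payoffs(matrix, mix)
    best = max(payoffs)
    return all(best - u <= tol for u, p in zip(payoffs, mix) if p > 0)


uniform = [1 / 3, 1 / 3, 1 / 3]
for win in (1.1, 2.0, 4.0):  # hypothetical win payoffs, tie = 1, loss = 0
    print(win, is_symmetric_nash(rps_payoffs(win, 1.0, 0.0), uniform))
print(is_symmetric_nash(rps_payoffs(2.0, 1.0, 0.0), [1.0, 0.0, 0.0]))
```

The check confirms only the equilibrium property. Whether the uniform mix is also evolutionarily stable depends on how the win payoff compares with the tie and loss payoffs, and this test does not capture that.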
Clustering is an effective topology control method in wireless sensor networks (WSNs), since it can enhance the network lifetime and scalability. To prolong the network lifetime in clustered WSNs, an efficient cluster head (CH) optimization policy is essential to distribute the energy among sensor nodes. Recently, game theory has been introduced to model clustering. Each sensor node is considered as a rational and selfish player which will play a clustering game with an equilibrium strategy. Then it decides whether to act as the CH according to this strategy for a tradeoff between providing required services and energy conservation. However, how to get the equilibrium strategy while maximizing the payoff of sensor nodes has rarely been addressed to date. In this paper, we present a game theoretic approach for balancing energy consumption in clustered WSNs. With our novel payoff function, realistic sensor behaviors can be captured well. The energy heterogeneity of nodes is considered by incorporating a penalty mechanism in the payoff function, so the nodes with more energy will compete for CHs more actively. We have obtained the Nash equilibrium (NE) strategy of the clustering game through convex optimization. Specifically, each sensor node can achieve its own maximal payoff when it makes the decision according to this strategy. Through extensive simulations, our proposed game theoretic clustering is shown to have a good energy balancing performance, and consequently the network lifetime is greatly enhanced.
We study a simple model for social-learning agents in a restless multiarmed bandit (rMAB). The bandit has one good arm that changes to a bad one with a certain probability. Each agent stochastically selects one of two methods, random search (individual learning) or copying information from other agents (social learning), to seek the good arm. The fitness of an agent is the probability of knowing the good arm in the steady state of the agent system. In this model, we explicitly construct the unique Nash equilibrium state and show that the corresponding strategy for each agent is an evolutionarily stable strategy (ESS) in the sense of Thomas. It is shown that the fitness of an agent with ESS is superior to that of an asocial learner when the success probability of social learning is greater than a threshold determined from the probability of success of individual learning, the probability of change of state of the rMAB, and the number of agents. The ESS Nash equilibrium is a solution to Rogers' paradox.
To describe the Papo Reto [Straight Talk] game and reflect on its theoretical-methodological basis.
In this paper, an aggregate game is adopted for the modeling and analysis of energy consumption control in the smart grid. Since the electricity users' cost functions depend on the aggregate energy consumption, which is unknown to the end users, an average consensus protocol is employed to estimate it. Through communication among neighboring users about their estimates of the aggregate energy consumption, Nash seeking strategies are developed. Convergence properties are explored for the proposed Nash seeking strategies. For energy consumption games that may have multiple isolated Nash equilibria, a local convergence result is derived. The convergence is established by utilizing singular perturbation analysis and Lyapunov stability analysis. Energy consumption control for a network of heating, ventilation, and air conditioning systems is investigated. Based on the uniqueness of the Nash equilibrium, it is shown that the players' actions converge to a neighborhood of the unique Nash equilibrium nonlocally. More specifically, if the unique Nash equilibrium is an inner Nash equilibrium, an exponential convergence result is obtained. An energy consumption game with stubborn players is also studied. In this case, the actions of the rational players can be driven to a neighborhood of their best response strategies by using the proposed method. Numerical examples are presented to verify the effectiveness of the proposed methods.
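The idea of estimating an aggregate through neighbor-to-neighbor communication can be sketched with a basic discrete-time average consensus iteration. This is an illustrative simplification, not the protocol of the paper, whose analysis (singular perturbation, Lyapunov stability) suggests a continuous-time design. The ring topology, step size, and consumption values are all hypothetical.

```python
def consensus_step(values, neighbors, step):
    """One synchronous update: each user moves toward its neighbors' estimates."""
    return [
        x + step * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]


def run_consensus(values, neighbors, step=0.3, iterations=200):
    """Iterate the update; on a connected undirected graph with a small
    enough step, every estimate approaches the average of the initial values."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values


# Hypothetical example: four users on a ring, each initially knowing
# only its own energy consumption.
consumption = [2.0, 4.0, 6.0, 8.0]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

estimates = run_consensus(consumption, ring)
aggregate_estimates = [len(consumption) * e for e in estimates]
print(estimates)            # each close to the average, 5.0
print(aggregate_estimates)  # each close to the aggregate, 20.0
```

Each user only ever reads its neighbors' current estimates, yet all users end up with an estimate of the aggregate they could feed into their own cost function. The step size of 0.3 is chosen below the reciprocal of the maximum degree (2 on this ring), a common sufficient condition for convergence.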