Concept: Trembling hand perfect equilibrium
Humans and animals face decision tasks in an uncertain multi-agent environment where an agent’s strategy may change in time due to the co-adaptation of others’ strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
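To make the phrase "policy gradient procedure" concrete, here is a minimal sketch of the textbook REINFORCE update for a single agent choosing between two actions. It is not the spiking population model from the abstract: the function name, the reward probabilities, the learning rate, and the step count are all illustrative assumptions.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_two_actions(reward_prob, steps=5000, lr=0.1, seed=0):
    """Return the learned probability of choosing action 1.

    reward_prob[a] is the (assumed) chance that action a yields reward 1.
    """
    rng = random.Random(seed)
    theta = 0.0  # single logit: P(action 1) = sigmoid(theta)
    for _ in range(steps):
        p1 = sigmoid(theta)
        action = 1 if rng.random() < p1 else 0
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        # REINFORCE step: reward times d/dtheta log pi(action),
        # which for a logistic policy equals (action - p1).
        theta += lr * reward * (action - p1)
    return sigmoid(theta)


# Hypothetical task: action 1 pays off more often than action 0.
p_better = reinforce_two_actions(reward_prob=(0.2, 0.8))
print(f"P(better action) after learning: {p_better:.3f}")
```

In expectation each update moves the logit along the gradient of the expected reward, which is the sense in which such rules "follow the stochastic reward gradient".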
- Risk Analysis: An Official Publication of the Society for Risk Analysis
- Published about 3 years ago
The New York City 9/11 terrorist attacks urged people from academia as well as from industry to pay more attention to operational security research. The required focus in this type of research is human intention. Unlike safety-related accidents, security-related accidents have a deliberate nature, and one has to face intelligent adversaries with characteristics that traditional probabilistic risk assessment techniques are not capable of dealing with. In recent years, the mathematical tool of game theory, being capable of handling intelligent players, has been used in a variety of ways in terrorism risk assessment. In this article, we analyze the general intrusion detection system in process plants, and propose a game-theoretical model for security management in such plants. Players in our model are assumed to be rational and they play the game with complete information. Both the pure strategy and the mixed strategy solutions are explored and explained. We illustrate our model with a case study and find that, in this case, no pure strategy Nash equilibrium exists but a mixed strategy Nash equilibrium does.
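The situation described, where no pure strategy equilibrium exists but a mixed one does, can be illustrated with a 2x2 defender/attacker game. The payoff numbers below are hypothetical and are not taken from the article; the sketch only shows the standard indifference calculation, in which each player mixes so that the opponent is indifferent between their two actions.

```python
from fractions import Fraction


def pure_nash_2x2(A, B):
    """List pure strategy Nash equilibria (row, col) of a 2x2 bimatrix game."""
    equilibria = []
    for i in range(2):
        for j in range(2):
            row_ok = A[i][j] >= A[1 - i][j]  # row player cannot gain by switching
            col_ok = B[i][j] >= B[i][1 - j]  # column player cannot gain by switching
            if row_ok and col_ok:
                equilibria.append((i, j))
    return equilibria


def mixed_nash_2x2(A, B):
    """Return (p, q): P(row plays 0) and P(column plays 0) at the mixed equilibrium.

    Assumes an interior mixed equilibrium exists (nonzero denominators).
    """
    # p makes the column player indifferent between columns 0 and 1.
    p = Fraction(B[1][1] - B[1][0], B[0][0] - B[0][1] - B[1][0] + B[1][1])
    # q makes the row player indifferent between rows 0 and 1.
    q = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return p, q


# Hypothetical payoffs. Rows: defender (0 = monitor, 1 = idle).
# Columns: attacker (0 = attack, 1 = wait).
defender = [[2, -1], [-5, 0]]
attacker = [[-4, 0], [3, 0]]

print(pure_nash_2x2(defender, attacker))   # no pure equilibrium for these numbers
print(mixed_nash_2x2(defender, attacker))  # (P(monitor), P(attack))
```

With these assumed payoffs the defender monitors with probability 3/7 and the attacker attacks with probability 1/8.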
We consider an environment where players are involved in a public goods game and must decide repeatedly whether to make an individual contribution or not. However, players lack strategically relevant information about the game and about the other players in the population. The resulting behavior of players is completely uncoupled from such information, and the individual strategy adjustment dynamics are driven only by reinforcement feedback from each player’s own past. We show that the resulting “directional learning” is sufficient to explain cooperative deviations away from the Nash equilibrium. We introduce the concept of k-strong equilibria, which nest both the Nash equilibrium and the Aumann-strong equilibrium as two special cases, and we show that, together with the parameters of the learning model, the maximal k-strength of equilibrium determines the stationary distribution. The provisioning of public goods can be secured even under adverse conditions, as long as players are sufficiently responsive to the changes in their own payoffs and adjust their actions accordingly. Substantial levels of public cooperation can thus be explained without arguments involving selflessness or social preferences, solely on the basis of uncoordinated directional (mis)learning.
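As background for why cooperation counts as a deviation from the Nash equilibrium here, the following sketch uses the standard linear public goods payoff with binary contributions. The group size and the multiplication factor `r` are assumed values, and the paper's learning model and k-strong equilibria are not reproduced.

```python
def payoff(contributions, i, r):
    """Payoff of player i: equal share of the multiplied pot minus own contribution."""
    n = len(contributions)
    return r * sum(contributions) / n - contributions[i]


def is_nash(contributions, r):
    """True if no single player gains by flipping their own contribution."""
    for i in range(len(contributions)):
        deviated = list(contributions)
        deviated[i] = 1 - deviated[i]
        if payoff(deviated, i, r) > payoff(contributions, i, r):
            return False
    return True


# Assumed parameters: 4 players and multiplication factor r = 2 (so 1 < r < n).
nobody = [0, 0, 0, 0]
everybody = [1, 1, 1, 1]

print(is_nash(nobody, r=2), payoff(nobody, 0, r=2))
print(is_nash(everybody, r=2), payoff(everybody, 0, r=2))
```

For these parameters, universal non-contribution is the Nash equilibrium even though every player earns more when everyone contributes.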
As an equilibrium refinement of the Nash equilibrium, the evolutionarily stable strategy (ESS) is a key concept in evolutionary game theory and has attracted growing interest. An ESS can be either a pure strategy or a mixed strategy. Although a mixed strategy already allows randomness, the probability with which each pure strategy is selected may itself fluctuate under the influence of many factors, which introduces further uncertainty. In this paper, this uncertainty is taken into account: a belief strategy is proposed in terms of Dempster-Shafer evidence theory. Furthermore, based on the proposed belief strategy, a belief-based ESS is developed. The belief strategy and belief-based ESS can reduce to the mixed strategy and mixed ESS, and they provide more realistic and powerful tools to describe interactions among agents.
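For readers unfamiliar with a mixed ESS, the sketch below checks the classical two-part ESS condition numerically in a Hawk-Dove game. The game, the payoff values (V = 2, C = 4), and the finite grid of mutant strategies are illustrative assumptions; the belief strategy based on Dempster-Shafer theory is not implemented.

```python
from fractions import Fraction


def expected_payoff(x, y, M):
    """Payoff to a player using hawk-probability x against hawk-probability y."""
    sx = (x, 1 - x)
    sy = (y, 1 - y)
    return sum(sx[i] * M[i][j] * sy[j] for i in range(2) for j in range(2))


def is_ess(p, M, grid=20):
    """Check the ESS conditions for p against mutants on a finite grid.

    A grid check is only a numerical sanity test, not a proof.
    """
    for k in range(grid + 1):
        q = Fraction(k, grid)
        if q == p:
            continue
        resident = expected_payoff(p, p, M)
        mutant = expected_payoff(q, p, M)
        if mutant > resident:
            return False  # the mutant does strictly better against the resident
        if mutant == resident and expected_payoff(p, q, M) <= expected_payoff(q, q, M):
            return False  # tie against the resident, and no advantage against mutants
    return True


# Hawk-Dove with assumed V = 2, C = 4. Index 0 = hawk, 1 = dove.
# Entries are the row player's payoffs: [[(V - C) / 2, V], [0, V / 2]].
hawk_dove = [[-1, 2], [0, 1]]

print(is_ess(Fraction(1, 2), hawk_dove))  # mixed strategy with P(hawk) = V / C
print(is_ess(Fraction(1), hawk_dove))     # pure hawk
```

With these numbers the mixed strategy that plays hawk with probability V/C = 1/2 passes the check, while pure hawk does not.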
One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which is a combination of reinforcement learning and game theory. Most existing algorithms involve computationally expensive calculation of mixed strategy equilibria and require agents to replicate the other agents' value functions to compute equilibria in each state. This is unrealistic since agents may not be willing to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms without the need for agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria given that a mixed strategy equilibrium is often computationally expensive. In this paper, three types of pure strategy profiles are utilized as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two solution concepts are strategy profiles from which agents can gain higher payoffs than one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, we propose a multistep negotiation process for finding pure strategy equilibria since value functions are not shared among agents. By putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, as compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
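The pure strategy solution concepts named above can be sketched for a one-shot matrix game. The Nash check is the standard one. The `dominating_profiles` helper is only one reading of the abstract's description (every agent strictly better off than in some pure Nash equilibrium), and the paper's formal definition may differ. The negotiation process and Q-learning are not shown, and all names are hypothetical.

```python
from itertools import product


def pure_nash(payoffs, n_actions):
    """Enumerate pure strategy Nash equilibria.

    payoffs maps a joint action (tuple) to a tuple of per-agent payoffs.
    """
    equilibria = []
    for joint in product(*(range(n) for n in n_actions)):
        stable = True
        for agent, n in enumerate(n_actions):
            for alt in range(n):
                deviated = joint[:agent] + (alt,) + joint[agent + 1:]
                if payoffs[deviated][agent] > payoffs[joint][agent]:
                    stable = False  # this agent gains by deviating unilaterally
        if stable:
            equilibria.append(joint)
    return equilibria


def dominating_profiles(payoffs, equilibria):
    """Assumed reading: profiles where every agent beats some pure Nash equilibrium."""
    result = []
    for joint, u in payoffs.items():
        if joint in equilibria:
            continue
        if any(all(a > b for a, b in zip(u, payoffs[eq])) for eq in equilibria):
            result.append(joint)
    return result


# Prisoner's dilemma as a toy example: 0 = cooperate, 1 = defect.
pd = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

nash = pure_nash(pd, (2, 2))
print(nash)
print(dominating_profiles(pd, nash))
```

In the prisoner's dilemma, mutual defection is the only pure Nash equilibrium, and mutual cooperation is the profile that gives both agents more than that equilibrium.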
In this paper, we envisage the architecture of Green Wireless Body Area Nanonetwork (GBAN) as a collection of nanodevices, in which each device is capable of communicating in both the molecular and wireless electromagnetic communication modes. The term green refers to the fact that the nanodevices in such a network can harvest energy from their surrounding environment, so that no nanodevice reaches end of life solely because of energy depletion. However, the residual energy of a nanodevice can deplete substantially over time if the rate of energy consumption is not comparable with the rate of energy harvesting. It is observed that the rate of energy harvesting is nonlinear and sporadic in nature. So, the management of energy of the nanodevices is fundamentally important. We specifically address this problem in a ubiquitous healthcare monitoring scenario and formulate it as a cooperative Nash Bargaining game. The optimal strategy obtained from the Nash equilibrium solution provides improved network performance in terms of throughput and delay.
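A cooperative Nash bargaining formulation picks the allocation that maximizes the product of each party's utility gain over its disagreement payoff. The sketch below applies that textbook rule to two devices sharing a fixed energy budget. The utility functions, the budget, and the grid search are assumptions made for illustration and are not the model from the paper.

```python
import math


def nash_bargaining_split(total, u1, u2, d1=0.0, d2=0.0, steps=900):
    """Grid-search the share x for device 1 that maximizes the Nash product.

    Device 1 receives x and device 2 receives total - x. d1 and d2 are the
    disagreement payoffs. All of these are illustrative assumptions.
    """
    best_x, best_product = None, -1.0
    for k in range(steps + 1):
        x = total * k / steps
        gain1 = u1(x) - d1
        gain2 = u2(total - x) - d2
        if gain1 < 0 or gain2 < 0:
            continue  # skip allocations worse than disagreement for someone
        nash_product = gain1 * gain2
        if nash_product > best_product:
            best_x, best_product = x, nash_product
    return best_x


# Hypothetical utilities: device 1 has diminishing returns (square root),
# device 2 values energy linearly. Budget of 9 energy units.
share = nash_bargaining_split(9.0, math.sqrt, lambda e: e)
print(f"device 1 gets {share}, device 2 gets {9.0 - share}")
```

For these assumed utilities the maximizer can be checked by hand: setting the derivative of sqrt(x) * (9 - x) to zero gives x = 3.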