Concept: Game theory
- Proceedings of the National Academy of Sciences of the United States of America
- Published about 5 years ago
Recent work has revealed a new class of “zero-determinant” (ZD) strategies for iterated, two-player games. ZD strategies allow a player to unilaterally enforce a linear relationship between her score and her opponent’s score, and thus to achieve an unusual degree of control over both players' long-term payoffs. Although originally conceived in the context of classical two-player game theory, ZD strategies also have consequences in evolving populations of players. Here, we explore the evolutionary prospects for ZD strategies in the Iterated Prisoner’s Dilemma (IPD). Several recent studies have focused on the evolution of “extortion strategies,” a subset of ZD strategies, and have found them to be unsuccessful in populations. Nevertheless, we identify a different subset of ZD strategies, called “generous ZD strategies,” that forgive defecting opponents but nonetheless dominate in evolving populations. For all but the smallest population sizes, generous ZD strategies are not only robust to being replaced by other strategies but can selectively replace any noncooperative ZD strategy. Generous strategies can be generalized beyond the space of ZD strategies, and they remain robust to invasion. When evolution occurs on the full set of all IPD strategies, selection disproportionately favors these generous strategies. In some regimes, generous strategies outperform even the most successful of the well-known IPD strategies, including win-stay-lose-shift.
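The linear relationship that a ZD strategy enforces can be checked numerically. The sketch below is an illustration only, not the authors' code: it assumes the conventional IPD payoffs (R, S, T, P) = (3, 0, 5, 1), memory-one strategies written as cooperation probabilities after the outcomes (CC, CD, DC, DD), and the standard parametrization in which player X enforces s_X - kappa = chi * (s_Y - kappa). Taking kappa = P gives an extortion strategy and kappa = R a generous one. The function names and the parameters `kappa`, `chi`, and `phi` are labels chosen for this example.

```python
def zd_strategy(kappa, chi, phi, R=3.0, S=0.0, T=5.0, P=1.0):
    """Memory-one strategy (p_CC, p_CD, p_DC, p_DD) built to enforce
    s_X - kappa = chi * (s_Y - kappa). Assumed standard parametrization."""
    x_pay = [R, S, T, P]  # X's payoff in states CC, CD, DC, DD
    y_pay = [R, T, S, P]  # Y's payoff in the same states
    tilde = [phi * ((a - kappa) - chi * (b - kappa)) for a, b in zip(x_pay, y_pay)]
    p = [tilde[0] + 1.0, tilde[1] + 1.0, tilde[2], tilde[3]]
    if not all(0.0 <= prob <= 1.0 for prob in p):
        raise ValueError("kappa, chi, phi do not give valid probabilities")
    return p


def stationary(p, q, iters=5000):
    """Stationary distribution over (CC, CD, DC, DD) by power iteration.
    States list X's move first; q is written from Y's own point of view."""
    q_view = [q[0], q[2], q[1], q[3]]  # Y sees CD and DC with roles swapped
    rows = [
        [px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
        for px, qy in zip(p, q_view)
    ]
    v = [0.25] * 4
    for _ in range(iters):
        v = [sum(v[i] * rows[i][j] for i in range(4)) for j in range(4)]
    return v


def payoffs(p, q, R=3.0, S=0.0, T=5.0, P=1.0):
    """Long-run average payoffs (s_X, s_Y) for X playing p against Y playing q."""
    v = stationary(p, q)
    s_x = v[0] * R + v[1] * S + v[2] * T + v[3] * P
    s_y = v[0] * R + v[1] * T + v[2] * S + v[3] * P
    return s_x, s_y


# A generous ZD strategy (kappa = R) against an arbitrary opponent.
generous = zd_strategy(kappa=3.0, chi=2.0, phi=0.1)
s_x, s_y = payoffs(generous, [0.7, 0.2, 0.6, 0.3])
print(s_x - 3.0, 2.0 * (s_y - 3.0))  # the two sides should agree
```

Whatever memory-one strategy the opponent plays, the two printed quantities should match up to numerical error, which is the sense in which the relationship is enforced unilaterally.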
Rock, Paper, Scissors (RPS) represents a unique gaming space in which the predictions of human rational decision-making can be compared with actual performance. Playing a computerized opponent adopting a mixed-strategy equilibrium, participants revealed a non-significant tendency to over-select Rock. Further violations of rational decision-making were observed using an inter-trial analysis where participants were more likely to switch their item selection at trial n + 1 following a loss or draw at trial n, revealing the strategic vulnerability of individuals following the experience of negative rather than positive outcomes. Unique switch strategies related to each of these trial n outcomes were also identified: after losing, participants were more likely to ‘downgrade’ their item (e.g., Rock followed by Scissors), but after drawing, participants were more likely to ‘upgrade’ their item (e.g., Rock followed by Paper). Further repetition analysis revealed that participants were more likely to continue their specific cyclic item change strategy into trial n + 2. The data reveal the strategic vulnerability of individuals following the experience of negative rather than positive outcomes, the tensions between behavioural and cognitive influences on decision making, and underline the dangers of increased behavioural predictability in other recursive, non-cooperative environments such as economics and politics.
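The inter-trial analysis described above can be sketched as a small tabulation. This is a minimal illustration under assumptions, not the study's analysis code: moves are encoded as the letters R, P, and S, an "upgrade" is a switch to the item that beats the previous one (Rock to Paper), a "downgrade" is a switch to the item that loses to it (Rock to Scissors), and the function names are invented for this example.

```python
from collections import Counter, defaultdict

BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value


def outcome(player, opponent):
    """Result of one trial from the player's point of view."""
    if player == opponent:
        return "draw"
    return "win" if BEATS[player] == opponent else "loss"


def shift(previous, current):
    """Classify the change in the player's item between consecutive trials."""
    if current == previous:
        return "stay"
    # An upgrade moves to the item that beats the previous item.
    return "upgrade" if BEATS[current] == previous else "downgrade"


def shift_rates(player_moves, opponent_moves):
    """Proportion of stay/upgrade/downgrade at trial n + 1, by outcome at trial n."""
    counts = defaultdict(Counter)
    for n in range(len(player_moves) - 1):
        result = outcome(player_moves[n], opponent_moves[n])
        counts[result][shift(player_moves[n], player_moves[n + 1])] += 1
    return {
        result: {kind: c / sum(tally.values()) for kind, c in tally.items()}
        for result, tally in counts.items()
    }
```

A player who is unpredictable in the mixed-strategy sense would show roughly equal proportions of the three shift types after every outcome; the pattern reported in the abstract would appear as a surplus of downgrades after losses and of upgrades after draws.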
Dopamine has a central role in motivation and reward. Dopaminergic neurons in the ventral tegmental area (VTA) signal the discrepancy between expected and actual rewards (that is, reward prediction error), but how they compute such signals is unknown. We recorded the activity of VTA neurons while mice associated different odour cues with appetitive and aversive outcomes. We found three types of neuron based on responses to odours and outcomes: approximately half of the neurons (type I, 52%) showed phasic excitation after reward-predicting odours and rewards in a manner consistent with reward prediction error coding; the other half of neurons showed persistent activity during the delay between odour and outcome that was modulated positively (type II, 31%) or negatively (type III, 18%) by the value of outcomes. Whereas the activity of type I neurons was sensitive to actual outcomes (that is, when the reward was delivered as expected compared to when it was unexpectedly omitted), the activity of type II and type III neurons was determined predominantly by reward-predicting odours. We ‘tagged’ dopaminergic and GABAergic neurons with the light-sensitive protein channelrhodopsin-2 and identified them based on their responses to optical stimulation while recording. All identified dopaminergic neurons were of type I and all GABAergic neurons were of type II. These results show that VTA GABAergic neurons signal expected reward, a key variable for dopaminergic neurons to calculate reward prediction error.
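The reward prediction error mentioned above is, in its simplest textbook form, the difference between the reward received and the reward expected. The sketch below uses a Rescorla-Wagner style update to show how the two quantities interact; it is a generic illustration, not the model or analysis used in the study, and the learning rate `alpha` and the function name are assumptions.

```python
def run_trials(rewards, alpha=0.1, value=0.0):
    """Update an expected-reward estimate across trials for a single cue.

    Returns the final expectation and the prediction error on each trial.
    """
    errors = []
    for reward in rewards:
        delta = reward - value   # prediction error: actual minus expected
        value += alpha * delta   # move the expectation toward the outcome
        errors.append(delta)
    return value, errors


# Learn that a cue predicts reward, then omit the reward once.
learned_value, learning_errors = run_trials([1.0] * 200)
_, omission_errors = run_trials([0.0], value=learned_value)
print(learning_errors[0], learning_errors[-1], omission_errors[0])
```

Early in learning the error is large and positive, it shrinks toward zero once the reward is fully expected, and an unexpected omission produces a negative error. In this picture the expectation term plays the role the abstract assigns to the expected-reward signal.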
- Proceedings of the National Academy of Sciences of the United States of America
- Published almost 6 years ago
The strong reciprocity model of the evolution of human cooperation has gained some acceptance, partly on the basis of support from experimental findings. The observation that unfair offers in the ultimatum game are frequently rejected constitutes an important piece of the experimental evidence for strong reciprocity. In the present study, we have challenged the idea that the rejection response in the ultimatum game provides evidence of the assumption held by strong reciprocity theorists that negative reciprocity observed in the ultimatum game is inseparably related to positive reciprocity as the two sides of a preference for fairness. The prediction of an inseparable relationship between positive and negative reciprocity was rejected on the basis of the results of a series of experiments that we conducted using the ultimatum game, the dictator game, the trust game, and the prisoner’s dilemma game. We did not find any correlation between the participants' tendencies to reject unfair offers in the ultimatum game and their tendencies to exhibit various prosocial behaviors in the other games, including their inclinations to positively reciprocate in the trust game. The participants' responses to postexperimental questions add support to the view that the rejection of unfair offers in the ultimatum game is a tacit strategy for avoiding the imposition of an inferior status.
Recent theories from complexity science argue that complex dynamics are ubiquitous in social and economic systems. These claims emerge from the analysis of individually simple agents whose collective behavior is surprisingly complicated. However, economists have argued that iterated reasoning (what you think I think you think) will suppress complex dynamics by stabilizing or accelerating convergence to Nash equilibrium. We report stable and efficient periodic behavior in human groups playing the Mod Game, a multi-player game similar to Rock-Paper-Scissors. The game rewards subjects for thinking exactly one step ahead of others in their group. Groups that play this game exhibit cycles that are inconsistent with any fixed-point solution concept. These cycles are driven by a “hopping” behavior that is consistent with other accounts of iterated reasoning: agents are constrained to about two steps of iterated reasoning and learn an additional one-half step with each session. If higher-order reasoning can be complicit in complex emergent dynamics, then cyclic and chaotic patterns may be endogenous features of real-world social and economic systems.
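A scoring rule of the kind described, where a player is rewarded for being exactly one step ahead of others, can be sketched as follows. The details are assumptions made for illustration: choices are integers from 1 to `m` arranged in a circle (with `m = 24` as a default), and a player earns one point for every other player whose choice is exactly one below their own, so that 1 beats `m`. The abstract does not state these specifics.

```python
from collections import Counter


def mod_game_scores(choices, m=24):
    """Score one round: each player gets a point per player exactly one step behind.

    choices: list of integers in 1..m, one per player (assumed encoding).
    """
    counts = Counter(choices)
    scores = []
    for choice in choices:
        one_below = (choice - 2) % m + 1  # wraps so that 1 is one step ahead of m
        scores.append(counts[one_below])
    return scores


print(mod_game_scores([1, 2, 2, 24]))
```

Because every choice is beaten by the choice one step above it, no pure choice is safe, which is what makes the game a many-option relative of Rock-Paper-Scissors.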
We study evolutionary game dynamics on structured populations in which individuals take part in several layers of networks of interactions simultaneously. This multiplex of interdependent networks accounts for the different kinds of social ties each individual has. By coupling the evolutionary dynamics of a Prisoner’s Dilemma game in each of the networks, we show that the resilience of cooperative behaviors for extremely large values of the temptation to defect is enhanced by the multiplex structure. Furthermore, this resilience is intrinsically related to a non-trivial organization of cooperation across the network layers, thus providing a new way out for cooperation to survive in structured populations.
Humans and animals face decision tasks in an uncertain multi-agent environment where an agent’s strategy may change in time due to the co-adaptation of others’ strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
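To make the notion of a mixed (stochastic) Nash equilibrium concrete, the sketch below solves a two-player, two-action game by the standard indifference conditions: each player mixes so that the other is indifferent between their two actions. It does not implement the spiking-neuron learning model, and the inspection-game payoffs in the example are hypothetical numbers chosen for illustration, not those used in the study.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(row_payoffs, col_payoffs):
    """Interior mixed equilibrium of a 2x2 game via indifference conditions.

    row_payoffs[i][j], col_payoffs[i][j]: payoffs when row plays i, column plays j.
    Returns (p, q): probability that row plays action 0 and column plays action 0.
    """
    a, b = row_payoffs, col_payoffs
    p_den = b[0][0] - b[1][0] - b[0][1] + b[1][1]
    q_den = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    if p_den == 0 or q_den == 0:
        raise ValueError("no unique interior mixed equilibrium")
    p = Fraction(b[1][1] - b[1][0], p_den)  # makes the column player indifferent
    q = Fraction(a[1][1] - a[0][1], q_den)  # makes the row player indifferent
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("equilibrium is not strictly mixed")
    return p, q


# Hypothetical inspection game: rows = (work, shirk), columns = (inspect, not inspect).
worker = [[1, 1], [0, 2]]
inspector = [[-1, 0], [1, -2]]
print(mixed_equilibrium_2x2(worker, inspector))
```

A learner that settles on a deterministic action in such a game can be exploited, which is why the abstract singles out performance on the stochastic strategy.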
The fair division of a surplus is one of the most widely examined problems. This paper focuses on bargaining problems with fixed disagreement payoffs where risk-neutral agents have reached an agreement that is the Nash-bargaining solution (NBS). We consider a stochastic environment, in which the overall return consists of multiple pies with uncertain sizes, and we examine how these pies can be allocated fairly among agents. Specifically, fairness is based on Aristotle’s maxim: “equals should be treated equally and unequals unequally, in proportion to the relevant inequality”. In this context, fairness is achieved when all the individual stochastic surplus shares which are allocated to agents are distributed in proportion to the NBS. We introduce a novel algorithm, which can be used to compute the ratio of each pie that should be allocated to each agent, in order to ensure fairness within a symmetric or asymmetric NBS.
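For two risk-neutral agents with fixed disagreement payoffs, the Nash-bargaining solution has a well-known closed form: each agent receives their disagreement payoff plus a share of the remaining surplus set by their bargaining weight. The sketch below computes that split for a single pie of known size. It is not the paper's allocation algorithm for multiple uncertain pies; the function name and the weight parameter `alpha` (0.5 for the symmetric case) are assumptions for this example.

```python
def nash_bargaining_split(total, d1, d2, alpha=0.5):
    """Two-agent NBS with linear utilities.

    Maximizes (u1 - d1)**alpha * (u2 - d2)**(1 - alpha) subject to u1 + u2 = total.
    alpha is agent 1's bargaining weight; alpha = 0.5 is the symmetric NBS.
    """
    surplus = total - d1 - d2
    if surplus < 0:
        raise ValueError("total is smaller than the sum of disagreement payoffs")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return d1 + alpha * surplus, d2 + (1.0 - alpha) * surplus


print(nash_bargaining_split(10.0, 2.0, 0.0))              # symmetric case
print(nash_bargaining_split(10.0, 2.0, 0.0, alpha=0.75))  # asymmetric case
```

The proportions produced here are the kind of target ratio that, per the abstract, the allocated stochastic shares should follow.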
Using clinical indicators to facilitate quality improvement via the accreditation process: an adaptive study into the control relationship
- International journal for quality in health care : journal of the International Society for Quality in Health Care / ISQua
- Published over 5 years ago
OBJECTIVE: The aim of the study was to determine accreditation surveyors' and hospitals' use and perceived usefulness of clinical indicator reports and the potential to establish the control relationship between the accreditation and reporting systems. The control relationship refers to instructional directives, arising from appropriately designed methods and efforts towards using clinical indicators, which provide a directed moderating, balancing and best outcome for the connected systems. DESIGN: Web-based questionnaire survey. SETTING: Australian Council on Healthcare Standards' (ACHS) accreditation and clinical indicator programmes. RESULTS: Seventy-three of 306 surveyors responded. Half used the reports always/most of the time. Five key messages were revealed: (i) report use was related to availability before on-site investigation; (ii) report use was associated with the use of non-ACHS reports; (iii) a clinical indicator set’s perceived usefulness was associated with its reporting volume across hospitals; (iv) simpler measures and visual summaries in reports were rated the most useful; (v) reports were deemed to be suitable for the quality and safety objectives of the key groups of interested parties (hospitals' senior executive and management officers, clinicians, quality managers and surveyors). CONCLUSIONS: Implementing the control relationship between the reporting and accreditation systems is a promising expectation. Redesigning processes to ensure reports are available in pre-survey packages and refined education of surveyors and hospitals on how to better utilize the reports will support the relationship. Additional studies on the systems' theory-based model of the accreditation and reporting system are warranted to establish the control relationship, building integrated system-wide relationships with sustainable and improved outcomes.
We study the dynamics of a predator-prey system where predators fight for captured prey in addition to searching for and handling (and digesting) the prey. Fighting for prey is modelled by continuous-time hawk-dove game dynamics in which the gain depends on the amount of disputed prey while the cost of fighting is constant per fighting event. The strategy of the predator population is quantified by a trait: the proportion of predator individuals playing the hawk tactic. The dynamics of the trait are described by two models of adaptation: the replicator dynamics (RD) and the adaptive dynamics (AD). In the RD approach, a variant individual with an adapted trait value changes the population’s strategy, and consequently its trait value, only when its payoff is larger than the population average. In the AD approach, successful replacement of the resident population after invasion of a rare variant population with an adapted trait value is a step in a sequence changing the population’s strategy, and hence its trait value. The main aim is to compare the consequences of the two adaptation models. In an equilibrium predator-prey system this will lead to convergence to a neutral singular strategy, while in the oscillatory system it leads to a continuous singular strategy, an endpoint at which the resident population is not invasible by any variant population. In equilibrium (low prey carrying capacity) the RD and AD approaches give the same results, but not always in a periodically oscillating system (high prey carrying capacity) where the trait is density-dependent. For low costs the predator population is monomorphic (only hawks), while for high costs it is dimorphic (hawks and doves). These results illustrate that intra-specific trait dynamics matter in predator-prey dynamics.
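The replicator dynamics for the hawk fraction can be illustrated with the textbook hawk-dove game. The sketch below is a simplification made for illustration: it assumes a constant gain `V` per contest and a constant cost `C` per fight, whereas in the study the gain depends on the amount of disputed prey and is coupled to the predator-prey dynamics, which are omitted here. The function name, step size, and initial condition are assumptions.

```python
def hawk_fraction(V, C, x0=0.1, dt=0.01, steps=20000):
    """Euler integration of replicator dynamics for the hawk fraction x.

    Textbook payoffs: hawk vs hawk (V - C) / 2, hawk vs dove V,
    dove vs hawk 0, dove vs dove V / 2.
    """
    x = x0
    for _ in range(steps):
        # Payoff advantage of hawks over doves at the current hawk fraction.
        advantage = (V - x * C) / 2.0
        x += dt * x * (1.0 - x) * advantage
    return x


print(hawk_fraction(V=2.0, C=1.0))  # low cost: hawks take over
print(hawk_fraction(V=2.0, C=4.0))  # high cost: hawks and doves coexist
```

In this simplified setting the hawk fraction tends to 1 when the cost is below the gain and to V / C otherwise, which mirrors the monomorphic and dimorphic outcomes described in the abstract.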