Humans and animals face decision tasks in an uncertain multi-agent environment where an agent’s strategy may change over time due to the co-adaptation of others’ strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for blackjack and the inspector game, performing optimally according to a pure (deterministic) and a mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
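To make the pure/mixed distinction concrete, the sketch below solves a 2x2 inspector game for its mixed Nash equilibrium from the two indifference conditions. It is not the population coding model; the payoff parameterization (wage `w`, effort cost `g`, value of work `v`, inspection cost `h`) and the helper names `mixed_nash_2x2` and `inspector_game` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def mixed_nash_2x2(A, B):
    """Fully mixed Nash equilibrium of a 2x2 bimatrix game via indifference.

    A[i][j], B[i][j]: row / column payoffs when row plays i and column plays j.
    Returns (p, q) = (P(row plays 0), P(column plays 0)).
    Assumes an interior equilibrium exists (nonzero denominators, values in (0, 1)).
    """
    # Column's mix q makes the row player indifferent between rows 0 and 1.
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # Row's mix p makes the column player indifferent between columns 0 and 1.
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[1][0] - B[0][1] + B[1][1])
    return p, q

def inspector_game(w, g, v, h):
    # Hypothetical payoffs. Row: employee (0 = work, 1 = shirk).
    # Column: inspector (0 = inspect, 1 = don't inspect). A caught shirker earns 0.
    A = [[w - g, w - g],
         [0,     w    ]]
    B = [[v - w - h, v - w],
         [-h,        -w   ]]
    return A, B

# Fractions keep the equilibrium exact.
p, q = mixed_nash_2x2(*inspector_game(Fraction(2), Fraction(1), Fraction(4), Fraction(1, 2)))
print(p, q)  # 3/4 1/2
```

Under this assumed parameterization the employee works with probability 1 - h/w and the inspector inspects with probability g/w, so each player's equilibrium mix is pinned down by the other player's payoffs, and a learner must represent a stochastic policy to play it.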
A number of new crops have been developed that address important traits of particular relevance for smallholder farmers in Africa. Scientists, policy makers, and other stakeholders have raised concerns that the approval process for these new crops causes delays that are often scientifically unjustified. This article develops a real option model for the optimal regulation of a risky technology that enhances economic welfare and reduces malnutrition. We consider gradual adoption of the technology and show that delaying approval reduces uncertainty about the perceived risks of the technology. Optimal conditions for approval incorporate parameters of the stochastic processes governing the dynamics of risk. The model is applied to three cases of improved crops that either are, or are expected to be, delayed by the regulatory process. The benefits and costs of the crops are presented in a partial equilibrium framework that considers changes in adoption over time and the foregone benefits caused by a delay in approval under irreversibility and uncertainty. We derive the equilibrium conditions under which the net benefits of the technology equal the costs that would justify a delay. The sooner information about the safety of the technology arrives, the lower the costs that justify a delay need to be; i.e., it pays more to delay. The costs of a delay can be substantial: e.g., a one-year delay in the approval of pod-borer-resistant cowpea in Nigeria will cost the country about 33 million USD to 46 million USD and between 100 and 3,000 lives.
A large fraction of microbial life on earth exists in complex communities where metabolic exchange is vital. Microbes trade essential resources to promote their own growth in a way analogous to how countries exchange goods in modern economic markets. Inspired by these similarities, we developed a framework based on general equilibrium theory (GET) from economics to predict the population dynamics of trading microbial communities. Our biotic GET (BGET) model provides an a priori theory of the growth benefits of microbial trade, yielding several novel insights relevant to understanding microbial ecology and engineering synthetic communities. We find that the economic concept of comparative advantage is a necessary condition for mutualistic trade. Our model suggests that microbial communities can grow faster when species are unable to produce essential resources that are obtained through trade, thereby promoting metabolic specialization and increased intercellular exchange. Furthermore, we find that species engaged in trade exhibit a fundamental tradeoff between growth rate and relative population abundance, and that environments that put greater pressure on group selection versus individual selection will promote different strategies along this growth-abundance spectrum. We experimentally tested this tradeoff using a synthetic consortium of Escherichia coli cells and found that the results match the predictions of the model. This framework provides a foundation to study natural and engineered microbial communities through a new lens based on economic theories developed over the past century.
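The comparative-advantage condition can be illustrated with the textbook Ricardian calculation, used here as a stand-in and not as the BGET model itself: each producer is assumed to have a linear tradeoff between making resource X and resource Y, and the one with the lower opportunity cost of X should specialise in X. The function names and production rates below are hypothetical.

```python
from fractions import Fraction

def opportunity_cost_of_x(rate_x, rate_y):
    # One unit of X takes 1/rate_x of the time budget, which could instead
    # have produced rate_y/rate_x units of Y (linear tradeoff assumed).
    return Fraction(rate_y, rate_x)

def comparative_advantage(rates):
    """rates: {name: (rate_x, rate_y)} for exactly two producers (integer rates).

    Returns (name that should specialise in X, name that should specialise in Y),
    or None when opportunity costs are equal and there is no comparative advantage.
    """
    (n1, r1), (n2, r2) = rates.items()
    c1, c2 = opportunity_cost_of_x(*r1), opportunity_cost_of_x(*r2)
    if c1 == c2:
        return None
    return (n1, n2) if c1 < c2 else (n2, n1)

# Hypothetical rates: A is faster at producing both resources (absolute advantage),
# but its opportunity cost of X (2/3 Y) is lower than B's (1 Y).
rates = {"A": (6, 4), "B": (2, 2)}
print(comparative_advantage(rates))  # ('A', 'B')
```

In the example A is better at producing both resources, yet specialisation can still pay because the opportunity costs differ; when they are equal the function returns `None`, matching the idea that comparative advantage is necessary for gains from trade.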
Mechanisms supporting human ultra-cooperativeness are very much a subject of debate. One psychological feature likely to be relevant is the formation of expectations, particularly about receiving cooperative or generous behavior from others. Without such expectations, social life would be seriously impeded and, in turn, expectations leading to satisfactory interactions can become norms and institutionalize cooperation. In this paper, we assess people’s expectations of generosity in a series of controlled experiments using the dictator game. Despite differences in roles, involvement in the game, degree of social distance, and stakes, the results are conclusive: subjects seldom predict that dictators will behave selfishly (by choosing the Nash equilibrium action, namely giving nothing). The majority of subjects expect that dictators will choose the equal split. This implies that generous behavior is not only observed in the lab, but also expected by subjects. In addition, expectations are accurate, closely matching the observed donations and showing that, as a society, we have a good grasp of how we interact. Finally, the correlation between expectations and actual behavior suggests that expectations can be an important ingredient of generous or cooperative behavior.
A long tradition of cultural evolutionary studies has developed a rich repertoire of mathematical models of social learning. Early studies laid the foundation for more recent endeavours to infer patterns of cultural transmission from observed frequencies of a variety of cultural data, from decorative motifs on potsherds to baby names and musical preferences. While this wide range of applications provides an opportunity for the development of generalisable analytical workflows, archaeological data present new questions and challenges that require further methodological and theoretical discussion. Here we examine the decorative motifs of Neolithic pottery from an archaeological assemblage in Western Germany, and argue that the widely used (and relatively undiscussed) assumption that observed frequencies are the result of a system in equilibrium is unwarranted and can lead to incorrect conclusions. We analyse our data with a simulation-based inferential framework that can overcome some of the intrinsic limitations of archaeological data, as well as handle both equilibrium conditions and instances where the mode of cultural transmission is time-variant. Results suggest that none of the models examined can produce the observed pattern under equilibrium conditions, and instead point to temporal shifts in the patterns of cultural transmission.
Cooperative behavior, where one individual incurs a cost to help another, is a widespread phenomenon. Here we study direct reciprocity in the context of the alternating Prisoner’s Dilemma. We consider all strategies that can be implemented by one- and two-state automata. We calculate the payoff matrix of all pairwise encounters in the presence of noise. We explore deterministic selection dynamics with and without mutation. Using different error rates and payoff values, we observe convergence to a small number of distinct equilibria. Two of them are uncooperative strict Nash equilibria representing always-defect (ALLD) and Grim. The third equilibrium is mixed and represents a cooperative alliance of several strategies, dominated by a strategy which we call Forgiver. Forgiver cooperates whenever the opponent has cooperated; it defects once when the opponent has defected, but subsequently attempts to re-establish cooperation even if the opponent has defected again. Forgiver is not an evolutionarily stable strategy, but the alliance it rules is asymptotically stable. For a wide range of parameter values, the most commonly observed outcome is convergence to the mixed equilibrium dominated by Forgiver. Our results show that although forgiving might incur a short-term loss, it can lead to a long-term gain. Forgiveness facilitates stable cooperation in the presence of exploitation and noise.
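The verbal description of Forgiver maps onto a two-state automaton whose current state is also its next move. The sketch below encodes that reading, with Grim for contrast. The transition tables, the dictionary encoding, and the helper `respond` are illustrative assumptions, and the noise and payoffs used in the study are omitted.

```python
# Each automaton: (initial state, {state: {opponent_move: next_state}}).
# The current state is also the move played ("C" = cooperate, "D" = defect).
FORGIVER = ("C", {"C": {"C": "C", "D": "D"},   # defect once after a defection...
                  "D": {"C": "C", "D": "C"}})  # ...then return to cooperation regardless
GRIM = ("C", {"C": {"C": "C", "D": "D"},
              "D": {"C": "D", "D": "D"}})      # once defecting, never forgive

def respond(automaton, opponent_moves):
    """Moves of a reactive automaton, given the opponent's sequence of moves.

    Each round the automaton plays its current state, then transitions on the
    opponent's move in that round.
    """
    state, table = automaton
    moves = []
    for opp in opponent_moves:
        moves.append(state)
        state = table[state][opp]
    return "".join(moves)

print(respond(FORGIVER, "CDCC"))  # CCDC
print(respond(GRIM, "CDCC"))      # CCDD
```

A single defection by the opponent sends Grim into permanent defection, whereas Forgiver retaliates once and then returns to cooperation.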
Socio-ecological systems are increasingly modelled by games played on complex networks. While the concept of Nash equilibrium assumes perfect rationality, in reality players display heterogeneous bounded rationality. Here we present a topological model of bounded rationality in socio-ecological systems, using the rationality parameter of the Quantal Response Equilibrium. We argue that system rationality could be measured by the average Kullback–Leibler divergence between Nash and Quantal Response Equilibria, and that the convergence towards Nash equilibria on average corresponds to increased system rationality. Using this model, we show that when a randomly connected socio-ecological system is topologically optimised to converge towards Nash equilibria, scale-free and small world features emerge. Therefore, optimising system rationality is an evolutionary reason for the emergence of scale-free and small-world features in socio-ecological systems. Further, we show that in games where multiple equilibria are possible, the correlation between the scale-freeness of the system and the fraction of links with multiple equilibria goes through a rapid transition when the average system rationality increases. Our results explain the influence of the topological structure of socio-ecological systems in shaping their collective cognitive behaviour, and provide an explanation for the prevalence of scale-free and small-world characteristics in such systems.
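A minimal sketch of the rationality measure, under assumptions the abstract does not fix: a logit QRE with rationality parameter `lam`, computed by damped fixed-point iteration, and the divergence taken as KL(Nash || QRE) (finite when the Nash equilibrium is pure), averaged over the two players of a single link. The donation-game payoffs (benefit 3, cost 1) and all function names are hypothetical.

```python
import math

def logistic(x):
    # Numerically stable 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logit_qre_2x2(A, B, lam, iters=200, damping=0.5):
    """Damped fixed-point iteration for the logit QRE of a 2x2 bimatrix game.

    A[i][j], B[i][j]: row / column payoffs when row plays i and column plays j.
    Returns (p, q), the probabilities that row and column choose action 0.
    """
    p = q = 0.5
    for _ in range(iters):
        # Payoff advantage of action 0 over action 1 against the opponent's current mix.
        row_gap = (A[0][0] - A[1][0]) * q + (A[0][1] - A[1][1]) * (1 - q)
        col_gap = (B[0][0] - B[0][1]) * p + (B[1][0] - B[1][1]) * (1 - p)
        # Logit response, blended with the previous mix (damping).
        p = damping * p + (1 - damping) * logistic(lam * row_gap)
        q = damping * q + (1 - damping) * logistic(lam * col_gap)
    return p, q

def kl(nash, qre):
    # KL(nash || qre) over two actions; actions with zero Nash mass contribute 0.
    return sum(n * math.log(n / r) for n, r in zip(nash, qre) if n > 0)

def link_divergence(nash_pq, qre_pq):
    # Average of the row and column players' divergences on one link.
    divs = [kl((n, 1 - n), (r, 1 - r)) for n, r in zip(nash_pq, qre_pq)]
    return sum(divs) / len(divs)

# Hypothetical donation-game Prisoner's Dilemma (action 0 = cooperate, 1 = defect),
# benefit b = 3, cost c = 1; the unique Nash equilibrium is mutual defection.
A = [[2, -1], [3, 0]]
B = [[2, 3], [-1, 0]]
NASH = (0, 0)

for lam in (0.0, 1.0, 5.0):
    print(lam, link_divergence(NASH, logit_qre_2x2(A, B, lam)))
```

Because defection strictly dominates in this game, the per-player divergence reduces to log(1 + e^(-lam)): it equals log 2 at lam = 0 and shrinks toward 0 as rationality increases, consistent with the convergence toward Nash equilibria described above.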
- Proceedings of the National Academy of Sciences of the United States of America
- Published almost 4 years ago
Information transfer is a basic feature of life that includes signaling within and between organisms. Owing to its interactive nature, signaling can be investigated using game theory. Game theoretic models of signaling have a long tradition in biology, economics, and philosophy. For a long time, the analyses of these games have mostly relied on static equilibrium concepts such as Pareto-optimal Nash equilibria or evolutionarily stable strategies. More recently, signaling games of various types have been investigated with the help of game dynamics, which include dynamical models of evolution and individual learning. A dynamical analysis leads to more nuanced conclusions as to the outcomes of signaling interactions. Here we explore different kinds of signaling games that range from interactions without conflicts of interest between the players to interactions where their interests are seriously misaligned. We consider these games within the context of evolutionary dynamics (both infinite and finite population models) and learning dynamics (reinforcement learning). Some results are specific features of a particular dynamical model, whereas others turn out to be quite robust across different models. This suggests that there are certain qualitative aspects that are common to many real-world signaling interactions.
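As an example of the learning dynamics mentioned above, the sketch below runs basic Roth-Erev (urn) reinforcement in a Lewis signaling game without conflict of interest: a sender observes a state and emits a signal, a receiver sees only the signal and chooses an act, and both are rewarded when the act matches the state. The number of states, initial weights of 1, reward of 1, window size, and the function name are assumptions; the study's specific games and parameters are not reproduced. For two equiprobable states this rule is known to approach a signaling system, so the returned success rate should end up well above the 1/2 chance level, though the exact value depends on the seed.

```python
import random

def lewis_roth_erev(n=2, rounds=10_000, window=1_000, seed=0):
    """Basic Roth-Erev reinforcement in an n-state / n-signal / n-act Lewis game.

    Returns the fraction of successful plays over the last `window` rounds.
    """
    rng = random.Random(seed)
    sender = [[1.0] * n for _ in range(n)]    # sender[state][signal] urn weights
    receiver = [[1.0] * n for _ in range(n)]  # receiver[signal][act] urn weights
    outcomes = []
    for _ in range(rounds):
        state = rng.randrange(n)  # equiprobable states
        # Each choice is drawn with probability proportional to its accumulated weight.
        signal = rng.choices(range(n), weights=sender[state])[0]
        act = rng.choices(range(n), weights=receiver[signal])[0]
        success = act == state
        if success:
            # Both players receive payoff 1 and reinforce the choices just made.
            sender[state][signal] += 1.0
            receiver[signal][act] += 1.0
        outcomes.append(success)
    recent = outcomes[-window:]
    return sum(recent) / len(recent)

print(lewis_roth_erev(seed=1))
```

Fixing the seed makes a run reproducible; averaging over many seeds gives a better picture of the learning dynamics than any single run.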
Understanding the causal impact of beliefs on contributions in Threshold Public Goods (TPG) games is particularly important because the social optimum can be supported as a Nash equilibrium and best-response contributions are a function of beliefs. Unfortunately, investigations of the impact of beliefs on behavior are plagued by endogeneity concerns. We create a set of instruments by cleanly and exogenously manipulating beliefs without deception. Tests indicate that the instruments are valid and relevant. Perhaps surprisingly, we fail to find evidence that beliefs are endogenous in either the one-shot or the repeated-decision setting. In the one-shot setting, TPG allocations are determined by a base contribution and beliefs. In the repeated-decision environment, once we instrument for first-round allocations, we find that second-round allocations are driven equally by beliefs and history. Moreover, we find that failing to instrument prior decisions overstates their importance.
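Because best-response contributions in a TPG game depend on beliefs, it can help to see that dependence computed. The sketch below assumes integer contributions from an endowment, a provision threshold, a value received by every player if the threshold is met, no refund if it is not, and beliefs given as a probability distribution over the other players' total contribution. All of these parameters and names are hypothetical, not the study's design.

```python
def expected_payoff(x, endowment, threshold, value, belief):
    """Expected payoff of contributing x.

    belief: {others_total: probability}. Assumes contributions are kept by the
    group (no refund) whether or not the threshold is reached.
    """
    p_provided = sum(pr for others, pr in belief.items() if others + x >= threshold)
    return endowment - x + value * p_provided

def best_response(endowment, threshold, value, belief):
    # Search integer contributions 0..endowment; ties go to the smaller contribution.
    return max(range(endowment + 1),
               key=lambda x: (expected_payoff(x, endowment, threshold, value, belief), -x))

# Others are believed certain to give 7 toward a threshold of 10.
print(best_response(5, 10, 8, {7: 1.0}))            # 3
# Half the belief mass says the others reach 10 alone, and the good is worth little.
print(best_response(5, 10, 3, {7: 0.5, 10: 0.5}))  # 0
```

With the others certain to give 7, the best response is to contribute exactly the 3 units still missing; shifting half of the belief mass to "others already reach the threshold" while lowering the value of the good makes contributing nothing optimal.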
Cooperation among self-interested players in a social dilemma is fragile and easily interrupted by mistakes. In this work, we study the repeated n-person public-goods game and search for a strategy that forms a cooperative Nash equilibrium in the presence of implementation error, with a guarantee that the resulting payoff will be no less than that of any of the co-players. By enumerating strategic possibilities for n=3, we show that such a strategy indeed exists when its memory length m equals three. This means that a deterministic strategy can be publicly employed to stabilize cooperation against error while avoiding the risk of being exploited. We furthermore show that, for the general n-person public-goods game, m ≥ n is necessary to satisfy the above criteria.