Concept: Pure strategy
Recent advances in Human Microbiome Project (HMP) research have revealed profound implications of the human microbiome for our health and disease. We postulated that distinctive features should be associated with healthy and/or diseased microbiome networks. Following the principle of Occam's razor, we further hypothesized that triangle motifs, or trios, arguably the simplest motifs in a complex network of the human microbiome, should be sufficient to detect changes that occur in the diseased microbiome. Here we test this hypothesis with six HMP datasets that cover five major human microbiome sites (gut, lung, oral, skin, and vaginal). The tests confirm our hypothesis and demonstrate that trios involving special nodes (e.g., the most abundant OTU, or MAO, and the most dominant OTU, or MDO) and interaction types (positive vs. negative) can be a powerful tool for differentiating healthy from diseased microbiome samples. Our findings suggest that 12 kinds of trios (especially the dominantly inhibitive trio with mixed strategy, the dominantly inhibitive trio with pure strategy, and the fully facilitative strategy) may be used as in silico biomarkers for detecting disease-associated changes in the human microbiome, and may play an important role in personalized precision diagnosis of human microbiome-associated diseases.
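The abstract does not define its 12 trio types, but the basic operation behind any trio analysis is enumerating triangles in a signed network and recording their interaction signs. The following is a minimal sketch, assuming the microbiome network is given as a list of undirected edges `(otu_a, otu_b, sign)` with `+1` for a positive (facilitative) and `-1` for a negative (inhibitive) interaction. The function name `count_signed_trios` and its optional `focal` parameter (for restricting the count to trios that contain a special node such as the MAO) are hypothetical. The sketch only tallies triangles by how many negative edges they contain and does not reproduce the authors' classification.

```python
from itertools import combinations


def count_signed_trios(signed_edges, focal=None):
    """Count triangles (trios) in an undirected signed network.

    Hypothetical helper: signed_edges is a list of (u, v, s) tuples where
    s > 0 marks a positive interaction and s < 0 a negative one. If focal is
    given, only trios containing that node (e.g. the MAO) are counted.
    Returns counts keyed by sign pattern, positives first.
    """
    sign = {}
    adj = {}
    for u, v, s in signed_edges:
        sign[(u, v)] = sign[(v, u)] = s
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {"+++": 0, "++-": 0, "+--": 0, "---": 0}
    for u in adj:
        # Only pair neighbours ranked after u, so each triangle u < v < w
        # is visited exactly once (from its smallest node).
        later = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(later, 2):
            if w not in adj[v]:
                continue  # u-v and u-w exist, but v-w does not: no triangle
            if focal is not None and focal not in (u, v, w):
                continue
            negatives = sum(1 for e in ((u, v), (u, w), (v, w)) if sign[e] < 0)
            counts["+" * (3 - negatives) + "-" * negatives] += 1
    return counts


edges = [("A", "B", 1), ("B", "C", -1), ("A", "C", 1), ("C", "D", -1), ("B", "D", -1)]
print(count_signed_trios(edges))
print(count_signed_trios(edges, focal="A"))
```

In this toy network, A-B-C is a "++-" trio and B-C-D is a "---" trio. Restricting to trios that contain node A leaves only the first. Comparing such counts between healthy and diseased samples is the kind of signal the abstract describes, although the paper's trio categories are finer-grained than these four sign patterns.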
One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning with game theory. Most existing algorithms involve computationally expensive calculation of mixed strategy equilibria and require agents to replicate the other agents' value functions to compute an equilibrium in each state. This is unrealistic, since agents may be unwilling to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, because computing a mixed strategy equilibrium is often expensive. Three types of pure strategy profiles are used as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two are strategy profiles from which agents can gain higher payoffs than in one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, because value functions are not shared among agents, we propose a multistep negotiation process for finding pure strategy equilibria. Combining these two components, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
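To make the pure strategy solution concepts concrete, the following minimal sketch assumes a single stage game. Each joint action maps to a tuple of per-agent payoffs, which in equilibrium-based MARL would be each agent's Q-values at the current state. The function names are hypothetical. The strict and nonstrict dominance rules are one reading of "higher payoffs than one or more pure strategy Nash equilibria": strict means every agent is strictly better off than at some pure Nash equilibrium, and nonstrict means no agent is worse off and at least one is better off. These rules may not match the paper's formal definitions. The sketch also enumerates everything centrally from all payoffs, which is exactly what NegoQ avoids; its negotiation process is not modelled here.

```python
from itertools import product


def pure_nash_equilibria(actions, payoff):
    """Enumerate pure strategy Nash equilibria of a stage game.

    actions: list of action lists, one per agent.
    payoff:  dict mapping a joint action (tuple) to a tuple of per-agent payoffs.
    """
    equilibria = []
    for joint in product(*actions):
        stable = True
        for i, own in enumerate(actions):
            # Best payoff agent i could reach by deviating on its own.
            best = max(payoff[joint[:i] + (a,) + joint[i + 1:]][i] for a in own)
            if payoff[joint][i] < best:
                stable = False
                break
        if stable:
            equilibria.append(joint)
    return equilibria


def equilibrium_dominating(actions, payoff, strict=True):
    """Joint actions that dominate at least one pure Nash equilibrium.

    Assumed rules: strict -> every agent strictly gains; nonstrict -> no agent
    loses and at least one agent strictly gains.
    """
    pne = pure_nash_equilibria(actions, payoff)
    result = []
    for joint in product(*actions):
        for eq in pne:
            gains = [payoff[joint][i] - payoff[eq][i] for i in range(len(actions))]
            if strict:
                dominates = all(g > 0 for g in gains)
            else:
                dominates = all(g >= 0 for g in gains) and any(g > 0 for g in gains)
            if dominates:
                result.append(joint)
                break
    return result


# Prisoner's dilemma: C = cooperate, D = defect.
acts = [["C", "D"], ["C", "D"]]
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
print(pure_nash_equilibria(acts, pd))
print(equilibrium_dominating(acts, pd))
```

In the prisoner's dilemma, mutual defection `("D", "D")` is the only pure Nash equilibrium. Mutual cooperation `("C", "C")` gives both agents a strictly higher payoff than that equilibrium, so it is equilibrium-dominating under the assumed rule. Because only pure joint actions are enumerated, the cost is a finite scan of the joint action space rather than a mixed equilibrium computation.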