Multi-Agent Machine Learning: A Reinforcement Approach
The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single-agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two-player games, including two-player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-player grid games—two-player grid games, Q-learning, and Nash Q-learning. Chapter 5 discusses differential games, including multi-player differential games, the actor-critic structure, adaptive fuzzy control and fuzzy inference systems, the pursuit-evasion game, and the game of defending a territory. Chapter 6 discusses new ideas on learning within robotic swarms and the innovative idea of the evolution of personality traits.
• Provides a framework for understanding a variety of methods and approaches in multi-agent machine learning
• Discusses methods of reinforcement learning, including several forms of multi-agent Q-learning
• Suitable for research professors and graduate students studying electrical and computer engineering, computer science, and mechanical and aerospace engineering
and $p_1(0) = q_1(0) = 0.5$. We run the simulation for 30,000 iterations. Figure 3-10 shows that the players' strategies move close to the NE strategies (both players' second actions) as the learning proceeds.

Table 3.3 Examples of two-player matrix games.

Figure 3-10 Trajectories of players' strategies during learning in the prisoner's dilemma. Reproduced from X. Lu.

The third game we simulate in this chapter is the rock-paper-scissors game. This game has two players and each player has three
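A simulation of this kind can be sketched with one simple scheme, the linear reward-inaction ($L_{R-I}$) learning automaton, on the prisoner's dilemma. This is a minimal illustration, not necessarily the exact algorithm behind Figure 3-10; the payoff values, the normalization to $[0,1]$, and the step size are assumptions made here for the sketch.

```python
import numpy as np

# Prisoner's dilemma payoffs for the row player (action 0 = cooperate,
# action 1 = defect), normalized to [0, 1] so they can scale the L_{R-I}
# update directly.  The unique NE is (defect, defect) -- both players'
# second actions, as in the text.
R1 = np.array([[3.0, 0.0],
               [5.0, 1.0]]) / 5.0
R2 = R1.T  # symmetric game: player 2's payoff at (a1, a2) is R1[a2, a1]

rng = np.random.default_rng(0)
b = 0.01                    # step size (assumed value)
p = np.array([0.5, 0.5])    # player 1's mixed strategy, p1(0) = 0.5
q = np.array([0.5, 0.5])    # player 2's mixed strategy, q1(0) = 0.5

for _ in range(30_000):
    a1 = rng.choice(2, p=p)
    a2 = rng.choice(2, p=q)
    r1, r2 = R1[a1, a2], R2[a1, a2]
    # L_{R-I}: shift probability mass toward the action just taken,
    # in proportion to the normalized reward it produced.
    p += b * r1 * (np.eye(2)[a1] - p)
    q += b * r2 * (np.eye(2)[a2] - q)

print(p, q)  # both strategies should concentrate on the second action
```

Because the update is a convex combination, `p` and `q` remain valid probability vectors throughout the run.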
equilibrium strategies after that. If we know $Q_i^*(s, a_i, a_{-i})$, we can solve Eq. (4.7) and find player $i$'s Nash equilibrium strategy $\pi_i^*(s)$. Similar to finding the minimax solution for (3.8), one can use linear programming to solve Eq. (4.7). For a MARL problem, $Q_i^*(s, a_i, a_{-i})$ is unknown to the players in the game. The minimax-Q algorithm is listed in Algorithm 4.1. The minimax-Q algorithm can guarantee the convergence to a Nash equilibrium if all the possible states and players' possible
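The linear-programming step can be illustrated for the matrix-game case. The sketch below computes the row player's maximin (security) strategy of a zero-sum matrix game with `scipy.optimize.linprog`; the function name and the rock-paper-scissors example are illustrative choices, not taken from the book.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(A):
    """Maximin strategy of the row player for a zero-sum matrix game.

    A[i, j] is the row player's payoff when it plays i and the opponent
    plays j.  We maximize the game value v subject to
        sum_i x_i * A[i, j] >= v   for every opponent action j,
        x a probability vector.
    """
    m, n = A.shape
    # Decision variables: [x_1, ..., x_m, v]; linprog minimizes, so use -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # v - sum_i x_i A[i, j] <= 0  for each column j.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to 1 (v has coefficient 0 in this constraint).
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]   # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:m], res.x[-1]

# Rock-paper-scissors: the equilibrium strategy is uniform, value 0.
rps = np.array([[0.0, -1.0,  1.0],
                [1.0,  0.0, -1.0],
                [-1.0, 1.0,  0.0]])
x, v = maximin_strategy(rps)
```

In minimax-Q, the same LP is solved at every state with the learned $Q(s,\cdot,\cdot)$ table playing the role of `A`.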
steps, Up and Left, and wins at the same time as agent 2 takes two steps, Up and Up, and wins. Therefore, we get $v_1(1,3) = 0 + 0.99 \times 100 = 99$, and we can compute the optimal $Q$-value as

(4.36) $Q_1((0,2), \text{Up}, \text{Left}) = 0 + 0.99\left[\tfrac{1}{2} \times 0 + \tfrac{1}{2} \times 99\right] = 49$

Finally, we compute the value for $Q_1((0,2), \text{Up}, \text{Up})$. In this case, there are four possible outcomes for the next state. Each agent has a 50% probability of moving Up; therefore, there is a 25% probability that both agents move Up at the same time. The probability of
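The arithmetic in this example can be checked directly (discount factor 0.99, goal reward 100, and a 50/50 split over the two possible next-state values):

```python
gamma = 0.99

# v1(1,3): one discounted step away from the goal reward of 100.
v1 = 0 + gamma * 100                       # = 99

# Q1((0,2), Up, Left): immediate reward 0, then with probability 1/2
# the transition yields value 0 and with probability 1/2 it yields v1.
q_up_left = 0 + gamma * (0.5 * 0 + 0.5 * v1)   # = 49.005, i.e. ~49
```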
In our work, we will investigate how the players learn to behave with no knowledge of the optimal strategies. Therefore, the above problem becomes a multiagent learning problem in a multiagent system. In the literature, there are a number of papers on multiagent systems [21, 22]. Among the multiagent learning applications, the predator–prey or the pursuit problem in a grid world has been well studied [22, 23]. To better understand the learning process of the two players in the game, we create a
consideration. When an action $\alpha_k$ is chosen to be executed, all personality traits $\bar{\gamma}$ are then updated. The effect of the action taken is evaluated using the equation

(6.6) $R(s, \bar{\gamma}, \alpha_k) = h \sum_{j=1}^{n} \gamma_j R_j(s_t, \alpha_k, t)$

where the individual reward functions $R_j$ reflect how beneficial the execution of action $\alpha_k$ in the presence of state $s_t$ is according to each personality trait $\gamma_j$, and therefore determine the reward and/or penalty related to that personality trait.
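The weighted-sum structure of Eq. (6.6) can be sketched as follows. The trait names, the numeric weights, and the scaling factor `h` are hypothetical values chosen for illustration; in the actual scheme the $R_j$ are functions of the state, action, and time.

```python
import numpy as np

def trait_reward(traits, trait_rewards, h=1.0):
    """Overall reward of an executed action as the h-scaled weighted sum
    of per-trait rewards, in the spirit of Eq. (6.6).

    traits        -- personality-trait weights gamma_j
    trait_rewards -- values R_j(s_t, alpha_k, t) already evaluated for
                     the chosen action alpha_k in state s_t
    h             -- scaling factor (assumed 1.0 here)
    """
    return h * float(np.dot(traits, trait_rewards))

# Hypothetical example with three traits (e.g. aggression, curiosity,
# caution) evaluating a single candidate action:
gamma = np.array([0.5, 0.3, 0.2])
r_j = np.array([1.0, -0.5, 2.0])
total = trait_reward(gamma, r_j)   # 0.5*1.0 + 0.3*(-0.5) + 0.2*2.0
```

A trait with a large weight $\gamma_j$ thus dominates the evaluation, which is how the personality shapes action selection.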