Months rarely pass without news of an artificial intelligence dominating humans in a complex game, so it should come as no surprise that Google’s DeepMind has mastered winning strategies for Quake III Arena. But unlike past AI victories, Google’s latest approach to reinforcement learning allowed DeepMind to succeed with practically no instruction and even without its key technical advantages.
Even if you didn’t already know how to play Capture the Flag, the core gameplay mechanic in Quake III Arena, you could grasp the rules in under a minute. Strategic talent, on the other hand, can take a while to develop. Programming a machine to play even a simple game has traditionally required significantly more instruction and time. Recent developments in AI have changed this: engineers specify the parameters of the artificial neurons and the feedback the machine receives as it performs a task. The machine knows only the actions it can take, whether or not it failed, and that it should work toward failing as infrequently as possible. In this particular case, DeepMind could learn only from on-screen pixels within the context of those basic parameters.
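To make the idea of learning from nothing but actions and outcomes concrete, here is a minimal sketch in Python. It is not DeepMind's method: it swaps in tabular Q-learning on a made-up one-dimensional corridor with a flag at the far end, where DeepMind trained deep networks on raw pixels. The names `train_corridor_agent` and `greedy_policy`, the corridor itself, and every parameter value are assumptions chosen for illustration.

```python
import random


def train_corridor_agent(length=6, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor of `length` cells.

    The agent starts in cell 0 and the flag sits in the last cell.
    The only feedback is a reward of 1.0 on reaching the flag.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 steps left, action 1 steps right
    q = [[0.0, 0.0] for _ in range(length)]  # q[cell][action]

    for _ in range(episodes):
        pos = 0
        for _ in range(4 * length):  # cap the length of each attempt
            # Mostly exploit what has been learned, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[pos][0] > q[pos][1] else 1

            nxt = min(max(pos + moves[action], 0), length - 1)
            done = nxt == length - 1
            reward = 1.0 if done else 0.0

            # Nudge the estimate toward reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[nxt])
            q[pos][action] += alpha * (reward + future - q[pos][action])

            pos = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Return the preferred move for every cell except the flag cell."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

Nothing in the code tells the agent that moving right is correct. That preference emerges from repeated attempts and the single reward signal, which is the same principle, at a vastly smaller scale, that the article describes.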
Reinforcement learning methods allow the AI to fail often, memorize its mistakes, and find patterns that lead to success. It’s easy enough for an AI to succeed without many obstacles and variables, but in a game that requires team cooperation (like Quake III Arena), the AI needs to consider the behavior of its enemies as well as its allies. Winning strategies in team games rarely involve a single player. Michael Jordan’s early basketball career clearly demonstrates how a star player who only plays for himself will not lead a team to victory. But AI isn’t encumbered by conflicting goals. In about 450,000 games (roughly four years of practice for a human), DeepMind developed successful team-based strategies without guidance, and those strategies allowed it to win against proficient human players far more often than it lost.
Google used this training data to create DeepMind’s “For the Win” (FTW) agents to play as individual team members in Quake III Arena. In each game played, Google randomly assigned teams from an equal mix of human players and FTW agents. The FTW agents achieved an estimated win rate about 1.23x that of the strongest human players. Compared with average human players, that figure rose to about 1.5x. Of course, machines have a key advantage when it comes to the speed of processing precise and detailed information from memory. Nevertheless, even with a 257-millisecond delay added to their reactions, the FTW agents still won against proficient players about 79 percent of the time.
DeepMind’s FTW agents owe their success to a few core elements of the reinforcement learning process. While no instruction was provided, neurons were encoded to respond to specific game events, such as the capture of an agent’s flag or a teammate holding a flag, in order to calculate context for these events. Because all learning happened visually, the arrangement of the artificial neurons was modeled after the visual cortex of the human brain. Two long short-term memory (LSTM) networks, each operating on a separate timescale, process the visual data with their own learning objectives. This concurrent, dual process gives each FTW agent the advantage of comparing possibilities taken from the machine equivalent of different perspectives. The agents derive their choices from the result of this process and play the game by emulating a game controller. In gameplay footage, the agents’ fast-paced movements offer a clear advantage and show a distinct gameplay style that few humans, if any, could manage.
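The idea of two recurrent networks running on separate timescales can be sketched in a few lines of Python. This is an illustration only, not the FTW architecture: each "network" here is a single tanh unit standing in for an LSTM, and the class `TinyRecurrentCell`, the function `run_two_timescales`, the weights, and the `slow_period` value are all assumptions made for the example.

```python
import math


class TinyRecurrentCell:
    """A one-unit tanh recurrent cell (a stand-in for an LSTM, not an LSTM)."""

    def __init__(self, w_in, w_rec):
        self.w_in = w_in    # weight on the new input
        self.w_rec = w_rec  # weight on the cell's own previous state
        self.h = 0.0        # hidden state carried between steps

    def step(self, x):
        self.h = math.tanh(self.w_in * x + self.w_rec * self.h)
        return self.h


def run_two_timescales(observations, slow_period=4):
    """Feed observations through a fast cell and a slower-ticking cell.

    The fast cell updates on every observation. The slow cell updates
    only every `slow_period` steps, reading the fast cell's state, and
    its held state is fed back to the fast cell as longer-term context.
    """
    fast = TinyRecurrentCell(w_in=1.0, w_rec=0.5)
    slow = TinyRecurrentCell(w_in=1.0, w_rec=0.9)
    slow_updates = 0
    states = []

    for t, x in enumerate(observations):
        if t % slow_period == 0:
            slow.step(fast.h)  # slow cell ticks on its own schedule
            slow_updates += 1
        fast_h = fast.step(x + slow.h)  # fast cell sees input plus context
        states.append((fast_h, slow.h))

    return states, slow_updates


if __name__ == "__main__":
    states, ticks = run_two_timescales([0.1 * i for i in range(12)])
    print(ticks, states[-1])
```

The slow cell's state stays fixed between its ticks, so it acts as a steadier summary of recent play while the fast cell reacts to every frame. A real agent would use full LSTM layers with learned weights rather than these fixed single units.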
In one-on-one games, AI superiority can feel like an insurmountable roadblock for even the best of players. In a team environment, however, AI and humans can actually work together and compete in a way that doesn’t sacrifice the enjoyment of the game.
VentureBeat spoke to Thore Graepel, a DeepMind scientist and professor of computer science at University College London, who further explained the benefits of these efforts:
“Our results demonstrate that multiagent reinforcement learning can successfully tackle a complex game to the point that human players even think computer players are better teammates. They also provide a fascinating in-depth analysis of how the trained agents behave, work together, and represent their environment. What makes these results so exciting is that these agents perceive their environment from a first-person perspective, just as a human player would. In order to learn how to play tactically and collaborate with their teammates, these agents must rely on feedback from the game outcomes, without any teacher or coach showing them what to do.”
These efforts provide a more optimistic look at how humans and artificial intelligence can coexist in a beneficial way. While that may not alleviate some of the more significant concerns AI raises about the near future, these positive examples help determine the right ways to use this powerful new technology.