Stanford Reinforcement Learning System Simulates Evolution – TechTalks


This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Hundreds of millions of years of evolution have blessed our planet with a wide variety of life forms, each intelligent in its own way. Each species has evolved to develop innate skills, learning abilities, and physical fitness that ensure its survival in its environment.

But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and merging them after development. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found even in the simplest of life forms.

In a new paper published in the scientific journal Nature Communications, AI researchers at Stanford University present a new technique that can help overcome some of these limits. Called Deep Evolutionary Reinforcement Learning, the technique uses a complex virtual environment and reinforcement learning to create virtual agents that can evolve both their physical structure and their learning capacities. The findings may have important implications for the future of AI and robotics research.

Evolution is difficult to simulate


In nature, the body and the brain evolve together. Across many generations, each animal species has gone through countless cycles of mutation to develop the limbs, organs, and nervous system that support the functions it needs in its environment. Mosquitoes have thermal vision to detect body heat. Bats have wings to fly and an echolocation apparatus to navigate dark places. Sea turtles have flippers for swimming and a magnetic-field detection system for traveling very long distances. Humans have an upright posture that frees their arms and lets them see the distant horizon, nimble hands and fingers that can manipulate objects, and a brain that makes them the most social creatures and best problem-solvers on the planet.

Interestingly, all of these species descend from the first life form that appeared on Earth billions of years ago. Shaped by the selection pressures of their environments, the descendants of those early organisms evolved in many different directions.

Studying the evolution of life and intelligence is interesting, but reproducing it is extremely difficult. An AI system that set out to recreate intelligent life the way evolution did would have to search a very large space of possible morphologies, which is extremely computationally expensive. It would require many cycles of parallel and sequential trial and error.

AI researchers use several shortcuts and predefined features to overcome some of these challenges. For example, they fix the architecture or physical design of an AI or robotic system and focus on optimizing the learnable parameters. Another shortcut is the use of Lamarckian rather than Darwinian evolution, in which AI agents pass their learned parameters on to their descendants. Yet another approach is to train different AI subsystems (vision, locomotion, language, etc.) separately and then assemble them into a final AI or robotic system. While these approaches speed up the process and reduce the costs of training and evolving AI agents, they also limit the flexibility and variety of results that can be achieved.

Deep evolutionary reinforcement learning


In their new work, the Stanford researchers aim to bring AI research closer to the real evolutionary process while keeping costs as low as possible. “Our aim is to elucidate certain principles governing the relationships between environmental complexity, evolved morphology and the capacity for learning intelligent control,” they write in their paper.

Their framework is called Deep Evolutionary Reinforcement Learning (DERL). In DERL, each agent uses deep reinforcement learning to acquire the skills it needs to maximize its goals over its lifetime. DERL uses Darwinian evolution to search the morphological space for optimal solutions, which means that when a new generation of AI agents is spawned, the agents inherit only physical and architectural traits from their parents (with slight mutations). None of the learned parameters are passed from one generation to the next.
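To make the Darwinian constraint concrete, here is a minimal Python sketch of an evolutionary outer loop in which each child inherits a mutated copy of its parent's morphology but starts with an empty policy. `Agent`, `mutate`, `train_lifetime`, `spawn_child`, and `evolve` are hypothetical names; `train_lifetime` is a toy stand-in for a full deep RL run, and the simple truncation selection is chosen for brevity, not claimed to be the scheme used in the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    morphology: list                                   # heritable body parameters (hypothetical encoding)
    policy_params: dict = field(default_factory=dict)  # learned during the agent's lifetime only
    fitness: float = 0.0


def mutate(morphology, rng):
    # Hypothetical mutation: small Gaussian perturbation of each body parameter.
    return [m + rng.gauss(0.0, 0.1) for m in morphology]


def train_lifetime(agent):
    # Stand-in for deep RL: the policy is learned from scratch, and the
    # resulting fitness here is just a toy function of the morphology.
    agent.policy_params = {"weights": [0.0] * len(agent.morphology)}
    agent.fitness = -sum((m - 1.0) ** 2 for m in agent.morphology)
    return agent


def spawn_child(parent, rng):
    # Darwinian inheritance: only the mutated morphology is passed on.
    # policy_params and fitness are NOT copied; the child starts untrained.
    # (A Lamarckian variant would also copy parent.policy_params here.)
    return Agent(morphology=mutate(parent.morphology, rng))


def evolve(pop_size=8, generations=5, seed=0):
    rng = random.Random(seed)
    population = [train_lifetime(Agent([rng.uniform(-1.0, 1.0) for _ in range(3)]))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: a.fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Each child is trained from scratch on its own new body.
        children = [train_lifetime(spawn_child(p, rng)) for p in survivors]
        population = survivors + children
    return max(population, key=lambda a: a.fitness)
```

Calling `evolve()` returns the fittest agent found; the point of the sketch is that `spawn_child` never touches `policy_params`, so every generation must relearn control for its own body.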

“DERL opens the door to performing large-scale in silico experiments to provide scientific information on how learning and evolution cooperatively create sophisticated relationships between environmental complexity, morphological intelligence, and learning control tasks,” write the researchers.

Simulating evolution

For their framework, the researchers used MuJoCo, a physics engine that provides highly accurate simulation of rigid-body dynamics. Their design space is called UNIversal aniMAL (UNIMAL), in which the goal is to create morphologies that learn locomotion and object-manipulation tasks in a variety of terrains.

Each agent in the environment is defined by a genotype that specifies its limbs and joints. The direct descendant of each agent inherits the parent's genotype and undergoes mutations that can create new limbs, remove existing limbs, or make small changes to characteristics such as degrees of freedom or limb size.
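The three kinds of mutation can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the genotype is reduced to a flat tuple of limbs (the paper's actual encoding may differ), and `Limb`, `mutate_genotype`, the size bounds, and the 1 to 3 range for degrees of freedom are all hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Limb:
    length: float  # hypothetical size parameter
    dof: int       # degrees of freedom of the joint attaching this limb (assumed 1 to 3)


def mutate_genotype(genotype, rng):
    """Return a mutated copy of a parent's genotype (a tuple of Limbs); the parent is left unchanged."""
    limbs = list(genotype)
    op = rng.choice(["add", "remove", "tweak"])
    if op == "add" or not limbs:
        # Grow a new limb with random size and joint freedom.
        limbs.append(Limb(length=rng.uniform(0.2, 0.4), dof=rng.randint(1, 3)))
    elif op == "remove" and len(limbs) > 1:
        # Delete an existing limb, but never the last one.
        limbs.pop(rng.randrange(len(limbs)))
    else:
        # Small change to one limb's size and degrees of freedom.
        i = rng.randrange(len(limbs))
        limb = limbs[i]
        limbs[i] = replace(
            limb,
            length=max(0.05, limb.length * rng.uniform(0.9, 1.1)),
            dof=min(3, max(1, limb.dof + rng.choice([-1, 0, 1]))),
        )
    return tuple(limbs)
```

Because `Limb` is frozen and the function builds a new tuple, the parent's genotype is never modified, matching the idea that each descendant receives its own mutated copy.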

Each agent is trained with reinforcement learning to maximize rewards in various environments. The most basic task is locomotion, in which the agent is rewarded for the distance it travels during an episode. Agents whose physical structure is better suited to crossing the terrain learn more quickly to use their limbs to move around.
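One simple way to express a distance-based locomotion reward, assuming the simulator reports the agent's x position at every step, is to reward each step's forward displacement; the episode return then equals the net distance traveled. `step_rewards` and `episode_return` are hypothetical names, and the paper's actual reward may include additional terms.

```python
def step_rewards(x_positions: list[float]) -> list[float]:
    """Per-step reward: forward displacement along x since the previous step."""
    return [x1 - x0 for x0, x1 in zip(x_positions, x_positions[1:])]


def episode_return(x_positions: list[float]) -> float:
    """Sum of per-step rewards; it telescopes to x_final - x_initial."""
    return sum(step_rewards(x_positions))
```

Because the per-step rewards telescope, back-and-forth shuffling earns nothing; only net forward progress over the episode counts.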

To test the system, the researchers evolved agents in three types of terrain: flat (FT), variable (VT), and variable terrain with movable objects (MVT). Flat terrain exerts less selection pressure on the agents' morphology. Variable terrain, on the other hand, forces agents to develop a more versatile physical structure that can climb slopes and navigate obstacles. The MVT variant adds the challenge of requiring agents to manipulate objects to achieve their goals.

The advantages of DERL

Deep Evolutionary Reinforcement Learning generates a variety of successful morphologies in different environments

One of the interesting findings of DERL is the diversity of its results. Other approaches to evolutionary AI tend to converge on a single solution because new agents directly inherit both the physical traits and the learned parameters of their parents. But because DERL passes only morphological data to descendants, the system ends up creating a diverse set of successful morphologies, including bipeds, tripeds, and quadrupeds with and without arms.

At the same time, the system shows features of the Baldwin effect, which suggests that agents that learn faster are more likely to reproduce and pass their genes on to the next generation. DERL shows that evolution “selects faster learners without any direct selection pressure to do so,” according to the Stanford paper.
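The Baldwin-style dynamic can be illustrated with a deliberately tiny toy model, which is not DERL itself: each individual carries one heritable number that stands in for a body that is easier or harder to learn with, lifetime learning runs for a fixed budget, and selection looks only at end-of-life performance. `lifetime_performance`, `evolve_learning_rates`, and all constants are hypothetical.

```python
import random


def lifetime_performance(learning_rate, steps=20):
    """Toy lifetime learning: skill moves a fixed fraction of the way to 1.0 each step."""
    skill = 0.0
    for _ in range(steps):
        skill += learning_rate * (1.0 - skill)
    return skill


def evolve_learning_rates(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    initial = [rng.uniform(0.01, 0.05) for _ in range(pop_size)]
    population = list(initial)
    for _ in range(generations):
        # Fitness is measured only after learning; the learning rate itself is
        # never scored, yet it is what selection ends up favoring.
        ranked = sorted(population, key=lifetime_performance, reverse=True)
        parents = ranked[: pop_size // 2]
        # Children inherit the heritable trait with a small mutation, clipped to [0, 1].
        children = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.01))) for p in parents]
        population = parents + children
    return initial, population
```

Since a fixed learning budget converts a faster learning rate into higher final skill, selecting on final skill alone pushes the population toward faster learners, which is the indirect pressure the quote describes.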

“Interestingly, the existence of this morphological Baldwin effect could be exploited in future studies to create embodied agents with lower sample complexity and higher generalization capacity,” the researchers write.

Agents evolved with DERL are evaluated on a variety of tasks

Finally, the DERL framework also supports the hypothesis that more complex environments give rise to smarter agents. The researchers tested the evolved agents on eight different tasks, including patrolling, evasion, object manipulation, and exploration. Their results show that, in general, agents that evolved in variable terrain learn faster and perform better than agents that have only experienced flat terrain.
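A claim like "learn faster" needs a concrete measure. One possible metric, sketched here as an assumption rather than the paper's exact protocol, is the number of training iterations an agent needs before its reward first reaches a fixed target; `steps_to_reach` and `learns_faster` are hypothetical helpers.

```python
from typing import List, Optional


def steps_to_reach(learning_curve: List[float], target: float) -> Optional[int]:
    """Index of the first training iteration whose reward is at least `target`, or None."""
    for step, reward in enumerate(learning_curve):
        if reward >= target:
            return step
    return None


def learns_faster(curve_a: List[float], curve_b: List[float], target: float) -> bool:
    """True if agent A reaches `target` strictly earlier than agent B (or B never does)."""
    a = steps_to_reach(curve_a, target)
    b = steps_to_reach(curve_b, target)
    if a is None:
        return False
    return b is None or a < b
```

A fair comparison of this kind requires both agents to be trained with the same budget and judged against the same target reward; the target value itself is a free choice.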

Their results appear to be in line with a hypothesis put forward by DeepMind researchers: that a complex environment, an appropriate reward structure, and reinforcement learning can eventually lead to the emergence of all kinds of intelligent behaviors.

AI and robotics research

The DERL environment has only a fraction of the complexity of the real world. “While DERL allows us to take a significant step forward in scaling the complexity of evolutionary environments, an important line of future work will be to design evolutionary environments that are more open-ended, physically realistic and multi-agent,” the researchers write.

In future work, the researchers will broaden the range of evaluation tasks to better assess how agents can improve their ability to learn behaviors relevant to humans.

The work may have important implications for the future of AI and robotics and push researchers to use exploration methods much closer to natural evolution.

“We hope that our work will encourage further large-scale explorations of learning and evolution in other contexts to bring new scientific knowledge about the emergence of rapidly learnable intelligent behaviors, as well as new technical advances in our ability to instantiate them in machines,” the researchers write.

