Learning Robust Real-Time Cultural Transmission without Human Data

Over millennia, humankind has discovered, evolved, and accumulated a wealth of cultural knowledge, from navigation routes to mathematics and social norms to works of art. Cultural transmission, defined as efficiently passing information from one individual to another, is the inheritance process underlying this exponential increase in human capabilities.

Our agent, in blue, imitates and remembers the demonstration of both bots (left) and humans (right), in red.

For more videos of our agents in action, visit our website.

In this work, we use deep reinforcement learning to generate artificial agents capable of test-time cultural transmission. Once trained, our agents can infer and recall navigational knowledge demonstrated by experts. This knowledge transfer happens in real time and generalises across a vast space of previously unseen tasks. For example, our agents can quickly learn new behaviours by observing a single human demonstration, without ever training on human data.

A summary of our reinforcement learning environment. The tasks are navigational representatives for a broad class of human skills, which require particular sequences of strategic decisions, such as cooking, wayfinding, and problem solving.

We train and test our agents in procedurally generated 3D worlds, containing colourful, spherical goals embedded in a noisy terrain full of obstacles. A player must visit the goals in the correct order, which changes randomly in every episode. Since the order is impossible to guess, a naive exploration strategy incurs a large penalty. As a source of culturally transmitted information, we provide a privileged “bot” that always enters goals in the correct sequence.
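To make the task structure concrete, here is a minimal toy sketch of the goal-ordering problem. It is not the 3D environment described above: the class `GoalOrderTask`, the helper names, and the reward values (+1 for the next correct goal, -1 otherwise) are all assumptions chosen for illustration.

```python
import random


class GoalOrderTask:
    """Toy stand-in for the goal-ordering task (hypothetical, not the real 3D world)."""

    def __init__(self, num_goals=5, seed=None):
        self.rng = random.Random(seed)
        self.num_goals = num_goals
        self.reset()

    def reset(self):
        # The correct order is resampled every episode, so it cannot be memorised.
        self.order = list(range(self.num_goals))
        self.rng.shuffle(self.order)
        self.next_index = 0

    def enter(self, goal):
        # Assumed rewards: +1 for the next correct goal, -1 for any other goal.
        if goal == self.order[self.next_index]:
            self.next_index = (self.next_index + 1) % self.num_goals
            return 1
        return -1


def expert_policy(task):
    # The privileged bot reads the hidden order directly.
    return task.order[task.next_index]


def naive_policy(task):
    # A naive explorer guesses a goal uniformly at random.
    return task.rng.randrange(task.num_goals)


def run_episode(task, policy, steps=20):
    task.reset()
    return sum(task.enter(policy(task)) for _ in range(steps))
```

In this sketch the expert collects the maximum return on every episode, while the random explorer pays a penalty for most of its guesses. That gap is the information an observer could recover by watching the expert instead of exploring on its own.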

Our MEDAL(-ADR) agent outperforms ablations on held-out tasks, in worlds without obstacles (top) and with obstacles (bottom).

Via ablations, we identify a minimal sufficient “starter kit” of training ingredients required for cultural transmission to emerge, dubbed MEDAL-ADR. These components are memory (M), expert dropout (ED), attentional bias towards the expert (AL), and automatic domain randomization (ADR). Our agent outperforms the ablations, including the previous state-of-the-art method (ME-AL), across a range of challenging held-out tasks. Cultural transmission generalises out of distribution surprisingly well, and the agent recalls demonstrations long after the expert has departed. Looking into the agent’s brain, we find strikingly interpretable neurons responsible for encoding social information and goal states.
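As an illustration of one ingredient, the sketch below shows one plausible reading of expert dropout: the expert is hidden from the agent for parts of an episode, so the agent has to rely on memory of what it saw earlier. The text above does not specify the dropout schedule, so the segment-based scheme, the function names, and the parameters `segment_len` and `dropout_prob` are assumptions.

```python
import random


def expert_visibility(episode_len, segment_len=10, dropout_prob=0.5, seed=None):
    """Return one boolean per timestep: True if the expert is visible (assumed scheme)."""
    rng = random.Random(seed)
    mask = []
    while len(mask) < episode_len:
        # Decide visibility once per segment, then hold it for the whole segment.
        visible = rng.random() >= dropout_prob
        mask.extend([visible] * segment_len)
    return mask[:episode_len]


def observe_expert(expert_position, visible):
    # When the expert is dropped out, the agent receives no expert observation.
    return expert_position if visible else None
```

With `dropout_prob=0.0` the expert is always visible, and with `dropout_prob=1.0` it is never visible. Intermediate values mix demonstration segments with segments in which the agent must act from memory.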

Our agent generalises outside the training distribution (top) and possesses individual neurons that encode social information (bottom).

In summary, we provide a procedure for training an agent capable of flexible, high-recall, real-time cultural transmission, without using human data in the training pipeline. This paves the way for cultural evolution as an algorithm for developing more generally intelligent artificial agents.

These authors’ notes are based on joint work by the Cultural General Intelligence Team: Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Yanko Oliveira, Julia Pawar, Miruna Pîslar, Alex Platonov, Evan Senter, Sukhdeep Singh, Alexander Zacherl, and Lei M. Zhang.
