Workshop Report: Decoding Communication in Nonhuman Species II

CETI sperm whale. Photo by Amanda Cotton

by Christoph Drösser

Three years ago, in August 2020, the Simons Institute co-hosted a workshop on Decoding Communication in Nonhuman Species. None of the talk titles used the AI buzzwords that have become household terms in the last several months: ChatGPT, large language models, generative AI, chatbots. Yet the technology these terms refer to is central to the task at hand. The idea of the workshop was to apply cutting-edge methods from the field of natural language processing, especially large language models, to animal communication. This past June, the Institute co-hosted a follow-up workshop, Decoding Communication in Nonhuman Species II. Now that the technology is advancing at a breathtaking pace, this second workshop afforded participants an opportunity to look at how these efforts have advanced in the last three years, and whether the field has produced new tools that can help researchers understand what animals are talking about.

“Three years ago, the idea of translating a nonhuman communication system, such as that of sperm whales, seemed like a crazy idea. We can now have a roadmap to making this vision possible, but it will be a massive undertaking, as most of the recent advancements are centered around the human communication system,” said David Gruber, founder and president of Project CETI and distinguished professor of biology at Baruch College, CUNY. “Our aim is to demonstrate how technology can be used to amplify the magic of our natural world and bring us closer to nature.”  

The workshop was funded in part by Oceankind, and co-hosted with Project CETI, a nonprofit, interdisciplinary scientific and conservation initiative on a mission to listen to and translate the communication of sperm whales. Project CETI was founded in 2017 after Simons Institute Director Shafi Goldwasser (now CETI’s theoretical analysis lead) met marine biologist David Gruber and computer scientist Michael Bronstein (now CETI’s machine learning lead and DeepMind Professor of AI at Oxford) during a year-long fellowship at the Radcliffe Institute for Advanced Study at Harvard. At the time, they were listening to sperm whale sounds and discussing how the whales communicate through a series of clicks, arranged in little snippets called “codas,” which are reminiscent of Morse code or similar digital signals. They realized these clicks could be digitized for analysis by computer algorithms such as large language models (LLMs).

There was already a database of sperm whale codas that the marine biologist Shane Gero (now CETI’s biology lead) had been recording in the waters around the Caribbean island of Dominica since 2005. Project CETI will be collecting much more of this communication in the coming years, together with data about the whales’ behavior. Project CETI was selected as a TED Audacious Project in 2020, and by 2025 it aims to collect between 400 million and 4 billion coda clicks, as described in its 2022 scientific roadmap paper, “Toward understanding the communication in sperm whales,” published in iScience. The CETI team has grown to over 30 participating scientists and collaborators, spanning 15 institutions and eight countries.

LLMs are good at analyzing large collections of language data and generating new outputs in the same language by always predicting the next word. Their so-called transformer models manipulate the elements (“tokens”) of the language without knowing anything about their meaning. It is conceivable that an LLM trained on whale clicks could reproduce “sentences” that, when played back to sperm whales, would be accepted by the animals. (No one is planning to do that.) But we still wouldn’t have an idea of what the whales are actually saying. Several talks during the workshop focused on the question of how to learn the formal structure of animal communication while at the same time correlating it with the whales’ behavior.
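
To make “predicting the next word” concrete, here is a minimal sketch in Python of about the simplest next-token predictor there is: a table that counts which token tends to follow which. It is a toy stand-in, far simpler than a transformer, and the letters "A", "B" and "C" as well as the sample sequences are invented placeholders for coda types, not CETI data.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical exchanges: each letter stands for one coda type.
exchanges = [
    ["A", "B", "A", "B", "C"],
    ["A", "B", "C", "A", "B"],
    ["C", "A", "B", "A"],
]

model = train_bigram(exchanges)
print(predict_next(model, "A"))  # in this toy data, "A" is always followed by "B"
```

Like an LLM, the table can continue a sequence plausibly while knowing nothing about what any token means.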

MIT and CETI colleagues Jacob Andreas and Pratyusha Sharma presented first results on the analysis of a limited sample of whale codas, and compared this problem with the situation of human astronomers intercepting the broadcast of an alien civilization. “We don't even know what the basic units of this language might be, how those pieces get put together to form larger meanings,” said Andreas. This is especially difficult if all we have is a fixed data set to work with. A language model trained on a large set of animal utterances could be used like a black box, to see how it reacts to new inputs, either from animals or constructed by researchers. In this way, people could draw conclusions about the inner workings of the unknown language in an interactive fashion.

Sharma spoke about the first analyses that their team performed on the whale clicks. She pointed out that while it’s far too early to talk about the possible meaning of whale codas, the data could be analyzed for its complexity by probing it with more and more complex language models to see if the communication system has any structure that compares to the grammatical complexity of human speech.

The small size of the current dataset makes it impossible to decide whether the codas are the building blocks of something that you could actually call a language. “We don’t even know whether a coda is a word, or a letter, or a sentence,” said Andreas. “It might turn out that even when we have lots of data, the conversations don't have enough long-term structure to make prediction possible. But to answer that question, we will need bigger models and more data.” How much more data? “Right now, we have one point on the scaling curve. You need a few more points to figure out what the shape of the curve is. So right now, it's hard to say.”
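
Why one point is not enough can be shown with a small Python sketch. It assumes, purely for illustration, that model loss falls with dataset size as a power law, loss = a * n**(-b); that functional form and all numbers below are assumptions, not CETI results. The fit is an ordinary least-squares line in log-log space.

```python
import math

def fit_power_law(points):
    """Least-squares fit of loss = a * n**(-b) in log-log space.

    Returns (a, b), or None when there are fewer than two distinct dataset
    sizes, because a single point leaves the slope undetermined.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(loss) for _, loss in points]
    if len(set(xs)) < 2:
        return None
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic (dataset size, loss) pairs generated from loss = 8 * n**-0.25.
synthetic = [(n, 8.0 * n ** -0.25) for n in (1e3, 1e4, 1e5, 1e6)]

print(fit_power_law(synthetic[:1]))  # None: one point cannot fix the curve
a, b = fit_power_law(synthetic)
print(round(a, 3), round(b, 3))      # recovers the generating parameters
```

With a single measurement, every exponent is consistent with the data; only further points pin down how quickly more recordings would pay off.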

Even if the size of the dataset multiplies in the coming years with the new experiments in Dominica, it will never reach the volume of the human language data that is fed into models like GPT-4. A less complex language might mean that less training data is necessary — but researchers are also looking for new methods of learning from fewer examples. After all, human children don’t need billions of words to get a grasp on their mother tongue.

One of these methods was presented by Isabel Papadimitriou from Stanford University. All human languages have some underlying principles in common, even if they are not related to each other on the surface. For example, all languages use some kind of recursion, meaning you can append one sentence at the end of another: “The cat sat on the mat” becomes “I think that the cat sat on the mat.” A similar principle is nesting several layers of sentences into each other: “The lawyer that the man whom the dog bit hired was disbarred.” These principles are one reason that the number of possible sentences is infinite. Of course, in practice we are limited by the ability of the recipient to untangle the loops.

These common features make it possible to put a certain “bias” into the modeling algorithm. This has nothing to do with the bias that is often discussed in AI applications. In this case, bias means: if we assume that a new language has structural elements similar to those of one that we already have a model for, we can train our new model a lot faster and with a smaller dataset. That principle is being used when the big LLMs for English are used to create models for languages that aren’t very common on the internet.

The same principles can be used to detect whether an unknown language uses familiar structures such as recursion, nesting, crossing links or large-scale dependencies. If the model learns faster with these “biases” built in, that’s an indication that the language uses these. Do whales build complicated sentences as humans do? So far, there is no indication of this.
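
The difference between nesting and crossing links can be made concrete with a short Python sketch. It does not reproduce the learning-speed test described above; it only checks whether hand-drawn links between positions in a sequence are nested or crossing. The links for the lawyer sentence (each subject paired with its verb) are an illustrative annotation, and the "a1 a2 b1 b2" pattern is an abstract example.

```python
from itertools import combinations

def crossing_pairs(arcs):
    """Return all pairs of dependency links that cross each other.

    Each link is a pair of positions. Two links (i, j) and (k, l) cross
    when one starts strictly inside the other and ends strictly outside it.
    """
    normalized = sorted(tuple(sorted(arc)) for arc in arcs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # The list is sorted, so i <= k for every pair examined here.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings

words = "The lawyer that the man whom the dog bit hired was disbarred".split()
# Subject-to-verb links: lawyer-disbarred, man-hired, dog-bit.
nested = [(1, 11), (4, 9), (7, 8)]
print(crossing_pairs(nested))    # [] because every link sits inside the previous one

# Abstract crossing pattern "a1 a2 b1 b2": a1 links to b1, a2 links to b2.
crossed = [(0, 2), (1, 3)]
print(crossing_pairs(crossed))   # [((0, 2), (1, 3))]
```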

Antonio Norelli and Emanuele Rodolà from the Sapienza University of Rome also pointed to the problems of the huge datasets that are used to train current language models. Their size not only makes the development of these models very expensive but also leads to another issue: “For some problems, we may also be close to the limit of the available data,” said Norelli.

He pointed out that while these systems create knowledge from deluges of data, that’s not how scientists work. They create hypotheses from small data samples and verify them on new data. “We want to create the artificial scientist,” Norelli said.

Applied to the problem of decoding sperm whale communication, what could that mean? The Italian team’s explanatory learning algorithm could create hypotheses of which whale behaviors are correlated with which strings of symbols: are there fish around when a certain kind of string is uttered? Do the whales dive down into the deep?
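
The hypothesize-then-verify loop can be sketched in a few lines of Python. This is not the Italian team’s explanatory learning algorithm, only a simple stand-in for the general idea: propose rules from a small sample, then score them on data the rules have never seen. The coda labels, behaviors and observations are all invented for illustration.

```python
from collections import Counter

def propose_hypotheses(observations, min_support=2):
    """From a small sample, propose rules 'coda -> behavior' that held every
    time the coda was seen, requiring at least `min_support` sightings."""
    seen = {}
    for coda, behavior in observations:
        seen.setdefault(coda, Counter())[behavior] += 1
    rules = {}
    for coda, behaviors in seen.items():
        if len(behaviors) == 1 and sum(behaviors.values()) >= min_support:
            rules[coda] = next(iter(behaviors))
    return rules

def verify(rules, new_observations):
    """Score each rule on fresh data: the fraction of matching cases, or
    None if the coda never occurs in the new data."""
    hits, totals = Counter(), Counter()
    for coda, behavior in new_observations:
        if coda in rules:
            totals[coda] += 1
            hits[coda] += behavior == rules[coda]
    return {c: (hits[c] / totals[c] if totals[c] else None) for c in rules}

# Hypothetical (coda type, observed behavior) pairs.
small_sample = [("A", "dive"), ("A", "dive"), ("B", "feed"),
                ("B", "dive"), ("C", "feed"), ("C", "feed")]
held_out = [("A", "dive"), ("A", "dive"), ("C", "dive"),
            ("C", "feed"), ("B", "feed")]

rules = propose_hypotheses(small_sample)
print(rules)                    # {'A': 'dive', 'C': 'feed'}
print(verify(rules, held_out))  # {'A': 1.0, 'C': 0.5}
```

In this toy run the rule for "A" survives the new data, while the rule for "C" holds only half the time and would have to be revised.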

It’s important, Norelli said, to start with a small set of simple utterances to bootstrap the understanding of more complex codes. He compared it to Alan Turing’s work on cracking the German Enigma codes in World War II, when Turing figured out that every morning the message started with the date and some data about the weather. Once he had decoded that part, it became easier to decipher the rest.

Prior to the work of Project CETI, the common assumption in the field was that whale codas form discrete basic “building blocks” of their communication system. Some speakers cautioned that this assumption would be analogous to turning a human vocal utterance into written text. This might correctly identify some linguistic elements, but you might also lose important information in the process.

Gašper Beguš from UC Berkeley, CETI’s linguistics lead, pointed this out in his talk. “Spoken language is the primary thing; written text is just a representation of spoken language. If we see a [sperm whale] click, then we immediately think: oh, that is a one/zero kind of thing.” But that’s our human (or computer scientist) bias. 

When we model a language from text, we can learn a lot about it, but we also lose a lot of non-textual communication. “If you're approaching something as unknown as whale communication, you don't want to be throwing away anything.”

He is using a flavor of generative AI called a generative adversarial network (GAN). GANs have the advantage that they can work with the original audio recordings to create translations or representations. They consist of two AI players, the generator and the discriminator. The discriminator has been trained on a set of data, e.g. audio recordings. The generator, which never sees the original training set, looks at new data and tries to describe them, starting with random strings of symbols. The two parts of the algorithm play a kind of game: the generator tries to get better and better at describing the data, while the discriminator tries to get better at telling correct descriptions from false ones. As each party tries to outperform the other, the whole system gets better at interpreting the data.

“GANs are also very nice because they replicate stages in the acquisition of language,” Beguš said. So scientists could use them to study how whale calves start acquiring their language in small steps, just as human children don't start talking in complex sentences.

Beguš doesn’t see GANs as a competitive alternative to using LLMs, however. Rather, these different AI methods complement each other in understanding different aspects of the whales’ world. “GANs are appropriate for understanding what is meaningful at the most basic level, or how their vocalizations are learned,” he said. “Using LLMs helps us with longer conversations and more discretized representations of their vocalizations. Graph neural networks that operate with mathematical graphs can be used to model their complex social structure and behavior.” 

Maybe the most radical objection to reducing the whales’ communication to digital codas came from Bryan Pardo, a computer scientist and musician from Northwestern University. “Perhaps you all are dreaming of talking to the animals,” Pardo said. “I am imagining a world of perhaps jamming with the animals.” Interpreting the signals as carrying a referential meaning might be premature. When we sing a song, whether with lyrics or meaningless syllables like “fa la la”, the meaning of the communication is not the verbal transcript.

“Do we honestly believe that English is going to be a representational scheme that could adequately capture the concepts expressed by another species?” Pardo asked provocatively. In a more playful exercise, he has experimented with training a language model on the audio communication of titi monkeys recorded in the Bolivian rainforest, trying to “translate” nonsensical, music-like human utterances into something that these monkeys might appreciate. While never meaning to apply this in the field, Pardo, like Beguš, cautioned against throwing out the full audio recordings of the sperm whales and only using the codas: we might lose some important part of their communication in the process.

The workshop was not just devoted to the sperm whales that Project CETI is studying. Diana Reiss (Hunter College, CUNY) talked about decades-long work communicating with bottlenose dolphins. Michael Pardo (Colorado State) presented his research into how African elephants call each other by names. And Kay Holekamp (Michigan State) spoke about her studies into the communication of spotted hyenas in Africa.