Thursday, June 28th, 2018

# The Mind, the Brain and the Computer

Santiago Ramón y Cajal (1852–1934) combined art and science in almost 3,000 drawings of brain cells as seen through the microscope. The drawing above shows a few neurons of the human cerebral cortex, distinguishing the two kinds of fibers that emanate from each cell body. Those covered with a fuzz of spines are dendrites, whereas the smooth fibers are axons. The arrows represent one of Cajal's most important contributions to neuroscience: the inference that signals flow from the axon of one cell to the dendrites of the next. Copyright Herederos de Santiago Ramón y Cajal.

Seventy-five years ago, the newly invented digital computer was explained to the world as an "electronic brain," or a "thinking machine." More recently, as the computer has become a familiar presence in everyday life, the metaphor has been turned upside down: Now the brain is described as a kind of computer, an information-processing machine made of neurons instead of transistors.

Five months' immersion in the program on The Brain and Computation at the Simons Institute for the Theory of Computing has given me a new perspective on those glib analogies. The connections between computer science and neuroscience are even deeper than I knew, but also subtler. The brain is indeed a computational device, subject to the same rules that define the capabilities of a Turing machine or a silicon chip. But the differences between technological and biological computation are at least as important as the similarities. Knowing what goes on inside my laptop is not an ideal starting point for understanding what goes on inside my head.

I attended the program as an observer and a writer — specifically as Journalist in Residence. In this space I can't report all that I learned. What follows are selected impressions gathered from the lecture hall, the seminar room, and conversations at tea time.

## Meetings of minds

Even writers have unwritten rules. One of the secret canons of journalism says: If there's no controversy, there's no story. Following this dictum, I could frame the program on The Brain and Computation as a culture clash between irreconcilable scientific traditions. On one side are the laboratory neuroscientists, who revel in the complexity and diversity of the brain's fabric. On the other side are the computer scientists, eager to sweep away all extraneous detail and replace it with a few elegant mathematical abstractions.

These contrasting tendencies go all the way back to the beginnings of the two disciplines. Modern neuroscience began in the 1890s with the work of Santiago Ramón y Cajal, a Spanish anatomist, who made magnificent drawings of brain tissue as seen through the microscope. Cajal lovingly rendered the intricate details of individual neurons — the "arbor" of dendrites, the bulbous cell body, the taproot-like axon — and he documented the extent of variation from one cell to another. Yet Cajal also formulated one of the fundamental abstractions of neuroscience: the principle that signals flow from the dendrites through the cell body to the axon, which then passes the message along to the dendrites of other cells. (It's now clear that this one-way flow of information is not the whole story, but it's still the main story.)

In the 1940s, Warren S. McCulloch and Walter Pitts took a further step toward abstraction, introducing a highly simplified and stylized model of the neuron. Gone are the filigreed crown of dendrites and the long, meandering axon; the cell is reduced to a mathematical device, which sums its inputs and produces an output if the total exceeds a threshold. McCulloch and Pitts presented their model as a contribution to neuroscience, but their simplified neurons also became prototypes for the logic gates of the digital computer. In the hands of this new community, the gates were simplified even further to compute logical functions such as AND and OR.
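
To make the abstraction concrete, here is a minimal sketch in Python of such a threshold unit. The unit weights and the particular thresholds are illustrative assumptions rather than values taken from McCulloch and Pitts, and the function names are invented for this example.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def and_gate(a, b):
    # The sum exceeds 1 only when both inputs are active (1 + 1 = 2).
    return threshold_unit((a, b), (1, 1), threshold=1)


def or_gate(a, b):
    # The sum exceeds 0 as soon as either input is active.
    return threshold_unit((a, b), (1, 1), threshold=0)


# Columns: input a, input b, AND output, OR output.
truth_table = [(a, b, and_gate(a, b), or_gate(a, b)) for a in (0, 1) for b in (0, 1)]
for row in truth_table:
    print(row)
```

With both weights fixed at 1, the two gates differ only in their threshold, which is the sense in which a single simplified neuron can be specialized to compute AND or OR.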

A multicolored reconstruction shows dozens of interlaced dendrites and axons within a tiny sliver of neural tissue. Whereas Cajal's drawings visualize just a few selectively stained neurons, Jeff Lichtman and his colleagues at Harvard University have set out to trace the shapes and the interconnections of all neurons in a cubic millimeter of mouse brain tissue. This preliminary study of a much smaller volume shows two dendrites (the bright red and green fibers at the core of the two cylindrical forms) with all the other cells that come within range of their dendritic spines. Image courtesy of Jeff Lichtman.

Both the biological and the computational cultures were well represented in the Simons Institute program. During the introductory "boot camp" workshop, Jeff Lichtman of Harvard University reviewed his ongoing project to identify every neuron and every interconnecting synapse in a cubic millimeter of brain tissue. And Murray Sherman of the University of Chicago gave a tutorial on the architecture of the cerebral cortex and the multiple pathways that connect it with the thalamus, a relay station deep inside the brain. From the computer science side of the fence, Bartlett Mel of the University of Southern California asked what the appropriate level of abstraction is for a model of a single neuron. He described a complex model that divides the cell into 443 compartments, but then demonstrated that a far simpler two-compartment model could capture most of the same behavior. And when Christos Papadimitriou of Columbia University presented his approach to understanding the brain, his point of departure was the Erdős-Rényi random graph, a mathematical structure that condenses the tangled network of neurons to mere dots and lines. The payoff for this severe abstraction is a model whose properties can be enumerated in great detail.

The tension I am describing between biological verisimilitude and mathematical austerity is not entirely the invention of a writer trying to enliven a story. The participants in the program on The Brain and Computation held strong and often divergent opinions, and they were not shy about expressing them. Talks were interrupted with questions and skeptical objections. And yet I have to confess that there was very little actual conflict across the great interdisciplinary divide. Everyone present, as far as I could tell, agreed that experimental ground truth is where theories must begin and end. They also acknowledged that a computational model incorporating every known detail of neurophysiology would be as hard to fathom as the brain itself.

In any event, if I were to assign the participants to opposing factions, it's not clear who would belong on which team. Many of the program's computer scientists have been working on questions about the brain for decades, and many of the biologists have a thorough mastery of computational methods. Then there's this curious fact: When I tallied up the academic origins of the long-term visitors, I discovered that the largest contingent had their roots in neither neuroscience nor computer science but in physics. There were also mathematicians, statisticians, and engineers in the group.

As the semester went on, I began to get a clearer sense of what's known about the brain with reasonable certainty and which questions are still wide open. Here I want to focus on a few areas in the middle distance — subjects for which there is ongoing progress to report, and maybe hope of a breakthrough, but not yet a consensus.

## A sense of place

Recordings made from neurons in the brains of freely moving animals have revealed a surprising variety of cells sensitive to spatial position, orientation, and velocity. Place cells fire whenever the animal is at a specific location within its environment. Another class of neurons, the grid cells, respond not just to a single spot but to all the sites in a hexagonal lattice of locations, as if they were marking the tiles of a bathroom floor. Border cells are attuned to edges or boundaries; speed cells fire when the animal is running at a certain rate; head-direction cells act as a kind of compass, indicating which way the head is pointed with respect to some external frame of reference. Presumably, the firing patterns of the various geography-sensitive neurons give the animal a sense of place; even in the dark and in a terrain with few landmarks, it can know its own position and orientation. The question is: How do the neurons know their position and orientation? About a dozen talks over the course of the semester addressed issues connected with this navigational system.

Spatial computations in the brain are revealed by recording the activity of a single neuron as a laboratory rat explores its environment. Black trails trace the rat's movements; red dots record the firing of the selected "grid cell," which lies in a brain region called the entorhinal cortex. The red dots form a distinctive pattern, with clusters near the vertices of a triangular lattice, giving the animal a kind of spatial coordinate system. (The recording was made by Edvard I. Moser and his colleagues and discussed in a talk by Ila Fiete.) Image courtesy of Ila Fiete.

The proposed mechanism is simplest in the case of the head-direction cells, where the variable being measured is one-dimensional: an angle. Yoram Burak of Hebrew University and Ila Fiete of the University of Texas at Austin model the system by a ring of neurons with full connectivity — that is, each neuron can communicate with every other neuron in the ring. The connections are arranged so that whenever a cell is stimulated, it excites its near neighbors but inhibits activity in all the more distant members of the ring. This pattern of interactions has a stable solution: The ring develops a single, self-reinforcing "bump" of excited neurons, with activity suppressed everywhere else. The bump can be interpreted as indicating the current head direction. When the animal turns its head, sensory signals nudge the bump around the ring, but the stabilizing interactions ensure it retains its integrity as a single bump.

Another Burak-Fiete model describes the hexagonal lattice of grid cells. In a two-dimensional sheet of neural tissue, each neuron has a center-surround receptive field: Stimuli near the middle of the field tend to activate the cell, but those in the surrounding annulus inhibit it. These interactions generate a stable polka-dot pattern whose scale depends only on the size of the center-surround doughnuts. The regularity of the array allows it to act as a global coordinate system.

There's a problem, however, with the use of grid cells as an internal GPS system: Because a given cell lights up in the same way at each lattice point, there's no way to distinguish one lattice point from another. It's like navigating through a city without street signs; you know when you reach an intersection, but you can't tell which intersection. The Burak-Fiete model provides an ingenious solution to this problem, with a mathematical flourish. It turns out the brain has multiple sets of grid cells, with different spacing between adjacent grid points. Comparing the phases of these independent signals removes the ambiguity. For example, three sets of grid cells with nearest-neighbor spacings of 5, 6, and 7 units would create unique markers across an entire field with a diameter of 5 × 6 × 7 = 210. (The number of uniquely identifiable points along each dimension is equal to the least common multiple of the grid spacings.)
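
The arithmetic can be checked with a short Python sketch. It assumes a one-dimensional track with integer positions and treats each set of grid cells as reporting only the position modulo its spacing. The names `phase_code` and `decode` are hypothetical, and the brute-force lookup is a stand-in for whatever the brain actually does.

```python
from math import lcm  # available in Python 3.9 and later

spacings = (5, 6, 7)        # grid spacings from the example in the text
period = lcm(*spacings)     # the combined code repeats only after this many units


def phase_code(x):
    """Phase of position x within each set of grid cells (x modulo each spacing)."""
    return tuple(x % s for s in spacings)


# One code per position along the track, from 0 up to (but not including) the period.
codes = [phase_code(x) for x in range(period)]
all_unique = len(set(codes)) == period


def decode(code):
    """Brute-force lookup of the position in [0, period) whose phases match the code."""
    return codes.index(code)


print(period, all_unique)
print(phase_code(123), decode(phase_code(123)))
```

Any single spacing leaves the position ambiguous (position 123 and position 3 both have phase 3 in the 5-unit grid), but the triple of phases is unique for every position from 0 through 209.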

A brain network for spatial awareness may seem like a rather narrowly specialized subsystem, but the mechanism for forming such self-organized, stable arrays of cells may have broader applications. The various types of navigational networks are found in the hippocampus and a nearby area called the entorhinal cortex, both of which have important roles in memory and learning. An intriguing question is whether these functions are related, so that study of the navigational networks might reveal something about the mechanism of memory. Some of these questions were addressed in a hippocampus discussion group organized by Laurenz Wiskott of Ruhr University Bochum in Germany.

## Making waves

The first electrical signals ever detected from a living brain were recorded almost 100 years ago, using the technique now called electroencephalography, or EEG. Electrodes placed on the scalp record faint oscillations at frequencies ranging from less than 1 Hz to about 50 Hz. In order of increasing frequency, the bands of oscillations are designated delta, theta, alpha, beta, and gamma. All of the waves are thought to arise from the combined electrical activity of large, dispersed populations of neurons.

More topics from the program on The Brain and Computation

Sophie Denève of the École Normale Supérieure in Paris presented work in a number of areas, a common thread being the brain's mechanisms for regulating its own activity and its energy economy. She emphasized the need to maintain a balance between excitatory and inhibitory activity; failure in one direction leads to a seizure, in the other to stupor or death.

Peggy Seriès of the University of Edinburgh introduced a topic that was not on my radar: computational psychiatry. Denève also spoke on the subject. The underlying idea is that some forms of mental illness might be understood as disorders of Bayesian inference, the statistical framework for combining prior beliefs with new data. The statistical analysis at issue here is not carried out at the level of conscious reasoning but by some less-accessible mechanism in the brain, which can inform mood and judgment. Giving too much weight to prior beliefs — trusting one's internal model of the world in spite of contrary external evidence — creates a confirmatory bias. In Denève's phrase, the brain's rule becomes, "See what you expect." Overweighting sensory experience and ignoring statistical prior assumptions leaves you vulnerable to delusions: "Expect what you see."
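
The weighting at issue can be illustrated with the simplest textbook case: a Gaussian prior combined with a single Gaussian observation. This Python sketch is a toy built on that assumption, not a model presented by Seriès or Denève; the function name and all the numbers are invented for illustration.

```python
def posterior_mean(prior_mean, prior_var, observation, obs_var):
    """Posterior mean for a Gaussian prior combined with one Gaussian observation."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    # The more precise (lower-variance) source gets the larger share of the weight.
    weight_on_data = obs_precision / (prior_precision + obs_precision)
    return prior_mean + weight_on_data * (observation - prior_mean)


prior_mean, observation = 0.0, 10.0

balanced = posterior_mean(prior_mean, 1.0, observation, 1.0)       # equal trust in both
strong_prior = posterior_mean(prior_mean, 0.01, observation, 1.0)  # "see what you expect"
weak_prior = posterior_mean(prior_mean, 100.0, observation, 1.0)   # "expect what you see"

print(balanced, strong_prior, weak_prior)
```

When prior and observation are equally reliable, the estimate lands halfway between them. An overconfident prior leaves the estimate almost unmoved by the evidence, and a very weak prior lets the estimate follow the observation almost exactly.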

Naftali Tishby of Hebrew University in Jerusalem arrived in the middle of the semester and offered a rapid-fire series of lectures and tutorials on his "information bottleneck" theory of deep neural networks. These networks are not the biological ones of the brain; they are the computational models now fashionable for tasks such as image recognition or transcribing speech. Why are they so successful? Tishby argues for a two-phase process: Initially the network learns details about the structure of each input item (the pixels of an image, say); the second phase is dedicated to compressing or forgetting those details in order to formulate generalizations that apply to all the inputs.

Lena Ting of Emory University offered a mechanical engineer's analysis of motor skills, such as standing upright and walking. These are things we do "without thinking," and in fact we generally cannot explain how we do them. Ting points out that some motor tasks can be done not only without thinking but also without the brain; many algorithms needed for standing and walking are wired into the spinal cord and the peripheral nervous system. Indeed, the muscles and skeleton "know" a great deal about how to walk.

Two speakers addressed the formidable problem of understanding consciousness, trying to explain the self-aware inner presence that serves as the first-person actor in our mental lives. Manuel Blum of Carnegie Mellon University began with a theatrical metaphor: Consciousness consists of whatever is happening on the stage of short-term memory, observed by an audience in long-term memory. His aim (in joint work with Lenore Blum) is to develop this idea beyond the level of metaphor and build a theory of conscious machines. Kevin O'Regan of the University of Paris Descartes gave a "sensorimotor" account of consciousness, arguing that the seat of consciousness is not in the brain but emerges from a "loop" of sensory input, internal processing, and motor output leading to revised sensory input. Interestingly, both Blum and O'Regan emphasized the ability to feel pain as a test of consciousness.

Over the years, research interest in these rhythms has waxed and waned; it is on the upswing just now. The Simons Institute program included a number of talks touching on the theme, as well as a lively discussion group organized by Rufin VanRullen of the Center for Research on the Brain and Cognition in Toulouse, France, and with notable contributions by Fritz Sommer of Berkeley.

Oscillations in brain tissue invite comparison with the clock signal in an electronic computer, which beats a steady tempo for all the other circuits in the machine. The brain rhythms may likewise help to coordinate or synchronize events, but there is an important difference. Whereas the computer clock signal is generated by a separate, dedicated oscillator and distributed to the rest of the components, neural oscillations seem to emerge spontaneously from the activity of large-scale networks. There is no metronome regulating the brain's pace; it's more like the beat of a jazz band, determined cooperatively by the entire ensemble.

In the 1970s Valentino Braitenberg of the Max Planck Institute for Biological Cybernetics in Tübingen suggested one possible function for oscillations. Suppose an idea or memory is represented in the brain by the activation of some large population of neurons. Each neuron in the group fires only when the sum of its inputs exceeds a threshold; but the threshold is adjustable, being determined in part by the local electrical environment. Oscillations would alternately lower and raise the thresholds of all neurons in a broad region. The briefly lowered threshold would aid the "ignition" of the selected population of neurons; later, the rising threshold would help to extinguish the activity, so that the brain could move on to the next idea. Braitenberg called this process "the pump of thoughts."

Pascal Fries of the Ernst Strüngmann Institute in Frankfurt suggests a somewhat different role for gamma oscillations (the range with the highest frequencies). Consider an ambiguous image, with two competing interpretations: Is it a duck or a rabbit? Is it a white vase or the silhouette of a couple about to kiss? In the brain's visual pathway, early processing centers (such as the cortical region labeled V1) might simultaneously detect features consistent with both interpretations, but the higher cortical centers (notably V4) always commit to one view or the other. Fries argues that the choice depends on the precise timing of signals with respect to the phase of the gamma oscillation. A V1 region whose gamma rhythm is in sync with the corresponding area in V4 is more likely to have its message heard. Supporting this hypothesis are occasional observations of coherent gamma oscillations in distant regions of the cortex.

Still another idea about the function of gamma waves comes from Dana Ballard of the University of Texas at Austin. The oscillations could enable a form of frequency multiplexing in which neurons simultaneously participate in multiple communications channels by attending to one frequency or another.

Terry Sejnowski of the Salk Institute in San Diego presented evidence that the brain rhythms detected by EEG are not just temporal oscillations but traveling waves that also have a spatial component. Recording techniques more sensitive than ordinary EEG show spiral waves propagating across large swaths of the cerebral cortex.

## Curses and blessings of dimensionality

A population of $N$ neurons all doing their own thing has a huge variety of possible behaviors. Even if you pretend that each neuron has just two available states – firing and quiescent – the set of $N$ neurons has access to $2^{N}$ states. Each state of the system can be represented as a vector in a space of $N$ dimensions.

High-dimensional spaces are nothing like the world of everyday experience. In a 100-dimensional cube, almost all the volume is tucked away in the corners – there are $2^{100}$ of them. If you lose your car keys in such a place, you've little chance of ever retrieving them. The mathematician Richard Bellman called this effect "the curse of dimensionality." How can the brain manage to function in such an environment?
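
One way to put a number on the claim about corners is to compare the cube with the largest ball that fits inside it; whatever lies outside the ball is, loosely speaking, in the corners. The Python sketch below makes that assumption explicit and relies on the standard formula for the volume of an $n$-dimensional ball of radius 1, $\pi^{n/2} / \Gamma(n/2 + 1)$. The function name is made up for the example.

```python
from math import exp, lgamma, log, pi


def inscribed_ball_fraction(n):
    """Fraction of an n-dimensional cube's volume that lies inside its inscribed ball."""
    # Unit-radius ball inside a cube of side 2; logarithms avoid overflow for large n.
    log_ball = (n / 2) * log(pi) - lgamma(n / 2 + 1)
    log_cube = n * log(2)
    return exp(log_ball - log_cube)


# Columns: dimension, number of corners, fraction of volume inside the ball.
for n in (2, 3, 10, 100):
    print(n, 2 ** n, inscribed_ball_fraction(n))
```

In two dimensions the ball (a disk) fills about 79 percent of the square. By 100 dimensions the fraction is smaller than $10^{-60}$, so essentially all of the volume lies outside the ball, toward the corners.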

One answer is that the effective number of dimensions may be much smaller than the apparent number. Data embedded in a 100-dimensional cube don't necessarily spread out through all that space; the data points can cluster in a subspace of much lower dimensionality. This is a point made by Ila Fiete in her discussion of head-direction cells. Although a collection of 100 such cells could in principle have states scattered throughout the high-dimensional volume, the observed states all lie on a one-dimensional ring. Neuron populations whose states are confined to a two-dimensional surface have also been observed.

Another answer is that dimensionality can confer blessings as well as curses. In support of this notion, Bruno Olshausen of the Redwood Center for Theoretical Neuroscience (on the Berkeley campus) points out that the brain does dimensionality expansion as well as reduction. In the human visual pathway, signals from six million cone cells in the retina are squeezed into 1.5 million fibers in the optic nerve, which carries them to the lateral geniculate nucleus, one of the relay centers in the thalamus. This is a compression, or dimension-reduction, step. The expansion step comes when the signals finally reach the primary visual cortex, V1. The input layer of V1 has an extravagant overabundance of neurons compared with the number of inputs from the lateral geniculate nucleus. A small visual field, equivalent to 14 × 14 pixels, or roughly 200 pixels overall, is mapped onto an area of the cortex that has about 100,000 neurons. This mismatch may sound like a scandalous waste of resources, but in fact the excess neurons can be put to good use in a method called sparse coding.

"Coding," in this context, refers simply to methods of storing or representing information; the coding schemes can be classified along an axis from dense to sparse. A dense code uses every possible pattern of binary digits, which means an $N$-bit dense code has a capacity of $2^{N}$ items. For example, the ASCII code represents characters of the Latin alphabet as patterns of eight bits, and every such pattern from 00000000 through 11111111 is assigned a meaning. At the other end of the spectrum, the sparsest possible code is a 1-of-$N$ code, in which each binary pattern has a single 1 bit, and all the rest are 0s. This code has a capacity of only $N$ items. Between these extremes are sparse codes with a few 1s and many 0s in each pattern. For words of length $N$ with $m$ 1s, the capacity of the code is given by the binomial coefficient $\binom{N}{m}$, which generally lies between $N$ and $2^{N}$.

In electronic computers, dense coding is the norm. Compact representations make the best use of hardware resources. In the brain, on the other hand, sparse coding can be advantageous. In many cases, the critical quantity to be optimized in neural circuits is not the total number of neurons but the number that are firing at any one moment. This determines energy consumption; if a 1 in a code word corresponds to a firing neuron, then energy is minimized by making the code as sparse as possible. (As Olshausen put it: "More neurons, less firing.")
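
The trade-off between the number of neurons and the number firing can be tabulated directly from the binomial coefficient. The Python sketch below asks how many units an $m$-of-$N$ code needs in order to represent 256 items, the capacity of an 8-bit dense code. The target of 256 and the function name `units_needed` are assumptions chosen for illustration.

```python
from math import comb


def units_needed(items, active):
    """Smallest N for which a code with `active` 1s per word can represent `items` patterns."""
    n = active
    while comb(n, active) < items:
        n += 1
    return n


ITEMS = 256  # capacity of an 8-bit dense code, in which 4 bits are set on average

# Columns: units firing per item, total units required.
for active in (1, 2, 3, 4):
    print(active, units_needed(ITEMS, active))
# Output lines: 1 256, 2 24, 3 13, 4 11
```

A dense 8-bit code covers the 256 items with only eight units but has four of them active on average. Allowing 24 units instead brings the number active per item down to two, a small instance of "more neurons, less firing."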

The sparse coding of visual information may also be more convenient when the pattern is passed along to higher centers of the brain for further processing and interpretation. An obvious encoding for this purpose allocates one bit to each feature that might be present or absent in a segment of the visual field. A sparse coding comes closer to this ideal than a dense coding.

Pentti Kanerva, also of the Redwood Center, has for many years been exploring a related idea called sparse distributed memory. The memory he envisions would encode stored items in very long vectors — perhaps 10,000 elements. The number of vectors available is so enormous ($2^{10{,}000}$) that billions of items could be stored without any two lying close to one another. If distance is measured by the number of bit positions where two vectors differ, almost all pairs of vectors would be separated by a distance of about 5,000 bits; only a vanishingly small fraction would be closer than 4,000 bits. As a result, memories could be recalled on the basis of partial or incorrect information. There's also enough space to store every item redundantly, thousands of times, allowing the system to degrade gracefully when components fail.
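
These distance statistics are easy to check numerically. The following Python sketch draws random 10,000-bit vectors, measures the distances between them, and then recalls a stored vector from a badly corrupted copy by nearest-neighbor search. It illustrates only the geometry: the seed, the count of 50 stored items, the 30 percent error rate, and the brute-force lookup are all assumptions, and this is not Kanerva's memory architecture.

```python
import random

N_BITS = 10_000
rng = random.Random(0)  # fixed seed so that a run is repeatable


def random_vector():
    """A random N_BITS-bit vector, stored as a Python integer."""
    return rng.getrandbits(N_BITS)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def corrupt(vector, n_flips):
    """Flip n_flips distinct, randomly chosen bits of the vector."""
    for position in rng.sample(range(N_BITS), n_flips):
        vector ^= 1 << position
    return vector


memory = [random_vector() for _ in range(50)]
distances = [hamming(a, b) for i, a in enumerate(memory) for b in memory[i + 1:]]

cue = corrupt(memory[7], 3_000)  # a copy of item 7 with 30 percent of its bits wrong
recalled = min(range(len(memory)), key=lambda i: hamming(memory[i], cue))

print(min(distances), max(distances), recalled)
```

Because unrelated vectors sit about 5,000 bits apart with a spread of only about 50 bits, a cue that is 3,000 bits away from its original is still far closer to that item than to anything else in the store.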

Sparse distributed memory was conceived as a theory of how human (and other mammalian) memory might work, but it also suggests a novel approach to computation. Kanerva writes of "a new breed of computers that, contrasted to present-day computers, work more like brains and, by implication, can produce behavior more like that produced by brains[...] It falls upon those of us who work on the theory of computing to work out the architecture. In that spirit, we are encouraged to explore the possibilities hidden in very high dimensionality and randomness."

## Known unknowns

As an observer of science-in-the-making, it's grand to sit and watch the parade of new research findings pass by. But there are always questions that remain unanswered. Neuroscience has an exceptionally bountiful supply of deep, unsolved problems. I was intrigued to see how the community handles these shadowy territories.

One of the unsettled issues is referred to as the encoding problem. Neurons communicate via "spikes," or action potentials. A brief electrical discharge propagates down the axon; when it reaches a synapse, where the axon contacts a dendrite of another cell, the spike causes a discharge of neurotransmitter molecules. The basic physics and physiology of action potentials have been understood since the 1950s. What remains unclear is how information is encoded in the sequence of spikes. Although a spike is a discrete event, the code is not a digital one, with individual spikes representing 1s and 0s. The simplest hypothesis is that neurons employ a rate code, where the average number of spikes per second indicates the strength of a signal. That is the default assumption in many computational models of brain activity. However, there are cases where the timing of individual spikes seems to be important. Sometimes the first spike to arrive takes precedence over all others, or simultaneous spikes may have a different effect than sequential ones. Thus even if we could record all the spiking activity of a set of neurons, we might not be able to decipher the messages.
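
A toy example shows why a rate code can miss part of the message. In this Python sketch the two spike trains are invented for illustration, with times given in seconds over a one-second window; they have identical firing rates but very different timing.

```python
def firing_rate(spike_times, duration):
    """Average number of spikes per second over the recording window."""
    return len(spike_times) / duration


def first_spike_latency(spike_times):
    """Time of the earliest spike, one simple timing-based feature."""
    return min(spike_times)


DURATION = 1.0  # seconds

regular = [0.1, 0.3, 0.5, 0.7, 0.9]     # evenly spaced spikes
burst = [0.02, 0.03, 0.04, 0.05, 0.06]  # same count, packed into an early burst

print(firing_rate(regular, DURATION), firing_rate(burst, DURATION))
print(first_spike_latency(regular), first_spike_latency(burst))
```

A decoder that looks only at the rate cannot tell the two trains apart, whereas one that attends to the first spike or to the spacing between spikes sees two distinct signals.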

Learning and memory also remain mysterious. There is a near consensus that memory depends on "synaptic plasticity." When you learn a new fact, it is committed to long-term memory by altering the conductance of synapses somewhere in the brain, thereby either strengthening or weakening the connections between selected neurons. But which neurons, and how are they selected? Some memories can last a lifetime – I learned the hokey-pokey more than 60 years ago, and it's all still there in my head – so the altered synapses must somehow be protected from further change.

The organization of memory at a higher level of abstraction presents difficulties of another kind – not a lack of good ideas but a surfeit of them.

One scheme for information storage would allocate a single neuron to each learned concept; recalling that concept is then just a matter of activating the right cell. This notion was crystallized 50 years ago in the phrase "grandmother cell": the hypothetical neuron that responds to all things grandmotherly and to nothing else. The idea was never taken seriously. All the schemes now under consideration would represent an idea in the activity of a whole ensemble of cells, probably thousands of them. The first clear statement of this idea was formulated in 1949 by Donald O. Hebb, who named the ensembles cell assemblies. A sensory stimulus might activate a subset of the neurons in a cell assembly, and the internal connections among the cells would then ignite the rest and sustain activity in all of them.

In the 1980s, another kind of ensemble model came along, based on the physics and mathematics of dynamical systems. John Hopfield, now of Princeton University, noted that many different patterns of activity in a large collection of neurons could all evolve to the same stable state (or perhaps a repeating sequence of states), called an attractor. Other patterns of activation in the same set of neurons would converge on different attractors. It is the attractors that represent memories. Kanerva's ideas on sparse distributed memory also emerged in the 1980s, and offer a third possible way of storing information in large ensembles of neurons.
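
A tiny network in the spirit of Hopfield's model shows what it means for different starting patterns to converge on the same attractor. The Python sketch below is a simplified illustration under stated assumptions: eight units with states +1 and -1, two hand-picked orthogonal patterns stored with a Hebbian rule, and units updated one at a time. None of these specifics comes from the talks, and the function names are invented.

```python
def train(patterns):
    """Hebbian weights: sum over stored patterns of p[i] * p[j], with no self-connections."""
    n = len(patterns[0])
    return [[0 if i == j else sum(p[i] * p[j] for p in patterns) for j in range(n)]
            for i in range(n)]


def settle(weights, state, max_sweeps=10):
    """Update units one at a time until a full sweep leaves the state unchanged."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            # A unit takes the sign of its input; with zero input it keeps its state.
            new_value = state[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


def flip(pattern, index):
    """Copy of the pattern with one unit's state reversed."""
    copy = list(pattern)
    copy[index] = -copy[index]
    return copy


memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([memory_a, memory_b])

# Two different corrupted cues settle into the same stored pattern.
print(settle(weights, flip(memory_a, 0)) == memory_a)
print(settle(weights, flip(memory_a, 5)) == memory_a)
print(settle(weights, flip(memory_b, 3)) == memory_b)
```

Each stored pattern is a fixed point of the update rule, and nearby states are drawn back to it, which is how a partial or damaged cue can retrieve the whole memory.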

Mechanisms for representing and transmitting information lie at the root of brain science, so vagueness about how these tasks are accomplished seems like a severe handicap. It's like trying to make progress in molecular biology without knowledge of DNA or the genetic code. No doubt every neuroscientist would be delighted to see the encoding and memory problems resolved; but in the meantime, the various theories and speculations live on under a doctrine of peaceful coexistence. I am reminded of what John Keats, the Romantic poet, called "negative capability": the patience to tolerate ongoing doubt and uncertainty, rather than insisting on a commitment to one hypothesis or another even when there's no clear basis for the choice. Keats considered it a virtue.

I am also reminded of a wry observation of uncertain provenance (the neuroscientist György Buzsáki attributes it to someone named Ken Hill): "If the brain were simple enough for us to understand it, we would be too simple to understand it."