No abstract available.

### Monday, June 17th, 2019

First order thalamic nuclei (e.g., the LGN) relay driver input from a subcortical source (e.g., retina), and higher order nuclei (e.g., the pulvinar) relay driver input from layer 5 of one cortical area to another and are involved in cortico-thalamo-cortical (or transthalamic corticocortical) circuits. Most of thalamus can be divided into first and higher order components. Many, and perhaps all, direct driver connections between cortical areas are paralleled by indirect transthalamic ones. Such transthalamic circuits represent a heretofore unappreciated role in cortical functioning, and this assessment challenges and extends conventional views regarding both the role of thalamus and mechanisms of corticocortical communication. Evidence for these transthalamic circuits, as well as speculations as to why these two parallel routes exist, will be offered.

We investigate the computational role of the feedback from layer VI of the primary visual cortex (V1) onto LGN relay cells. Our modeling is based on recent experimental results on cat and primate visual cortex that have shed light on the net polarity of feedback as a function of the receptive field location of an LGN cell relative to that of a V1 column. These results suggest an excitatory feedback effect when the two receptive field locations largely overlap and an inhibitory feedback effect when the receptive fields are slightly offset, creating a “Mexican hat” profile. Our numerical experiments suggest that this feedback scheme results in an increase of information transferred from LGN to V1 when the extents of the excitatory and inhibitory regions of the Mexican hat filter are within a range similar to what is observed in the experiments. We further set up an information maximization problem to learn the feedback kernel and observe that the resulting kernel has a Mexican hat shape with similar excitatory and inhibitory widths.
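
As a rough illustration of the "Mexican hat" profile described above, the sketch below models the feedback weight as a difference of Gaussians over the receptive-field offset. The functional form, the function name `mexican_hat`, and all parameter values are assumptions made for illustration; the abstract does not specify the kernel used in the talk.

```python
import math

def mexican_hat(offset, sigma_exc=1.0, sigma_inh=2.0, gain_inh=0.5):
    """Difference-of-Gaussians feedback weight at a given receptive-field offset.

    Hypothetical parameters: a narrow excitatory Gaussian minus a broader,
    weaker inhibitory Gaussian.
    """
    excitation = math.exp(-offset ** 2 / (2 * sigma_exc ** 2))
    inhibition = gain_inh * math.exp(-offset ** 2 / (2 * sigma_inh ** 2))
    return excitation - inhibition

# Sample the kernel at a few offsets (arbitrary units).
offsets = [0.0, 0.5, 1.0, 2.0, 3.0, 5.0]
kernel = [mexican_hat(d) for d in offsets]
for d, w in zip(offsets, kernel):
    print(f"offset={d:.1f}  weight={w:+.3f}")
```

With these illustrative values the weight is positive for overlapping receptive fields, negative for moderately offset ones, and decays toward zero at large offsets.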

The complex connectivity structure unique to the brain network is believed to underlie its robust and efficient coding capability. The recent development of the structural mouse brain network available at the Allen Mouse Brain Connectivity Atlas makes it possible to conduct in-depth analyses of the connections between structure and computation in the brain network. In this talk, I will discuss computational strategies that can be inferred from the architecture of the mesoscopic mouse brain network constructed from viral tracing experiments. First, I will explore how network synchrony depends on complex connectivity structures of the whole-brain network. By simulating large-scale brain dynamics using a data-driven network of phase oscillators, we show that complexities added to the spatially embedded whole-brain connectome by sparse long-range connections enable rapid transitions between local and global synchronizations. This result implicates computational roles of strong distal connections in the brain, which may be important for the brain's exceptional ability to rapidly switch between modular and global computations; such rapid transition is known to be impaired in pathological brains (e.g., Alzheimer's disease). The recent expansion of the Allen Mouse Brain Connectivity data includes cell-type and layer-specific cortical connectivity, constructed from viral tracing experiments in Cre-transgenic mice. In the second part of the talk, I will introduce an unsupervised method to find the hierarchical organization of the mouse cortical and thalamic network, based on the layer-specific connectivity. The implemented method discovers the hierarchy of the mouse brain areas based on their anatomical connectivity patterns, and provides a measure of "hierarchy scores" for different connectomes. The uncovered hierarchy provides insights into the direction of information flows in the mouse brain, which has been less well-defined compared to the primate brain.
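
To make "a network of phase oscillators" concrete, here is a minimal sketch of a Kuramoto-type model on a weighted graph, together with the standard order parameter used to quantify synchrony. The all-to-all toy weight matrix, the identical natural frequencies, and every parameter value are assumptions for illustration only; the talk uses a data-driven mouse connectome, which is not reproduced here.

```python
import cmath
import math
import random

def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full synchrony."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

def simulate_kuramoto(weights, phases, omega=1.0, coupling=1.0, dt=0.05, steps=1000):
    """Euler-integrate d(theta_i)/dt = omega + coupling * sum_j W_ij * sin(theta_j - theta_i)."""
    n = len(phases)
    phases = list(phases)
    for _ in range(steps):
        drive = [
            sum(weights[i][j] * math.sin(phases[j] - phases[i]) for j in range(n))
            for i in range(n)
        ]
        phases = [p + dt * (omega + coupling * d) for p, d in zip(phases, drive)]
    return phases

rng = random.Random(0)
n = 20
# Toy all-to-all weights normalized by n (a stand-in for a measured connectome).
weights = [[0.0 if i == j else 1.0 / n for j in range(n)] for i in range(n)]
initial = [rng.uniform(0, 2 * math.pi) for _ in range(n)]

r_start = order_parameter(initial)
r_coupled = order_parameter(simulate_kuramoto(weights, initial))
r_uncoupled = order_parameter(simulate_kuramoto(weights, initial, coupling=0.0))
print(f"r at start: {r_start:.3f}, coupled: {r_coupled:.3f}, uncoupled: {r_uncoupled:.3f}")
```

Replacing `weights` with a sparse, spatially embedded matrix plus a few long-range entries would be one way to probe the local-versus-global synchrony question raised in the abstract.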

The cerebellum contains over half the neurons in the brain (the granule cells), as well as neurons with the largest number of modifiable synapses (the Purkinje cells). More than a century ago Santiago Ramon y Cajal mapped its circuits and left us with the puzzle of interpreting its function and operation. 70 years later David Marr (1969) and James Albus (1972) interpreted it as a neural associative memory. I will discuss this interpretation and its fit into a theory of computing with high-dimensional vectors. It turns out that computing with vectors resembles computing with numbers. Both need a large memory, to provide ready access to a lifetime's worth of information. I will also discuss the need to understand the cerebellum's connections to the rest of the nervous system in light of the theory of computing with vectors.

Driven by advances in multi-cell spike recording technology, the statistics of action potential populations are revealing many new details of dynamic signals in the cortex. However, it is still difficult to integrate slow Poisson spiking with much faster spike timing signals in the gamma frequency spectrum. A potential way forward is being sparked by advances in patch clamping methodologies that allow the exploration of communication strategies that use millisecond timescales. The voltage potential of a cell's soma recorded at 20 kilohertz in vivo allows its high-resolution structure to be correlated with behaviors. We show that this signal can potentially be interpreted by a unified model that takes advantage of a single cycle of a cell's somatic gamma frequency to modulate the generation of its action potentials. This capability can be seen as organized into a general-purpose method of coding fast computation in cortical networks that has three important advantages over traditional formalisms: 1) Its processing speed is two to three orders of magnitude faster than population coding methods, 2) It allows multiple, independent processes to run in parallel, greatly increasing the processing capability of the cortex, and 3) Its processes are not bound to specific locations, but migrate across cortical cells as a function of time, facilitating the maintenance of cortical cell calibration.

### Tuesday, June 18th, 2019

It is well-known that random linear features suffice to approximate any kernel, and hence Random Projection (RP) provides a simple and efficient way to implement the kernel trick. It results in sample complexity bounds that depend only on the separation between categories for supervised learning. But how does the brain learn? It has been hypothesized, since Hebb, that the basic unit of memory and computation in the brain is an _assembly_, which we interpret as a sparse distribution over neurons that can shift gradually over time. Roughly speaking, there is one assembly per "memory", with a hierarchy of assemblies ("blue", "whale", "blue whale" and "mammal" are all assemblies). How are such assemblies created, associated and used for computation? RP, together with inhibition (only the top k coordinates survive and the rest are zeroed) and plasticity (synapses between neurons that fire within small intervals get stronger), leads to a plausible and effective explanation: A small number of repeated (recurrent) applications of the RP&C (random projection and cap) primitive leads to a stable assembly under a range of parameter settings. We explore this behavior and its consequences theoretically (and to a modest extent, in simulations).
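
The RP&C primitive described above can be sketched in a few lines: project through random synapses, keep only the top k neurons, and strengthen synapses between neurons that fire in consecutive rounds. Everything specific here is an assumption for illustration (the names `cap` and `form_assembly`, the Bernoulli connectivity with probability `p`, the multiplicative plasticity rate `beta`, and the network sizes); the abstract does not give the exact model used.

```python
import random

def cap(inputs, k):
    """Inhibition as a cap: return the indices of the k largest inputs."""
    ranked = sorted(range(len(inputs)), key=lambda i: inputs[i], reverse=True)
    return set(ranked[:k])

def form_assembly(n=200, k=20, p=0.1, beta=0.1, rounds=10, seed=0):
    """Repeatedly apply random projection and cap with Hebbian plasticity."""
    rng = random.Random(seed)
    # feedforward[i][j]: synapse from stimulus neuron j to area neuron i.
    feedforward = [[1.0 if rng.random() < p else 0.0 for _ in range(n)] for _ in range(n)]
    # recurrent[i][j]: synapse from area neuron j to area neuron i.
    recurrent = [[1.0 if rng.random() < p else 0.0 for _ in range(n)] for _ in range(n)]
    stimulus = set(range(k))  # a fixed set of k firing stimulus neurons
    winners = set()
    history = []
    for _ in range(rounds):
        inputs = [
            sum(feedforward[i][j] for j in stimulus) + sum(recurrent[i][j] for j in winners)
            for i in range(n)
        ]
        new_winners = cap(inputs, k)
        # Plasticity: strengthen synapses from neurons that just fired
        # onto the neurons that fire next.
        for i in new_winners:
            for j in stimulus:
                feedforward[i][j] *= 1 + beta
            for j in winners:
                recurrent[i][j] *= 1 + beta
        winners = new_winners
        history.append(winners)
    return history

history = form_assembly()
overlaps = [len(a & b) for a, b in zip(history, history[1:])]
print("overlap between consecutive winner sets:", overlaps)
```

Printing the overlap between consecutive winner sets is one simple way to inspect whether the set of firing neurons settles down over rounds for a given parameter choice.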

Stochastic gradient descent (SGD) has been the core optimization method for deep neural networks, contributing to their resurgence. While some progress has been made, it remains unclear why SGD leads the learning dynamics in overparameterized networks to solutions that generalize well. Here we show that for overparameterized networks with a degenerate valley in their loss function, SGD on average decreases the trace of the Hessian. We also show that isotropic noise in the non-degenerate subspace of the Hessian decreases its determinant. This opens the door to a new optimization approach that guides the model to solutions with better generalization. We test our results with experiments on toy models and deep neural networks.

How does the brain beget the mind? How do molecules, cells and synapses effect reasoning, intelligence, language, science? Despite dazzling progress in experimental neuroscience we do not seem to be making progress in the overarching question -- the gap is huge and a completely new approach seems to be required. As Richard Axel recently put it: "We don't have a logic for the transformation of neural activity into thought." What kind of formal system would qualify as this "logic"? I will sketch a possible answer. (Joint work with Santosh Vempala, Dan Mitropolsky, Mike Collins, and Larry Abbott.)

### Wednesday, June 19th, 2019

Information coding by precise timing of spikes can be faster and more energy-efficient than traditional rate coding. However, spike-timing codes are often brittle, which has limited their use in theoretical neuroscience and computing applications. We propose a novel type of attractor neural network in complex state space, and show how it can be leveraged to construct spiking neural networks with robust computational properties through a phase-to-timing mapping.

Building on Hebbian neural associative memories, like Hopfield networks, we propose threshold phasor associative memory (TPAM) networks. Complex phasor patterns whose components can assume continuous-valued phase angles and binary magnitudes can be stored and retrieved as stable fixed points in the network dynamics. TPAM achieves high memory capacity when storing sparse phasor patterns, and we derive the energy function that governs its fixed point attractor dynamics.

Further, we show how the complex algebraic computations in TPAM can be approximated by a biologically plausible network of integrate-and-fire neurons with synaptic delays and recurrently connected inhibitory interneurons. The fixed points of TPAM in the complex domain are commensurate with stable periodic states of precisely timed spiking activity that are robust to perturbation. The link established between rhythmic firing patterns and complex attractor dynamics has implications for the interpretation of spike patterns seen in neuroscience, and can serve as a framework for computation in emerging neuromorphic devices.

Much of perception and cognition requires solving inverse problems: Given the photoreceptor activations in the retina, what is the structure of the external environment? Given a stream of words, what is the thought or concept being conveyed? Many of these inverse problems can essentially be formulated as factorization problems - e.g., factorizing form vs. motion from time-varying images, or factorizing the meaning and part of speech from a word. In Kanerva's high-dimensional computing framework, these problems can be posed as the factorization of a high-dimensional vector into its constituent components. 'Resonating circuits' provide an efficient means of solving this problem by iteratively estimating the factors via a set of coupled Hopfield networks. Here I shall present a set of rigorous evaluations of its performance in comparison to alternative schemes such as multiplicative weights, alternating least squares, or map-seeking circuits. Resonator circuits vastly outperform these alternative methods in terms of operational capacity (the size of the search space from which a correct factorization can be found with high probability). Interestingly, all of these methods work by searching in superposition - that is, the estimate at any given point in time is a superposition of the possible factorizations, which works due to the properties of high-dimensional vector spaces. With Spencer Kent and Paxon Frady.
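
The idea of factorizing a bound high-dimensional vector by "searching in superposition" can be illustrated with a small sketch. It assumes random bipolar codebooks, elementwise multiplication as the binding operation, a Hopfield-style cleanup step `sign(X X^T q)`, and sequential updates of the factor estimates; these choices, along with the names `bind`, `cleanup`, and `resonator` and the sizes used, are illustrative assumptions rather than the exact circuit evaluated in the talk.

```python
import random

def sign(value):
    return 1 if value >= 0 else -1

def bind(*vectors):
    """Elementwise product of bipolar vectors (each vector is its own inverse)."""
    out = [1] * len(vectors[0])
    for vec in vectors:
        out = [a * b for a, b in zip(out, vec)]
    return out

def cleanup(codebook, query):
    """Hopfield-style cleanup: weight each code by its similarity to the query, then threshold."""
    sims = [sum(c * q for c, q in zip(code, query)) for code in codebook]
    return [sign(sum(s * code[d] for s, code in zip(sims, codebook)))
            for d in range(len(query))]

def resonator(composite, codebooks, iterations=20):
    """Estimate one codebook index per factor from the bound composite vector."""
    # Start every estimate as the superposition of all candidates in its codebook.
    estimates = [[sign(sum(column)) for column in zip(*book)] for book in codebooks]
    for _ in range(iterations):
        for f, book in enumerate(codebooks):
            others = [est for g, est in enumerate(estimates) if g != f]
            # Unbind the other factors' current estimates, then clean up.
            estimates[f] = cleanup(book, bind(composite, *others))
    # Decode each estimate to the most similar codebook entry.
    return [
        max(range(len(book)), key=lambda i: sum(c * e for c, e in zip(book[i], est)))
        for book, est in zip(codebooks, estimates)
    ]

rng = random.Random(0)
dim, size = 1000, 5  # hypothetical dimension and codebook size
codebooks = [[[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(size)]
             for _ in range(3)]
truth = [2, 0, 4]
composite = bind(*(book[i] for book, i in zip(codebooks, truth)))
recovered = resonator(composite, codebooks)
print("true factors:", truth, "recovered:", recovered)
```

The search space in this toy problem has only 5 x 5 x 5 combinations; the operational capacity discussed in the abstract concerns how large that space can grow before recovery starts to fail.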

The forces that govern how languages assign meanings to words have been debated for decades. Recently, it has been suggested that human semantic systems are adapted for efficient communication. However, a major question has been left largely unaddressed: how does pressure for efficiency relate to language evolution? Here, we address this open question by grounding the notion of efficiency in a general information-theoretic principle, the Information Bottleneck (IB) principle. Specifically, we argue that languages efficiently encode meanings into words by optimizing the IB tradeoff between the complexity and accuracy of the lexicon. In support of this hypothesis, we first show that color naming across languages is near-optimally efficient in the IB sense. Furthermore, this finding suggests (1) a theoretical explanation for why inconsistent naming and stochastic categories may be efficient; and (2) that languages may evolve under pressure for efficiency, through an annealing-like process that synthesizes continuous and discrete aspects of previous accounts of color category evolution. This process generates quantitative predictions for how color naming systems may change over time. These predictions are directly supported by an analysis of recent data documenting changes over time in the color naming system of a single language. Finally, we show that this information-theoretic approach generalizes to two qualitatively different semantic domains: names for household containers and animal taxonomies. Taken together, these results suggest that efficient coding - a general principle that also applies to lower-level neural representations - may explain to a large extent the structure and evolution of semantic representations across languages.

Abstract neurobiological network models use various learning rules with different pros and cons. Popular learning rules include Hebbian learning and gradient descent. However, Hebbian learning has problems with correlated input data and does not profit from seeing training patterns several times. Gradient descent has the problem of vanishing gradients for partially flat activation functions, especially in online learning. We analyze here a variant, which we refer to as Hebbian-Descent, that addresses these problems by dropping the derivative of the activation function and by centering, i.e. keeping the neural activities mean free, leading to an update rule that is provably convergent, does not suffer from the vanishing gradient problem, can deal with correlated data, profits from seeing patterns several times, and enables successful online learning when centering is used.
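
The two ingredients named above, dropping the activation derivative and centering the inputs, can be sketched for a single sigmoid unit trained online. The exact update used here (weight change proportional to the centered input times the output error, with no derivative factor), the fixed input mean, and the toy OR task are assumptions for illustration, not details taken from the abstract.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_hebbian_descent(patterns, targets, lr=0.5, epochs=200):
    """Online training of one sigmoid unit with a Hebbian-Descent-style update."""
    n_in = len(patterns[0])
    # Centering: subtract the mean input (estimated here from the whole training set).
    mean = [sum(p[d] for p in patterns) / len(patterns) for d in range(n_in)]
    weights = [0.0] * n_in
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            centered = [xd - md for xd, md in zip(x, mean)]
            h = sigmoid(sum(w * c for w, c in zip(weights, centered)) + bias)
            error = h - t
            # Plain gradient descent on squared error would also multiply by
            # the sigmoid derivative h * (1 - h); that factor is dropped here.
            weights = [w - lr * error * c for w, c in zip(weights, centered)]
            bias -= lr * error
    return weights, bias, mean

def predict(x, weights, bias, mean):
    centered = [xd - md for xd, md in zip(x, mean)]
    return sigmoid(sum(w * c for w, c in zip(weights, centered)) + bias)

# Toy task: logical OR of two binary inputs.
patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 1]
weights, bias, mean = train_hebbian_descent(patterns, targets)
outputs = [predict(x, weights, bias, mean) for x in patterns]
print([round(o, 2) for o in outputs])
```

Because the derivative factor is omitted, the size of each update depends only on the output error and the centered input, so it does not shrink when the unit saturates.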

No abstract available.

I will describe our recent work on a deep energy model which brings together kernel density estimators and empirical Bayes least squares estimators. The energy model is the first of its kind in that the learning does not involve inference (negative samples), and the density estimation is formulated purely as an optimization problem, scalable to high dimensions and efficient with double backpropagation and SGD. An elegant physical picture emerges of an interacting system of high-dimensional spheres around each data point together with a globally-defined probability flow field. The picture is powerful and it leads to a novel sampling algorithm (walk-jump sampling), a new notion of associative memory (Robbins associative memory), and it is instrumental in designing experiments. I will finish the talk by showcasing the emergence of remarkably rich creative modes when the model is trained on MNIST. Reference: https://arxiv.org/abs/1903.02334