# From the Inside: Foundations of Machine Learning

Machine learning stands at an interesting crossroads. On the one hand, seemingly each day we witness the automation of another task by machine learning methods. On the other hand, these latest methods are often opaque black boxes (with poorly understood failure modes!). All this suggests the need for a deeper investigation into the Foundations of Machine Learning.

To see how things can get so fractured, let’s build from a simple example: fitting a line to a collection of points in the plane. In the case that no line exactly passes through the points, we must resort to some principle that prefers some approximately fitting lines over others. For this we can turn back the clock and seek the guidance of our friend Carl Friedrich Gauss: he lends us his method of least squares, which assumes that the deviation of the points from a best fitting line is, well, Gaussian. From this assumption we may efficiently compute a now appropriately defined best-fitting line.

That was easy! But let’s see how complexities can accumulate.

• Sometimes we are not just fitting a given collection of points, but have the power to request more points in some region of the space. As a less abstract example, we can instruct a self-driving car to visit certain regions to understand specific new terrains. This is the general task of Interactive Learning: the machine interacts with us or simply its environment while it learns.
• Sometimes we aren't just fitting lines, but instead fitting much more complicated shapes. One way to accomplish this is to first transform our input points; a line in the transformed space now corresponds to something more complicated in the original space. For instance, to reason about English sentences, we may first choose to embed them in some vector space. This leads to the general task of Representation Learning: we have many choices of how to represent the original object in a new space, which renders different statistical questions easy or hard.
• Lastly, we've omitted basic computational questions in the preceding topics, for instance their running time, memory usage, etc. In machine learning, computational questions have some new twists; for instance, we may be able to prove that we can predict some phenomenon accurately after seeing a certain number of examples; however, we can only come up with exponential time algorithms. Said another way, we have an information theoretic understanding of the problem, but lack a satisfying algorithmic handle. This and related questions form the general topic of Computational Challenges in Machine Learning.

The Spring 2017 Simons Institute program on Foundations of Machine Learning was centered around the above three themes. To kick it off, we had a boot camp featuring not just standard obstacle courses like convex and submodular optimization, but also spectral methods, deep reinforcement learning, and generalizations of de Finetti's exchangeability, to name a few.

As above, the first of the themed workshops was on Interactive Learning. There are many forms of interactive learning; some have a machine and human interacting as student and teacher (respectively, or at least we hope for the immediate future), whereas some have machines collecting information on their own, as in the self-driving car mentioned earlier. The workshop had presentations on not only classical topics like active learning of supervised learning tasks (where the machine asks a human for labels of certain informative points), but also active learning of unsupervised tasks (e.g., the machine asks for help when clustering data), theoretical guarantees for reinforcement learning (e.g., to train our self-driving car) with various types of information, and many other recent topics.

The second workshop, on Representation Learning, made the interesting and valuable choice of hosting many empirical, non-theoretical presentations, indeed on topics currently resisting theoretical analysis. Many of these talks featured neural networks, both in more familiar settings like reinforcement learning (e.g., our friendly robotic car), and in less familiar settings like the modeling of human eye behavior. There were also neural network talks urging caution, showing simple cases where either classical methods perform better, or neural networks simply fail completely. The workshop hosted a pair of engaging panels, which are sadly not online, but rest assured: in response to worries of machines taking over, we were urged “not yet” and “we’ll be okay”. Let us hope we make it to our own Reunion Workshop, where we can ask the machines to allow us to revisit these topics again.

The final workshop covered Computational Challenges. Earlier it was pointed out that the statistical regime brings new challenges; one particularly cruel one, tactfully omitted above, is that many standard machine learning problems are NP-hard, even up to approximation and in seemingly innocuous settings. The workshop discussed not only workhorse methods such as gradient descent (which still only admits analyses in a fraction of the cases where it is used), but also methods at the core of the other workshops, for instance "bandit methods," which are a distillation of our friendly robot reinforcement learning problem above, and the painfully fundamental question of how one should even attempt to write down the goal of hierarchical clustering.

In the time between the various workshops, we have been making new friends and working together tirelessly on new problems. We look forward to cracking them, and urge you to follow our Reunion Workshop next year.

,