Tuesday, December 8th, 2015

Research Vignette: Mathematical Software Obfuscation

By Amit Sahai

From revolutionaries and spies to ordinary citizens living under repressive regimes, for centuries people have had the need to hide operational secrets. By an operational secret, we mean a secret that must be kept safe, where the secret is also used to perform ordinary tasks. For example, someone may want to keep the amount of cash she has on-hand secret, but she also needs to refer to this information when making everyday purchases. Even secrets like the location of hidden refugees under one’s protection could be an operational secret, since the holder of the secret would want to plan her ordinary movements to avoid the secret location that she is protecting. Through many clever means, people throughout history managed to protect such secrets. Essential to such protection was the refuge of the human mind, the ultimate sanctuary where secrets can be pondered and processed without fear of loss.

But what if this most basic assumption proves false? What if an adversary can read someone’s mind while she is thinking about the secret she needs to protect? Indeed, it is hard to imagine how someone can protect secrets when the inner dialogue of her mind can betray her. Fortunately, this scenario remains science fiction when applied to humanity. However, if we replace humans by computer programs, this situation is all too common. Computer programs, with inbuilt databases of sensitive information, are routinely captured by adversarial entities. These adversaries can reverse-engineer these programs and see every detail of how the programs “think” about their secrets as they perform their calculations. Furthermore, adversaries can even modify the programs to alter their behavior, in order to coax secrets out of the original computer code. Because computer programs with inbuilt operational secrets are so useful, cryptographic researchers have pondered this challenge as far back as the classic work of Diffie and Hellman in 1976. For decades, however, our ability to suitably “obfuscate” programs in order to defend against these kinds of attacks was based on unsatisfying approaches that failed to provide security against even moderately skilled adversaries.

This changed in 2013, due to the work of Garg, Gentry, Halevi, Raykova, Sahai and Waters, which gave the first sound mathematical approach to this problem. This work enabling a mathematical approach to securing software has been hailed as, “a watershed moment for cryptography,” by the Simons Foundation’s Quanta magazine, and its ramifications were a major theme of the Simons Institute program on Cryptography in Summer 2015. To illustrate why this recent advance in obfuscation has caused such a stir in the cryptographic community, let us consider an analogy with the ancient problem of sending encrypted messages.

The importance of mathematics: An analogy to message encryption
For centuries, the protection of secret messages was an ad-hoc process. For example, messages were translated into an unfamiliar or artificial language, or hidden in a messenger’s scalp underneath her hair. These techniques, while clever, were easily broken and often relied on security by obscurity. This changed with the introduction of sophisticated mathematical techniques for encryption, such as RSA. Now, breaking such encryption schemes is known to require solving extremely difficult mathematical problems. The use of encryption based on hard math problems has revolutionized data security, and enabled countless applications from secure e-commerce to secure e-mail.

Unfortunately, over the last several decades, the problem of protecting operational secrets within software remained closer in spirit to encryption of centuries past. Methods for protecting software were clever, but ad-hoc and ultimately broken. Ideas like the software equivalent of thinking in an unfamiliar language, or adding unnecessary steps and complication, have been mainstays of the field. Now, we finally have a mathematical approach to general-purpose software obfuscation, ripe for analytical study.

The exploration of mathematical techniques for software obfuscation remains in its infancy. This exploration of mathematical obfuscation was a major research theme during the Simons Institute Cryptography program. Several results in this area were presented during the program, and several new lines of investigation were initiated. At a high level, two broad questions drive the exploration of mathematical obfuscation:

  1. What kinds of operational secrets can be protected? What security notions for software obfuscation are achievable?
  2. What mathematical structures can be used to build secure obfuscation schemes? What arguments can we develop to support the hardness underlying these mathematical structures that is necessary for security of obfuscation schemes?

Operational secrets
The central challenge in protecting operational secrets within software is that the software must remain functional and useful even after such protection is enabled. This necessarily means that many operational secrets inherently cannot be protected within software. For example, suppose our software contains a secret number y, and when a user provides another number x as input to our software, the software tells the user if x < y. Then it is easy to see that by repeated use of this software, a user can recover the secret y completely by performing a simple binary search.

Thus, at a minimum, we need operational secrets to be unlearnable with input/output queries to the function computed by our software. Unfortunately, however, this is not enough: In 2001, the work of Barak, Goldreich, Impagliazzo, Rudich, Sahai, Vadhan and Yang showed that there exist families of functions {fy} such that: (1) on the one hand, no polynomial-time computation can learn y by just making input/output queries to a randomly chosen fy from our family; but (2) on the other hand, any efficient program P that faithfully implements fy allows an efficient adversary to learn the secret y completely. Indeed, this negative result effectively brought obfuscation research to a near standstill for over a decade.

However, in the same 2001 paper, we put forward an alternative, seemingly much weaker, notion of obfuscation called indistinguishability obfuscation (iO): Informally, iO requires that given two equal-size equivalent programs P0 and P1 that compute some function f, no efficient adversary can distinguish the obfuscation iO(P0) of the first program from the obfuscation iO(P1) of the second. In other words, iO is a pseudo-canonicalizer, outputting a randomized representation of a program that “acts” as a canonical program computing the function f. In 2007, Goldwasser and Rothblum showed that iO achieves a kind of “best-possible” obfuscation; however, it was still not clear how iO could be used to hide any particular secret.

This changed in 2013, when important new methods for leveraging iO to hide operational secrets were developed with wide-reaching applications (Sahai and Waters). During the program, several recent works were presented that showed how to leverage iO to achieve various applications, including secure cloud computing (Boneh, Gupta, Mironov and Sahai) and the hardness of finding a Nash Equilibrium (Bitansky, Paneth and Rosen). Both these applications work by showing that iO can be leveraged to hide operational secrets in the form of cryptographic keys embedded within software programs.

Another important line of work asks whether iO for weak classes of programs can be leveraged to obtain iO for larger classes of programs. This question of “bootstrapping” iO was first considered in the 2013 work of Garg et al., which showed that iO for NC1 circuits can be bootstrapped to obtain iO for general circuits. Bootstrapping was also the subject of several recent works presented during the program, including bootstrapping iO for circuits to iO for Turing Machines and RAM programs with bounded input (Bishop, Koppula and Waters; Canetti and Holmgren), and to iO for Turing Machines with bounded input where the obfuscated program is asymptotically only twice the size of the original program (Ananth, Jain and Sahai).

Beyond iO, however, there may be other notions of security of obfuscation that do not suffer from impossibility results. During the program, two such notions were presented (Ishai, Pandey and Sahai; Lin, Pass, Seth and Telang), both of which enable iO for Turing Machines with unbounded input size. Obtaining such a construction assuming only iO for circuits remains a major open question.

Mathematical structures enabling obfuscation
At present, there are only a few known mathematical approaches to general-purpose obfuscation. As such, two major avenues for research include (1) new “recipes” for building obfuscation, and (2) improving our understanding of the security properties of existing approaches. Both these directions were the subject of major ongoing discussions during the Simons program.

Current proposed constructions for general-purpose obfuscation build upon algebraic underpinnings introduced by Garg, Gentry and Halevi (GGH) in 2012. During the program, new cryptanalysis of the GGH algebraic setting was presented (Coron, Gentry, Halevi, Lepoint, Maji, Miles, Raykova, Sahai and Tibouchi), along with evidence that this new cryptanalysis should not affect current obfuscation constructions when used to obfuscate evasive functions (Badrinarayanan, Miles, Sahai and Zhandry). Extending this evidence beyond obfuscating evasive functions remains an important research direction.

In research conducted during the program, a new pathway to constructing iO from functional encryption for relatively simple functions was announced (Ananth and Jain; Bitansky and Vaikuntanathan; Ananth, Jain and Sahai). Unfortunately, at present no constructions of functional encryption exist even for the relatively simple functions that are needed, except for constructions that themselves use iO.

A major open problem in this area, for which an informal $100 prize was announced by the author during the program, is to find a way to base iO on the Learning With Errors (LWE) problem. However, although we hope that this is not the case, it may turn out that iO requires new mathematical structures not offered by LWE.

Protecting operational secrets within software is a powerful new capability that has the potential to transform software security. While the developments outlined above have greatly increased our understanding of mathematical software obfuscation, the most exciting work in this area still remains to be done.

Related Articles:

Letter from the Director, Fall 2015
Institute launches Family-Friendly Stipend Program
From the Inside, Fall 2015
Looking Ahead, Spring 2016: Genomics and Counting