Justin Thaler

Assistant Professor
Georgetown University
Department of Computer Science

St. Mary's Hall
Washington, DC 20057

e-mail: justin [dot] thaler [at] georgetown [dot] edu

I am broadly interested in algorithms and computational complexity, primarily focusing on the following three research goals.
• Understanding the power of low-degree polynomials to approximate Boolean functions. Answers to such questions have a variety of applications in complexity theory, quantum algorithms, private data analysis, and learning theory.
• Designing protocols for proving the correctness of computations (possibly in zero-knowledge), in which the prover and verifier are highly efficient both in theory and in practice.
• Developing efficient streaming and sketching algorithms for basic tasks that are often performed on large data sets.

Since Fall 2016, I have been an Assistant Professor in the Department of Computer Science at Georgetown University. Prior to joining Georgetown, I spent two years as a Research Scientist at Yahoo Labs in New York. I was also a Research Fellow at the Simons Institute for the Theory of Computing at UC Berkeley. I received my Ph.D. from the Theory of Computation Group at Harvard University, where I was fortunate to be advised by Michael Mitzenmacher, and I graduated from Yale University in 2009 with a B.S. in Computer Science and a second major in Mathematics.

Research Papers

2019

Quantum Algorithms and Approximating Polynomials for Composed Functions With Shared Inputs
Mark Bun, Robin Kothari, Justin Thaler
SODA 2019
Links: [ECCC] [Slides] [Talk Video (30 minutes)]
Abstract: We give new quantum algorithms for evaluating composed functions whose inputs may be shared between bottom-level gates. Let $f$ be a Boolean function and consider a function $F$ obtained by applying $f$ to conjunctions of possibly overlapping subsets of $n$ variables. If $f$ has quantum query complexity $Q(f)$, we give an algorithm for evaluating $F$ using $\tilde{O}(\sqrt{Q(f) \cdot n})$ quantum queries. This improves on the bound of $O(Q(f) \cdot \sqrt{n})$ that follows by treating each conjunction independently, and is tight for worst-case choices of $f$. Using completely different techniques, we prove a similar tight composition theorem for the approximate degree of $f$. By recursively applying our composition theorems, we obtain a nearly optimal $\tilde{O}(n^{1-2^{-d}})$ upper bound on the quantum query complexity and approximate degree of linear-size depth-$d$ AC$^0$ circuits. As a consequence, such circuits can be PAC learned in subexponential time, even in the challenging agnostic setting. Prior to our work, a subexponential-time algorithm was not known even for linear-size depth-3 AC$^0$ circuits. We also show that any substantially faster learning algorithm will require fundamentally new techniques.

2018

The Large-Error Approximate Degree of AC$^0$
Mark Bun and Justin Thaler
Manuscript
Links: [ECCC]
Abstract: We prove two new results about the inability of low-degree polynomials to uniformly approximate constant-depth circuits, even to slightly-better-than-trivial error. First, we prove a tight $\tilde{\Omega}(n^{1/2})$ lower bound on the threshold degree of the $\mathsf{SURJECTIVITY}$ function on $n$ variables. This matches the best known threshold degree bound for any AC$^0$ function, previously exhibited by a much more complicated circuit of larger depth (Sherstov, FOCS 2015). Our result also extends to a $2^{\tilde{\Omega}(n^{1/2})}$ lower bound on the sign-rank of an AC$^0$ function, improving on the previous best bound of $2^{\Omega(n^{2/5})}$ (Bun and Thaler, ICALP 2016). Second, for any $\delta>0$, we exhibit a function $f \colon \{-1, 1\}^n \to \{-1, 1\}$ that is computed by a circuit of depth $O(1/\delta)$ and is hard to approximate by polynomials in the following sense: $f$ cannot be uniformly approximated to error $\epsilon=1-2^{-\Omega(n^{1-\delta})}$, even by polynomials of degree $n^{1-\delta}$. Our recent prior work (Bun and Thaler, FOCS 2017) proved a similar lower bound that held only for error $\epsilon=1/3$. Our result implies $2^{\Omega(n^{1-\delta})}$ lower bounds on the complexity of AC$^0$ under a variety of basic measures such as discrepancy, margin complexity, and threshold weight. This nearly matches the trivial upper bound of $2^{O(n)}$ that holds for every function. The previous best lower bound on AC$^0$ for these measures was $2^{\Omega(n^{1/2})}$ (Sherstov, FOCS 2015). Additional applications in learning theory, communication complexity, and cryptography are described.
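For reference, the two polynomial hardness measures that recur throughout this list are standard and can be stated generically (the definitions below are not specific to these papers). For a Boolean function $f \colon \{-1,1\}^n \to \{-1,1\}$, the $\epsilon$-approximate degree is
$\widetilde{\mathrm{deg}}_{\epsilon}(f) = \min\{\deg(p) : |p(x) - f(x)| \le \epsilon \text{ for all } x \in \{-1,1\}^n\}$, with $\widetilde{\mathrm{deg}}(f) := \widetilde{\mathrm{deg}}_{1/3}(f)$,
while the threshold degree relaxes the accuracy requirement to sign agreement only:
$\deg_{\pm}(f) = \min\{\deg(p) : p(x) \cdot f(x) > 0 \text{ for all } x \in \{-1,1\}^n\}$.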
Doubly-efficient zkSNARKs without trusted setup
Riad S. Wahby, Ioanna Tzialla, abhi shelat, Justin Thaler, and Michael Walfish
IEEE Symposium on Security and Privacy (S&P) 2018
Links: [e-print]
Abstract: We present a zero-knowledge argument for NP with low communication complexity, low concrete cost for both the prover and the verifier, and no trusted setup, based on standard cryptographic assumptions (DDH). Specifically, communication is proportional to at most the square root of the witness size, plus $d \cdot \log(G)$, for $d$ the depth and $G$ the width of the verifying circuit. Moreover, witness-related communication can be reduced below square root, at the cost of increased verifier runtime. When applied to batched or data-parallel statements, the prover's runtime is linear and the verifier's is sub-linear in the verifying circuit size, both with good constants. Together, these properties represent a new point in the tradeoffs among setup, complexity assumptions, proof size, and computational cost. Our argument is public coin, so we apply the Fiat-Shamir heuristic to produce a zero-knowledge succinct non-interactive argument of knowledge (zkSNARK), which we call Hyrax. We evaluate Hyrax on three benchmarks: SHA-256 Merkle trees, image transformation, and matrix multiplication. We find that Hyrax's proofs are 2-10x smaller than prior work with similar properties, and that Hyrax scales to 6-27x larger circuits than a highly-optimized prior system that requires trusted setup.

The Polynomial Method Strikes Back: Tight Quantum Query Bounds via Dual Polynomials
Mark Bun, Robin Kothari, Justin Thaler
STOC 2018; Preliminary version in QIP 2018 (plenary talk); Invited to Theory of Computing
Manuscript link: [arXiv]
Abstract: The approximate degree of a Boolean function $f$ is the least degree of a real polynomial that approximates $f$ pointwise to error at most $1/3$. The approximate degree of $f$ is known to be a lower bound on the quantum query complexity of $f$ (Beals et al., FOCS 1998 and J. ACM 2001). We resolve or nearly resolve the approximate degree and quantum query complexities of several basic functions. Specifically, we show the following:
• k-distinctness: For any constant $k$, the approximate degree and quantum query complexity of the k-distinctness function is $\Omega(n^{3/4-1/(2k)})$. This is nearly tight for large $k$, as Belovs (FOCS 2012) has shown that for any constant $k$, the approximate degree and quantum query complexity of k-distinctness is $O(n^{3/4-1/(2^{k+2}-4)})$.
• Image Size Testing: The approximate degree and quantum query complexity of testing the size of the image of a function $[n] \rightarrow [n]$ is $\tilde{\Omega}(n^{1/2})$. This proves a conjecture of Ambainis et al. (SODA 2016), and it implies tight lower bounds on the approximate degree and quantum query complexity of the following natural problems.
• k-junta testing: A tight $\tilde{\Omega}(k^{1/2})$ lower bound for k-junta testing, answering the main open question of Ambainis et al. (SODA 2016).
• Statistical Distance from Uniform: A tight $\tilde{\Omega}(n^{1/2})$ lower bound for approximating the statistical distance from uniform of a distribution, answering the main question left open by Bravyi et al. (STACS 2010 and IEEE Trans. Inf. Theory 2011).
• Shannon entropy: A tight $\tilde{\Omega}(n^{1/2})$ lower bound for approximating Shannon entropy up to a certain additive constant, answering a question of Li and Wu (2017).
• Surjectivity: The approximate degree of the Surjectivity function is $\tilde{\Omega}(n^{3/4})$. The best prior lower bound was $\Omega(n^{2/3})$.
Our result matches an upper bound of $\tilde{O}(n^{3/4})$ due to Sherstov, which we reprove using different techniques. The quantum query complexity of this function is known to be $\Theta(n)$ (Beame and Machmouchi, Quantum Inf. Comput. 2012 and Sherstov, FOCS 2015). Our upper bound for Surjectivity introduces new techniques for approximating Boolean functions by low-degree polynomials. Our lower bounds are proved by significantly refining techniques recently introduced by Bun and Thaler (FOCS 2017).
Slides: [slides giving unified overview of BT17 and BKT18 (size: 3MB)] [slides more focused on BKT18 (size: 11MB)]
Lecture notes: These lecture notes elucidate various aspects of the proof. Whereas the proof in the paper takes place entirely in the "dual" world, the notes explain how to replace a key step of the proof with a "primal" argument that many will find more intuitive.

Approximate Degree and the Complexity of Depth Three Circuits
Mark Bun and Justin Thaler
RANDOM 2018
Links: [ECCC] [slides]
Abstract: Threshold weight, margin complexity, and Majority-of-Threshold circuit size are basic complexity measures of Boolean functions that arise in learning theory, communication complexity, and circuit complexity. Each of these measures might exhibit a \emph{chasm} at depth three: namely, all polynomial size Boolean circuits of depth two have polynomial complexity under the measure, but there may exist Boolean circuits of depth three that have essentially maximal complexity $\exp(\Omega(n))$. However, existing techniques are far from showing this: for all three measures, the best lower bound for depth three circuits is $\exp(\Omega(n^{2/5}))$. Moreover, current methods exclusively study block-composed functions. Such methods appear intrinsically unable to prove lower bounds better than $\exp(\Omega(n^{1/2}))$ even for depth four circuits, and have yet to prove lower bounds better than $\exp(\Omega(n^{1/2}))$ for circuits of any constant depth. We take a step toward showing that all of these complexity measures indeed exhibit a chasm at depth three. Specifically, for any arbitrarily small constant $\delta > 0$, we exhibit a depth three circuit of polynomial size (in fact, an $O(\log n)$-decision list) of complexity $\exp(\Omega(n^{1/2-\delta}))$ under each of these measures. Our methods go beyond the block-composed functions studied in prior work, and hence may not be subject to the same barriers. In particular, we suggest natural candidate functions that may exhibit stronger bounds, of the form $\exp(\tilde{\Omega}(n))$, where the notation hides factors polylogarithmic in $n$.

2017

Full Accounting for Verifiable Outsourcing
Riad S. Wahby, Ye Ji, Andrew J. Blumberg, abhi shelat, Justin Thaler, Michael Walfish, and Thomas Wies
CCS 2017
Links: [e-print]
Abstract: Systems for verifiable outsourcing incur costs for a prover, a verifier, and precomputation; outsourcing makes sense when these costs are cheaper than not outsourcing. Yet, prover costs are generally ignored. The only exception is Verifiable ASICs (VA), wherein the prover is a custom chip; however, the only prior VA system ignores the cost of precomputation. This paper describes a new VA system, called Giraffe; charges Giraffe for all three costs; and identifies regimes where outsourcing is worthwhile. Giraffe's base is an interactive proof geared to data parallel computation. Giraffe makes this protocol asymptotically optimal for the prover, which is of independent interest.
Giraffe also develops a design template that produces hardware designs automatically for a wide range of parameters, introduces hardware primitives molded to the protocol's data flows, and incorporates program analyses that expand applicability. Giraffe wins even when outsourcing several tens of sub-computations, scales to 500x larger computations than prior work, and can profitably outsource parts of programs that are not worthwhile to outsource in full.

A High-Performance Algorithm for Identifying Frequent Items in Data Streams
Daniel Anderson, Pryce Bevan, Kevin Lang, Edo Liberty, Lee Rhodes, and Justin Thaler
IMC 2017
Links: [arXiv]
Abstract: Estimating frequencies of items over data streams is a common building block in streaming data measurement and analysis. Misra and Gries introduced their seminal algorithm for the problem in 1982, and the problem has since been revisited many times due to its practicality and applicability. We describe a highly optimized version of Misra and Gries' algorithm that is suitable for deployment in industrial settings. Our code is made public via an open source library called DataSketches that is already used by several companies and production systems. Our algorithm improves on two theoretical and practical aspects of prior work. First, it handles weighted updates in amortized constant time, a common requirement in practice. Second, it uses a simple and fast method for merging summaries that asymptotically improves on prior work even for unweighted streams. We describe experiments confirming that our algorithms are more efficient than prior proposals.
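For readers who have not seen it, the Misra-Gries summary that the paper above optimizes fits in a few lines of Python. This is a minimal illustration (unweighted updates, and the simple merge rule that the paper improves upon), not the DataSketches implementation:

    from collections import Counter

    def misra_gries(stream, k):
        """Maintain at most k-1 counters; every item occurring more than
        len(stream)/k times is guaranteed to survive in the summary."""
        counters = {}
        for x in stream:
            if x in counters:
                counters[x] += 1
            elif len(counters) < k - 1:
                counters[x] = 1
            else:
                # No room: decrement every counter, dropping zeros.
                for y in list(counters):
                    counters[y] -= 1
                    if counters[y] == 0:
                        del counters[y]
        return counters

    def merge_summaries(a, b, k):
        """Standard merge: add counters, then subtract the k-th largest
        count so at most k-1 counters remain."""
        merged = Counter(a) + Counter(b)
        if len(merged) >= k:
            kth = sorted(merged.values(), reverse=True)[k - 1]
            merged = Counter({x: c - kth for x, c in merged.items() if c > kth})
        return dict(merged)

    print(misra_gries("abracadabra", k=3))  # {'a': 2}: 'a' occurs 5 > 11/3 times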
A Nearly Optimal Lower Bound on the Approximate Degree of AC$^0$
Mark Bun and Justin Thaler
FOCS 2017; Invited to the special issue of SICOMP for FOCS '17
Manuscript link: [ECCC]
Abstract: The approximate degree of a Boolean function $f \colon \{-1, 1\}^n \rightarrow \{-1, 1\}$ is the least degree of a real polynomial that approximates $f$ pointwise to error at most $1/3$. We introduce a generic method for increasing the approximate degree of a given function, while preserving its computability by constant-depth circuits. Specifically, we show how to transform any Boolean function $f$ with approximate degree $d$ into a function $F$ on $O(n \cdot \text{polylog}(n))$ variables with approximate degree at least $D = \Omega(n^{1/3} \cdot d^{2/3})$. In particular, if $d= n^{1-\Omega(1)}$, then $D$ is polynomially larger than $d$. Moreover, if $f$ is computed by a polynomial-size Boolean circuit of constant depth, then so is $F$. By recursively applying our transformation, for any constant $\delta > 0$ we exhibit an AC$^0$ function of approximate degree $\Omega(n^{1-\delta})$. This improves over the best previous lower bound of $\Omega(n^{2/3})$ due to Aaronson and Shi (J. ACM 2004), and nearly matches the trivial upper bound of $n$ that holds for any function. Our lower bounds also apply to (quasipolynomial-size) DNFs of polylogarithmic width. We describe several applications of these results. We give:
• For any constant $\delta > 0$, an $\Omega(n^{1-\delta})$ lower bound on the quantum communication complexity of a function in AC$^0$.
• A Boolean function $f$ with approximate degree at least $C(f)^{2-o(1)}$, where $C(f)$ is the certificate complexity of $f$. This separation is optimal up to the $o(1)$ term in the exponent.
• Improved secret sharing schemes with reconstruction procedures in AC$^0$.
Slides and video: [slides] [BIRS Talk Video (45 minutes)]
Lecture notes: These lecture notes elucidate various aspects of the proof. Whereas the proof in the paper takes place entirely in the "dual" world, the notes explain how to replace a key step of the proof with a "primal" argument that many will find more intuitive.

On the Power of Statistical Zero Knowledge
Adam Bouland, Lijie Chen, Dhiraj Holden, Justin Thaler, and Prashant Nalini Vasudevan
FOCS 2017; Invited to the special issue of SICOMP for FOCS '17
Links: [ECCC]
Abstract: We examine the power of statistical zero knowledge proofs (captured by the complexity class SZK) and their variants. First, we give the strongest known relativized evidence that SZK contains hard problems, by exhibiting an oracle relative to which SZK (indeed, even NISZK) is not contained in the class UPP, containing those problems solvable by randomized algorithms with unbounded error. This answers an open question of Watrous from 2002. Second, we "lift" this oracle separation to the setting of communication complexity, thereby answering a question of Goos et al. (ICALP 2016). Third, we give relativized evidence that perfect zero knowledge proofs (captured by the class PZK) are weaker than general zero knowledge proofs. Specifically, we exhibit oracles relative to which SZK is not contained in PZK, NISZK is not contained in NIPZK, and PZK is not equal to coPZK. The first of these results answers a question raised in 1991 by Aiello and Hastad (Information and Computation), and the second answers a question of Lovett and Zhang (2016). We also describe additional applications of these results outside of structural complexity. The technical core of our results is a stronger hardness amplification theorem for approximate degree, which roughly says that composing the gapped-majority function with any function of high approximate degree yields a function with high threshold degree.

Reliably Learning the ReLU in Polynomial Time
Surbhi Goel, Varun Kanade, Adam Klivans and Justin Thaler
COLT 2017
Links: [arXiv]
Abstract: We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $x \mapsto \max(0, w \cdot x)$ with $w \in \mathbb{S}^{n-1}$. Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade, and Mansour (2009) where the learner is given access to a distribution $D$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $D$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial-time (in $n$) with respect to any distribution on $\mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $\epsilon=\Omega(1/\log n)$ (this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLUs. Our techniques combine kernel methods and polynomial approximations with a "dual-loss" approach to convex programming.
As a byproduct we obtain a number of applications including the first set of efficient algorithms for "convex piecewise-linear fitting" and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.

Determining Tournament Payout Structures for Daily Fantasy Sports
Christopher Musco, Maxim Sviridenko, and Justin Thaler
ALENEX 2017; Invited to special issue of ACM Journal of Experimental Algorithmics for ALENEX 2017
Links: [arXiv]
Abstract: With an exploding global market and the recent introduction of online cash prize tournaments, fantasy sports contests are quickly becoming a central part of the social gaming and sports industries. For sports fans and online media companies, fantasy sports contests are an opportunity for large financial gains. However, they present a host of technical challenges that arise from the complexities involved in running a web-scale, prize driven fantasy sports platform. We initiate the study of these challenges by examining one concrete problem in particular: how to algorithmically generate contest payout structures that are 1) economically motivating and appealing to contestants and 2) reasonably structured and succinctly representable. We formalize this problem and present a general two-staged approach for producing satisfying payout structures given constraints on contest size, entry fee, prize bucketing, etc. We then propose and evaluate several potential algorithms for solving the payout problem efficiently, including methods based on dynamic programming, integer programming, and heuristic techniques. Experimental results show that a carefully designed heuristic scales very well, even to contests with over 100,000 prize winners. Our approach extends beyond fantasy sports -- it is suitable for generating engaging payout structures for any contest with a large number of entrants and a large number of prize winners, including other massive online games, poker tournaments, and real-life sports tournaments.

2016

Dual Polynomials for Collision and Element Distinctness
Mark Bun and Justin Thaler
Theory of Computing
Links: [ECCC]
Abstract: The approximate degree of a Boolean function $f \colon \{-1, 1\}^n \rightarrow \{-1, 1\}$ is the minimum degree of a real polynomial that approximates $f$ to within error $1/3$ in the $L_{\infty}$ norm. In an influential result, Aaronson and Shi (J. ACM 2004) proved tight $\tilde{\Omega}(n^{1/3})$ and $\tilde{\Omega}(n^{2/3})$ lower bounds on the approximate degree of the Collision and Element Distinctness functions, respectively. Their proof was non-constructive, using a sophisticated symmetrization argument and tools from approximation theory. More recently, several open problems in the study of approximate degree have been resolved via the construction of dual polynomials. These are explicit dual solutions to an appropriate linear program that captures the approximate degree of any function. We reprove Aaronson and Shi's results by constructing explicit dual polynomials for the Collision and Element Distinctness functions.
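The "dual polynomials" in this entry (and in several entries above) refer to the dual of the linear program implicit in the definition of approximate degree. In its standard generic form: $\widetilde{\mathrm{deg}}_{\epsilon}(f) > d$ holds if and only if there is a dual witness $\psi \colon \{-1,1\}^n \to \mathbb{R}$ satisfying
(1) $\sum_{x \in \{-1,1\}^n} \psi(x) f(x) > \epsilon \cdot \sum_{x \in \{-1,1\}^n} |\psi(x)|$ (high correlation with $f$), and
(2) $\sum_{x \in \{-1,1\}^n} \psi(x) q(x) = 0$ for every polynomial $q$ of degree at most $d$ (pure high degree).
Constructing such a $\psi$ explicitly is exactly what the paper above does for the Collision and Element Distinctness functions.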
Improved Bounds on the Sign-Rank of AC$^0$
Mark Bun and Justin Thaler
ICALP 2016
Links: [ECCC] [slides]
Abstract: The sign-rank of a matrix $A$ with entries in $\{-1, +1\}$ is the least rank of a real matrix $B$ with $A_{ij} \cdot B_{ij} > 0$ for all $i$, $j$. Razborov and Sherstov (2008) gave the first exponential lower bounds on the sign-rank of a function in AC$^0$, answering an old question of Babai, Frankl, and Simon (1986). Specifically, they exhibited a matrix $A=[F(x, y)]_{x, y}$ for a specific function $F: \{-1, 1\}^n \times \{-1, 1\}^n \rightarrow \{-1, 1\}$ in AC$^0$, such that $A$ has sign-rank $\exp(\Omega(n^{1/3}))$. We prove a generalization of Razborov and Sherstov's result, yielding exponential sign-rank lower bounds for a non-trivial class of functions (that includes the function used by Razborov and Sherstov). As a corollary of our general result, we improve Razborov and Sherstov's lower bound on the sign-rank of AC$^0$ from $\exp(\Omega(n^{1/3}))$ to $\exp(\tilde{\Omega}(n^{2/5}))$. We also describe several applications to communication complexity, learning theory, and circuit complexity.

Lower Bounds for the Approximate Degree of Block-Composed Functions
Justin Thaler
ICALP 2016
Links: [ECCC] [slides]
Abstract: We describe a new hardness amplification result for point-wise approximation of Boolean functions by low-degree polynomials. Specifically, for any function $f$ on $N$ bits, define $F(x_1, \dots, x_M)=\textsf{OMB}(f(x_1), ..., f(x_M))$ to be the function on $M \cdot N$ bits obtained by block-composing $f$ with a specific DNF known as ODD-MAX-BIT. We show that, if $f$ requires large degree to approximate to error $2/3$ in a certain one-sided sense (captured by a complexity measure known as positive one-sided approximate degree), then $F$ requires large degree to approximate even to error $1-2^{-M}$. This generalizes a result of Beigel, who proved an identical result for the special case $f=\textsf{OR}$. Unlike related prior work, our result implies strong approximate degree lower bounds even for many functions $F$ that have low threshold degree. Our proof is constructive: we exhibit a solution to the dual of an appropriate linear program capturing the approximate degree of any function.

Semi-Streaming Algorithms for Annotated Graph Streams
Justin Thaler
ICALP 2016
Links: [arXiv] [slides]
Abstract: Considerable effort has been devoted to the development of streaming algorithms for analyzing massive graphs. Unfortunately, many results have been negative, establishing that a wide variety of problems require $\Omega(n^2)$ space to solve. One of the few bright spots has been the development of semi-streaming algorithms for a handful of graph problems -- these algorithms use space $O(n \cdot \text{polylog}(n))$. In the annotated data streaming model of Chakrabarti et al., a computationally limited client wants to compute some property of a massive input, but lacks the resources to store even a small fraction of the input, and hence cannot perform the desired computation locally. The client therefore accesses a powerful but untrusted service provider, who not only performs the requested computation, but also proves that the answer is correct. We put forth the notion of semi-streaming algorithms for annotated graph streams (semi-streaming annotation schemes for short). These are protocols in which both the client's space usage and the length of the proof are $O(n \cdot \text{polylog}(n))$. We give evidence that semi-streaming annotation schemes represent a substantially more robust solution concept than does the standard semi-streaming model. On the positive side, we give semi-streaming annotation schemes for two dynamic graph problems that are intractable in the standard model: (exactly) counting triangles, and (exactly) computing maximum matchings. The former scheme answers a question of Cormode.
On the negative side, we identify for the first time two natural graph problems (connectivity and bipartiteness in a certain edge update model) that can be solved in the standard semi-streaming model, but cannot be solved by annotation schemes of "sub-semi-streaming" cost. That is, these problems are just as hard in the annotations model as they are in the standard model.

Space Lower Bounds for Itemset Frequency Sketches
Edo Liberty, Michael Mitzenmacher, Justin Thaler, and Jonathan Ullman
PODS 2016
Links: [arXiv]
Abstract: Given a database, computing the fraction of rows that contain a query itemset or determining whether this fraction is above some threshold are fundamental operations in data mining. A uniform sample of rows is a good sketch of the database in the sense that all sufficiently frequent itemsets and their approximate frequencies are recoverable from the sample, and the sketch size is independent of the number of rows in the original database. For many seemingly similar problems there are better sketching algorithms than uniform sampling. In this paper we show that for itemset frequency sketching this is not the case. That is, we prove that there exist classes of databases for which uniform sampling is a space optimal sketch for approximate itemset frequency analysis, up to constant or iterated-logarithmic factors.

A Framework for Estimating Stream Expression Cardinalities
Anirban Dasgupta, Kevin Lang, Lee Rhodes, and Justin Thaler
ICDT 2016; Best Newcomer Paper Award; Invited to special issue of TODS for ICDT 2016
Links: [arXiv]
Abstract: Given $m$ distributed data streams $A_1,\dots,A_m$, we consider the problem of estimating the number of unique identifiers in streams defined by set expressions over $A_1,\dots,A_m$. We identify a broad class of algorithms for solving this problem, and show that the estimators output by any algorithm in this class are perfectly unbiased and satisfy strong variance bounds. Our analysis unifies and generalizes a variety of earlier results in the literature. To demonstrate its generality, we describe several novel sampling algorithms in our class, and show that they achieve a novel tradeoff between accuracy, space usage, update speed, and applicability.

2015

Streaming Verification in Data Analysis
Samira Daruki, Justin Thaler, and Suresh Venkatasubramanian
ISAAC 2015
Links: [arXiv]
Abstract: Streaming interactive proofs (SIPs) are a framework to reason about outsourced computation, where a data owner (the verifier) outsources a computation to the cloud (the prover), but wishes to verify the correctness of the solution provided by the cloud service. In this paper we present streaming interactive proofs for problems in data analysis. We present protocols for clustering and shape fitting problems, as well as an improved protocol for rectangular matrix multiplication. The latter can in turn be used to verify $k$ eigenvectors of a (streamed) $n \times n$ matrix. In general our solutions use polylogarithmic rounds of communication and polylogarithmic total communication and verifier space. For special cases (when optimality certificates can be verified easily), we present constant round protocols with similar costs. For rectangular matrix multiplication and eigenvector verification, our protocols work in the more restricted annotated data streaming model, and use sublinear (but not polylogarithmic) communication.
Variable Selection is Hard
Dean Foster, Howard Karloff, and Justin Thaler
COLT 2015
Links: [arXiv] [Slides (Short Talk)]
Abstract: Variable selection for sparse linear regression is the problem of finding, given an $m \times p$ matrix $B$ and a target vector $y$, a sparse vector $x$ such that $B \cdot x$ approximately equals $y$. Assuming a standard complexity hypothesis, we show that no polynomial-time algorithm can find a $k'$-sparse $x$ with $\|Bx-y\|^2 \leq h(m,p)$, where $k'=k \cdot 2^{\log^{1-\delta} p}$ and $h(m,p) \leq p^{C_1} \cdot m^{1-C_2}$, where $\delta>0, C_1>0, C_2>0$ are arbitrary. This is true even under the promise that there is an unknown $k$-sparse vector $x^*$ satisfying $Bx^*=y$. We prove a similar result for a statistical version of the problem in which the data are corrupted by noise. To the authors' knowledge, these are the first hardness results for sparse regression that apply when the algorithm simultaneously has $k'>k$ and $h(m,p)>0$.

Hardness Amplification and the Approximate Degree of Constant-Depth Circuits
Mark Bun and Justin Thaler
ICALP 2015
Links: [ECCC] [Slides] [Talk Video]
Abstract: We establish a generic form of hardness amplification for the approximability of constant-depth Boolean circuits by polynomials. Specifically, we show that if a Boolean circuit cannot be pointwise approximated by low-degree polynomials to within constant error in a certain one-sided sense, then an OR of disjoint copies of that circuit cannot be pointwise approximated even with very high error. As our main application, we show that for every sequence of degrees $d(n)$, there is an explicit depth-three circuit $F: \{-1,1\}^n \to \{-1,1\}$ of polynomial-size such that any degree-$d$ polynomial cannot pointwise approximate $F$ to error better than $1-\exp\left(-\tilde{\Omega}(nd^{-3/2})\right)$. As a consequence of our main result, we obtain an $\exp\left(-\tilde{\Omega}(n^{2/5})\right)$ upper bound on the discrepancy of a function in AC$^0$, and an $\exp\left(\tilde{\Omega}(n^{2/5})\right)$ lower bound on the threshold weight of AC$^0$, improving over the previous best results of $\exp\left(-\Omega(n^{1/3})\right)$ and $\exp\left(\Omega(n^{1/3})\right)$ respectively. Our techniques also yield a new lower bound of $\Omega\left(n^{1/2}/\log^{(d-2)/2}(n)\right)$ on the approximate degree of the AND-OR tree of depth $d$, which is tight up to polylogarithmic factors for any constant $d$, as well as new bounds for read-once DNF formulas. In turn, these results imply new lower bounds on the communication and circuit complexity of these classes, and demonstrate strong limitations on existing PAC learning algorithms.

Verifiable Stream Computation and Arthur-Merlin Communication
Amit Chakrabarti, Graham Cormode, Andrew McGregor, Justin Thaler, and Suresh Venkatasubramanian
CCC 2015
Links: [ECCC] [Slides] [Talk Video]
Abstract: In the setting of streaming interactive proofs (SIPs), a client (verifier) needs to compute a given function on a massive stream of data, arriving online, but is unable to store even a small fraction of the data. It outsources the processing to a third party service (prover), but is unwilling to blindly trust answers returned by this service. Thus, the service cannot simply supply the desired answer; it must convince the verifier of its correctness via a short interaction after the stream has been seen. In this work we study "barely interactive" SIPs.
Specifically, we show that two or three rounds of interaction suffice to solve several query problems---including Index, Median, Nearest Neighbor Search, Pattern Matching, and Range Counting---with polylogarithmic space and communication costs. Such efficiency with $O(1)$ rounds of interaction was thought to be impossible based on previous work. On the other hand, we initiate a formal study of the limitations of constant-round SIPs by introducing a new hierarchy of communication models called Online Interactive Proofs (OIPs). The online nature of these models is analogous to the streaming restriction placed upon the verifier in an SIP. We give upper and lower bounds that (1) characterize, up to quadratic blowups, every finite level of the OIP hierarchy in terms of other well-known communication complexity classes, (2) separate the first four levels of the hierarchy, and (3) reveal that the hierarchy collapses to the fourth level. Our study of OIPs reveals marked contrasts and some parallels with the classic Turing Machine theory of interactive proofs, establishes limits on the power of existing techniques for developing constant-round SIPs, and provides a new characterization of (non-online) Arthur-Merlin communication in terms of an online model.

A Note on the GKR Protocol
Justin Thaler
Manuscript
Links: [pdf]
Abstract: This note describes a simplification of the GKR interactive proof for circuit evaluation (Goldwasser, Kalai, and Rothblum, J. ACM 2015), as efficiently instantiated by Cormode, Mitzenmacher, and Thaler (ITCS 2012). The simplification reduces the prover runtime, round complexity, and total communication cost of the protocol by roughly 33%.
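For context on the interactive-proof entries above and below: the workhorse inside the GKR protocol is the sum-check protocol of Lund, Fortnow, Karloff, and Nisan. The Python sketch below is a minimal illustration of that interaction only, with the prover and verifier collapsed into one function and an arbitrary toy polynomial; it is not the implementation from these papers:

    import itertools, random

    P = 2**61 - 1  # prime modulus; all arithmetic is in the field GF(P)

    def interpolate(points, r):
        """Evaluate at r the unique polynomial through the given (x, y) points."""
        total = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (r - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    def sum_check(g, m, deg):
        """Verify H = sum of g(x) over x in {0,1}^m, where g has degree
        at most deg in each variable. Returns H if every check passes."""
        H = sum(g(*x) for x in itertools.product((0, 1), repeat=m)) % P
        claim, challenges = H, []
        for i in range(m):
            # Honest prover: send the round-i univariate polynomial,
            # represented by deg+1 evaluations.
            points = []
            for t in range(deg + 1):
                s = sum(g(*challenges, t, *tail)
                        for tail in itertools.product((0, 1), repeat=m - i - 1)) % P
                points.append((t, s))
            # Verifier: the two Boolean evaluations must sum to the running claim.
            assert (points[0][1] + points[1][1]) % P == claim
            # Verifier: send a random challenge, reducing to one fewer variable.
            r = random.randrange(P)
            claim = interpolate(points, r)
            challenges.append(r)
        # Final check: a single oracle query to g at the random point.
        assert g(*challenges) % P == claim
        return H

    # Toy run: sum x1*x2 + 2*x3 over the Boolean cube (the true answer is 10).
    print(sum_check(lambda a, b, c: (a * b + 2 * c) % P, m=3, deg=2))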
Verifiable Computation Using Multiple Provers
Andrew J. Blumberg, Justin Thaler, Victor Vu, and Michael Walfish
Manuscript
Links: [e-print]
Abstract: The increasing ubiquity of the cloud computing paradigm has renewed focus on the classical problem of allowing weak clients to check the results of computation delegated to powerful servers. Recent advances in proof-based verifiable computation have led to several near-practical protocols. Protocols based on interactive proofs (IPs) work with highly restrictive models of computation and are thus efficient only for a limited class of computations. In contrast, protocols based on argument systems apply to a much larger class of computations, but efficiency requires amortization of very expensive setup costs. This paper initiates the study of the practical efficiency of multiprover interactive proofs (MIPs). We present a new MIP for delegating computation that extends insights from a powerful IP protocol (Goldwasser et al., STOC, 2008). Without reductions or amplification, our protocol uses only two provers (departing from prior work on MIPs), and achieves both the efficiency of interactive proof-based protocols and the generality of argument system-based protocols. Also, this result, together with recently developed machinery, creates a potential avenue toward concretely efficient arguments without setup costs. We describe Clover, a built system for verifiable computation, based on our protocol. Although Clover does not implement the full theory (it has setup costs), it applies to problems that existing IPs cannot efficiently handle, and achieves performance comparable to, or better than, the best argument systems.

2014

Distribution-Independent Reliable Learning
Varun Kanade and Justin Thaler
COLT 2014
Links: [arXiv]
Abstract: We study several questions in the reliable agnostic learning framework of Kalai et al. (2009), which captures learning tasks in which one type of error is costlier than other types. A positive reliable classifier is one that makes no false positive errors. The goal in the positive reliable agnostic framework is to output a hypothesis with the following properties: (i) its false positive error rate is at most $\epsilon$, (ii) its false negative error rate is at most $\epsilon$ more than that of the best positive reliable classifier from the class. A closely related notion is fully reliable agnostic learning, which considers partial classifiers that are allowed to predict "unknown" on some inputs. The best fully reliable partial classifier is one that makes no errors and minimizes the probability of predicting "unknown", and the goal in fully reliable learning is to output a hypothesis that is almost as good as the best fully reliable partial classifier from a class. For distribution-independent learning, the best known algorithms for PAC learning typically utilize polynomial threshold representations, while the state of the art agnostic learning algorithms use point-wise polynomial approximations. We show that one-sided polynomial approximations, an intermediate notion between polynomial threshold representations and point-wise polynomial approximations, suffice for learning in the reliable agnostic settings. We then show that majorities can be fully reliably learned and disjunctions of majorities can be positive reliably learned, through constructions of appropriate one-sided polynomial approximations. Our fully reliable algorithm for majorities provides the first evidence that fully reliable learning may be strictly easier than agnostic learning. Our algorithms also satisfy strong attribute-efficiency properties, and in many cases they provide smooth tradeoffs between sample complexity and running time.

Parallel Peeling Algorithms
Jiayang Jiang, Michael Mitzenmacher, and Justin Thaler
SPAA 2014; Winner of Best Paper Award; Accepted to TOPC (special issue for SPAA 2014)
Links: [arXiv] [slides] [Talk Video]
Abstract: The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than $k$ are removed until there are no vertices of degree less than $k$ left. The remaining hypergraph is known as the $k$-core. In this paper, we analyze parallel peeling processes, where in each round, all vertices of degree less than $k$ are removed. It is known that, below a specific edge density threshold, the $k$-core is empty with high probability. We show that, with high probability, below this threshold, only $(\log \log n)/\log((k-1)(r-1)) + O(1)$ rounds of peeling are needed to obtain the empty $k$-core for $r$-uniform hypergraphs. Interestingly, we show that above this threshold, $\Omega(\log n)$ rounds of peeling are required to find the non-empty $k$-core. Since most algorithms and data structures aim to peel to an empty $k$-core, this asymmetry appears fortunate. We verify the theoretical results both with simulation and with a parallel implementation using graphical processing units (GPUs). Our implementation provides insights into how to structure parallel peeling algorithms for efficiency in practice.
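A round-synchronous version of the peeling process analyzed above is easy to simulate. The following Python sketch (an in-memory illustration with made-up names, not the paper's GPU implementation) counts the rounds needed:

    def parallel_peel(num_vertices, edges, k):
        """Each round simultaneously removes every vertex of degree in
        (0, k) together with its incident hyperedges; what survives is
        the k-core. Returns (number of rounds, surviving hyperedges)."""
        incident = {v: set() for v in range(num_vertices)}
        for idx, e in enumerate(edges):
            for v in e:
                incident[v].add(idx)
        alive = set(range(len(edges)))
        rounds = 0
        while True:
            low = [v for v, es in incident.items() if 0 < len(es) < k]
            if not low:
                return rounds, [edges[i] for i in sorted(alive)]
            rounds += 1
            doomed = set().union(*(incident[v] for v in low))
            for idx in doomed:
                for v in edges[idx]:
                    incident[v].discard(idx)
            alive -= doomed

    # Example: peel a small 3-uniform hypergraph to its (empty) 2-core in 2 rounds.
    print(parallel_peel(5, [(0, 1, 2), (1, 2, 3), (2, 3, 4)], k=2))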
Annotations in Data Streams
Amit Chakrabarti, Graham Cormode, Andrew McGregor, and Justin Thaler
ACM Transactions on Algorithms
Links: [ECCC]
Note: This paper supersedes a preliminary version by Chakrabarti, Cormode, and McGregor that appeared in ICALP 2009.
Abstract: The central goal of data stream algorithms is to process massive streams of data using sublinear storage space. Motivated by work in the database community on outsourcing database and data stream processing, we ask whether the space usage of such algorithms can be further reduced by enlisting a more powerful "helper" who can annotate the stream as it is read. We do not wish to blindly trust the helper, so we require that the algorithm be convinced of having computed a correct answer. We show upper bounds that achieve a non-trivial tradeoff between the amount of annotation used and the space required to verify it. We also prove lower bounds on such tradeoffs, often nearly matching the upper bounds, via notions related to Merlin-Arthur communication complexity. Our results cover the classic data stream problems of selection, frequency moments, and fundamental graph problems such as triangle-freeness and connectivity. Our work is also part of a growing trend---including recent studies of multi-pass streaming, read/write streams and randomly ordered streams---of asking more complexity-theoretic questions about data stream processing. It is a recognition that, in addition to practical relevance, the data stream model raises many interesting theoretical questions in its own right.

Faster Private Release of Marginals on Small Databases
Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, and Andrew Wan
ITCS 2014
Links: [arXiv]
Abstract: We study the problem of answering $k$-way marginal queries on a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of the database's records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses on a dataset, and designing efficient algorithms for privately answering marginal queries has been identified as an important open problem in private data analysis. For any $k$, we give a differentially private online algorithm that runs in time $\mathrm{poly}\left(n, 2^{o(d)}\right)$ per query and answers any sequence of $\mathrm{poly}(n)$ many $k$-way marginal queries with error at most $\pm 0.01$ on every query, provided $n \gtrsim d^{0.51}$. To the best of our knowledge, this is the first algorithm capable of privately answering marginal queries with a non-trivial worst-case accuracy guarantee for databases containing $\mathrm{poly}(d, k)$ records in time $\exp(o(d))$. Our algorithm runs the private multiplicative weights algorithm (Hardt and Rothblum, FOCS '10) on a new approximate polynomial representation of the database. We derive our representation for the database by approximating the OR function restricted to low Hamming weight inputs using low-degree polynomials with coefficients of bounded $L_1$-norm. In doing so, we show new upper and lower bounds on the degree of such polynomials, which may be of independent approximation-theoretic interest.
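The object being answered here is simple to state in code: a $k$-way marginal query fixes values in at most $k$ columns and asks what fraction of records agree. The sketch below shows only the (non-private) query semantics; the papers' contribution is answering many such queries under differential privacy:

    def marginal(db, query):
        """db: list of {0,1}-tuples of length d.
        query: dict mapping at most k column indices to required bits.
        Returns the fraction of records agreeing with the query."""
        hits = sum(all(row[c] == bit for c, bit in query.items()) for row in db)
        return hits / len(db)

    db = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]
    print(marginal(db, {0: 1, 2: 1}))  # 2-way marginal: 0.5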
Annotations for Sparse Data Streams
Amit Chakrabarti, Graham Cormode, Navin Goyal, and Justin Thaler
SODA 2014
Links: [arXiv] [Slides]
Abstract: Motivated by cloud computing, a number of recent works have studied annotated data streams and variants thereof. In this setting, a computationally weak verifier (cloud user), lacking the resources to store and manipulate his massive input locally, accesses a powerful but untrusted prover (cloud service). The verifier must work within the restrictive data streaming paradigm. The prover, who can annotate the data stream as it is read, must not just supply the answer but also convince the verifier of its correctness. Ideally, both the amount of annotation and the space used by the verifier should be sublinear in the relevant input size parameters. A rich theory of such algorithms---which we call schemes---has emerged. Prior work has shown how to leverage the prover's power to efficiently solve problems that have no non-trivial standard data stream algorithms. However, while optimal schemes are now known for several basic problems, such optimality holds only for streams whose length is commensurate with the size of the data universe. In contrast, many real-world datasets are relatively sparse, including graphs that contain only $o(n^2)$ edges, and IP traffic streams that contain much fewer than the total number of possible IP addresses, $2^{128}$ in IPv6. We design the first schemes that allow both the annotation and the space usage to be sublinear in the total number of stream updates rather than the size of the data universe. We solve significant problems, including variations of INDEX, SET-DISJOINTNESS, and FREQUENCY-MOMENTS, plus several natural problems on graphs. On the other hand, we give a new lower bound that, for the first time, rules out smooth tradeoffs between annotation and space usage for a specific problem. Our technique brings out new nuances in Merlin-Arthur communication complexity models, and provides a separation between online versions of the MA and AMA models.
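A basic primitive underlying schemes like these is a constant-size fingerprint of the stream's frequency vector, which lets a streaming verifier check prover-supplied claims. Here is a minimal polynomial-fingerprint sketch in Python (illustrative only; the schemes in these papers layer substantially more machinery on top of this idea):

    import random

    P = 2**61 - 1  # prime modulus for the fingerprint field

    class FingerprintVerifier:
        """Keeps sum_i f[i] * r^i (mod P) for the stream's frequency
        vector f over universe {0, ..., n-1}, using O(1) field elements
        of state. Distinct frequency vectors collide with prob <= n/P."""
        def __init__(self, n):
            self.n = n
            self.r = random.randrange(P)  # secret evaluation point
            self.acc = 0

        def update(self, item, delta=1):
            self.acc = (self.acc + delta * pow(self.r, item, P)) % P

        def matches(self, claimed):
            """Check a prover's claimed frequency vector against the sketch."""
            fp = sum(c * pow(self.r, i, P) for i, c in enumerate(claimed)) % P
            return fp == self.acc

    v = FingerprintVerifier(n=4)
    for item in (0, 2, 2, 3, 0, 2):
        v.update(item)
    print(v.matches([2, 0, 3, 1]), v.matches([2, 1, 2, 1]))  # True, then False (w.h.p.)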
2013

Time-Optimal Interactive Proofs for Circuit Evaluation
Justin Thaler
CRYPTO 2013
Links: [Extended Abstract] [Full Paper (arXiv)] [Source Code] [Slides] [Talk Video]
Abstract: Recently, researchers have been working toward the development of practical general-purpose protocols for verifiable computation. These protocols enable a computationally weak verifier to offload computations to a powerful but untrusted prover, while providing the verifier with a guarantee that the prover performed the computations correctly. Despite substantial progress, existing implementations are not yet practical. The main bottleneck is typically the extra effort required by the prover to return an answer with a guarantee of correctness, compared to returning an answer with no guarantee. We describe a refinement of a powerful interactive proof protocol originally due to Goldwasser, Kalai, and Rothblum. Cormode, Mitzenmacher, and Thaler show how to implement the prover in this protocol in time $O(S \log S)$, where $S$ is the size of an arithmetic circuit computing the function of interest. Our refinements apply to circuits whose wiring pattern is sufficiently "regular"; for these circuits, we bring the runtime of the prover down to $O(S)$. That is, our prover can evaluate the circuit with a guarantee of correctness, with only a constant-factor blowup in work compared to evaluating the circuit with no guarantee. We argue that our refinements capture a large class of circuits, and prove some theorems formalizing this. Experimentally, our refinements yield a 200x speedup for the prover over the implementation of Cormode et al., and our prover is less than 10x slower than a C++ program that simply evaluates the circuit. Along the way, we describe a special-purpose protocol for matrix multiplication that is of interest in its own right. Our final contribution is a protocol targeted at general data parallel computation. Compared to prior work, this protocol can more efficiently verify complicated computations as long as that computation is applied independently to many pieces of data.

Dual Lower Bounds for Approximate Degree and Markov-Bernstein Inequalities
Mark Bun and Justin Thaler
ICALP 2013; Winner of Best Paper award for Track A; Journal version in Information and Computation (special issue for ICALP 2013)
Links: [arXiv] [Expository Blog Post]
Abstract: The $\epsilon$-approximate degree of a Boolean function $f: \{-1, 1\}^n \to \{-1, 1\}$ is the minimum degree of a real polynomial that approximates $f$ to within $\epsilon$ in the $\ell_\infty$ norm. We prove several lower bounds on this important complexity measure by explicitly constructing solutions to the dual of an appropriate linear program. Our first result resolves the $\epsilon$-approximate degree of the two-level AND-OR tree for any constant $\epsilon > 0$. We show that this quantity is $\Theta(\sqrt{n})$, closing a line of incrementally larger lower bounds. The same lower bound was recently obtained independently by Sherstov using related techniques. Our second result gives an explicit dual polynomial that witnesses a tight lower bound for the approximate degree of any symmetric Boolean function, addressing a question of Špalek. Our final contribution is to reprove several Markov-type inequalities from approximation theory by constructing explicit dual solutions to natural linear programs. These inequalities underlie the proofs of many of the best-known approximate degree lower bounds, and have important uses throughout theoretical computer science.

2012

Faster Algorithms for Privately Releasing Marginals
Justin Thaler, Jonathan Ullman, and Salil Vadhan
ICALP 2012
Links: [arXiv]
Abstract: We study the problem of releasing $k$-way marginals of a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of $D$'s records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses of a dataset, and designing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et al., PODS '07). We give an algorithm that runs in time $d^{O(\sqrt{k})}$ and releases a private summary capable of answering any $k$-way marginal query with at most $\pm .01$ error on every query as long as $n \geq d^{O(\sqrt{k})}$. To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of $k$-way marginal queries, which is $d^{\Theta(k)}$ (for $k \ll d$).

Attribute-Efficient Learning and Weight-Degree Tradeoffs for Polynomial Threshold Functions
Li-Yang Tan, Rocco Servedio, and Justin Thaler
COLT 2012
Links: [ECCC version] [Expository Blog Post] [Slides (20 Minute Presentation)] [Li-Yang's Slides (50 Minute Presentation)]
Abstract: We study the challenging problem of learning decision lists attribute-efficiently, giving both positive and negative results. Our main positive result is a new tradeoff between the running time and mistake bound for learning length-$k$ decision lists over $n$ Boolean variables.
When the allowed running time is relatively high, our new mistake bound improves significantly on the mistake bound of the best previous algorithm of Klivans and Servedio. Our main negative result is a new lower bound on the weight of any degree-$d$ polynomial threshold function (PTF) that computes a particular decision list over $k$ variables (the "Odd-Max-Bit" function). The main result of Beigel (Computational Complexity, 1994) is a weight lower bound of $2^{\Omega(k/d^2)}$, which was shown to be essentially optimal for $d \leq k^{1/3}$ by Klivans and Servedio. Here we prove a $2^{\Omega(\sqrt{k/d})}$ lower bound, which improves on Beigel's lower bound for $d > k^{1/3}$. This lower bound establishes strong limitations on the effectiveness of the Klivans and Servedio approach and suggests that it may be difficult to improve on our positive result. The main tool used in our lower bound is a new variant of Markov's classical inequality which may be of independent interest; it provides a bound on the derivative of a univariate polynomial in terms of both its degree and the size of its coefficients.

Peeling Arguments and Double Hashing
Michael Mitzenmacher and Justin Thaler
Allerton 2012 (Invited Paper)
Links: [pdf]
Abstract: The analysis of several algorithms and data structures can be reduced to the analysis of the following greedy "peeling" process: start with a random hypergraph; find a vertex of degree at most $k$, and remove it and all of its adjacent hyperedges from the graph; repeat until there is no suitable vertex. This specific process finds the $k$-core of a hypergraph, and variations on this theme have proven useful in analyzing for example decoding from low-density parity-check codes, several hash-based data structures such as cuckoo hashing, and algorithms for satisfiability of random formulae. This approach can be analyzed several ways, with two common approaches being via a corresponding branching process or a fluid limit family of differential equations. In this paper, we make note of an interesting aspect of these types of processes: the results are generally the same when the randomness is structured in the manner of double hashing. This phenomenon allows us to use less randomness and simplify the implementation for several hash-based data structures and algorithms. We explore this approach from both an empirical and theoretical perspective, examining theoretical justifications as well as simulation results for specific problems.
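In its textbook form, double hashing derives many table indices from just two base hash values. A small Python sketch (the hash-function choices here are arbitrary illustrations):

    import hashlib

    def double_hashes(key, d, m):
        """Return d indices h_i(key) = (h1 + i*h2) mod m derived from one
        digest; h2 is forced nonzero so the indices are distinct for prime m."""
        digest = hashlib.sha256(str(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big") % m
        h2 = int.from_bytes(digest[8:16], "big") % (m - 1) + 1
        return [(h1 + i * h2) % m for i in range(d)]

    print(double_hashes("item-42", d=3, m=101))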
Verifiable Computation with Massively Parallel Interactive Proofs
Justin Thaler, Mike Roberts, Michael Mitzenmacher, and Hanspeter Pfister
HotCloud 2012
Links: [arXiv] [source code] [extended abstract] [Talk Video] [Slides]
Abstract: As the cloud computing paradigm has gained prominence, the need for verifiable computation has grown increasingly urgent. The concept of verifiable computation enables a weak client to outsource difficult computations to a powerful, but untrusted, server. Protocols for verifiable computation aim to provide the client with a guarantee that the server performed the requested computations correctly, without requiring the client to perform the computations herself. By design, these protocols impose a minimal computational burden on the client. However, existing protocols require the server to perform a large amount of extra bookkeeping in order to enable a client to easily verify the results. Verifiable computation has thus remained a theoretical curiosity, and protocols for it have not been implemented in real cloud computing systems. Our goal is to leverage GPUs to reduce the server-side slowdown for verifiable computation. To this end, we identify abundant data parallelism in a state-of-the-art general-purpose protocol for verifiable computation, originally due to Goldwasser, Kalai, and Rothblum, and recently extended by Cormode, Mitzenmacher, and Thaler. We implement this protocol on the GPU, obtaining 40-120x server-side speedups relative to a state-of-the-art sequential implementation. For benchmark problems, our implementation reduces the slowdown of the server to factors of 100-500x relative to the original computations requested by the client. Furthermore, we reduce the already small runtime of the client by 100x. Similarly, we obtain 20-50x server-side and client-side speedups for related protocols targeted at specific streaming problems. We believe our results demonstrate the immediate practicality of using GPUs for verifiable computation, and more generally that protocols for verifiable computation have become sufficiently mature to deploy in real cloud computing systems.

Continuous Time Channels with Interference
Ioanna Ivan, Michael Mitzenmacher, Justin Thaler, and Henry Yuen
ISIT 2012
Links: [arXiv]
Abstract: Khanna and Sudan studied a natural model of continuous time channels where signals are corrupted by the effects of both noise and delay, and showed that, surprisingly, in some cases both are not enough to prevent such channels from achieving unbounded capacity. Inspired by their work, we consider channels that model continuous time communication with adversarial delay errors. The sender is allowed to subdivide time into an arbitrarily large number $M$ of micro-units in which binary symbols may be sent, but the symbols are subject to unpredictable delays and may interfere with each other. We model interference by having symbols that land in the same micro-unit of time be summed, and we study $k$-interference channels, which allow receivers to distinguish sums up to the value $k$. We consider both a channel adversary that has a limit on the maximum number of steps it can delay each symbol, and a more powerful adversary that only has a bound on the average delay. We give precise characterizations of the threshold between finite and infinite capacity depending on the interference behavior and on the type of channel adversary: for max-bounded delay, the threshold is at $D_{\text{max}} = \Theta(M \log \min\{k, M\})$, and for average bounded delay the threshold is at $D_{\text{avg}} = \Theta(\sqrt{M \cdot \min\{k, M\}})$.

Hierarchical Heavy Hitters with the Space Saving Algorithm
Michael Mitzenmacher, Thomas Steinke, and Justin Thaler
ALENEX 2012
Links: [arXiv] [source code]
Abstract: The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged in a hierarchy. This problem has applications to network traffic monitoring, anomaly detection, and DDoS detection. We present a new streaming approximation algorithm for computing Hierarchical Heavy Hitters that has several advantages over previous algorithms. It improves on the worst-case time and space bounds of earlier algorithms, is conceptually simple and substantially easier to implement, offers improved accuracy guarantees, is easily adapted to a distributed or parallel setting, and can be efficiently implemented in commodity hardware such as ternary content addressable memory (TCAMs). We present experimental results showing that for parameters of primary practical interest, our two-dimensional algorithm is superior to existing algorithms in terms of speed and accuracy, and competitive in terms of space, while our one-dimensional algorithm is also superior in terms of speed and accuracy for a more limited range of parameters.
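For context, the one-dimensional Space Saving algorithm that the paper above extends to hierarchies fits in a few lines; this is the textbook version, not the paper's hierarchical, TCAM-oriented algorithm:

    def space_saving(stream, m):
        """Keep at most m counters. Each stored count overestimates the
        item's true frequency by at most the minimum stored count, which
        is itself at most len(stream)/m."""
        counters = {}
        for x in stream:
            if x in counters:
                counters[x] += 1
            elif len(counters) < m:
                counters[x] = 1
            else:
                # Evict a minimum-count item; the newcomer inherits its count.
                victim = min(counters, key=counters.get)
                counters[x] = counters.pop(victim) + 1
        return counters

    # {'r': 5, 'a': 6}: both estimates upper-bound the true counts (r: 2, a: 5).
    print(space_saving("abracadabra", m=2))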
Practical Verified Computation with Streaming Interactive Proofs
Graham Cormode, Michael Mitzenmacher, and Justin Thaler
ITCS 2012
Links: [arXiv] [source code] [slides]
Abstract: When delegating computation to a service provider, as in cloud computing, we seek some reassurance that the output is correct and complete. Yet recomputing the output as a check is inefficient and expensive, and it may not even be feasible to store all the data locally. We are therefore interested in proof systems which allow a service provider to prove the correctness of its output to a streaming (sublinear space) user, who cannot store the full input or perform the full computation herself. Our approach is two-fold. First, we describe a carefully chosen instantiation of one of the most efficient general-purpose constructions for arbitrary computations (streaming or otherwise), due to Goldwasser, Kalai, and Rothblum. This requires several new insights to make the methodology more practical. Our main contribution is in achieving a prover who runs in time $O(S(n) \log S(n))$, where $S(n)$ is the size of an arithmetic circuit computing the function of interest. Our experimental results demonstrate that a practical general-purpose protocol for verifiable computation may be significantly closer to reality than previously realized. Second, we describe techniques that achieve genuine scalability for protocols fine-tuned for specific important problems in streaming and database processing. Focusing in particular on non-interactive protocols for problems ranging from matrix-vector multiplication to bipartite perfect matching, we build on prior work to achieve a prover who runs in nearly linear-time, while obtaining optimal tradeoffs between communication cost and the user's working memory. Existing techniques required (substantially) superlinear time for the prover. We argue that even if general-purpose methods improve, fine-tuned protocols will remain valuable in real-world settings for key problems, and hence special attention to specific problems is warranted.

Verifying Computations with Streaming Interactive Proofs
Graham Cormode, Justin Thaler, and Ke Yi
VLDB 2012
Links: [arXiv] [Conference slides]
Abstract: When computation is outsourced, the data owner would like to be assured that the desired computation has been performed correctly by the service provider. In theory, proof systems can give the necessary assurance, but prior work is not sufficiently scalable or practical. In this paper, we develop new proof protocols for verifying computations which are streaming in nature: the verifier (data owner) needs only logarithmic space and a single pass over the input, and after observing the input follows a simple protocol with a prover (service provider) that takes logarithmic communication spread over a logarithmic number of rounds. These ensure that the computation is performed correctly: that the service provider has not made any errors or missed out some data. The guarantee is very strong: even if the service provider deliberately tries to cheat, there is only vanishingly small probability of doing so undetected, while a correct computation is always accepted.
We first observe that some theoretical results can be modified to work with streaming verifiers, showing that there are efficient protocols for problems in the complexity classes NP and NC. Our main results then seek to bridge the gap between theory and practice by developing usable protocols for a variety of problems of central importance in streaming and database processing. All these problems require linear space in the traditional streaming model, and therefore our protocols demonstrate that adding a prover can exponentially reduce the effort needed by the verifier. Our experimental results show that our protocols are practical and scalable.

Cache-Oblivious Dictionaries and Multimaps with Negligible Failure Probability
Michael T. Goodrich, Dan Hirschberg, Michael Mitzenmacher, and Justin Thaler
MedAlg 2012
Links: [pdf] [slides]
Abstract: A dictionary (or map) is a key-value store that requires all keys to be unique, while a multimap is a key-value store that allows multiple values to be associated with the same key. We design hashing-based indexing schemes for dictionaries and multimaps that achieve worst-case optimal performance for lookups and updates, with minimal space overhead and sub-polynomial probability that the data structure will require a rehash operation. Our dictionary structure is designed for the Random Access Machine (RAM) model, while our multimap implementation is designed for the cache-oblivious external memory (I/O) model. The failure probabilities for our structures are sub-polynomial, which can be useful in cryptographic or data-intensive applications.

2011

On the Zero-Error Capacity Threshold for Deletion Channels
Ian Kash, Michael Mitzenmacher, Justin Thaler, and Jonathan Ullman
ITA 2011
Links: [arXiv]
Abstract: We consider the zero-error capacity of deletion channels. Specifically, we consider the setting where we choose a codebook $C$ consisting of strings of $n$ bits, and our model of the channel corresponds to an adversary who may delete up to $pn$ of these bits for a constant $p$. Our goal is to decode correctly without error regardless of the actions of the adversary. We consider what values of $p$ allow non-zero capacity in this setting. We suggest multiple approaches, one of which makes use of the natural connection between this problem and the problem of finding the expected length of the longest common subsequence of two random sequences.

External Memory Multimaps
Elaine Angelino, Michael Mitzenmacher, and Justin Thaler
ISAAC 2011; Algorithmica (special issue for ISAAC 2011)
Links: [arXiv]
Abstract: Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A multimap is a generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used as values. We study the multimap abstract data type and how it can be implemented efficiently online in external memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key. Our external-memory results are for the standard two-level memory model.
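Both data-structure papers above build on cuckoo hashing, which gives worst-case constant-time lookups by restricting every key to one of exactly two possible slots. The following minimal two-table sketch shows the textbook scheme, not the bucketed external-memory variants the papers actually analyze; all class and parameter names are illustrative.

import random

class CuckooDict:
    """A minimal two-table cuckoo hash dictionary (illustrative only)."""
    def __init__(self, capacity=16):
        self.size = capacity
        self.tables = [[None] * capacity, [None] * capacity]
        self.seeds = [random.random(), random.random()]

    def _slot(self, key, i):
        return hash((self.seeds[i], key)) % self.size

    def get(self, key):
        # Lookups probe exactly two slots: worst-case O(1).
        for i in (0, 1):
            entry = self.tables[i][self._slot(key, i)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value, max_kicks=32):
        for i in (0, 1):  # update in place if the key is already present
            slot = self._slot(key, i)
            if self.tables[i][slot] is not None and self.tables[i][slot][0] == key:
                self.tables[i][slot] = (key, value)
                return
        entry, i = (key, value), 0
        for _ in range(max_kicks):
            slot = self._slot(entry[0], i)
            if self.tables[i][slot] is None:
                self.tables[i][slot] = entry
                return
            # Evict the occupant; it moves to its slot in the other table.
            entry, self.tables[i][slot] = self.tables[i][slot], entry
            i = 1 - i
        self._rehash()  # eviction chain too long; rare at moderate load
        self.put(*entry)

    def _rehash(self):
        items = [e for t in self.tables for e in t if e is not None]
        self.size *= 2
        self.tables = [[None] * self.size, [None] * self.size]
        self.seeds = [random.random(), random.random()]
        for k, v in items:
            self.put(k, v)

d = CuckooDict()
d.put("x", 1); d.put("y", 2)
assert d.get("x") == 1 and d.get("z") is None

Bounding the probability of long eviction chains, and hence of the expensive rehash, is exactly the kind of failure-probability analysis these papers carry out.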
2010

Streaming Graph Computations with a Helpful Advisor
Graham Cormode, Michael Mitzenmacher, and Justin Thaler
ESA 2010; Algorithmica 2013
Links: [ECCC] [Conference Slides]
Abstract: Motivated by the trend to outsource work to commercial cloud computing services, we consider a variation of the streaming paradigm where a streaming algorithm can be assisted by a powerful helper that can provide annotations to the data stream. We extend previous work on such annotation models by considering a number of graph streaming problems. Without annotations, streaming algorithms for graph problems generally require significant memory; we show that for many standard problems, including all graph problems that can be expressed with totally unimodular integer programming formulations, only constant space (measured in words) is needed for single-pass algorithms given linear-sized annotations. We also obtain protocols achieving essentially optimal tradeoffs between annotation length and memory usage for several important problems, including integer matrix-vector multiplication, as well as shortest $s$-$t$ path in small-diameter graphs. We also obtain non-trivial tradeoffs for minimum weight bipartite perfect matching and shortest $s$-$t$ path in general graphs.

2009

Graph Covers and Quadratic Minimization
Nicholas Ruozzi, Sekhar Tatikonda, and Justin Thaler
Allerton 2009
Links: [pdf] [slides]
Abstract: We formulate a new approach to understanding the behavior of the min-sum algorithm by exploiting the properties of graph covers. First, we present a new, natural characterization of scaled diagonally dominant matrices in terms of graph covers; this result motivates our approach because scaled diagonal dominance is a known sufficient condition for the convergence of min-sum in the case of quadratic minimization. We use our understanding of graph covers to characterize the periodic behavior of the min-sum algorithm on a single cycle. Lastly, we explain how to extend the single-cycle results to understand the 2-periodic behavior of min-sum for general pairwise MRFs. Some of our techniques apply more broadly, and we believe that by capturing the notion of indistinguishability, graph covers represent a valuable tool for understanding the abilities and limitations of general message-passing algorithms.

Expository Articles

Technical Perspective: Catching Lies (and Mistakes) in Offloaded Computation
Michael Mitzenmacher and Justin Thaler
CACM, February 2016

Stream Verification
Justin Thaler
Springer's Encyclopedia of Algorithms, 2015
Links: [arXiv] [slides]
Abstract: In this short article, we survey models and algorithms for stream verification.
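A recurring tool in these annotation and stream-verification protocols is algebraic fingerprinting: the verifier stores only a constant number of field elements, yet can check a prover's claimed answer via a polynomial identity test. Below is a minimal illustration of the idea; the universe size, field, and variable names are assumptions for the example, not taken from any particular protocol in the survey.

import random

P = 2**61 - 1            # prime field for the fingerprint
U = 10                   # universe size: stream items are 0, ..., U-1
r = random.randrange(P)  # the verifier's secret evaluation point

def fingerprint(freq):
    # Evaluate sum_j freq[j] * r^j mod P, a degree-(U-1) polynomial in r.
    return sum(f * pow(r, j, P) for j, f in enumerate(freq)) % P

stream = [3, 1, 4, 1, 5, 1, 3]

# Verifier: one pass, constant space -- just a running fingerprint.
v_fp = 0
for item in stream:
    v_fp = (v_fp + pow(r, item, P)) % P

# Prover: claims the full frequency vector (computed honestly here).
claimed = [0] * U
for item in stream:
    claimed[item] += 1

# The verifier accepts iff the fingerprints agree. By the Schwartz-Zippel
# lemma, a false claim is detected except with probability at most (U-1)/P.
assert fingerprint(claimed) == v_fp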

Program Committees

I am, or have been, a member of the following program committees:
• STOC 2019
• SOSA 2019
• SODA 2018
• FSTTCS 2017
• ICALP 2016
• ALENEX 2016
• SDM 2015

Open Source Software

• I am a co-creator of and core contributor to Data Sketches, an open-source library of production-quality streaming algorithms for a variety of important query classes, including identifying frequent items, unique count queries, quantiles, and subset sum queries.
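To give a flavor of the algorithms in the library, below is a minimal K-Minimum-Values distinct-count sketch in Python. This illustrates the underlying idea only; it is not the library's API, and the class name and parameters are assumptions for the example.

import hashlib, heapq

class KMV:
    """Estimate the number of distinct items from the k smallest hashes."""
    def __init__(self, k=64):
        self.k = k
        self.mins = []     # max-heap (via negation) of the k smallest hashes
        self.seen = set()  # the same hashes, for O(1) duplicate checks

    def update(self, item):
        # Hash each item to a pseudo-uniform 64-bit value.
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        if h in self.seen:
            return
        if len(self.mins) < self.k:
            heapq.heappush(self.mins, -h)
            self.seen.add(h)
        elif h < -self.mins[0]:
            self.seen.discard(-heapq.heapreplace(self.mins, -h))
            self.seen.add(h)

    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))  # fewer than k distinct: exact
        # With n distinct items, the k-th smallest normalized hash is
        # concentrated around k/n, so n is roughly (k - 1) / hash_k.
        kth = (-self.mins[0]) / 2**64
        return (self.k - 1) / kth

sk = KMV()
for i in range(100000):
    sk.update(i % 5000)      # a stream with 5,000 distinct items
print(round(sk.estimate()))  # typically within ~15% of 5000 for k = 64

The production library manages the space/accuracy tradeoff far more carefully than this sketch and supports merging sketches computed over distributed data, but the estimate-from-order-statistics idea is the same.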