Bloomington summer school recap

A couple months ago, at Indiana University, David Fisher, Nets Katz, and I  organized a summer school on Analysis and geometry in the theory of computation.  This school is one in a series organized by David and funded by NSF grant DMS-0643546 (see, e.g. last year’s school). What follows is a brief synopsis of what the school covered.  All the lectures were given by the participants, and there are links to their lecture notes below.  This is essentially an extended version of an introductory document I wrote for the participants, who were a mix of mathematicians and theoretical computer scientists.

Approximation Algorithms

In the following discussion, we will use the word efficient to describe an algorithm that runs in time polynomial in the size of its input. For a graph {G=(V,E)}, we use {\textsf{MC}(G)} to denote the “MAX-CUT value,” i.e. the quantity

\displaystyle \max_{S \subseteq V} \frac{|E(S, \bar S)|}{|E|},

where {E(S, \bar S)} denotes the set of edges between {S} and its complement. It is well-known that computing {\textsf{MC}(G)} is {\mathsf{NP}}-hard, and thus assuming {\mathsf{P} \neq \mathsf{NP}}, there is no efficient algorithm that, given {G}, outputs {\textsf{MC}(G)}.
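
For concreteness, here is a tiny brute-force computation of {\textsf{MC}(G)} (a Python sketch, exponential in {|V|}; the 5-cycle below is only an illustrative example, not anything from the lectures).

```python
from itertools import combinations

# Brute-force MC(G) = max over S of |E(S, complement of S)| / |E|.
# Exponential in |V|, so this is only meant for very small illustrative graphs.
def max_cut_value(vertices, edges):
    best = 0
    for r in range(len(vertices) + 1):
        for S in combinations(vertices, r):
            S = set(S)
            best = max(best, sum((u in S) != (v in S) for (u, v) in edges))
    return best / len(edges)

# Example: the 5-cycle C_5, whose MAX-CUT value is 4/5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(max_cut_value(range(5), edges))  # 0.8
```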

Given this state of affairs, it is natural to ask how well we can approximate the value {\mathsf{MC}(G)} with an efficient algorithm. For an algorithm {\mathcal A}, we use {\mathcal A(G)} to denote its output when run on the graph {G}. If {\mathcal A} satisfies {\mathcal A(G) \leq \mathsf{MC}(G)} for all {G}, we define its approximation ratio as

\displaystyle \alpha(\mathcal A) = \sup \left\{ \alpha : \mathcal A(G) \geq \alpha \cdot \mathsf{MC}(G) \textrm{ for all graphs}\right\}

Clearly {\alpha(\mathcal A) \in [0,1]}. Now we are interested in the best approximation ratio achievable by an efficient algorithm {\mathcal A}, i.e. the quantity

\displaystyle \textrm{approx}(\mathsf{MC}) = \sup \left\{ \alpha(\mathcal A) : \mathcal A \textrm{ is efficient} \right\}

It should be clear that similar questions arise for all sorts of other values which are NP-hard to compute (e.g. the chromatic number of a graph, or the length of its shortest tour, or the length of the longest simple path, etc.). An algorithm of Goemans and Williamson (based on a form of convex optimization known as semi-definite programming) shows that

\displaystyle \mathrm{approx}(\mathsf{MC}) \geq \alpha_{\mathrm{GW}} = \frac{2}{\pi} \min_{0 < \theta < \pi} \frac{\theta}{1-\cos\theta} = 0.878\ldots
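
(For the record, this constant is easy to check numerically; the quick Python sketch below just minimizes the expression above on a fine grid.)

```python
import numpy as np

# Minimize (2/pi) * theta / (1 - cos(theta)) over theta in (0, pi).
theta = np.linspace(1e-4, np.pi - 1e-4, 1_000_000)
ratio = (2 / np.pi) * theta / (1 - np.cos(theta))
print(theta[ratio.argmin()], ratio.min())  # minimizer ~2.33 radians, value ~0.87856
```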

On the other hand, Håstad proved that, as a consequence of the PCP Theorem, it is {\mathsf{NP}}-hard to achieve an approximation ratio better than {16/17}, i.e. if {\mathsf{P} \neq \mathsf{NP}}, then

\displaystyle \mathrm{approx}(\mathsf{MC}) \leq \frac{16}{17} = 0.941\ldots

How does one prove such a theorem? Well, the {\mathsf{NP}}-hardness of MAX-CUT is based on constructing graphs where every optimal solution has a particular structure (which eventually encodes the solution to another NP-hard problem like SATISFIABILITY). Similarly, the NP-hardness of obtaining even “near-optimal” solutions is proved, in part, by constructing graphs where every solution whose value is close to optimal has some very specific structure (e.g. is close—in some stronger sense—to an optimal solution).

In this way, one of the main steps in proving the inapproximability of {\mathsf{NP}}-hard problems involves constructing objects which have such a “rigidity” property. This summer school is about how one can use the rigidity of analytic and geometric objects to obtain combinatorial objects with the same property. In fact, assuming something called the “Unique Games Conjecture” (which we will see later), the approximability of many constraint satisfaction problems can be tied directly to the existence of certain geometric configurations.

The Lectures

The first series of lectures will concern the Sparsest Cut problem in graphs and its relationship to bi-lipschitz {L_1} embeddings of finite metric spaces. In particular, we will look at rigidity properties of  “nice” subsets of the Heisenberg group, and how these can be used to prove limitations on a semi-definite programming approach to Sparsest Cut. In the second series, we will see how—assuming the Unique Games Conjecture (UGC)—proving lower bounds on certain simple semi-definite programs actually proves lower bounds against all efficient algorithms. This will entail, among other things, an analytic view of {\{0,1\}}-valued functions, primarily through harmonic analysis.

Sparsest Cut and {L_1} embeddings

The Sparsest Cut problem is classically described as follows. We have a graph {G=(V,E)} and two functions {C : V \times V \rightarrow \mathbb R_+} and {D : V \times V \rightarrow \mathbb R_+}, with {\mathrm{supp}(C) \subseteq E}. The goal is to compute

\displaystyle   \Phi^*(G;C,D) = \min_{S \subseteq V} \frac{C(S, \bar S)}{D(S, \bar S)},

where we use {C(A,B) = \sum_{a \in A, b\in B} C(a,b)} and {D(A,B) = \sum_{a \in A, b \in B} D(a,b)}. The problem has a number of important applications in computer science.
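
For concreteness, here is a brute-force evaluation of {\Phi^*(G;C,D)} on a toy instance (a Python sketch, exponential in {|V|}; the capacities and demands below are purely illustrative).

```python
from itertools import combinations

# Brute-force Phi*(G;C,D) = min over cuts S of C(S, S-bar) / D(S, S-bar).
# C and D are given as weights on unordered pairs; C is supported on the edges of G.
def sparsest_cut(vertices, C, D):
    def cut_weight(W, S):
        return sum(w for (u, v), w in W.items() if (u in S) != (v in S))
    best = float("inf")
    V = list(vertices)
    for r in range(1, len(V)):           # skip S = {} and S = V, where D(S, S-bar) = 0
        for S in combinations(V, r):
            S = set(S)
            d = cut_weight(D, S)
            if d > 0:
                best = min(best, cut_weight(C, S) / d)
    return best

# Toy instance: unit capacities on the 4-cycle, uniform demands on all pairs.
C = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}
D = {(u, v): 1.0 for u in range(4) for v in range(u + 1, 4)}
print(sparsest_cut(range(4), C, D))  # 0.5, attained e.g. by S = {0, 1}
```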

Computing {\Phi^*(G;C,D)} is NP-hard, but again we can ask for approximation algorithms. The best-known approach is based on computing the value of the Goemans-Linial semi-definite program, \mathsf{sdp}(G;C,D), which is

\displaystyle \min \left\{ \frac{\sum_{u,v} C(u,v) \|x_u-x_v\|_2^2}{\sum_{u,v} D(u,v) \|x_u-x_v\|_2^2}: \{x_u\}_{u \in V} \subseteq \mathbb R^V\textrm{ and }\|x_u-x_v\|^2 \leq \|x_u-x_w\|^2 + \|x_w-x_v\|^2 \textrm{ for all }  u,v,w \in V \right\}.

This value can be computed by a semi-definite program (SDP), as we will see. It is an easy exercise to check that {\mathsf{sdp}(G;C,D) \leq \Phi^*(G;C,D)}, and we can ask for the smallest {\alpha = \alpha(n)} such that for all {n}-node graphs {G} and all functions {C,D}, we have

\displaystyle \Phi^*(G;C,D) \leq \alpha(n) \cdot \mathsf{sdp}(G;C,D).

(E.g. it is now known that {(\log n)^{2^{-1000}} \leq \alpha(n) \leq O(\sqrt{\log n} \log \log n)}, with the upper bound proved here, and the lower bound proved here.)
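
Returning to the relaxation itself, here is one concrete way to write it down (a hedged sketch using cvxpy: it works with the Gram matrix of the vectors {x_u}, and normalizes the denominator to 1, which loses nothing since the constraints are invariant under scaling).

```python
import itertools
import numpy as np
import cvxpy as cp

def goemans_linial_sdp(n, C, D):
    # X is the Gram matrix of the vectors {x_u}:  X[u, v] = <x_u, x_v>,
    # so ||x_u - x_v||^2 = X[u, u] + X[v, v] - 2 X[u, v].
    X = cp.Variable((n, n), PSD=True)
    d2 = lambda u, v: X[u, u] + X[v, v] - 2 * X[u, v]
    pairs = list(itertools.combinations(range(n), 2))
    constraints = [sum(D[u, v] * d2(u, v) for u, v in pairs) == 1]
    # Squared-distance ("negative type") triangle inequalities.
    for u, v, w in itertools.permutations(range(n), 3):
        constraints.append(d2(u, v) <= d2(u, w) + d2(w, v))
    objective = cp.Minimize(sum(C[u, v] * d2(u, v) for u, v in pairs))
    return cp.Problem(objective, constraints).solve()

# Same toy instance as above: unit capacities on the 4-cycle, uniform demands.
n = 4
C, D = np.zeros((n, n)), np.ones((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    C[u, v] = C[v, u] = 1.0
print(goemans_linial_sdp(n, C, D))  # a lower bound on Phi*(G;C,D)
```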

By some duality arguments, one can characterize {\alpha(n)} in a different way. For a metric space {(X,d)}, write {c_1(X,d)} for the infimal constant {B} such that there exists a mapping {f : X \rightarrow L_1} satisfying, for all {x,y \in X},

\displaystyle   \|f(x)-f(y)\|_1 \leq d(x,y) \leq B \|f(x)-f(y)\|_1.

It turns out that

\displaystyle \alpha(n) = \sup \left\{ c_1(X,d) : |X|=n \textrm{ and } (X, \sqrt{d})\textrm{ embeds isometrically in } L_2\right\} (1)

This shows that determining the power of the preceding SDP is intimately connected to understanding bi-lipschitz embeddings into {L_1}. This is what we will study in the first 6 lectures.

  1. (Arnaud de Mesmay) In the first lecture, we will be introduced to the basic geometry of the 3-dimensional Heisenberg group {\mathbb H^3}, and how differentiation plays a role in proving lower bounds on bi-lipschitz distortion. In particular, we will see Pansu’s approach for finite-dimensional targets and a generalization to spaces with the RNP, and also why a straightforward generalization would fail for {L_1}.
  2. (Mohammad Moharrami) Next, we will see how a differentiation approach to {L_1} embeddings might work in a toy setting that uses only finite graphs. The study of “monotone subsets” (which is elementary here) also arises in the work of Cheeger and Kleiner in lectures 4 and 5.  (See also this post.)
  3. (Sean Li) Here, we will see that there is an equivalent metric {d} on the Heisenberg group for which {(\mathbb H^3, \sqrt{d})} embeds isometrically into {L_2}. This is one half of proving lower bounds on {\alpha(n)} using (1).
  4. (Jeehyeon Seo and John Mackay) In Lectures 4-5, we’ll look at the approach of Cheeger and Kleiner for proving that {\mathbb H^3} does not bi-lipschitz embed into {L_1}.  (Note that these authors previously offered a different approach to non-embeddability, though the one presented in these lectures is somewhat simpler.)
  5. (Florent Baudier) Finally, in Lecture 6, we see some embedding theorems for finite metric spaces that allow us to prove upper bounds on {\alpha(n)}.

The UGC, semi-definite programs, and constraint satisfaction

In the second series of lectures, we’ll see how rigidity of geometric objects can possibly say something, not just about a single algorithm (like a semi-definite program), but about all efficient algorithms for solving a particular problem.

  1. (An-Sheng Jhang) First, we’ll review basic Fourier analysis on the discrete cube, and how this leads to some global rigidity theorems for cuts. These tools will be essential later.  (See also these lecture notes from Ryan O’Donnell.)
  2. (Igor Gorodezky) Next, we’ll see a semi-definite program (SDP) for the MAX-CUT problem, and a tight analysis of its approximation ratio (which turns out to be the {0.878\ldots} value we saw earlier).
  3. (Sam Daitch) In the third lecture, we’ll see the definition of the Unique Games Conjecture, and how it can be used (in an ad-hoc manner, for now) to transform our SDP analysis into a proof that the SDP-based algorithm is optimal (among all efficient algorithms) under some complexity-theoretic assumptions.
  4. (Deanna Needell) A key technical component of the preceding lecture is something called the Majority is Stablest Theorem that relates sufficiently nice functions on the discrete cube to functions on Gaussian space.
  5. (Sushant Sachdeva) In the final lecture, we’ll see Raghavendra’s work which shows that, for a certain broad class of NP-hard constraint satisfaction problems, assuming the UGC, the best-possible algorithm is the “canonical” semi-definite program. In other words, the approximation ratio for these problems is completely determined by the existence (or lack thereof) of certain vector configurations in {\mathbb R^n}.  (See also this post.)

Lecture 2: Spectral partitioning and near-optimal foams

In the last lecture, we reduced the problem of cheating in \mathcal G_m^{\otimes k} (the k-times repeated m-cycle game) to finding a small set of edges \mathcal E in (\mathbb Z_m^k)_\infty whose removal eliminates all topologically non-trivial cycles.  Such a set \mathcal E is called a spine. To get some intuition about how many edges such a spine should contain, let’s instead look at a continuous variant of the problem.

Spines, Foams, and Isoperimetry

Consider again the k-dimensional torus \mathcal T^k = \mathbb R^k/\mathbb Z^k, which one can think of as [0,1)^k with opposite sides identified.  Say that a nice set (e.g. a compact, C^\infty surface) \mathcal E \subseteq \mathcal T^k is a spine if it intersects every non-contractible loop in \mathcal T^k.  This is the continuous analog of a spine in (\mathbb Z_m^k)_\infty.  We will try to find such a spine \mathcal E with surface area, i.e. \mathrm{Vol}_{k-1}(\mathcal E), as small as possible.

Let’s consider some easy bounds.  First, it is clear that the set

\displaystyle \mathcal E = \left\{(x_1, \ldots, x_k) \in [0,1)^k : \exists i \in \{1,2,\ldots,k\}, x_i = 0\right\}

is a spine with \mathrm{Vol}_{k-1}(\mathcal E) = k.  (A moment’s thought shows that this is “equivalent” to the provers playing independent games in each coordinate of \mathcal G_m^{\otimes k}.)

To get a good lower bound, it helps to relate spines to foams which tile \mathbb R^k according to \mathbb Z^k, as follows.  Take two potential spines.

To determine which curve is actually a spine, we can repeatedly tile them side-by-side.

The first tiling contains the blue bi-infinite curve, which obviously gives a non-trivial cycle in \mathcal T^k, while the second yields a tiling of \mathbb R^k by bodies of volume 1.  It is easy to deduce the following claim.

Claim: A surface \mathcal E \subseteq \mathcal T^k is a spine if and only if it induces a tiling of \mathbb R^k by bodies of volume 1 which is invariant under shifts by \mathbb Z^k.

By the isoperimetric inequality in \mathbb R^k, this immediately yields the bound

\displaystyle \mathrm{Vol}_{k-1}(\mathcal E) \geq \mathrm{Vol}_{k-1}(\partial B) \approx \sqrt{k},

where \partial B is the boundary of a Euclidean ball B of volume 1 in \mathbb R^k.
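
To see how far apart the two bounds are, one can compute the surface area of a volume-1 ball exactly (a quick Python sketch; the formula k V_k^{1/k}, where V_k = \pi^{k/2}/\Gamma(k/2+1) is the volume of the unit ball, is standard).

```python
import math

# Surface area of a Euclidean ball of volume 1 in R^k.
# The unit ball has volume V_k = pi^(k/2) / Gamma(k/2 + 1); a ball of volume 1 therefore
# has radius r = V_k^(-1/k) and surface area k * V_k * r^(k-1) = k * V_k^(1/k).
def ball_area_at_unit_volume(k):
    log_Vk = (k / 2) * math.log(math.pi) - math.lgamma(k / 2 + 1)
    return k * math.exp(log_Vk / k)

for k in (10, 100, 1000):
    print(k, round(ball_area_at_unit_volume(k), 1), round(math.sqrt(2 * math.pi * math.e * k), 1))
# The "sphere bound" grows like sqrt(2*pi*e*k) ~ sqrt(k), far below the cube spine's area k.
```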

So the surface area of an optimal spine lies somewhere between \sqrt{k} and k.  On the one hand, cubes tile very nicely but have large surface area.  On the other hand, we have sphere-like objects which have small surface area, but don’t seem (at least intuitively) to tile very well at all.  As first evidence that this isn’t quite right, note that it is known how to cover \mathbb R^k by disjoint bodies of volume at most 1 so that the surface area/volume ratio grows like \sqrt{k}.  See Lemma 3.16 in this paper, which is based on Chekuri et al.  It’s just that these covers are not invariant under \mathbb Z^k shifts.

Before we reveal the answer, let’s see what consequences the corresponding discrete bounds would have for parallel repetition of the m-cycle game.  If the “cube bound” were tight, we would have \mathsf{val}(\mathcal G_m^{\otimes k}) \approx 1 - \frac{k}{m}, which doesn’t rule out a strong parallel repetition theorem (\alpha^*=1 in the previous lecture).  If the “sphere bound” were tight, we would have \mathsf{val}(\mathcal G_m^{\otimes k}) \approx 1 - \frac{\sqrt{k}}{m}, which shows that \alpha^* \geq 2.   In the latter case, the approach to proving equivalence of the UGC and MAX-CUT conjectures doesn’t even get off the ground.

As the astute reader might have guessed, recently Ran Raz proved that \mathsf{val}(\mathcal G_m^{\otimes k}) \geq 1 - C\frac{\sqrt{k}}{m} for some constant C > 0, showing that a strong parallel repetition theorem—even for unique games—is impossible.   Subsequently, Kindler, O’Donnell, Rao, and Wigderson showed that there exists a spine \mathcal E \subseteq \mathcal T^k with \mathrm{Vol}_{k-1}(\mathcal E) \approx \sqrt{k}.  While it is not difficult to show that the continuous result implies Raz’s discrete result, we will take a direct approach found recently by Alon and Klartag.


Lecture 1: Cheating with foams

This is the first lecture for CSE 599S:  Analytical and geometric methods in the theory of computation.  Today we’ll consider the gap amplification problem for 2-prover games, and see how it’s intimately related to some high-dimensional isoperimetric problems about foams.  In the next lecture, we’ll use spectral techniques to find approximately optimal foams (which will then let us cheat at repeated games).

The PCP Theorem, 2-prover games, and parallel repetition

For a 3-CNF formula \varphi, let \mathsf{sat}(\varphi) denote the maximum fraction of clauses in \varphi which are simultaneously satisfiable. For instance, \varphi is satisfiable if and only if \mathsf{sat}(\varphi)=1. One equivalent formulation of the PCP Theorem is that the following problem is NP-complete:

Formulation 1.

Given a 3-CNF formula \varphi, answer YES if \mathsf{sat}(\varphi)=1 and NO if \mathsf{sat}(\varphi) \leq 0.9 (any answer is acceptable if neither condition holds).

We can restate this result in the language of 2-prover games. A 2-prover game \mathcal G consists of four finite sets Q,Q',A,A', where Q and Q' are sets of questions, while A and A' are sets of answers to the questions in Q and Q', respectively. There is also a verifier V : Q \times Q' \times A \times A' \to \{0,1\} which checks the validity of answers. For a pair of questions (q,q') and answers (a,a'), the verifier is satisfied if and only if V(q,q',a,a')=1. The final component of \mathcal G is a probability distribution \mu on Q \times Q'.

Now a strategy for the game consists of two provers P : Q \to A and P' : Q' \to A' who map questions to answers. The score of the two provers (P,P') is precisely

\displaystyle\mathsf{val}_{P,P'}(\mathcal G) = \Pr \left[V(q, q', P(q), P'(q'))=1\vphantom{\bigoplus}\right],

where (q,q') is drawn from \mu. This is just the probability that the verifier is happy with the answers provided by the two provers. The value of the game is now defined as

\displaystyle\mathsf{val}(\mathcal G) = \max_{P,P'} \mathsf{val}_{P,P'}(\mathcal G),

i.e. the best-possible score achievable by any two provers.

Now we can again restate the PCP Theorem as saying that the following problem is NP-complete:

Formulation 2.

Given a 2-prover game \mathcal G with |A|,|A'|=O(1), answer YES if \mathsf{val}(\mathcal G)=1 and answer NO if \mathsf{val}(\mathcal G) \leq 0.99.

To see that Formulation 1 implies Formulation 2, consider, for any 3-CNF formula \varphi, the game \mathcal G_{\varphi} defined as follows. Q is the set of clauses in \varphi, Q' is the set of variables in \varphi, while A = \{ TTT, TTF, TFT, FTT, TFF, FTF, FFT, FFF \} and A' = \{T, F\}. Here A represents the set of eight possible truth assignments to a three-variable clause, and A' represents the set of possible truth assignments to a variable.

The distribution \mu is defined as follows: Choose first a uniformly random clause C \in Q, and then uniformly at random one of the three variables x \in Q' which appears in C. An answer (a_C, a_x) \in A \times A' is valid if the assignment a_C makes C true, and if a_x and a_C are consistent in the sense that they give the same truth value to the variable x. The following statement is an easy exercise:

For every 3-CNF formula \varphi, we have \mathsf{val}(\mathcal G_{\varphi}) = \frac23 + \frac13 \mathsf{sat}(\varphi).

The best strategy is to choose an assignment \mathcal A to the variables in \varphi satisfying as many clauses as possible: P' plays according to \mathcal A, while P plays according to \mathcal A unless he is about to answer with an assignment that doesn’t satisfy the clause C.   At that point, he flips one of the literals to make his assignment satisfying (in this case, the chance of catching P cheating is only 1/3, the probability that P' is sent the variable that P flipped).
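
One can check the formula on a concrete instance (the particular \varphi below is only an illustration): since P' is literally an assignment to the variables, and the verifier's test for a clause C involves only P(C) and P'(x) for x \in C, the value can be computed by optimizing P clause by clause. A Python sketch:

```python
import itertools

# A small 3-CNF over variables 0, 1, 2: all eight sign patterns, so sat(phi) = 7/8.
# A clause is a triple of literals (variable, sign); sign True means a positive literal.
clauses = [((0, s0), (1, s1), (2, s2))
           for s0 in (True, False) for s1 in (True, False) for s2 in (True, False)]

def satisfies(assignment, clause):
    return any(assignment[v] == s for v, s in clause)

assignments = [dict(enumerate(bits)) for bits in itertools.product((False, True), repeat=3)]

# sat(phi): the best fraction of clauses satisfied by a single assignment.
sat = max(sum(satisfies(A, c) for c in clauses) for A in assignments) / len(clauses)

# val(G_phi): P' commits to an assignment A; for each clause C, the clause-prover P picks
# a satisfying local assignment a_C agreeing with A on as many of C's three variables as
# possible (the verifier checks consistency on a uniformly random variable of C).
def clause_value(A, clause):
    best = 0.0
    for bits in itertools.product((False, True), repeat=3):
        a_C = {v: b for (v, _), b in zip(clause, bits)}
        if satisfies(a_C, clause):
            best = max(best, sum(a_C[v] == A[v] for v, _ in clause) / 3)
    return best

val = max(sum(clause_value(A, c) for c in clauses) for A in assignments) / len(clauses)
print(sat, val, 2/3 + sat/3)  # val agrees with 2/3 + sat(phi)/3
```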

This completes our argument that Formulation 1 implies Formulation 2.

Parallel repetition
A very natural question is whether the constant 0.99 in Formulation 2 can be replaced by 0.001 (or an arbitrarily small constant). A natural way of gap amplification is by “parallel repetition” of a given game. Starting with a game \mathcal G = \langle Q,Q',A,A',V,\mu \rangle, we can consider the game \mathcal G^{\otimes k} = \langle Q^k, Q'^k, A^k, A'^k, V^{\otimes k}, \mu^{\otimes k} \rangle, where \mu^{\otimes k} is just the product distribution on (Q \times Q')^k \cong Q^k \times Q'^k. Here, we choose k pairs of questions (q_1, q'_1), \ldots, (q_k, q'_k) i.i.d. from \mu and the two provers then respond with answers (a_1, \ldots, a_k) \in A^k and (a'_1, \ldots, a'_k) \in A'^k. The verifier V^{\otimes k} is satisfied if and only if V(q_i, q'_i, a_i, a'_i)=1 for every i = 1, 2, \ldots, k.

Clearly \mathsf{val}(\mathcal G^{\otimes k}) \geq \mathsf{val}(\mathcal G)^k because, given a strategy (P,P') for \mathcal G, we can play the same strategy in every coordinate, and then our probability of winning is just the probability that we win k independent games simultaneously. But is there a more clever strategy that can do better? Famously, early papers in this arena assumed it was obvious that \mathsf{val}(\mathcal G^{\otimes k}) = \mathsf{val}(\mathcal G)^k.

In fact, there are easy examples of games \mathcal G where \mathsf{val}(\mathcal G^{\otimes 2}) = \mathsf{val}(\mathcal G) = \frac12.  (Exercise: Show that this is true for the following game devised by Uri Feige. The verifier chooses two independent random bits b, b' \in \{0,1\}, and sends b to P and b' to P'. The answers of the two provers are from the set \{1,2\} \times \{0,1\}. The verifier accepts if both provers answer (1,b) or both provers answer (2,b').)
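
Here is a quick numerical check of the interesting direction of this exercise, namely that repetition does not decrease the value (a Python sketch: it brute-forces \mathsf{val}(\mathcal G) over all deterministic prover pairs, and then evaluates one explicit correlated strategy for \mathcal G^{\otimes 2}).

```python
import itertools

# Feige's game: P receives a bit b, P' receives a bit b'; answers lie in {1,2} x {0,1}.
# The verifier accepts iff both provers give the same answer and it equals (1, b) or (2, b').
ANSWERS = [(i, c) for i in (1, 2) for c in (0, 1)]

def accept(b, bp, a, ap):
    return a == ap and (a == (1, b) or a == (2, bp))

def game_value(P, Pp):
    # Average over the four equally likely question pairs (b, b').
    return sum(accept(b, bp, P[b], Pp[bp]) for b in (0, 1) for bp in (0, 1)) / 4

# val(G): maximize over all deterministic strategies (a strategy maps the question bit to an answer).
strategies = list(itertools.product(ANSWERS, repeat=2))
print(max(game_value(P, Pp) for P in strategies for Pp in strategies))  # 0.5

# A correlated strategy for G x G: in coordinate 1 both provers "point at" prover 1, in
# coordinate 2 both point at prover 2, and both guess with a single bit (P reuses its first
# question bit, P' reuses its second).  It wins both coordinates exactly when those bits agree.
total = 0
for b1, b2, bp1, bp2 in itertools.product((0, 1), repeat=4):
    a = ((1, b1), (2, b1))        # P's answers in coordinates 1 and 2
    ap = ((1, bp2), (2, bp2))     # P''s answers in coordinates 1 and 2
    total += accept(b1, bp1, a[0], ap[0]) and accept(b2, bp2, a[1], ap[1])
print(total / 16)  # also 0.5, so val(G x G) >= val(G): no decay under one round of repetition
```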

Nevertheless, in a seminal work, Ran Raz proved the Parallel Repetition Theorem, which states that the value of the repeated game does, in fact, drop exponentially.

Theorem 1.1: For every 2-prover game \mathcal G, there exists a constant c = O(\log(|A|+|A'|)) such that if \mathsf{val}(\mathcal G) \leq 1-\epsilon, then for every k \in \mathbb N,

\displaystyle\mathsf{val}(\mathcal G^{\otimes k}) \leq (1-\epsilon^{3})^{k/c}.

The exponent 3 above is actually due to an improvement of Holenstein (Raz’s original paper can be mined for an exponent of 32).
