Lecture P1. From Integrality Gaps to Dictatorship Tests

Here are Prasad Raghavendra‘s notes on one of two guest lectures he gave for CSE 599S. Prasad will be a faculty member at Georgia Tech after he finishes his postdoc at MSR New England.

In yet another excursion in to applications of discrete harmonic analysis, we will now see an application of the invariance principle, to hardness of approximation. We will complement this discussion with the proof of the invariance principle in the next lecture.

1. Invariance Principle

In its simplest form, the central limit theorem asserts the following: “As {n} increases, the sum of {n} independent bernoulli random variables ({\pm 1} random variables) has approximately the same distribution as the sum of {n} independent normal (Gaussian with mean {0} and variance {1}) random variables”

Alternatively, as {n} increases, the value of the polynomial {F(\mathbf x) = 	\frac{1}{\sqrt{n}} (x^{(1)} + x^{(2)} + 	\ldots x^{(n)})} has approximately the same distribution whether the random variables {x^{(i)}} are i.i.d Bernoulli random variables or i.i.d normal random variables. More generally, the distribution of {p(\mathbf x)} is the approximately the same as long as the random variables {x^{(i)}} are independent with mean {0}, variance {1} and satisfy certain mild regularity assumptions.

A phenomenon of this nature where the distribution of a function of random variables, depends solely on a small number of their moments is referred to as invariance.

A natural approach to generalize of the above stated central limit theorem, is to replace the “sum” ({\frac{1}{\sqrt{n}} (x^{(1)} + x^{(2)} + 	\ldots x^{(n)})}) by other multivariate polynomials.   As we will see in the next lecture, the invariance principle indeed holds for low degree multilinear polynomials that are not influenced heavily by any single coordinate. Formally, define the influence of a coordinate on a polynomial as follows:

Definition 1 For a multilinear polynomial {F(\mathbf{x}) = \sum_{\sigma} 	\hat{F}_{\sigma} \prod_{i \in [R]} x^{(\sigma_i)}} define the influence of the {i^{th}} coordinate as follows:

{	\mathrm{Inf}_{i}(F) = \mathop{\mathbb E}_{x} [\mathrm{Var}_{x^{(i)}}[F]] = \sum_{\sigma_{i} \neq 	0} \hat{F}^2_\sigma}

Here {\mathrm{Var}_{x^{(i)}}[F]} denotes the variance of {F(x)} over random choice of {x_i}.

The invariance principle for low degree polynomials was first shown by Rotar in 1979. More recently, invariance principles for low degree polynomials were shown in different settings in the work of Mossel-O’Donnell-Olekszciewicz and Chatterjee. The former of the two works also showed the Majority is Stablest conjecture, and has been influential in introducing the powerful tool of invariance to hardness of approximation.

Here we state a special case of the invariance principle tailored to the application at hand.  To this end, let us first define a rounding function {f_{[-1,1]}} as follows:

\displaystyle f_{[-1,1]}(x) = \begin{cases} -1 &\text{ if } x < -1 \\ x &\text{ if } -1\leqslant x\leqslant 1 \\ 1 &\text{ if } x > 1 \end{cases}

Theorem 2 (Invariance Principle [Mossel-ODonnell-Oleszkiewicz]) Let {\mathbf{z} = \{z_1, z_2\}} and {\mathbf{G} = \{g_1,g_2\}} be sets of Bernoulli and Gaussian random variables respectively. Furthermore, let

\displaystyle  \mathop{\mathbb E}[z_i] = \mathop{\mathbb E}[g_i] = 0 \qquad \qquad \mathop{\mathbb E}[z_i^2] = \mathop{\mathbb E}[g_i^2] = 1 \qquad \qquad \forall i \in [2]

and {\mathop{\mathbb E}[z_1 z_2] = \mathop{\mathbb E}[g_1 g_2]}. Let {\mathbf{z}^R, \mathbf{G}^R} denote {R} independent copies of the random variables {\mathbf{z}} and {\mathbf{G}}.

Let {F} be a multilinear polynomial given by {F(\mathbf{x}) = \sum_{\sigma} \hat{F}_{\sigma} \prod_{i \in \sigma} x^{(i)}}, and let {H(\mathbf{x})=T_{1-\epsilon} F(\mathbf{x}) = \sum_{\sigma}(1-\epsilon)^{|\sigma|} \hat{F}_{\sigma} \prod_{i\in \sigma} x^{(i)}}. If {\mathrm{Inf}_\ell(H) \leqslant \tau} for all {\ell \in [R]}, then the following statements hold:

  1. For every function {\Psi : \mathbb{R}^2 	 \rightarrow \mathbb{R}} which is thrice differentiable with all its partial derivatives up to order {3} bounded uniformly by {C_0},

    \displaystyle  \Big|\mathop{\mathbb E}\Big[\Psi(H(\mathbf{z}_1^{R}), 	 H(\mathbf{z}_2^{R}))\Big] - \mathop{\mathbb E}\Big[\Psi(H(\mathbf{g}_1^R), H(\mathbf{g}_2^R))\Big] \Big| \leqslant \tau^{O(\epsilon)}

  2. Define the function {\xi : {\mathbb R}^2 \rightarrow 	 {\mathbb R}} as {\xi(\mathbf{x}) = \sum_{i\in [2]} (x_i - 	 f_{[-1,1]}(x_i))^2} Then, we have { \Big|\mathop{\mathbb E}[\xi(H(\mathbf{z}_1^{n}), H(\mathbf{z}_2^{n}))] - \mathop{\mathbb E}[\xi(H(\mathbf{g}_1^{n}),H(\mathbf{g}_2^{n}))] \Big| \leqslant \tau^{O(\epsilon)} }

2. Dictatorship Testing

A boolean function {\mathcal{F}(x_1,x_2,\ldots,x_n)} on {n} bits is a dictator or a long code if {\mathcal{F}(x) = x_i} for some {i}. Given the truth table of a function {\mathcal{F}}, a dictatorship test is a randomized procedure that queries a few locations (say {2}) in the truth table of {\mathcal{F}}, and tests a predicate {P} on the values it queried. If the queried values satisfy the predicate {P}, the test outputs ACCEPT else it outputs REJECT. The goal of the test is to determine if {\mathcal{F}} is a dictator or is far from being a dictator. The main parameters of interest in a dictatorship test are :

  • Completeness{(c)} Every dictator function {\mathcal{F}(x) = x_i} is accepted with probability at least {c}.
  • Soundness{(s)} Any function {\mathcal{F}} which is far from every dictator is accepted with probability at most {s}.
  • Number of queries made, and the predicate {P} used by the test.

2.1. Motivation

The motivation for the problem of dictatorship testing arises from hardness of approximation and PCP constructions. To show that an optimization problem {\Lambda} is {\mathbf{NP}}-hard to approximate, one constructs a reduction from a well-known {\mathbf{NP}}-hard problem such as Label Cover to {\Lambda}. Given an instance {\Im} of the Label Cover problem, a hardness reduction produces an instance {\Im'} of the problem {\Lambda}. The instance {\Im'} has a large optimum value if and only if {\Im} has a high optimum. Dictatorship tests serve as “gadgets” that encode solutions to the Label Cover, as solutions to the problem {\Lambda}. In fact, constructing an appropriate dictatorship test almost always translates in to a corresponding hardness result based on the Unique Games Conjecture.

Dictatorship tests or long code tests as they are also referred to, were originally conceived purely from the insight of error correcting codes. Let us suppose we are to encode a message that could take one of {R} different values {\{m_1,\ldots,m_{R}\}}. The long code encoding of the message {m_\ell} is a bit string of length {2^{R}}, consisting of the truth table of the function {\mathcal{F}(x_1,\ldots,x_R) = x_\ell}. This encoding is maximally redundant in that any binary encoding with more than {2^{R}} bits would contain {2} bits that are identical for all R messages. Intuitively, greater the redundancy in the encoding, the easier it is to perform the reduction.

While long code tests/dictatorship tests were originally conceived from a coding theoretic perspective, somewhat surprisingly these objects are intimately connected to semidefinite programs. This connection between semidefinite programs and dictatorship tests is the subject of today’s lecture. In particular, we will see a black-box reduction from SDP integrality gaps for Max Cut to dictatorship tests that can be used to show hardness of approximating Max Cut.

2.2. The case of Max Cut

The nature of dictatorship test needed for a hardness reduction varies with the specific problem one is trying to show is hard. To keep things concrete and simple, we will restrict our attention to the Max Cut problem.

A dictatorship test {\mathsf{DICT}} for the Max Cut problem consists of a graph on the set of vertices {\{\pm 1\}^{R}}. By convention, the graph {\mathsf{DICT}} is a weighted graph where the edge weights form a probability distribution (sum up to {1}). We will write {(\mathbf{z},\mathbf{z}') \in \mathsf{DICT}} to denote an edge sampled from the graph {\mathsf{DICT}} (here {\mathbf{z},\mathbf{z}' 	\in \{\pm 1\}^{R}}).

A cut of the {\mathsf{DICT}} graph can be thought of as a boolean function {\mathcal{F} : \{\pm 1\}^R \rightarrow \{\pm 1\}}. For a boolean function {\mathcal{F} : \{\pm 1\}^R \rightarrow \{\pm 1\}}, let {\mathsf{DICT}(\mathcal{F})} denote the value of the cut. The value of a cut {\mathcal{F}} is given by

\displaystyle  \mathsf{DICT}(\mathcal{F}) = \frac{1}{2}\mathop{\mathbb E}_{ (\mathbf{z}, \mathbf{z}')} \Big[ 		1 - \mathcal{F}(\mathbf{z}) \mathcal{F}(\mathbf{z}') \Big]

It is useful to define {\mathsf{DICT}(\mathcal{F})}, for non-boolean functions {\mathcal{F}: \{\pm 1\}^R \rightarrow [-1,1]} that take values in the interval {[-1,1]}. To this end, we will interpret a value {\mathcal{F}(\mathbf{z}) \in [-1,1]} as a random variable that takes {\{\pm 1\}} values. Specifically, we think of a number {a 	\in [-1,1]} as the following random variable

$ latex a = -1 $ with probability \frac{1-a}{2} and a = 1 with probability  $\frac{1+a}{2}$.

With this interpretation, the natural definition of {\mathsf{DICT}(\mathcal{F})} is as follows:

\displaystyle  \mathsf{DICT}(\mathcal{F}) = \frac{1}{2}\mathop{\mathbb E}_{ (\mathbf{z}, \mathbf{z}') \in 	\mathsf{DICT}} \Big[ 		1 - \mathcal{F}(\mathbf{z}) \mathcal{F}(\mathbf{z}') \Big] {\,.}

Indeed, the above expression is equal to the expected value of the cut obtained by randomly rounding the values of the function {\mathcal{F} : \{\pm 1\}^{R} \rightarrow [-1,1]} to {\{\pm 1\}} as described in Equation 2.

A Dictator Cut
A Dictator Cut

The dictator cuts are given by the functions {\mathcal{F}(\mathbf{z}) = z_\ell} for some {\ell \in [R]}. The {\mathsf{Completeness}} of the test {\mathsf{DICT}} is the minimum value of a dictator cut, i.e.,

\displaystyle  \mathsf{Completeness}(\mathsf{DICT}) = \min_{\ell \in [R]} 	\mathsf{DICT}(z_{\ell})

The soundness of the dictatorship test is the value of cuts that are far from every dictator. We will formalize the notion of being far from every dictator is formalized using influences as follows:

Cut Far From Every Dictator
Cut Far From Every Dictator

Continue reading

Lecture 8a. A primer on simplicial complexes and collapsibility

Before we can apply more advanced fixed point theorems to the Evasiveness Conjecture, we need a little background on simplicial complexes, and everything starts with simplices.


It’s most intuitive to start with the geometric viewpoint, in which case an n-simplex is defined to be the convex hull of n+1 affinely independent points in \mathbb R^n.  These points are called the vertices of the simplex.  Here are examples for n=0,1,2,3.



A simplicial complex is then a collection of simplices glued together along lower-dimensional simplices.  More formally, if S \subseteq \mathbb R^n is a (geometric) simplex, then a face of S is a subset F \subseteq S formed by taking the convex hull of a subset of the vertices of S.

Finally, a (geometric) simplicial complex is a collection \mathcal K of simplices such that

  1. If S \in \mathcal K and F is a face of S, then F \in \mathcal K, and
  2. If S,S' \in \mathcal K and S \cap S' \neq \emptyset, then S \cap S' is a face of both S and S'.

Property (1) gives us downward closure, and property (2) specifies how simplices can be glued together (only along faces).  For instance, the first picture depicts a simplicial complex.  The second does not.


Continue reading

Lecture 7: The evasiveness conjecture

Continuing our look at some toplogical methods, today we’ll see the evasiveness conjecture in decision tree complexity.  In the next lecture, we’ll see how we can sometimes analyze the complexity using fixed point theorems, and their generalizations (like the Hopf index formula), following the work of Kahn, Saks, and Sturtevant.  These two lectures are co-blogged with Elisa Celis, with a lot of input from Lovasz’s lecture notes.

Decision tree complexity and evasiveness

Consider a boolean function f : \{0,1\}^n \to \{0,1\} on n bits.  We define the decision tree complexity of f as follows.  Given an unknown input x \in \{0,1\}^n, you are allowed to ask about the values of various bits of x, e.g. x_{17}, x_{34}, x_3, \ldots.  Your goal is to compute f(x) using as few questions as possible, and your questions can be adaptive, depending on answers to previous questions.  The complexity of such a strategy is the maximum number of questions asked for any x \in \{0,1\}^n.  The decision tree complexity, written D(f), is the minimum complexity of any strategy that computes f.  (There are many other interesting models of decision complexity, see e.g. this survey.)

Clearly D(f) \leq n, because we can trivially query all the bits of x, and then output f(x).  A function f is called evasive if this upper bound is met, i.e. D(f)=n.  As an example, consider the parity function \mathsf{PARITY}(x) = x_1 \oplus x_2 \oplus \cdots \oplus x_n, where \oplus is addition modulo 2.  Clearly \mathsf{PARITY} is evasive because after n-1 bits of x are asked about, the setting of the final bit determines the value of f.

For a more general example, consider any f : \{0,1\}^n \to \{0,1\} such that \#\{ x : f(x)=1 \} is odd.  In this case, for every i =1,2,\ldots, n, exactly one of f|_{x_i=0} or f|_{x_i=1} has the same property that the number of inputs resulting in a 1 is odd.  (These two functions are the natural restriction of f to functions on n-1 bits, which results from fixing the value of the ith bit.)  Thus an adversary could keep answering questions “x_i?” so that the restricted function retains this property.  Since the number of inputs yielding a 1 is always odd, the restricted function always takes both possible values, implying that f is evasive–the advesary ensures that the value cannot be determined until all n possible questions are asked.

For an example of a non-evasive property, think of a point x \in \{0,1\}^{N \choose 2} as speciying a directed graph on N vertices, where there is exactly one directed edges connecting every pair of vertices, and x specifies the direction of this edge (this is called a tournament).  Thinking of the vertices as players, a directed edge from u to v means that u defeats v.  Now f(x)=1 if the digraph specified by x has one vertex that defeats everyone else.  What is D(f)?

Well, first we can conduct a single elimination tournament, where vertex 1 plays vertex 2, and the winner players vertex 3, and the winner of that players vertex 4, etc.  At the end, there is only one remaining vertex i that remains undefeated.  Now asking N-2 more questions, we can determine whether i indeeds defeats everyone else.  The total number of questions was (N-1)+(N-2) = 2N-3, hence D(f) \leq 2N-3 \ll {N \choose 2}, implying that f is not evasive.

Montone graph properties and the evasiveness conjecture

Let m = {N \choose 2}.  In general, we can encode an arbitrary undirected N-vertex graph as an element G \in \{0,1\}^{m}.  A function f: \{0,1\}^m \to \{0,1\} is called a graph property if relabeling the vertices of G doesn’t affect the value of f(G).  The function f is monotone if the value of the function can never change from 1 to 0 when flipping one of the input bits from 0 to 1.  In the setting of graph properties, this corresponds to those which are maintained under addition of edges to the graph, e.g. f(G) = “is G connected?” or f(G)= “does G have a k-clique?”

Evasiveness Conjecture (Aanderaa-Karp-Rosenberg): Every non-trivial monotone graph property is evasive.

Here, non-trivial means that f(\bar 0) \neq f(\bar 1), where \bar 0 and \bar 1 denote the all-zeros and all-ones strings, respectively.

For example, consider the example f(G) = “is G connected?”  The adversary is simple:  When asked about a possible edge \{i,j\}, she answers NO unless this answer would imply that the graph is disconnected.  In other words, she answers NO unless she has answered NO already for all edges across a cut (S,\bar S) except for \{i,j\}, in which case she has to answer YES.

Now, suppose there is a strategy which figures out the connectivity of G without asking a question about some edge \{i,j\}.  In this case, the conclusion must be that f(G) = \textrm{ YES} because the adversary always maintains that by answering everything in the future YES, she could force the graph to be connected.  In this case, the edges answered YES have to form a spanning tree of G (otherwise by answering all unasked questions NO, the graph would become disconnected).  Consider a path P from i to j in this YES spanning tree.  Let \{u,v\} be the edge of P which was asked about last.  Clearly the adversary answered YES for \{u,v\}, but this contradicts the advesary’s strategy.  Since \{i,j\} has not been asked yet, the adversary is safe to answer NO for \{u,v\}, and still later by answering YES on \{i,j\}, she could force the graph to be connected.  Thus no such strategy exists, and connectivity is evasive.

Continue reading

Lecture 6: Borsuk-Ulam and some combinatorial applications

Now we’ll move away from spectral methods, and into a few lectures on topological methods.  Today we’ll look at the Borsuk-Ulam theorem, and see a stunning application to combinatorics, given by Lovász in the late 70’s.  A great reference for this material is Matousek’s book, from which I borrow heavily.  I’ll also discuss why the Lovász-Kneser theorem arises in theoretical CS.

The Borsuk-Ulam Theorem

We begin with a statement of the theorem.  We will think of the n-dimensional sphere as the subset of {\mathbb R^{n+1}} given by

{S^n = \left\{ (x_1, \ldots, x_{n+1}) : x_1^2 + \cdots + x_{n+1}^2 = 1\right\}.}

Borsuk-Ulam Theorem: For every continuous mapping {f : S^n \to \mathbb R^n} , there exists a point {x \in S^n} with {f(x)=f(-x)} .

Pairs of points {x,-x \in S^n} are called antipodal.

There are a couple of common illustrative examples for the case {n=2} .  The theorem says that if you take the air out of a basketball, crumple it (no tearing), and flatten it out, then there are two points directly on top of each other which were antipodal before.  Another common example states that at every point in time, there must be two points on the earth which both have exactly the same temperature and barometric pressure (assuming, of course, that these two parameters vary continuously over the surface of the eath).

The n=1 case is completely elementary.  For the rest of the lecture, let’s use {N = (0,0,\ldots,1)} and {S=(0,0,\ldots,-1)} to denote the north and south poles (the dimension will be obvious from context).  To prove the n=1 case, simply trace out the path in {\mathbb R} starting at {f(N)} and going clockwise around {S^1} .  Simultaneously, trace out the path starting at {f(S)} and going counter-clockwise at the same speed.  It is easy to see that eventually these two paths have to collide:  At the point of collision, {f(x)=f(-x)} .

We will give the sketch of a proof for {n \geq 2.}   Let {g(x)=f(x)-f(-x)} , and note that our goal is to prove that {g(x)=0} for some {x \in S^n} .  Note that {g} is antipodal in the sense that {g(x)=-g(-x)} for all {x \in S^n} .  Now, if {g(x) \neq 0} for every {x} , then by compactness there exists an {\epsilon > 0} such that {\|g(x)\| > \epsilon} for all {x \in S^n} .  Because of this, we may approximate {g} arbitrarily well by a smooth map, and prove that the approximation has a 0.  So we will assume that {g} itself is smooth.

Now, define {h : S^n \to \mathbb R^n} by {h(x_1, \ldots, x_{n+1}) = (x_1, \ldots, x_n)} , i.e. the north/south projection map.  Let {X = S^n \times [0,1]} be a hollow cylinder, and let {F : X \to \mathbb R^n} be given by {F(x,t)=t g(x) + (1-t)h(x)} so that {F} linearly interpolates between {g} and {h} .

Also, let’s define an antipodality on {X} by {\nu(x,t) = (-x,t)} .  Note that {F} is antipodal with respect to {\nu} , i.e. {F(\nu(x,t))=-F(x,t)} , because both {g} and {h} are antipodal.  For the sake of contradiction, assume that {g(x) \neq 0} for all {x \in S^n} .

Now let’s consider the structure of the zero set {Z = F^{-1}(0)} .  Certainly {(N,0), (S,0) \in Z} since {h(N)=h(S)=0} , and these are h’s only zeros.  Here comes the sketchy part:  Since {g} is smooth, {F} is also smooth, and thus locally {F} can be approximated by an affine mapping {F_{\mathrm{loc}} : \mathbb R^{n+1} \to \mathbb R^n} .  It follows that if {F_{\mathrm{loc}}^{-1}(0)} is not empty, then it should be a subspace of dimension at least one.  By an arbitrarily small perturbation of the initial {g} , ensuring that {F} is sufficiently generic, we can ensure that {F_{\mathrm{loc}}^{-1}(0)} is either empty or a subspace of dimension one.  Thus locally, {F^{-1}(0)} should look like a two-sided curve, except at the boundaries {t=0} and {t=1} , where {F^{-1}(0)} (if non-empty) would look like a one-sided curve.  But, for instance, {F^{-1}(0)} cannot contain any Y-shaped branches.

It follows that {Z} is a union of closed cycles and paths whose endpoints must lie at the boundaries {t=0} and {t=1} .  ({Z} is represented by red lines in the cylinder above.)  But since there are only two zeros on the {t=0} sphere, and none on the {t=1} sphere, {Z} must contain a path {\gamma} from {(N,0)} to {(S,0).}   Since {F} is antipodal with respect to {\nu} , {\gamma} must also satisfy this symmetry, making it impossible for the segment initiating at N to ever meet up with the segment initiating at S.  Thus we arrive at a contradiction, implying that {g} must take the value 0.

Notice that the only important property we used about {h} (other than its smoothness) is that is has a number of zeros which is twice an odd number.  If {h} had, e.g. four zeros, then we could have two {\nu} -symmetric paths emanating from and returning to the bottom.  But if {h} has six zeros, then we would again reach a contradiction.

Continue reading

Lecture 5: Uniformizing graphs, multi-flows, and eigenvalues

In the previous lecture, we gave an upper bound on the second eigenvalue of the Laplacian of (bounded degree) planar graphs in order to analyze a simple spectral partitioning algorithm.  A natural question is whether these bounds extend to more general families of graphs.  Well-known generalizations of planar graphs are those which can be embedded on a surface of fixed genus, and, more generally, families of graphs that arise by forbidding minors.  In fact, Spielman and Teng conjectured that for any graph excluding K_h as a minor, one should have \lambda_2 \lesssim \mathrm{poly}(h) d_{\max}/n.   Of course planar graphs have genus 0, and by Wagner’s theorem, are precisely the graphs which exclude K_5 and K_{3,3} as minors.  In this lecture, we will follow an intrinsic approach of Biswal, myself, and Rao which, in particular, is able to resolve the conjecture of Spielman and Teng.  First, we see why even pushing the conformal approach to bounded genus graphs is difficult.

Bounded genus graphs

For graphs of bounded genus, there is hope to use an approach based on conformal mappings.  In 1980, Yang and Yau proved that

\displaystyle \lambda_2(M) \lesssim \frac{g+1}{\mathrm{vol}(M)}

for any compact Riemannian surface M of genus g.  (Note that for the Laplace-Beltrami operator, one usually writes \lambda_1 as the first non-zero eigenvalue, rather than \lambda_2.)  In analog with Hersch’s proof of the genus 0 case, they use Riemann-Roch to obtain a degree-(g+1) conformal mapping to the Riemann sphere, then try to pull back a second eigenfunction.  A factor of the degree is lost in the Rayleigh quotient (hence the g+1 factor in the preceding bound), and Hersch’s Möbius trick is still required.

An analogous proof for graphs G of bounded genus would proceed by constructing a circle packing of G on the sphere S^2, but instead of the circles having disjoint interiors, we would be assured that every point of S^2 is contained in at most g circles.  Unfortunately, such a result is impossible (this has to do with the handling of branch points in the discrete setting).  Kelner has to take a different approach in his proof that \lambda_2(G) \leq d_{\max}^{O(1)} (g+1)/n for graphs G of genus at most g.

He starts with a circle packing of G on a compact surface \mathbb S_0 of genus g (whose existence follows from results of Beardon and Stephenon and He and Schramm).  Then Kelner randomly subdivides G repeatedly, and these subdivisions give progressively better approximations to some sequence of surfaces \{\mathbb S_i\}.  Once the approximation is of high enough quality, one applies Riemann-Roch to \mathbb S_k, and infers something about a subdivision of G.  The final element is to track how the second eigenvalue of G changes (in expectation) under random subdivision.

Needless to say, this approach is already quite delicate, and for graphs that can’t be equipped with some kind of conformal structure, we seem to have reached a dead end.  In this lecture, we’ll see how to use intrinsic deformations of the geometry of G in order to bound its eigenvalues.  Eventually, this will reduce to the study of certain kinds of multi-commodity flows.

Metrics on graphs

Let G=(V,E) be an arbitrary n-vertex graph with maximum degree d_{\max}.  Recall that we can write

\displaystyle \lambda_2 = \min_{f \neq 0 : \sum_{x \in V} f(x)=0} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x \in V} f(x)^2}.

where f : V \to \mathbb R.  (Also recall that we can replace \mathbb R by any Hilbert space, and the same formula holds.)  The first step is to prepare this equality for “non-linearization” by getting rid of the linear condition \sum_{x \in V} f(x)=0 and the sum \sum_{x \in V} f(x)^2.  (This is a popular sort of passage in the non-linear geometry of Banach spaces, which also plays a rather important role in applications to the theoretical CS.)  The goal is to get only terms that look like |f(x)-f(y)|.  Fortunately, there is a well-known way to do this:

\displaystyle \lambda_2 = 2 n \cdot \min_{f : V \to \mathbb R} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x,y \in V} |f(x)-f(y)|^2},

which follows easily from the equality \sum_{x,y \in V} |f(x)-f(y)|^2 = 2n \sum_{x \in V} f(x)^2 when \sum_{x \in V} f(x)=0.

Thus if we want to bound \lambda_2 = O(1/n), we need to find an f : V \to \mathbb R for which the latter ratio (without the 2n) is O(1/n^2).  Now, for someone who works a lot with linear programming relaxations, it’s very natural to consider a “relaxation”

\displaystyle \gamma(G) = \min_{d} \frac{\sum_{xy \in E} d(x,y)^2}{\sum_{x,y \in V} d(x,y)^2},

where the minimization is over all pseudo-metrics d, i.e. symmetric non-negative functions d : V \times V \to \mathbb R which satisfy the triangle inequality, but might have d(x,y)=0 even for x \neq y.  Certainly \gamma(G) \leq \lambda_2/2n, but Bourgain’s embedding theorem (which states that every n-point metric space embeds into a Hilbert space with distortion at most O(\log n)) also assures us that \lambda_2(G) \leq O(n \log^2 n) \gamma(G).  Since we are trying to show that \gamma(G) = O(1/n^2), this O(\log^2 n) term is morally negligible.  One can see the paper for a more advanced embedding argument that doesn’t lose this factor, but for now we concentrate on proving that \gamma(G) = O(1/n^2).  The embedding theorems allow us to concentrate on finding an intrinsic metric on the graph with small “Rayleigh quotient,” without having to worry about an eventual geometric representation.

As a brief preview… we are going to find a good metric d by taking a certain kind of all-pairs multi-commodity flow at optimality, and weighting the edges by their congestion in the optimal flow.  Thus as the flow spreads out on the graph, it has the effect of “uniformizing” its geometry.

Discrete Riemannian metrics, convexification, and duality

Let’s now assume that G is planar.  We want to show that \gamma(G) = O(d_{\max}/n^2).  First, let’s restrict ourselves to vertex weighted metrics on G.  Given any non-negative weight function \omega : V \to \mathbb R, we can define the length of a path P in G by summing the weights of vertices along it:  \mathsf{len}_{\omega}(P) = \sum_{x \in P} \omega(x).  Then we can define a vertex-weighted shortest-path pseudo-metric on G in the natural way

\displaystyle \mathsf{dist}_{\omega}(x,y) = \min \left\{ \mathsf{len}_{\omega}(P) : P \in \mathcal P_{xy}\right\},

where \mathcal P_{uv} is the set of all u-v paths in G.  We also have the nice relationship

\displaystyle \sum_{xy \in E} \mathsf{dist}_{\omega}(x,y)^2 \leq 2 d_{\max} \sum_{x \in V} \omega(x)^2,\qquad(1)

since \mathsf{dist}_{\omega}(x,y) \leq [\omega(x)+\omega(y)]^2.

So if we define

\displaystyle \Lambda_0(\omega) = \frac{\displaystyle \sum_{x \in V} \omega(x)^2}{\displaystyle \sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2}

then by (1), we have \gamma(G) \leq 2 d_{\max} \min_{\omega} \Lambda_0(\omega).

Examples. Let’s try to exhibit weights \omega for two well-known examples:  the grid, and the complete binary tree.

For the \sqrt{n} \times \sqrt{n} grid, we can simply take \omega(x)=1 for all x \in V.  Clearly \sum_{x \in V} \omega(x)^2 = n.  On the other hand, a random pair of points in the grid is \Omega(\sqrt{n}) apart, hence \sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2 \approx n^2 \cdot (\sqrt{n})^2 = n^3.  It follows that \Lambda_0(\omega) = O(1/n^2), as desired.

For the complete binary tree with root r, we can simply put \omega(r)=1 and \omega(x)=0 for x \neq r.  (Astute readers will guess the geometrically decreasing weights are actually the optimal choice.)  In this case, \sum_{x \in V} \omega(x)^2 = 1, while all the pairs x,y on opposite sides of the root have \mathsf{dist}_{\omega}(x,y)=1.  It again follows that \Lambda_0(\omega) = O(1/n^2).  Our goal is to provide such a weight \omega for any planar graph.

Continue reading

Lecture 4: Conformal mappings, circle packings, and spectral geometry

In Lecture 2, we used spectral partitioning to rule out the existence of a strong parallel repetition theorem for unique games.  In practice, spectral methods are a very successful heuristic for graph partitioning, and in the present lecture we’ll see how to analyze these partitioning algorithms for some common families of graphs.

Balanced separators, eigenvalues, and Cheeger’s inequality

Lipton and Tarjan proved that every planar graph has a negligibly small set of nodes whose remval splits the graph into two roughly equal pieces.  More specifically, every n-node planar graph can be partitioned into three disjoint sets A,B,S such that there are no edges from A to B, the separator S has at most O(\sqrt{n}) nodes, and |A|,|B| \geq n/3.  This allows one to do all sorts of things, e.g. a simple divide-and-conquer algorithm gives a linear time (1+\epsilon)-approximation for the maximum independent set problem in such graphs, for any \epsilon > 0.

So there is a natural question of how well spectral methods do, for example, on planar graphs.  Spielman and Teng showed that for bounded-degree planar graphs, a simple recursive spectral algorithm recovers a partition V=A \cup B of the vertex set so that |E(A,B)| = O(\sqrt{n}).  In other words, for bounded-degree planar graphs, spectral methods recover the Lipton-Tarjan separator theorem!  This is proved by combining Cheeger’s inequality with their main theorem.

Theorem [Spielman-Teng]: Every n-node planar graph with maximum degree d_{\max} has \displaystyle \lambda_2(G) = O\left(\frac{d_{\max}}{n}\right), where \lambda_2(G) is the second eigenvalue of the combinatorial Laplacian on G.

Recall that we introduced the combinatorial Laplacian in Lecture 2.  If G=(V,E) is an arbitrary finite graph, in this lecture it will make more sense to think about the Laplacian \Delta as an operator on functions f : V \to \mathbb R given by

\displaystyle \Delta f(x) = \mathrm{deg}(x) f(x) - \sum_{y : xy \in E} f(y).

If we define the standard inner product \langle f,g\rangle = \sum_{x \in V} f(x)g(x), then one can easily check that for any such f, we have \langle f, \Delta f\rangle = \sum_{xy \in E} |f(x)-f(y)|^2.  In particular, this implies that \Delta is a positive semi-definite operator.  If we denote its eigenvalues by \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n, then it is also easy to check that \lambda_1 = 0, with corresponding eigenfunction f(x)=1 for every x\in V.

Thus by standard variational principles, we have

\displaystyle \lambda_2 = \min_{f \neq 0 : \sum_{x \in V} f(x)=0} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x \in V} f(x)^2}.

Let us also define the Cheeger constant h_G.  For an arbitrary subset S \subseteq V, let

\displaystyle h(S) = \frac{|E(S, \bar S)|}{\min(|S|,|\bar S|)},

note that this definition varies from the h we defined in Lecture 2, because we will be discussing eigenfunctions without boundary conditions.  Now one defines h_G = \min_{S \subseteq V} h(S).

Finally, we have the version of Cheeger’s inequality (proved by Alon and Milman in the discrete setting) for graphs without boundary.

Cheeger’s inequality: If G=(V,E) is any graph with maximum degree d_{\max}, then

\displaystyle \lambda_2(G) \geq \frac{h(G)^2}{2d_{\max}}.

This follows fairly easy from the Dirichlet version of Cheeger’s inequality presented in Lecture 2.  Here’s a sketch:  Let f : V \to \mathbb R satisfy \Delta f = \lambda_2 f, and suppose, without loss of generality, that V_+ = \{ x : f(x) > 0 \} has |V_+| \geq n/2.  Define f_+(x)=f(x) for f(x) > 0 and f_+(x)=0 otherwise.  Then f_+|_B = 0 for B = V \setminus V_+, so we can plug f_+ into the Dirichlet version of Cheeger’s inequality with boundary conditions on B.  For the full analysis, see this note which essentially follows this approach.  By examining the proof, note that one can find a subset S \subseteq V with h(S) \leq \sqrt{2 d_{\max} \lambda_2} by a simple “sweep” algorithm:  Arrange the vertices V = \{v_1, v_2, \ldots, v_n\} so that f(v_1) \leq f(v_2) \leq \cdots \leq f(v_n), and output the best of the n-1 cuts \{v_1, \ldots, v_i\}, \{v_{i+1}, \ldots, v_n\}.

So using the eigenvalue theorem of Spielman and Teng, along with Cheeger’s inequality, we can find a set S \subseteq V with h(S) \lesssim \sqrt{d_{\max}/n}.  While this cut has the right Cheeger constant, it is not necessarily balanced (i.e. \min(S, \bar S) could be very small).  But one can apply this algorithm recursively, perhaps continually cutting small chunks off of the graph until a balanced cut is collected.  Refer to the Spielman-Teng paper for details.  A great open question is how one might use spectral information about G to recover a balanced cut immediately, without the need for recursion.

Conformal mappings and circle packings

Now we focus on proving the bound \lambda_2(G) \lesssim d_{\max}/n for any planar graph G.  A natural analog is to look at what happens for the Laplace-Beltrami operator for a Riemannian metric on the 2-sphere.  In fact, Hersch considered this problem almost 40 years ago and proved that \lambda_2(M) \lesssim 1/\mathrm{vol}(M), for any such Riemannian manifold M.  His approach was to first use the uniformization theorem to get a conformal mapping from M onto S^2, and then try to pull-back the standard second eigenfunctions on S^2 \subseteq \mathbb R^3 (which are just the three coordinate projections).  Since the Dirichlet energy is conformally invariant in dimension 2, this almost works, except that the pulled-back map might not be orthogonal to the constant function.  To fix this, he has to post-process the initial conformal mapping with an appropriate Möbius transformation.

Unaware of Hersch’s work, Spielman and Teng derived eigenvalue bounds for planar graphs using the discrete analog of this approach:  Circle packings replace conformal mappings, and one still has to show the existence of an appropriate post-processing Möbius transformation.

Continue reading

Lecture 2: Spectral partitioning and near-optimal foams

In the last lecture, we reduced the problem of cheating in \mathcal G_m^{\otimes k} (the k-times repeated m-cycle game) to finding a small set of edges \mathcal E in (\mathbb Z_m^k)_\infty whose removal eliminates all topologically non-trivial cycles.  Such a set \mathcal E is called a spine. To get some intuition about how many edges such a spine should contain, let’s instead look at a continuous variant of the problem.

Spines, Foams, and Isoperimetry

Consider again the k-dimensional torus \mathcal T^k = \mathbb R^k/\mathbb Z^k, which one can think of as \lbrack 0,1)^k with opposite sides identified.  Say that a nice set (e.g. a compact, C^\infty surface) \mathcal E \subseteq \mathcal T^k is a spine if it intersects every non-contractible loop in \mathcal T^k.  This is the continuous analog of a spine in (\mathbb Z_m^k)_\infty.  We will try to find such a spine \mathcal E with surface area, i.e. \mathrm{Vol}_{k-1}(\mathcal E), as small as possible.

Let’s consider some easy bounds.  First, it is clear that the set

\displaystyle \mathcal E = \left\{(x_1, \ldots, x_k) \in [0,1)^k : \exists i \in \{1,2,\ldots,k\}, x_i = 0\right\}

is a spine with \mathrm{Vol}_{k-1}(\mathcal E) = k.  (A moment’s thought shows that this is “equivalent” to the provers playing independent games in each coordinate of \mathcal G_m^{\otimes k}.)

To get a good lower bound, it helps to relate spines to foams which tile \mathbb R^k according to \mathbb Z^k, as follows.  Take two potential spines.

To determine which curve is actually a spine, we can repeatedly tile them side-by-side.

The first tiling contains the blue bi-infinite curve, which obviously gives a non-trivial cycle in \mathcal T^k, while the second yields a tiling of \mathbb R^k by bodies of volume 1.  It is easy to deduce the following claim.

Claim: A surface \mathcal E \subseteq \mathcal T^k is a spine if and only if it induces a tiling of the plane by bodies of volume 1 which is invariant under shifts by \mathbb Z^k.

By the isoperimetric inequality in \mathbb R^k, this immediately yields the bound

\displaystyle \mathrm{Vol}_{k-1}(\mathcal E) \geq \mathrm{Vol}_{k-1}(S^{k-1}) \approx \sqrt{k},

where S^{k-1} is the unit (k-1)-dimensional sphere.

So the the surface area of an optimal spine lies somewhere between k and \sqrt{k}.  On the one hand, cubes tile very nicely but have large surface area.  On the other hand, we have sphere-like objects which have small surface area, but don’t seem (at least intuitively) to tile very well at all.  As first evidence that this isn’t quite right, note that it is known how to cover \mathbb R^k by disjoint bodies of volume at most 1 so that the surface area/volume ratio grows like \sqrt{k}.  See Lemma 3.16 in this paper, which is based on Chekuri, et. al.  It’s just that these covers are not invariant under \mathbb Z^k shifts.

Before we reveal the answer, let’s see what consequences the corresponding discrete bounds would have for parallel repetition of the m-cycle game.  If the “cube bound” were tight, we would have \mathsf{val}(\mathcal G_m^{\otimes k}) \approx 1 - \frac{k}{m}, which doesn’t rule out a strong parallel repetition theorem (\alpha^*=1 in the previous lecture).  If the “sphere bound” were tight, we would have \mathsf{val}(\mathcal G_m^{\otimes k}) \approx 1 - \frac{\sqrt{k}}{m}, which shows that \alpha^* \geq 2.   In the latter case, the approach to proving equivalence of the UGC and MAX-CUT conjectures doesn’t even get off the ground.

As the astute reader might have guessed, recently Ran Raz proved that \mathsf{val}(\mathcal G_m^{\otimes k}) \geq 1 - C\frac{\sqrt{k}}{m} for some constant C > 0, showing that a strong parallel repetition theorem—even for unique games—is impossible.   Subsequently, Kindler, O’Donnell, Rao, and Wigderson showed that there exists a spine \mathcal E \subseteq \mathcal T^k with \mathrm{Vol}_{k-1}(\mathcal E) \approx \sqrt{k}.  While it is not difficult to show that the continuous result implies Raz’s discrete result, we will take a direct approach found recently by Alon and Klartag.

Continue reading

Lecture 1: Cheating with foams

This the first lecture for CSE 599S:  Analytical and geometric methods in the theory of computation.  Today we’ll consider the gap amplification problem for 2-prover games, and see how it’s intimately related to some high-dimensional isoperimetric problems about foams.  In the next lecture, we’ll use spectral techniques to find approximately optimal foams (which will then let us cheat at repeated games).

The PCP Theorem, 2-prover games, and parallel repetition

For a 3-CNF formula \varphi, let \mathsf{sat}(\varphi) denote the maximum fraction of clauses in \varphi which are simultaneously satisfiable. For instance, \varphi is satisfiable if and only if \mathsf{sat}(\varphi)=1. One equivalent formulation of the PCP Theorem is that the following problem is NP-complete:

Formulation 1.

Given a 3-CNF formula \varphi, answer YES if \mathsf{sat}(\varphi)=1 and NO if \mathsf{sat}(\varphi) \leq 0.9 (any answer is acceptable if neither condition holds).

We can restate this result in the language of 2-prover games. A 2-prover game \mathcal G consists of four finite sets Q,Q',A,A', where Q and Q' are sets of questions, while A and A' are sets of answers to the questions in Q and Q', respectively. There is also a verifier V : Q \times Q' \times A \times A' \to \{0,1\} which checks the validity of answers. For a pair of questions (q,q') and answers (a,a'), the verifier is satisfied if and only if V(q,q',a,a')=1. The final component of \mathcal G is a probability distribution \mu on Q \times Q'.

Now a strategy for the game consists of two provers P : Q \to A and P' : Q' \to A' who map questions to answers. The score of the two provers (P,P') is precisely

\displaystyle\mathsf{val}_{P,P'}(\mathcal G) = \Pr \left[V(q, q', P(q), P(q'))=1\vphantom{\bigoplus}\right],

where (q,q') is drawn from \mu. This is just the probability that the verifier is happy with the answers provided by the two provers. The value of the game is now defined as

\displaystyle\mathsf{val}(\mathcal G) = \max_{P,P'} \mathsf{val}_{P,P'}(\mathcal G),

i.e. the best-possible score achievable by any two provers.

Now we can again restate the PCP Theorem as saying that the following problem is NP-complete:

Formulation 2.

Given a 2-prover game with \mathcal G with |A|,|A'|=O(1), answer YES if \mathsf{val}(\mathcal G)=1 and answer NO if \mathsf{val}(\mathcal G) \leq 0.99.

To see that Formulation 1 implies Formulation 2, consider, for any 3-CNF formula \varphi, the game \mathcal G_{\varphi} defined as follows. Q is the set of clauses in \varphi, Q' is the set of variables in \varphi, while A = \{ TTT, TTF, TFT, FTT, TFF, FTF, FFT, FFF \} and A' = \{T, F\}. Here A represents the set of eight possible truth assignments to a three-variable clause, and A' represents the set of possible truth assignments to a variable.

The distribution \mu is defined as follows: Choose first a uniformly random clause C \in Q, and then uniformly at random one of the three variables x \in Q' which appears in C. An answer (a_C, a_x) \in A \times A' is valid if the assignment a_C makes C true, and if a_x and a_C are consistent in the sense that they give the same truth value to the variable x. The following statement is an easy exercise:

For every 3-CNF formula \varphi, we have \mathsf{val}(\mathcal G_{\varphi}) = \frac23 + \frac13 \mathsf{sat}(\varphi).

The best strategy is to choose an assignment \mathcal A to the variables in \varphiP' plays according to \mathcal A, while P plays according to \mathcal A unless he is about to answer with an assignment that doesn’t satisfy the clause C.   At that point, he flips one of the literals to make his assignment satisfying (in this case, the chance of catching P cheating is only 1/3, the probability that P' is sent the variable that P flipped).

This completes our argument that Formulation 1 implies Formulation 2.

Parallel repetition
A very natural question is whether the constant 0.9 in Formulation 2 can be replaced by 0.001 (or an arbitrarily small constant). A natural way of gap amplification is by “parallel repetition” of a given game. Starting with a game \mathcal G = \langle Q,Q',A,A',V,\mu \rangle, we can consider the game \mathcal G^{\otimes k} = \langle Q^k, Q'^k, A^k, A'^k, V^{\otimes k}, \mu^{\otimes k} \rangle, where \mu^{\otimes k} is just the product distribution on (Q \times Q')^k \cong Q^k \times Q'^k. Here, we choose k pairs of questions (q_1, q'_1), \ldots, (q_k, q'_k) i.i.d. from \mu and the two provers then respond with answers (a_1, \ldots, a_k) \in A^k and (a'_1, \ldots, a'_k) \in A'^k. The verifier V^{\otimes k} is satisfied if and only if V(q_i, q'_i, a_i, a'_i)=1 for every i = 1, 2, \ldots, k.

Clearly \mathsf{val}(\mathcal G^{\otimes k}) \geq \mathsf{val}(G)^k because given a strategy (P,P') for \mathcal G, we can play the same strategy in every coordinate, and then our probability to win is just the probability that we simultaneously win k independent games. But is there a more clever strategy that can do better? Famously, early papers in this arena assumed it was obvious that \mathsf{val}(\mathcal G^{\otimes k}) = \mathsf{val}(\mathcal G)^k.

In fact, there are easy examples of games \mathcal G where \mathsf{val}(\mathcal G^{\otimes 2}) = \mathsf{val}(\mathcal G) = \frac12.  (Exercise: Show that this is true for the following game devised by Uri Feige. The verifier chooses two independent random bits b, b' \in \{0,1\}, and sends b to P and b' to P'. The answers of the two provers are from the set \{1,2\} \times \{0,1\}. The verifier accepts if both provers answer (1,b) or both provers answer (2,b').)

Nevertheless, in a seminal work, Ran Raz proved the Parallel Repetition Theorem, which states that the value of the repeated game does, in fact, drop exponentially.

Theorem 1.1: For every 2-prover game \mathcal G, there exists a constant c = O(\log(|A|+|A'|)) such that if \mathsf{val}(\mathcal G) \leq 1-\epsilon, then for every k \in \mathbb N,

\displaystyle\mathsf{val}(\mathcal G^{\otimes k}) \leq (1-\epsilon^{3})^{k/c}.

The exponent 3 above is actually due to an improvement of Holenstein (Raz’s original paper can be mined for an exponent of 32).

Continue reading