# Lecture P1. From Integrality Gaps to Dictatorship Tests

Here are Prasad Raghavendra’s notes on one of two guest lectures he gave for CSE 599S. Prasad will be a faculty member at Georgia Tech after he finishes his postdoc at MSR New England.

In yet another excursion into applications of discrete harmonic analysis, we will now see an application of the invariance principle to hardness of approximation. We will complement this discussion with the proof of the invariance principle in the next lecture.

1. Invariance Principle

In its simplest form, the central limit theorem asserts the following: “As ${n}$ increases, the sum of ${n}$ independent Bernoulli random variables (${\pm 1}$ random variables) has approximately the same distribution as the sum of ${n}$ independent normal (Gaussian with mean ${0}$ and variance ${1}$) random variables.”

Alternatively, as ${n}$ increases, the value of the polynomial ${F(\mathbf x) = \frac{1}{\sqrt{n}} (x^{(1)} + x^{(2)} + \ldots x^{(n)})}$ has approximately the same distribution whether the random variables ${x^{(i)}}$ are i.i.d. Bernoulli random variables or i.i.d. normal random variables. More generally, the distribution of ${F(\mathbf x)}$ is approximately the same as long as the random variables ${x^{(i)}}$ are independent with mean ${0}$, variance ${1}$ and satisfy certain mild regularity assumptions.
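As a quick numerical sanity check (a sketch, not part of the original notes), one can compare the distribution of ${F(\mathbf x)}$ under the two input distributions by sampling; both empirical CDFs at a fixed point should agree with the Gaussian value ${\Phi(t)}$:

```python
import random
import math

def F(xs):
    # normalized sum: F(x) = (x^(1) + ... + x^(n)) / sqrt(n)
    return sum(xs) / math.sqrt(len(xs))

def sample_cdf_at(dist, n, trials, t):
    # empirical estimate of Pr[F(x) <= t] for i.i.d. inputs drawn from `dist`
    return sum(F([dist() for _ in range(n)]) <= t for _ in range(trials)) / trials

random.seed(0)
n, trials, t = 100, 3000, 0.5
bern = lambda: random.choice([-1.0, 1.0])   # Bernoulli (+-1) inputs
gauss = lambda: random.gauss(0.0, 1.0)      # standard Gaussian inputs

p_bern = sample_cdf_at(bern, n, trials, t)
p_gauss = sample_cdf_at(gauss, n, trials, t)
# both estimates should be close to Phi(0.5) ~ 0.6915
print(p_bern, p_gauss)
```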

A phenomenon of this nature, where the distribution of a function of random variables depends solely on a small number of their moments, is referred to as invariance.

A natural approach to generalizing the above central limit theorem is to replace the “sum” (${\frac{1}{\sqrt{n}} (x^{(1)} + x^{(2)} + \ldots x^{(n)})}$) by other multivariate polynomials. As we will see in the next lecture, the invariance principle indeed holds for low degree multilinear polynomials that are not influenced heavily by any single coordinate. Formally, define the influence of a coordinate on a polynomial as follows:

Definition 1 For a multilinear polynomial ${F(\mathbf{x}) = \sum_{\sigma} \hat{F}_{\sigma} \prod_{i \in \sigma} x^{(i)}}$, where ${\sigma}$ ranges over subsets of ${[R]}$, define the influence of the ${i^{th}}$ coordinate as follows:

${ \mathrm{Inf}_{i}(F) = \mathop{\mathbb E}_{x} [\mathrm{Var}_{x^{(i)}}[F]] = \sum_{\sigma \ni i} \hat{F}^2_\sigma}$

Here ${\mathrm{Var}_{x^{(i)}}[F]}$ denotes the variance of ${F(\mathbf{x})}$ over the random choice of ${x^{(i)}}$.
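In code, the influence of a coordinate is just a sum of squared Fourier coefficients over the sets containing it (an illustrative sketch; the particular polynomial below is made up):

```python
# Represent a multilinear polynomial by its Fourier coefficients:
# a map from frozenset(sigma) to the coefficient \hat{F}_sigma.
# Hypothetical example: F(x) = 0.5*x1 + 0.5*x2 + 0.7*x1*x2*x3
coeffs = {
    frozenset({1}): 0.5,
    frozenset({2}): 0.5,
    frozenset({1, 2, 3}): 0.7,
}

def influence(coeffs, i):
    # Inf_i(F) = sum of \hat{F}_sigma^2 over all sigma containing i
    return sum(c * c for sigma, c in coeffs.items() if i in sigma)

print(influence(coeffs, 1))  # 0.5^2 + 0.7^2 ~ 0.74
print(influence(coeffs, 3))  # 0.7^2 ~ 0.49
```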

The invariance principle for low degree polynomials was first shown by Rotar in 1979. More recently, invariance principles for low degree polynomials were shown in different settings in the work of Mossel-O’Donnell-Oleszkiewicz and Chatterjee. The former of the two works also proved the Majority is Stablest conjecture, and has been influential in introducing the powerful tool of invariance to hardness of approximation.

Here we state a special case of the invariance principle tailored to the application at hand.  To this end, let us first define a rounding function ${f_{[-1,1]}}$ as follows:

$\displaystyle f_{[-1,1]}(x) = \begin{cases} -1 &\text{ if } x < -1 \\ x &\text{ if } -1\leqslant x\leqslant 1 \\ 1 &\text{ if } x > 1 \end{cases}$

Theorem 2 (Invariance Principle [Mossel-O’Donnell-Oleszkiewicz]) Let ${\mathbf{z} = \{z_1, z_2\}}$ and ${\mathbf{G} = \{g_1,g_2\}}$ be sets of Bernoulli and Gaussian random variables respectively. Furthermore, let

$\displaystyle \mathop{\mathbb E}[z_i] = \mathop{\mathbb E}[g_i] = 0 \qquad \qquad \mathop{\mathbb E}[z_i^2] = \mathop{\mathbb E}[g_i^2] = 1 \qquad \qquad \forall i \in [2]$

and ${\mathop{\mathbb E}[z_1 z_2] = \mathop{\mathbb E}[g_1 g_2]}$. Let ${\mathbf{z}^R, \mathbf{G}^R}$ denote ${R}$ independent copies of the random variables ${\mathbf{z}}$ and ${\mathbf{G}}$.

Let ${F}$ be a multilinear polynomial given by ${F(\mathbf{x}) = \sum_{\sigma} \hat{F}_{\sigma} \prod_{i \in \sigma} x^{(i)}}$, and let ${H(\mathbf{x})=T_{1-\epsilon} F(\mathbf{x}) = \sum_{\sigma}(1-\epsilon)^{|\sigma|} \hat{F}_{\sigma} \prod_{i\in \sigma} x^{(i)}}$. If ${\mathrm{Inf}_\ell(H) \leqslant \tau}$ for all ${\ell \in [R]}$, then the following statements hold:

1. For every function ${\Psi : \mathbb{R}^2 \rightarrow \mathbb{R}}$ which is thrice differentiable with all its partial derivatives up to order ${3}$ bounded uniformly by ${C_0}$,

$\displaystyle \Big|\mathop{\mathbb E}\Big[\Psi(H(\mathbf{z}_1^{R}), H(\mathbf{z}_2^{R}))\Big] - \mathop{\mathbb E}\Big[\Psi(H(\mathbf{g}_1^R), H(\mathbf{g}_2^R))\Big] \Big| \leqslant \tau^{O(\epsilon)}$

2. Define the function ${\xi : {\mathbb R}^2 \rightarrow {\mathbb R}}$ as ${\xi(\mathbf{x}) = \sum_{i\in [2]} (x_i - f_{[-1,1]}(x_i))^2}$. Then, we have ${ \Big|\mathop{\mathbb E}[\xi(H(\mathbf{z}_1^{R}), H(\mathbf{z}_2^{R}))] - \mathop{\mathbb E}[\xi(H(\mathbf{g}_1^{R}),H(\mathbf{g}_2^{R}))] \Big| \leqslant \tau^{O(\epsilon)} }$
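The smoothing operator ${T_{1-\epsilon}}$ appearing in the theorem is easy to apply in the Fourier representation: it damps each coefficient by ${(1-\epsilon)^{|\sigma|}}$, which can only shrink influences. A small sketch with a made-up polynomial:

```python
# Hypothetical Fourier coefficients of a multilinear polynomial F.
coeffs = {
    frozenset({1}): 0.6,
    frozenset({1, 2}): 0.5,
    frozenset({2, 3, 4}): 0.4,
}

def smooth(coeffs, eps):
    # T_{1-eps} F has Fourier coefficients (1-eps)^{|sigma|} * \hat{F}_sigma
    return {s: ((1 - eps) ** len(s)) * c for s, c in coeffs.items()}

def influence(coeffs, i):
    return sum(c * c for s, c in coeffs.items() if i in s)

H = smooth(coeffs, 0.1)
for i in (1, 2, 3, 4):
    # smoothing never increases any influence
    assert influence(H, i) <= influence(coeffs, i)
print(influence(coeffs, 1), influence(H, 1))
```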

2. Dictatorship Testing

A boolean function ${\mathcal{F}(x_1,x_2,\ldots,x_n)}$ on ${n}$ bits is a dictator or a long code if ${\mathcal{F}(x) = x_i}$ for some ${i}$. Given the truth table of a function ${\mathcal{F}}$, a dictatorship test is a randomized procedure that queries a few locations (say ${2}$) in the truth table of ${\mathcal{F}}$, and tests a predicate ${P}$ on the values it queried. If the queried values satisfy the predicate ${P}$, the test outputs ACCEPT else it outputs REJECT. The goal of the test is to determine if ${\mathcal{F}}$ is a dictator or is far from being a dictator. The main parameters of interest in a dictatorship test are:

• Completeness${(c)}$ Every dictator function ${\mathcal{F}(x) = x_i}$ is accepted with probability at least ${c}$.
• Soundness${(s)}$ Any function ${\mathcal{F}}$ which is far from every dictator is accepted with probability at most ${s}$.
• Number of queries made, and the predicate ${P}$ used by the test.

2.1. Motivation

The motivation for the problem of dictatorship testing arises from hardness of approximation and PCP constructions. To show that an optimization problem ${\Lambda}$ is ${\mathbf{NP}}$-hard to approximate, one constructs a reduction from a well-known ${\mathbf{NP}}$-hard problem such as Label Cover to ${\Lambda}$. Given an instance ${\Im}$ of the Label Cover problem, a hardness reduction produces an instance ${\Im'}$ of the problem ${\Lambda}$. The instance ${\Im'}$ has a large optimum value if and only if ${\Im}$ has a high optimum. Dictatorship tests serve as “gadgets” that encode solutions to the Label Cover, as solutions to the problem ${\Lambda}$. In fact, constructing an appropriate dictatorship test almost always translates into a corresponding hardness result based on the Unique Games Conjecture.

Dictatorship tests, or long code tests as they are also referred to, were originally conceived purely from the insight of error correcting codes. Let us suppose we are to encode a message that could take one of ${R}$ different values ${\{m_1,\ldots,m_{R}\}}$. The long code encoding of the message ${m_\ell}$ is a bit string of length ${2^{R}}$, consisting of the truth table of the function ${\mathcal{F}(x_1,\ldots,x_R) = x_\ell}$. This encoding is maximally redundant in that any binary encoding with more than ${2^{R}}$ bits would contain two bits that are identical for all $R$ messages. Intuitively, the greater the redundancy in the encoding, the easier it is to perform the reduction.

While long code tests/dictatorship tests were originally conceived from a coding theoretic perspective, somewhat surprisingly these objects are intimately connected to semidefinite programs. This connection between semidefinite programs and dictatorship tests is the subject of today’s lecture. In particular, we will see a black-box reduction from SDP integrality gaps for Max Cut to dictatorship tests that can be used to show hardness of approximating Max Cut.

2.2. The case of Max Cut

The nature of the dictatorship test needed for a hardness reduction varies with the specific problem one is trying to show is hard. To keep things concrete and simple, we will restrict our attention to the Max Cut problem.

A dictatorship test ${\mathsf{DICT}}$ for the Max Cut problem consists of a graph on the set of vertices ${\{\pm 1\}^{R}}$. By convention, the graph ${\mathsf{DICT}}$ is a weighted graph where the edge weights form a probability distribution (sum up to ${1}$). We will write ${(\mathbf{z},\mathbf{z}') \in \mathsf{DICT}}$ to denote an edge sampled from the graph ${\mathsf{DICT}}$ (here ${\mathbf{z},\mathbf{z}' \in \{\pm 1\}^{R}}$).

A cut of the ${\mathsf{DICT}}$ graph can be thought of as a boolean function ${\mathcal{F} : \{\pm 1\}^R \rightarrow \{\pm 1\}}$. For a boolean function ${\mathcal{F} : \{\pm 1\}^R \rightarrow \{\pm 1\}}$, let ${\mathsf{DICT}(\mathcal{F})}$ denote the value of the cut. The value of a cut ${\mathcal{F}}$ is given by

$\displaystyle \mathsf{DICT}(\mathcal{F}) = \frac{1}{2}\mathop{\mathbb E}_{ (\mathbf{z}, \mathbf{z}') \in \mathsf{DICT}} \Big[ 1 - \mathcal{F}(\mathbf{z}) \mathcal{F}(\mathbf{z}') \Big]$

It is useful to define ${\mathsf{DICT}(\mathcal{F})}$ for non-boolean functions ${\mathcal{F}: \{\pm 1\}^R \rightarrow [-1,1]}$ that take values in the interval ${[-1,1]}$. To this end, we will interpret a value ${\mathcal{F}(\mathbf{z}) \in [-1,1]}$ as a random variable that takes ${\{\pm 1\}}$ values. Specifically, we think of a number ${a \in [-1,1]}$ as the following random variable

$\displaystyle a = \begin{cases} -1 &\text{ with probability } \frac{1-a}{2} \\ +1 &\text{ with probability } \frac{1+a}{2}\end{cases}$

With this interpretation, the natural definition of ${\mathsf{DICT}(\mathcal{F})}$ is as follows:

$\displaystyle \mathsf{DICT}(\mathcal{F}) = \frac{1}{2}\mathop{\mathbb E}_{ (\mathbf{z}, \mathbf{z}') \in \mathsf{DICT}} \Big[ 1 - \mathcal{F}(\mathbf{z}) \mathcal{F}(\mathbf{z}') \Big] {\,.}$

Indeed, the above expression is equal to the expected value of the cut obtained by randomly rounding the values of the function ${\mathcal{F} : \{\pm 1\}^{R} \rightarrow [-1,1]}$ to ${\{\pm 1\}}$ as described above.
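The lecture keeps the graph ${\mathsf{DICT}}$ abstract. For concreteness one can compute ${\mathsf{DICT}(\mathcal{F})}$ exactly for the standard noisy-hypercube Max Cut test (an assumption here, not something fixed above): sample ${\mathbf z}$ uniformly and let ${z'_i = -z_i}$ independently with probability ${1-\epsilon}$. Dictators then achieve value exactly ${1-\epsilon}$, while for instance majority does strictly worse:

```python
import itertools

R, eps = 5, 0.1

def dict_value(F):
    # DICT(F) = (1/2) E_{(z,z')} [1 - F(z) F(z')], computed exactly by
    # enumerating z uniform in {-1,1}^R and every noise pattern s, where
    # s_i = -1 (coordinate negated) with probability 1 - eps.
    total = 0.0
    for z in itertools.product((-1, 1), repeat=R):
        for s in itertools.product((-1, 1), repeat=R):
            w = 1.0
            for si in s:
                w *= (1 - eps) if si == -1 else eps
            zp = tuple(zi * si for zi, si in zip(z, s))
            total += w * 0.5 * (1 - F(z) * F(zp))
    return total / 2 ** R

dictator = lambda z: z[0]
majority = lambda z: 1 if sum(z) > 0 else -1

print(dict_value(dictator))   # 1 - eps
print(dict_value(majority))   # strictly smaller than the dictator's value
```

The gap between the two values is exactly what the soundness analysis (via Majority is Stablest) exploits.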

The dictator cuts are given by the functions ${\mathcal{F}(\mathbf{z}) = z_\ell}$ for some ${\ell \in [R]}$. The ${\mathsf{Completeness}}$ of the test ${\mathsf{DICT}}$ is the minimum value of a dictator cut, i.e.,

$\displaystyle \mathsf{Completeness}(\mathsf{DICT}) = \min_{\ell \in [R]} \mathsf{DICT}(z_{\ell})$

The soundness of the dictatorship test is the value of cuts that are far from every dictator. The notion of being far from every dictator is formalized using influences as follows:

# Lecture 8a. A primer on simplicial complexes and collapsibility

Before we can apply more advanced fixed point theorems to the Evasiveness Conjecture, we need a little background on simplicial complexes, and everything starts with simplices.

### Simplices

It’s most intuitive to start with the geometric viewpoint, in which case an $n$-simplex is defined to be the convex hull of $n+1$ affinely independent points in $\mathbb R^n$.  These points are called the vertices of the simplex.  Here are examples for $n=0,1,2,3.$

A simplicial complex is then a collection of simplices glued together along lower-dimensional simplices.  More formally, if $S \subseteq \mathbb R^n$ is a (geometric) simplex, then a face of $S$ is a subset $F \subseteq S$ formed by taking the convex hull of a subset of the vertices of $S$.

Finally, a (geometric) simplicial complex is a collection $\mathcal K$ of simplices such that

1. If $S \in \mathcal K$ and $F$ is a face of $S$, then $F \in \mathcal K$, and
2. If $S,S' \in \mathcal K$ and $S \cap S' \neq \emptyset$, then $S \cap S'$ is a face of both $S$ and $S'$.

Property (1) gives us downward closure, and property (2) specifies how simplices can be glued together (only along faces).  For instance, the first picture depicts a simplicial complex.  The second does not.
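In the abstract (combinatorial) setting, a simplicial complex is just a family of finite sets closed under taking subsets, and property (1) is easy to check in code (a small sketch; the two example complexes are made up):

```python
from itertools import combinations

def is_abstract_complex(faces):
    # An abstract simplicial complex: every subset of a face is a face.
    faces = {frozenset(f) for f in faces}
    for f in faces:
        for k in range(len(f)):
            for sub in combinations(f, k):
                if frozenset(sub) not in faces:
                    return False
    return True

# Two triangles {1,2,3} and {2,3,4} glued along the edge {2,3}, all faces listed:
good = [(), (1,), (2,), (3,), (4,), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
        (1, 2, 3), (2, 3, 4)]
# A triangle with its edges missing: not downward closed.
bad = [(), (1,), (2,), (3,), (1, 2, 3)]
print(is_abstract_complex(good), is_abstract_complex(bad))
```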

# Lecture 7: The evasiveness conjecture

Continuing our look at some topological methods, today we’ll see the evasiveness conjecture in decision tree complexity.  In the next lecture, we’ll see how we can sometimes analyze the complexity using fixed point theorems, and their generalizations (like the Hopf index formula), following the work of Kahn, Saks, and Sturtevant.  These two lectures are co-blogged with Elisa Celis, with a lot of input from Lovasz’s lecture notes.

### Decision tree complexity and evasiveness

Consider a boolean function $f : \{0,1\}^n \to \{0,1\}$ on n bits.  We define the decision tree complexity of f as follows.  Given an unknown input $x \in \{0,1\}^n$, you are allowed to ask about the values of various bits of x, e.g. $x_{17}, x_{34}, x_3, \ldots$.  Your goal is to compute $f(x)$ using as few questions as possible, and your questions can be adaptive, depending on answers to previous questions.  The complexity of such a strategy is the maximum number of questions asked for any $x \in \{0,1\}^n$.  The decision tree complexity, written $D(f)$, is the minimum complexity of any strategy that computes f.  (There are many other interesting models of decision complexity, see e.g. this survey.)

Clearly $D(f) \leq n$, because we can trivially query all the bits of x, and then output f(x).  A function f is called evasive if this upper bound is met, i.e. $D(f)=n$.  As an example, consider the parity function $\mathsf{PARITY}(x) = x_1 \oplus x_2 \oplus \cdots \oplus x_n$, where $\oplus$ is addition modulo 2.  Clearly $\mathsf{PARITY}$ is evasive because after $n-1$ bits of x are asked about, the setting of the final bit determines the value of f.
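For tiny $n$, the decision tree complexity can be computed directly from its minimax definition: $D(f)=0$ if $f$ is constant, and otherwise $1 + \min_i \max_b D(f|_{x_i=b})$. A brute-force sketch (exponential, purely for illustration):

```python
from itertools import product
from functools import reduce

def decision_tree_complexity(f, n):
    # f maps a tuple of n bits to 0/1; `fixed` is a partial assignment.
    def D(fixed):
        free = [i for i in range(n) if i not in fixed]
        def value(bits_for_free):
            x = [0] * n
            for i, b in fixed.items():
                x[i] = b
            for i, b in zip(free, bits_for_free):
                x[i] = b
            return f(tuple(x))
        vals = {value(bits) for bits in product((0, 1), repeat=len(free))}
        if len(vals) == 1:      # f is constant on this subcube
            return 0
        return 1 + min(max(D({**fixed, i: b}) for b in (0, 1)) for i in free)
    return D({})

parity = lambda x: reduce(lambda a, b: a ^ b, x)
or3 = lambda x: 1 if any(x) else 0
print(decision_tree_complexity(parity, 3))  # 3: parity is evasive
print(decision_tree_complexity(or3, 3))     # 3: OR is evasive too
```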

For a more general example, consider any $f : \{0,1\}^n \to \{0,1\}$ such that $\#\{ x : f(x)=1 \}$ is odd.  In this case, for every $i =1,2,\ldots, n$, exactly one of $f|_{x_i=0}$ or $f|_{x_i=1}$ has the same property that the number of inputs resulting in a 1 is odd.  (These two functions are the natural restriction of f to functions on n-1 bits, which results from fixing the value of the ith bit.)  Thus an adversary could keep answering questions “$x_i$?” so that the restricted function retains this property.  Since the number of inputs yielding a 1 is always odd, the restricted function always takes both possible values, implying that f is evasive: the adversary ensures that the value cannot be determined until all n possible questions are asked.
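The adversary argument above is easy to animate: keep answering so that the restricted function still has an odd number of 1-inputs, and the value stays undetermined until the last bit. A sketch (the particular function is made up):

```python
from itertools import product

n = 4
# A hypothetical f with an odd number of 1-inputs (here, exactly 3).
ones = {(0, 0, 0, 0), (1, 0, 1, 0), (1, 1, 1, 1)}
f = lambda x: 1 if x in ones else 0

def count_ones(fixed):
    # Number of 1-inputs of f consistent with the partial assignment `fixed`.
    free = [i for i in range(n) if i not in fixed]
    total = 0
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in fixed.items(): x[i] = b
        for i, b in zip(free, bits): x[i] = b
        total += f(tuple(x))
    return total

fixed = {}
for i in [2, 0, 3, 1]:                 # an arbitrary query order
    # Exactly one of the two answers keeps the count of 1-inputs odd.
    b = 0 if count_ones({**fixed, i: 0}) % 2 == 1 else 1
    fixed = {**fixed, i: b}
    assert count_ones(fixed) % 2 == 1
    if len(fixed) < n:
        # An odd count strictly between 0 and 2^{#free} means the restricted
        # function still takes both values, so f is undetermined.
        c = count_ones(fixed)
        assert 0 < c < 2 ** (n - len(fixed))
print("value undetermined until all", n, "bits were queried")
```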

For an example of a non-evasive property, think of a point $x \in \{0,1\}^{N \choose 2}$ as specifying a directed graph on $N$ vertices, where there is exactly one directed edge connecting every pair of vertices, and x specifies the direction of this edge (this is called a tournament).  Thinking of the vertices as players, a directed edge from u to v means that u defeats v.  Now $f(x)=1$ if the digraph specified by x has one vertex that defeats everyone else.  What is $D(f)$?

Well, first we can conduct a single elimination tournament, where vertex 1 plays vertex 2, the winner plays vertex 3, the winner of that plays vertex 4, etc.  At the end, there is a single vertex $i$ that remains undefeated.  Now asking $N-2$ more questions, we can determine whether $i$ indeed defeats everyone else.  The total number of questions was $(N-1)+(N-2) = 2N-3$, hence $D(f) \leq 2N-3 \ll {N \choose 2}$, implying that f is not evasive.
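A quick simulation of the two-phase strategy (single elimination, then verify the survivor) confirms the $2N-3$ bound; a sketch on a random tournament:

```python
import random

random.seed(1)
N = 10
# A random tournament: beats[u][v] = True iff u defeats v.
beats = [[False] * N for _ in range(N)]
for u in range(N):
    for v in range(u + 1, N):
        if random.random() < 0.5:
            beats[u][v] = True
        else:
            beats[v][u] = True

asked = set()
def ask(u, v):
    # Query one bit of the input; each unordered pair counts once.
    asked.add(frozenset((u, v)))
    return beats[u][v]

# Phase 1: single elimination, N-1 games.
champ = 0
for v in range(1, N):
    if not ask(champ, v):
        champ = v
# Phase 2: check the champion against everyone (repeat pairs are not re-counted).
wins_all = all(ask(champ, v) for v in range(N) if v != champ)
assert len(asked) <= 2 * N - 3
print(len(asked), "queries; has dominant vertex:", wins_all)
```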

### Monotone graph properties and the evasiveness conjecture

Let $m = {N \choose 2}$.  In general, we can encode an arbitrary undirected N-vertex graph as an element $G \in \{0,1\}^{m}$.  A function $f: \{0,1\}^m \to \{0,1\}$ is called a graph property if relabeling the vertices of $G$ doesn’t affect the value of $f(G)$.  The function f is monotone if the value of the function can never change from 1 to 0 when flipping one of the input bits from 0 to 1.  In the setting of graph properties, this corresponds to those which are maintained under addition of edges to the graph, e.g. $f(G) =$ “is G connected?” or $f(G)=$ “does G have a k-clique?”

Evasiveness Conjecture (Aanderaa-Karp-Rosenberg): Every non-trivial monotone graph property is evasive.

Here, non-trivial means that $f(\bar 0) \neq f(\bar 1)$, where $\bar 0$ and $\bar 1$ denote the all-zeros and all-ones strings, respectively.

For example, consider the example $f(G) =$ “is G connected?”  The adversary is simple:  When asked about a possible edge $\{i,j\}$, she answers NO unless this answer would imply that the graph is disconnected.  In other words, she answers NO unless she has answered NO already for all edges across a cut $(S,\bar S)$ except for $\{i,j\}$, in which case she has to answer YES.

Now, suppose there is a strategy which figures out the connectivity of $G$ without asking a question about some edge $\{i,j\}$.  In this case, the conclusion must be that $f(G) = \textrm{ YES}$ because the adversary always maintains that by answering everything in the future YES, she could force the graph to be connected.  In this case, the edges answered YES have to form a spanning tree of G (otherwise by answering all unasked questions NO, the graph would become disconnected).  Consider a path P from i to j in this YES spanning tree.  Let $\{u,v\}$ be the edge of P which was asked about last.  Clearly the adversary answered YES for $\{u,v\}$, but this contradicts the adversary’s strategy.  Since $\{i,j\}$ has not been asked yet, the adversary is safe to answer NO for $\{u,v\}$, and still later by answering YES on $\{i,j\}$, she could force the graph to be connected.  Thus no such strategy exists, and connectivity is evasive.
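The connectivity adversary can be implemented directly: answer NO unless the YES edges together with the still-unasked edges would no longer be able to connect the graph. Against any query order, the value stays open until the very last edge, and the YES edges end up forming a spanning tree. A sketch:

```python
from itertools import combinations
import random

def connected(n, edges):
    # DFS connectivity check on vertices 0..n-1.
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == n

random.seed(2)
n = 5
all_edges = list(combinations(range(n), 2))
order = all_edges[:]
random.shuffle(order)                 # an arbitrary query order

yes, unasked = [], set(all_edges)
for e in order:
    unasked.discard(e)
    # Answer NO unless that would make connectivity impossible.
    if not connected(n, yes + list(unasked)):
        yes.append(e)                 # forced to answer YES
    if unasked:
        # Value still undetermined: YES edges alone do not connect the graph,
        # but answering every remaining question YES would.
        assert not connected(n, yes)
        assert connected(n, yes + list(unasked))
assert connected(n, yes)              # determined only after the last query
print("all", len(all_edges), "edges were needed; YES edges:", len(yes))
```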

# Lecture 6: Borsuk-Ulam and some combinatorial applications

Now we’ll move away from spectral methods, and into a few lectures on topological methods.  Today we’ll look at the Borsuk-Ulam theorem, and see a stunning application to combinatorics, given by Lovász in the late 70’s.  A great reference for this material is Matousek’s book, from which I borrow heavily.  I’ll also discuss why the Lovász-Kneser theorem arises in theoretical CS.

### The Borsuk-Ulam Theorem

We begin with a statement of the theorem.  We will think of the n-dimensional sphere as the subset of ${\mathbb R^{n+1}}$ given by

${S^n = \left\{ (x_1, \ldots, x_{n+1}) : x_1^2 + \cdots + x_{n+1}^2 = 1\right\}.}$

Borsuk-Ulam Theorem: For every continuous mapping ${f : S^n \to \mathbb R^n}$, there exists a point ${x \in S^n}$ with ${f(x)=f(-x)}$.

Pairs of points ${x,-x \in S^n}$ are called antipodal.

There are a couple of common illustrative examples for the case ${n=2}$.  The theorem says that if you take the air out of a basketball, crumple it (no tearing), and flatten it out, then there are two points directly on top of each other which were antipodal before.  Another common example states that at every point in time, there must be two antipodal points on the earth which have exactly the same temperature and barometric pressure (assuming, of course, that these two parameters vary continuously over the surface of the earth).

The n=1 case is completely elementary.  For the rest of the lecture, let’s use ${N = (0,0,\ldots,1)}$ and ${S=(0,0,\ldots,-1)}$ to denote the north and south poles (the dimension will be obvious from context).  To prove the n=1 case, simply trace out the path in ${\mathbb R}$ starting at ${f(N)}$ and going clockwise around ${S^1}$.  Simultaneously, trace out the path starting at ${f(S)}$ and going counter-clockwise at the same speed.  It is easy to see that eventually these two paths have to collide:  At the point of collision, ${f(x)=f(-x)}$.
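The $n=1$ argument is really the intermediate value theorem applied to $g(\theta)=f(\theta)-f(\theta+\pi)$, since $g(0)=-g(\pi)$, so $g$ has a root in $[0,\pi]$. A numeric sketch with a made-up continuous $f$ on the circle:

```python
import math

# A hypothetical continuous f on S^1, parametrized by the angle theta.
f = lambda t: math.cos(3 * t) + 0.5 * math.sin(t) + 0.2 * math.cos(t) ** 2

# g(0) = -g(pi), so g changes sign on [0, pi]; bisect to find the root.
g = lambda t: f(t) - f(t + math.pi)

lo, hi = 0.0, math.pi
assert g(lo) * g(hi) <= 0
for _ in range(60):
    mid = (lo + hi) / 2
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
theta = (lo + hi) / 2
# theta and theta + pi are an antipodal pair with (nearly) equal f-values
print(theta, abs(f(theta) - f(theta + math.pi)))
```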

We will give the sketch of a proof for ${n \geq 2.}$  Let ${g(x)=f(x)-f(-x)}$, and note that our goal is to prove that ${g(x)=0}$ for some ${x \in S^n}$.  Note that ${g}$ is antipodal in the sense that ${g(x)=-g(-x)}$ for all ${x \in S^n}$.  Now, if ${g(x) \neq 0}$ for every ${x}$, then by compactness there exists an ${\epsilon > 0}$ such that ${\|g(x)\| > \epsilon}$ for all ${x \in S^n}$.  Because of this, we may approximate ${g}$ arbitrarily well by a smooth map, and prove that the approximation has a 0.  So we will assume that ${g}$ itself is smooth.

Now, define ${h : S^n \to \mathbb R^n}$ by ${h(x_1, \ldots, x_{n+1}) = (x_1, \ldots, x_n)}$, i.e. the north/south projection map.  Let ${X = S^n \times [0,1]}$ be a hollow cylinder, and let ${F : X \to \mathbb R^n}$ be given by ${F(x,t)=t g(x) + (1-t)h(x)}$ so that ${F}$ linearly interpolates between ${g}$ and ${h}$.

Also, let’s define an antipodality on ${X}$ by ${\nu(x,t) = (-x,t)}$.  Note that ${F}$ is antipodal with respect to ${\nu}$, i.e. ${F(\nu(x,t))=-F(x,t)}$, because both ${g}$ and ${h}$ are antipodal.  For the sake of contradiction, assume that ${g(x) \neq 0}$ for all ${x \in S^n}$.

Now let’s consider the structure of the zero set ${Z = F^{-1}(0)}$.  Certainly ${(N,0), (S,0) \in Z}$ since ${h(N)=h(S)=0}$, and these are h’s only zeros.  Here comes the sketchy part:  Since ${g}$ is smooth, ${F}$ is also smooth, and thus locally ${F}$ can be approximated by an affine mapping ${F_{\mathrm{loc}} : \mathbb R^{n+1} \to \mathbb R^n}$.  It follows that if ${F_{\mathrm{loc}}^{-1}(0)}$ is not empty, then it should be a subspace of dimension at least one.  By an arbitrarily small perturbation of the initial ${g}$, ensuring that ${F}$ is sufficiently generic, we can ensure that ${F_{\mathrm{loc}}^{-1}(0)}$ is either empty or a subspace of dimension one.  Thus locally, ${F^{-1}(0)}$ should look like a two-sided curve, except at the boundaries ${t=0}$ and ${t=1}$, where ${F^{-1}(0)}$ (if non-empty) would look like a one-sided curve.  But, for instance, ${F^{-1}(0)}$ cannot contain any Y-shaped branches.

It follows that ${Z}$ is a union of closed cycles and paths whose endpoints must lie at the boundaries ${t=0}$ and ${t=1}$.  (${Z}$ is represented by red lines in the cylinder above.)  But since there are only two zeros on the ${t=0}$ sphere, and none on the ${t=1}$ sphere, ${Z}$ must contain a path ${\gamma}$ from ${(N,0)}$ to ${(S,0).}$  Since ${F}$ is antipodal with respect to ${\nu}$, ${\gamma}$ must also satisfy this symmetry, making it impossible for the segment initiating at N to ever meet up with the segment initiating at S.  Thus we arrive at a contradiction, implying that ${g}$ must take the value 0.

Notice that the only important property we used about ${h}$ (other than its smoothness) is that it has a number of zeros which is twice an odd number.  If ${h}$ had, e.g. four zeros, then we could have two ${\nu}$-symmetric paths emanating from and returning to the bottom.  But if ${h}$ has six zeros, then we would again reach a contradiction.

# Lecture 5: Uniformizing graphs, multi-flows, and eigenvalues

In the previous lecture, we gave an upper bound on the second eigenvalue of the Laplacian of (bounded degree) planar graphs in order to analyze a simple spectral partitioning algorithm.  A natural question is whether these bounds extend to more general families of graphs.  Well-known generalizations of planar graphs are those which can be embedded on a surface of fixed genus, and, more generally, families of graphs that arise by forbidding minors.  In fact, Spielman and Teng conjectured that for any graph excluding $K_h$ as a minor, one should have $\lambda_2 \lesssim \mathrm{poly}(h) d_{\max}/n$.   Of course planar graphs have genus 0, and by Wagner’s theorem, are precisely the graphs which exclude $K_5$ and $K_{3,3}$ as minors.  In this lecture, we will follow an intrinsic approach of Biswal, myself, and Rao which, in particular, is able to resolve the conjecture of Spielman and Teng.  First, we see why even pushing the conformal approach to bounded genus graphs is difficult.

### Bounded genus graphs

For graphs of bounded genus, there is hope to use an approach based on conformal mappings.  In 1980, Yang and Yau proved that

$\displaystyle \lambda_2(M) \lesssim \frac{g+1}{\mathrm{vol}(M)}$

for any compact Riemannian surface $M$ of genus $g$.  (Note that for the Laplace-Beltrami operator, one usually writes $\lambda_1$ as the first non-zero eigenvalue, rather than $\lambda_2$.)  In analogy with Hersch’s proof of the genus 0 case, they use Riemann-Roch to obtain a degree-$(g+1)$ conformal mapping to the Riemann sphere, then try to pull back a second eigenfunction.  A factor of the degree is lost in the Rayleigh quotient (hence the $g+1$ factor in the preceding bound), and Hersch’s Möbius trick is still required.

An analogous proof for graphs $G$ of bounded genus would proceed by constructing a circle packing of $G$ on the sphere $S^2$, but instead of the circles having disjoint interiors, we would be assured that every point of $S^2$ is contained in at most $g$ circles.  Unfortunately, such a result is impossible (this has to do with the handling of branch points in the discrete setting).  Kelner has to take a different approach in his proof that $\lambda_2(G) \leq d_{\max}^{O(1)} (g+1)/n$ for graphs $G$ of genus at most $g$.

He starts with a circle packing of $G$ on a compact surface $\mathbb S_0$ of genus $g$ (whose existence follows from results of Beardon and Stephenson and He and Schramm).  Then Kelner randomly subdivides $G$ repeatedly, and these subdivisions give progressively better approximations to some sequence of surfaces $\{\mathbb S_i\}$.  Once the approximation is of high enough quality, one applies Riemann-Roch to $\mathbb S_k$, and infers something about a subdivision of $G$.  The final element is to track how the second eigenvalue of $G$ changes (in expectation) under random subdivision.

Needless to say, this approach is already quite delicate, and for graphs that can’t be equipped with some kind of conformal structure, we seem to have reached a dead end.  In this lecture, we’ll see how to use intrinsic deformations of the geometry of $G$ in order to bound its eigenvalues.  Eventually, this will reduce to the study of certain kinds of multi-commodity flows.

### Metrics on graphs

Let $G=(V,E)$ be an arbitrary n-vertex graph with maximum degree $d_{\max}$.  Recall that we can write

$\displaystyle \lambda_2 = \min_{f \neq 0 : \sum_{x \in V} f(x)=0} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x \in V} f(x)^2}.$

where $f : V \to \mathbb R$.  (Also recall that we can replace $\mathbb R$ by any Hilbert space, and the same formula holds.)  The first step is to prepare this equality for “non-linearization” by getting rid of the linear condition $\sum_{x \in V} f(x)=0$ and the sum $\sum_{x \in V} f(x)^2$.  (This is a popular sort of passage in the non-linear geometry of Banach spaces, which also plays a rather important role in applications to theoretical CS.)  The goal is to get only terms that look like $|f(x)-f(y)|$.  Fortunately, there is a well-known way to do this:

$\displaystyle \lambda_2 = 2 n \cdot \min_{f : V \to \mathbb R} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x,y \in V} |f(x)-f(y)|^2},$

which follows easily from the equality $\sum_{x,y \in V} |f(x)-f(y)|^2 = 2n \sum_{x \in V} f(x)^2$ when $\sum_{x \in V} f(x)=0$.
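The identity used here, $\sum_{x,y \in V} |f(x)-f(y)|^2 = 2n \sum_{x \in V} f(x)^2$ for mean-zero $f$, is just expanding the square (the cross term vanishes since $\sum_x f(x) = 0$); a short numeric check (sketch):

```python
import random

random.seed(3)
n = 7
f = [random.uniform(-1, 1) for _ in range(n)]
mean = sum(f) / n
f = [v - mean for v in f]                      # center f so that sum_x f(x) = 0

lhs = sum((a - b) ** 2 for a in f for b in f)  # sum over ordered pairs (x, y)
rhs = 2 * n * sum(v * v for v in f)
print(lhs, rhs)
```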

Thus if we want to bound $\lambda_2 = O(1/n)$, we need to find an $f : V \to \mathbb R$ for which the latter ratio (without the $2n$) is $O(1/n^2)$.  Now, for someone who works a lot with linear programming relaxations, it’s very natural to consider a “relaxation”

$\displaystyle \gamma(G) = \min_{d} \frac{\sum_{xy \in E} d(x,y)^2}{\sum_{x,y \in V} d(x,y)^2},$

where the minimization is over all pseudo-metrics d, i.e. symmetric non-negative functions $d : V \times V \to \mathbb R$ which satisfy the triangle inequality, but might have $d(x,y)=0$ even for $x \neq y$.  Certainly $\gamma(G) \leq \lambda_2/(2n)$, but Bourgain’s embedding theorem (which states that every n-point metric space embeds into a Hilbert space with distortion at most $O(\log n)$) also assures us that $\lambda_2(G) \leq O(n \log^2 n) \gamma(G)$.  Since we are trying to show that $\gamma(G) = O(1/n^2)$, this $O(\log^2 n)$ term is morally negligible.  One can see the paper for a more advanced embedding argument that doesn’t lose this factor, but for now we concentrate on proving that $\gamma(G) = O(1/n^2)$.  The embedding theorems allow us to concentrate on finding an intrinsic metric on the graph with small “Rayleigh quotient,” without having to worry about an eventual geometric representation.

As a brief preview… we are going to find a good metric $d$ by taking a certain kind of all-pairs multi-commodity flow at optimality, and weighting the edges by their congestion in the optimal flow.  Thus as the flow spreads out on the graph, it has the effect of “uniformizing” its geometry.

### Discrete Riemannian metrics, convexification, and duality

Let’s now assume that $G$ is planar.  We want to show that $\gamma(G) = O(d_{\max}/n^2)$.  First, let’s restrict ourselves to vertex weighted metrics on $G$.  Given any non-negative weight function $\omega : V \to \mathbb R$, we can define the length of a path $P$ in $G$ by summing the weights of vertices along it:  $\mathsf{len}_{\omega}(P) = \sum_{x \in P} \omega(x)$.  Then we can define a vertex-weighted shortest-path pseudo-metric on $G$ in the natural way

$\displaystyle \mathsf{dist}_{\omega}(x,y) = \min \left\{ \mathsf{len}_{\omega}(P) : P \in \mathcal P_{xy}\right\},$

where $\mathcal P_{uv}$ is the set of all u-v paths in $G$.  We also have the nice relationship

$\displaystyle \sum_{xy \in E} \mathsf{dist}_{\omega}(x,y)^2 \leq 2 d_{\max} \sum_{x \in V} \omega(x)^2,\qquad(1)$

since $\mathsf{dist}_{\omega}(x,y) \leq \omega(x)+\omega(y)$, so that $\mathsf{dist}_{\omega}(x,y)^2 \leq 2\left(\omega(x)^2+\omega(y)^2\right)$, and each vertex appears in at most $d_{\max}$ edges.

So if we define

$\displaystyle \Lambda_0(\omega) = \frac{\displaystyle \sum_{x \in V} \omega(x)^2}{\displaystyle \sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2}$

then by (1), we have $\gamma(G) \leq 2 d_{\max} \min_{\omega} \Lambda_0(\omega)$.

Examples. Let’s try to exhibit weights $\omega$ for two well-known examples:  the grid, and the complete binary tree.

For the $\sqrt{n} \times \sqrt{n}$ grid, we can simply take $\omega(x)=1$ for all $x \in V$.  Clearly $\sum_{x \in V} \omega(x)^2 = n$.  On the other hand, a random pair of points in the grid is $\Omega(\sqrt{n})$ apart, hence $\sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2 \approx n^2 \cdot (\sqrt{n})^2 = n^3$.  It follows that $\Lambda_0(\omega) = O(1/n^2)$, as desired.

For the complete binary tree with root $r$, we can simply put $\omega(r)=1$ and $\omega(x)=0$ for $x \neq r$.  (Astute readers will guess the geometrically decreasing weights are actually the optimal choice.)  In this case, $\sum_{x \in V} \omega(x)^2 = 1$, while all the pairs $x,y$ on opposite sides of the root have $\mathsf{dist}_{\omega}(x,y)=1$.  It again follows that $\Lambda_0(\omega) = O(1/n^2)$.  Our goal is to provide such a weight $\omega$ for any planar graph.
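The grid example can be checked numerically: compute the vertex-weighted shortest-path metric with a Dijkstra variant (vertex weights summed along the path, endpoints included) and evaluate $\Lambda_0(\omega)$. A sketch:

```python
import heapq
from itertools import product

def dist_omega(adj, omega, src):
    # Vertex-weighted shortest paths: len(P) = sum of omega over vertices of P.
    d = {src: omega[src]}
    pq = [(omega[src], src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d.get(u, float("inf")):
            continue
        for v in adj[u]:
            nd = du + omega[v]
            if nd < d.get(v, float("inf")):
                d[v] = nd
                heapq.heappush(pq, (nd, v))
    return d

def Lambda0(adj, omega):
    num = sum(w * w for w in omega.values())
    den = 0.0
    for x in adj:
        dx = dist_omega(adj, omega, x)
        den += sum(dx[y] ** 2 for y in adj)   # sum over ordered pairs (x, y)
    return num / den

# The k x k grid with omega = 1 everywhere.
k = 8
verts = list(product(range(k), repeat=2))
adj = {v: [] for v in verts}
for (i, j) in verts:
    for (a, b) in ((i + 1, j), (i, j + 1)):
        if a < k and b < k:
            adj[(i, j)].append((a, b)); adj[(a, b)].append((i, j))
n = k * k
val = Lambda0(adj, {v: 1.0 for v in verts})
print(val * n * n)   # bounded by a constant, matching Lambda_0 = O(1/n^2)
```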

# Lecture 4: Conformal mappings, circle packings, and spectral geometry

In Lecture 2, we used spectral partitioning to rule out the existence of a strong parallel repetition theorem for unique games.  In practice, spectral methods are a very successful heuristic for graph partitioning, and in the present lecture we’ll see how to analyze these partitioning algorithms for some common families of graphs.

### Balanced separators, eigenvalues, and Cheeger’s inequality

Lipton and Tarjan proved that every planar graph has a negligibly small set of nodes whose removal splits the graph into two roughly equal pieces.  More specifically, every n-node planar graph can be partitioned into three disjoint sets $A,B,S$ such that there are no edges from $A$ to $B$, the separator $S$ has at most $O(\sqrt{n})$ nodes, and $|A|,|B| \geq n/3$.  This allows one to do all sorts of things, e.g. a simple divide-and-conquer algorithm gives a linear time $(1+\epsilon)$-approximation for the maximum independent set problem in such graphs, for any $\epsilon > 0$.

So there is a natural question of how well spectral methods do, for example, on planar graphs.  Spielman and Teng showed that for bounded-degree planar graphs, a simple recursive spectral algorithm recovers a partition $V=A \cup B$ of the vertex set so that $|E(A,B)| = O(\sqrt{n})$.  In other words, for bounded-degree planar graphs, spectral methods recover the Lipton-Tarjan separator theorem!  This is proved by combining Cheeger’s inequality with their main theorem.

Theorem [Spielman-Teng]: Every n-node planar graph with maximum degree $d_{\max}$ has $\displaystyle \lambda_2(G) = O\left(\frac{d_{\max}}{n}\right)$, where $\lambda_2(G)$ is the second eigenvalue of the combinatorial Laplacian on $G$.
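As a quick numerical sanity check (illustrative only, not part of the proof), one can compute $\lambda_2$ for the $m \times m$ grid, a bounded-degree planar graph with $n = m^2$ vertices and $d_{\max} = 4$, and watch $n\,\lambda_2$ stay bounded, as the theorem predicts.  A minimal sketch in Python, assuming NumPy is available:

```python
import numpy as np

def grid_lambda2(m):
    # Combinatorial Laplacian of the m x m grid graph (n = m^2 vertices).
    n = m * m
    L = np.zeros((n, n))
    def idx(i, j): return i * m + j
    for i in range(m):
        for j in range(m):
            for (a, b) in [(i + 1, j), (i, j + 1)]:  # right and down neighbors
                if a < m and b < m:
                    u, v = idx(i, j), idx(a, b)
                    L[u, u] += 1; L[v, v] += 1
                    L[u, v] -= 1; L[v, u] -= 1
    return np.linalg.eigvalsh(L)[1]

for m in [4, 8, 16]:
    lam2 = grid_lambda2(m)
    # The grid is the product of two paths, so lambda_2 = 2(1 - cos(pi/m))
    # ~ pi^2 / m^2; hence n * lambda_2 is bounded, matching O(d_max / n).
    assert m * m * lam2 <= 4 * np.pi ** 2
```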

Recall that we introduced the combinatorial Laplacian in Lecture 2.  If $G=(V,E)$ is an arbitrary finite graph, in this lecture it will make more sense to think about the Laplacian $\Delta$ as an operator on functions $f : V \to \mathbb R$ given by

$\displaystyle \Delta f(x) = \mathrm{deg}(x) f(x) - \sum_{y : xy \in E} f(y).$

If we define the standard inner product $\langle f,g\rangle = \sum_{x \in V} f(x)g(x)$, then one can easily check that for any such $f$, we have $\langle f, \Delta f\rangle = \sum_{xy \in E} |f(x)-f(y)|^2$.  In particular, this implies that $\Delta$ is a positive semi-definite operator.  If we denote its eigenvalues by $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$, then it is also easy to check that $\lambda_1 = 0$, with corresponding eigenfunction $f(x)=1$ for every $x\in V$.
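As a concrete check of this identity (a toy example; the graph and the test function are arbitrary choices of mine):

```python
# Combinatorial Laplacian as an operator on functions f: V -> R,
# checked on a small graph: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
adj = {x: [] for x in range(n)}
for x, y in edges:
    adj[x].append(y)
    adj[y].append(x)

def laplacian(f):
    # (Delta f)(x) = deg(x) f(x) - sum over neighbors y of f(y)
    return [len(adj[x]) * f[x] - sum(f[y] for y in adj[x]) for x in range(n)]

f = [1.0, -2.0, 0.5, 3.0]
lhs = sum(f[x] * laplacian(f)[x] for x in range(n))   # <f, Delta f>
rhs = sum((f[x] - f[y]) ** 2 for x, y in edges)       # sum_{xy in E} |f(x)-f(y)|^2
assert abs(lhs - rhs) < 1e-9
```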

Thus by standard variational principles, we have

$\displaystyle \lambda_2 = \min_{f \neq 0 : \sum_{x \in V} f(x)=0} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x \in V} f(x)^2}.$
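This variational characterization is easy to test numerically: any mean-zero $f$ gives an upper bound on $\lambda_2$.  A small sketch using the path graph (NumPy assumed; the closed form $\lambda_2 = 2(1-\cos(\pi/n))$ for the path is a standard fact):

```python
import numpy as np

# Laplacian of the path graph on n vertices.
n = 8
L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i] += 1; L[i + 1, i + 1] += 1
    L[i, i + 1] -= 1; L[i + 1, i] -= 1

lam2 = np.linalg.eigvalsh(L)[1]

# Any mean-zero f has Rayleigh quotient at least lambda_2.
rng = np.random.default_rng(0)
f = rng.standard_normal(n)
f -= f.mean()
rayleigh = f @ L @ f / (f @ f)
assert rayleigh >= lam2 - 1e-9
# Closed form for the path: lambda_2 = 2(1 - cos(pi/n)).
assert abs(lam2 - 2 * (1 - np.cos(np.pi / n))) < 1e-9
```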

Let us also define the Cheeger constant $h_G$.  For an arbitrary subset $S \subseteq V$, let

$\displaystyle h(S) = \frac{|E(S, \bar S)|}{\min(|S|,|\bar S|)}.$

Note that this definition differs from the $h$ we defined in Lecture 2, because here we will be discussing eigenfunctions without boundary conditions.  Now one defines $h_G = \min_{\emptyset \neq S \subsetneq V} h(S)$.
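Since $h_G$ minimizes over exponentially many sets, it can only be brute-forced on tiny graphs; here is a toy sketch on the 6-cycle (illustrative only):

```python
from itertools import combinations

# Brute-force h_G = min_S |E(S, S-bar)| / min(|S|, |S-bar|) on the 6-cycle.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]

def h(S):
    S = set(S)
    cut = sum(1 for x, y in edges if (x in S) != (y in S))
    return cut / min(len(S), n - len(S))

h_G = min(h(S) for k in range(1, n) for S in combinations(range(n), k))
# The best cut takes 3 consecutive vertices: 2 cut edges over a side of size 3.
assert abs(h_G - 2 / 3) < 1e-9
```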

Finally, we have the version of Cheeger’s inequality (proved by Alon and Milman in the discrete setting) for graphs without boundary.

Cheeger’s inequality: If $G=(V,E)$ is any graph with maximum degree $d_{\max}$, then

$\displaystyle \lambda_2(G) \geq \frac{h_G^2}{2d_{\max}}.$

This follows fairly easily from the Dirichlet version of Cheeger’s inequality presented in Lecture 2.  Here’s a sketch:  Let $f : V \to \mathbb R$ satisfy $\Delta f = \lambda_2 f$, and suppose, without loss of generality (replacing $f$ by $-f$ if necessary), that $V_+ = \{ x : f(x) > 0 \}$ has $|V_+| \leq n/2$.  Define $f_+(x)=f(x)$ for $f(x) > 0$ and $f_+(x)=0$ otherwise.  Then $f_+|_B = 0$ for $B = V \setminus V_+$, so we can plug $f_+$ into the Dirichlet version of Cheeger’s inequality with boundary conditions on $B$.  For the full analysis, see this note, which essentially follows this approach.  By examining the proof, note that one can find a subset $S \subseteq V$ with $h(S) \leq \sqrt{2 d_{\max} \lambda_2}$ by a simple “sweep” algorithm:  Arrange the vertices $V = \{v_1, v_2, \ldots, v_n\}$ so that $f(v_1) \leq f(v_2) \leq \cdots \leq f(v_n)$, and output the best of the $n-1$ cuts $\{v_1, \ldots, v_i\}, \{v_{i+1}, \ldots, v_n\}$.
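The sweep algorithm is simple enough to sketch in a few lines (NumPy assumed; the example graph, two triangles joined by an edge, is my own choice):

```python
import numpy as np

def sweep_cut(edges, n):
    # Build the Laplacian, take the eigenfunction f for lambda_2, sort the
    # vertices by f, and return the best of the n-1 threshold cuts.
    L = np.zeros((n, n))
    for x, y in edges:
        L[x, x] += 1; L[y, y] += 1
        L[x, y] -= 1; L[y, x] -= 1
    vals, vecs = np.linalg.eigh(L)
    f = vecs[:, 1]
    order = np.argsort(f)
    best, best_S = float("inf"), None
    for i in range(1, n):
        S = set(order[:i].tolist())
        cut = sum(1 for x, y in edges if (x in S) != (y in S))
        ratio = cut / min(i, n - i)
        if ratio < best:
            best, best_S = ratio, S
    # Guarantee from the sweep analysis: h(S) <= sqrt(2 * d_max * lambda_2).
    d_max = int(max(np.diag(L)))
    assert best <= np.sqrt(2 * d_max * vals[1]) + 1e-9
    return best_S, best

# Two triangles joined by a single edge: the sweep finds that bridge.
edges = [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]
S, ratio = sweep_cut(edges, 6)
assert abs(ratio - 1/3) < 1e-12   # one cut edge, sides of size 3
```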

So using the eigenvalue theorem of Spielman and Teng, along with Cheeger’s inequality, we can find a set $S \subseteq V$ with $h(S) \lesssim \sqrt{d_{\max}/n}$.  While this cut has the right Cheeger constant, it is not necessarily balanced (i.e. $\min(|S|, |\bar S|)$ could be very small).  But one can apply this algorithm recursively, repeatedly cutting small chunks off the graph until a balanced cut is obtained.  Refer to the Spielman-Teng paper for details.  A great open question is how one might use spectral information about $G$ to recover a balanced cut immediately, without the need for recursion.

### Conformal mappings and circle packings

Now we focus on proving the bound $\lambda_2(G) \lesssim d_{\max}/n$ for any planar graph $G$.  A natural analog is to ask what happens for the Laplace-Beltrami operator of a Riemannian metric on the 2-sphere.  In fact, Hersch considered this problem almost 40 years ago and proved that $\lambda_2(M) \lesssim 1/\mathrm{vol}(M)$ for any such Riemannian manifold $M$.  His approach was to first use the uniformization theorem to get a conformal mapping from $M$ onto $S^2$, and then to pull back the standard second eigenfunctions on $S^2 \subseteq \mathbb R^3$ (which are just the three coordinate projections).  Since the Dirichlet energy is conformally invariant in dimension 2, this almost works, except that the pulled-back map might not be orthogonal to the constant function.  To fix this, one has to post-process the initial conformal mapping with an appropriate Möbius transformation.

Unaware of Hersch’s work, Spielman and Teng derived eigenvalue bounds for planar graphs using the discrete analog of this approach:  Circle packings replace conformal mappings, and one still has to show the existence of an appropriate post-processing Möbius transformation.

# Lecture 2: Spectral partitioning and near-optimal foams

In the last lecture, we reduced the problem of cheating in $\mathcal G_m^{\otimes k}$ (the k-times repeated m-cycle game) to finding a small set of edges $\mathcal E$ in $(\mathbb Z_m^k)_\infty$ whose removal eliminates all topologically non-trivial cycles.  Such a set $\mathcal E$ is called a spine. To get some intuition about how many edges such a spine should contain, let’s instead look at a continuous variant of the problem.

### Spines, Foams, and Isoperimetry

Consider again the $k$-dimensional torus $\mathcal T^k = \mathbb R^k/\mathbb Z^k$, which one can think of as $\lbrack 0,1)^k$ with opposite sides identified.  Say that a nice set (e.g. a compact, $C^\infty$ surface) $\mathcal E \subseteq \mathcal T^k$ is a spine if it intersects every non-contractible loop in $\mathcal T^k$.  This is the continuous analog of a spine in $(\mathbb Z_m^k)_\infty$.  We will try to find such a spine $\mathcal E$ with surface area, i.e. $\mathrm{Vol}_{k-1}(\mathcal E)$, as small as possible.

Let’s consider some easy bounds.  First, it is clear that the set

$\displaystyle \mathcal E = \left\{(x_1, \ldots, x_k) \in [0,1)^k : \exists i \in \{1,2,\ldots,k\}, x_i = 0\right\}$

is a spine with $\mathrm{Vol}_{k-1}(\mathcal E) = k$.  (A moment’s thought shows that this is “equivalent” to the provers playing independent games in each coordinate of $\mathcal G_m^{\otimes k}$.)

To get a good lower bound, it helps to relate spines to foams which tile $\mathbb R^k$ according to $\mathbb Z^k$, as follows.  Take two candidate spines.

To determine whether a candidate is actually a spine, we can tile copies of the fundamental domain side-by-side in $\mathbb R^k$.

For one candidate, the resulting picture contains a bi-infinite curve avoiding it, which projects to a non-trivial cycle in $\mathcal T^k$ (so that candidate is not a spine), while for the other, the lifted surface yields a tiling of $\mathbb R^k$ by bodies of volume 1.  It is easy to deduce the following claim.

Claim: A surface $\mathcal E \subseteq \mathcal T^k$ is a spine if and only if it induces a tiling of $\mathbb R^k$ by bodies of volume 1 which is invariant under shifts by $\mathbb Z^k$.

By the isoperimetric inequality in $\mathbb R^k$, this immediately yields the bound

$\displaystyle \mathrm{Vol}_{k-1}(\mathcal E) \geq \mathrm{Vol}_{k-1}(\partial B) \approx \sqrt{k},$

where $B \subseteq \mathbb R^k$ is a Euclidean ball of volume 1.
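One can check the $\sqrt{k}$ growth directly: a ball of volume 1 in $\mathbb R^k$ has radius $V_k^{-1/k}$ and boundary area $k\,V_k^{1/k}$, where $V_k = \pi^{k/2}/\Gamma(k/2+1)$ is the volume of the unit-radius ball.  A short sketch (working in logs to avoid overflow for large $k$):

```python
import math

def unit_volume_ball_surface_area(k):
    # V_k = pi^(k/2) / Gamma(k/2 + 1) is the volume of the unit-radius ball
    # in R^k; keep log V_k to avoid overflow when k is large.
    log_V = (k / 2) * math.log(math.pi) - math.lgamma(k / 2 + 1)
    # A ball of volume 1 has radius r = V_k^(-1/k), and its boundary has
    # area k * V_k * r^(k-1) = k * V_k^(1/k).
    return k * math.exp(log_V / k)

# In R^2 this is the circumference of a disk of area 1, namely 2*sqrt(pi).
assert abs(unit_volume_ball_surface_area(2) - 2 * math.sqrt(math.pi)) < 1e-9
# The area grows like sqrt(k): the ratio tends to sqrt(2*pi*e) ~ 4.13.
for k in [10, 100, 1000]:
    assert 3.4 < unit_volume_ball_surface_area(k) / math.sqrt(k) < 4.14
```

Since $V_k^{1/k} \approx \sqrt{2\pi e/k}$ for large $k$, the boundary area is $\sqrt{2\pi e k}(1+o(1))$, confirming the $\Theta(\sqrt{k})$ lower bound up to constants.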

So the surface area of an optimal spine lies somewhere between $\sqrt{k}$ and $k$.  On the one hand, cubes tile very nicely but have large surface area.  On the other hand, we have sphere-like objects which have small surface area, but don’t seem (at least intuitively) to tile very well at all.  As first evidence that this intuition isn’t quite right, note that it is known how to cover $\mathbb R^k$ by disjoint bodies of volume at most 1 so that the surface area/volume ratio grows like $\sqrt{k}$.  See Lemma 3.16 in this paper, which is based on Chekuri et al.  It’s just that these covers are not invariant under $\mathbb Z^k$ shifts.

Before we reveal the answer, let’s see what consequences the corresponding discrete bounds would have for parallel repetition of the m-cycle game.  If the “cube bound” were tight, we would have $\mathsf{val}(\mathcal G_m^{\otimes k}) \approx 1 - \frac{k}{m}$, which doesn’t rule out a strong parallel repetition theorem ($\alpha^*=1$ in the previous lecture).  If the “sphere bound” were tight, we would have $\mathsf{val}(\mathcal G_m^{\otimes k}) \approx 1 - \frac{\sqrt{k}}{m}$, which shows that $\alpha^* \geq 2$.   In the latter case, the approach to proving equivalence of the UGC and MAX-CUT conjectures doesn’t even get off the ground.

As the astute reader might have guessed, recently Ran Raz proved that $\mathsf{val}(\mathcal G_m^{\otimes k}) \geq 1 - C\frac{\sqrt{k}}{m}$ for some constant $C > 0$, showing that a strong parallel repetition theorem—even for unique games—is impossible.   Subsequently, Kindler, O’Donnell, Rao, and Wigderson showed that there exists a spine $\mathcal E \subseteq \mathcal T^k$ with $\mathrm{Vol}_{k-1}(\mathcal E) \approx \sqrt{k}$.  While it is not difficult to show that the continuous result implies Raz’s discrete result, we will take a direct approach found recently by Alon and Klartag.