tcs math – some mathematics of theoretical computer science

February 25, 2013

Open question recap

Filed under: Math, Open question — James Lee @ 12:28 pm

In light of the nearly immediate failure of the last open question, here is a recap of the progress on the few somewhat more legitimate questions appearing here over the past few years. (Note that most of these questions are not original and appeared other places as well.)

February 22, 2013

Open question: Separated sets in isotropic measures

Filed under: Math, Open question — Tags: , , — James Lee @ 8:26 pm

[Note: This question is trivially impossible, though I leave the original form below. If \mu is the uniform measure on S^{k-1}, then any two sets of \Omega(1) measure are at distance O(1/\sqrt{k}) by concentration of measure. Thus it is impossible to find k sets satisfying the desired bound. I will try to think of the right question and repost. Unfortunately, I now recall having gone down this path a few times before, and I fear there may not be a relevant elementary open question after all.]

Here is an interesting and elementary (to state) open question whose resolution would give an optimal higher-order Cheeger esimate. The goal is to replace the factor {k^{O(1)}} in Theorem 1 of the previous post with a {\mathrm{polylog}(k)} factor, or even an optimal bound of {\sqrt{\log k}} .

Fix {k \in \mathbb N} and let {\mu} denote a probability measure on the {(k-1)} -dimensional sphere {S^{k-1}} . For the purpose of this question, one can assume that {\mu} is supported on a finite number of points. Suppose also that {\mu} is isotropic in the sense that, for every {\theta \in S^{k-1}} ,

\displaystyle  \int_{S^{k-1}} \langle x, \theta\rangle^2 \,d\mu(x)= \frac{1}{k}\,.

Now our goal is to find, for some {\varepsilon, \delta > 0} , a collection of (measurable) subsets {U_1, U_2, \ldots, U_k \subseteq S^{k-1}} such that {\mu(U_i) \geq \varepsilon/k} for each {i=1,2,\ldots,k} , and such that for {i \neq j} , we have

\displaystyle  \min_{x \in U_i, y \in U_j} \|x-y\|_2 \geq \delta\,.

In Lemma 5 of the previous post, we proved that this is possible for {\varepsilon \gtrsim 1} and {\delta \gtrsim k^{-5/2}} . In this paper, we improve the estimate on {\delta} somewhat, though the best-known is still {\delta \gtrsim k^{-c}} for some {c > 0} .

Question:
Is it possible to achieve {\varepsilon, \delta \gtrsim 1/\mathrm{poly}(\log k)} ? One could even hope for {\varepsilon \gtrsim 1} and {\delta \gtrsim 1/(\log k)} .

Two extreme cases are when (1) {\mu} is uniform on {S^{k-1}} , or (2) {\mu} is uniform on the standard basis {\{e_1,e_2,\ldots,e_k\}} . In the first case, one can easily achieve {\varepsilon,\delta \gtrsim 1} (in fact, one could even get {100k} sets satisfying these bounds). In the second case, taking each set to contain a single basis vector achieves {\varepsilon = 1} and {\delta = \sqrt{2}} .

If one only wants to find, say, {\lceil 3k/4\rceil} sets instead of {k} sets, then it is indeed possible to achieve {\varepsilon \gtrsim 1} and {\delta \gtrsim 1/O(\log k)} . [Update: This is actually impossible for the reasons stated above. To get the corresponding bounds, we do a random projection into O(\log k) dimensions. This only preserves the original Euclidean distances on average.] Furthermore, for {\varepsilon \gtrsim 1} , this bound on {\delta} is asymptotically the best possible.

February 20, 2013

No frills proof of higher-order Cheeger inequality

Filed under: Expository, Math — Tags: , , , — James Lee @ 11:41 pm

Following some recent applications by Mamoru Tanaka and Laurent Miclo, I was asked where there is a short, no-frills, self-contained, and (possibly) quantitatively non-optimal proof of the higher-order Cheeger inequalities from our paper with Shayan Oveis-Gharan and Luca Trevisan. I thought I would post it here. (If you’re hungering for something new, see this recently posted preprint of my coauthors relating higher eigenvalues to graph expansion.)

[Update: The application of Miclo can also be done using the higher-order Cheeger inequalities of Louis, Raghavendra, Tetali, and Vempala.]

The main simplification comes from doing the random partitioning non-optimally with axis-parallel cubes. For ease of notation, we will deal only with regular graphs, but there will be no quantitative dependence on the degree and this assumption can be removed (see the full paper).

Suppose that {G=(V,E)} is a connected, {n} -vertex, {d} -regular graph. Define the Laplacian by {\mathcal L = I - (1/d)A} , where {A} is the adjacency matrix of {G} . We will think of {\mathcal L} as acting on {\ell^2(V)} , the space of functions {f : V \rightarrow \mathbb R} equipped with the {\ell^2} norm. {\mathcal L} is positive semi-definite with spectrum

\displaystyle  0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_{|V|} \leq 2\,.

We we define the Rayleigh quotient of a function {f \in \ell^2(V)} by

\displaystyle  \mathcal R(f) = \frac{\sum_{u \sim v} (f(u)-f(v))^2}{d \sum_{u \in V} f(u)^2}\,,

where the numerator is summed over edges of {G} . By the variational principle for eigenvalues, we have

\displaystyle  \lambda_k = \min_{\stackrel{W \subseteq \ell^2(V)}{\dim(W)=k}} \max_{0 \neq f \in W} \mathcal R(f)\,. \ \ \ \ \ (1)

For a subset {S \subseteq V} , we define the expansion of {S} by {\phi(S) = \mathcal R(\mathbf{1}_S)} , where {\mathbf{1}_S} is the indicator function of {S} .

Finally, for {k \in \{1,2,\ldots,n\}} , we define the {k} -way expansion constant of {G} by

\displaystyle  \rho_G(k) = \min \left\{ \max_i \phi(S_i) : S_1, S_2, \ldots, S_k \subseteq V \right\}\,,

where the minimum is over collections of {k} disjoint, non-empty subsets of {V} .

The classical discrete Cheeger inequality asserts that

\displaystyle  \frac{\lambda_2}{2} \leq \rho_G(2) \leq \sqrt{2\lambda_2}\,.

We will now prove the following generalization. See the full paper for a discussion of the surrounding issues and better quantitative bounds.

Theorem 1 For every {k \in \{1,2,\ldots,n\}} ,

\displaystyle  \frac{\lambda_k}{2} \leq \rho_G(k) \leq 30 k^3 \sqrt{\lambda_k}\,. \ \ \ \ \ (2)

First, let’s prove the (easy) LHS of (2). Suppose we have {S_1, S_2, \ldots, S_k \subseteq V} which are disjoint and non-empty and which satisfy {\phi(S_i) = \mathcal R(\mathbf{1}_{S_i}) \leq \rho} . Then certainly {W = \mathrm{span}(\mathbf{1}_{S_1}, \ldots, \mathbf{1}_{S_k})} is a {k} -dimensional subspace of {\ell^2(V)} . On the other hand, every {f \in W} satisfies {\mathcal R(f) \leq 2 \rho} because if {f = \alpha_1 \mathbf{1}_{S_1} + \cdots + \alpha_k \mathbf{1}_{S_k}} , then

\displaystyle  \sum_{u \sim v} (f(u)-f(v))^2 \leq 2 \sum_{i=1}^k \alpha_i^2 \sum_{u \sim v} |\mathbf{1}_{S_i}(u)-\mathbf{1}_{S_i}(v)|^2\,,

where we have used the fact that if {u \in S_i} and {v \in S_j} , then

\begin{array}{rcl} |\alpha_i \mathbf{1}_{S_i}(u) - \alpha_j \mathbf{1}_{S_j}(u)|^2 &\leq & 2(\alpha_i^2 + \alpha_j^2) \\ &=& 2\alpha_i^2 |\mathbf{1}_{S_i}(u) - \mathbf{1}_{S_i}(v)|^2 + 2\alpha_j^2 |\mathbf{1}_{S_j}(u)-\mathbf{1}_{S_j}(v)|^2\,.\end{array}

But now using (1), the subspace {W} witnesses the fact that {\lambda_k \leq 2\rho} .

To prove the more difficult RHS of (2), we will use the following discrete Cheeger inequality with boundary conditions.

Lemma 2 For any {f : V\rightarrow \mathbb R} , there is a subset {U \subseteq \{ v \in V : f(v) \neq 0\}} such that

\displaystyle  \phi(U) \leq \sqrt{2 \mathcal R(f)}\,.

Proof: Let {U_t = \{ v \in V : f(v)^2 \geq t \}} . Observe that for each {t > 0} , one has {U_t \subseteq \{ v \in V : f(v) \neq 0\}} . For {S \subseteq V} , let {E(S,\bar S)} denote the edges of {G} with exactly one endpoint in {S} . Then we have

\displaystyle  \begin{array}{rcl}  \int_0^{\infty} |E(U_t, \bar U_t)|\,dt &=& \sum_{\{u,v\} \in E} \left|f(u)^2 - f(v)^2\right| \\ &= & \sum_{\{u,v\} \in E} |f(u) + f(v)| |f(u)-f(v)| \\ &\leq & \sqrt{\sum_{\{u,v\} \in E} (|f(u)|+|f(v)|)^2} \sqrt{\sum_{\{u,v\} \in E} |f(u)-f(v)|^2} \\ &\leq & \sqrt{2 d \sum_{u \in V} f(u)^2}\sqrt{\sum_{\{u,v\} \in E} |f(u)-f(v)|^2}. \end{array}

On the other hand, {\int_0^{\infty} d|U_t|\,dt = d \sum_{u \in V} |f(u)|^2,} thus

\displaystyle  \int_0^{\infty} |E(U_t, \bar U_t)|\,dt \leq \sqrt{2 \mathcal R(f)} \int_0^{\infty} d|U_t|\,dt\,,

implying there exists a {t > 0} such that {\phi(U_t) = \frac{|E(U_t,\bar U_t)|}{d |U_t|} \leq \sqrt{2 \mathcal R(f)}} . \Box

In light of the preceding lemma, to prove the RHS of (2), it suffices to find {k} disjointly supported functions {\psi_1, \psi_2, \ldots, \psi_k : V \rightarrow \mathbb R} such that {\mathcal R(\psi_i) \leq (30)^2 k^6 \lambda_k} for each {i=1,2,\ldots,k} . Then Lemma 2 guarantees the existence of disjoint subsets of vertices satisfying our desired conclusion.

Toward this end, let {f_1, f_2, \ldots, f_k : V \rightarrow \mathbb R} denote {k} orthonormal functions satisfying {\mathcal L f_i = \lambda_i f_i} for each {i=1,2,\ldots,k} , i.e. consider the first {k} eigenfunctions of the Laplacian. Define the map {F : V \rightarrow \mathbb R^k} via

\displaystyle  F(v) = \left(f_1(v), f_2(v), \ldots, f_k(v)\right)\,.

We also put {\hat F(v) = F(v)/\|F(v)\|} . (Since {\lambda_1=0} , the function {f_1} takes value {1/\sqrt{n}} everywhere, hence {F} is never zero and this is well-defined.)

The next lemma shows that, in order to find disjointly supported functions, it suffices to find “separated sets.”

Lemma 3 Suppose that for some {\varepsilon > 0} and {0 < \delta \leq 1} , there exist {k} subsets {U_1, U_2, \ldots, U_k \subseteq V} satisfying the conditions:

  1. For every {i=1,2,\ldots,k} , we have {\sum_{v \in U_i} \|F(v)\|^2 \geq \varepsilon} , and
  2. For every {i \neq j} , if {u \in U_i} and {v \in U_j} , then

    \displaystyle  \|\hat F(u)-\hat F(v)\| \geq \delta

Then there are disjointly supported functions {\psi_1, \psi_2, \ldots, \psi_k : V \rightarrow \mathbb R} such that {\mathcal R(\psi_i) \leq 25k/(\varepsilon\delta^2)} .

Proof: For each {i=1,2,\ldots,k} , we define maps {\theta_i : V \rightarrow \mathbb R} by

\displaystyle  \theta_i(v) = \max\left(0, 1 - \frac{2}{\delta} \min_{u \in U_i} \|\hat F(u)-\hat F(v)\|\right)\,.

Observe that, by the triangle inequality, for {u,v \in V} , we have

\displaystyle  |\theta_i(u)-\theta_i(v)| \leq \frac{2}{\delta} \|\hat F(u)-\hat F(v)\|\,. \ \ \ \ \ (3)

Next, we define

\displaystyle  \psi_i(v) = \theta_i(v) \|F(v)\|\,.

Observe that since {\theta_i} is identically {1} on {U_i} , we have

\displaystyle  \sum_{v \in V} \psi_i(v)^2 \geq \sum_{v \in U_i} \|F(v)\|^2 \geq \varepsilon \ \ \ \ \ (4)

by condition (i).

Next, we argue that the functions {\{\theta_i\}} are disjointly supported. This immediately implies that the {\{\psi_i\}} are disjointly supported. If {u \in \mathrm{supp}(\theta_i) \cap \mathrm{supp}(\theta_j)} for some {i \neq j} , then there are points {v \in U_i} and {v' \in U_j} such that {\|\hat F(u)-\hat F(v)\| < \delta/2} and {\|\hat F(u)-\hat F(v')\| < \delta/2} . But then by the triangle inequality, {\|\hat F(v)-\hat F(v')\| < \delta} , violating condition (ii).

Finally, we bound {\mathcal R(\psi_i)} . For any {u,v \in V} and {i=1,2,\ldots,k} , we have

\displaystyle  |\psi_i(u)-\psi_i(v)| \leq \theta_i(u) \left|\vphantom{\bigoplus}\|F(u)\|-\|F(v)\|\right| + \|F(v)\| \cdot |\theta_i(u)-\theta_i(v)|\,. \ \ \ \ \ (5)

Now using (3),

\displaystyle  \|F(v)\| \cdot |\theta_i(u)-\theta_i(v)| \leq \frac{2}{\delta} \|F(v)\| \cdot \|\hat F(u)-\hat F(v)\| \leq \frac{4}{\delta} \|F(u)-F(v)\|\,, \ \ \ \ \ (6)

where we have used the fact that for any non-zero vectors {x,y \in \mathbb R^k} , we have

\displaystyle  \|x\| \left\|\frac{x}{\|x\|} - \frac{y}{\|y\|}\right\| = \left\|x - \frac{\|x\|}{\|y\|} y\right\| \leq \|x-y\| + \left\|y - \frac{\|x\|}{\|y\|} y\right\| \leq 2 \,\|x-y\|\,.

Using (6) and the fact that {\theta_i(u) \leq 1} in (5) yields

\displaystyle  |\psi_i(u)-\psi_i(v)| \leq \left(1+\frac{4}{\delta}\right) \|F(u)-F(v)\|\,. \ \ \ \ \ (7)

Finally, observe that

\displaystyle  \sum_{u \sim v} \|F(u)-F(v)\|^2 = \sum_{i=1}^k \sum_{u \sim v} |f_i(u)-f_i(v)|^2 = d (\lambda_1 + \cdots + \lambda_k) \leq dk\lambda_k\,.

Combining this with (7) and (4) shows that {\mathcal R(\psi_i) \leq \frac{k}{\varepsilon}(1+\frac{4}{\delta})^2 \leq 25k/(\varepsilon \delta^2).} \Box

We will first make the task of finding separated sets slightly easier.

Lemma 4 Suppose that for some {0 < \delta \leq 1} and {m \geq 1} , there are subsets {T_1, T_2, \ldots, T_m \subseteq V} which satisfy:

  1. {\displaystyle \sum_{i=1}^m \sum_{v \in T_i} \|F(v)\|^2 \geq k - \frac14} .
  2. For every {i \neq j} , if {u \in T_i} and {v \in T_j} , then { \|\hat F(u)-\hat F(v)\| \geq \delta }
  3. For every {i=1,2,\ldots, m} ,

    \displaystyle  \sum_{v \in T_i} \|F(v)\|^2 \leq 1 + \frac{1}{2k}\,.

Then there are {k} sets {U_1, U_2, \ldots, U_k \subseteq V} satisfying the assumption of Lemma 3 with {\varepsilon = \frac{1}{4}} .

Proof: We can form the desired sets {\{U_i\}} by taking disjoint unions of the sets {\{T_i\}} . Begin with the family {\{T_1, T_2, \ldots, T_m\}} and repeatedly replace two sets {T} and {T'} by their union if they each satisfy {\sum_{v \in T} \|F(v)\|^2 < \frac12} .

When we finish, we are left with a collection of sets each member of which satisfies {\sum_{v \in T} \|F(v)\|^2 \in [\frac12, 1+\frac{1}{2k}]} , and possibly one set for which {\sum_{v \in T} \|F(v)\|^2 < 1/2} . By (i), we must end with at least {k} sets which each satisfy {\sum_{v \in T} \|F(v)\|^2 \geq 1/4} . \Box

Our final lemma, which finishes the proof of Theorem 1, simply asserts that such sets exist. We will no longer need the fact that {F} comes from eigenfunctions.

Lemma 5 Suppose that {g_1, g_2, \ldots, g_k : V \rightarrow \mathbb R} are orthornormal in {\ell^2(V)} . Let

\displaystyle G(v) = (g_1(v), g_2(v), \ldots, g_k(v))

and {\hat G(v) = G(v)/\|G(v)\|} . Then there is an {m \geq 1} and subsets {T_1, T_2, \ldots, T_m \subseteq V} such that

  1. {\displaystyle\sum_{i=1}^m \sum_{v \in T_i} \|G(v)\|^2 \geq k - \frac14} .
  2. For every {i \neq j} , if {u \in T_i} and {v \in T_j} , then

    \displaystyle \|\hat G(u)-\hat G(v)\| \geq \frac{1}{2\sqrt{2} k^{5/2}}\,.

  3. For every {i = 1,2,\ldots,m} ,

    \displaystyle  \sum_{v \in T_i} \|G(v)\|^2 \leq 1 + \frac{1}{2k}\,.

Proof: Consider the {n \times k} matrix {A} which has columns {g_1, g_2, \ldots, g_k} . For any {x \in \mathbb R^k} , we have

\displaystyle  \sum_{v \in V} \langle x, G(v)\rangle^2 = \|Ax\|^2 = x^T A^T A x = x^T x = \|x\|^2\,, \ \ \ \ \ (8)

since the columns of {A} are orthornormal

For a subset {X \subseteq S^{k-1}} of the unit sphere in {\mathbb R^k} , we put {V(X) = \{ v \in V : \hat G(v) \in X \}} . Fix some {x \in X} and let {\mathsf{diam}(X)} denote the diameter of {X} , then by (8), we have

\displaystyle  1 = \sum_{v \in V(X)} \langle x, G(v)\rangle^2 = \sum_{v \in V(X)} \|G(v)\|^2 \left(1-\frac{\|\hat G(v)-x\|^2}{2}\right)^2 \geq \sum_{v \in V(X)} \|G(v)\|^2 \left(1-\frac{\mathrm{diam}(X)^2}{2}\right)^2\,.

We conclude that if {\mathrm{diam}(X) \leq 1/\sqrt{2k}} , then

\displaystyle  \sum_{v \in V(X)} \|G(v)\|^2 \leq 1 + \frac{1}{2k}\,. \ \ \ \ \ (9)

Now, let {\mathcal{P}} be a partition of {\mathbb R^k} into axis-parallel squares of side length {L = \frac{1}{k \sqrt{2}}} . For any such square {Q \in \mathcal P} , we let {\tilde Q \subseteq Q} denote the set of points which are at Euclidean distance at least {\frac{L}{4k^2}} from every side of {Q} . Observe that

\displaystyle  \mathrm{vol}(\tilde Q) \geq \left(1-\frac{1}{4k^2}\right)^k \mathrm{vol}(Q) \geq \left(1-\frac{1}{4k}\right) \mathrm{vol}(Q)\,. \ \ \ \ \ (10)

Consider now the collection of sets {\{V(\tilde Q) : Q \in \mathcal P\}} . Since {\mathrm{diam}(\tilde Q) \leq L \sqrt{k} = \frac{1}{\sqrt{2k}}} , (9) implies that {\sum_{v \in V(\tilde Q)} \|G(v)\|^2 \leq 1 + \frac{1}{2k}} . Furthermore, by construction, for any {u \in V(\tilde Q)} and {v \in V(\tilde Q')} with {Q \neq Q'} , we have

\displaystyle  \|\hat G(u)-\hat G(v)\| \geq 2 \frac{L}{4k^2} \geq \frac{1}{2\sqrt{2} k^{5/2}}\,.

Thus the collection of sets {\{V(\tilde Q) : Q \in \mathcal P\}} satisfy both conditions (ii) and (iii) of the lemma. We are left to verify (i).

Note that {\sum_{v \in V} \|G(v)\|^2 = \sum_{i=1}^k \sum_{v \in V} g_i(v)^2 = k} . If we choose a uniformly random axis-parallel translation {\mathcal P'} of the partition {\mathcal P} , then (10) implies

\displaystyle  \mathbb{E} \sum_{Q \in \mathcal P'} \sum_{v \in V(\tilde Q)} \|G(v)\|^2 \geq \left(1-\frac{1}{4k}\right) \sum_{v \in V(Q)} \|G(v)\|^2 \geq k - \frac14\,.

In particular, there exists some fixed partition that achieves this bound. For this partition {\mathcal P} , the sets {\{V(\tilde Q) : Q \in \mathcal P\}} satisfy all three conditions of the lemma, completing the proof.

February 18, 2013

Talagrand’s Bernoulli Conjecture, resolved.

Filed under: Math — Tags: , , — James Lee @ 5:34 pm

Apparently, Bednorz and LataÅ‚a have solved Talagrand’s $5,000 Bernoulli Conjecture. The conjecture concerns the supremum of a Bernoulli process.

Consider a finite subset {T \subseteq \ell^2} and define the value

\displaystyle  b(T) = \mathbb{E} \max_{t \in T} \sum_{i \geq 1} \varepsilon_i t_i\,,

where {\varepsilon_1, \varepsilon_2, \ldots} are i.i.d. random {\pm 1} . This looks somewhat similar to the corresponding value

\displaystyle  g(T) = \mathbb{E} \max_{t \in T} \sum_{i \geq 1} g_i t_i\,,

where {g_1, g_2, \ldots} are i.i.d. standard normal random variables. But while {g(T)} can be characterized (up to universal constant factors) by the Fernique-Talagrand majorizing measures theory, no similar control was known for {b(T)} . One stark difference between the two cases is that {g(T)} depends only on the distance geometry of {T} , i.e. {g(A(T))=g(T)} whenever {A} is an affine isometry. On the other hand, {b(T)} can depend heavily on the coordinate structure of {T} .

There are two basic ways to prove an upper bound on {b(T)} . One is via the trivial bound

\displaystyle  b(T) \leq \max_{t \in T} \|t\|_1\,. \ \ \ \ \ (1)

The other uses the fact that the tails of Gaussians are “fatter” than those of Bernoullis.

\displaystyle  b(T) \leq \sqrt{\frac{\pi}{2}} g(T)\,. \ \ \ \ \ (2)

This can be proved easily using Jensen’s inequality.

Talagrand’s Bernoulli conjecture is that {b(T)} can be characterized by these two upper bounds.

Bernoulli conjecture: There exists a constant {C > 0} such that for every {T \subseteq \ell^2} , there are two subsets {T_1, T_2 \subseteq \ell^2} such that

\displaystyle  T \subseteq T_1 + T_2 = \{ t_1 + t_2 : t_1 \in T_1, t_2 \in T_2 \}\,,

and

\displaystyle  g(T_1) + \sup_{t \in T_2} \|t\|_1 \leq C b(T)\,.

Note that this is a “characterization” because given such sets {T_1} and {T_2} , equations (1) and (2) imply

\displaystyle  b(T) \leq \sqrt{\frac{\pi}{2}} g(T_1) + \sup_{t \in T_2} \|t\|_1\,.

The set {T_1} controls the “Gaussian” part of the Bernoulli process, while the set {T_2} controls the part that is heavily dependent on the coordinate structure.

This beautiful problem finally appears to have met a solution. While I don’t know of any applications yet in TCS, it does feel like something powerful and relevant.

February 7, 2013

A bit more on expanders…

Filed under: Math — Tags: , , — James Lee @ 11:56 am

So the lecture notes I posted a year ago giving a “simpler proof” of expansion for the Margulis-Gabber-Galil graphs turns out to be very similar to arguments of M. Berger (1991) and Y. Shalom (1999) for proving that SL_2(\mathbb Z) \ltimes \mathbb Z^2  has property (T). See Section 4.2 of the book of Bekka, de la Harpe, and Valette.

Anyone interested in expander graphs should take a look at Amir Yehudayoff’s recent article in the SIGACT newsletter: Proving expansion in three steps. Amir uses a ping-pong lemma argument (which is essentially the same as the “combinatorial lemma” in the preceding post) to exemplify the “opening” of the three part strategy of Bourgain and Gamburd to proving expansion in Cayley graphs.

January 27, 2013

Expanders from the action of GL(2,Z)

Filed under: Math — Tags: , , , — James Lee @ 8:14 pm

Many months (and only one post) ago, I wrote about an analysis of the Gabber-Galil graphs using a simple combinatorial lemma and the discrete Cheeger inequality. Here is a note I posted recently carrying the study a bit further.

Given any two invertible, integral matrices {S,T \in GL_2(\mathbb Z)} , one can consider the family of graphs {G_n^{S,T} = (V_n, E_n^{S,T})} , where {E_n^{S,T}} contains edges from every {(x,y) \in V_n} to each of

\displaystyle (x \pm 1,y), (x,y\pm 1), S(x,y), S^{-1}(x,y), T(x,y), T^{-1}(x,y)\,.

The Gabber-Galil graphs correspond to the choice {S = \left(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\right)} and {T = \left(\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}\right)} .

Consider also the countably infinite graph {G^{S,T}} with vertex set {\mathbb Z^2 \setminus \{0\}} and edges

\displaystyle E^{S,T} = \left\{ \{z,S z\}, \{z, T z\} : z \in \mathbb Z^2 \setminus \{0\} \right\}\,.

Using some elementary Fourier analysis and a discrete Cheeger inequality, one can prove the following relationship (we use S' to represent the transpose of a matrix S ).

Theorem 1 For any {S,T \in GL_2(\mathbb Z)} , if {G^{S',T'}} has positive Cheeger constant, then {\{G_n^{S,T}\}} is a family of expander graphs.

We recall that an infinite graph {G=(V,E)} with uniformly bounded degrees has positive Cheeger constant if there is a number {\epsilon > 0} such that every finite subset {U \subseteq V} has at least {\epsilon |U|} edges with exactly one endpoint in {U} .

While Theorem 1 may not seem particularly powerful, it turns out that in many interesting cases, proving a non-trivial lower bound on the Cheeger constant of {G^{S,T}} is elementary. For the Gabber-Galil graphs, the argument is especially simple. The following argument is, in fact, significantly simpler than the analysis of the previous post.

In the next lemma, let G=G^{S,T} where S(x,y)=(x+y,x) and T(x,y)=(x,y+x) and let E=E^{S,T} be the corresponding edge set. We will prove that G has positive Cheeger constant.

Lemma 2 For any finite subset {A \subseteq \mathbb Z^2 \setminus \{0\}} , we have { |E(A, \bar A)| \geq |A| } .

Proof: Define {Q_1 = \{ (x,y) \in \mathbb Z^2 : x > 0, y \geq 0 \}} . This is the first quadrant, without the {y} -axis and the origin. Define {Q_2, Q_3, Q_4} similarly by rotating {Q_1} by {90} , {180} , and {270} degrees, respectively, and note that we have a partition {\mathbb Z^2 \setminus \{0\} = Q_1 \cup Q_2 \cup Q_3 \cup Q_4} .

Let {A_i = A \cap Q_i} . We will show that {|E(A_1, \bar A \cap Q_1)| \geq |A_1|} . Since our graph is invariant under rotations of the plane by {90^{\circ}} , this will imply our goal:

\displaystyle  |E(A, \bar A)| \geq \sum_{i=1}^4 |E(A_i, \bar A \cap Q_i)| \geq \sum_{i=1}^4 |A_i| = |A|\,.

It is immediate that {S(A_1), T(A_1) \subseteq Q_1} . Furthermore, we have {S(A_1) \cap T(A_1) = \emptyset} because {S} maps points in {Q_1} above (or onto) the line {y=x} and {T} maps points of {Q_1} below the line {y=x} . Furthermore, {S} and {T} are bijections, thus {|S(A_1) + T(A_1)| = |S(A_1)| + |T(A_1)| = 2|A_1|\,.} In particular, this yields {|E(A_1, \bar A \cap Q_1)| \geq |A_1|} , as desired. \Box

One can generalize the Gabber-Galil graphs in a few different ways. As a prototypical example, consider the family {\{G_n^{S,S'}\}} for any {S \in GL_2(\mathbb Z)} . An elementary analysis yields the following characterization.

Theorem 3 For any {S =\left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right) \in GL_2(\mathbb Z)} , it holds that {\{G_n^{S,S'}\}} is an expander family if and only if {(a+d)(b-c) \neq 0} .

For instance, the preceding theorem implies that if {S} has order dividing {4} then {\{G_n^{S,S'}\}} is not a family of expander graphs, but if {S} has order {3, 6, 12} or infinite order (the other possibilities) and {S \neq S'} then the graphs are expanders.

Here is a different sort of generalization. Let {R = \left(\begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix}\right)} be the reflection across the line {y=x} . The Gabber-Galil graphs can also be seen as {G_n^{S,T}} where {S = \left(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\right)} and {T = RSR} . We can characterize expansion of these graphs as well.

Theorem 4 For any {S =\left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right) \in GL_2(\mathbb Z)} , it holds that {\{G_n^{S,RSR}\}} is an expander family if and only if {(a+d)(b+c) \neq 0} .

It is an interesting open problem to find a complete characterization of the pairs {S,T \in GL_2(\mathbb Z)} such that {\{G_n^{S,T}\}} is a family of expanders.

May 2, 2012

Solved problems, open problems, and re-solved problems

Filed under: Math, Reading — James Lee @ 12:15 am

Two open problems from this blog were solved recently. First, there is a solution by Bartal, Gottlieb, and Krauthgamer of the Doubling TSP problem. Next, Raghu Meka gave a deterministic procedure to compute the expected supremum of a Gaussian process.

If you’re thirsting for more open problems, I suggest this list on analysis of Boolean functions, compiled by Ryan O’Donnell in conjunction with the 2012 Simons Symposium.

Finally, in the realm of re-proving theorems, let me mention this absolutely gorgeous proof by Lovett and Meka of Spencer’s six standard deviations suffice theorem in discrepancy theory. The proof is constructive (i.e., there exists a polynomial-time algorithm to construct the 2-coloring) as in recent groundbreaking work of Bansal, but unlike Bansal’s proof is independent of Spencer’s bound. It also happens to be elementary and beautiful; my thanks to Aravind Srinivasan for pointing it out.

April 18, 2012

Gabber-Galil analysis of Margulis’ expanders

Filed under: Math — Tags: , , — James Lee @ 8:11 pm

I’m currently teaching a course on spectral graph theory, and I decided to lecture on Margulis’ expander construction and the analysis of its spectral gap by Gabber and Galil. I had never realized how intuitive the analysis could be; the lectures notes I had seen didn’t quite reflect this. In particular, we won’t do any unsightly calculations. Everything I’m about to say is probably well-known, but I thought I would write it down anyway.

The idea is to first start with an initial “expanding object,” and then try to construct a family of graphs out of it. First, consider the infinite graph {G} with vertex set {\mathbb Z^2} . The edges are given by two maps {S} and {T} , where {S(x,y) = (x,x+y)} and {T(x,y) = (x+y,y)} . So the edges are {\{(x,y), S(x,y)\}} and {\{(x,y), T(x,y)\}} . Clearly {(0,0)} is not adjacent to anything. Except for the origin, this graph is an expander in the following sense.

Lemma 1 For any subset {A \subseteq \mathbb Z^2 \setminus \{0\}} , we have

\displaystyle  |E(A)| \geq \frac{1}{3} |A|\,,

where {E(A)} denotes the edges leaving {A} .

The following simple proof is inspired by a paper of Linial and London.

Proof: First consider a subset {A} that does not intersect the coordinate axes. Let {Q_1, Q_2, Q_3, Q_4} represent the four quadrants of {\mathbb Z^2} , and let {A_i = A \cap Q_i} . Consider that {S(A_1), T(A_1) \subseteq Q_1} and {S(A_1) \cap T(A_1) = \emptyset} . The latter fact follows because if {(x,x+y)=(x'+y',y')} then {x'=-y} . Similarly, {S^{-1}(A_2), T^{-1}(A_2) \subseteq Q_2} and {S(A_3), T(A_3) \subseteq Q_3} and {S^{-1}(A_4), T^{-1}(A_4) \subseteq Q_4} , while all the respective pairs {S^{-1}(A_2), T^{-1}(A_2)} and {S(A_3), T(A_3)} and {S^{-1}(A_4), T^{-1}(A_4)} are disjoint.

Combining this with the fact that {S} and {T} are bijections on {\mathbb Z^2} immediately shows that {|S(A) \cup S^{-1}(A) \cup T(A) \cup T^{-1}(A)|} is at least

\displaystyle  |S(A_1)| + |T(A_1)| + |S^{-1}(A_2)| + |T^{-1}(A_2)|

\displaystyle  + |S(A_3)| + |T(A_3)| + |S^{-1}(A_4)| + |T^{-1}(A_4)|

\displaystyle  = 2(|A_1| + |A_2| + |A_3| + |A_4|)

\displaystyle  = 2 |A|\,.

Dealing with the coordinate axes is not too hard. Suppose now that {A \subseteq \mathbb Z^2 \setminus \{0\}} is arbitrary. Let {A_{+} = A \cap \{ (x,0), (0,x) : x \in \mathbb Z \}} . The preceding argument shows {|E(A)| \geq |A\setminus A_+|-|A_+| = |A|-2|A_+|} . We will show that {|E(A)| \geq |A_+|} . Averaging these two inequalities with weights {1/3} and {2/3} completes the proof.

Let {A_\times = A \cap \{ (x,x), (x,-x) : x \in \mathbb Z \}} , and {A_\circ = A \setminus (A_1 \cup A_2)} . Then we have the bounds:

\displaystyle |E(A_+, \bar A)| \geq |A_+| - |A_\times|

\displaystyle |E(A_\times, \bar A)| \geq 2|A_\times| - |A_\circ|

\displaystyle |E(A_\circ, \bar A)| \geq |A_\circ| - |A_\times|\,.

The first equation follows since each element of {A_+} is connected to exactly two elements of {A_{\times}} , and each element of {A_{\times}} is connected to exactly two elements of {A_+} . For instance, {(x,0)} is connected to {(x,x)} and {(x,-x)} , while {(x,x)} is connected to {(x,0)} and {(0,x)} .

The second follows because every point of {A_\times} is connected to two unique points of {A_\circ} , e.g. {(x,x)} is connected to {(x,2x)} and {(2x,x)} . The final inequality follows from the fact that {|E(A_\circ)| \geq |A_\circ|} from our first argument (since {A_{\circ}} does not touch the axes), and because {A_\circ} has no edges to {A_+} . Summing these three inequalities yields {|E(A)| \geq |A_+|} . \Box

Of course, {\mathbb Z^2} is not a finite graph, so for a parameter {n} , we define the graph {G_n=(V_n,E_n)} with vertex set {V_n = \mathbb Z_n \oplus \mathbb Z_n} , where {\mathbb Z_n = \mathbb Z/(n \mathbb Z)} . There are four types of edges in {E_n} : A vertex {(x,y)} is connected to the vertices

\displaystyle \{(x,y+1), (x+1, y), (x,x+y), (x+y,y)\}\,.

This yields a graph of degree at most 8.

Theorem 2 There is a constant {c > 0} such that for every {n \in \mathbb N} ,

\displaystyle  \lambda_2(G_n) \geq c\,,

where {\lambda_2} is the second-smallest eigenvalue of the Laplacian on {G_n} .

Of course, the graphs {\{G_n\}} now have torsion, and thus our expansion result for {\mathbb Z^2} is not, a priori, very useful. The non-trivial idea of the proof is that the torsion doesn’t matter, making Lemma 1 applicable. This is not difficult to show using some Fourier analysis. It turns out to be better though, if we first move to the continuous torus.

Recall that,

\displaystyle  \lambda_2(G_n) = \min_{f : V_n \rightarrow \mathbb R} \left\{\frac{\sum_{uv \in E_n} (f(u)-f(v))^2}{\sum_{u \in V_n} f(u)^2} : \sum_{u \in V_n} f(u) = 0\right\}\,,

Let {\mathbb T^2 = \mathbb R^2/\mathbb Z^2} be the 2-dimensional torus equipped with the Lebesgue measure, and consider the complex Hilbert space

\displaystyle  L^2(\mathbb T^2) = \left\{ f : \mathbb T^2 \rightarrow \mathbb C : \int_{\mathbb T^2} |f|^2 < \infty \right\}\,.

equipped with the inner product {\langle f,g\rangle_{L^2} = \int_{\mathbb T^2} f \bar g} .

We might also define a related value,

\displaystyle   \lambda_2(\mathbb T^2) = \min_{f \in L^2(\mathbb T^2)} \left\{\frac{\|f-f\circ S\|_{L^2}^2 + \|f-f \circ T\|_{L^2}^2}{\|f\|_{L^2}^2} : \int_{\mathbb T^2} f = 0\right\}\,. \ \ \ \ \ (1)


Note that this is just defined as a number; the eigenvalue notation is merely suggestive, but we have not introduced an operator on the torus.

Claim 1 There is some {\varepsilon > 0} such that for any {n \in \mathbb N} , we have {\lambda_2(G_n) \geq \varepsilon \lambda_2(\mathbb T^2)} \

Proof: We simply sketch a proof, which is rather intuitive. Suppose we are given some map {f : V_n \rightarrow \mathbb R} such that {\sum_{u \in V_n} f(u)=0} . Define its continuous extension {\tilde f : \mathbb T^2 \rightarrow \mathbb R} as follows: Under the natural embedding of {\mathbb Z_n \oplus \mathbb Z_n} into {\mathbb T^2} , every point {z \in \mathbb T^2} sits inside a grid square with four corners {u_1,u_2,u_3,u_4 \in \mathbb Z_n \oplus \mathbb Z_n} . Writing a local convex combination {z = \lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4} , we define

\displaystyle  \tilde f(z) = \lambda_1 f(u_1) + \lambda_2 f(u_2) + \lambda_3 f(u_3) + \lambda_4 f(u_4)\,.

Now it is elementary to verify that {\int_{\mathbb T^2} \tilde f = 0} . It is also easy to verify that {\|\tilde f\|^2_{L^2} \geq c\frac{1}{n} \sum_{v \in V} f(v)^2} for some {c > 0} . (Even if {f(u_1)+f(u_2)+f(u_3)+f(u_4)=0} , we still get a contribution from {f(u_1)^2 + f(u_2)^2 + f(u_3)^2 + f(u_4)^2} to {\|\tilde f\|_{L^2}} on this square because we are taking a weighted average.)

Finally, for any {z \in \mathbb T^2} , there is a path of length {O(1)} in {G_n} connecting each of the corners of {z} ‘s square to the corners of {S(z)} ‘s square. A similar fact holds for {T(z)} . In fact, this is the only place where we need to use the fact that edges of the form {(x,y) \leftrightarrow (x,y+1)} and {(x,y) \leftrightarrow (x+1,y)} are present in {G_n} . Thus any contribution {|\tilde f(z)-\tilde f(S(z))|^2} to {\|\tilde f-\tilde f\circ S\|_{L^2}^2} can be charged against a term in {\sum_{uv \in E_n} (f(u)-f(v))^2} , and similarly for {T} . \Box

Now our goal is to show that {\lambda_2(\mathbb T^2) > 0} . We will use the Fourier transform to “remove the torsion.” The point is that {S} and {T} , being shift operators, will act rather nicely on the Fourier basis.

We recall that if {m,n \in \mathbb N} and we define {\chi_{m,n} \in L^2(\mathbb T^2)} by {\chi_{m,n}(x,y) = \exp(2\pi i(mx+ny))} , then {\{\chi_{m,n}\}} forms an orthonormal Hilbert basis for {L^2(\mathbb T^2)} . In particular, every {f \in L^2(\mathbb T^2)} can be written uniquely as

\displaystyle  f = \sum_{m,n \in \mathbb Z} \hat f_{m,n} \chi_{m,n}\,,

where {\hat f_{m,n} = \langle f, \chi_{m,n}\rangle_{L^2}} . The Fourier transform is defined as a map {\mathcal F~:~L^2(\mathbb T^2) \rightarrow \ell^2(\mathbb Z^2)} , where {\mathcal F f(m,n) = \hat f_{m,n}} . In particular, {\mathcal F} is a linear isometry.

Define {S^*(f) = f \circ S} and {T^*(f) = f \circ T} . Then for any {m,n \in \mathbb Z} , we have

\displaystyle  S^*(\chi_{m,n}) = \chi_{m+n,n} \quad\textrm{ and }\quad T^*(\chi_{m,n}) = \chi_{m,n+m}.

In particular, for any {f \in L^2(\mathbb T^2)} , we have {\widehat{S^*(f)} = \sum_{m,n} \hat f_{m-n,n} \chi_{m,n}} and {\widehat{T^*(f)} = \sum_{m,n} \hat f_{m,n-m} \chi_{m,n}} . The final thing to note is that {\hat f(0,0) = \langle f, \chi_{0,0} \rangle = \int_{\mathbb T^2} f} . So now if we simply apply the Fourier transform (a linear isometry) to the expression in (1), we get a reformulation that {\lambda_2(\mathbb T^2)} is precisely

\displaystyle  \min_{g \in \ell^2(\mathbb Z^2)} \left\{ \frac{\sum_{z \in \mathbb Z^2} |g(z)-g(S(z))|^2 + |g(z)-g(T(z))|^2}{\|g\|^2_{\ell^2}}: g(0,0)=0 \right\}\,.

Here we have simply replaced {f} by {\mathcal F f} in (1), and then written {g = \mathcal F f} .

But now recall our initial infinite graph {G} on {\mathbb Z^2} . If we denote by {L_G} the Laplacian on {G} , then we can rewrite this as,

\displaystyle  \lambda_2(\mathbb T^2) = \min_{g \in \ell^2(\mathbb Z^2)} \left\{ \frac{\langle g, L_G g\rangle_{\ell^2}}{\|g\|^2_{\ell^2}}: g(0,0)=0 \right\}\,.

In other words, it is precisely the first Dirichlet eigenvalue on {G} , subject to the boundary condition {g(0,0)=0} .

But now the discrete Cheeger inequality tells us that

\displaystyle  \lambda_2(\mathbb T^2) \geq \frac1{2 d_{\max}} h^2,

where {h} is the minimal expansion of any set not containing the origin. Thus we have indeed unwrapped the torsion and returned to our initial question. Lemma 1 shows that {h \geq 1/3} , yielding the desired lower bound on {\lambda_2(\mathbb T^2)} .

January 11, 2012

A simpler proof of the KPR theorem

1. Introduction

The Klein-Plotkin-Rao (KPR) Theorem is a powerful statement about the geometry of planar graphs and their generalizations. Here, I’ll present a new, very simple proof of the theorem that was discovered in joint work with Cyrus Rashtchian. (This will appear in a preprint soon, together with some new results.) In the next post, I’ll give some applications in geometry and algorithms.

Recall that a graph is planar if it can be drawn in the plane without any edge crossings. Wagner’s theorem gives an intrinsic characterization of planar graphs in terms of excluded minors. Recall that a graph {H} is a minor of a graph {G} if {H} can be obtained from {G} by a sequence of (i) edge and vertex deletions and (ii) contraction of edges. A graph {G} excludes {H} as a minor if {H} is not a minor of {G}. Kuratowki’s theorem states that planar graphs are precisely those which exclude both {K_5} and {K_{3,3}} as minors, where we use {K_h} and {K_{h,h}} to denote the complete graph on {h} vertices and the complete {h}-by-{h} bipartite graph, respectively. In this post, we are particularly concerned with {K_h}-minor-free graphs, i.e. those which exclude {K_h} as a minor for some {h \geq 2}.

I’ll first state and prove a simpler version of the KPR theorem. In the next post, I’ll discuss a stronger statement (in the language of random partitions) that follows directly from the proof. Then using these partitions, we will show that the observable diameter of every {K_h}-minor-free graph is “large,” and use that fact to prove an upper bound on the uniform multi-commodity flow gap in such graphs.

2. Low-diameter graph partitioning

Consider a finite graph {G=(V,E)} equipped with its shortest-path metric {d_G} (much of what we say here extends to infinite graphs). For now, all the edges of {G} will have length one, although we will generalize to arbitrary weighted graphs for some applications in the next post.

Given a subset {S \subseteq V}, we write {\mathsf{diam}(S) = \max_{x,y \in S} d_G(x,y)\,.} A weak but simpler version of the KPR Theorem can be stated as follows.

Theorem 1 Let {G=(V,E)} be a graph that excludes {K_h} as a minor. Then for every number {\Delta \geq 1}, there exists a partition {V = S_1 \cup S_2 \cup \cdots \cup S_m} such that {\mathsf{diam}(S_i) \leq \Delta} for every {i=1,2,\ldots,m} and at most an {O(h^2/\Delta)}-fraction of edges of {G} go between different sets in the partition.

The theorem was originally proved with a dependence of {O(h^3)}, but this was improved to {O(h^2)} by Fakcharoenphol and Talwar. Today I will prove the {O(h^2)} bound. The partitioning will be accomplished via an iterative operation which we will call chopping.

Consider any connected graph {H} and a number {\tau \geq 1}. We will describe an operation which we call a {\tau}-chop of {H}. Let {x_0 \in V(H)} be any node of {H}, which we will call the “root node” of the chop, and let {r_0 \in [1,\tau]} (the “initial offset”).

Figure 1

The chopping operation is as follows: We partition {V(H)=\bigcup_{j=0}^{\infty} A_j}, where {A_0 = \{ v \in V(H) : d_H(x_0, v) < r_0 \}}, and the rest of the sets are the disjoint annuli,

\displaystyle  A_j =\left\{ v \in V(H) : r_0 + (j-1)\tau \leq d_H(x_0, v) < r_0 + j\tau \right\}\,,

for {j=1,2,\ldots}. See Figure 1 for an example of these cuts (the red and blue alternating regions) on a grid graph. Of course, since {H} is finite, eventually the annuli are empty.

We define a {\tau}-chop on a possibly disconnected graph as the partition arising from doing a {\tau}-chop on each of its connected components. Finally, we define a {\tau}-chop on a sequence of disjoint sets {S_1, S_2, \ldots, S_k \subseteq V} as the result of doing a {\tau}-chop on each of the induced graphs {G[S_i]} for {i=1,2,\ldots,m}. Thus if we have an initial partition {P} of {V}, then a {\tau}-chop of {P} produces a refined partition {P'} of {V}. See Figure 2 for the result of two iterated \tau-chops applied to the grid graph. The yellow circles represent the root nodes in the second chop.

Figure 2

We can now state the main technical result needed to prove Theorem 1.

Lemma 2 If {G=(V,E)} excludes {K_h} as a minor, then for any {\tau \geq 1}, any sequence of {h-1} iterated {\tau}-chops on {V} results in a partition {V = S_1 \cup S_2 \cup \cdots \cup S_m} such that {\mathsf{diam}(S_i) \leq O(h \tau)} for each {i=1,2,\ldots,m}.

Observe that {\mathsf{diam}(\cdot)} refers to the diameter in {d_G}, the shortest-path metric on {G}. Also, note that we do not constrain the root node or the initial offset of the chops. Klein, Plotkin, and Rao prove this lemma with a dependence of {O(h^2 \tau)} on the diameter. FT use a more complicated approach.

To see how Lemma 2 implies Theorem 1, one proceeds as follows. First, let {C} be large enough so that setting {\tau = \Delta/(Ch)} in Lemma 2 yields a partition into sets of diameter at most {\Delta}. After fixing the root node for a {\tau}-chop of {G}, one can consider the initial offsets {r_0=1,2,\ldots,\tau}. An edge {\{x,y\}} can be cut (i.e. {x} and {y} end up in distinct sets of the partition) in at most one of these offsets. Thus there exists an offset that cuts only a {1/\tau = O(h/\Delta)}-fraction of edges. Since we perform {h-1} iterated {\tau}-chops, there exists a choice of initial offsets that cuts at most {(h-1)/\tau = O(h^2/\Delta)}-fraction of edges. That completes the reduction.

3. A sketch

Before moving onto the formal argument, I’ll present a simple sketch that contains the main ideas. The proof is by contradiction; if we perform a sequence of {h} {\tau}-chops and the diameter of any remaining piece fails to be {O(h \tau)}, then we will construct a {K_{h+1}} minor in {G}.

First, we give an equivalent characterization of when a graph {G=(V,E)} has a graph {H} as a minor: There exist disjoint connected subsets {S_1, S_2, \ldots, S_{|V(H)|} \subseteq V}, one corresponding to each vertex of {H}. We call these supernodes. Furthermore, there should be an edge between supernodes whenever there is an edge between the corresponding vertices in {H}.

Figure 3

Now, the proof is by induction. Note that the base case {h=0} is trivial since a {K_1} minor is a single vertex. By induction, we can assume that if a sequence of {h} chops fails, then there must be a {K_h} minor contained in some offending annulus. See Figure 3. If we could ensure that every supernode of the minor touched the upper boundary of the annulus as in Figure 3, we could easily construct a {K_{h+1}} minor and be done, by simplying choosing the {(h+1)}-st supernode to be a ball around {x_0}.

Thus we need to enforce this extra property of our {K_h} minor. The (very simple) idea is contained in Figure 4.

Figure 4

After finding a {K_h} minor that intersects the annuli, we extend the supernodes to touch the upper boundary of the annulus from the preceding chop (which is represented by the purple line in the picture). The point is that we can choose these paths to be contained above the red boundary (and thus disjoint from the {K_h} supernodes), and also each of length at most {2\tau+1} since the width of the “purple” annulus is only {\tau}. The same can be done for {x_0}. (If we didn’t care that the paths have to be above the red boundary, we could choose them of length only {\tau}.)

The only issue is that we need these new paths to be disjoint. Since the paths are always short (length at most {O(\tau)}), we can enforce this by making sure that each supernode contains a representative and these representatives are pairwise far apart; then we grow the paths from the representatives. Initially, the representatives will be {\Omega(h \tau)} apart, and then as we go up the inductive chain, they will get closer by at most {O(\tau)} at every step. By choosing the initial separation large enough, they will remain disjoint. That’s the sketch; it should be possible to reproduce the proof from the sketch alone, but we now present a more formal proof.

4. The proof

We need a couple definitions. First, given a subset of vertices {V_0 \subseteq V} and a number {\tau \geq 0}, we say that a set {S} is {\tau}-dense in {V_0} if every element of {V_0} can reach an element of {S} by a path of length {\tau} that is contained completely in {G[V_0]} (the induced graph on {V_0}). Second, we say that an {H}-minor is {R}-represented by {S} if every supernode of {H} contains a representative from {S} and these representatives are pairwise distance more than {R} apart in the metric {d_G} (the global shortest-path metric on {G}). We now state a lemma that we can prove by induction and implies Lemma 2.

Lemma 3 Let {h \geq j \geq 0} and {\tau \geq 1} be any numbers. Suppose {G=(V,E)} is a graph and there is any sequence of {j} iterated {\tau}-chops on {G} that leaves a component of diameter more than {16 h\tau}. Then for any set {S} that is {\tau}-dense in {V}, one can find a {K_{j+1}}-minor that is {6(h-j)\tau}-represented by {S}.

One can recover Lemma 2 by setting {h=j} and {S=V}.

Proof: We proceed by induction on {j}. The case {j=0} is trivial since a {K_1} minor is simply a single vertex. Thus we may assume that {h \geq j \geq 1}.

The following figure will be a useful reference.

Figure 5

In general, we argue as follows. Let {S} be any set satisfying the assumptions of the lemma. Assume there is a sequence of {j} iterated {\tau}-chops that leaves a component of diameter more than {16 h \tau}. Then there must be some annulus {A_k} of the first {\tau}-chop such that {j-1} iterated {\tau}-chops on {A_k} leaves a component of diameter more than {16 h \tau}.

Suppose the first {\tau}-chop has root node {x_0 \in V} and initial offset {r_0}. Let {S' \subseteq A_k} be the set of nodes at distance exactly {r_0 + (k-1) \tau}, i.e. the upper boundary of {A_k}. Observe that {S'} is {\tau}-dense in {A_k} by construction. Thus by induction, there is a {K_{j}}-minor in {G[A_k]} that is {6(h-j+1)\tau}-represented by {S'}. Let {V_1, V_2, \ldots, V_j} be the supernodes of this minor, and let {\{ v_i \in S' \cap V_i : i=1,2,\ldots,j \}} be the representatives which are further than {6(h-j+1)\tau} apart.

We now extend this to a {K_{j+1}}-minor which is {6(h-j)\tau}-represented by {S}. First, we may assume that {k \geq 6h+1}. Otherwise, all the points of {A_k} lie in a ball of radius at most {r_0 + k \tau \leq (6h+2)\tau}, and hence {A_k} has diameter at most {(12h+4)\tau \leq 16h \tau}. In particular, we know that {d_G(x_0, v_i) \geq 6h\tau} for every {i=1,2,\ldots, j}.

Now for each {i=1,2,\ldots,j}, we choose a point {v_i' \in S \setminus A_i} such that {v_i'} is connected to {v_i} by a path of length at most {3\tau} in {G}. This can be done by first going up a shortest-path from {v_i} to {x_0} of length {\tau+1} to reach a point {v_i''}, and then choosing any point of {S} within distance {\tau} of {v_i''} (which can always be done since {S} is {\tau}-dense in {V}). We add this path to {V_i} to get a new supernode {V_i'}. Observe that the sets {\{V_i' : i=1,2,\ldots,j\}} are all connected and pairwise disjoint since the new paths are outside the annulus {A_k} and the paths themselves are pairwise disjoint because they are each of length at most {3\tau}, but they emanate from representatives that are more than {6(h-j+1)\tau \geq 6\tau} apart. In fact, this also shows that the representatives {\{ v_i' \in S : i=1,2,\ldots,j \}} are further than {6(h-j)\tau} apart, as required.

Finally, we construct a new supernode {V_0} as follows. For each {i=1,2,\ldots, j}, let {u_i \in V_i'} be the closest node to {x_0}, and let {s_0 \in S} be the closest node in {S} to {x_0}. For each {i}, let {P_i} be a shortest-path from {x_0} to {u_i} without its endpoint {u_i}, and let {P} be a shortest-path to {s_0}, including {s_0}. We now set {V_0 = P \cup P_1 \cup P_2 \cup \cdots \cup P_j}. We claim that {\{ V_0, V_1', \ldots, V_j' \}} forms a {K_{j+1}}-minor which is {6(h-j)\tau}-represented by {S}. First, it is clear that {d_G(s_0, v_i) > 6(h-j)\tau} since {d_G(s_0, x_0) \leq \tau} (again, because {S} is {\tau}-dense in {V}). For the same reason, the path {P} is disjoint from all the sets {\{V_i'\}}.

Thus the only possible obstruction to having a valid {K_{j+1}}-minor is if some path {P_i} intersects a set {V'_{\ell}} for {i \neq \ell}. We now show that this cannot happen. We know that if {P_i} intersects {V'_{\ell}}, then it must have already traveled distance at least {(k-1)\tau - 3\tau} away from {x_0}. But {P_i} contains a node adjacent to {V_i} (by construction), which means it continues an additional distance of {3\tau} (the distance between {V'_{\ell} \setminus V_{\ell}} and {V'_{i} \setminus V_i)}. This additional distance is also moving away from {x_0}, implying that {P_i} intersects {A_k}, which is impossible. This completes the proof.

5. The dependence on {h}

The best-known lower bound requires the conclusion of Theorem 1 to cut at least an {\Omega((\log h)/\Delta)}-fraction of edges. (One can take {G=(V,E)} to be an {n}-vertex 3-regular expander graph, which obviously excludes {K_n} as a minor. Now it is easy to see that for some constant {c}, partitioning into pieces of diameter at most {c \log n} must cut at least an {\Omega(1)}-fraction of edges.) In some special cases, e.g. graphs of genus {g} (which exclude {K_{c \lceil \sqrt{g}\rceil}} as a minor for some constant {c > 0}), one can reduce the bound to {O(\log g)} (see this joint work with Sidiropoulos). This leads to the following open problem.

Open problem: Show that under the assumptions of Theorem 1, one can find a partition that cuts only an {O((\log h)/\Delta)}-fraction of edges.

A positive resolution would yield an optimal unifom multi-commodity flow/cut gap for {K_h}-minor-free graphs. (See the next post for details.)

February 23, 2011

PSD lifting and Unique Games integrality gaps

Filed under: Math, Open question — Tags: , , — James Lee @ 9:51 am

By now, it is known that integrality gaps for the standard Unique Games SDP (see the paper of Khot and Vishnoi or Section 5.2 of this post) can be used to obtain integrality gaps for many other optimization problems, and often for very strong SDPs coming from various methods of SDP tightening; see, for instance, the paper of Raghavendra and Steurer.

Problematically, the Khot-Vishnoi gap is rather inefficient: To achieve the optimal gap for Unique Games with alphabet size {L}, one needs an instance of size {\exp(\Omega(L))}. As far as I know, there is no obstacle to achieving a gap instance where the number of variables is only {\mathrm{poly}(L)}.

The Walsh-Hadamard code

The Khot-Vishnoi construction is based on the Hadamard code.
(See Section 5.2 here for a complete description.) If we use {L^2(\{-1,1\}^k)} to denote the Hilbert space of real-valued functions {f : \{-1,1\}^k \rightarrow \mathbb R}, then the Walsh-Hadamard basis of {L^2(\{-1,1\}^k))} is the set of functions of the form

\displaystyle  u_S(x) = \prod_{i \in S} x_i,

where {S \subseteq \{1,2,\ldots,k\}}.

Of course, for two such sets {S \neq T}, we have the orthogonality relations,

\displaystyle  \langle u_S, u_T \rangle = 0.

In their construction, the variables are essentially all functions of the form {f : \{-1,1\}^k \rightarrow \{-1,1\}}, of which there are {2^{2^k}}, while there are only {2^k} basis elements {\{u_S\}_{S \subseteq [k]}} which act as the alphabet for the underlying Unique Games instance. This is what leads to the exponential relationship between the number of variables and the label size.

A PSD lifting question

In an effort to improve this dependence, one could start with a much larger set of nearly orthogonal vectors, and then somehow lift them to a higher-dimensional space where they would become orthogonal. In order for the value of the SDP not to blow up, it would be necessary that this map has some kind of Lipschitz property. We are thus led to the following (possibly naïve) question.

Let {C(d,\varepsilon)} be the smallest number such that the following holds. (Here, {S^{d-1} \subseteq \mathbb R^d} denotes the {(d-1)}-dimensional unit sphere and S(L^2) denotes the unit-sphere of L^2.)

There exists a map {F : S^{d-1} \rightarrow S(L^2)} such that {\|F\|_{\mathrm{Lip}} \leq C(d,\varepsilon)} and whenever {u,v \in \mathbb R^d} satisfy {|\langle u,v\rangle| \leq \varepsilon}, we have {\langle F(u), F(v)\rangle = 0}.

(Recall that \|F\|_{\mathrm{Lip}} = \sup_{x \neq y \in S^{d-1}} \|F(x)-F(y)\|/\|x-y\|.)

One can show that

\displaystyle C(d,\varepsilon) \lesssim \frac{\sqrt{d}}{1-\varepsilon}

by randomly partitioning {S^{d-1}} so that all vectors satisfying {|\langle u,v\rangle| \leq \varepsilon} end up in different sets of the partition, and then mapping all the points in a set to a different orthogonal vector.

My question is simply: Is a better dependence on {d} possible? Can one rule out that {C(d,\varepsilon)} could be independent of {d}? Note that any solution which randomly maps points to orthogonal vectors must incur such a blowup (this is essentially rounding the SDP to an integral solution).

Older Posts »

Theme: Shocking Blue Green. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 44 other followers