Separated sets in unions of frames

In celebration of the recent resolution of the Kadison-Singer problem by Marcus, Spielman, and Srivastava, here is a question on isotropic point sets on which Kadison-Singer does not (seem to) shed any light. A positive resolution would likely have strong implications for the Sparsest Cut problem and SDP hierarchies. The question arose in discussions with Shayan Oveis Gharan, Prasad Raghavendra, and David Steurer.

Open Question: Do there exist constants \varepsilon, \delta > 0 such that for any k \in \mathbb N, the following holds? Let \mathcal B_1, \mathcal B_2, \ldots, \mathcal B_k \subseteq \mathbb R^k be a collection of k orthonormal bases and define W = \mathcal B_1 \cup \mathcal B_2 \cup \cdots \cup \mathcal B_k. Then there are subsets A,B \subseteq W with |A|,|B| \geq \varepsilon |W| and \min_{x \in A, y \in B} \|x-y\|_2 \geq \delta.

[Some additional notes: One piece of intuition for why the question should have a positive resolution is that these k orthonormal bases which together comprise at most k^2 vectors cannot possibly “fill” k-dimensional space in a way that achieves k-dimensional isoperimetry. One would seem to need \exp(O(k)) points for this.

One can state an equivalent question in terms of vertex expansion. Say that a graph G=(V,E) on k^2 vertices is a vertex expander if |N(S)| \geq 1.1 |S| for all subsets S \subseteq V with |S| \leq |V|/2. Here, N(S) denotes all the nodes that are in S or are adjacent to S. Then one can ask whether there exists a 1-1 mapping from V to \mathcal B_1 \cup \cdots \cup \mathcal B_k for some orthonormal bases \{\mathcal B_i\} such that the endpoints of every edge are mapped at most o(1) apart (as k \to \infty).]

Open question recap

In light of the nearly immediate failure of the last open question, here is a recap of the progress on the few somewhat more legitimate questions appearing here over the past few years. (Note that most of these questions are not original and appeared other places as well.)

Open question: Separated sets in isotropic measures

[Note: This question is trivially impossible, though I leave the original form below. If \mu is the uniform measure on S^{k-1}, then any two sets of \Omega(1) measure are at distance O(1/\sqrt{k}) by concentration of measure. Thus it is impossible to find k sets satisfying the desired bound. I will try to think of the right question and repost. Unfortunately, I now recall having gone down this path a few times before, and I fear there may not be a relevant elementary open question after all.]

Here is an interesting and elementary (to state) open question whose resolution would give an optimal higher-order Cheeger esimate. The goal is to replace the factor {k^{O(1)}} in Theorem 1 of the previous post with a {\mathrm{polylog}(k)} factor, or even an optimal bound of {\sqrt{\log k}} .

Fix {k \in \mathbb N} and let {\mu} denote a probability measure on the {(k-1)} -dimensional sphere {S^{k-1}} . For the purpose of this question, one can assume that {\mu} is supported on a finite number of points. Suppose also that {\mu} is isotropic in the sense that, for every {\theta \in S^{k-1}} ,

\displaystyle  \int_{S^{k-1}} \langle x, \theta\rangle^2 \,d\mu(x)= \frac{1}{k}\,.

Now our goal is to find, for some {\varepsilon, \delta > 0} , a collection of (measurable) subsets {U_1, U_2, \ldots, U_k \subseteq S^{k-1}} such that {\mu(U_i) \geq \varepsilon/k} for each {i=1,2,\ldots,k} , and such that for {i \neq j} , we have

\displaystyle  \min_{x \in U_i, y \in U_j} \|x-y\|_2 \geq \delta\,.

In Lemma 5 of the previous post, we proved that this is possible for {\varepsilon \gtrsim 1} and {\delta \gtrsim k^{-5/2}} . In this paper, we improve the estimate on {\delta} somewhat, though the best-known is still {\delta \gtrsim k^{-c}} for some {c > 0} .

Is it possible to achieve {\varepsilon, \delta \gtrsim 1/\mathrm{poly}(\log k)} ? One could even hope for {\varepsilon \gtrsim 1} and {\delta \gtrsim 1/(\log k)} .

Two extreme cases are when (1) {\mu} is uniform on {S^{k-1}} , or (2) {\mu} is uniform on the standard basis {\{e_1,e_2,\ldots,e_k\}} . In the first case, one can easily achieve {\varepsilon,\delta \gtrsim 1} (in fact, one could even get {100k} sets satisfying these bounds). In the second case, taking each set to contain a single basis vector achieves {\varepsilon = 1} and {\delta = \sqrt{2}} .

If one only wants to find, say, {\lceil 3k/4\rceil} sets instead of {k} sets, then it is indeed possible to achieve {\varepsilon \gtrsim 1} and {\delta \gtrsim 1/O(\log k)} . [Update: This is actually impossible for the reasons stated above. To get the corresponding bounds, we do a random projection into O(\log k) dimensions. This only preserves the original Euclidean distances on average.] Furthermore, for {\varepsilon \gtrsim 1} , this bound on {\delta} is asymptotically the best possible.

A simpler proof of the KPR theorem

1. Introduction

The Klein-Plotkin-Rao (KPR) Theorem is a powerful statement about the geometry of planar graphs and their generalizations. Here, I’ll present a new, very simple proof of the theorem that was discovered in joint work with Cyrus Rashtchian. (This will appear in a preprint soon, together with some new results.) In the next post, I’ll give some applications in geometry and algorithms.

Recall that a graph is planar if it can be drawn in the plane without any edge crossings. Wagner’s theorem gives an intrinsic characterization of planar graphs in terms of excluded minors. Recall that a graph {H} is a minor of a graph {G} if {H} can be obtained from {G} by a sequence of (i) edge and vertex deletions and (ii) contraction of edges. A graph {G} excludes {H} as a minor if {H} is not a minor of {G}. Kuratowki’s theorem states that planar graphs are precisely those which exclude both {K_5} and {K_{3,3}} as minors, where we use {K_h} and {K_{h,h}} to denote the complete graph on {h} vertices and the complete {h}-by-{h} bipartite graph, respectively. In this post, we are particularly concerned with {K_h}-minor-free graphs, i.e. those which exclude {K_h} as a minor for some {h \geq 2}.

I’ll first state and prove a simpler version of the KPR theorem. In the next post, I’ll discuss a stronger statement (in the language of random partitions) that follows directly from the proof. Then using these partitions, we will show that the observable diameter of every {K_h}-minor-free graph is “large,” and use that fact to prove an upper bound on the uniform multi-commodity flow gap in such graphs.

2. Low-diameter graph partitioning

Consider a finite graph {G=(V,E)} equipped with its shortest-path metric {d_G} (much of what we say here extends to infinite graphs). For now, all the edges of {G} will have length one, although we will generalize to arbitrary weighted graphs for some applications in the next post.

Given a subset {S \subseteq V}, we write {\mathsf{diam}(S) = \max_{x,y \in S} d_G(x,y)\,.} A weak but simpler version of the KPR Theorem can be stated as follows.

Theorem 1 Let {G=(V,E)} be a graph that excludes {K_h} as a minor. Then for every number {\Delta \geq 1}, there exists a partition {V = S_1 \cup S_2 \cup \cdots \cup S_m} such that {\mathsf{diam}(S_i) \leq \Delta} for every {i=1,2,\ldots,m} and at most an {O(h^2/\Delta)}-fraction of edges of {G} go between different sets in the partition.

The theorem was originally proved with a dependence of {O(h^3)}, but this was improved to {O(h^2)} by Fakcharoenphol and Talwar. Today I will prove the {O(h^2)} bound. The partitioning will be accomplished via an iterative operation which we will call chopping.

Consider any connected graph {H} and a number {\tau \geq 1}. We will describe an operation which we call a {\tau}-chop of {H}. Let {x_0 \in V(H)} be any node of {H}, which we will call the “root node” of the chop, and let {r_0 \in [1,\tau]} (the “initial offset”).

Figure 1

The chopping operation is as follows: We partition {V(H)=\bigcup_{j=0}^{\infty} A_j}, where {A_0 = \{ v \in V(H) : d_H(x_0, v) < r_0 \}}, and the rest of the sets are the disjoint annuli,

\displaystyle  A_j =\left\{ v \in V(H) : r_0 + (j-1)\tau \leq d_H(x_0, v) < r_0 + j\tau \right\}\,,

for {j=1,2,\ldots}. See Figure 1 for an example of these cuts (the red and blue alternating regions) on a grid graph. Of course, since {H} is finite, eventually the annuli are empty.

We define a {\tau}-chop on a possibly disconnected graph as the partition arising from doing a {\tau}-chop on each of its connected components. Finally, we define a {\tau}-chop on a sequence of disjoint sets {S_1, S_2, \ldots, S_k \subseteq V} as the result of doing a {\tau}-chop on each of the induced graphs {G[S_i]} for {i=1,2,\ldots,m}. Thus if we have an initial partition {P} of {V}, then a {\tau}-chop of {P} produces a refined partition {P'} of {V}. See Figure 2 for the result of two iterated \tau-chops applied to the grid graph. The yellow circles represent the root nodes in the second chop.

Figure 2

We can now state the main technical result needed to prove Theorem 1.

Lemma 2 If {G=(V,E)} excludes {K_h} as a minor, then for any {\tau \geq 1}, any sequence of {h-1} iterated {\tau}-chops on {V} results in a partition {V = S_1 \cup S_2 \cup \cdots \cup S_m} such that {\mathsf{diam}(S_i) \leq O(h \tau)} for each {i=1,2,\ldots,m}.

Observe that {\mathsf{diam}(\cdot)} refers to the diameter in {d_G}, the shortest-path metric on {G}. Also, note that we do not constrain the root node or the initial offset of the chops. Klein, Plotkin, and Rao prove this lemma with a dependence of {O(h^2 \tau)} on the diameter. FT use a more complicated approach.

To see how Lemma 2 implies Theorem 1, one proceeds as follows. First, let {C} be large enough so that setting {\tau = \Delta/(Ch)} in Lemma 2 yields a partition into sets of diameter at most {\Delta}. After fixing the root node for a {\tau}-chop of {G}, one can consider the initial offsets {r_0=1,2,\ldots,\tau}. An edge {\{x,y\}} can be cut (i.e. {x} and {y} end up in distinct sets of the partition) in at most one of these offsets. Thus there exists an offset that cuts only a {1/\tau = O(h/\Delta)}-fraction of edges. Since we perform {h-1} iterated {\tau}-chops, there exists a choice of initial offsets that cuts at most {(h-1)/\tau = O(h^2/\Delta)}-fraction of edges. That completes the reduction.

3. A sketch

Before moving onto the formal argument, I’ll present a simple sketch that contains the main ideas. The proof is by contradiction; if we perform a sequence of {h} {\tau}-chops and the diameter of any remaining piece fails to be {O(h \tau)}, then we will construct a {K_{h+1}} minor in {G}.

First, we give an equivalent characterization of when a graph {G=(V,E)} has a graph {H} as a minor: There exist disjoint connected subsets {S_1, S_2, \ldots, S_{|V(H)|} \subseteq V}, one corresponding to each vertex of {H}. We call these supernodes. Furthermore, there should be an edge between supernodes whenever there is an edge between the corresponding vertices in {H}.

Figure 3

Now, the proof is by induction. Note that the base case {h=0} is trivial since a {K_1} minor is a single vertex. By induction, we can assume that if a sequence of {h} chops fails, then there must be a {K_h} minor contained in some offending annulus. See Figure 3. If we could ensure that every supernode of the minor touched the upper boundary of the annulus as in Figure 3, we could easily construct a {K_{h+1}} minor and be done, by simplying choosing the {(h+1)}-st supernode to be a ball around {x_0}.

Thus we need to enforce this extra property of our {K_h} minor. The (very simple) idea is contained in Figure 4.

Figure 4

After finding a {K_h} minor that intersects the annuli, we extend the supernodes to touch the upper boundary of the annulus from the preceding chop (which is represented by the purple line in the picture). The point is that we can choose these paths to be contained above the red boundary (and thus disjoint from the {K_h} supernodes), and also each of length at most {2\tau+1} since the width of the “purple” annulus is only {\tau}. The same can be done for {x_0}. (If we didn’t care that the paths have to be above the red boundary, we could choose them of length only {\tau}.)

The only issue is that we need these new paths to be disjoint. Since the paths are always short (length at most {O(\tau)}), we can enforce this by making sure that each supernode contains a representative and these representatives are pairwise far apart; then we grow the paths from the representatives. Initially, the representatives will be {\Omega(h \tau)} apart, and then as we go up the inductive chain, they will get closer by at most {O(\tau)} at every step. By choosing the initial separation large enough, they will remain disjoint. That’s the sketch; it should be possible to reproduce the proof from the sketch alone, but we now present a more formal proof.

4. The proof

We need a couple definitions. First, given a subset of vertices {V_0 \subseteq V} and a number {\tau \geq 0}, we say that a set {S} is {\tau}-dense in {V_0} if every element of {V_0} can reach an element of {S} by a path of length {\tau} that is contained completely in {G[V_0]} (the induced graph on {V_0}). Second, we say that an {H}-minor is {R}-represented by {S} if every supernode of {H} contains a representative from {S} and these representatives are pairwise distance more than {R} apart in the metric {d_G} (the global shortest-path metric on {G}). We now state a lemma that we can prove by induction and implies Lemma 2.

Lemma 3 Let {h \geq j \geq 0} and {\tau \geq 1} be any numbers. Suppose {G=(V,E)} is a graph and there is any sequence of {j} iterated {\tau}-chops on {G} that leaves a component of diameter more than {16 h\tau}. Then for any set {S} that is {\tau}-dense in {V}, one can find a {K_{j+1}}-minor that is {6(h-j)\tau}-represented by {S}.

One can recover Lemma 2 by setting {h=j} and {S=V}.

Proof: We proceed by induction on {j}. The case {j=0} is trivial since a {K_1} minor is simply a single vertex. Thus we may assume that {h \geq j \geq 1}.

The following figure will be a useful reference.

Figure 5

In general, we argue as follows. Let {S} be any set satisfying the assumptions of the lemma. Assume there is a sequence of {j} iterated {\tau}-chops that leaves a component of diameter more than {16 h \tau}. Then there must be some annulus {A_k} of the first {\tau}-chop such that {j-1} iterated {\tau}-chops on {A_k} leaves a component of diameter more than {16 h \tau}.

Suppose the first {\tau}-chop has root node {x_0 \in V} and initial offset {r_0}. Let {S' \subseteq A_k} be the set of nodes at distance exactly {r_0 + (k-1) \tau}, i.e. the upper boundary of {A_k}. Observe that {S'} is {\tau}-dense in {A_k} by construction. Thus by induction, there is a {K_{j}}-minor in {G[A_k]} that is {6(h-j+1)\tau}-represented by {S'}. Let {V_1, V_2, \ldots, V_j} be the supernodes of this minor, and let {\{ v_i \in S' \cap V_i : i=1,2,\ldots,j \}} be the representatives which are further than {6(h-j+1)\tau} apart.

We now extend this to a {K_{j+1}}-minor which is {6(h-j)\tau}-represented by {S}. First, we may assume that {k \geq 6h+1}. Otherwise, all the points of {A_k} lie in a ball of radius at most {r_0 + k \tau \leq (6h+2)\tau}, and hence {A_k} has diameter at most {(12h+4)\tau \leq 16h \tau}. In particular, we know that {d_G(x_0, v_i) \geq 6h\tau} for every {i=1,2,\ldots, j}.

Now for each {i=1,2,\ldots,j}, we choose a point {v_i' \in S \setminus A_i} such that {v_i'} is connected to {v_i} by a path of length at most {3\tau} in {G}. This can be done by first going up a shortest-path from {v_i} to {x_0} of length {\tau+1} to reach a point {v_i''}, and then choosing any point of {S} within distance {\tau} of {v_i''} (which can always be done since {S} is {\tau}-dense in {V}). We add this path to {V_i} to get a new supernode {V_i'}. Observe that the sets {\{V_i' : i=1,2,\ldots,j\}} are all connected and pairwise disjoint since the new paths are outside the annulus {A_k} and the paths themselves are pairwise disjoint because they are each of length at most {3\tau}, but they emanate from representatives that are more than {6(h-j+1)\tau \geq 6\tau} apart. In fact, this also shows that the representatives {\{ v_i' \in S : i=1,2,\ldots,j \}} are further than {6(h-j)\tau} apart, as required.

Finally, we construct a new supernode {V_0} as follows. For each {i=1,2,\ldots, j}, let {u_i \in V_i'} be the closest node to {x_0}, and let {s_0 \in S} be the closest node in {S} to {x_0}. For each {i}, let {P_i} be a shortest-path from {x_0} to {u_i} without its endpoint {u_i}, and let {P} be a shortest-path to {s_0}, including {s_0}. We now set {V_0 = P \cup P_1 \cup P_2 \cup \cdots \cup P_j}. We claim that {\{ V_0, V_1', \ldots, V_j' \}} forms a {K_{j+1}}-minor which is {6(h-j)\tau}-represented by {S}. First, it is clear that {d_G(s_0, v_i) > 6(h-j)\tau} since {d_G(s_0, x_0) \leq \tau} (again, because {S} is {\tau}-dense in {V}). For the same reason, the path {P} is disjoint from all the sets {\{V_i'\}}.

Thus the only possible obstruction to having a valid {K_{j+1}}-minor is if some path {P_i} intersects a set {V'_{\ell}} for {i \neq \ell}. We now show that this cannot happen. We know that if {P_i} intersects {V'_{\ell}}, then it must have already traveled distance at least {(k-1)\tau - 3\tau} away from {x_0}. But {P_i} contains a node adjacent to {V_i} (by construction), which means it continues an additional distance of {3\tau} (the distance between {V'_{\ell} \setminus V_{\ell}} and {V'_{i} \setminus V_i)}. This additional distance is also moving away from {x_0}, implying that {P_i} intersects {A_k}, which is impossible. This completes the proof.

5. The dependence on {h}

The best-known lower bound requires the conclusion of Theorem 1 to cut at least an {\Omega((\log h)/\Delta)}-fraction of edges. (One can take {G=(V,E)} to be an {n}-vertex 3-regular expander graph, which obviously excludes {K_n} as a minor. Now it is easy to see that for some constant {c}, partitioning into pieces of diameter at most {c \log n} must cut at least an {\Omega(1)}-fraction of edges.) In some special cases, e.g. graphs of genus {g} (which exclude {K_{c \lceil \sqrt{g}\rceil}} as a minor for some constant {c > 0}), one can reduce the bound to {O(\log g)} (see this joint work with Sidiropoulos). This leads to the following open problem.

Open problem: Show that under the assumptions of Theorem 1, one can find a partition that cuts only an {O((\log h)/\Delta)}-fraction of edges.

A positive resolution would yield an optimal unifom multi-commodity flow/cut gap for {K_h}-minor-free graphs. (See the next post for details.)

PSD lifting and Unique Games integrality gaps

By now, it is known that integrality gaps for the standard Unique Games SDP (see the paper of Khot and Vishnoi or Section 5.2 of this post) can be used to obtain integrality gaps for many other optimization problems, and often for very strong SDPs coming from various methods of SDP tightening; see, for instance, the paper of Raghavendra and Steurer.

Problematically, the Khot-Vishnoi gap is rather inefficient: To achieve the optimal gap for Unique Games with alphabet size {L}, one needs an instance of size {\exp(\Omega(L))}. As far as I know, there is no obstacle to achieving a gap instance where the number of variables is only {\mathrm{poly}(L)}.

The Walsh-Hadamard code

The Khot-Vishnoi construction is based on the Hadamard code.
(See Section 5.2 here for a complete description.) If we use {L^2(\{-1,1\}^k)} to denote the Hilbert space of real-valued functions {f : \{-1,1\}^k \rightarrow \mathbb R}, then the Walsh-Hadamard basis of {L^2(\{-1,1\}^k))} is the set of functions of the form

\displaystyle  u_S(x) = \prod_{i \in S} x_i,

where {S \subseteq \{1,2,\ldots,k\}}.

Of course, for two such sets {S \neq T}, we have the orthogonality relations,

\displaystyle  \langle u_S, u_T \rangle = 0.

In their construction, the variables are essentially all functions of the form {f : \{-1,1\}^k \rightarrow \{-1,1\}}, of which there are {2^{2^k}}, while there are only {2^k} basis elements {\{u_S\}_{S \subseteq [k]}} which act as the alphabet for the underlying Unique Games instance. This is what leads to the exponential relationship between the number of variables and the label size.

A PSD lifting question

In an effort to improve this dependence, one could start with a much larger set of nearly orthogonal vectors, and then somehow lift them to a higher-dimensional space where they would become orthogonal. In order for the value of the SDP not to blow up, it would be necessary that this map has some kind of Lipschitz property. We are thus led to the following (possibly naïve) question.

Let {C(d,\varepsilon)} be the smallest number such that the following holds. (Here, {S^{d-1} \subseteq \mathbb R^d} denotes the {(d-1)}-dimensional unit sphere and S(L^2) denotes the unit-sphere of L^2.)

There exists a map {F : S^{d-1} \rightarrow S(L^2)} such that {\|F\|_{\mathrm{Lip}} \leq C(d,\varepsilon)} and whenever {u,v \in \mathbb R^d} satisfy {|\langle u,v\rangle| \leq \varepsilon}, we have {\langle F(u), F(v)\rangle = 0}.

(Recall that \|F\|_{\mathrm{Lip}} = \sup_{x \neq y \in S^{d-1}} \|F(x)-F(y)\|/\|x-y\|.)

One can show that

\displaystyle C(d,\varepsilon) \lesssim \frac{\sqrt{d}}{1-\varepsilon}

by randomly partitioning {S^{d-1}} so that all vectors satisfying {|\langle u,v\rangle| \leq \varepsilon} end up in different sets of the partition, and then mapping all the points in a set to a different orthogonal vector.

My question is simply: Is a better dependence on {d} possible? Can one rule out that {C(d,\varepsilon)} could be independent of {d}? Note that any solution which randomly maps points to orthogonal vectors must incur such a blowup (this is essentially rounding the SDP to an integral solution).

Open question: Cover times and the Gaussian free field

Here are a few fascinating open questions coming from my work with Jian Ding and Yuval Peres on cover times of graphs and the Gaussian free field. (Also, here are my slides for the corresponding talk.)


1. Cover times and the Gaussian free field


Consider a finite, connected graph {G=(V,E)} and the simple random walk on {G} (which, at every step, moves from a vertex to a uniformly random neighbor). If we let {\tau_{\mathrm{cov}}} denote the first (random) time at which every vertex of {G} has been visited, and we use {\mathop{\mathbb E}_v} to denote expectation over the random walk started at {v \in V} , then the cover time of {G} is defined by

\displaystyle t_{\mathrm{cov}}(G) = \max_{v \in V} \mathop{\mathbb E}_v \tau_{\mathrm{cov}}.

On the other hand, consider the following Gaussian process {\{g_v\}_{v \in V}} , called the (discrete) Gaussian free field (GFF) on {G} . Such a process is specified uniquely by its covariance structure. Fix some {v_0 \in V} , and put {g_{v_0}=0} . Then the rest of the structure is specified by the relation

\displaystyle \mathop{\mathbb E}(g_u-g_v)^2 = R_{\mathrm{eff}}(u,v)

for all {u,v \in V} , where {R_{\mathrm{eff}}(u,v)} the effective resistance between {u} and {v} when {G} is thought of as an electrical network. Equivalently, the density of {\{g_v\}_{v \in V}} is proportional to

\displaystyle \exp\left(-\frac12 \sum_{u \sim v} (g_u-g_v)^2\right),

where the sum is over edges of G.
One of our main theorems can be stated as follows.

Theorem 1 (Ding-L-Peres) For every graph {G=(V,E)} , one has

\displaystyle t_{\mathrm{cov}}(G) \asymp |E| \left(\mathop{\mathbb E} \max_{v \in V} g_v\right)^2,

where {\{g_v\}_{v \in V}} denotes the Gaussian free field on {G} .


GFF on the 2D lattice; courtesy of Scott Sheffield.


Here {A \asymp B} is the assertion that {A} and {B} are within universal constant factors. In particular, we use this theorem to give a deterministic {O(1)} -approximation to the cover time, answering a question of Aldous and Fill, and to resolve the Winkler-Zuckerman blanket time conjectures.
This follows some partial progress on these questions by Kahn, Kim, Lovasz, and Vu (1999).


2. Derandomizing the cover time


The cover time is one of the few basic graph parameters which can easily be computed by a randomized polynomial-time algorithm, but for which we don’t know of a deterministic counterpart. More precisely, for every {\varepsilon > 0} , we can compute a {(1+\varepsilon)} -approximation to the cover time by simulating the random walk enough times and taking the median estimate, but even given the above results, the best we can do in deterministic polynomial-time is an {O(1)} -approximation.

We now describe one conjectural path to a better derandomization. Let {t_{\mathrm{hit}}(G) = \max_{u,v \in V} H(u,v)} denote the maximal hitting time in {G} , where {H(u,v)} is the expected number of steps needed to hit {v} from a random walk started at {u} . We prove the following more precise estimate.

Theorem 2 There is a constant {C > 0} such that for every graph {G=(V,E)} ,

\displaystyle t_{\mathrm{cov}}(G) \leq \left(1+C\sqrt{\frac{t_{\mathrm{hit}}(G)}{t_{\mathrm{cov}}(G)}}\right) |E| \left(\mathop{\mathbb E} \max_{v \in V} g_v\right)^2.



This prompts the following conjecture, which describes a potentially deeper connection between cover times and the GFF.

Conjecture 1 For a sequence of graphs {\{G_n=(V_n,E_n)\}} with {t_{\mathrm{hit}}(G_n) = o(t_{\mathrm{cov}}(G_n))} ,

\displaystyle t_{\mathrm{cov}}(G_n) \sim |E_n| \left(\mathop{\mathbb E} \max_{v\in V_n} g^{(n)}_v\right)^2,

where {\{g^{(n)}_v\}} denotes the GFF on {G_n} .

Here, we use {a_n \sim b_n} to denote {\lim a_n/b_n = 1} . The conjecture holds in some interesting cases, including the complete graph, the discrete {2D} torus, and complete {d} -ary trees. (See the full paper for details; except for the complete graph, these exact estimates are entire papers in themselves.)

Since the proof of Theorem 1 makes heavy use of the Fernique-Talagrand majorizing measures theory for estimating {\mathop{\mathbb E} \max_{v \in V} g_v} , and this theory is bound to lose a large multiplicative constant, it seems that new techniques will be needed in order to resolve the conjecture. In particular, using the isomorphism theory discussed below, it seems that understanding the structure of the near-maxima, i.e. those points {g_v} which achieve {g_v \geq (1-\varepsilon) \left(\mathop{\mathbb E} \max_{v \in V} g_v\right)} , will be an essential part of any study.

The second part of such a derandomization strategy is the ability to compute a deterministic {(1+\varepsilon)} -approximation to {\mathop{\mathbb E} \max_{v \in V} g_v} .

Question 1 For every {\varepsilon > 0} , is there a deterministic polynomial-time algorithm that, given a finite set of points {X \subseteq \mathbb R^n} , computes a {(1+\varepsilon)} -approximation to the value

\displaystyle \mathop{\mathbb E} \max_{x \in X} \langle g,x\rangle,

where {g} is a standard {n} -dimensional Gaussian? Is this possible if we know that {X} has the covariance structure of a Gaussian free field?


3. Understanding the Dynkin Isomorphism Theory


Besides majorizing measures, another major tool used in our work is the theory of isomorphisms between Markov processes and Gaussian processes. We now switch to considering the continuous-time random walk on a graph {G} . This makes the same transitions as the simple discrete-time walk, but now spends an exponential (with mean one) amount of time at every vertex. We define the local time at {v} at time {t} by

\displaystyle L_t^v = \frac{\textrm{amount of time spent at } v}{\mathrm{deg}(v)}

when we have run the random walk for time {t} .

Work of Ray and Knight in the 1960’s characterized the local times of Brownian motion, and then in 1980, Dynkin described a general connection between the local times of Markov processes and associated Gaussian processes. The version we use is due to Eisenbaum, Kaspi, Marcus, Rosen, and Shi (2000).

Theorem 3 Let {G=(V,E)} and fix some {v_0 \in V} , which is the origin of the random walk. Let {\ell > 0} be given, and define the (random) time {T=T(\ell)=\inf \{ t: L_t^{v_0} \geq \ell \}} . If {\{g_v\}_{v \in V}} is the GFF with {g_{v_0}=0} , then

\displaystyle \left\{ L_T^x + \frac12 g_x^2 : x \in V \right\} \stackrel{\mathrm{law}}{=} \left\{ \frac12 (g_x - \sqrt{2\ell})^2 : x \in V\right\}.



Note that on the left hand side, the local times {\{L_T^x\}} and the GFF {\{g_x\}} are independent. This remarkable theorem (and many others like it) are proved in the book of Marcus and Rosen. In the introduction, the authors describe the “wonderful, mysterious isomorphism theorems.” They continue,

Another confession we must make is that we do not really understand the actual relationship between local times… and their associated Gaussian processes. If one asks us, as is often the case during lectures, to give an intuitive description… we must answer that we cannot. We leave this extremely interesting question to you.

So I will now pass the question along. The proof of the isomorphism theorems proceeds by taking Laplace transforms and then doing some involved combinatorics. It’s analogous to the situation in enumerative combinatorics where we have a generating function proof of some equality, but not a bijective proof where you can really get your hands on what’s happening.

What is the simplest isomorphism-esque statement which has no intuitive proof? The following lemma is used in the proof of the original Ray-Knight theorem on Brownian motion (see Lemma 6.32 in the Morters-Peres book). It can be proved in a few lines using Laplace transforms, yet one suspects there should be an explicit coupling.

Lemma 4 For any {\ell > 0} , the following holds. Suppose that {X} is a standard normal, {Z_1, Z_2, \ldots} are i.i.d. standard exponentials, and {N} is Poisson with parameter {\ell^2/2} . If all these random variables are independent, then

\displaystyle (X+\ell)^2 \stackrel{\mathrm{law}}{=} X^2 + 2 \sum_{j=1}^N Z_j.



Amazing! The last open question is to explain this equality of distributions in a satisfactory manner, as a first step to understanding what’s really going on in the isomorphism theory.


The Gödel Prize, TSP, and volume growth

Recently, Sanjeev Arora and Joe Mitchell won the Gödel prize for their work on Euclidean TSP. They show that given n points in \mathbb R^2, and a parameter \varepsilon > 0, it is possible to compute, in polynomial time, a traveling salesman tour of the input whose length is at most a factor 1+\varepsilon more than the length of the optimum tour. (This is called a polynomial-time approximation scheme, or PTAS.)

Later, Arora extended this to work in \mathbb R^k for every fixed k \geq 2. What properties of \mathbb R^k are really needed to get such an algorithm?

Certainly a key property is that the volume of a ball of radius r in \mathbb R^k only grows like O(r^k). This ensures that one can choose an \epsilon-net of size at most O(1/\epsilon)^k in a ball of radius {O(1)}, which is essential for using dynamic programming. In my opinion, this leads to the most fascinating problem left open in this area:

Is bounded volume growth the only property needed to get a PTAS?

This would imply that the use of Euclidean geometry in Arora’s algorithm is non-essential. We can state the question formally as follows. Let (X,d) be a metric space, and let \lambda(X,d) be the smallest number \lambda such that for every r > 0,every ball of radius r in X can be covered by \lambda balls of radius r/2. Is there a PTAS for TSP in X? (In other words, the running time should be bounded by a polynomial in n =|X| whose degree depends only on \lambda(X,d).)

The problem is open, though there are some partial results by Talwar.

The need for non-linear mappings

In the last post, I recalled the problem of dimension reduction for finite subsets of L_1.  I thought I should mention the main obstacle to reducing the dimension below O(n) for n-point subsets:  It can’t be done with linear mappings.

All the general results mentioned in that post use a linear mapping.  In fact, they are all of the form:

  1. Change of density, i.e. preprocess the points/subspace so that no point has too much weight on any one coordinate.
  2. Choose a subset of the coordinates, possibly multiplying the chosen coordinates by non-negative weights.  (Note that the Newman-Rabinovich result, based on Batson, Spielman, and Srivastava, is deterministic, while in the other bounds, the sampling is random.)

(The dimension reduction here is non-linear, but only applies to special subsets of L_1, like the Brinkman-Charikar point set.)

The next theorem shows that linear dimension reduction mappings cannot do better than O(n) dimensions.

Theorem: For every 1 \leq p \leq \infty, there are arbitrarily large n-point subsets of L_p on which any linear embedding into L_2 incurs distortion at least \left(\frac{n-1}{2}\right)^{|1/p-1/2|}.

Since the identity map from \ell_1^n to \ell_2^n has distortion \sqrt{n}, this theorem immediately implies that there are n-point subsets on which any linear embedding requires \Omega(n) dimension for an {O(1)}-distortion embedding.  The p=1 case of the preceding theorem was proved by Charikar and Sahai.  A simpler proof, which extends to all p \geq 1 is given in Lemma 3.1 of a paper by myself, Mendel, and Naor.

Open problem: Dimension reduction in L_1

Since I’ve been interacting a lot with the theory group at MSR Redmond (see the UW-MSR Theory Center), I’ve been asked occasionally to propose problems in the geometry of finite metric spaces that might be amenable to probabilistic tools. Here’s a fundamental problem that’s wide open. Let {c_D(n)} be the smallest number such that every {n}-point subset of {L_1} embeds into {\ell_1^{c_D(n)}} with distortion at most {D}. Here’s what’s known.

  1. Talagrand (following work of Bourgain-Lindenstrauss-Milman and Schechtman) proved that for every {\varepsilon > 0}, every {n}-dimensional subspace of {L_1} admits a {(1+\varepsilon)}-distortion embedding into {\ell_1^{d}} with {d = O((n \log n)/\varepsilon^2)}. In particular, this gives

    \displaystyle c_{1+\varepsilon}(n) = O((n \log n)/\varepsilon^2).

  2. Brinkman and Charikar showed that {c_D(n) \geq \Omega(n^{1/D^2})} for {D \geq 1}. A significantly simpler proof was later given by Assaf Naor and myself. (With Brinkman and Karagiozova, we have also shown that this bound is tight for the Brinkman-Charikar examples and their generalizations.)
  3. Recently, Newman and Rabinovich showed that one can take {c_{1+\varepsilon}(n) = O(n/\varepsilon^2)} for any {\varepsilon > 0}. Their paper relies heavily on the beautiful spectral sparsification method of Batson, Spielman, and Srivastava. In fact, it is shown that one can use only {O(n/\varepsilon^2)} weighted cuts (see the paper for details). This also hints at a limitation of their technique, since it is easy to see that the metric on {\{1,2,\ldots,n\} \subseteq \mathbb R} requires {\Omega(n)} cuts for a constant distortion embedding (and obviously only one dimension).

The open problem is to get better bounds. For instance, we only know that

\displaystyle  \Omega(n^{1/100}) \leq c_{10}(n) \leq O(n).

There is evidence that n^{\Theta(1/D^2)} might be the right order of magnitude.  In the large distortion regime, when D = \Omega(\sqrt{\log n} \log \log n), results of Arora, myself, and Naor show that c_D(n) = O(\log n).

Open question: PSD flows

This post is about a beautiful twist on flows that arises when studying (the dual) of the Sparsest Cut SDP.  These objects, which I’m going to call “PSD flows,” are rather poorly understood, and there are some very accessible open problems surrounding them.  Let’s begin with the definition of a normal flow:

Let G=(V,E) be a finite, undirected graph, and for every pair u,v \in V, let \mathcal P_{uv} be the set of all paths between u and v in G.  Let \mathcal P = \bigcup_{u,v \in V} \mathcal P_{uv}.  A flow in G is simply a mapping F : \mathcal P \to \mathbb R_{\geq 0}.  We define, for every edge (u,v) \in E, the congestion on (u,v) as

\displaystyle C_F(u,v) = \sum_{p \in \mathcal P: (u,v) \in p} F(p)

which is the total amount of flow going through (u,v).  Finally, for every u,v \in V, we define

F\displaystyle \lbrack u,v\rbrack = \sum_{p \in \mathcal P_{uv}} F(p)

as the total amount of flow sent from u to v.

Now, in the standard (all-pairs) maximum concurrent flow problem, the goal is to find a flow F which simultaneously sends D units of flow from every vertex to every other, while not putting more than one unit of flow through any edge, i.e.

\displaystyle \mathsf{mcf}(G) = \textrm{maximize } \left\{ \vphantom{\bigoplus} D : \forall u,v, F[u,v] \geq D \textrm{ and } \forall (u,v) \in E, C_F(u,v) \leq 1.\right\}

In order to define a PSD flow, it helps to write this in a slightly different way.  If we define the symmetric matrix

A_{u,v} = F[u,v] - D + {\bf 1}_{\{(u,v) \in E\}} - C_F(u,v)

then we have

Claim 1: \mathsf{mcf}(G) = \max \{ D : A_{u,v} \geq 0 \}.

To see that this is true, we can take a matrix with A_{u,v} \geq 0 for all u,v \in V and fix it one entry at a time so that F[u,v] \geq D and C_F(u,v) \leq 1, without decreasing the total demand satisfied by the flow.

For instance, if (u,v) \in E and C_F(u,v) > 1+\varepsilon, then it must be that F[u,v] > D+\varepsilon, so we can reroute \varepsilon units of flow going through the edge (u,v) to go along one of the extraneous flow paths which gives the excess F[u,v] > D + \varepsilon.  Similar arguments hold for the other cases (Exercise!).

PSD flows

So those are normal flows.  To define a PSD flow, we define for any symmetric matrix A, the Laplacian of A, which has diagonal entries L(A)_{i,i} = \sum_{j \neq i} A_{i,j} and off-diagonal entries L(A)_{i,j} = - A_{i,j}.  It is easy to check that

\displaystyle \langle x, L(A)\, x \rangle = \sum_{i,j} A_{i,j} (x_i-x_j)^2.

Hence if A_{u,v} \geq 0 for all u,v \in V, then certainly L(A) \succeq 0 (i.e. L(A) is positive semi-definite).  The PSD flow problem is precisely

\displaystyle \max \{ D : L(A) \succeq 0 \}

where A is defined as above.  Of course, now we are allowing A to have negative entries, which makes this optimization trickier to understand.  We allow the flow to undersatisfy some demand, and to overcongest some edges, but now the “error” matrix has to induce a PSD Laplacian.

Scaling down the capacities

Now, consider some \delta \in [0,1], and write

\displaystyle A_{u,v}^{(\delta)} = F[u,v] - D + \delta \cdot {\bf 1}_{(u,v) \in E} - C_F(u,v).

Requiring A_{u,v}^{(\delta)} \geq 0 for every u,v \in V simply induces a standard flow problem where each edge now has capacity \delta.  In the case of normal flows, because we can decouple the demand/congestion constraints as in Claim 1, we can easily relate \max \{ D : A_{u,v}^{(\delta)} \geq 0\,\forall u,v \in V\} to \max \{ D : A_{u,v} \geq 0\,\forall u,v \in V\} (the first is exactly \delta times the second, because we can just scale a normal flow down by \delta and now it satisfies the reduced edge capacities).

Question: Can we relate \max \{ D : L(A^{(\delta)}) \succeq 0 \} and \max \{ D : L(A) \succeq 0 \}?  More specifically, do they differ by some multiplicative constant depending only on \delta?

This is a basic question whose answer is actually of fundamental importance in understanding the Sparsest Cut SDP.  I asked this question in its primal form almost 4 years ago (see question 3.2 here).

Note that the answer is affirmative if we can decouple the demand/congestion constraints in the case of PSD flows.  In other words, let X_{u,v} = F[u,v] - D and let Y_{u,v} = {\bf 1}_{(u,v \in E)} - C_F(u,v).

Question: Can we relate \max \{ D : L(A) \succeq 0 \} to \max \{ D : L(X) \succeq 0 \textrm{ and } L(Y) \succeq 0 \}?

In the next post, I’ll discuss consequences of this question for constructing integrality gaps for the Sparsest Cut SDP.