tcs math - some mathematics of theoretical computer science

May 8, 2008

Kernels of random sign matrices

Filed under: Math — jrluw @ 9:27 pm

In the last post, we began proving that a random N/2 \times N sign matrix A almost surely satisfies \Delta(\mathrm{ker}(A)) = O(1). In other words, almost surely for every x \in \mathrm{ker}(A), we have

\displaystyle \frac{\sqrt{N}}{O(1)} \|x\|_2 \leq \|x\|_1 \leq \sqrt{N} \|x\|_2.

The proof follows roughly the lines of the proof that a uniformly random N/2 \times N matrix B with \{0,1\} entries is almost surely the check matrix of a good linear code over \mathbb F_2. In other words, with high probability, there does not exist a non-zero vector x \in \mathbb F_2^N with hamming weight o(N), and which satisfies Bx=0 (this equation is considered over \mathbb F_2). The proof of this is simple: Let B_1, B_2, \ldots, B_{N/2} be the rows of B. If x \neq 0, then \Pr[\langle B_i, x \rangle = 0] \leq \frac12. Therefore, for any non-zero x, we have

\Pr[Bx = 0] \leq 2^{-N/2}. (**)

On the other hand, for \delta > 0 small enough, there are fewer than 2^{N/2} non-zero vectors of hamming weight at most \delta N, so a union bound finishes the argument.

The proof that \Delta(\mathrm{ker}(A)) = O(1) proceeds along similar lines, except that we can no longer take a naive union bound since there are now infinitely many “bad” vectors. Of course the solution is to take a sufficiently fine discretization of the set of bad vectors. This proceeds in three steps:

  1. If x is far from \mathrm{ker}(A), then any x' with \|x-x'\|_2 small is also far from \mathrm{ker}(A).
  2. Any fixed vector x \in \mathbb R^N with \|x\|_2 = 1 is far from \mathrm{ker}(A) with very high probability. This is the analog of (**) in the coding setting.
  3. There exists a small set of vectors that well-approximates the set of bad vectors.

Combining (1), (2), and (3) we will conclude that every bad vector is far from the kernel of A (and, in particular, not contained in the kernel).

To verify (1), we need to show that almost surely the operator norm \|A\| = \max \left\{ \|Ax\|_2 : \|x\|_2 = 1\right\} is small, because if we define

\mathrm{dist}(x, \mathrm{ker}(A)) = \min_{y \in \mathrm{ker}(A)} \|x-y\|_2,

then

\mathrm{dist}(x', \mathrm{ker}(A)) \geq \mathrm{dist}(x,\mathrm{ker}(A)) - \|A\| \cdot \|x-x'\|_2

From the last post, we have the following two statements.

Lemma 1: If v \in \mathbb R^k and X \in \{-1,1\}^k is chosen uniformly at random, then

\displaystyle \Pr\left[|\langle v, X \rangle| \geq t\right] \leq 2 \exp\left(\frac{-t^2}{2 \|v\|_2^2}\right)

The next theorem (see the proof from the last post) verifies (1) above.

Theorem 1: If A is a random N/2 \times N sign matrix, then

\Pr[\|A\| \geq 10 \sqrt{N}] \leq 2 e^{-6N}.

Now we turn to proving (2), which is the analog of (**). We need to upper bound the small ball probability, i.e. the probability that Av falls into a small ball (around 0) for any fixed v \in \mathbb R^N. To start, we would like to say that for some fixed v\in \mathbb R^N with \|v\|_2 = 1, and a uniformly random sign vector X \in \{-1,1\}^N, we have for example,

\Pr[ |\sum_{i=1}^N  X_i v_i| > 0.01] \geq 0.01

These kinds of estimates fall under the name of Littlewood-Offord type problems; see the work of Tao and Vu and of Rudelson and Vershynin for a detailed study of such random sums, and the beautiful connections with additive combinatorics. For the above estimate, we will need only the simple Paley-Zygmund inequality:

Lemma 2: If Z is a non-negative random variable and 0 < t < 1, then

\displaystyle \Pr[Z \geq t (\mathbb EZ)] \geq (1-t)^2 \frac{(\mathbb EZ)^2}{\mathbb E(Z^2)}

Proof: By Cauchy-Schwarz,

\mathbb EZ = \mathbb E[Z\mathbf{1}_{\{Z < t (\mathbb EZ)\}}] + \mathbb E[Z\mathbf{1}_{\{Z \geq t(\mathbb EZ)\}}] \leq t \mathbb E[Z] + \sqrt{\mathbb E[Z^2]} \sqrt{\mathbb E[\mathbf{1}_{\{Z \geq t (\mathbb EZ)\}}]}

Now observe that \mathbb E[\mathbf{1}_{\{Z \geq t (\mathbb EZ)\}}] = \Pr[Z \geq t(\mathbb EZ)] and rearrange.

We can use this to prove our estimate on random sums.

Lemma 3: For any v \in \mathbb R^N, if X \in \{-1,1\}^N is chosen uniformly at random, then for every 0 < t < 1,

\displaystyle \Pr\left[|\langle X,v\rangle|^2 \geq t \|v\|_2^2 \right] \geq \frac{(1-t)^2}{12}.

Proof: Apply Lemma 2 with Z = |\langle X,v\rangle|^2. Note that by independence, we have \mathbb E[Z] = \|v\|_2^2. Furthermore, for any non-negative random variable Y and 0 < p < \infty, we have \mathbb E[Y^p] = p \int_{0}^{\infty} \lambda^{p-1} \Pr(Y > \lambda)\,d\lambda, thus

\displaystyle \mathbb E[Z^2] = \mathbb E[|\langle X,v\rangle|^4] = 4 \int_{0}^{\infty} \lambda^3 \Pr(|\langle X,v\rangle| > \lambda)\,d\lambda \leq 4\int_0^{\infty} \lambda^3 e^{-\lambda^2/(2\|v\|_2^2)}\,d\lambda,

where the final inequality follows from Lemma 1. It is easy to see that the final bound is at most 12 \cdot \|v\|_2^4 = 12 (\mathbb EZ)^2.

(more…)

May 4, 2008

The pseudorandom subspace problem

Filed under: Math — jrluw @ 7:31 pm

In this post, I will talk about the existence and construction of subspaces of \ell_1^N which are “almost Euclidean.” In the next few, I’ll discuss the relationship between such subspaces, compressed sensing, and error-correcting codes over the reals.

A good starting point is Dvoretzky’s theorem:

For every N \in \mathbb N and \varepsilon > 0, the following holds. Let |\cdot| be the Euclidean norm on \mathbb R^N, and let \|\cdot\| be an arbitrary norm. Then there exists a subspace X \subseteq \mathbb R^N with \mathrm{dim}(X) \geq c(\varepsilon) \log N, and a number A > 0 so that for every x\in X,

\displaystyle A|x| \leq \|x\| \leq (1+\varepsilon) A |x|.

Here, c(\varepsilon) > 0 is a constant that depends only on \varepsilon.

The theorem as stated is due to Vitali Milman, who gave the (optimal) dependence of \log N on the dimension of X, and proved that the above fact actually holds with high probability when X is chosen uniformly at random from all subspaces of this dimension. In other words, for d \approx \log N, a random d-dimensional subspace of an arbitrary N-dimensional normed space is almost Euclidean. Milman’s proof was one of the starting points for the “local theory” of Banach spaces, which views Banach spaces through the lens of sequences of finite-dimensional subspaces whose dimension goes to infinity. It’s a very TCS-like philosophy; see the book of Milman and Schechtman.

One can associate to any norm its unit ball B = \left\{ x \in \mathbb R^N : \|x\| \leq 1 \right\}, which will be a centrally symmetric convex body. Thus we can restate Milman’s proof of Dvoretzky’s theorem geometrically: If B is an arbitrary convex body, and H is a random c(\varepsilon) \log N-dimensional hyperplane through the origin, then with high probability, B \cap H will almost be a Euclidean ball. This is quite surprising if one starts, for instance, with a polytope like the unit cube or the cross polytope. The intersection B \cap H is again a polytope, but which is approximated closely by a smooth ball. Here’s a pictorial representation lifted from a talk I once gave:

(Note: Strictly speaking, the intersection will only be an ellipsoid, and one might have to pass to a subsection to actually get a Euclidean sphere.)

Euclidean subspaces of \ell_1^N and Error-Correcting Codes.

It is not too difficult to see that the dependence \mathrm{dim}(X) \approx \log N is tight when we consider \mathbb R^N equipped with the \ell_\infty norm (see e.g. Claim 3.10 in these notes), but rather remarkably, results of Kasin and Figiel, Lindenstrauss, and Milman show that if we consider the \ell_1 norm, we can find Euclidean subspaces of proportional dimension, i.e. \mathrm{dim}(X) \approx N.

For any x \in \mathbb R^N, Cauchy-Schwarz gives us

\|x\|_2 \leq \|x\|_1 \leq \sqrt{N} \|x\|_2.

To this end, for a subspace X \subseteq \mathbb R^N, let’s define

\displaystyle \Delta(X) = \max_{0 \neq x \in \mathbb R^N} \frac{\sqrt{N} \|x\|_2}{\|x\|_1},

which measures how well-spread the \ell_2-mass is among the coordinates of vectors x \in X. For instance, if \Delta(X) = O(1), then every non-zero x \in \mathbb X has at least \Omega(N) non-zero coordinates (because \|x\|_1 \leq \sqrt{|\mathrm{supp}(x)|} \|x\|_2). This has an obvious analogy with the setting of linear error-correcting codes, where one would like the same property from the kernel of the check matrix. Over the next few posts, I’ll show how such subspaces actually give rise to error-correcting codes over \mathbb R that always have efficient decoding algorithms.

The Kasin and Figiel-Lindenstrauss-Milman results actually describe two different regimes. Kasin shows that for every \eta > 0, there exists a subspace X \subseteq \mathbb R^N with \dim(X) \geq (1-\eta) N, and \Delta(X) = O_{\eta}(1). The FLM result shows that for every \varepsilon > 0, one can get \Delta(X) \leq 1 + \varepsilon and \dim(X) \geq \Omega_{\varepsilon}(N). In coding terminology, the former approach maximizes the rate of the code, while the latter maximizes the minimum distance. In this post, we will be more interested in Kasin’s “nearly full dimensional” setting, since this has the most relevance to sensing and coding. Work of Indyk (see also this) shows that the “nearly isometric” setting is useful for applications in high-dimensional nearest-neighbor search.

The search for explicit constructions

Both approaches show that the bounds hold for a random subspace of the proper dimension (where “random” can mean different things;we’ll see one example below). In light of this, many authors have asked whether there are explicit constructions of such subspaces exist, see e.g. Johnson and Schechtman (Section 2.2), Milman (Problem 8), and Szarek (Section 4). As I’ll discuss in future posts, this would also yield explicit constructions of codes over the reals, and of compressed sensing matrices.

Depending on the circles you run in, “explicit” can mean different things. Let’s fix a TCS-style definition: An explicit construction means a deterministic algorithm which, give N as input, outputs a description of a subspace X (e.g. in the form of a set of basis vectors), and runs in time \mathrm{poly}(N). Fortunately, the constructions I’ll discuss later will satisfy just about everyone’s sense of explicitness.

In light of the difficulty of obtaining explicit constructions, people have started looking at weaker results and partial derandomizations. One starting point is Kasin’s method of proof: He shows, for instance, that if one chooses a uniformly random N/2 \times N sign matrix A (i.e. one whose entries are chosen independently and uniformly from \{-1,1\}), then with high probability, one has \Delta(\mathrm{ker}(A)) = O(1). Of course this bears a strong resemblance to random parity check codes. Clearly we also get \dim(\mathrm{ker}(A)) \geq N/2.

In work of Guruswami, myself, and Razborov, we show that there exist explicit N/2 \times N sign matrices A for which \Delta(\mathrm{ker}(A)) \leq (\log N)^{O(\log \log \log N)} (recall that the trivial bound is \Delta(\cdot) \leq \sqrt{N}). Our approach is inspired by the construction and analysis of expander codes. Preceding our work, Artstein-Avidan and Milman pursued a different direction, by asking whether one can reduce the dependence on the number of random bits from the trivial O(N^2) bound (and still achieve \Delta(\mathrm{ker}(A)) = O(1)). Using random walks on expander graphs, they showed that one only needs O(N \log N) random bits to construct such a subspace. Lovett and Sodin later reduced this dependency to O(N). (Indyk’s approach based on Nisan’s pseudorandom generator can also be used to get O(N \log^2 N).) In upcoming work with Guruswami and Wigderson, we show that the dependence can be reduced to O(N^{\delta}) for any \delta > 0.

Kernels of random sign matrices

It makes sense to end this discussion with an analysis of the random case. We will prove that if A is a random N/2 \times N sign matrix (assume for simplicity that N is even), then with high probability \Delta(\mathrm{ker}(A)) = O(1). It might help first to recall why a uniformly random N/2 \times N matrix B with \{0,1\} entries is almost surely the check matrix of a good linear code.

Let \mathbb F_2 = \{0,1\} be the finite field of order 2. We would like to show that, with high probability, there does not exist a non-zero vector x \in \mathbb F_2^N with hamming weight o(N), and which satisfies Bx=0 (this equation is considered over \mathbb F_2). The proof is simple: Let B_1, B_2, \ldots, B_{N/2} be the rows of B. If x \neq 0, then \Pr[\langle B_i, x \rangle = 0] \leq \frac12. Therefore, for any non-zero x, we have

\Pr[Bx = 0] \leq 2^{-N/2}. (**)

On the other hand, for \delta > 0 small enough, there are fewer than 2^{N/2} non-zero vectors of hamming weight at most \delta N, so a union bound finishes the argument.

The proof that \Delta(\mathrm{ker}(A)) = O(1) proceeds along similar lines, except that we can no longer take a naive union bound since there are now infinitely many “bad” vectors. Of course the solution is to take a sufficiently fine discretization of the set of bad vectors. This proceeds in three steps:

  1. If x is far from \mathrm{ker}(A), then any x' with \|x-x'\|_2 small is also far from \mathrm{ker}(A).
  2. Any fixed vector x \in \mathbb R^N with \|x\|_2 = 1 is far from \mathrm{ker}(A) with very high probability. This is the analog of (**) in the coding setting.
  3. There exists a small set of vectors that well-approximates the set of bad vectors.

Combining (1), (2), and (3) we will conclude that every bad vector is far from the kernel of A (and, in particular, not contained in the kernel).

To verify (1), we need to show that almost surely the operator norm \|A\| = \max \left\{ \|Ax\|_2 : \|x\|_2 = 1\right\} is small, because if we define

\mathrm{dist}(x, \mathrm{ker}(A)) = \min_{y \in \mathrm{ker}(A)} \|x-y\|_2,

then

\mathrm{dist}(x', \mathrm{ker}(A)) \geq \mathrm{dist}(x,\mathrm{ker}(A)) - \|A\| \cdot \|x-x'\|_2

In order to proceed, we first need a tail bound on random Bernoulli sums, which follows immediately from Hoeffding’s inequality.

Lemma 1: If v \in \mathbb R^k and X \in \{-1,1\}^k is chosen uniformly at random, then

\displaystyle \Pr\left[|\langle v, X \rangle| \geq t\right] \leq 2 \exp\left(\frac{-t^2}{2 \|v\|_2^2}\right)

In particular, for y \in \mathbb R^{N/2} and x \in \mathbb R^N, we have \langle y, Ax \rangle = \sum_{i=1}^{N/2} \sum_{j=1}^N A_{ij} y_i x_j. We conclude that if we take \|x\|_2 = 1 and \|y\|_2 = 1, then

\Pr\left[ |\langle y, Ax \rangle| \geq t \sqrt{2N}\right] \leq 2 \exp\left(-t^2 N\right), (4)

observing that \sum_{i,j} y_i^2 x_j^2 = 1, and applying Lemma 1. Now we can prove that \|A\| is usually not too big.

Theorem 1: If A is a random N/2 \times N sign matrix, then

\Pr[\|A\| \geq 10 \sqrt{N}] \leq 2 e^{-6N}.

Proof:

For convenience, let m = N/2, and let \Gamma_m and \Gamma_N be (1/3)-nets on the unit spheres of \mathbb R^m and \mathbb R^N, respectively. In other words, for every x \in \mathbb R^m with \|x\|_2 = 1, there should exist a point x' \in \Gamma_m for which \|x-x'\|_2 \leq 1/3, and similarly for \Gamma_N. It is well-known that one can choose such nets with |\Gamma_m| \leq 7^m and |\Gamma_N| \leq 7^N (see, e.g. the book of Milman and Schechtman mentioned earlier).

So applying (4) and a union bound, we have

\displaystyle \Pr\left[\exists y \in \Gamma_m, x \in \Gamma_N\,\, |\langle y, Ax \rangle| \geq 3\sqrt{2N}\right] \leq 2 |\Gamma_m| |\Gamma_N| \exp(-9N) \leq 2 \exp(-6N).

So let’s assume that no such x \in \Gamma_N and y \in \Gamma_m exist. Let u \in \mathbb R^N and v \in \mathbb R^m be arbitrary vectors with \|u\|_2=1,\|v\|_2=1, and write u = \sum_{i \geq 0} \alpha_i x_i and v = \sum_{i \geq 0} \beta_i y_i, where \{x_i\} \subseteq \Gamma_N and \{y_i\} \subseteq \Gamma_m, where |\alpha_i|, |\beta_i| \leq 3^{-i} (by choosing successive approximations). Then we have,

|\langle v, A u\rangle| \leq \sum_{i,j \geq 0} 3^{-i-j} |\langle y_i, A x_i\rangle| \leq (3/2)^2 3\sqrt{2N} \leq 10 \sqrt{N}.

We conclude by nothing that \|A\| = \max \left\{ |\langle u, Av \rangle| : \|u\|_2 = 1, \|v\|_2=1 \right\}.

This concludes the verification of (1). In the next post, we’ll see how (2) and (3) are proved.

April 13, 2008

Planar multi-flows, L_1 embeddings, and differentiation

Filed under: Math — Tags: , , , — jrluw @ 9:18 am

In this post, I’ll discuss the relationship between multi-flows and sparse cuts in graphs, bi-lipschitz embeddings into L_1, and the weak differentiation of L_1-valued mappings. It revolves around one of my favorite open questions in this area, the planar multi-flow conjecture.

Table of contents:

  1. The max-flow/min-cut theorem
  2. Multi-commodity flows and sparse cuts
  3. The planar multi-flow conjecture
  4. Bi-Lipschitz embeddings into L_1
  5. The planar embedding conjecture
  6. Parallel geodesics and the slums of geometry
  7. Local rigidity and coarse differentiation
  8. Beyond planar graphs

The max-flow/min-cut theorem

Let G=(V,E) be a finite, undirected graph, with a mapping \mathsf{cap} : E \to \mathbb R_+ assigning a capacity to every edge. If \mathcal P_{s,t} is the set of all paths from s to t, then an s\!-\!t flow is a mapping F : \mathcal P_{s,t} \to \mathbb R_+ which doesn’t overload the edges beyond their capacities: For every edge e \in E,

\displaystyle \sum_{\gamma \in \mathcal P_{s,t} : e \in \gamma} F(\gamma) \leq \mathsf{cap}(e).

The value of the flow F is the total amount of flow sent: \mathsf{val}(F) = \sum_{\gamma \in \mathcal P_{s,t}} F(\gamma). A cut in G is a partition V = S \cup \bar S which we will usually write as (S, \bar S). Naturally, one defines the capacity across (S, \bar S) by \mathsf{cap}(S,\bar S) = \sum_{xy \in E} \mathsf{cap}(x,y) |\mathbf{1}_S(x)-\mathbf{1}_S(y)|, where \mathbf{1}_S is the characteristic function of S. Since one can only send as much flow across a cut as there is capacity to support it, it is easy to see that for any valid flow F and any cut (S, \bar S) with s \in S and t \in \bar S, we have \mathsf{val}(F) \leq \mathsf{cap}(S, \bar S). The classical max-flow/min-cut theorem says that in fact these upper bounds are achieved:

\max_{F} \mathsf{val}(F)=\min_{(S, \bar S)} \mathsf{cap}(S, \bar S)

where the maximum is over all s\!-\! t flows, and the minimum is over all cuts (S, \bar S) with s \in S and t \in \bar S.

Multi-commodity flows and sparse cuts

We will presently be interested in multi-commodity flows (multi-flows for short), where we are given a demand function \mathsf{dem} : V \times V \to \mathbb R_+ which requests that we send \mathsf{dem}(x,y) units of flow from x to y for every x,y \in V. In this case, we’ll write the value of a valid flow (i.e. one which doesn’t try to send more total flow than an edge can carry) as

\displaystyle \mathsf{val}(F) = \min_{x,y \in V} \frac{\sum_{\gamma \in \mathcal P_{x,y}} F(\gamma)}{\mathsf{dem}(x,y)}

where the minimum is over pairs with \mathsf{dem}(x,y) \neq 0. So if the value of the flow is \frac12, it means that every pair gets at least half the flow it requested (well, demanded).

Again, cuts give a natural obstruction to flows. If we define \Phi(S, \bar S) = \displaystyle \frac{\mathsf{cap}(S, \bar S)}{\mathsf{dem}(S, \bar S)}, where \mathsf{dem}(S, \bar S) = \sum_{x,y \in V} \mathsf{dem}(x,y) |\mathbf{1}_S(x)-\mathbf{1}_S(y)| is the total demand requested across (S, \bar S), then \mathsf{val}(F) \leq \Phi(S, \bar S) for any valid flow F. Unfortunately, it is no longer true in general that \max_F \mathsf{val}(F) = \min_{(S, \bar S)} \Phi(S). (It is true as long as the demand is supported on a set of size at most 4, while it stops being true in general when the demand is supported on a set of size 5 or larger.)

In fact, the gap between the two can be arbitrarily large, as witnessed by expander graphs (this rather fruitful connection will be discussed in a future post). For now, we’ll concentrate on a setting where the gap is conjectured to be at most O(1).

The planar multi-flow conjecture

It has been conjectured that there exists a universal constant C \geq 1 such that if G=(V,E) is a planar graph (i.e. can be drawn in the plane without edge crossings), there is a C-approximate multi-commodity max-flow/min-cut theorem: For any choice of \mathsf{dem} and \mathsf{cap}, one has a

\displaystyle \frac{1}{C} \min_{(S, \bar S)} \Phi(S, \bar S) \leq \max_F \mathsf{val}(F) \leq \min_{(S, \bar S)} \Phi(S, \bar S) (1)

The conjecture first appeared in print here, but was tossed around since the publications of Linial, London, and Rabinovich and Aumann and Rabani, which recast these multi-flow/cut gaps as questions about bi-lipschitz mappings into L_1 (discussed next). Perhaps the most compelling reason to to believe the conjecture is the beautiful result of Klein, Plotkin, and Rao which shows that C = O(1) for any planar instance where \mathsf{dem}(x,y) = 1\,\,\forall x,y \in V (this is called a uniform multi-flow instance).

It is relatively easy to see that we cannot take C=1, as the following example of Okamura and Seymour shows.

Okamura-Seymour graph

In the example, the non-zero demands are \mathsf{dem}(s_i,t_i) = 1 for i \in \{1,2,3,4\}. It is easy to check that every cut has \Phi(S,\bar S) \geq 1, but the value of the maximum flow is only 3/4, implying that C \geq 4/3. If we use the same pattern on the complete bipartite graph K_{2,n} (instead of K_{2,3}), taking n \to \infty shows that we must have C \geq 3/2 in (1). (Later a differentiation argument applied to a different family of graphs will show that we need C \geq 2 in (1).)

For any graph G, define \mathsf{gap}(G) as the smallest value C such that (1) holds for every choice of capacities and demands on G. We will now see how \mathsf{gap}(G) is determined by the geometric properties of path metrics defined on G.

Bi-Lipschitz embeddings into L_1

Consider a metric space (X,d), and the space of absolutely integrable functions L_1 = L_1([0,1]), equipped with the \|\cdot\|_1 norm. A mapping f : X \to L_1 is said to be C-bi-lipschitz if there exist constants A,B > 0 such that for all x,y \in X,

\frac{1}{B} d(x,y) \leq \|f(x)-f(y)\|_1 \leq A\,d(x,y)

and C = A \cdot B (so if C=1, then f is an isometry up to scaling). The infimal value of C for which f is C-bi-lipschitz is called the distortion of f. Define c_1(X) as least distortion with which (X,d) maps into L_1.

We will now relate \mathsf{gap}(G) to the L_1-distortion of the geometries that G supports. Any set of non-negative weights w : E \to \mathbb R_+ on the edges of G gives rise to a distance \mathrm{dist}_w on V defined by taking shortest-paths. Define c_1(G) to be the maximum of c_1(V, \mathrm{dist}_w) as w ranges over all such weightings. (Strictly speaking, \mathrm{dist}_w is only a pseudometric.)

Here is the beautiful connection referred to previously (see this for a complete proof).

Theorem (Linial-London-Rabinovich, Aumann-Rabani): For any graph G, \mathsf{gap}(G) = c_1(G).

Metric obstructions. To give an idea of why the theorem is true, we mention two facts. It is rather obvious how a cut obstructs a flow. A more general type of obstruction is given by a metric \mathrm{dist} on V. For any valid flow F,

\displaystyle \mathsf{val}(F) \leq \frac{\sum_{xy \in E} \mathsf{cap}(x,y)\,\mathrm{dist}(x,y)}{\sum_{x,y \in V} \mathsf{dem}(x,y)\,\mathrm{dist}(x,y)} (2)

To see why this holds, think of giving every edge (x,y) \in E a “length” of \mathrm{dist}(x,y), and having a cross-sectional area of \mathsf{cap}(x,y). Then the numerator in (2) is precisely the “volume” of the network, while the denominator is the total “volume” required to satisfy all the demands: By the triangle inequality, sending one unit of flow from x to y requires volume at least \mathrm{dist}(x,y). Observe that the bound (2) is a generalization of the cut upper bound when we define the pseudometric \mathrm{dist}(x,y) = |\mathbf{1}_S(x)-\mathbf{1}_S(y)| for some S \subseteq V (this is called a cut pseudometric).

It turns out (by linear programming duality) that in fact metrics are the correct dual objects to flows, and maximizing the left hand side of (2) over valid flows, and minimizing the right-hand side over metrics yields equality (it is also easy to see that the minimal metric is a shortest-path metric). One might ask why one continues to study cuts if they are the “wrong” dual objects. I hope to address this extensively in future posts, but the basic idea is to flip the correspondence around: minimizing \Phi(S,\bar S) is NP-hard, while maximizing \mathsf{val}(F) can be done by linear programming. Thus we are trying to see how close we can get to the very complex object \Phi, by something which is efficiently computable.

The cut decomposition of L_1. Now that we see why metrics come into the picture, let’s see how L_1 presents itself. Define a cut measure on a finite set X as a mapping \mu : 2^X \to \mathbb R_+ which satisfies \mu(S) = \mu(\bar S) for every S \subseteq X.

Fact: Given F : X \to L_1, there exists a cut measure \mu on X such that \|F(x)-F(y)\|_1 = \sum_S \mu(S) |\mathbf{1}_S(x)-\mathbf{1}_S(y)| for every x,y \in X. Conversely, for every cut measure \mu, there exists a mapping F : X \to L_1 for which the same equality holds.

Thus every L_1-distance on a finite set is precisely a weighted sum of cut pseudometrics (and, of course, cuts are where this story began). Proving the fact is straightforward; first check that it holds for F : X \to \mathbb R, and then integrate. As an example, consider these two isometric embeddings, presented via their cut measures (the graphs are unit-weighted):

4-cycle 5-cycle embedding

In the 4-cycle, each cut has unit weight. The corresponding embedding into (\mathbb R^2, \|\cdot\|_1) maps the four vertices to \{(0,0),(0,1),(1,0),(1,1)\}. In the second example, the dashed cuts have weight 1/2, and the dotted cut has weight 1. (Exercise: Verify that every tree embeds into L_1 isometrically.)

Two comments are in order; first, when X is finite, one can equivalently take the target space to be \mathbb R^{|X| \choose 2} equipped with the \|\cdot\|_1 norm (exercise: prove this using Caratheodory’s theorem). Thus one might ask why we would introduce a function space L_1 in the first place. One reason is that the dimension is irrelevant; a deeper reason is that when X is infinite (in which case an appropriately stated version of the fact holds), L_1 is the proper setting (and not, e.g. \ell_1). This arises, for instance, in arguments involving ultralimits in the passage between the finite and infinite settings.

The planar embedding conjecture

Using the connection with L_1-embeddings, we can now state the planar multi-flow conjecture in its dual formulation.

Planar embedding conjecture: There exists a constant C \geq 1 such that every shortest-path metric on a planar graph G=(V,E) admits a C-bi-lipschitz embedding into L_1.

By a relatively simple approximation argument in one direction, and a compactness argument in the other, one has the following equivalent conjecture.

Riemannian version: There exists a constant C \geq 1 such that every Riemannian metric on the 2-sphere admits a C-bi-lipschitz embedding into L_1.

Recently, Indyk and Sidiropoulos proved that if it’s true for the 2-sphere, then it’s true for every compact surface of genus g \geq 1, where the constant C depends on g (as it must).

We can now state some known results in the dual setting of embeddings. Let G=(V,E) be a planar graph, and let \mathrm{dist} be a shortest-path metric on V. The previously mentioned theorem of Klein, Plotkin, and Rao can be stated as follows: There exists a non-expansive mapping f : V \to \mathbb R such that

O(1) \sum_{x,y \in V} |f(x)-f(y)| \geq \sum_{x,y \in V} \mathrm{dist}(x,y).

In Gromov’s language, the observable diameter of planar graph metrics is almost as large as possible (i.e. no forced concentration of Lispchitz mappings).

A classical theorem of Okamura and Seymour is stated in the embedding setting as: Let F \subseteq V be a face in some drawing of G in the plane. Then there exists a non-expansive mapping f : V \to L_1 such that f|_F is an isometry. In the flow setting, this says that if the demand function is supported on the pairs belonging to a single face, then (1) holds with C=1.

Finally, we mention the bound of Rao which shows that c_1(V,d) = O(\sqrt{\log |V|}) for planar metrics. A slightly more general version says that whenever the demand is supported on at most k pairs, the flow/cut gap can be at most O(\sqrt{\log k}).

Read about the relationship with coarse differentiation after the jump.

(more…)

First post

Filed under: First post — Tags: — jrluw @ 8:05 am

At the moment, this blog is merely an experiment in mathematical exposition. The focus is on mathematics that arises in theoretical computer science. The idea is to tell mathematicians about what goes on in TCS, as well as to introduce relevant mathematical techniques to theoretical computer scientists at large.  Feedback is welcome.

The first three posts of this trial will concern:

  1. Planar multi-flows, L_1 embeddings, and differentiation
  2. The explicit subspace problem, compressed sensing, and error-correction over the reals
  3. Geometry of the Laplacian on graphs and spectral data analysis

Blog at WordPress.com.