Entropy optimality: Quantum lifts of polytopes

In these previous posts, we explored whether the cut polytope can be expressed as the linear projection of a polytope with a small number of facets (i.e., whether it has a small linear programming extended formulation).

For many cut problems, semi-definite programs (SDPs) are able to achieve better approximation ratios than LPs. The most famous example is the Goemans-Williamson {0.878} -approximation for MAX-CUT. The techniques of the previous posts (see the full paper for details) are able to show that no polynomial-size LP can achieve better than factor {1/2} .

1.1. Spectrahedral lifts

The feasible regions of LPs are polyhedra. Up to linear isomorphism, every polyhedron {P} can be represented as {P = \mathbb R_+^n \cap V} where {\mathbb R_+^n} is the positive orthant and {V \subseteq \mathbb R^n} is an affine subspace.

In this context, it makes sense to study any cones that can be optimized over efficiently. A prominent example is the positive semi-definite cone. Let us define {\mathcal S_+^n \subseteq \mathbb R^{n^2}} as the set of {n \times n} real, symmetric matrices with non-negative eigenvalues. A spectrahedron is the intersection {\mathcal S_+^n \cap V} with an affine subspace {V} . The value {n} is referred to as the dimension of the spectrahedron.

In analogy with the {\gamma} parameter we defined for polyhedral lifts, let us define {\bar \gamma_{\mathsf{sdp}}(P)} for a polytope {P} to be the minimal dimension of a spectrahedron that linearly projects to {P} . It is an exercise to show that {\bar \gamma_{\mathsf{sdp}}(P) \leq \bar \gamma(P)} for every polytope {P} . In other words, spectrahedral lifts are at least as powerful as polyhedral lifts in this model.

In fact, they are strictly more powerful. Certainly there are many examples of this in the setting of approximation (like the Goemans-Williamson SDP mentioned earlier), but there are also recent gaps between {\bar \gamma} and {\bar \gamma_{\mathsf{sdp}}} for polytopes; see the work of Fawzi, Saunderson, and Parrilo.

Nevertheless, we are now able to prove strong lower bounds on the dimension of such lifts. Let us consider the cut polytope {\mathrm{CUT}_n} as in previous posts.

Theorem 1 (L-Raghavendra-Steurer 2015) There is a constant {c > 0} such that for every {n \geq 1} , one has {\bar \gamma_{\mathsf{sdp}}(\mathrm{CUT}_n) \geq e^{c n^{2/13}}} .

Our goal in this post and the next is to explain the proof of this theorem and how quantum entropy maximization plays a key role.

1.2. PSD rank and factorizations

Just as in the setting of polyhedra, there is a notion of “factorization through a cone” that characterizes the parameter {\bar \gamma_{\mathsf{sdp}}(P)} . Let {M \in \mathbb R^{m \times n}_+} be a non-negative matrix. One defines the psd rank of {M} as the quantity

\displaystyle  \mathrm{rank}_{\mathsf{psd}}(M) = \min \left\{ r : M_{ij} = \mathrm{Tr}(A_i B_j) \textrm{ for some } A_1, \ldots, A_m, B_1, \ldots, B_n \in \mathcal S_+^r\right\}\,.
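As a sanity check on the definition, any choice of PSD matrices {A_i, B_j \in \mathcal S_+^r} yields a non-negative matrix {M_{ij} = \mathrm{Tr}(A_i B_j)} with {\mathrm{rank}_{\mathsf{psd}}(M) \leq r} ; here is a minimal numerical sketch of this (mine, in Python/NumPy, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
r, m, n = 2, 4, 5

# Random PSD matrices A_1..A_m and B_1..B_n in S_+^r (G G^T is always PSD).
A = [G @ G.T for G in (rng.standard_normal((r, r)) for _ in range(m))]
B = [G @ G.T for G in (rng.standard_normal((r, r)) for _ in range(n))]

# M_ij = Tr(A_i B_j) >= 0 automatically, so M is a non-negative matrix
# admitting a psd factorization of size r, i.e. rank_psd(M) <= r = 2.
M = np.array([[np.trace(Ai @ Bj) for Bj in B] for Ai in A])
assert M.shape == (m, n) and M.min() >= 0
```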

The following theorem was independently proved by Fiorini, Massar, Pokutta, Tiwary, and de Wolf, and by Gouveia, Parrilo, and Thomas. The proof is a direct analog of Yannakakis’ proof for non-negative rank.

Theorem 2 For every polytope {P} , it holds that {\bar \gamma_{\mathsf{sdp}}(P) = \mathrm{rank}_{\mathsf{psd}}(M)} for any slack matrix {M} of {P} .

Recall the class {\mathrm{QML}_n^+} of quadratic multi-linear functions that are non-negative on {\{0,1\}^n} and the matrix {\mathcal M_n : \mathrm{QML}_n^+ \times \{0,1\}^n \rightarrow \mathbb R_+} given by

\displaystyle  \mathcal M_n(f,x) = f(x)\,.

We saw previously that {\mathcal M_n} is a submatrix of some slack matrix of {\mathrm{CUT}_n} . Thus our goal is to prove a lower bound on {\mathrm{rank}_{\mathsf{psd}}(\mathcal M_n)} .

1.3. Sum-of-squares certificates

Just as in the setting of non-negative matrix factorization, we can think of a low psd rank factorization of {\mathcal M_n} as a small set of “axioms” that can prove the non-negativity of every function in {\mathrm{QML}_n^+} . But now our proof system is considerably more powerful.

For a subspace of functions {\mathcal U \subseteq L^2(\{0,1\}^n)} , let us define the cone

\displaystyle  \mathsf{sos}(\mathcal U) = \mathrm{cone}\left(q^2 : q \in \mathcal U\right)\,.

This is the cone of squares of functions in {\mathcal U} . We will think of {\mathcal U} as a set of axioms of size {\mathrm{dim}(\mathcal U)} that is able to assert non-negativity of every {f \in \mathrm{sos}(\mathcal U)} by writing

\displaystyle  f = \sum_{i=1}^k q_i^2

for some {q_1, \ldots, q_k \in \mathcal U} .

Fix a subspace {\mathcal U} and let {r = \dim(\mathcal U)} . Fix also a basis {q_1, \ldots, q_r : \{0,1\}^n \rightarrow \mathbb R} for {\mathcal U} .

Define {B : \{0,1\}^n \rightarrow \mathcal S_+^r} by setting {B(x)_{ij} = q_i(x) q_j(x)} . Note that {B(x)} is PSD for every {x} because {B(x) = \vec q(x) \vec q(x)^T} where {\vec q(x)=(q_1(x), \ldots, q_r(x))} .

We can write every {p \in \mathcal U} as {p = \sum_{i=1}^r \lambda_i q_i} . Defining {A(p^2) \in \mathcal S_+^r} by {A(p^2)_{ij} = \lambda_i \lambda_j} , we see that

\displaystyle  \mathrm{Tr}(A(p^2) B(x)) = \sum_{i,j} \lambda_i \lambda_j q_i(x) q_j(x) = p(x)^2\,.

Now every {f \in \mathsf{sos}(\mathcal U)} can be written as {\sum_{i=1}^k c_i p_i^2} for some {k \geq 0} and {\{c_i \geq 0\}} . Therefore if we define {A(f) = \sum_{i=1}^k c_i A(p_i^2)} (which is a non-negative combination of PSD matrices), we arrive at the representation

\displaystyle  f(x) = \mathrm{Tr}(A(f) B(x))\,.
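Here is the construction in miniature (my own illustration, taking {\mathcal U} spanned by {1, x_1, x_2} on {\{0,1\}^2} and {f = (1-x_1-x_2)^2} ):

```python
import numpy as np
from itertools import product

# Basis q_1, q_2, q_3 = 1, x_1, x_2 for U = Q_1 over {0,1}^2, and
# f = p^2 with p = 1*q_1 - 1*q_2 - 1*q_3, i.e. lambda = (1, -1, -1).
lam = np.array([1.0, -1.0, -1.0])
A = np.outer(lam, lam)                      # A(f) = lambda lambda^T, PSD

for x in product([0, 1], repeat=2):
    q = np.array([1.0, x[0], x[1]])         # (q_1(x), q_2(x), q_3(x))
    Bx = np.outer(q, q)                     # B(x) = q(x) q(x)^T, PSD
    f = (1 - x[0] - x[1]) ** 2
    assert np.isclose(np.trace(A @ Bx), f)  # f(x) = Tr(A(f) B(x))
```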

In conclusion, if {\mathrm{QML}_n^+ \subseteq \mathsf{sos}(\mathcal U)} , then {\mathrm{rank}_{\mathsf{psd}}(\mathcal M_n) \leq \dim(\mathcal U)} .

By a “purification” argument, one can also show that such a subspace exists with {\dim(\mathcal U) \leq \mathrm{rank}_{\mathsf{psd}}(\mathcal M_n)^2} .

1.4. The canonical axioms

And just as {d} -juntas were the canonical axioms for our NMF proof system, there is a similar canonical family in the SDP setting: Let {\mathcal Q_d} be the subspace of all degree-{d} multi-linear polynomials on {\mathbb R^n} . We have

\displaystyle  \dim(\mathcal Q_d) \leq \sum_{k=0}^d {n \choose k} \leq 1+n^d\,. \ \ \ \ \ (1)
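The bound (1) is easy to spot-check (a throwaway numerical sketch of mine):

```python
from math import comb

# Spot-check dim(Q_d) = sum_{k<=d} C(n,k) <= 1 + n^d over a small range.
for n in range(1, 30):
    for d in range(0, min(n, 8) + 1):
        dim_Qd = sum(comb(n, k) for k in range(d + 1))
        assert dim_Qd <= 1 + n ** d
```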

For a function {f : \{0,1\}^n \rightarrow \mathbb R_+} , one defines

\displaystyle  \deg_{\mathsf{sos}}(f) = \min \{d : f \in \mathsf{sos}(\mathcal Q_{d}) \}\,.

(One could debate whether the definition of sum-of-squares degree should have {d/2} or {d} . The most convincing arguments suggest that we should use membership in the cone of squares over {\mathcal Q_{\lfloor d/2\rfloor}} so that the sos-degree will be at least the real-degree of the function.)

On the other hand, our choice has the following nice property.

Lemma 3 For every {f : \{0,1\}^n \rightarrow \mathbb R_+} , we have {\deg_{\mathsf{sos}}(f) \leq \deg_J(f)} .

Proof: If {q} is a non-negative {d} -junta, then {\sqrt{q}} is also a non-negative {d} -junta. Since every {d} -junta is a polynomial of degree at most {d} , it follows that {q} is the square of a polynomial of degree at most {d} . \Box

1.5. The canonical tests

As with junta-degree, there is a simple characterization of sos-degree in terms of separating functionals. Say that a functional {\varphi : \{0,1\}^n \rightarrow \mathbb R} is degree-{d} pseudo-positive if

\displaystyle  \langle \varphi, q^2 \rangle = \mathop{\mathbb E}_{x \in \{0,1\}^n} \varphi(x) q(x)^2 \geq 0

whenever {q : \{0,1\}^n \rightarrow \mathbb R} satisfies {\deg(q) \leq d} (and by {\deg} here, we mean degree as a multi-linear polynomial on {\{0,1\}^n} ).

Again, since {\mathsf{sos}(\mathcal Q_d)} is a closed convex set, there is precisely one way to show non-membership there. The following characterization is elementary.

Lemma 4 For every {f : \{0,1\}^n \rightarrow \mathbb R_+} , it holds that {\deg_{\mathsf{sos}}(f) > d} if and only if there is a degree-{d} pseudo-positive functional {\varphi : \{0,1\}^n \rightarrow \mathbb R} such that {\langle \varphi,f\rangle < 0} .
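Checking pseudo-positivity is a finite computation: writing {q = \sum_S \lambda_S x^S} over monomials of degree at most {d} , one has {\langle \varphi, q^2\rangle = \lambda^T M \lambda} for the moment matrix {M_{S,T} = \langle \varphi, x^S x^T\rangle} , so {\varphi} is degree- {d} pseudo-positive iff {M} is PSD. A minimal numerical sketch of this check (the helper pseudo_positive is my own, not from the paper):

```python
import numpy as np
from itertools import product, combinations

def pseudo_positive(phi, n, d):
    """Test degree-d pseudo-positivity of phi : {0,1}^n -> R via PSD-ness
    of the moment matrix M[S,T] = E[phi(x) x^S x^T] over monomials of
    degree <= d (since <phi, q^2> = lam^T M lam for q = sum_S lam_S x^S)."""
    monos = [S for k in range(d + 1) for S in combinations(range(n), k)]
    pts = list(product([0, 1], repeat=n))
    def mono(S, x):
        return np.prod([x[i] for i in S]) if S else 1.0
    M = np.array([[np.mean([phi(x) * mono(S, x) * mono(T, x) for x in pts])
                   for T in monos] for S in monos])
    return np.linalg.eigvalsh(M).min() >= -1e-9

# phi = 1 is pseudo-positive at every degree, since <1, q^2> = E[q^2] >= 0.
assert pseudo_positive(lambda x: 1.0, n=3, d=2)
# A functional that is negative at a point where degree-1 polynomials can
# concentrate need not be: take q = 1 - x_1 - x_2 - x_3.
assert not pseudo_positive(lambda x: -1.0 if sum(x) == 0 else 0.0, n=3, d=1)
```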

1.6. The connection to psd rank

Following the analogy with non-negative rank, we have two objectives left: (i) to exhibit a function {f \in \mathrm{QML}_n^+} with {\deg_{\mathsf{sos}}(f)} large, and (ii) to give a connection between the sum-of-squares degree of {f} and the psd rank of an associated matrix.

Notice that the function {f(x)=(1-\sum_{i=1}^n x_i)^2} we used for junta-degree has {\deg_{\mathsf{sos}}(f)=1} , making it a poor candidate. Fortunately, Grigoriev has shown that the knapsack polynomial has large sos-degree.

Theorem 5 For every odd {m \geq 1} , the function

\displaystyle  f(x) = \left(\frac{m}{2} - \sum_{i=1}^m x_i\right)^2 - \frac14

has {\deg_{\mathsf{sos}}(f) \geq \lceil m/2\rceil} .

Observe that this {f} is non-negative over {\{0,1\}^m} (because {m} is odd), but it is manifestly not non-negative on {\mathbb R^m} .
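Both observations are easy to confirm numerically for small odd {m} (a quick sketch of mine, not part of Grigoriev’s argument):

```python
from itertools import product

for m in (3, 5, 7):                       # odd m only
    f = lambda s: (m / 2 - s) ** 2 - 0.25
    # On {0,1}^m the sum is an integer, so |m/2 - sum| >= 1/2 and f >= 0,
    # with equality at sums (m-1)/2 and (m+1)/2.
    vals = [f(sum(x)) for x in product([0, 1], repeat=m)]
    assert min(vals) == 0
    # But over the reals, f is negative at the point with coordinate sum m/2.
    assert f(m / 2) < 0
```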

Finally, we recall the submatrices of {\mathcal M_n} defined as follows. Fix some integer {m \geq 1} and a function {g : \{0,1\}^m \rightarrow \mathbb R_+} . Then {M_n^g : {[n] \choose m} \times \{0,1\}^n \rightarrow \mathbb R_+} is given by

\displaystyle  M_n^g(S,x) = g(x|_S)\,.
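As an illustration (my own sketch, taking the knapsack function as {g} with small parameters {m=3} , {n=5} ), the matrix {M_n^g} can be assembled directly:

```python
import numpy as np
from itertools import product, combinations

m, n = 3, 5
g = lambda y: (m / 2 - sum(y)) ** 2 - 0.25   # knapsack function on {0,1}^m

# M_n^g(S, x) = g(x restricted to S); rows indexed by m-subsets of [n],
# columns by points of {0,1}^n.
rows = list(combinations(range(n), m))
cols = list(product([0, 1], repeat=n))
M = np.array([[g([x[i] for i in S]) for x in cols] for S in rows])

# Every entry is non-negative because g >= 0 on {0,1}^m (m odd).
assert M.shape == (len(rows), 2 ** n) and M.min() >= 0
```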

In the next post, we discuss the proof of the following theorem.

Theorem 6 (L-Raghavendra-Steurer 2015) For every {m \geq 1} and {g : \{0,1\}^m \rightarrow \mathbb R_+} , there exists a constant {C(g)} such that the following holds. For every {n \geq 2m} ,

\displaystyle  1+n^{d} \geq \mathrm{rank}_{\mathsf{psd}}(M_n^g) \geq C(g) \left(\frac{n}{\log n}\right)^{(d-1)/2}\,,

where {d = \deg_{\mathsf{sos}}(g)} .

Note that the upper bound is from (1) and the non-trivial content is contained in the lower bound. As before, in conjunction with Theorem 5, this shows that {\mathrm{rank}_{\mathsf{psd}}(\mathcal M_n)} cannot be bounded by any polynomial in {n} and thus the same holds for {\bar \gamma_{\mathsf{sdp}}(\mathrm{CUT}_n)} .

PSD lifting and Unique Games integrality gaps

By now, it is known that integrality gaps for the standard Unique Games SDP (see the paper of Khot and Vishnoi or Section 5.2 of this post) can be used to obtain integrality gaps for many other optimization problems, and often for very strong SDPs coming from various methods of SDP tightening; see, for instance, the paper of Raghavendra and Steurer.

Problematically, the Khot-Vishnoi gap is rather inefficient: To achieve the optimal gap for Unique Games with alphabet size {L}, one needs an instance of size {\exp(\Omega(L))}. As far as I know, there is no obstacle to achieving a gap instance where the number of variables is only {\mathrm{poly}(L)}.

The Walsh-Hadamard code

The Khot-Vishnoi construction is based on the Hadamard code.
(See Section 5.2 here for a complete description.) If we use {L^2(\{-1,1\}^k)} to denote the Hilbert space of real-valued functions {f : \{-1,1\}^k \rightarrow \mathbb R}, then the Walsh-Hadamard basis of {L^2(\{-1,1\}^k)} is the set of functions of the form

\displaystyle  u_S(x) = \prod_{i \in S} x_i,

where {S \subseteq \{1,2,\ldots,k\}}.

Of course, for two such sets {S \neq T}, we have the orthogonality relations,

\displaystyle  \langle u_S, u_T \rangle = 0.

In their construction, the variables are essentially all functions of the form {f : \{-1,1\}^k \rightarrow \{-1,1\}}, of which there are {2^{2^k}}, while there are only {2^k} basis elements {\{u_S\}_{S \subseteq [k]}} which act as the alphabet for the underlying Unique Games instance. This is what leads to the exponential relationship between the number of variables and the label size.
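The orthogonality relations, and the count of {2^k} basis elements, are easy to verify numerically (a small sketch of my own):

```python
import numpy as np
from itertools import product, combinations, chain

k = 3
pts = list(product([-1, 1], repeat=k))
subsets = list(chain.from_iterable(
    combinations(range(k), r) for r in range(k + 1)))
assert len(subsets) == 2 ** k               # 2^k basis elements u_S

# u_S(x) = prod_{i in S} x_i; rows of U tabulate each u_S on {-1,1}^k.
U = np.array([[np.prod([x[i] for i in S]) if S else 1 for x in pts]
              for S in subsets])

# Under the uniform inner product E_x[u_S(x) u_T(x)], the u_S are orthonormal.
G = U @ U.T / len(pts)
assert np.allclose(G, np.eye(len(subsets)))
```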

A PSD lifting question

In an effort to improve this dependence, one could start with a much larger set of nearly orthogonal vectors, and then somehow lift them to a higher-dimensional space where they would become orthogonal. In order for the value of the SDP not to blow up, it would be necessary that this map has some kind of Lipschitz property. We are thus led to the following (possibly naïve) question.

Let {C(d,\varepsilon)} be the smallest number such that the following holds. (Here, {S^{d-1} \subseteq \mathbb R^d} denotes the {(d-1)}-dimensional unit sphere and {S(L^2)} denotes the unit sphere of {L^2}.)

There exists a map {F : S^{d-1} \rightarrow S(L^2)} such that {\|F\|_{\mathrm{Lip}} \leq C(d,\varepsilon)} and whenever {u,v \in \mathbb R^d} satisfy {|\langle u,v\rangle| \leq \varepsilon}, we have {\langle F(u), F(v)\rangle = 0}.

(Recall that \|F\|_{\mathrm{Lip}} = \sup_{x \neq y \in S^{d-1}} \|F(x)-F(y)\|/\|x-y\|.)

One can show that

\displaystyle C(d,\varepsilon) \lesssim \frac{\sqrt{d}}{1-\varepsilon}

by randomly partitioning {S^{d-1}} so that any two vectors satisfying {|\langle u,v\rangle| \leq \varepsilon} end up in different sets of the partition, and then mapping all the points of each set to a single vector, using pairwise orthogonal vectors for distinct sets.

My question is simply: Is a better dependence on {d} possible? Can one rule out that {C(d,\varepsilon)} could be independent of {d}? Note that any solution which randomly maps points to orthogonal vectors must incur such a blowup (this is essentially rounding the SDP to an integral solution).

Bloomington summer school recap

A couple months ago, at Indiana University, David Fisher, Nets Katz, and I  organized a summer school on Analysis and geometry in the theory of computation.  This school is one in a series organized by David and funded by NSF grant DMS-0643546 (see, e.g. last year’s school). What follows is a brief synopsis of what the school covered.  All the lectures were given by the participants, and there are links to their lecture notes below.  This is essentially an extended version of an introductory document I wrote for the participants, who were a mix of mathematicians and theoretical computer scientists.

Approximation Algorithms

In the following discussion, we will use the word efficient to describe an algorithm that runs in time polynomial in the size of its input. For a graph {G=(V,E)}, we use {\textsf{MC}(G)} to denote the “MAX-CUT value,” i.e. the quantity

\displaystyle \max_{S \subseteq V} \frac{|E(S, \bar S)|}{|E|},

where {E(S, \bar S)} denotes the set of edges between {S} and its complement. It is well-known that computing {\textsf{MC}(G)} is {\mathsf{NP}}-complete, and thus assuming {\mathsf{P} \neq \mathsf{NP}}, there is no efficient algorithm that, given {G}, outputs {\textsf{MC}(G)}.
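Of course, nothing stops us from computing {\textsf{MC}(G)} in exponential time; a toy brute-force sketch (mine, for illustration only):

```python
from itertools import product

def max_cut(n, edges):
    """Brute-force MC(G): best fraction of edges crossing a bipartition,
    over all 2^n subsets S of the vertex set."""
    best = 0.0
    for assign in product([0, 1], repeat=n):
        cut = sum(1 for u, v in edges if assign[u] != assign[v])
        best = max(best, cut / len(edges))
    return best

# A 5-cycle: an odd cycle cannot be cut completely; the best cut crosses 4/5.
assert max_cut(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]) == 0.8
```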

Given this state of affairs, it is natural to ask how well we can approximate the value {\mathsf{MC}(G)} with an efficient algorithm. For an algorithm {\mathcal A}, we use {\mathcal A(G)} to denote its output when run on the graph {G}. If {\mathcal A} satisfies {\mathcal A(G) \leq \mathsf{MC}(G)} for all {G}, we define its approximation ratio as

\displaystyle \alpha(\mathcal A) = \sup \left\{ \alpha : \mathcal A(G) \geq \alpha \cdot \mathsf{MC}(G) \textrm{ for all graphs}\right\}

Clearly {\alpha(\mathcal A) \in [0,1]}. Now we are interested in the best approximation ratio achievable by an efficient algorithm {\mathcal A}, i.e. the quantity

\displaystyle \textrm{approx}(\mathsf{MC}) = \sup \left\{ \alpha(\mathcal A) : \mathcal A \textrm{ is efficient} \right\}

It should be clear that similar questions arise for all sorts of other values which are NP-hard to compute (e.g. the chromatic number of a graph, or the length of its shortest tour, or the length of the longest simple path, etc.) An algorithm of Goemans and Williamson (based on a form of convex optimization known as semi-definite programming) shows that

\displaystyle \mathrm{approx}(\mathsf{MC}) \geq \alpha_{\mathrm{GW}} = \frac{2}{\pi} \min_{0 < \theta < \pi} \frac{\theta}{1-\cos\theta} = 0.878\ldots
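The constant can be spot-checked numerically (a quick grid-search sketch of my own):

```python
import numpy as np

# Evaluate alpha_GW = (2/pi) min_{0 < theta < pi} theta / (1 - cos(theta))
# on a fine grid; the ratio blows up as theta -> 0, so the minimum is interior.
theta = np.linspace(1e-6, np.pi - 1e-6, 1_000_000)
alpha_gw = (2 / np.pi) * np.min(theta / (1 - np.cos(theta)))
assert abs(alpha_gw - 0.878567) < 1e-4
```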

On the other hand, Håstad proved that, as a consequence of the PCP Theorem, it is NP-hard to obtain an approximation ratio better than {16/17}, i.e. if {\mathsf{P} \neq \mathsf{NP}}, then

\displaystyle \mathrm{approx}(\mathsf{MC}) \leq \frac{16}{17} = 0.941\ldots

How does one prove such a theorem? Well, the {\mathsf{NP}}-hardness of MAX-CUT is based on constructing graphs where every optimal solution has a particular structure (which eventually encodes the solution to another NP-hard problem like SATISFIABILITY). Similarly, the NP-hardness of obtaining even “near-optimal” solutions is proved, in part, by constructing graphs where every solution whose value is close to optimal has some very specific structure (e.g. is close—in some stronger sense—to an optimal solution).

In this way, one of the main steps in proving the inapproximability of {\mathsf{NP}}-hard problems involves constructing objects which have such a “rigidity” property. This summer school is about how one can use the rigidity of analytic and geometric objects to obtain combinatorial objects with the same property. In fact, assuming something called the “Unique Games Conjecture” (which we will see later), the approximability of many constraint satisfaction problems can be tied directly to the existence of certain geometric configurations.

The Lectures

The first series of lectures will concern the Sparsest Cut problem in graphs and its relationship to bi-lipschitz {L_1} embeddings of finite metric spaces. In particular, we will look at rigidity properties of  “nice” subsets of the Heisenberg group, and how these can be used to prove limitations on a semi-definite programming approach to Sparsest Cut. In the second series, we will see how—assuming the Unique Games Conjecture (UGC)—proving lower bounds on certain simple semi-definite programs actually proves lower bounds against all efficient algorithms. This will entail, among other things, an analytic view of {\{0,1\}}-valued functions, primarily through harmonic analysis.

Sparsest Cut and {L_1} embeddings

The Sparsest Cut problem is classically described as follows. We have a graph {G=(V,E)} and two functions {C : V \times V \rightarrow \mathbb R_+} and {D : V \times V \rightarrow \mathbb R_+}, with {\mathrm{supp}(C) \subseteq E}. The goal is to compute

\displaystyle   \Phi^*(G;C,D) = \min_{S \subseteq V} \frac{C(S, \bar S)}{D(S, \bar S)},

where we use {C(A,B) = \sum_{a \in A, b\in B} C(a,b)} and {D(A,B) = \sum_{a \in A, b \in B} D(a,b)}. The problem has a number of important applications in computer science.

Computing {\Phi^*(G;C,D)} is NP-hard, but again we can ask for approximation algorithms. The best-known approach is based on computing the value of the Goemans-Linial semi-definite program, \mathsf{sdp}(G;C,D), which is

\displaystyle \min \left\{ \frac{\sum_{u,v} C(u,v) \|x_u-x_v\|_2^2}{\sum_{u,v} D(u,v) \|x_u-x_v\|_2^2}: \{x_u\}_{u \in V} \subseteq \mathbb R^V\textrm{ and }\|x_u-x_v\|^2 \leq \|x_u-x_w\|^2 + \|x_w-x_v\|^2 \textrm{ for all }  u,v,w \in V \right\}.

This value can be computed by a semi-definite program (SDP), as we will see. It is an easy exercise to check that {\mathsf{sdp}(G;C,D) \leq \Phi^*(G;C,D)}, and we can ask for the smallest {\alpha = \alpha(n)} such that for all {n}-node graphs {G} and all functions {C,D}, we have

\displaystyle \Phi^*(G;C,D) \leq \alpha(n) \cdot \mathsf{sdp}(G;C,D).
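For the exercise, note that the {0,1}-indicator embedding {x_u = {\bf 1}_{\{u \in S\}}} of any cut {S} satisfies the squared triangle inequality, so every cut gives a feasible SDP solution whose objective is exactly {C(S,\bar S)/D(S,\bar S)}; minimizing over all feasible points then gives {\mathsf{sdp}(G;C,D) \leq \Phi^*(G;C,D)}. A quick numerical check of the feasibility claim (my own sketch):

```python
import numpy as np
from itertools import product

# For every cut of a 4-vertex set, the 1-dimensional indicator "vectors"
# x_u = 1_{u in S} satisfy the squared triangle inequality
#   |x_u - x_v|^2 <= |x_u - x_w|^2 + |x_w - x_v|^2,
# since all squared distances lie in {0, 1} and the left side is 1 only
# when u, v are separated, in which case w differs from one of them.
n = 4
for assign in product([0, 1], repeat=n):
    x = np.array(assign, dtype=float)
    for u, v, w in product(range(n), repeat=3):
        assert (x[u] - x[v]) ** 2 <= (x[u] - x[w]) ** 2 + (x[w] - x[v]) ** 2
```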

(E.g. it is now known that {(\log n)^{2^{-1000}} \leq \alpha(n) \leq O(\sqrt{\log n} \log \log n)}, with the upper bound proved here, and the lower bound proved here.)

By some duality arguments, one can characterize {\alpha(n)} in a different way. For a metric space {(X,d)}, write {c_1(X,d)} for the infimal constant {B} such that there exists a mapping {f : X \rightarrow L_1} satisfying, for all {x,y \in X},

\displaystyle   \|f(x)-f(y)\|_1 \leq d(x,y) \leq B \|f(x)-f(y)\|_1.

It turns out that

\displaystyle \alpha(n) = \sup \left\{ c_1(X,d) : |X|=n \textrm{ and } (X, \sqrt{d})\textrm{ embeds isometrically in } L_2\right\}. \ \ \ \ \ (1)

This shows that determining the power of the preceding SDP is intimately connected to understanding bi-lipschitz embeddings into {L_1}. This is what we will study in the first 6 lectures.

  1. (Arnaud de Mesmay) In the first lecture, we will be introduced to the basic geometry of the 3-dimensional Heisenberg group {\mathbb H^3}, and how differentiation plays a role in proving lower bounds on bi-lipschitz distortion. In particular, we will see Pansu’s approach for finite-dimensional targets and a generalization to spaces with the RNP, and also why a straightforward generalization would fail for {L_1}.
  2. (Mohammad Moharrami) Next, we will see how a differentiation approach to {L_1} embeddings might work in a toy setting that uses only finite graphs. The study of “monotone subsets” (which is elementary here) also arises in the work of Cheeger and Kleiner in lectures 4 and 5.  (See also this post.)
  3. (Sean Li) Here, we will see that there is an equivalent metric {d} on the Heisenberg group for which {(\mathbb H^3, \sqrt{d})} embeds isometrically into {L_2}. This is one half of proving lower bounds on {\alpha(n)} using (1).
  4. (Jeehyeon Seo and John Mackay) In Lectures 4-5, we’ll look at the approach of Cheeger and Kleiner for proving that {\mathbb H^3} does not bi-lipschitz embed into {L_1}.  (Note that these authors previously offered a different approach to non-embeddability, though the one presented in these lectures is somewhat simpler.)
  5. (Florent Baudier) Finally, in Lecture 6, we see some embedding theorems for finite metric spaces that allow us to prove upper bounds on {\alpha(n)}.

The UGC, semi-definite programs, and constraint satisfaction

In the second series of lectures, we’ll see how rigidity of geometric objects can possibly say something, not just about a single algorithm (like a semi-definite program), but about all efficient algorithms for solving a particular problem.

  1. (An-Sheng Jhang) First, we’ll review basic Fourier analysis on the discrete cube, and how this leads to some global rigidity theorems for cuts. These tools will be essential later.  (See also these lecture notes from Ryan O’Donnell.)
  2. (Igor Gorodezky) Next, we’ll see a semi-definite program (SDP) for the MAX-CUT problem, and a tight analysis of its approximation ratio (which turns out to be the {0.878\ldots} value we saw earlier).
  3. (Sam Daitch) In the third lecture, we’ll see the definition of the Unique Games Conjecture, and how it can be used (in an ad-hoc manner, for now) to transform our SDP analysis into a proof that the SDP-based algorithm is optimal (among all efficient algorithms) under some complexity-theoretic assumptions.
  4. (Deanna Needell) A key technical component of the preceding lecture is something called the Majority is Stablest Theorem that relates sufficiently nice functions on the discrete cube to functions on Gaussian space.
  5. (Sushant Sachdeva) In the final lecture, we’ll see Raghavendra’s work which shows that, for a certain broad class of NP-hard constraint satisfaction problems, assuming the UGC, the best-possible algorithm is the “canonical” semi-definite program. In other words, the approximation ratio for these problems is completely determined by the existence (or lack thereof) of certain vector configurations in {\mathbb R^n}.  (See also this post.)

Open question: PSD flows

This post is about a beautiful twist on flows that arises when studying (the dual) of the Sparsest Cut SDP.  These objects, which I’m going to call “PSD flows,” are rather poorly understood, and there are some very accessible open problems surrounding them.  Let’s begin with the definition of a normal flow:

Let G=(V,E) be a finite, undirected graph, and for every pair u,v \in V, let \mathcal P_{uv} be the set of all paths between u and v in G.  Let \mathcal P = \bigcup_{u,v \in V} \mathcal P_{uv}.  A flow in G is simply a mapping F : \mathcal P \to \mathbb R_{\geq 0}.  We define, for every edge (u,v) \in E, the congestion on (u,v) as

\displaystyle C_F(u,v) = \sum_{p \in \mathcal P: (u,v) \in p} F(p)

which is the total amount of flow going through (u,v).  Finally, for every u,v \in V, we define

\displaystyle F[u,v] = \sum_{p \in \mathcal P_{uv}} F(p)

as the total amount of flow sent from u to v.

Now, in the standard (all-pairs) maximum concurrent flow problem, the goal is to find a flow F which simultaneously sends D units of flow from every vertex to every other, while not putting more than one unit of flow through any edge, i.e.

\displaystyle \mathsf{mcf}(G) = \textrm{maximize } \left\{ \vphantom{\bigoplus} D : \forall u,v, F[u,v] \geq D \textrm{ and } \forall (u,v) \in E, C_F(u,v) \leq 1.\right\}

In order to define a PSD flow, it helps to write this in a slightly different way.  If we define the symmetric matrix

A_{u,v} = F[u,v] - D + {\bf 1}_{\{(u,v) \in E\}} - C_F(u,v)

then we have

Claim 1: \mathsf{mcf}(G) = \max \{ D : A_{u,v} \geq 0 \ \forall u,v \in V \}.

To see that this is true, we can take a matrix with A_{u,v} \geq 0 for all u,v \in V and fix it one entry at a time so that F[u,v] \geq D and C_F(u,v) \leq 1, without decreasing the total demand satisfied by the flow.

For instance, if (u,v) \in E and C_F(u,v) > 1+\varepsilon, then it must be that F[u,v] > D+\varepsilon, so we can reroute \varepsilon units of flow going through the edge (u,v) to go along one of the extraneous flow paths which gives the excess F[u,v] > D + \varepsilon.  Similar arguments hold for the other cases (Exercise!).

PSD flows

So those are normal flows.  To define a PSD flow, we define for any symmetric matrix A, the Laplacian of A, which has diagonal entries L(A)_{i,i} = \sum_{j \neq i} A_{i,j} and off-diagonal entries L(A)_{i,j} = - A_{i,j}.  It is easy to check that

\displaystyle \langle x, L(A)\, x \rangle = \sum_{i<j} A_{i,j} (x_i-x_j)^2.

Hence if A_{u,v} \geq 0 for all u,v \in V, then certainly L(A) \succeq 0 (i.e. L(A) is positive semi-definite).  The PSD flow problem is precisely

\displaystyle \max \{ D : L(A) \succeq 0 \}

where A is defined as above.  Of course, now we are allowing A to have negative entries, which makes this optimization trickier to understand.  We allow the flow to undersatisfy some demand, and to overcongest some edges, but now the “error” matrix has to induce a PSD Laplacian.
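Both facts about L(A) are easy to check numerically (a small sketch of mine; the helper laplacian just implements the definition above):

```python
import numpy as np

def laplacian(A):
    """L(A) for symmetric A: L_ii = sum_{j != i} A_ij, L_ij = -A_ij."""
    L = -A.copy()
    np.fill_diagonal(L, A.sum(axis=1) - np.diag(A))
    return L

rng = np.random.default_rng(1)
A = rng.random((6, 6)); A = (A + A.T) / 2          # symmetric, entrywise >= 0
L = laplacian(A)

# Quadratic-form identity: <x, L x> = sum over pairs i < j of A_ij (x_i-x_j)^2.
x = rng.standard_normal(6)
rhs = sum(A[i, j] * (x[i] - x[j]) ** 2
          for i in range(6) for j in range(i + 1, 6))
assert np.isclose(x @ L @ x, rhs)

# Entrywise non-negativity of A forces L(A) to be PSD; the PSD flow
# relaxation drops the entrywise condition and keeps only L(A) >= 0.
assert np.linalg.eigvalsh(L).min() >= -1e-9
```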

Scaling down the capacities

Now, consider some \delta \in [0,1], and write

\displaystyle A_{u,v}^{(\delta)} = F[u,v] - D + \delta \cdot {\bf 1}_{(u,v) \in E} - C_F(u,v).

Requiring A_{u,v}^{(\delta)} \geq 0 for every u,v \in V simply induces a standard flow problem where each edge now has capacity \delta.  In the case of normal flows, because we can decouple the demand/congestion constraints as in Claim 1, we can easily relate \max \{ D : A_{u,v}^{(\delta)} \geq 0\,\forall u,v \in V\} to \max \{ D : A_{u,v} \geq 0\,\forall u,v \in V\} (the first is exactly \delta times the second, because we can just scale a normal flow down by \delta and now it satisfies the reduced edge capacities).

Question: Can we relate \max \{ D : L(A^{(\delta)}) \succeq 0 \} and \max \{ D : L(A) \succeq 0 \}?  More specifically, do they differ by some multiplicative constant depending only on \delta?

This is a basic question whose answer is actually of fundamental importance in understanding the Sparsest Cut SDP.  I asked this question in its primal form almost 4 years ago (see question 3.2 here).

Note that the answer is affirmative if we can decouple the demand/congestion constraints in the case of PSD flows.  In other words, let X_{u,v} = F[u,v] - D and let Y_{u,v} = {\bf 1}_{\{(u,v) \in E\}} - C_F(u,v).

Question: Can we relate \max \{ D : L(A) \succeq 0 \} to \max \{ D : L(X) \succeq 0 \textrm{ and } L(Y) \succeq 0 \}?

In the next post, I’ll discuss consequences of this question for constructing integrality gaps for the Sparsest Cut SDP.