Primer: Lifts of polytopes and non-negative matrix factorization

In preparation for the next post on entropy optimality, we need a little background on lifts of polytopes and non-negative matrix factorization.

1.1. Polytopes and inequalities

A {d} -dimensional convex polytope {P \subseteq \mathbb R^d} is the convex hull of a finite set of points in {\mathbb R^d} . Equivalently, it is a compact set defined by a family of linear inequalities

\displaystyle  P = \{ x \in \mathbb R^d : Ax \leq b \}

for some matrix {A \in \mathbb R^{m \times d}} .

Let us give a measure of complexity for {P} : Define {\gamma(P)} to be the smallest number {m} such that for some {C \in \mathbb R^{s \times d}, y \in \mathbb R^s, A \in \mathbb R^{m \times d}, b \in \mathbb R^m} , we have

\displaystyle  P = \{ x \in \mathbb R^d : Cx=y \textrm{ and } Ax \leq b\}\,.

In other words, this is the minimum number of inequalities needed to describe {P} . If {P} is full-dimensional, then this is precisely the number of facets of {P} (a facet is a maximal proper face of {P} ).

Thinking of {\gamma(P)} as a measure of complexity makes sense from the point of view of optimization: Interior point methods can efficiently optimize linear functions over {P} (to arbitrary accuracy) in time that is polynomial in {\gamma(P)} .

1.2. Lifts of polytopes

Many simple polytopes require a large number of inequalities to describe. For instance, the cross-polytope

\displaystyle  C_d = \{ x \in \mathbb R^d : \|x\|_1 \leq 1 \} = \{ x \in \mathbb R^d : \pm x_1 \pm x_2 \cdots \pm x_d \leq 1 \}

has {\gamma(C_d)=2^d} . On the other hand, {C_d} is the projection of the polytope

\displaystyle  Q_d = \left\{ (x,y) \in \mathbb R^{2d} : \sum_{i=1}^n y_i = 1, \,\, - y_i \leq x_i \leq y_i\,\forall i\right\}

onto the {x} coordinates, and manifestly, {\gamma(Q_d) \leq 2d} . Thus {C_d} is the (linear) shadow of a much simpler polytope in a higher dimension.

[image credit: Fiorini, Rothvoss, and Tiwary]

A polytope {Q} is called a lift of the polytope {P} if {P} is the image of {Q} under a linear projection. Again, from an optimization stand point, lifts are important: If we can optimize linear functionals over {Q} , then we can optimize linear functionals over {P} . Define now {\bar \gamma(P)} to be the minimal value of {\gamma(Q)} over all lifts {Q} of {P} . (The value {\bar \gamma(P)} is sometimes called the (linear) extension complexity of {P} .)

1.3. The permutahedron

Here is a somewhat more interesting family of examples where lifts reduce complexity. The permutahedron {\Pi_n \subseteq \mathbb R^n} is the convex hull of the vectors {(i_1, i_2, \ldots, i_n)} where {\{i_1, \ldots, i_n\} = \{1,\ldots,n \}} . It is known that {\gamma(\Pi_n)=2^n-2} .

Let {B_n \subseteq \mathbb R^{n^2}} denote the convex hull of the {n \times n} permutation matrices. The Birkhoff-von Neumann theorem tells us that {B_n} is precisely the set of doubly stochastic matrices, thus {\gamma(B_n)=n^2} (corresponding to the non-negativity constraints on each entry).

Observe that {\Pi_n} is the linear image of {B_n} under the map {A \mapsto A (1, 2, \ldots, n)^T} , i.e. we multiply a matrix {A \in B_n} on the right by the column vector {(1, 2, \ldots, n)} . Thus {B_n} is a lift of {\Pi_n} , and we conclude that {\bar \gamma(\Pi_n) \leq n^2 \ll \gamma(\Pi_n)} .

1.4. The cut polytope

If {P \neq NP} , there are certain combinatorial polytopes we should not be able to optimize over efficiently. A central example is the cut polytope: {\mathrm{CUT}_n \subseteq \mathbb R^{n^2}} is the convex hull of all matrices of the form {(A_S)_{ij} = |\mathbf{1}_S(i)-\mathbf{1}_S(j)|} for some subset {S \subseteq \{1,\ldots,n\}} . Here, {\mathbf{1}_S} denotes the characteristic function of {S} .

Note that the MAX-CUT problem on a graph {G=(V,E)} can be encoded in the following way: Let {W_{ij} = 1} if {\{i,j\} \in E} and {W_{ij}=0} otherwise. Then the value of the maximum cut in {G} is precisely the maximum of {\langle W, A\rangle} for {A \in \mathrm{CUT}_n} . Accordingly, we should expect that {\bar \gamma(\mathrm{CUT}_n)} cannot be bounded by any polynomial in {n} (lest we violate a basic tenet of complexity theory).

1.5. Non-negative matrix factorization

The key to understanding {\bar \gamma(\mathrm{CUT}_n)} comes from Yannakakis’ factorization theorem.

Consider a polytope {P \subseteq \mathbb R^d} and let us write in two ways: As a convex hull of vertices

\displaystyle  P = \mathrm{conv}\left(x_1, x_2, \ldots, x_n\right)\,,

and as an intersection of half-spaces: For some {A \in \mathbb R^{m \times d}} and {b \in \mathbb R^m} ,

\displaystyle  P = \left\{ x \in \mathbb R^d : Ax \leq b \right\}\,.

Given this pair of representations, we can define the corresponding slack matrix of {P} by

\displaystyle  S_{ij} = b_i - \langle A_i, x_j \rangle \qquad i \in \{1,2,\ldots,m\}, j \in \{1,2,\ldots,n\}\,.

Here, {A_1, \ldots, A_m} denote the rows of {A} .

We need one more definition. In what follows, we will use {\mathbb R_+ = [0,\infty)} . If we have a non-negative matrix {M \in \mathbb R_+^{m \times n}} , then a rank-{r} non-negative factorization of {M} is a factorization {M = AB} where {A \in \mathbb R_+^{m \times r}} and {B \in \mathbb R_+^{r \times n}} . We then define the non-negative rank of {M} , written {\mathrm{rank}_+(M)} , to be the smallest {r} such that {M} admits a rank-{r} non-negative factorization.

Theorem 1 (Yannakakis) For every polytope {P} , it holds that {\bar \gamma(P) = \mathrm{rank}_+(S)} for any slack matrix {S} of {P} .

The key fact underlying this theorem is Farkas’ Lemma. It asserts that if {P = \{ x \in \mathbb R^d : Ax \leq b \}} , then every valid linear inequality over {P} can be written as a non-negative combination of the defining inequalities {\langle A_i, x \rangle \leq b_i} .

There is an interesting connection here to proof systems. The theorem says that we can interpret {\bar \gamma(P)} as the minimum number of axioms so that every valid linear inequality for {P} can be proved using a conic (i.e., non-negative) combination of the axioms.

1.6. Slack matrices and the correlation polytope

Thus to prove a lower bound on {\bar \gamma(\mathrm{CUT}_n)} , it suffices to find a valid set of linear inequalities for {\mathrm{CUT}_n} and prove a lower bound on the non-negative rank of the corresponding slack matrix.

Toward this end, consider the correlation polytope {\mathrm{CORR}_n \subseteq \mathbb R^{n^2}} given by

\displaystyle  \mathrm{CORR}_n = \mathrm{conv}\left(\left\{x x^T : x \in \{0,1\}^n \right\}\right)\,.

It is an exercise to see that {\mathrm{CUT}_{n+1}} and {\mathrm{CORR}_n} are linearly isomorphic.

Now we identify a particularly interesting family of valid linear inequalities for {\mathrm{CORR}_n} . (In fact, it turns out that this will also be an exhaustive list.) A quadratic multi-linear function on {\mathbb R^n} is a function {f : \mathbb R^n \rightarrow \mathbb R} of the form

\displaystyle  f(x) = a_0 + \sum_i a_{ii} x_i + \sum_{i < j} a_{ij} x_i x_j\,,

for some real numbers {a_0} and {\{a_{ij}\}} .

Suppose {f} is a quadratic multi-linear function that is also non-negative on {\{0,1\}^n} . Then “{f(x) \geq 0 \,\,\forall x \in \{0,1\}^n} ” can be encoded as a valid linear inequality on {\mathrm{CORR}_n} . To see this, write {f(x)=\langle A,xx^T\rangle + a_0} where {A=(a_{ij})} . (Note that {\langle \cdot,\cdot\rangle} is intended to be the standard inner product on {\mathbb R^{n^2}} .)

Now let {\textrm{QML}^+_n} be the set of all quadratic multi-linear functions that are non-negative on {\{0,1\}^n} , and consider the matrix (represented here as a function) {\mathcal M_n : \textrm{QML}^+_n \times \{0,1\}^n \rightarrow \mathbb R_+} given by

\displaystyle  \mathcal M_n(f,x) = f(x)\,.

Then from the above discussion, {\mathcal M_n} is a valid sub-matrix of some slack matrix of {\mathrm{CORR}_n} . To summarize, we have the following theorem.

Theorem 2 For all {n \geq 1} , it holds that {\bar \gamma(\mathrm{CUT}_{n+1}) \geq \mathrm{rank}_+(\mathcal M_n)} .

It is actually the case that {\bar \gamma(\mathrm{CUT}_{n+1}) = \mathrm{rank}_+(\mathcal M_n)} . The next post will focus on providing a lower bound on {\mathrm{rank}_+(\mathcal M_n)} .

Leave a comment