In preparation for the next post on entropy optimality, we need a little background on lifts of polytopes and non-negative matrix factorization.
1.1. Polytopes and inequalities
A $d$-dimensional convex polytope $P \subseteq \mathbb{R}^d$ is the convex hull of a finite set of points in $\mathbb{R}^d$. Equivalently, it is a compact set defined by a family of linear inequalities
$$P = \{x \in \mathbb{R}^d : Ax \leq b\}$$
for some matrix $A \in \mathbb{R}^{m \times d}$ and vector $b \in \mathbb{R}^m$.
Let us give a measure of complexity for $P$: Define $\gamma(P)$ to be the smallest number $m$ such that for some $A \in \mathbb{R}^{m \times d}$ and $b \in \mathbb{R}^m$, we have
$$P = \{x \in \mathbb{R}^d : Ax \leq b\}.$$
In other words, $\gamma(P)$ is the minimum number of inequalities needed to describe $P$. If $P$ is full-dimensional, then this is precisely the number of facets of $P$ (a facet is a maximal proper face of $P$).
Thinking of $\gamma(P)$ as a measure of complexity makes sense from the point of view of optimization: Interior point methods can efficiently optimize linear functions over $P$ (to arbitrary accuracy) in time that is polynomial in $\gamma(P)$.
1.2. Lifts of polytopes
Many simple polytopes require a large number of inequalities to describe. For instance, the cross-polytope
$$C_n = \mathrm{conv}\{\pm e_1, \ldots, \pm e_n\} = \{x \in \mathbb{R}^n : \|x\|_1 \leq 1\}$$
has $\gamma(C_n) = 2^n$, since its facets are given by the inequalities $\sum_{i=1}^n \varepsilon_i x_i \leq 1$ for $\varepsilon \in \{-1,1\}^n$. On the other hand, $C_n$ is the projection of the polytope
$$Q_n = \left\{(x,y) \in \mathbb{R}^n \times \mathbb{R}^n : \sum_{i=1}^n y_i = 1,\ -y_i \leq x_i \leq y_i \text{ for all } i\right\}$$
onto the $x$ coordinates, and manifestly, $\gamma(Q_n) \leq 2n$. Thus $C_n$ is the (linear) shadow of a much simpler polytope in a higher dimension.
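Here is a quick computational sanity check of the claim, not from the original post: membership in $C_n$ and feasibility of the lift $Q_n$ agree on a grid of points (the function names and tolerances are ours).

```python
from itertools import product

def in_cross_polytope(x):
    """Membership in C_n = {x : ||x||_1 <= 1}, checked directly."""
    return sum(abs(xi) for xi in x) <= 1 + 1e-12

def lift_is_feasible(x):
    """Membership checked via the lift Q_n: does there exist y with
    -y_i <= x_i <= y_i and sum(y) = 1?  Taking y_i = |x_i| and spreading
    the leftover slack 1 - ||x||_1 equally works whenever ||x||_1 <= 1."""
    slack = 1 - sum(abs(xi) for xi in x)
    if slack < -1e-12:
        return False
    n = len(x)
    y = [abs(xi) + slack / n for xi in x]
    return all(y[i] >= abs(x[i]) - 1e-12 for i in range(n)) and abs(sum(y) - 1) < 1e-9

# The direct test and the lifted test agree on a grid in the plane.
pts = [tuple(c / 2 for c in p) for p in product(range(-3, 4), repeat=2)]
assert all(in_cross_polytope(p) == lift_is_feasible(p) for p in pts)
```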
[image credit: Fiorini, Rothvoss, and Tiwary]
A polytope $Q \subseteq \mathbb{R}^{d'}$ is called a lift of the polytope $P \subseteq \mathbb{R}^d$ if $P$ is the image of $Q$ under a linear projection. Again, from an optimization standpoint, lifts are important: If we can optimize linear functionals over $Q$, then we can optimize linear functionals over $P$. Define now $\bar\gamma(P)$ to be the minimal value of $\gamma(Q)$ over all lifts $Q$ of $P$. (The value $\bar\gamma(P)$ is sometimes called the (linear) extension complexity of $P$.)
1.3. The permutahedron
Here is a somewhat more interesting family of examples where lifts reduce complexity. The permutahedron $\Pi_n \subseteq \mathbb{R}^n$ is the convex hull of the vectors $(\sigma(1), \sigma(2), \ldots, \sigma(n))$ where $\sigma$ ranges over all permutations of $\{1, 2, \ldots, n\}$. It is known that $\gamma(\Pi_n) = 2^n - 2$.
Let $B_n \subseteq \mathbb{R}^{n^2}$ denote the convex hull of the $n \times n$ permutation matrices. The Birkhoff–von Neumann theorem tells us that $B_n$ is precisely the set of doubly stochastic matrices, thus $\gamma(B_n) = n^2$ (corresponding to the non-negativity constraints on each entry).
Observe that $\Pi_n$ is the linear image of $B_n$ under the map $A \mapsto A\,(1, 2, \ldots, n)^T$, i.e. we multiply a matrix $A \in B_n$ on the right by the column vector $(1, 2, \ldots, n)^T$. Thus $B_n$ is a lift of $\Pi_n$, and we conclude that $\bar\gamma(\Pi_n) \leq n^2$.
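One can verify this projection directly for small $n$; the following sketch (our own, with illustrative names) enumerates the permutation matrices for $n = 4$ and checks that their images under right-multiplication by $(1, 2, 3, 4)^T$ are exactly the vertices of the permutahedron.

```python
from itertools import permutations

n = 4
v = list(range(1, n + 1))  # the column vector (1, 2, ..., n)

def perm_matrix(sigma):
    """The permutation matrix with a 1 in position (i, sigma(i))."""
    return [[1 if sigma[i] == j else 0 for j in range(n)] for i in range(n)]

# Image of each vertex of the Birkhoff polytope under A |-> A v.
images = set()
for sigma in permutations(range(n)):
    P = perm_matrix(sigma)
    images.add(tuple(sum(P[i][j] * v[j] for j in range(n)) for i in range(n)))

# The images are exactly the n! vertices (sigma(1), ..., sigma(n)) of Pi_n.
assert images == {tuple(p) for p in permutations(v)}
```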
1.4. The cut polytope
If $\mathrm{P} \neq \mathrm{NP}$, there are certain combinatorial polytopes we should not be able to optimize over efficiently. A central example is the cut polytope: $\mathrm{CUT}_n \subseteq \mathbb{R}^{n \times n}$ is the convex hull of all matrices of the form $(\mathbf{1}_S - \mathbf{1}_{\bar S})(\mathbf{1}_S - \mathbf{1}_{\bar S})^T$ for some subset $S \subseteq \{1, \ldots, n\}$. Here, $\mathbf{1}_S \in \{0,1\}^n$ denotes the characteristic function of $S$, and $\bar S$ its complement.
Note that the MAX-CUT problem on a graph $G = (\{1, \ldots, n\}, E)$ can be encoded in the following way: Let $W_{ij} = 1$ if $\{i,j\} \in E$ and $W_{ij} = 0$ otherwise. Then the value of the maximum cut in $G$ is precisely the maximum of $\frac{1}{4} \sum_{i,j} W_{ij} (1 - X_{ij})$ for $X \in \mathrm{CUT}_n$. Accordingly, we should expect that $\bar\gamma(\mathrm{CUT}_n)$ cannot be bounded by any polynomial in $n$ (lest we violate a basic tenet of complexity theory).
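As a small sanity check of this encoding (ours, not from the post), we can brute-force a 4-vertex graph both ways: enumerating cuts directly, and maximizing the linear functional $\frac{1}{4}\sum_{i,j} W_{ij}(1 - X_{ij})$ over the vertices of the cut polytope.

```python
from itertools import product

# A small graph on 4 vertices: a 4-cycle plus one chord.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
W = [[0] * n for _ in range(n)]
for i, j in edges:
    W[i][j] = W[j][i] = 1

best_direct = 0    # max cut by enumerating subsets S
best_polytope = 0  # max of (1/4) sum_ij W_ij (1 - X_ij) over vertices of CUT_n
for bits in product([0, 1], repeat=n):
    cut = sum(1 for i, j in edges if bits[i] != bits[j])
    best_direct = max(best_direct, cut)
    s = [1 if b else -1 for b in bits]           # s = 1_S - 1_{S complement}
    X = [[s[i] * s[j] for j in range(n)] for i in range(n)]
    val = sum(W[i][j] * (1 - X[i][j]) for i in range(n) for j in range(n)) / 4
    best_polytope = max(best_polytope, val)

# The triangle {0,1,2} forces at least one uncut edge, so the answer is 4.
assert best_direct == best_polytope == 4
```

The factor $\frac{1}{4}$ appears because the double sum counts each edge twice and $(1 - X_{ij})$ equals $2$ exactly when $i$ and $j$ lie on opposite sides of the cut.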
1.5. Non-negative matrix factorization
The key to understanding $\bar\gamma(P)$ comes from Yannakakis’ factorization theorem.
Consider a polytope $P \subseteq \mathbb{R}^d$ and let us write $P$ in two ways: As a convex hull of vertices
$$P = \mathrm{conv}(x_1, x_2, \ldots, x_N),$$
and as an intersection of half-spaces: For some $A \in \mathbb{R}^{m \times d}$ and $b \in \mathbb{R}^m$,
$$P = \{x \in \mathbb{R}^d : Ax \leq b\}.$$
Given this pair of representations, we can define the corresponding slack matrix $S \in \mathbb{R}^{m \times N}$ of $P$ by
$$S_{ij} = b_i - \langle a_i, x_j \rangle.$$
Here, $a_1, \ldots, a_m$ denote the rows of $A$.
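For concreteness, here is the slack matrix of the unit square (a worked example of ours, not from the post); every entry is non-negative because every vertex satisfies every defining inequality.

```python
# Vertices of the unit square [0,1]^2.
verts = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Half-space description A x <= b:  -x <= 0, -y <= 0, x <= 1, y <= 1.
A = [(-1, 0), (0, -1), (1, 0), (0, 1)]
b = [0, 0, 1, 1]

# Slack matrix S_ij = b_i - <a_i, x_j>; rows = inequalities, cols = vertices.
S = [[b[i] - sum(A[i][k] * verts[j][k] for k in range(2))
      for j in range(len(verts))]
     for i in range(len(A))]

for row in S:
    print(row)
assert all(s >= 0 for row in S for s in row)
```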
We need one more definition. In what follows, we will use $\mathbb{R}_+ = \{x \in \mathbb{R} : x \geq 0\}$. If we have a non-negative matrix $M \in \mathbb{R}_+^{m \times n}$, then a rank-$r$ non-negative factorization of $M$ is a factorization $M = AB$ where $A \in \mathbb{R}_+^{m \times r}$ and $B \in \mathbb{R}_+^{r \times n}$. We then define the non-negative rank of $M$, written $\mathrm{rank}_+(M)$, to be the smallest $r$ such that $M$ admits a rank-$r$ non-negative factorization.
Theorem 1 (Yannakakis) For every polytope $P$, it holds that $\bar\gamma(P) = \mathrm{rank}_+(S)$ for any slack matrix $S$ of $P$.
The key fact underlying this theorem is Farkas’ Lemma. It asserts that if $P = \{x \in \mathbb{R}^d : Ax \leq b\}$, then every valid linear inequality over $P$ can be written as a non-negative combination of the defining inequalities $\langle a_i, x \rangle \leq b_i$.
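A tiny concrete instance (our own illustration): for the unit square with axioms $-x \leq 0$, $-y \leq 0$, $x \leq 1$, $y \leq 1$, the valid inequality $x + y \leq 2$ is the non-negative combination with multipliers $(0, 0, 1, 1)$.

```python
# Defining inequalities <a_i, x> <= b_i for the unit square [0,1]^2.
axioms = [((-1, 0), 0), ((0, -1), 0), ((1, 0), 1), ((0, 1), 1)]

# Non-negative multipliers producing the valid inequality x + y <= 2.
lam = [0, 0, 1, 1]
c = [sum(lam[i] * axioms[i][0][k] for i in range(4)) for k in range(2)]
delta = sum(lam[i] * axioms[i][1] for i in range(4))

assert all(l >= 0 for l in lam)
assert c == [1, 1] and delta == 2   # recovers x + y <= 2 exactly
```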
There is an interesting connection here to proof systems. The theorem says that we can interpret $\bar\gamma(P)$ as the minimum number of axioms so that every valid linear inequality for $P$ can be proved using a conic (i.e., non-negative) combination of the axioms.
1.6. Slack matrices and the correlation polytope
Thus to prove a lower bound on $\bar\gamma(\mathrm{CUT}_n)$, it suffices to find a valid set of linear inequalities for $\mathrm{CUT}_n$ and prove a lower bound on the non-negative rank of the corresponding slack matrix.
Toward this end, consider the correlation polytope $\mathrm{CORR}_n \subseteq \mathbb{R}^{n \times n}$ given by
$$\mathrm{CORR}_n = \mathrm{conv}\left(\left\{ xx^T : x \in \{0,1\}^n \right\}\right).$$
It is an exercise to see that $\mathrm{CUT}_{n+1}$ and $\mathrm{CORR}_n$ are linearly isomorphic.
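For the curious, one direction of a standard such isomorphism (our choice of map, one of several equivalent ones) can be checked mechanically: setting $x_i = (1 + v_i v_{n+1})/2$ turns a cut-polytope vertex $Y = vv^T$ into the correlation-polytope vertex $X = xx^T$ via an affine formula in the entries of $Y$.

```python
from itertools import product

n = 3

# Vertices of CUT_{n+1}: Y = v v^T with v in {-1, 1}^{n+1}.
# Vertices of CORR_n:    X = x x^T with x in {0, 1}^n.
# Candidate affine map:  X_ij = (1 + Y[i][n] + Y[j][n] + Y[i][j]) / 4,
# which comes from x_i = (1 + v_i * v_n) / 2 (indexing v_{n+1} as v[n]).
for v in product([-1, 1], repeat=n + 1):
    Y = [[v[i] * v[j] for j in range(n + 1)] for i in range(n + 1)]
    x = [(1 + v[i] * v[n]) // 2 for i in range(n)]
    X = [[x[i] * x[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            assert X[i][j] == (1 + Y[i][n] + Y[j][n] + Y[i][j]) / 4
print("affine map verified on all vertices")
```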
Now we identify a particularly interesting family of valid linear inequalities for $\mathrm{CORR}_n$. (In fact, it turns out that this will also be an exhaustive list.) A quadratic multi-linear function on $\mathbb{R}^n$ is a function $f$ of the form
$$f(x) = a_0 + \sum_{i,j} a_{ij} x_i x_j$$
for some real numbers $a_0$ and $\{a_{ij} : 1 \leq i, j \leq n\}$.
Suppose $f$ is a quadratic multi-linear function that is also non-negative on $\{0,1\}^n$. Then “$f(x) \geq 0$” can be encoded as a valid linear inequality on $\mathrm{CORR}_n$. To see this, write $f(x) = a_0 + \langle A, xx^T \rangle$ where $A = (a_{ij})$; then $f \geq 0$ on $\{0,1\}^n$ precisely when the linear inequality $\langle A, X \rangle \geq -a_0$ is valid on $\mathrm{CORR}_n$. (Note that $\langle A, B \rangle = \mathrm{tr}(A^T B)$ is intended to be the standard inner product on $n \times n$ matrices.)
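A worked instance of this encoding (ours): $f(x) = (1 - x_1 - x_2 - x_3)^2$ is non-negative everywhere, and since $x_i^2 = x_i$ on $\{0,1\}^n$, its linear terms can be absorbed into the diagonal of $A$, giving $a_0 = 1$, $A_{ii} = -1$, and $A_{ij} = 1$ for $i \neq j$.

```python
from itertools import product

n = 3

# f(x) = (1 - x_1 - x_2 - x_3)^2 as a_0 + sum_ij A_ij x_i x_j on {0,1}^n:
# expanding gives 1 - 2*sum x_i + (sum x_i)^2; using x_i^2 = x_i, the
# linear terms fold into the diagonal of A.
a0 = 1
A = [[-1 if i == j else 1 for j in range(n)] for i in range(n)]

for x in product([0, 1], repeat=n):
    f = (1 - sum(x)) ** 2
    lin = a0 + sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    assert f == lin   # so "f >= 0" reads  <A, X> >= -a_0  on CORR_n
    assert lin >= 0   # non-negativity at every vertex x x^T
```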
Now let $\mathcal{QML}_n^+$ be the set of all quadratic multi-linear functions that are non-negative on $\{0,1\}^n$, and consider the matrix $\mathcal{M}_n$ (represented here as a function) given by
$$\mathcal{M}_n(f, x) = f(x) \qquad \text{for } f \in \mathcal{QML}_n^+,\ x \in \{0,1\}^n.$$
Then from the above discussion, $\mathcal{M}_n$ is a valid sub-matrix of some slack matrix of $\mathrm{CORR}_n$. To summarize, we have the following theorem.
Theorem 2 For all $n \geq 1$, it holds that $\bar\gamma(\mathrm{CUT}_{n+1}) \geq \mathrm{rank}_+(\mathcal{M}_n)$.
It is actually the case that $\bar\gamma(\mathrm{CUT}_{n+1}) = \mathrm{rank}_+(\mathcal{M}_n)$. The next post will focus on providing a lower bound on $\mathrm{rank}_+(\mathcal{M}_n)$.