# Entropy optimality: Non-negative rank lower bounds

Using the notation from the last post, our goal is now to prove the following theorem.

Theorem 1 For every ${m \geq 1}$ and ${g : \{0,1\}^m \rightarrow \mathbb R_+}$, there is a constant ${C=C(g)}$ such that for all ${n \geq 2m}$,

$\displaystyle \mathrm{rank}_+(M_n^g) \geq C \left(\frac{n}{\log n}\right)^{\mathrm{deg}_J(g)-1}\,.$

1.1. Convex relaxations of non-negative rank

Before getting to the proof, let us discuss the situation in somewhat more generality. Consider finite sets ${X}$ and ${Y}$ and a matrix ${M : X \times Y \rightarrow \mathbb R_+}$ with ${r=\mathrm{rank}_+(M)}$.

In order to use entropy-maximization, we would like to define a convex set of low non-negative rank factorizations (so that maximizing entropy over this set will give us a “simple” factorization). But the convex hull of ${\{ N \in \mathbb R_+^{X \times Y} : \mathrm{rank}_+(N) = 1 \}}$ is precisely the set of all non-negative matrices.

Instead, let us proceed analytically. For simplicity, let us equip both ${X}$ and ${Y}$ with the uniform measure. Let ${\mathcal Q = \{ b : Y \rightarrow \mathbb R_+ \mid \|b\|_1 = 1\}}$ denote the set of probability densities on ${Y}$. Now define

$\displaystyle \alpha_+(N) = \min \Big\{ \alpha : \exists\, k,\ A \in \mathbb R_+^{X \times k},\ B \in \mathbb R_+^{k \times Y} \textrm{ with } N=AB,\quad \{B_1, \ldots, B_k\} \subseteq \mathcal Q,$

$\displaystyle \qquad\qquad\qquad\qquad\qquad \max_{i \in [k]} \|B_i\|_{\infty} \leq \alpha, \quad\textrm{and}\quad \sum_{i=1}^k \|A^{(i)}\|_{\infty} \leq \alpha\Big\}\,.$

Here ${\{A^{(i)}\}}$ are the columns of ${A}$ and ${\{B_i\}}$ are the rows of ${B}$. Note that now ${k}$ is unconstrained.

Observe that ${\alpha_+}$ is a convex function. To see this, given factorizations ${N=AB}$ and ${N'=A'B'}$ witnessing ${\alpha_+(N)}$ and ${\alpha_+(N')}$, respectively, write

$\displaystyle \frac{N+N'}{2} = \left(\begin{array}{cc} \frac12 A & \frac12 A' \end{array}\right) \left(\begin{array}{l} \vphantom{\bigoplus} B \\ \vphantom{\bigoplus} B' \end{array}\right)\,,$

witnessing the fact that ${\alpha_+(\frac12(N+N')) \leq \frac12 \alpha_+(N) + \frac12 \alpha_+(N')}$.
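Since the argument is just bookkeeping with block matrices, it is easy to check numerically. The following sketch (dimensions, seed, and the helper `alpha_of` are my own toy choices) verifies the block factorization of the midpoint, and that the combined witness is no worse than the larger of the two parameters, so that in particular the sublevel sets ${\{N : \alpha_+(N) \leq c\}}$ are convex:

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_of(A, B):
    # parameter witnessed by a factorization (A, B) whose rows B_i are densities:
    # the larger of max_i ||B_i||_inf and sum_i ||A^(i)||_inf
    return max(B.max(axis=1).max(), A.max(axis=0).sum())

def random_fact(x, y, k):
    A = rng.random((x, k))
    B = rng.random((k, y))
    B /= B.mean(axis=1, keepdims=True)   # rows of B become densities (mean 1)
    return A, B

A, B = random_fact(4, 6, 3)
A2, B2 = random_fact(4, 6, 5)
N, N2 = A @ B, A2 @ B2

# block factorization of the midpoint (N + N')/2
Amid = np.hstack([A / 2, A2 / 2])
Bmid = np.vstack([B, B2])

assert np.allclose(Amid @ Bmid, (N + N2) / 2)
# the combined witness is no worse than the larger of the two parameters,
# so the sublevel sets {N : alpha_+(N) <= c} are convex
assert alpha_of(Amid, Bmid) <= max(alpha_of(A, B), alpha_of(A2, B2)) + 1e-12
```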

1.2. A truncation argument

So the set ${\{ N : \alpha_+(N) \leq c \}}$ is convex, but it’s not yet clear how this relates to ${\mathrm{rank}_+}$. We will see now that low non-negative rank matrices are close to matrices with ${\alpha_+}$ small. In standard communication complexity/discrepancy arguments, this corresponds to discarding “small rectangles.”

In the following lemma, we use the norms ${\|M\|_1 = \mathbb E_{x,y} |M_{x,y}|}$ and ${\|M\|_{\infty} = \max_{x,y} |M_{x,y}|}$.

Lemma 2 For every non-negative ${M \in \mathbb R_+^{X \times Y}}$ with ${\mathrm{rank}_+(M) \leq r}$ and every ${\delta \in (0,1)}$, there is a matrix ${\tilde M \in \mathbb R_+^{X \times Y}}$ such that

$\displaystyle \|M-\tilde M\|_1 \leq \delta$

and

$\displaystyle \alpha_+(\tilde M) \leq \frac{r}{\delta} \|M\|_{\infty}\,.$

Proof: Suppose that ${M=AB}$ with ${A \in \mathbb R_+^{X \times r}, B \in \mathbb R_+^{r \times Y}}$, and let us interpret this factorization in the form

$\displaystyle M(x,y) = \sum_{i=1}^r A_i(x) B_i(y) \ \ \ \ \ (1)$

(where ${\{A_i\}}$ are the columns of ${A}$ and ${\{B_i\}}$ are the rows of ${B}$). By rescaling the columns of ${A}$ and the rows of ${B}$, respectively, we may assume that ${\|B_i\|_1 = \mathop{\mathbb E}_y[B_i(y)] = 1}$ for every ${i \in [r]}$, i.e., that each ${B_i}$ is a density.

Let ${\Lambda = \{ i : \|B_i\|_{\infty} > \tau \}}$ denote the “bad set” of indices (we will choose ${\tau}$ momentarily). Observe that if ${i \in \Lambda}$, then

$\displaystyle \|A_i\|_{\infty} \leq \frac{\|M\|_{\infty}}{\tau}\,,$

from the representation (1) and the fact that all summands are non-negative: if ${B_i(y) > \tau}$ for some ${y}$, then ${A_i(x)\,\tau \leq A_i(x) B_i(y) \leq M(x,y) \leq \|M\|_{\infty}}$ for every ${x}$.

Define the matrix ${\tilde M(x,y) = \sum_{i \notin \Lambda} A_i(x) B_i(y)}$. It follows that

$\displaystyle \|M-\tilde M\|_1 = \mathop{\mathbb E}_{x,y} \left[|M(x,y)-\tilde M(x,y)|\right] = \sum_{i \in \Lambda} \mathop{\mathbb E}_{x,y} [A_i(x) B_i(y)]\,.$

Each of the latter terms is at most ${\|A_i\|_{\infty} \|B_i\|_1 \leq \frac{\|M\|_{\infty}}{\tau}}$ and ${|\Lambda| \leq r}$, thus

$\displaystyle \|M-\tilde M\|_1 \leq r \frac{\|M\|_{\infty}}{\tau}\,.$

Next, observe that

$\displaystyle \mathop{\mathbb E}_y [M(x,y)] = \sum_{i=1}^r A_i(x) \|B_i\|_1 = \sum_{i=1}^r A_i(x)\,,$

implying that ${\|A_i\|_{\infty} \leq \|M\|_{\infty}}$ and thus ${\sum_{i=1}^r \|A_i\|_{\infty} \leq r \|M\|_{\infty}}$.

Setting ${\tau = r \|M\|_{\infty}/\delta}$ yields the statement of the lemma. $\Box$
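Here is the truncation from the proof of Lemma 2 in code, on a random toy instance (the dimensions and names are mine; with random data the bad set ${\Lambda}$ is typically empty, but the asserted bounds are exactly those from the proof):

```python
import numpy as np

rng = np.random.default_rng(1)

# norms w.r.t. the uniform measures: ||N||_1 = mean |entry|, ||N||_inf = max |entry|
X, Y, r = 8, 16, 5
A = rng.random((X, r))
B = rng.random((r, Y))
B /= B.mean(axis=1, keepdims=True)      # normalize rows: E_y[B_i] = 1
M = A @ B

delta = 0.5
tau = r * M.max() / delta               # the threshold from the proof
Lam = B.max(axis=1) > tau               # the "bad set" of indices
keep = ~Lam
M_t = A[:, keep] @ B[keep, :]           # drop the bad terms

l1_err = np.abs(M - M_t).mean()
assert l1_err <= delta + 1e-12          # ||M - M_t||_1 <= delta
if keep.any():
    assert B[keep].max() <= tau                       # surviving rows obey the sup bound
    assert A[:, keep].max(axis=0).sum() <= tau + 1e-12  # sum of column sups is small too
```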

Generally, the ratio ${\frac{\|M\|_{\infty}}{\|M\|_1}}$ will be small compared to ${r}$ (e.g., polynomial in ${n}$ vs. super-polynomial in ${n}$). Thus from now on, we will actually prove a lower bound on ${\alpha_+(M)}$. One has to verify that the proof is robust enough to allow for the level of error inherent in Lemma 2.

1.3. The test functionals

Now we have a convex body of low “analytic non-negative rank” matrices. Returning to Theorem 1 and the matrix ${M_n^g}$, we will now assume that ${\alpha_+(M_n^g) \leq \alpha}$. Next we identify the proper family of test functionals that highlight the difficulty of factoring the matrix ${M_n^g}$. We will consider the uniform measures on ${{[n] \choose m}}$ and ${\{0,1\}^n}$. We use ${\mathop{\mathbb E}_S}$ and ${\mathop{\mathbb E}_x}$ to denote averaging with respect to these measures.

Let ${d=\deg_J(g)-1}$. From the last post, we know there exists a ${d}$-locally positive functional ${\varphi : \{0,1\}^m \rightarrow \mathbb R}$ such that ${\beta := \mathop{\mathbb E}_y \varphi(y) g(y) < 0}$, and ${\mathop{\mathbb E}_y \varphi(y) q(y) \geq 0}$ for every ${d}$-junta ${q}$ on ${\{0,1\}^m}$.

For ${S \subseteq [n]}$ with ${|S|=m}$, let us denote ${\varphi_S(x) = \varphi(x|_S)}$. These functionals prove lower bounds on the junta-degree of ${g}$ restricted to various subsets of the coordinates. If we expect that junta-factorizations are the “best” of a given rank, then we have some confidence in choosing the family ${\{\varphi_S\}}$ as our test functions.

1.4. Entropy maximization

Use ${\alpha_+(M_n^g) \leq \alpha}$ to write

$\displaystyle M_n^g(S,x) = \sum_{i=1}^k A_i(S) B_i(x)\,,$

where ${A_i, B_i \geq 0}$ and we have ${\|B_i\|_1=1}$ and ${\|B_i\|_{\infty} \leq \alpha}$ for all ${i \in [k]}$, and finally ${\sum_{i=1}^k \|A_i\|_{\infty} \leq \alpha}$.

First, as we observed last time, if each ${B_i}$ were a ${d}$-junta, we would have a contradiction:

$\displaystyle \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) M_n^g(S,x)\right] = \mathop{\mathbb E}_{y \in \{0,1\}^m} \varphi(y) g(y) = \beta < 0\,, \ \ \ \ \ (2)$

and yet

$\displaystyle \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) \sum_{i=1}^k A_i(S) B_i(x)\right] = \sum_{i=1}^k \mathop{\mathbb E}_S A_i(S) \mathop{\mathbb E}_x \left[\varphi_S(x) B_i(x)\right] \geq 0 \ \ \ \ \ (3)$

because ${\mathop{\mathbb E}_x \left[\varphi_S(x) B_i(x)\right] = \mathop{\mathbb E}_{y \in \{0,1\}^S} \varphi(y) \mathop{\mathbb E}_{x} \left[B_i(x) \,\big|\, x|_S =y\right] \geq 0}$ since ${\varphi}$ is ${d}$-locally positive and the function ${y \mapsto \mathop{\mathbb E}_{x} \left[B_i(x) \,\big|\, x|_S =y\right]}$ is a ${d}$-junta.
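The identity just used — conditioning on ${x|_S}$ turns ${\mathop{\mathbb E}_x[\varphi_S(x) B_i(x)]}$ into an average of ${\varphi}$ against a function of the coordinates in ${S}$ — is easy to check numerically. In this sketch, ${\varphi}$ and ${B}$ are arbitrary random functions (stand-ins of mine, not the actual test functional):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, S = 4, (0, 2)                      # a coordinate subset S with |S| = m = 2
m = len(S)

cube = np.array(list(product([0, 1], repeat=n)))     # all x in {0,1}^n
B = rng.random(len(cube))
B /= B.mean()                                        # a density on {0,1}^n
phi = rng.standard_normal(2 ** m)                    # a stand-in phi on {0,1}^m

def idx(y):                                          # index of y in {0,1}^m
    return int("".join(map(str, y)), 2)

# left side: E_x[ phi(x|_S) B(x) ]
lhs = np.mean([phi[idx(x[list(S)])] * b for x, b in zip(cube, B)])

# right side: E_y[ phi(y) * E[B(x) | x|_S = y] ]; the inner map depends only on S
rhs = 0.0
for y in product([0, 1], repeat=m):
    mask = np.all(cube[:, list(S)] == y, axis=1)
    rhs += phi[idx(y)] * B[mask].mean()
rhs /= 2 ** m

assert np.isclose(lhs, rhs)
```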

So now the key step: Use entropy maximization to approximate ${B_i}$ by a junta! In future posts, we will need to consider the entire package ${\{B_1, \ldots, B_k\}}$ of functions simultaneously. But for the present lower bound, it suffices to consider each ${B_i}$ separately.

Consider the following optimization over variables ${\{\tilde B_i(x) : x \in \{0,1\}^n\}}$:

$\displaystyle \mathrm{minimize }\qquad \mathrm{Ent}(\tilde B_i) = \mathbb E [\tilde B_i \log \tilde B_i]$

$\displaystyle \qquad\textrm{ subject to } \qquad \qquad\mathbb E[\tilde B_i]=1, \quad \tilde B_i(x) \geq 0 \quad \forall x \in \{0,1\}^n$

$\displaystyle \hphantom{\bigoplus} \qquad \qquad \qquad \qquad \mathop{\mathbb E}_x \left[\varphi_S(x) \tilde B_i(x)\right] \leq \mathop{\mathbb E}_x \left[\varphi_S(x) B_i(x)\right] + \varepsilon \qquad \forall |S|=m\,. \ \ \ \ \ (4)$

The next claim follows immediately from Theorem 1 of the earlier post on solving the max-entropy optimization by sub-gradient descent.

Claim 1 There exists a function ${\tilde B_i : \{0,1\}^n \rightarrow \mathbb R_+}$ satisfying all the preceding constraints and of the form

$\displaystyle \tilde B_i = \frac{\exp\left(\sum_{j=1}^{M} \lambda_j \varphi_{S_j}\right)}{\mathop{\mathbb E} \exp\left(\sum_{j=1}^{M} \lambda_j \varphi_{S_j}\right)}$

such that

$\displaystyle M \leq C(g) \frac{\log \alpha}{\varepsilon^2}\,,$

where ${C(g)}$ is some constant depending only on ${g}$.
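To make Claim 1 concrete, here is a bare-bones sub-gradient sketch of the max-entropy optimization on a toy instance (the dimensions, step schedule, and the stand-in functional ${\varphi}$ are my own choices; the actual bound on the number ${M}$ of steps comes from the theorem cited above, not from this code):

```python
import numpy as np
from itertools import product, combinations

rng = np.random.default_rng(3)
n, m = 6, 2
cube = np.array(list(product([0, 1], repeat=n)))

phi = rng.standard_normal(2 ** m)                  # stand-in functional on {0,1}^m
subsets = list(combinations(range(n), m))
# PhiS[j] holds the values phi_{S_j}(x) = phi(x|_{S_j}) over the cube
PhiS = np.array([phi[cube[:, S[0]] * 2 + cube[:, S[1]]] for S in subsets])

B = rng.random(len(cube)); B /= B.mean()           # the density to be approximated
targets = PhiS @ B / len(cube)                     # E_x[phi_S(x) B(x)] for each S

eps = 0.1
lam = np.zeros(len(subsets))
for t in range(50000):
    Bt = np.exp(lam @ PhiS); Bt /= Bt.mean()       # exponential-family density
    viol = PhiS @ Bt / len(cube) - (targets + eps)
    j = int(np.argmax(viol))
    if viol[j] <= 0:
        break                                      # all constraints satisfied
    lam[j] -= 0.1 / np.sqrt(t + 1)                 # diminishing sub-gradient step

Bt = np.exp(lam @ PhiS); Bt /= Bt.mean()
moments = PhiS @ Bt / len(cube)
assert np.all(moments <= targets + eps + 1e-2)
```

Note that the approximator produced this way is exactly of the exponential form in Claim 1, with one Lagrange multiplier per touched constraint.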

Note that ${\varphi}$ depends only on ${g}$, and thus ${\|\varphi_S\|_{\infty}}$ only depends on ${g}$ as well. Now each ${\varphi_{S_j}}$ only depends on ${m}$ variables (those in ${S_j}$, where ${|S_j|=m}$), meaning that our approximator ${\tilde B_i}$ is an ${h}$-junta for

$\displaystyle h \leq m \cdot C(g) \frac{\log \alpha}{\varepsilon^2}\,. \ \ \ \ \ (5)$

Oops. That doesn’t seem very good. The calculation in (3) needs that ${\tilde B_i}$ is a ${d}$-junta, and certainly ${d < m}$ (since ${g}$ is a function on ${\{0,1\}^m}$). Nevertheless, note that the approximator is a non-trivial junta. For instance, if ${\alpha \leq 2^{\sqrt{n}}}$, then it is an $O(\sqrt{n})$-junta, recalling that $m$ is a constant (depending on ${g}$).

1.5. Random restriction saves the day

Let’s try to apply the logic of (3) to the ${\tilde B_i}$ approximators anyway. Fix some ${i \in [k]}$ and let ${J_i}$ be the set of coordinates on which ${\tilde B_i}$ depends. Then:

$\displaystyle \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) A_i(S) \tilde B_i(x)\right] = \mathop{\mathbb E}_S\,A_i(S) \mathop{\mathbb E}_{y \in \{0,1\}^S} \varphi(y) \mathop{\mathbb E}_{x \in \{0,1\}^n} \left[\tilde B_i(x) \,\big|\, x|_S=y\right]$

Note that the map ${y \mapsto \mathop{\mathbb E}_{x \in \{0,1\}^n} \left[\tilde B_i(x) \,\big|\, x|_S=y\right]}$ is a junta depending only on the coordinates in ${J_i \cap S}$. Thus if ${|J_i \cap S| \leq d}$, then the contribution from this term is non-negative since ${\varphi}$ is ${d}$-locally positive. But ${|S|=m}$ is fixed and ${n}$ is growing, thus ${|J_i \cap S| > d}$ is quite rare! Formally,

$\displaystyle \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) A_i(S) \tilde B_i(x)\right] \geq - \|A_i\|_{\infty}\, \mathbb P_S[|J_i \cap S| > d] \geq - \|A_i\|_{\infty} \frac{h^d (2m)^d}{n^d} \,.$

In the last estimate, we have used a simple union bound and ${n \geq 2m}$; we have also suppressed a factor of ${\|\varphi\|_{\infty}}$, which depends only on ${g}$ and can be absorbed into the constants.
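The rarity of the bad event is easy to corroborate by simulation; in this sketch the parameters and the placement of ${J_i}$ are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, d, h = 200, 4, 2, 20
J = set(range(h))                  # w.l.o.g. the junta coordinates are 0..h-1

trials = 20000
hits = sum(len(J.intersection(rng.choice(n, size=m, replace=False))) > d
           for _ in range(trials))
p_hat = hits / trials              # Monte-Carlo estimate of P_S[|J ∩ S| > d]

bound = (h ** d) * ((2 * m) ** d) / (n ** d)   # the union bound used above
assert p_hat <= bound
```

With these parameters the empirical probability is a small fraction of a percent, comfortably below the (loose) union bound.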

Putting everything together and summing over ${i \in [k]}$, we conclude that

$\displaystyle \sum_{i=1}^k \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) A_i(S) \tilde B_i(x)\right] \geq -\alpha \frac{h^d (2m)^d}{n^d}\,.$

Note that by choosing ${n}$ only moderately large, we will make this error term very small.

Moreover, since each ${\tilde B_i}$ is a feasible point of the optimization (4), we have

$\displaystyle \sum_{i=1}^k \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) A_i(S) B_i(x)\right] = \sum_{i=1}^k \mathop{\mathbb E}_S A_i(S) \mathop{\mathbb E}_x \left[\varphi_S(x) B_i(x)\right]$

$\displaystyle \hphantom{\sum_{i=1}^k \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) A_i(S) B_i(x)\right]} \geq \sum_{i=1}^k \mathop{\mathbb E}_S A_i(S) \left(-\varepsilon + \mathop{\mathbb E}_x \left[\varphi_S(x) \tilde B_i(x)\right]\right)$

$\displaystyle \hphantom{\sum_{i=1}^k \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) A_i(S) B_i(x)\right]} \geq -\varepsilon \sum_{i=1}^k \|A_i\|_1 - \alpha \frac{h^d (2m)^d}{n^d}\,.$

Almost there: Now observe that

$\displaystyle \|g\|_1 = \mathop{\mathbb E}_{S,x} [M_n^g(S,x)] = \sum_{i=1}^k \|A_i\|_1 \|B_i\|_1 = \sum_{i=1}^k \|A_i\|_1\,.$

Plugging this into the preceding line yields

$\displaystyle \sum_{i=1}^k \mathop{\mathbb E}_{S,x} \left[\varphi_S(x) A_i(S) B_i(x)\right] \geq -\varepsilon \|g\|_1 - \alpha \frac{h^d (2m)^d}{n^d}\,.$

Recalling now (2), we have derived a contradiction to ${\alpha_+(M) \leq \alpha}$ if we can choose the right-hand side to be bigger than ${\beta}$ (which is a negative constant depending only on ${g}$). Setting ${\varepsilon = -\beta/(2 \|g\|_1)}$, we consult (5) to see that

$\displaystyle h \leq C'(g) m \log \alpha$

for some other constant ${C'(g)}$ depending only on ${g}$. We thus arrive at a contradiction if ${\alpha = o((n/\log n)^d)}$, recalling that ${m, d}$ depend only on ${g}$. This completes the argument.

# Primer: Lifts of polytopes and non-negative matrix factorization

In preparation for the next post on entropy optimality, we need a little background on lifts of polytopes and non-negative matrix factorization.

1.1. Polytopes and inequalities

A ${d}$-dimensional convex polytope ${P \subseteq \mathbb R^d}$ is the convex hull of a finite set of points in ${\mathbb R^d}$. Equivalently, it is a compact set defined by a family of linear inequalities

$\displaystyle P = \{ x \in \mathbb R^d : Ax \leq b \}$

for some matrix ${A \in \mathbb R^{m \times d}}$ and vector ${b \in \mathbb R^m}$.

Let us give a measure of complexity for ${P}$: Define ${\gamma(P)}$ to be the smallest number ${m}$ such that for some ${C \in \mathbb R^{s \times d}, y \in \mathbb R^s, A \in \mathbb R^{m \times d}, b \in \mathbb R^m}$, we have

$\displaystyle P = \{ x \in \mathbb R^d : Cx=y \textrm{ and } Ax \leq b\}\,.$

In other words, this is the minimum number of inequalities needed to describe ${P}$. If ${P}$ is full-dimensional, then this is precisely the number of facets of ${P}$ (a facet is a maximal proper face of ${P}$).

Thinking of ${\gamma(P)}$ as a measure of complexity makes sense from the point of view of optimization: Interior point methods can efficiently optimize linear functions over ${P}$ (to arbitrary accuracy) in time that is polynomial in ${\gamma(P)}$.

1.2. Lifts of polytopes

Many simple polytopes require a large number of inequalities to describe. For instance, the cross-polytope

$\displaystyle C_d = \{ x \in \mathbb R^d : \|x\|_1 \leq 1 \} = \{ x \in \mathbb R^d : \pm x_1 \pm x_2 \cdots \pm x_d \leq 1 \}$

has ${\gamma(C_d)=2^d}$. On the other hand, ${C_d}$ is the projection of the polytope

$\displaystyle Q_d = \left\{ (x,y) \in \mathbb R^{2d} : \sum_{i=1}^d y_i = 1, \,\, - y_i \leq x_i \leq y_i\,\forall i\right\}$

onto the ${x}$ coordinates, and manifestly, ${\gamma(Q_d) \leq 2d}$. Thus ${C_d}$ is the (linear) shadow of a much simpler polytope in a higher dimension.
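One can check the lift concretely: a point ${x}$ lies in ${C_d}$ exactly when it admits a lift ${(x,y) \in Q_d}$, and an explicit witness is ${y_i = |x_i| + (1-\|x\|_1)/d}$. A quick randomized check (the witness construction is mine, for small ${d}$):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4

def in_Cd(x):                       # membership in the cross-polytope
    return np.abs(x).sum() <= 1 + 1e-12

def lifts(x):                       # does x admit a lift (x, y) in Q_d?
    slack = 1 - np.abs(x).sum()
    if slack < -1e-12:
        return False
    y = np.abs(x) + slack / d       # distribute the slack evenly
    return np.isclose(y.sum(), 1) and np.all(-y <= x) and np.all(x <= y)

results = [in_Cd(x) == lifts(x)
           for x in (rng.uniform(-1, 1, size=d) for _ in range(1000))]
assert all(results)
```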

[image credit: Fiorini, Rothvoss, and Tiwary]

A polytope ${Q}$ is called a lift of the polytope ${P}$ if ${P}$ is the image of ${Q}$ under a linear projection. Again, from an optimization standpoint, lifts are important: If we can optimize linear functionals over ${Q}$, then we can optimize linear functionals over ${P}$. Define now ${\bar \gamma(P)}$ to be the minimal value of ${\gamma(Q)}$ over all lifts ${Q}$ of ${P}$. (The value ${\bar \gamma(P)}$ is sometimes called the (linear) extension complexity of ${P}$.)

1.3. The permutahedron

Here is a somewhat more interesting family of examples where lifts reduce complexity. The permutahedron ${\Pi_n \subseteq \mathbb R^n}$ is the convex hull of the vectors ${(i_1, i_2, \ldots, i_n)}$ where ${\{i_1, \ldots, i_n\} = \{1,\ldots,n \}}$. It is known that ${\gamma(\Pi_n)=2^n-2}$.

Let ${B_n \subseteq \mathbb R^{n^2}}$ denote the convex hull of the ${n \times n}$ permutation matrices. The Birkhoff-von Neumann theorem tells us that ${B_n}$ is precisely the set of doubly stochastic matrices, thus ${\gamma(B_n)=n^2}$ (corresponding to the non-negativity constraints on each entry).

Observe that ${\Pi_n}$ is the linear image of ${B_n}$ under the map ${A \mapsto A (1, 2, \ldots, n)^T}$, i.e. we multiply a matrix ${A \in B_n}$ on the right by the column vector ${(1, 2, \ldots, n)}$. Thus ${B_n}$ is a lift of ${\Pi_n}$, and we conclude that ${\bar \gamma(\Pi_n) \leq n^2 \ll \gamma(\Pi_n)}$.
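This projection is easy to verify by brute force for small ${n}$ (the enumeration below is my own sanity check):

```python
import numpy as np
from itertools import permutations

n = 4
v = np.arange(1, n + 1)             # the column vector (1, 2, ..., n)

images = set()
for perm in permutations(range(n)):
    P = np.zeros((n, n), dtype=int)
    P[np.arange(n), perm] = 1       # the permutation matrix sending i -> perm[i]
    images.add(tuple(P @ v))

# the images are exactly the vertices of the permutahedron
assert images == {tuple(p) for p in permutations(range(1, n + 1))}
```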

1.4. The cut polytope

If ${P \neq NP}$, there are certain combinatorial polytopes we should not be able to optimize over efficiently. A central example is the cut polytope: ${\mathrm{CUT}_n \subseteq \mathbb R^{n^2}}$ is the convex hull of all matrices of the form ${(A_S)_{ij} = |\mathbf{1}_S(i)-\mathbf{1}_S(j)|}$ for some subset ${S \subseteq \{1,\ldots,n\}}$. Here, ${\mathbf{1}_S}$ denotes the characteristic function of ${S}$.

Note that the MAX-CUT problem on a graph ${G=(V,E)}$ can be encoded in the following way: Let ${W_{ij} = 1}$ if ${\{i,j\} \in E}$ and ${W_{ij}=0}$ otherwise. Then the value of the maximum cut in ${G}$ is precisely the maximum of ${\langle W, A\rangle}$ for ${A \in \mathrm{CUT}_n}$. Accordingly, we should expect that ${\bar \gamma(\mathrm{CUT}_n)}$ cannot be bounded by any polynomial in ${n}$ (lest we violate a basic tenet of complexity theory).
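Here is the encoding on a toy graph (a 5-cycle plus a chord, my own choice), checking that maximizing ${\langle W, A_S\rangle}$ over the vertices of ${\mathrm{CUT}_n}$ recovers the max-cut value; ${W}$ stores one entry per edge so each edge is counted once:

```python
import numpy as np
from itertools import combinations, chain

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]   # 5-cycle plus a chord
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = 1                     # one entry per edge

def cut_matrix(S):                  # the vertex A_S of CUT_n
    ind = np.array([i in S for i in range(n)], dtype=int)
    return np.abs(ind[:, None] - ind[None, :])

all_subsets = list(chain.from_iterable(combinations(range(n), k)
                                       for k in range(n + 1)))
best = max(int(np.sum(W * cut_matrix(set(S)))) for S in all_subsets)

# direct enumeration of cut values for comparison
direct = max(sum((i in S) != (j in S) for i, j in edges)
             for S in (set(T) for T in all_subsets))

assert best == direct == 5          # the triangle {0,1,2} forces a cut of at most 5
```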

1.5. Non-negative matrix factorization

The key to understanding ${\bar \gamma(\mathrm{CUT}_n)}$ comes from Yannakakis’ factorization theorem.

Consider a polytope ${P \subseteq \mathbb R^d}$ and let us write it in two ways: As a convex hull of vertices

$\displaystyle P = \mathrm{conv}\left(x_1, x_2, \ldots, x_n\right)\,,$

and as an intersection of half-spaces: For some ${A \in \mathbb R^{m \times d}}$ and ${b \in \mathbb R^m}$,

$\displaystyle P = \left\{ x \in \mathbb R^d : Ax \leq b \right\}\,.$

Given this pair of representations, we can define the corresponding slack matrix of ${P}$ by

$\displaystyle S_{ij} = b_i - \langle A_i, x_j \rangle \qquad i \in \{1,2,\ldots,m\}, j \in \{1,2,\ldots,n\}\,.$

Here, ${A_1, \ldots, A_m}$ denote the rows of ${A}$.

We need one more definition. In what follows, we will use ${\mathbb R_+ = [0,\infty)}$. If we have a non-negative matrix ${M \in \mathbb R_+^{m \times n}}$, then a rank-${r}$ non-negative factorization of ${M}$ is a factorization ${M = AB}$ where ${A \in \mathbb R_+^{m \times r}}$ and ${B \in \mathbb R_+^{r \times n}}$. We then define the non-negative rank of ${M}$, written ${\mathrm{rank}_+(M)}$, to be the smallest ${r}$ such that ${M}$ admits a rank-${r}$ non-negative factorization.
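Computing ${\mathrm{rank}_+(M)}$ exactly is NP-hard in general, but searching for a factorization at a given inner dimension is easy to sketch. Below is a minimal implementation of the standard Lee-Seung multiplicative updates (a heuristic illustration of mine, not part of this post's argument), run on a matrix built to have ${\mathrm{rank}_+ \leq 3}$:

```python
import numpy as np

rng = np.random.default_rng(7)
r = 3
M = rng.random((6, r)) @ rng.random((r, 8))      # rank_+(M) <= 3 by construction

A = rng.random((6, r)) + 0.1                     # strictly positive initialization
B = rng.random((r, 8)) + 0.1
for _ in range(5000):
    B *= (A.T @ M) / (A.T @ A @ B + 1e-12)       # Lee-Seung multiplicative updates
    A *= (M @ B.T) / (A @ B @ B.T + 1e-12)       # (these preserve non-negativity)

assert np.all(A >= 0) and np.all(B >= 0)
assert np.linalg.norm(M - A @ B) < 0.1 * np.linalg.norm(M)
```

The multiplicative form of the updates is what keeps every entry non-negative, so the limit point is always a valid candidate factorization.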

Theorem 1 (Yannakakis) For every polytope ${P}$, it holds that ${\bar \gamma(P) = \mathrm{rank}_+(S)}$ for any slack matrix ${S}$ of ${P}$.

The key fact underlying this theorem is Farkas’ Lemma. It asserts that if ${P = \{ x \in \mathbb R^d : Ax \leq b \}}$, then every valid linear inequality over ${P}$ can be written as a non-negative combination of the defining inequalities ${\langle A_i, x \rangle \leq b_i}$.

There is an interesting connection here to proof systems. The theorem says that we can interpret ${\bar \gamma(P)}$ as the minimum number of axioms so that every valid linear inequality for ${P}$ can be proved using a conic (i.e., non-negative) combination of the axioms.

1.6. Slack matrices and the correlation polytope

Thus to prove a lower bound on ${\bar \gamma(\mathrm{CUT}_n)}$, it suffices to find a valid set of linear inequalities for ${\mathrm{CUT}_n}$ and prove a lower bound on the non-negative rank of the corresponding slack matrix.

Toward this end, consider the correlation polytope ${\mathrm{CORR}_n \subseteq \mathbb R^{n^2}}$ given by

$\displaystyle \mathrm{CORR}_n = \mathrm{conv}\left(\left\{x x^T : x \in \{0,1\}^n \right\}\right)\,.$

It is an exercise to see that ${\mathrm{CUT}_{n+1}}$ and ${\mathrm{CORR}_n}$ are linearly isomorphic.
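A cheap consistency check for the isomorphism (not the exercise itself) is that the two polytopes have the same number of vertices: ${2^n}$ matrices ${xx^T}$, and ${2^n}$ distinct cut matrices, since ${S}$ and its complement give the same ${A_S}$:

```python
import numpy as np
from itertools import product, combinations

n = 3

corr_vertices = {tuple(np.outer(x, x).ravel()) for x in product([0, 1], repeat=n)}

cut_vertices = set()
for k in range(n + 2):
    for S in combinations(range(n + 1), k):
        ind = np.array([i in S for i in range(n + 1)], dtype=int)
        cut_vertices.add(tuple(np.abs(ind[:, None] - ind[None, :]).ravel()))

assert len(corr_vertices) == 2 ** n   # distinct, since diag(xx^T) = x
assert len(cut_vertices) == 2 ** n    # S and its complement give the same A_S
```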

Now we identify a particularly interesting family of valid linear inequalities for ${\mathrm{CORR}_n}$. (In fact, it turns out that this will also be an exhaustive list.) A quadratic multi-linear function on ${\mathbb R^n}$ is a function ${f : \mathbb R^n \rightarrow \mathbb R}$ of the form

$\displaystyle f(x) = a_0 + \sum_i a_{ii} x_i + \sum_{i < j} a_{ij} x_i x_j\,,$

for some real numbers ${a_0}$ and ${\{a_{ij}\}}$.

Suppose ${f}$ is a quadratic multi-linear function that is also non-negative on ${\{0,1\}^n}$. Then “${f(x) \geq 0 \,\,\forall x \in \{0,1\}^n}$” can be encoded as a valid linear inequality on ${\mathrm{CORR}_n}$. To see this, write ${f(x)=\langle A,xx^T\rangle + a_0}$ where ${A=(a_{ij})}$. (Note that ${\langle \cdot,\cdot\rangle}$ is intended to be the standard inner product on ${\mathbb R^{n^2}}$.)
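Here is a random check of this encoding (the instance is random; ${A}$ carries ${a_{ii}}$ on the diagonal, using ${x_i^2 = x_i}$ on the cube, and ${a_{ij}}$ in the strict upper triangle):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(8)
n = 4
a0 = rng.standard_normal()
A = np.triu(rng.standard_normal((n, n)))   # diagonal + strict upper triangle

def f(x):
    # the quadratic multi-linear function with coefficients a0 and A
    x = np.asarray(x, dtype=float)
    val = a0 + A.diagonal() @ x
    for i in range(n):
        for j in range(i + 1, n):
            val += A[i, j] * x[i] * x[j]
    return val

# f(x) = <A, xx^T> + a0 at every vertex of the cube
ok = [np.isclose(f(x), np.sum(A * np.outer(x, x)) + a0)
      for x in product([0, 1], repeat=n)]
assert all(ok)
```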

Now let ${\textrm{QML}^+_n}$ be the set of all quadratic multi-linear functions that are non-negative on ${\{0,1\}^n}$, and consider the matrix (represented here as a function) ${\mathcal M_n : \textrm{QML}^+_n \times \{0,1\}^n \rightarrow \mathbb R_+}$ given by

$\displaystyle \mathcal M_n(f,x) = f(x)\,.$

Then from the above discussion, ${\mathcal M_n}$ arises as a sub-matrix of some slack matrix of ${\mathrm{CORR}_n}$. To summarize, we have the following theorem.

Theorem 2 For all ${n \geq 1}$, it holds that ${\bar \gamma(\mathrm{CUT}_{n+1}) \geq \mathrm{rank}_+(\mathcal M_n)}$.

It is actually the case that ${\bar \gamma(\mathrm{CUT}_{n+1}) = \mathrm{rank}_+(\mathcal M_n)}$. The next post will focus on providing a lower bound on ${\mathrm{rank}_+(\mathcal M_n)}$.