Entropy optimality: Analytic psd rank and John’s theorem

Recall that our goal is to sketch a proof of the following theorem, where the notation is from the last post. I will assume a knowledge of the three posts on polyhedral lifts and non-negative rank, as our argument will proceed by analogy.

Theorem 1 For every {m \geq 1} and {g : \{0,1\}^m \rightarrow \mathbb R_+} , there exists a constant {C(g)} such that the following holds. For every {n \geq 2m} ,

\displaystyle  1+n^{d/2} \geq \mathrm{rank}_{\mathsf{psd}}(M_n^g) \geq C \left(\frac{n}{\log n}\right)^{(d-1)/2}\,. \ \ \ \ \ (1)

where {d = \deg_{\mathsf{sos}}(g).}

In this post, we will see how John’s theorem can be used to transform a psd factorization into one of a nicer analytic form. Using this, we will be able to construct a convex body that contains an approximation to every non-negative matrix of small psd rank.

1.1. Finite-dimensional operator norms

Let {H} denote a finite-dimensional Euclidean space over {\mathbb R} equipped with inner product {\langle \cdot,\cdot\rangle} and norm {|\cdot|} . For a linear operator {A : H \rightarrow H} , we define the operator, trace, and Frobenius norms by

\displaystyle  \|A\| = \max_{x \neq 0} \frac{|Ax|}{x},\quad \|A\|_* = \mathrm{Tr}(\sqrt{A^T A}),\quad \|A\|_F = \sqrt{\mathrm{Tr}(A^T A)}\,.

Let {\mathcal M(H)} denote the set of self-adjoint linear operators on {H} . Note that for {A \in \mathcal M(H)} , the preceding three norms are precisely the {\ell_{\infty}} , {\ell_1} , and {\ell_2} norms of the eigenvalues of {A} . For {A,B \in \mathcal M(H)} , we use {A \succeq 0} to denote that {A} is positive semi-definite and {A \succeq B} for {A-B \succeq 0} . We use {\mathcal D(H) \subseteq \mathcal M(H)} for the set of density operators: Those {A \in \mathcal M(H)} with {A \succeq 0} and {\mathrm{Tr}(A)=1} .

One should recall that {\mathrm{Tr}(A^T B)} is an inner product on the space of linear operators, and we have the operator analogs of the Hölder inequalities: {\mathrm{Tr}(A^T B) \leq \|A\| \cdot \|B\|_*} and {\mathrm{Tr}(A^T B) \leq \|A\|_F \|B\|_F} .

1.2. Rescaling the psd factorization

As in the case of non-negative rank, consider finite sets {X} and {Y} and a matrix {M : X \times Y \rightarrow \mathbb R_+} . For the purposes of proving a lower bound on the psd rank of some matrix, we would like to have a nice analytic description.

To that end, suppose we have a rank-{r} psd factorization

\displaystyle  M(x,y) = \mathrm{Tr}(A(x) B(y))

where {A : X \rightarrow \mathcal S_+^r} and {B : Y \rightarrow \mathcal S_+^r} . The following result of Briët, Dadush and Pokutta (2013) gives us a way to “scale” the factorization so that it becomes nicer analytically. (The improved bound stated here is from an article of Fawzi, Gouveia, Parrilo, Robinson, and Thomas, and we follow their proof.)

Lemma 2 Every {M} with {\mathrm{rank}_{\mathsf{psd}}(M) \leq r} admits a factorization {M(x,y)=\mathrm{Tr}(P(x) Q(y))} where {P : X \rightarrow \mathcal S_+^r} and {Q : Y \rightarrow \mathcal S_+^r} and, moreover,

\displaystyle  \max \{ \|P(x)\| \cdot \|Q(y)\| : x \in X, y \in Y \} \leq r \|M\|_{\infty}\,,

where {\|M\|_{\infty} = \max_{x \in X, y \in Y} M(x,y)} .

Proof: Start with a rank-{r} psd factorization {M(x,y) = \mathrm{Tr}(A(x) B(y))} . Observe that there is a degree of freedom here, because for any invertible operator {J} , we get another psd factorization {M(x,y) = \mathrm{Tr}\left(\left(J A(x) J^T\right) \cdot \left((J^{-1})^T B(y) J^{-1}\right)\right)} .

Let {U = \{ u \in \mathbb R^r : \exists x \in X\,\, A(x) \succeq uu^T \}} and {V = \{ v \in \mathbb R^r : \exists y \in X\,\, B(y) \succeq vv^T \}} . Set {\Delta = \|M\|_{\infty}} . We may assume that {U} and {V} both span {\mathbb R^r} (else we can obtain a lower-rank psd factorization). Both sets are bounded by finiteness of {X} and {Y} .

Let {C=\mathrm{conv}(U)} and note that {C} is centrally symmetric and contains the origin. Now John’s theorem tells us there exists a linear operator {J : \mathbb R^r \rightarrow \mathbb R^r} such that

\displaystyle  B_{\ell_2} \subseteq J C \subseteq \sqrt{r} B_{\ell_2}\,, \ \ \ \ \ (2)

where {B_{\ell_2}} denotes the unit ball in the Euclidean norm. Let us now set {P(x) = J A(x) J^T} and {Q(y) = (J^{-1})^T B(y) J^{-1}} .

Eigenvalues of {P(x)} : Let {w} be an eigenvector of {P(x)} normalized so the corresponding eigenvalue is {\|w\|_2^2} . Then {P(x) \succeq w w^T} , implying that {J^{-1} w \in U} (here we use that {A \succeq 0 \implies S A S^T \succeq 0} for any {S} ). Since {w = J(J^{-1} w)} , (2) implies that {\|w\|_2 \leq \sqrt{r}} . We conclude that every eigenvalue of {P(x)} is at most {r} .

Eigenvalues of {Q(y)} : Let {w} be an eigenvector of {Q(y)} normalized so that the corresponding eigenvalue is {\|w\|_2^2} . Then as before, we have {Q(y) \succeq ww^T} and this implies {J^T w \in V} . Now, on the one hand we have

\displaystyle  \max_{z \in JC}\, \langle z,w\rangle \geq \|w\|_2 \ \ \ \ \ (3)

since {JC \supseteq B_{\ell_2}} .

On the other hand:

\displaystyle  \max_{z \in JC}\, \langle z,w\rangle^2 = \max_{z \in C}\, \langle Jz, w\rangle^2 = \max_{z \in C}\, \langle z, J^T w\rangle^2\,. \ \ \ \ \ (4)

Finally, observe that for any {u \in U} and {v \in V} , we have

\displaystyle  \langle u,v\rangle^2 =\langle uu^T, vv^T\rangle \leq \max_{x \in X, y \in Y} \langle A(x), B(y)\rangle \leq \Delta\,.

By convexity, this implies that {\max_{z \in C}\, \langle z,v\rangle^2 \leq \Delta} for all {v \in V} , bounding the right-hand side of (4) by {\Delta} . Combining this with (3) yields {\|w\|_2^2 \leq \Delta} . We conclude that all the eigenvalues of {Q(y)} are at most {\Delta} . \Box

1.3. Convex proxy for psd rank

Again, in analogy with the non-negative rank setting, we can define an “analytic psd rank” parameter for matrices {N : X \times Y \rightarrow \mathbb R_+} :

\displaystyle  \alpha_{\mathsf{psd}}(N) = \min \Big\{ \alpha \mid \exists A : X \rightarrow \mathcal S_+^k, B : Y \rightarrow \mathcal S_+^k\,,

\displaystyle  \hphantom{xx} \mathop{\mathbb E}_{x \in X}[A(x)]=I,

\displaystyle  \hphantom{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx} \|B(y)\| \leq \frac{\alpha}{k}\, \mathop{\mathbb E}_{y \in Y}[\mathrm{Tr}(B(y))] \quad \forall y \in Y

\displaystyle  \hphantom{\qquad\qquad} \|A(x)\| \leq \alpha \quad \forall x \in X\Big\}\,.

Note that we have implicit equipped {X} and {Y} with the uniform measure. The main point here is that {k} can be arbitrary. One can verify that {\alpha_{\mathsf{psd}}} is convex.

And there is a corresponding approximation lemma. We use {\|N\|_{\infty}=\max_{x,y} |N(x,y)|} and {\|N\|_1 = \mathop{\mathbb E}_{x,y} |N(x,y)|} .

Lemma 3 For every non-negative matrix {M : X \times Y \rightarrow \mathbb R_+} and every {\eta \in (0,1]} , there is a matrix {N} such that {\|M-N\|_{\infty} \leq \eta \|M\|_{\infty}} and

\displaystyle  \alpha_{\mathsf{psd}}(N) \leq O(\mathrm{rank}_{\mathsf{psd}}(M)) \frac{1}{\eta} \frac{\|M\|_{\infty}}{\|M\|_1}\,.

Using Lemma 2 in a straightforward way, it is not particularly difficult to construct the approximator {N} . The condition {\mathop{\mathbb E}_x [A(x)] = I} poses a slight difficulty that requires adding a small multiple of the identity to the LHS of the factorization (to avoid a poor condition number), but this has a correspondingly small effect on the approximation quality. Putting “Alice” into “isotropic position” is not essential, but it makes the next part of the approach (quantum entropy optimization) somewhat simpler because one is always measuring relative entropy to the maximally mixed state.