Entropy optimality: Bloom’s Chang’s Lemma

[Added in post: One might consult this post for a simpler/more general (and slightly more correct) version of the results presented here.]

In our last post regarding Chang’s Lemma, let us visit a version due to Thomas Bloom. We will offer a new proof using entropy maximization. In particular, we will again use only boundedness of the Fourier characters.

There are two new (and elementary) techniques here: (1) using a trade-off between entropy maximization and accuracy and (2) truncating the Taylor expansion of {e^x} .

We use the notation from our previous post: {G=\mathbb F_p^n} for some prime {p} and {\mu} is the uniform measure on {G} . For {\eta > 0}  and {f \in L^2(G)} , we define {\Delta_{\eta}(f) = \{ \alpha \in G : |\hat f(\alpha)| \geq \eta \|f\|_1 \}} . We also use {Q_{\mu} \subseteq L^2(G)} to denote the set of all densities with respect to {\mu} .

Theorem 1 (Bloom) There is a constant {c > 0} such that for every {\eta > 0} and every density {f \in Q_{\mu}} , there is a subset {\Delta \subseteq \Delta_{\eta}(f) \subseteq G} such that {|\Delta| \geq c \eta |\Delta_{\eta}(f)|} and {\Delta} is contained in some {\mathbb F_p} subspace of dimension at most

\displaystyle  \frac{1+\mathrm{Ent}_{\mu}(f)}{c\eta}\,.

Note that we only bound the dimension of a subset of the large spectrum, but the bound on the dimension improves by a factor of {1/\eta} . Bloom uses this as the key step in his proof of what (at the time of writing) constitutes the best asymptotic bounds in Roth’s theorem on three-term arithmetic progressions:

Theorem 2 If a subset {A \subseteq \{1,2,\ldots,N\}} contains no non-trivial three-term arithmetic progressions, then

\displaystyle  |A| \leq O(1) \frac{\left(\log \log N\right)^4}{\log N} N\,.

This represents a modest improvement over the breakthrough of Sanders achieving {\frac{(\log \log N)^{O(1)}}{\log N} N} , but the proof is somewhat different.

1.1. A stronger version

In fact, we will prove a stronger theorem.

Theorem 3 For every {\eta > 0} and every density {f \in Q_{\mu}} , there is a random subset {\Delta \subseteq G} such that almost surely

\displaystyle  \dim_{\mathbb F_p}(\mathrm{span}\, \Delta) \leq 12 \frac{\mathrm{Ent}_{\mu}(f)}{\eta} + O(\log (1/\eta))\,,

and for every {\alpha \in \Delta_{\eta}(f)} , it holds that

\displaystyle  \mathbb P[\alpha \in \Delta] \geq \frac{\eta}{4}\,.

This clearly yields Theorem 1 by averaging.

1.2. The same polytope

To prove Theorem 3, we use the same polytope we saw before. Recall the class of test functionals {\mathcal F = \{ \pm \mathrm{Re}\,\chi_\alpha,  \pm \mathrm{Im}\,\chi_\alpha : \alpha \in G\} \subseteq L^2(G) }

We defined {P(f,\eta) \subseteq L^2(G)} by

\displaystyle  P(f,\eta) = \left\{ g \in L^2(G) : \langle g, \varphi \rangle \geq \langle f,\varphi\rangle - \eta\quad\forall \varphi \in \mathcal F\right\}\,.

Let us consider a slightly different convex optimization:

\displaystyle  \textrm{minimize } \mathrm{Ent}_{\mu}(g)+K \mathbf{\varepsilon} \qquad \textrm{ subject to } g \in P(f,\mathbf{\varepsilon}) \cap Q_{\mu}\,. \ \ \ \ \ (1)

Here, {K} is a constant that we will set soon. On the other hand, {\varepsilon \geq 0} is now intended as an additional variable over which to optimize. We allow the optimization to trade off the entropy term and the accuracy {\varepsilon} . The constant {K > 0} represents how much we value one vs. the other.

Notice that, since {f \in P(f,0) \cap Q_{\mu}} , this convex program satisfies Slater’s condition (there is a feasible point in the relative interior), meaning that strong duality holds (see Section 5.2.3).

1.3. The optimal solution

As in our first post on this topic, we can set the gradient of the Lagrangian equal to zero to obtain the form of the optimal solution: For some dual variables {\{\lambda^*_{\varphi} \geq 0: \varphi \in \mathcal F\} \subseteq \mathbb R} ,

\displaystyle  g^*(x) = \frac{\exp\left(\sum_{\varphi \in \mathcal F} \lambda^*_{\varphi} \varphi(x)\right)}{\mathop{\mathbb E}_{\mu} \left[\exp\left(\sum_{\varphi \in \mathcal F} \lambda^*_{\varphi} \varphi\right)\right]} \ \ \ \ \ (2)

Furthermore, corresponding to our new variable {\varepsilon} , there is a new constraint on the dual variables:

\displaystyle  \sum_{\varphi \in \mathcal F} \lambda^*_{\varphi} \leq K\,.

Observe now that if we put {K = 2 \frac{\mathrm{Ent}_{\mu}(f)}{\eta}} then we can bound {\varepsilon^*} (the error in the optimal solution): Since {f} is a feasible solution with {\varepsilon=0} , we have

\displaystyle  \mathrm{Ent}_{\mu}(g^*) + K \varepsilon^* \leq \mathrm{Ent}_{\mu}(f)\,,

which implies that {\varepsilon^* \leq \frac{\mathrm{Ent}_{\mu}(f)}{K} = \frac{\eta}{2}} since {\mathrm{Ent}_{\mu}(g^*) \geq 0} .

To summarize: By setting {K} appropriately, we obtain {g^* \in P(f,\eta/2) \cap Q_{\mu}} of the form (2) and such that

\displaystyle  \sum_{\varphi \in \mathcal F} \lambda_{\varphi}^* \leq 2\frac{\mathrm{Ent}_{\mu}(f)}{\eta}\,. \ \ \ \ \ (3)

Note that one can arrive at the same conclusion using the algorithm from our previous post: The version unconcerned with sparsity finds a feasible point after time {T \leq \frac{\mathrm{Ent}_{\mu}(f)}{\varepsilon}} . Setting {\varepsilon = \eta/2} yields the same result without using duality.

1.4. A Taylor expansion

Let us slightly rewrite {g^*} by multiplying the numerator and denominator by {\exp\left({\sum_{\varphi \in \mathcal F} \lambda^*_{\varphi}}\right)} . This yields:

\displaystyle  g^* = \frac{\exp\left(\sum_{\varphi \in \mathcal F} \lambda^*_{\varphi} (1+\varphi)\right)}{\mathop{\mathbb E}_{\mu} \exp\left(\sum_{\varphi \in \mathcal F} \lambda^*_{\varphi} (1+\varphi)\right)}

The point of this transformation is that now the exponent is a sum of positive terms (using {\|\varphi\|_{\infty} \leq 1} ), and furthermore by (3), the exponent is always bounded by

\displaystyle  B \stackrel{\mathrm{def}}{=} 4 \frac{\mathrm{Ent}_{\mu}(f)}{\eta}\,. \ \ \ \ \ (4)

Let us now Taylor expand {e^x = \sum_{j \geq 0} \frac{x^j}{j!}} . Applying this to the numerator, we arrive at an expression

\displaystyle  g^* = \sum_{\vec \alpha} y_{\vec \alpha} T_{\vec \alpha}

where {y_{\vec \alpha} \geq 0} , {\sum_{\vec \alpha} y_{\vec \alpha} = 1} , and each {T_{\vec \alpha} \in Q_{\mu}} is a density. Here, {\vec \alpha} ranges over all finite sequences of elements from {\mathcal F} and

\displaystyle  T_{\vec \alpha} = \frac{\prod_{i=1}^{|\vec \alpha|} (1+\varphi_{\vec \alpha_i})}{\mathop{\mathbb E}_{\mu} \prod_{i=1}^{|\vec \alpha|} (1+\varphi_{\vec \alpha_i})}=\prod_{i=1}^{|\vec \alpha|} (1+\varphi_{\vec \alpha_i})\,,

where we use {|\vec \alpha|} to denote the length of the sequence {\vec \alpha} .

1.5. The random subset

We now define a random function {\mathbf{T} \in L^2(G)} by taking {\mathbf{T}=T_{\vec \alpha}} with probability {y_{\vec \alpha}} .

Consider some {\gamma \in \Delta_{\eta}(f)} . Since {g^* \in P(f,\eta/2)} , we know that {\gamma \in \Delta_{\eta/2}(g^*)} . Thus

\displaystyle  \frac{\eta}{2} < |\langle g^*, \chi_{\gamma}\rangle| \leq \sum_{\vec \alpha} y_{\vec \alpha} |\langle T_{\vec \alpha}, \chi_{\gamma}\rangle| = \mathop{\mathbb E}\,|\langle \mathbf{T},\chi_{\gamma}\rangle|\,.

But we also have {|\langle T_{\vec \alpha}, \chi_{\gamma}\rangle| \leq \|T_{\vec \alpha}\|_1 \cdot \|\chi_{\gamma}\|_{\infty} \leq 1} . This implies that {\mathbb P[|\langle \mathbf{T}, \chi_{\gamma}\rangle| > 0] > \eta/2} .

Equivalently, for any {\gamma \in \Delta_{\eta}(f)} , it holds that {\mathbb P[\gamma \in \Delta_0(\mathbf{T})] > \eta/2} . We would be done with the proof of Theorem 3 if we also knew that {\mathbf{T}} were supported on functions {T_{\vec \alpha}} for which {|\vec \alpha| \leq O(B)} because {\dim_{\mathbb F_p}(\mathrm{span}(\Delta_0(T_{\vec \alpha}))) \leq |\vec \alpha|} . This is not necessarily true, but we can simply truncate the Taylor expansion to ensure it.

1.6. Truncation

Let {p_k(x) = \sum_{j \leq k} \frac{x^j}{j!}} denote the Taylor expansion of {e^x} to degree {k} . Since the exponent in {g^*} is always bounded by {B} (recall (4)), we have

\displaystyle  \sum_{|\vec \alpha| > k} y_{\vec \alpha} \leq \sup_{x \in [0,B]} \frac{|e^x - p_k(x)|}{e^x} \leq \frac{B^{k+1}}{(k+1)!}\,.

By standard estimates, we can choose {k \leq 3 B + O(\log(1/\eta))} to make the latter quantity at most {\eta/4} .

Since {\sum_{|\vec \alpha| > k} y_{\vec \alpha} \leq \eta/4} , a union bound combined with our previous argument immediately implies that for {\gamma \in \Delta_{\eta}(f)} , we have

\displaystyle  \mathbb P\left[|\langle \mathbf{T}, \chi_{\gamma}\rangle| > 0 \textrm{ and } \dim_{\mathbb F_p}(\mathrm{span}(\Delta_0(\mathbf{T}))) \leq k \right] \geq \frac{\eta}{4}\,.

This completes the proof of Theorem 3.

1.7. Prologue: A structure theorem

Generalizing the preceding argument a bit, one can prove the following.

Let {G} be a finite abelian group and use {\hat G} to denote the dual group. Let {\mu} denote the uniform measure on {G} . For every {\gamma \in \hat G} , let {\chi_{\gamma}} denote the corresponding character. Let us define a degree-{k} Reisz product to be a function of the form

\displaystyle  R(x) = \prod_{i=1}^k (1+\varepsilon_{i} \Lambda_i \chi_{\gamma_i}(x))

for some {\gamma_1, \ldots, \gamma_k \in \hat G} and {\varepsilon_1, \ldots, \varepsilon_k \in \{-1,1\}} and {\Lambda_1, \ldots, \Lambda_k \in \{\mathrm{Re},\mathrm{Im}\}} .

Theorem 4 For every {\eta > 0} , the following holds. For every {f : G \rightarrow \mathbb R_+} with {\mathbb E_{\mu} [f]=1} , there exists a {g : G \rightarrow \mathbb R_+} with {\mathbb E_{\mu} [g] = 1} such that {\|\hat f - \hat g\|_{\infty} \leq \eta} and {g} is a convex combination of degree-{k} Reisz products where

\displaystyle  k \leq O(1) \frac{\mathrm{Ent}_{\mu}(f)}{\eta} + O(\log (1/\eta))\,.

1.8. A prologue’s prologue

To indicate the lack of algebraic structure required for the preceding statement, we can set things up in somewhat greater generality.

For simplicity, let {X} be a finite set equipped with a probability measure {\mu} . Recall that {L^2(X)} is the Hilbert space of real-valued functions on {X} equipped with the inner product {\langle f,g\rangle = \mathop{\mathbb E}_{\mu}[fg]} . Let {\mathcal F \subseteq L^2(X)} be a set of functionals with the property that {\|\varphi\|_{\infty} \leq 1} for {\varphi \in \mathcal F} .

Define a degree-{k} {\mathcal F} -Riesz product as a function of the form

\displaystyle  R(x) = \prod_{i=1}^k (1+\varphi_i(x))

for some functions {\varphi_1, \ldots, \varphi_k \in \mathcal F} . Define also the (semi-) norm {\|f\|_{\mathcal F} = \sup_{\varphi \in \mathcal F} |\langle f,\varphi\rangle_{L^2(X)}|} .

Theorem 5 For every {\eta > 0} , the following holds. For every {f : X \rightarrow \mathbb R_+} with {\mathbb E_{\mu} [f]=1} , there exists a {g : X \rightarrow \mathbb R_+} with {\mathbb E_{\mu} [g] = 1} such that {\|f - g\|_{\mathcal F} \leq \eta} and {g} is a convex combination of degree-{k} {\mathcal F} -Riesz products where

\displaystyle  k \leq O(1) \frac{\mathrm{Ent}_{\mu}(f)}{\eta} + O(\log (1/\eta))\,.