[Added in post: One might consult this post for a simpler/more general (and slightly more correct) version of the results presented here.]

In our last post regarding Chang’s Lemma, let us visit a version due to Thomas Bloom. We will offer a new proof using entropy maximization. In particular, we will again use only boundedness of the Fourier characters.

There are two new (and elementary) techniques here: (1) using a trade-off between entropy maximization and accuracy and (2) truncating the Taylor expansion of .

We use the notation from our previous post: for some prime and is the uniform measure on . For and , we define . We also use to denote the set of all densities with respect to .

Theorem 1 (Bloom)There is a constant such that for every and every density , there is a subset such that and is contained in some subspace of dimension at most

Note that we only bound the dimension of a subset of the large spectrum, but the bound on the dimension improves by a factor of . Bloom uses this as the key step in his proof of what (at the time of writing) constitutes the best asymptotic bounds in Roth’s theorem on three-term arithmetic progressions:

Theorem 2If a subset contains no non-trivial three-term arithmetic progressions, then

This represents a modest improvement over the breakthrough of Sanders achieving , but the proof is somewhat different.

** 1.1. A stronger version **

In fact, we will prove a stronger theorem.

Theorem 3For every and every density , there is a random subset such that almost surelyand for every , it holds that

This clearly yields Theorem 1 by averaging.

** 1.2. The same polytope **

To prove Theorem 3, we use the same polytope we saw before. Recall the class of test functionals

We defined by

Let us consider a slightly different convex optimization:

Here, is a constant that we will set soon. On the other hand, is now intended as an *additional variable* over which to optimize. We allow the optimization to trade off the entropy term and the accuracy . The constant represents how much we value one vs. the other.

Notice that, since , this convex program satisfies Slater’s condition (there is a feasible point in the relative interior), meaning that strong duality holds (see Section 5.2.3).

** 1.3. The optimal solution **

As in our first post on this topic, we can set the gradient of the Lagrangian equal to zero to obtain the form of the optimal solution: For some dual variables ,

Furthermore, corresponding to our new variable , there is a new constraint on the dual variables:

Observe now that if we put then we can bound (the error in the optimal solution): Since is a feasible solution with , we have

which implies that since .

To summarize: By setting appropriately, we obtain of the form (2) and such that

Note that one can arrive at the same conclusion using the algorithm from our previous post: The version unconcerned with sparsity finds a feasible point after time . Setting yields the same result without using duality.

** 1.4. A Taylor expansion **

Let us slightly rewrite by multiplying the numerator and denominator by . This yields:

The point of this transformation is that now the exponent is a sum of positive terms (using ), and furthermore by (3), the exponent is always bounded by

Let us now Taylor expand . Applying this to the numerator, we arrive at an expression

where , , and each is a density. Here, ranges over all finite sequences of elements from and

where we use to denote the length of the sequence .

** 1.5. The random subset **

We now define a random function by taking with probability .

Consider some . Since , we know that . Thus

But we also have . This implies that .

Equivalently, for any , it holds that . We would be done with the proof of Theorem 3 if we also knew that were supported on functions for which because . This is not necessarily true, but we can simply truncate the Taylor expansion to ensure it.

** 1.6. Truncation **

Let denote the Taylor expansion of to degree . Since the exponent in is always bounded by (recall (4)), we have

By standard estimates, we can choose to make the latter quantity at most .

Since , a union bound combined with our previous argument immediately implies that for , we have

This completes the proof of Theorem 3.

** 1.7. Prologue: A structure theorem **

Generalizing the preceding argument a bit, one can prove the following.

Let be a finite abelian group and use to denote the dual group. Let denote the uniform measure on . For every , let denote the corresponding character. Let us define a *degree- Reisz product* to be a function of the form

for some and and .

Theorem 4For every , the following holds. For every with , there exists a with such that and is a convex combination of degree- Reisz products where

** 1.8. A prologue’s prologue **

To indicate the lack of algebraic structure required for the preceding statement, we can set things up in somewhat greater generality.

For simplicity, let be a finite set equipped with a probability measure . Recall that is the Hilbert space of real-valued functions on equipped with the inner product . Let be a set of functionals with the property that for .

Define a *degree- -Riesz product* as a function of the form

for some functions . Define also the (semi-) norm .

Theorem 5For every , the following holds. For every with , there exists a with such that and is a convex combination of degree- -Riesz products where