In order to prove that the chaining argument is tight, we will need some additional properties of Gaussian processes. For the chaining upper bound, we used a series of union bounds specified by a tree structure. As a first step in producing a good lower bound, we will look at a way in which the union bound is tight.
Theorem 1 (Sudakov inequality) For some constant , the following holds. Let be a Gaussian process such that for every distinct , we have . Then,
The claim is an elementary calculation for a sequence of i.i.d. random variables (i.e. ). We will reduce the general case to this one using Slepian’s comparison lemma.
Lemma 2 (Slepian’s Lemma) Let and be two Gaussian processes such that for all ,
There is a fairly elementary proof of Slepian’s Lemma (see, e.g. the Ledoux-Talagrand book), if one is satisfied with the weaker conclusion , which suffices for our purposes.
To see that Lemma 2 yields Theorem 1, take a family with for all and consider the associated variables where is a family of i.i.d. random variables. It is straightforward to verify that (1) holds, hence by the lemma, , and the result follows from the i.i.d. case.
The Sudakov inequality gives us “one level” of a lower bound; the following strengthening will allow us to use it recursively. If we have a Gaussian process and , we will use the notation
For and , we also use the notation
Here is the main theorem of this post; its statement is all we will require for our proof of the majorizing measures theorem:
Theorem 3 For some constants and , the following holds. Suppose is a Gaussian process, and let be such that for . Then,
The proof of the preceding theorem relies on the a strong concentration property for Gaussian processes. First, we recall the classical isoperimetric inequality for Gaussian space (see, for instance, (2.9) here).
We remind the reader that for a function ,
Theorem 4 Let , and let , where is the standard -dimensional Gaussian measure. Then,
Using this, we can prove the following remarkable fact.
Theorem 5 Let be a Gaussian process, then
A notable aspect of this statement is that only the maximum variance affects the concentration, not the number of random variables. We now prove Theorem 5 using Theorem 4.
Proof: We will prove it in the case , but of course our bound is independent of . The idea is that given a Gaussian process , we can write
for , where are standard i.i.d. normals, and the matrix is a matrix of real coefficients. In this case, if is a standard -dimensional Gaussian, then the vector is distributed as .
If we put , then Theorem 4 yields (3) as long as . It is easy to see that
But is just the maximum norm of any row of , and the norm of row is
Using this theorem, we are ready to prove Theorem 3. I will only give a sketch here, but filling in the details is not too difficult.
Assume that the conditions of Theorem 3 hold. Pick an arbitrary , and recall that we can write
since our gaussians are centered.
Now, by Theorem 1,
Suppose that achieves this. By definition,
so we could hope that for some , we simultaneously have , yielding
The problem, of course, is that the events we are discussing are not independent.
This is where Theorem 5 comes in. For any , all the variances of the variables are bounded by . This implies that we can choose a constant such that
So, in fact, we can expect that none of the random variables will deviate from its expected value by more than . Which means we can (morally) replace (4) by
But now by choosing , the error term is absorbed.