To prove that the chaining argument is tight, we will need some additional properties of Gaussian processes. For the chaining upper bound, we used a series of union bounds specified by a tree structure. As a first step toward a matching lower bound, we will look at a setting in which the union bound is tight.

Theorem 1 (Sudakov minoration): There is a constant $C > 0$ such that the following holds. If $\{X_t\}_{t \in T}$ is a Gaussian process with $\mathbb{E}(X_s - X_t)^2 \geq \alpha^2$ for all $s \neq t \in T$, then
\[
\mathbb{E} \sup_{t \in T} X_t \geq C \alpha \sqrt{\log |T|}.
\]
The claim is an elementary calculation for a sequence of i.i.d. standard normal random variables $g_1, g_2, \ldots, g_n$ (i.e., $\mathbb{E} \max_{i \leq n} g_i \asymp \sqrt{\log n}$). We will reduce the general case to this one using Slepian's comparison lemma.

Lemma 2 (Slepian): Let $\{X_t\}_{t \in T}$ and $\{Y_t\}_{t \in T}$ be Gaussian processes such that for all $s, t \in T$,
\[
\mathbb{E}(X_s - X_t)^2 \geq \mathbb{E}(Y_s - Y_t)^2. \tag{1}
\]
Then $\mathbb{E} \sup_{t \in T} X_t \geq \mathbb{E} \sup_{t \in T} Y_t$.
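The i.i.d. claim is easy to sanity-check numerically. Here is a minimal Monte Carlo sketch (the helper `expected_max`, the trial count, and the values of $n$ are all arbitrary choices for illustration):

```python
import math
import random

def expected_max(n, trials=2000, seed=0):
    """Estimate E[max of n i.i.d. standard normals] by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(n))
    return total / trials

# The estimates should track sqrt(2 log n) up to lower-order terms,
# illustrating that E max_{i <= n} g_i grows like sqrt(log n).
for n in (16, 256, 1024):
    print(n, round(expected_max(n), 3), round(math.sqrt(2 * math.log(n)), 3))
```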
There is a fairly elementary proof of Slepian's lemma (see, e.g. the Ledoux-Talagrand book), if one is satisfied with the weaker conclusion $\mathbb{E} \sup_{t \in T} Y_t \leq 2\, \mathbb{E} \sup_{t \in T} X_t$, which suffices for our purposes.
To see that Lemma 2 yields Theorem 1, take a family $\{X_t\}_{t \in T}$ with $\mathbb{E}(X_s - X_t)^2 \geq \alpha^2$ for all $s \neq t$, and consider the associated variables $Y_t = \frac{\alpha}{\sqrt{2}}\, g_t$, where $\{g_t\}_{t \in T}$ is a family of i.i.d. standard normal random variables. It is straightforward to verify that (1) holds, hence by the lemma, $\mathbb{E} \sup_{t \in T} X_t \geq \mathbb{E} \sup_{t \in T} Y_t$, and the result follows from the i.i.d. case.
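Spelling out the verification of (1): with the (assumed) normalization $Y_t = \frac{\alpha}{\sqrt{2}}\, g_t$, for $s \neq t$ we have
\[
\mathbb{E}(Y_s - Y_t)^2 = \frac{\alpha^2}{2}\, \mathbb{E}(g_s - g_t)^2 = \alpha^2 \leq \mathbb{E}(X_s - X_t)^2,
\]
since $\mathbb{E}(g_s - g_t)^2 = 2$ for independent standard normals. Moreover, $\mathbb{E} \sup_{t} Y_t = \frac{\alpha}{\sqrt{2}}\, \mathbb{E} \max_{t} g_t$, so the i.i.d. calculation finishes the proof.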
The Sudakov inequality gives us “one level” of a lower bound; the following strengthening will allow us to use it recursively. If we have a Gaussian process $\{X_t\}_{t \in T}$ and $A \subseteq T$, we will use the notation
\[
g(A) = \mathbb{E} \sup_{t \in A} X_t.
\]
For $t \in T$ and $r \geq 0$, we also use the notation
\[
B(t, r) = \{ s \in T : d(s, t) \leq r \},
\]
where $d(s,t) = \sqrt{\mathbb{E}(X_s - X_t)^2}$ denotes the canonical metric of the process.
Here is the main theorem of this post; its statement is all we will require for our proof of the majorizing measures theorem:

Theorem 3: There are constants $C > 0$ and $r > 1$ such that the following holds. Let $\{X_t\}_{t \in T}$ be a Gaussian process, and let $t_1, t_2, \ldots, t_n \in T$ be such that $d(t_i, t_j) \geq \alpha$ for $i \neq j$. Then
\[
g(T) \geq C \alpha \sqrt{\log n} + \min_{i}\, g\big(B(t_i, \alpha/r)\big).
\]
The proof of the preceding theorem relies on a strong concentration property for Gaussian processes. First, we recall the classical isoperimetric inequality for Gaussian space (see, for instance, (2.9) here).

Theorem 4: Let $F : \mathbb{R}^n \to \mathbb{R}$ be a Lipschitz function and let $g$ be a standard $n$-dimensional Gaussian. Then for every $\lambda > 0$,
\[
\mathbb{P}\big( |F(g) - \mathbb{E}\, F(g)| > \lambda \big) \leq 2 \exp\left( - \frac{\lambda^2}{2 \|F\|_{\mathrm{Lip}}^2} \right).
\]
We remind the reader that for a function $F : \mathbb{R}^n \to \mathbb{R}$,
\[
\|F\|_{\mathrm{Lip}} = \sup_{x \neq y} \frac{|F(x) - F(y)|}{\|x - y\|}.
\]
Using this, we can prove the following remarkable fact.

Theorem 5: Let $\{X_t\}_{t \in T}$ be a Gaussian process and put $\sigma^2 = \sup_{t \in T} \mathbb{E}\, X_t^2$. Then for every $\lambda > 0$,
\[
\mathbb{P}\left( \Big| \sup_{t \in T} X_t - \mathbb{E} \sup_{t \in T} X_t \Big| > \lambda \right) \leq 2 \exp\left( - \frac{\lambda^2}{2 \sigma^2} \right).
\]
Proof: We will prove it in the case $|T| = n$ is finite, but of course our bound is independent of $n$. The idea is that given a Gaussian process $\{X_1, X_2, \ldots, X_n\}$, we can write
\[
X_i = \sum_{j=1}^{n} a_{ij}\, g_j
\]
for $1 \leq i \leq n$, where $g_1, \ldots, g_n$ are standard i.i.d. normals, and $A = (a_{ij})$ is a matrix of real coefficients. In this case, if $g$ is a standard $n$-dimensional Gaussian, then the vector $A g$ is distributed as $(X_1, X_2, \ldots, X_n)$. Thus, setting $F(x) = \max_{1 \leq i \leq n} (Ax)_i$, the random variable $\sup_{t \in T} X_t$ is distributed as $F(g)$, and we may apply the concentration inequality for Lipschitz functions of a Gaussian.
But $\|F\|_{\mathrm{Lip}}$ is just the maximum $\ell_2$ norm of any row of $A$ (since $|F(x) - F(y)| \leq \max_i |\langle A_i, x - y \rangle| \leq \max_i \|A_i\| \cdot \|x - y\|$, where $A_i$ denotes row $i$ of $A$), and the norm of row $i$ is
\[
\|A_i\| = \sqrt{\sum_{j=1}^n a_{ij}^2} = \sqrt{\mathbb{E}\, X_i^2} \leq \sigma,
\]
completing the proof.
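As a numerical illustration of this kind of concentration, the following sketch checks the bound $2\exp(-\lambda^2/(2\sigma^2))$ empirically for a small process of the form $X = Ag$ (the $3 \times 3$ matrix $A$ and the deviation threshold are arbitrary hypothetical choices):

```python
import math
import random

# A hypothetical 3x3 coefficient matrix; each row has unit l2 norm,
# so sigma^2 = max_i E[X_i^2] = 1 for the process X = A g.
A = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.3, 0.4, math.sqrt(0.75)],
]
sigma2 = max(sum(a * a for a in row) for row in A)

def sample_max(rng):
    """Draw g ~ N(0, I_3) and return max_i (A g)_i."""
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return max(sum(a * x for a, x in zip(row, g)) for row in A)

rng = random.Random(1)
samples = [sample_max(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)

# Empirical frequency of a deviation >= lam versus the Gaussian
# concentration bound 2 * exp(-lam^2 / (2 sigma^2)).
lam = 2.0
empirical = sum(abs(s - mean) >= lam for s in samples) / len(samples)
bound = 2 * math.exp(-lam * lam / (2 * sigma2))
print(empirical, "<=", bound)
```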
Using this theorem, we are ready to prove Theorem 3. I will only give a sketch here, but filling in the details is not too difficult.
Assume that the conditions of Theorem 3 hold. Pick an arbitrary $t_0 \in T$, and recall that we can write
\[
g(T) = \mathbb{E} \sup_{t \in T} X_t = \mathbb{E} \sup_{t \in T} \left( X_t - X_{t_0} \right),
\]
since our Gaussians are centered.
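In more detail, writing $t_0$ for the chosen basepoint: the supremum is shifted by the single random variable $X_{t_0}$, so by linearity of expectation,
\[
\mathbb{E} \sup_{t \in T} \left( X_t - X_{t_0} \right) = \mathbb{E} \sup_{t \in T} X_t - \mathbb{E}\, X_{t_0} = \mathbb{E} \sup_{t \in T} X_t,
\]
using $\mathbb{E}\, X_{t_0} = 0$.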
Now, by Theorem 1,
\[
g(T) \geq \mathbb{E} \max_{1 \leq i \leq n} X_{t_i} \geq C \alpha \sqrt{\log n}. \tag{4}
\]
Suppose that some index $i^*$ achieves this maximum, i.e. $X_{t_{i^*}} = \max_i X_{t_i}$. By definition,
\[
g\big(B(t_i, \alpha/r)\big) = \mathbb{E} \sup_{t \in B(t_i, \alpha/r)} X_t \quad \text{for every } i.
\]
This is where Theorem 5 comes in. For any $i$, the variances of the variables $\{ X_t - X_{t_i} : t \in B(t_i, \alpha/r) \}$ are bounded by $\alpha^2/r^2$. This implies that we can choose a constant $c > 0$ such that
\[
\mathbb{P}\left( \Big| \sup_{t \in B(t_i, \alpha/r)} X_t - g\big(B(t_i, \alpha/r)\big) \Big| \geq \frac{c\, \alpha \sqrt{\log n}}{r} \right) \leq \frac{1}{n^2}.
\]
So, in fact, we can expect that none of the random variables $\sup_{t \in B(t_i, \alpha/r)} X_t$ will deviate from its expected value by more than $O\!\left(\frac{\alpha \sqrt{\log n}}{r}\right)$, which means that we can (morally) replace (4) by
\[
g(T) \geq C \alpha \sqrt{\log n} + \min_{i}\, g\big(B(t_i, \alpha/r)\big) - O\!\left( \frac{\alpha \sqrt{\log n}}{r} \right).
\]
But now by choosing $r$ large enough, the error term is absorbed into the first term.
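To make the absorption explicit (with hypothetical constants; the actual values depend on the constants in the Sudakov and concentration bounds): if the error term is at most $C' \alpha \sqrt{\log n}/r$ while the main term is $C \alpha \sqrt{\log n}$, then choosing $r \geq 2C'/C$ yields
\[
g(T) \geq C \alpha \sqrt{\log n} + \min_{i}\, g\big(B(t_i, \alpha/r)\big) - \frac{C' \alpha \sqrt{\log n}}{r} \geq \frac{C}{2}\, \alpha \sqrt{\log n} + \min_{i}\, g\big(B(t_i, \alpha/r)\big),
\]
which is the desired bound with constant $C/2$.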