In order to prove that the chaining argument is tight, we will need some additional properties of Gaussian processes. For the chaining upper bound, we used a series of union bounds specified by a tree structure. As a first step in producing a good lower bound, we will look at a way in which the union bound is tight.
The claim is an elementary calculation for a sequence of i.i.d. random variables (i.e. ). We will reduce the general case to this one using Slepian’s comparison lemma.
There is a fairly elementary proof of Slepian’s Lemma (see, e.g. the Ledoux-Talagrand book), if one is satisfied with the weaker conclusion , which suffices for our purposes.
To see that Lemma 2 yields Theorem 1, take a family with for all and consider the associated variables where is a family of i.i.d. random variables. It is straightforward to verify that (1) holds, hence by the lemma, , and the result follows from the i.i.d. case.
The Sudakov inequality gives us “one level” of a lower bound; the following strengthening will allow us to use it recursively. If we have a Gaussian process and , we will use the notation
For and , we also use the notation
Here is the main theorem of this post; its statement is all we will require for our proof of the majorizing measures theorem:
The proof of the preceding theorem relies on the a strong concentration property for Gaussian processes. First, we recall the classical isoperimetric inequality for Gaussian space (see, for instance, (2.9) here).
We remind the reader that for a function ,
Using this, we can prove the following remarkable fact.
Proof: We will prove it in the case , but of course our bound is independent of . The idea is that given a Gaussian process , we can write
for , where are standard i.i.d. normals, and the matrix is a matrix of real coefficients. In this case, if is a standard -dimensional Gaussian, then the vector is distributed as .
But is just the maximum norm of any row of , and the norm of row is
Using this theorem, we are ready to prove Theorem 3. I will only give a sketch here, but filling in the details is not too difficult.
Assume that the conditions of Theorem 3 hold. Pick an arbitrary , and recall that we can write
since our gaussians are centered.
Now, by Theorem 1,
Suppose that achieves this. By definition,
The problem, of course, is that the events we are discussing are not independent.
This is where Theorem 5 comes in. For any , all the variances of the variables are bounded by . This implies that we can choose a constant such that
So, in fact, we can expect that none of the random variables will deviate from its expected value by more than . Which means we can (morally) replace (4) by
But now by choosing , the error term is absorbed.