1. Construction of Föllmer’s drift
In a previous post, we saw how an entropy-optimal drift process could be used to prove the Brascamp-Lieb inequalities. Our main tool was a result of Föllmer that we now recall and justify. Afterward, we will use it to prove the Gaussian log-Sobolev inequality.
Consider $f : \mathbb{R}^n \rightarrow \mathbb{R}_+$ with $\int f \,d\gamma_n = 1$, where $\gamma_n$ is the standard Gaussian measure on $\mathbb{R}^n$. Let $(B_t)$ denote an $n$-dimensional Brownian motion with $B_0 = 0$. We consider all processes of the form

$$W_t = B_t + \int_0^t v_s\,ds\,, \ \ \ \ \ (1)$$

where $(v_s)$ is a progressively measurable drift and such that $W_1$ has law $f\,d\gamma_n$.
Theorem 1 (Föllmer) It holds that

$$D(f\,d\gamma_n \,\|\, d\gamma_n) = \min D(W_{[0,1]} \,\|\, B_{[0,1]}) = \min \frac12 \int_0^1 \mathop{\mathbb{E}}\,\|v_t\|^2\,dt\,,$$

where the minima are over all processes of the form (1).
Proof: In the preceding post (Lemma 2), we have already seen that for any drift of the form (1), it holds that

$$D(f\,d\gamma_n \,\|\, d\gamma_n) \leq \frac12 \int_0^1 \mathop{\mathbb{E}}\,\|v_t\|^2\,dt = D(W_{[0,1]} \,\|\, B_{[0,1]})\,,$$

thus we need only exhibit a drift $(v_t)$ achieving equality.
We define

$$v_t = \nabla \log P_{1-t} f(W_t)\,,$$

where $(P_t)$ is the Brownian semigroup defined by

$$P_t f(x) = \mathop{\mathbb{E}}[f(x + B_t)]\,.$$
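As a sanity check, here is a worked example that I am adding (it is not part of the original argument): for a linear exponential tilt in dimension one, the drift can be computed in closed form.

```latex
% Take n = 1 and f(x) = e^{\mu x - \mu^2/2}, so that f\,d\gamma_1 = \mathcal{N}(\mu,1).
\begin{align*}
P_{1-t} f(x) &= \mathbb{E}\big[f(x + B_{1-t})\big]
  = e^{\mu x - \mu^2/2}\;\mathbb{E}\big[e^{\mu B_{1-t}}\big]
  = e^{\mu x - \mu^2 t/2}\,,\\
v_t &= \partial_x \log P_{1-t} f(W_t) = \mu\,,\\
\frac12 \int_0^1 \mathbb{E}\,[v_t^2]\,dt &= \frac{\mu^2}{2}
  = D\big(\mathcal{N}(\mu,1) \,\big\|\, \mathcal{N}(0,1)\big)\,.
\end{align*}
% The Föllmer drift is the constant \mu, and its energy matches the
% relative entropy, as Theorem 1 requires.
```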
As we saw in the previous post (Lemma 2), the chain rule yields

$$D(W_{[0,1]} \,\|\, B_{[0,1]}) = \frac12 \int_0^1 \mathop{\mathbb{E}}\,\|v_t\|^2\,dt\,. \ \ \ \ \ (2)$$
We are left to show that $W_1$ has law $f\,d\gamma_n$ and $D(W_{[0,1]} \,\|\, B_{[0,1]}) = D(W_1 \,\|\, B_1) = D(f\,d\gamma_n \,\|\, d\gamma_n)$.
We will prove the first fact using Girsanov’s theorem to argue about the change of measure between $W_{[0,1]}$ and $B_{[0,1]}$. As in the previous post, we will argue somewhat informally using the heuristic that the law of an increment $B_{t+dt} - B_t$ is a Gaussian random variable in $\mathbb{R}^n$ with covariance $dt \cdot I$. Itô’s formula states that this heuristic is justified (see our use of the formula below).
The following lemma says that, given any sample path $\{\hat W_s : s \in [0,t]\}$ of our process up to time $t$, the probability that Brownian motion (without drift) would have “done the same thing” is $M_t$ times the probability of that path under our process.
Remark 1 I chose to present various steps in the next proof at varying levels of formality. The arguments have the same structure as corresponding formal proofs, but I thought (perhaps naïvely) that this would be instructive.
Lemma 2 Let $\mu_t$ denote the law of $W_{[0,t]}$. If we define

$$M_t = \exp\left(-\int_0^t \langle v_s, dB_s\rangle - \frac12 \int_0^t \|v_s\|^2\,ds\right),$$

then under the measure $\nu_t$ given by

$$d\nu_t = M_t\,d\mu_t\,,$$

the process $(W_s)_{s \leq t}$ has the same law as $(B_s)_{s \leq t}$.
Proof: We argue by analogy with the discrete proof. First, let us define the infinitesimal “transition kernel” of Brownian motion using our heuristic that $dB_t$ has covariance $dt \cdot I$:

$$p(x,y) = \frac{e^{-\|x-y\|^2/(2\,dt)}}{(2\pi\,dt)^{n/2}}\,.$$

We can also compute the (time-inhomogeneous) transition kernel $q_t$ of $W_t$:

$$q_t(x,y) = \frac{e^{-\|x + v_t\,dt - y\|^2/(2\,dt)}}{(2\pi\,dt)^{n/2}}\,.$$

Here we are using that $dW_t = v_t\,dt + dB_t$ and $v_t$ is deterministic conditioned on the past, thus the law of $W_{t+dt}$ given $W_t = x$ is a normal with mean $x + v_t\,dt$ and covariance $dt \cdot I$.
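The relation between $q_t$ and $p$ used in the next display is just completing the square in the Gaussian exponent; for the record, here is the small computation (my addition):

```latex
\begin{align*}
\frac{\|x + v_t\,dt - y\|^2}{2\,dt}
  &= \frac{\|x - y\|^2}{2\,dt} + \langle v_t,\, x - y\rangle + \frac12\,\|v_t\|^2\,dt\,,\\
\text{hence}\qquad
q_t(x,y) &= p(x,y)\; e^{-\frac12 \|v_t\|^2\,dt \,-\, \langle v_t,\, x - y\rangle}\,.
\end{align*}
```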
To avoid confusion of derivatives, let’s use $\alpha_t$ for the density of $\mu_t$ (the law of $W_{[0,t]}$) and $\beta_t$ for the density of Brownian motion (recall that these are densities on paths). Now let us relate the density $\alpha_{t+dt}$ to the density $\alpha_t$. We use here the notations $\hat W, \hat v, \hat B$ to denote a (non-random) sample path of $(W_t)$:

$$\begin{array}{lll} \alpha_{t+dt}(\hat W_{[0,t+dt]}) &= \alpha_t(\hat W_{[0,t]})\, q_t(\hat W_t, \hat W_{t+dt}) \\ &= \alpha_t(\hat W_{[0,t]})\, p(\hat W_t, \hat W_{t+dt})\, e^{-\frac12 \|\hat v_t\|^2\,dt - \langle \hat v_t,\, \hat W_t - \hat W_{t+dt}\rangle} \\ &= \alpha_t(\hat W_{[0,t]})\, p(\hat W_t, \hat W_{t+dt})\, e^{-\frac12 \|\hat v_t\|^2\,dt + \langle \hat v_t,\, d\hat W_t\rangle} \\ &= \alpha_t(\hat W_{[0,t]})\, p(\hat W_t, \hat W_{t+dt})\, e^{\frac12 \|\hat v_t\|^2\,dt + \langle \hat v_t,\, d\hat B_t\rangle}\,, \end{array}$$

where the last line uses $d\hat W_t = \hat v_t\,dt + d\hat B_t$.
Now by “heuristic” induction, we can assume $\alpha_t(\hat W_{[0,t]}) = \frac{1}{M_t}\,\beta_t(\hat W_{[0,t]})$, yielding

$$\begin{array}{lll} \alpha_{t+dt}(\hat W_{[0,t+dt]}) &= \frac{1}{M_t}\, \beta_t(\hat W_{[0,t]})\, p(\hat W_t, \hat W_{t+dt})\, e^{\frac12 \|\hat v_t\|^2\,dt + \langle \hat v_t,\, d\hat B_t\rangle} \\ &= \frac{1}{M_{t+dt}}\, \beta_t(\hat W_{[0,t]})\, p(\hat W_t, \hat W_{t+dt}) \\ &= \frac{1}{M_{t+dt}}\, \beta_{t+dt}(\hat W_{[0,t+dt]})\,. \end{array}$$

In the last line, we used the fact that $p$ is the infinitesimal transition kernel for Brownian motion. $\Box$
Now we will show that

$$\frac{1}{M_t} = P_{1-t} f(W_t)\,. \ \ \ \ \ (3)$$

From Lemma 2, it will follow that $W_{[0,1]}$ has the law $f(B_1)\,d\sigma$, where $\sigma$ is the law of $B_{[0,1]}$ (take $t = 1$ in (3): $1/M_1 = f(W_1)$). In particular, $W_1$ has the law $f\,d\gamma_n$, which was our first goal.
Given our preceding less formal arguments, let us use a proper stochastic calculus argument to establish (3). To do that we need a way to calculate

$$d \log P_{1-t} f(W_t)\,.$$

Notice that this involves both time and space derivatives.
Itô’s lemma. Suppose we have a function $F$, twice continuously differentiable in space and once in time, that we write as $F(x,t)$, where $x$ is a space variable and $t$ is a time variable. We can expand $dF$ via its Taylor series:

$$dF = \partial_t F\,dt + \partial_x F\,dx + \frac12 \partial_x^2 F\,(dx)^2 + \frac12 \partial_t^2 F\,(dt)^2 + \partial_t \partial_x F\,dt\,dx + \cdots\,.$$

Normally we could eliminate the terms $(dx)^2$, $(dt)^2$, $dt\,dx$, etc., since they are negligible as $dx, dt \rightarrow 0$. But recall that for Brownian motion we have the heuristic $(dB_t)^2 \approx dt$. Thus we cannot eliminate the second-order space derivative if we plan to plug in $x = B_t$ (or $x = W_t$, a process driven by Brownian motion). Itô’s lemma says that this consideration alone gives us the correct result:

$$dF(B_t, t) = \left(\partial_t F(B_t, t) + \frac12\,\partial_x^2 F(B_t, t)\right) dt + \partial_x F(B_t, t)\,dB_t\,.$$

This generalizes in a straightforward way to the higher-dimensional setting $F : \mathbb{R}^n \times [0,\infty) \rightarrow \mathbb{R}$, with $\frac12\,\partial_x^2 F$ replaced by $\frac12\,\Delta F$ and $\partial_x F\,dB_t$ by $\langle \nabla F, dB_t\rangle$.
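As a quick illustration of the lemma (an example I am adding, not in the original), take $F(x,t) = x^2$:

```latex
% \partial_t F = 0, \quad \partial_x F = 2x, \quad \partial_x^2 F = 2, so
\begin{align*}
d\big(B_t^2\big) = 2 B_t\,dB_t + dt\,.
\end{align*}
% The naive chain rule would give only 2 B_t\,dB_t, whose expectation is 0;
% it is the Itô correction term dt that produces \mathbb{E}[B_t^2] = t.
```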
With Itô’s lemma in hand, let us continue to calculate the derivative

$$d \log P_{1-t} f(W_t) = -\frac{\Delta P_{1-t} f(W_t)}{2\,P_{1-t} f(W_t)}\,dt + \big\langle \nabla \log P_{1-t} f(W_t),\, dW_t\big\rangle + \frac12\,\Delta \log P_{1-t} f(W_t)\,dt\,.$$

For the time derivative (the first term), we have employed the heat equation

$$\partial_t P_t f = \frac12\,\Delta P_t f\,,$$

where $\Delta = \sum_{i=1}^n \partial_{x_i}^2$ is the Laplacian on $\mathbb{R}^n$.
Note that the heat equation was already contained in our “infinitesimal density” $p(x,y)$ in the proof of Lemma 2, or in the representation $P_t f(x) = \mathop{\mathbb{E}}[f(x+B_t)]$, and Itô’s lemma was also contained in our heuristic that $dB_t$ has covariance $dt \cdot I$.
Using Itô’s formula again, together with the identities $\nabla \log P_{1-t} f(W_t) = v_t$, $\Delta \log g = \frac{\Delta g}{g} - \|\nabla \log g\|^2$, and $dW_t = v_t\,dt + dB_t$, yields

$$d \log P_{1-t} f(W_t) = \langle v_t, dB_t\rangle + \frac12\,\|v_t\|^2\,dt = d \log \frac{1}{M_t}\,,$$

giving our desired conclusion (3), since both sides agree at $t = 0$: $M_0 = 1$ and $P_1 f(W_0) = \int f\,d\gamma_n = 1$.
Our final task is to establish optimality: $D(W_{[0,1]} \,\|\, B_{[0,1]}) = D(W_1 \,\|\, B_1)$. We apply the formula (3): since $\log f(W_1) = \log \frac{1}{M_1} = \int_0^1 \langle v_t, dB_t\rangle + \frac12 \int_0^1 \|v_t\|^2\,dt$, and $W_1$ has law $f\,d\gamma_n$,

$$D(W_1 \,\|\, B_1) = \mathop{\mathbb{E}}[\log f(W_1)] = \mathop{\mathbb{E}}\left[\frac12 \int_0^1 \|v_t\|^2\,dt\right],$$

where we used $\mathop{\mathbb{E}} \int_0^1 \langle v_t, dB_t\rangle = 0$. Combined with (2), this completes the proof of the theorem. $\Box$
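To see Theorem 1 numerically, here is a short Monte Carlo sketch of my own (the setup is an illustrative assumption, not from the post): for $f(x) = e^{\mu x - \mu^2/2}$ in dimension one, $f\,d\gamma_1 = \mathcal{N}(\mu,1)$, and one checks $P_{1-t}f(x) = e^{\mu x - \mu^2 t/2}$, so the optimal drift is constantly $v_t = \mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n_paths, n_steps = 1.0, 200_000, 100
dt = 1.0 / n_steps

# Euler-Maruyama for dW_t = v_t dt + dB_t with the constant drift v_t = mu.
W = np.zeros(n_paths)
for _ in range(n_steps):
    W += mu * dt + np.sqrt(dt) * rng.standard_normal(n_paths)

# Theorem 1 predicts D(f dgamma_1 || dgamma_1) = (1/2) int_0^1 E[v_t^2] dt.
energy = 0.5 * mu**2                     # exact, since the drift is constant
kl = float(np.mean(mu * W - mu**2 / 2))  # Monte Carlo estimate of E[log f(W_1)]

print(float(np.mean(W)), kl, energy)     # mean(W_1) ~ mu, and kl ~ energy
```

With this drift, $W_1 = \mu + B_1$ exactly, so the endpoint law and both sides of Föllmer's identity can be checked against their closed forms.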
2. The Gaussian log-Sobolev inequality
Consider again a measurable $f : \mathbb{R}^n \rightarrow \mathbb{R}_+$ with $\int f\,d\gamma_n = 1$. Let us define $\mathrm{Ent}_{\gamma_n}(f) = \int f \log f\,d\gamma_n = D(f\,d\gamma_n \,\|\, d\gamma_n)$. Then the classical log-Sobolev inequality in Gaussian space asserts that

$$\mathrm{Ent}_{\gamma_n}(f) \leq \frac12 \int \frac{\|\nabla f\|^2}{f}\,d\gamma_n\,. \ \ \ \ \ (4)$$
First, we discuss the correct way to interpret this. Define the Ornstein-Uhlenbeck semigroup $(U_t)$ by its action

$$U_t f(x) = \mathop{\mathbb{E}}[f(e^{-t} x + \sqrt{1-e^{-2t}}\, B_1)]\,.$$

This is the natural stationary diffusion process on Gaussian space. For every measurable $f$, we have

$$U_t f(x) \rightarrow \int f\,d\gamma_n \quad \text{as } t \rightarrow \infty\,,$$

or equivalently

$$D(U_t f\,d\gamma_n \,\|\, d\gamma_n) \rightarrow 0 \quad \text{as } t \rightarrow \infty\,.$$
The log-Sobolev inequality yields quantitative convergence in the relative entropy distance as follows: Define the Fisher information

$$I(f) = \int \frac{\|\nabla f\|^2}{f}\,d\gamma_n\,.$$

One can check that

$$\frac{d}{dt}\, D(U_t f\,d\gamma_n \,\|\, d\gamma_n)\,\Big|_{t=0} = -I(f)\,,$$

thus the Fisher information describes the instantaneous decay of the relative entropy of $f$ under diffusion.
So we can rewrite the log-Sobolev inequality as:

$$D(f\,d\gamma_n \,\|\, d\gamma_n) \leq \frac12\, I(f)\,. \ \ \ \ \ (5)$$

This expresses the intuitive fact that when the relative entropy is large, its rate of decay toward equilibrium is faster: applying (5) to $U_t f$ gives $\frac{d}{dt}\,D(U_t f\,d\gamma_n \,\|\, d\gamma_n) \leq -2\,D(U_t f\,d\gamma_n \,\|\, d\gamma_n)$, hence $D(U_t f\,d\gamma_n \,\|\, d\gamma_n) \leq e^{-2t}\,\mathrm{Ent}_{\gamma_n}(f)$.
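For intuition, here is an added example (not from the original post): the linear tilt $f(x) = e^{\mu x - \mu^2/2}$ on $\mathbb{R}$ saturates (5).

```latex
\begin{align*}
\mathrm{Ent}_{\gamma_1}(f) &= \int f \log f\,d\gamma_1
  = \mathbb{E}_{X \sim \mathcal{N}(\mu,1)}\Big[\mu X - \frac{\mu^2}{2}\Big]
  = \frac{\mu^2}{2}\,,\\
I(f) &= \int \frac{(f')^2}{f}\,d\gamma_1
  = \mu^2 \int f\,d\gamma_1 = \mu^2\,,
\end{align*}
% so Ent_{\gamma_1}(f) = I(f)/2: translates of the Gaussian measure are
% equality cases of the log-Sobolev inequality.
```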
Martingale property of the optimal drift. Now for the proof of (5). Let $W_t = B_t + \int_0^t v_s\,ds$ be the entropy-optimal process with $W_1$ having law $f\,d\gamma_n$. We need one more fact about $(v_t)$: The optimal drift is a martingale, i.e. $\mathop{\mathbb{E}}[v_t \mid \mathcal{F}_s] = v_s$ for $s \leq t$, where $(\mathcal{F}_t)$ denotes the natural filtration of the process.
Let’s give two arguments to support this.
Argument one: Brownian bridges. First, note that by the chain rule for relative entropy, we have:

$$D(W_{[0,1]} \,\|\, B_{[0,1]}) = D(W_1 \,\|\, B_1) + \int D(W_{[0,1]} \,\|\, B_{[0,1]} \mid W_1 = B_1 = x)\, f(x)\, d\gamma_n(x)\,.$$

But from optimality, we know that the latter expectation is zero (Theorem 1 gives $D(W_{[0,1]} \,\|\, B_{[0,1]}) = D(W_1 \,\|\, B_1)$ for the optimal drift). Therefore, $f\,d\gamma_n$-almost surely, we have

$$D(W_{[0,1]} \,\|\, B_{[0,1]} \mid W_1 = B_1 = x) = 0\,.$$
This implies that if we condition on the endpoint $W_1 = x$, then $W_{[0,1]}$ is a Brownian bridge (i.e., a Brownian motion conditioned to start at $0$ and end at $x$).
This implies that $(v_t)$ is a martingale, as one can check that a Brownian bridge $(\hat B_t)$ with endpoint $x$ is described by the drift process $\hat v_t = \frac{x - \hat B_t}{1-t}$, and

$$\mathop{\mathbb{E}}\left[\frac{x - \hat B_t}{1-t} \,\Big|\, \hat B_{[0,s]}\right] = \frac{x - \hat B_s}{1-s}\,.$$
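This conditional-expectation identity can be checked directly from the standard formula for the conditional mean of a Brownian bridge; here is a short derivation that I am adding:

```latex
% For a Brownian bridge from 0 at time 0 to x at time 1, and s <= t < 1:
\begin{align*}
\mathbb{E}\big[\hat B_t \,\big|\, \hat B_{[0,s]}\big]
  &= \hat B_s + \frac{t-s}{1-s}\,\big(x - \hat B_s\big)\,,\\
\mathbb{E}\left[\frac{x - \hat B_t}{1-t} \,\Big|\, \hat B_{[0,s]}\right]
  &= \frac{1}{1-t}\left(x - \hat B_s - \frac{t-s}{1-s}\,(x - \hat B_s)\right)
   = \frac{x - \hat B_s}{1-s}\,.
\end{align*}
```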
That seemed complicated. There is a simpler way to see this: Given $W_1 = x$ and any bridge $\hat W_{[0,1]}$ from $0$ to $x$, every “permutation” of the infinitesimal steps in $\hat W_{[0,1]}$ has the same law (by commutativity, they all land at $x$). Thus the marginal law of the increment $dW_t$ at every time $t \in [0,1]$ should be the same. In particular,

$$\mathop{\mathbb{E}}[v_t\,dt \mid v_s] = \mathop{\mathbb{E}}[dB_t + v_t\,dt \mid v_s] = \mathop{\mathbb{E}}[dB_s + v_s\,ds \mid v_s] = v_s\,ds\,.$$
Argument two: Change of measure. There is a more succinct (though perhaps more opaque) way to see that $(v_t)$ is a martingale. Note that the process $u_t = \mathop{\mathbb{E}}[\nabla \log f(W_1) \mid \mathcal{F}_t]$ is a Doob martingale. But we have $v_t = \nabla \log P_{1-t} f(W_t)$, and we also know from (3) that $f(W_1) = 1/M_1$ is precisely the change of measure that makes $W_{[0,1]}$ into Brownian motion. Rewriting the conditional expectation under this change of measure gives $u_t = \frac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)} = v_t$.
Proof of the log-Sobolev inequality. In any case, now we are ready for the proof of (5). It also comes straight from Lehec’s paper. Since $(v_t)$ is a martingale, we have $v_t = \mathop{\mathbb{E}}[v_1 \mid \mathcal{F}_t]$, and thus by Jensen’s inequality $\mathop{\mathbb{E}}\,\|v_t\|^2 \leq \mathop{\mathbb{E}}\,\|v_1\|^2$ for $t \leq 1$. So by Theorem 1:

$$D(f\,d\gamma_n \,\|\, d\gamma_n) = \frac12 \int_0^1 \mathop{\mathbb{E}}\,\|v_t\|^2\,dt \leq \frac12\, \mathop{\mathbb{E}}\,\|v_1\|^2\,.$$

The latter quantity is

$$\frac12\, \mathop{\mathbb{E}}\,\|\nabla \log f(W_1)\|^2 = \frac12 \int \|\nabla \log f\|^2\, f\,d\gamma_n = \frac12 \int \frac{\|\nabla f\|^2}{f}\,d\gamma_n = \frac12\, I(f)\,,$$

since $v_1 = \nabla \log P_0 f(W_1) = \nabla \log f(W_1)$. In the first equality, we used the fact that $1/M_1 = f(W_1)$ is precisely the change of measure that turns $W_{[0,1]}$ into Brownian motion, so that $W_1$ has law $f\,d\gamma_n$. This establishes (5). $\Box$
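To close, here is a numerical sanity check of (5) that I am adding (the test function $f(x) = 1 + \frac12 \sin x$ is an arbitrary choice; it is automatically normalized since $\int \sin x\,d\gamma_1 = 0$):

```python
import numpy as np

# Quadrature of  g(x) d(gamma_1)  on a uniform grid (rectangle rule).
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard Gaussian density

def gauss_int(g):
    return float(np.sum(g * phi) * dx)

f = 1.0 + 0.5 * np.sin(x)    # f > 0, and int f dgamma_1 = 1 exactly
df = 0.5 * np.cos(x)         # f'

mass = gauss_int(f)                  # ~ 1.0 (normalization check)
ent = gauss_int(f * np.log(f))       # Ent_{gamma_1}(f)
fisher = gauss_int(df**2 / f)        # I(f)

# Log-Sobolev inequality (5): Ent <= I(f)/2 (strictly, for this f).
print(mass, ent, fisher / 2)
```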