Notes for tomorrow’s lecture on Approximation by ultrametrics are now up. (Yes, notes from the preceding two lectures still coming.)
(The combination of power and simplicity in this lecture is pretty incredible.)
Notes for tomorrow’s lecture on Approximation by ultrametrics are now up. (Yes, notes from the preceding two lectures still coming.)
(The combination of power and simplicity in this lecture is pretty incredible.)
Notes for the third lecture Metrical task systems on a weighted star are up.
Notes for the second lecture of our course Competitive analysis via convex optimization are up: Navigating a convex body online.
After not generating any new posts for a long time, I am happy to have finally switched platforms. The new site is hosted on gitpages at tcsmath.github.io. While the domain name tcsmath.org should point there, I am trying to work around well-known https certificate issues that cause Chrome to have a nervous breakdown.
The motivation for switching platforms is a course I’ve just started co-teaching with Seb Bubeck on Competitive analysis via convex optimization. The notes for the first lecture are live: Regret minimization and competitive analysis.
1. Construction of Föllmer’s drift
In a previous post, we saw how an entropy-optimal drift process could be used to prove the Brascamp-Lieb inequalities. Our main tool was a result of Föllmer that we now recall and justify. Afterward, we will use it to prove the Gaussian log-Sobolev inequality.
Consider with
, where
is the standard Gaussian measure on
. Let
denote an
-dimensional Brownian motion with
. We consider all processes of the form
where is a progressively measurable drift and such that
has law
.
Theorem 1 (Föllmer) It holds that
where the minima are over all processes of the form (1).
Proof: In the preceding post (Lemma 2), we have already seen that for any drift of the form (1), it holds that
thus we need only exhibit a drift achieving equality.
We define
where is the Brownian semigroup defined by
As we saw in the previous post (Lemma 2), the chain rule yields
We are left to show that has law
and
.
We will prove the first fact using Girsanov’s theorem to argue about the change of measure between and
. As in the previous post, we will argue somewhat informally using the heuristic that the law of
is a Gaussian random variable in
with covariance
. Itô’s formula states that this heuristic is justified (see our use of the formula below).
The following lemma says that, given any sample path of our process up to time
, the probability that Brownian motion (without drift) would have “done the same thing” is
.
Remark 1 I chose to present various steps in the next proof at varying levels of formality. The arguments have the same structure as corresponding formal proofs, but I thought (perhaps naïvely) that this would be instructive.
Lemma 2 Let
denote the law of
. If we define
then under the measure
given by
the process
has the same law as
.
Proof: We argue by analogy with the discrete proof. First, let us define the infinitesimal “transition kernel” of Brownian motion using our heuristic that has covariance
:
We can also compute the (time-inhomogeneous) transition kernel of
:
Here we are using that and
is deterministic conditioned on the past, thus the law of
is a normal with mean
and covariance
.
To avoid confusion of derivatives, let’s use for the density of
and
for the density of Brownian motion (recall that these are densities on paths). Now let us relate the density
to the density
. We use here the notations
to denote a (non-random) sample path of
:
where the last line uses .
Now by “heuristic” induction, we can assume , yielding
In the last line, we used the fact that is the infinitesimal transition kernel for Brownian motion.
From Lemma 2, it will follow that has the law
where
is the law of
. In particular,
has the law
which was our first goal.
Given our preceding less formal arguments, let us use a proper stochastic calculus argument to establish (3). To do that we need a way to calculate
Notice that this involves both time and space derivatives.
Itô’s lemma. Suppose we have a continuously differentiable function that we write as
where
is a space variable and
is a time variable. We can expand
via its Taylor series:
Normally we could eliminate the terms , etc. since they are lower order as
. But recall that for Brownian motion we have the heuristic
. Thus we cannot eliminate the second-order space derivative if we plan to plug in
(or
, a process driven by Brownian motion). Itô’s lemma says that this consideration alone gives us the correct result:
This generalizes in a straightforward way to the higher dimensional setting .
With Itô’s lemma in hand, let us continue to calculate the derivative
For the time derivative (the first term), we have employed the heat equation
where is the Laplacian on
.
Note that the heat equation was already contained in our “infinitesimal density” in the proof of Lemma 2, or in the representation
, and Itô’s lemma was also contained in our heuristic that
has covariance
.
Using Itô’s formula again yields
giving our desired conclusion (3).
Our final task is to establish optimality: . We apply the formula (3):
where we used . Combined with (2), this completes the proof of the theorem.
2. The Gaussian log-Sobolev inequality
Consider again a measurable with
. Let us define
. Then the classical log-Sobolev inequality in Gaussian space asserts that
First, we discuss the correct way to interpret this. Define the Ornstein-Uhlenbeck semi-group by its action
This is the natural stationary diffusion process on Gaussian space. For every measurable , we have
or equivalently
The log-Sobolev inequality yields quantitative convergence in the relative entropy distance as follows: Define the Fisher information
One can check that
thus the Fisher information describes the instantaneous decay of the relative entropy of under diffusion.
So we can rewrite the log-Sobolev inequality as:
This expresses the intuitive fact that when the relative entropy is large, its rate of decay toward equilibrium is faster.
Martingale property of the optimal drift. Now for the proof of (5). Let be the entropy-optimal process with
. We need one more fact about
: The optimal drift is a martingale, i.e.
for
.
Let’s give two arguments to support this.
Argument one: Brownian bridges. First, note that by the chain rule for relative entropy, we have:
But from optimality, we know that the latter expectation is zero. Therefore -almost surely, we have
This implies that if we condition on the endpoint , then
is a Brownian bridge (i.e., a Brownian motion conditioned to start at
and end at
).
This implies that , as one can check that a Brownian bridge
with endpoint
is described by the drift process
, and
That seemed complicated. There is a simpler way to see this: Given and any bridge
from
to
, every “permutation” of the infinitesimal steps in
has the same law (by commutativity, they all land at
). Thus the marginal law of
at every point
should be the same. In particular,
Argument two: Change of measure. There is a more succinct (though perhaps more opaque) way to see that is a martingale. Note that the process
is a Doob martingale. But we have
and we also know that
is precisely the change of measure that makes
into Brownian motion.
Proof of the log-Sobolev inequality. In any case, now we are ready for the proof of (5). It also comes straight from Lehec’s paper. Since is a martingale, we have
. So by Theorem 1:
The latter quantity is . In the last equality, we used the fact that
is precisely the change of measure that turns
into Brownian motion.
After using it in the last post on non-positively curved surfaces, I thought it might be nice to give a simple proof of the Okamura-Seymour theorem in the dual setting. This argument arose out of conversations in 2007 with Amit Chakrabarti and former UW undergrad Justin Vincent. I was later informed by Yuri Rabinovich that he and Ilan Newman discovered a similar proof.
Note that establishing a node-capacitated version of the Okamura-Seymour theorem was an open question of Chekuri and Kawarabayashi. Resolving it positively is somewhat more difficult.
Theorem 1 Given a weighted planar graph
and a face
of
, there is a non-expansive embedding
such that
is an isometry.
By rational approximation and subdivision of edges, we may assume that is unweighted. The following proof is constructive, and provides an explicit sequence of cuts on
whose characteristic functions form the coordinates of the embedding
. Each such cut is obtained by applying the following lemma. Note that for a subset
of vertices, we use the notation
, for the graph obtained from
by contracting the edges across
Lemma 2 (Face-preserving cut lemma) Let
be a
-connected bipartite planar graph and
be the boundary of a face of
. There exists a cut
such that
Proof: Fix a plane embedding of that makes
the boundary of the outer face. Since
is
-connected,
is a cycle, and since
is bipartite and has no parallel edges,
.
Consider an arbitrary pair of distinct vertices on
. There is a unique path from
to
that runs along
counterclockwise; call this path
. Consider a path
and a vertex
. We say that
lies below
if, in the plane embedding of
,
lies in the closed subset of the plane bounded by
and
.
(Note that the direction of is significant in the definition of “lying below,” i.e., belowness with respect to a path in
is not the same as belowness with respect to the reverse of the same path in
.)
We say that lies strictly below
if
lies below
and
. We use this notion of “lying below” to define a partial order on the paths in
: for
we say that
is lower than
if every vertex in
lies below
.
We now fix the pair and a path
so that the following properties hold:
Note that a suitable pair exists because
. Finally, we define the cut
as follows:
does not lie strictly below
.
For the rest of this section, we fix the pair , the path
and the cut
as defined in the above proof.
Lemma 1 Let
and
be distinct vertices on
and
be such that
. Then
.
Proof: If the lemma holds trivially, so assume
. Also assume without loss of generality that
precedes
in the path
. The conditions on
imply that all vertices in
lie strictly below
. Therefore, the path
is lower than
and distinct from
By property (3), we have , which implies
. Since
is bipartite, the cycle formed by
and
must have even length; therefore,
.
Proof: If there is nothing to prove. If not, we can write
for some where, for all
,
and
. Let
and
denote the endpoints of
with
preceding
in the path
. Further, define
and
.
By Lemma 1, we have for
. Since
is a shortest path, we have
for
. Therefore
The latter quantity is precisely which completes the proof since
.
Proof of Claim 1: Let be arbitrary and distinct. It is clear that
, so it suffices to prove the opposite inequality. We begin by observing that
Let be a path in
that achieves the minimum in the above expression. First, suppose
. Then we must have
. Now,
, which implies
and we are done.
Next, suppose . Then, there exists at least one vertex in
that lies on
. Let
be the first such vertex and
the last (according to the ordering in
) and assume that
precedes
in the path
. Let
. Note that
may be trivial, because we may have
. Now,
, whence
This gives
where the first line follows from Eq. (1) and the definition of and the third line is obtained by applying Lemma 2 to the path
. If at least one of
lies in
, then
and we are done.
Therefore, suppose . Let
. For a path
and vertices
on
, let us use
as shorthand for
. By property (1), we have
and since
is bipartite, this means
. By property (2), we have
and
. Using these facts, we now derive
Using this in (**) above and noting that , we get
. This completes the proof.
Proof of Theorem 1: Assume that is
-connected. We may also assume that
is bipartite. To see why, note that subdividing every edge of
by introducing one new vertex per edge leaves the metric
essentially unchanged except for a scaling factor of
.
We shall now prove the stronger statement that for every face of
there exists a sequence of cuts
of
such that for all
on
, we have
and that for
,
. We prove this by induction on
.
The result is trivial in the degenerate case when is a single edge. For any larger
and any cut
, the graph
is either a single edge or is
-connected. Furthermore, contracting a cut preserves the parities of the lengths of all closed walks; therefore
is also bipartite.
Apply the face-preserving cut lemma (Lemma 1) to obtain a cut . By the above observations, we can apply the induction hypothesis to
to obtain cuts
of
corresponding to the image of
in
. Each cut
induces a cut
of
. Clearly
for any
. Finally, for any
, we have
where the first equality follows from the property of and the second follows from the induction hypothesis. This proves the theorem.
Isometric embedding of non-positively curved surfaces into
As I have mentioned before, one of my favorite questions is whether the shortest-path metric on a planar graph embeds into with
distortion. This is equivalent to such graphs having an
-approximate multi-flow/min-cut theorem. We know that the distortion has to be at least 2. By a simple discretization and compactness argument, this is equivalent to the question of whether every simply-connected surface admits a bi-Lipschitz embedding into
.
In a paper of Tasos Sidiropoulos, it is proved that every simply-connected surface of non-positive curvature admits a bi-Lipschitz embedding into . A followup work of Chalopin, Chepoi, and Naves shows that actually such a surface admits an isometric emedding into
. In this post, we present a simple proof of this result that was observed in conversations with Tasos a few years ago—it follows rather quickly from the most classical theorem in this setting, the Okamura-Seymour theorem.
Suppose that is a geodesic metric space (i.e. the distance between any pair of points
is realized by a geodesic whose length is
). One says that
has non-positive curvature (in the sense of Busemann) if for any pair of geodesics
and
, the map
given by
is convex.
Theorem 1 Suppose that
is homeomorphic to
and is endowed with a geodesic metric
such that
has non-positive curvature. Then
embeds isometrically in
.
We will access the non-positive curvature property through the following fact. We refer to the book Metric spaces of non-positive curvature.
Lemma 2 Every geodesic
in
can be extended to a bi-infinite geodesic line.
Proof of Theorem 1: By a standard compactness argument, it suffices to construct an isometric embedding for a finite subset . Let
denote the convex hull of
. (A set
is convex if for every
we have
for every geodesic
connecting
to
.)
It is an exercise to show that the boundary of is composed of a finite number of geodesics
between points
. For every pair
, let
denote a geodesic line containing
and
which exists by Lemma 2. Consider the collection of sets
, and let
denote the set of intersection points between geodesics in
. Since
is a collection of geodesics, and all geodesics intersect at most once (an easy consequence of non-positive curvature), the set
is finite.
Consider finally the set . The geodesics in
naturally endow
with the structure of a planar graph
, where two vertices
are adjacent if they lie on a subset of some geodesic in
and the portion
between
and
does not contain any other points of
. Note that
is the outer face of
in the natural drawing, where
is the boundary of
(the union of the geodesics in
).
We can put a path metric on this graph by defining the length of an edge as
. Let
denote the induced shortest-path metric on the resulting graph. By construction, we have the following two properties.
Both properties follow from our construction using the lines .
Now let us state the geometric (dual) version of the Okamura-Seymour theorem.
Theorem 3 (Okamura-Seymour dual version) For every planar graph
and face
, there is a
-Lispchitz mapping
such that
is an isometry.
Let us apply this theorem to our graph and face
. Consider
and
. By property (1) above, we have
. Since
, from Theorem 3, we have
. But property (2) above says that
and
lie on a
–
shortest-path
in
. Since
is
-Lipschitz, we conclude that it maps the whole path
isometrically, thus
, showing that
is an isometry, and completing the proof.