Kernels of random sign matrices

In the last post, we began proving that a random N/2 \times N sign matrix A almost surely satisfies \Delta(\mathrm{ker}(A)) = O(1). In other words, almost surely for every x \in \mathrm{ker}(A), we have

\displaystyle \frac{\sqrt{N}}{O(1)} \|x\|_2 \leq \|x\|_1 \leq \sqrt{N} \|x\|_2.

The proof follows roughly the lines of the proof that a uniformly random N/2 \times N matrix B with \{0,1\} entries is almost surely the parity-check matrix of a good linear code over \mathbb F_2. In other words, with high probability, there does not exist a non-zero vector x \in \mathbb F_2^N with Hamming weight o(N) which satisfies Bx=0 (where the equation is over \mathbb F_2). The proof of this is simple: Let B_1, B_2, \ldots, B_{N/2} be the rows of B. If x \neq 0, then \langle B_i, x \rangle is a uniformly random bit, so \Pr[\langle B_i, x \rangle = 0] = \frac12. Therefore, for any non-zero x, we have

\Pr[Bx = 0] \leq 2^{-N/2}. (**)

On the other hand, for \delta > 0 small enough, there are fewer than 2^{N/2} non-zero vectors of Hamming weight at most \delta N, so a union bound finishes the argument.
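Here is a quick numerical sanity check of (**) (my own sketch, not from the original argument; the parameters N = 12 and the particular x are arbitrary):

```python
import random

random.seed(0)
N = 12                            # tiny illustration; the post has N large
x = [1, 1, 1] + [0] * (N - 3)     # a fixed nonzero vector of Hamming weight 3

trials = 20000
hits = 0
for _ in range(trials):
    # random N/2 x N matrix over F_2
    B = [[random.randint(0, 1) for _ in range(N)] for _ in range(N // 2)]
    # check whether Bx = 0 over F_2
    if all(sum(b * xi for b, xi in zip(row, x)) % 2 == 0 for row in B):
        hits += 1

estimate = hits / trials
print(estimate, 2 ** (-(N // 2)))
```

Since each inner product \langle B_i, x\rangle is a uniform bit, \Pr[Bx = 0] is exactly 2^{-N/2}, and the empirical frequency should hover near it.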

The proof that \Delta(\mathrm{ker}(A)) = O(1) proceeds along similar lines, except that we can no longer take a naive union bound since there are now infinitely many “bad” vectors. Of course the solution is to take a sufficiently fine discretization of the set of bad vectors. This proceeds in three steps:

  1. If \|Ax\|_2 is large, and x is close to x', then \|Ax'\|_2 is also large.
  2. Any fixed vector x \in \mathbb R^N with \|x\|_2 = 1 has \|Ax\|_2 large with high probability. This is the analog of (**) in the coding setting.
  3. There exists a small set of vectors that well-approximates the set of bad vectors.

Combining (1), (2), and (3) we will conclude that every bad vector x has \|Ax\|_2 large (and, in particular, x is not contained in the kernel).

To verify (1), we need to show that almost surely the operator norm \|A\| = \max \left\{ \|Ax\|_2 : \|x\|_2 = 1\right\} is small.

From the last post, we have the following two statements.

Lemma 1: If v \in \mathbb R^k and X \in \{-1,1\}^k is chosen uniformly at random, then

\displaystyle \Pr\left[|\langle v, X \rangle| \geq t\right] \leq 2 \exp\left(\frac{-t^2}{2 \|v\|_2^2}\right)
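One can check this subgaussian tail bound empirically; the following sketch (my own, with arbitrary choices k = 50, t = 15, and v the all-ones vector) compares the observed tail frequency to the bound of Lemma 1:

```python
import math
import random

random.seed(1)
k = 50           # dimension; v = (1,...,1), so ||v||_2^2 = k
t = 15.0         # tail threshold
trials = 20000

exceed = 0
for _ in range(trials):
    # <v, X> for v all-ones is just a sum of k random signs
    s = sum(random.choice((-1, 1)) for _ in range(k))
    if abs(s) >= t:
        exceed += 1

empirical = exceed / trials
bound = 2 * math.exp(-t * t / (2 * k))   # the bound from Lemma 1
print(empirical, bound)
```

The empirical tail probability should land well below the bound, which is loose by design.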

The next theorem (see the proof from the last post) verifies (1) above.

Theorem 1: If A is a random N/2 \times N sign matrix, then

\Pr[\|A\| \geq 10 \sqrt{N}] \leq 2 e^{-6N}.
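The constant 10 here is quite loose; for a random N/2 \times N sign matrix, the operator norm is in fact roughly (1 + 1/\sqrt 2)\sqrt N. A quick check with numpy (my own sketch; N = 200 and 20 samples are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
norms = []
for _ in range(20):
    A = rng.choice([-1.0, 1.0], size=(N // 2, N))  # random N/2 x N sign matrix
    norms.append(np.linalg.norm(A, 2))             # operator (spectral) norm

print(max(norms), 10 * np.sqrt(N))
```

Every sample should fall far below the 10\sqrt N threshold of Theorem 1 (and above \sqrt N, since the norm of any single row is already \sqrt N).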

Now we turn to proving (2), which is the analog of (**). We need to upper bound the small ball probability, i.e. the probability that Av falls into a small ball around 0, for any fixed v \in \mathbb R^N. To start, we would like to say that for any fixed v\in \mathbb R^N with \|v\|_2 = 1, and a uniformly random sign vector X \in \{-1,1\}^N, we have, for example,

\displaystyle \Pr\left[\left|\sum_{i=1}^N  X_i v_i\right| > 0.01\right] \geq 0.01.

These kinds of estimates fall under the name of Littlewood-Offord type problems; see the work of Tao and Vu and of Rudelson and Vershynin for a detailed study of such random sums, and the beautiful connections with additive combinatorics. For the above estimate, we will need only the simple Paley-Zygmund inequality:

Lemma 2: If Z is a non-negative random variable and 0 < t < 1, then

\displaystyle \Pr[Z \geq t (\mathbb EZ)] \geq (1-t)^2 \frac{(\mathbb EZ)^2}{\mathbb E(Z^2)}

Proof: By Cauchy-Schwarz,

\mathbb EZ = \mathbb E[Z\mathbf{1}_{\{Z < t (\mathbb EZ)\}}] + \mathbb E[Z\mathbf{1}_{\{Z \geq t(\mathbb EZ)\}}] \leq t \mathbb E[Z] + \sqrt{\mathbb E[Z^2]} \sqrt{\mathbb E[\mathbf{1}_{\{Z \geq t (\mathbb EZ)\}}]}

Now observe that \mathbb E[\mathbf{1}_{\{Z \geq t (\mathbb EZ)\}}] = \Pr[Z \geq t(\mathbb EZ)] and rearrange.
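To see the inequality in action, here is a small numerical sketch (my own illustration; the choice Z \sim \mathrm{Exp}(1) and t = 1/2 is arbitrary):

```python
import random

random.seed(2)
t = 0.5
samples = [random.expovariate(1.0) for _ in range(100000)]  # Z ~ Exp(1)

mean = sum(samples) / len(samples)                   # estimates E[Z] = 1
second = sum(s * s for s in samples) / len(samples)  # estimates E[Z^2] = 2
frac = sum(1 for s in samples if s >= t * mean) / len(samples)

lower = (1 - t) ** 2 * mean ** 2 / second            # Paley-Zygmund lower bound
print(frac, lower)
```

For this Z the true probability is e^{-1/2} \approx 0.61, comfortably above the Paley-Zygmund guarantee of about 0.125.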

We can use this to prove our estimate on random sums.

Lemma 3: For any v \in \mathbb R^N, if X \in \{-1,1\}^N is chosen uniformly at random, then for every 0 < t < 1,

\displaystyle \Pr\left[|\langle X,v\rangle|^2 \geq t \|v\|_2^2 \right] \geq \frac{(1-t)^2}{16}.

Proof: Apply Lemma 2 with Z = |\langle X,v\rangle|^2. Note that by independence of the coordinates, we have \mathbb E[Z] = \|v\|_2^2. Furthermore, for any non-negative random variable Y and 0 < p < \infty, we have \mathbb E[Y^p] = p \int_{0}^{\infty} \lambda^{p-1} \Pr(Y > \lambda)\,d\lambda, thus

\displaystyle \mathbb E[Z^2] = \mathbb E[|\langle X,v\rangle|^4] = 4 \int_{0}^{\infty} \lambda^3 \Pr(|\langle X,v\rangle| > \lambda)\,d\lambda \leq 8\int_0^{\infty} \lambda^3 e^{-\lambda^2/(2\|v\|_2^2)}\,d\lambda,

where the final inequality follows from Lemma 1. Substituting u = \lambda^2/(2\|v\|_2^2) shows that the final bound equals 16 \|v\|_2^4 = 16 (\mathbb EZ)^2.
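Numerically, the small-ball probability tends to be far larger than this worst-case bound; here is a quick sketch (my own, with an arbitrary test vector v and t = 1/2):

```python
import random

random.seed(3)
N = 30
t = 0.5
v = [1.0 / (i + 1) for i in range(N)]   # an arbitrary test vector
v2 = sum(x * x for x in v)              # ||v||_2^2

trials = 50000
hits = 0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) * vi for vi in v)   # <X, v>
    if s * s >= t * v2:                                # the event in Lemma 3
        hits += 1

frac = hits / trials
print(frac)   # compare with the constant lower bound of Lemma 3
```

For a vector like this, roughly half the sign patterns already satisfy the event, versus a guaranteed constant of a few percent.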

Observing that \|Av\|_2^2 = \sum_{i=1}^{N/2} |\langle A_i, v\rangle|^2, where A_1, A_2, \ldots, A_{N/2} are the rows of A, yields:

Corollary 1: For any v \in \mathbb R^N,

\displaystyle \Pr\left[\vphantom{\bigoplus} \|A v\|_2 \leq \frac{\sqrt{N}}{20} \|v\|_2\right] \leq e^{-N/100}.

Proof: Apply Lemma 3 with t = 1/16, say, and then use a Chernoff bound to show that the conclusion of the Lemma holds for a constant fraction of the trials A_1, \ldots, A_{N/2} with overwhelming probability.
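Corollary 1 is also loose in practice: for a unit vector v, \|Av\|_2 concentrates near \sqrt{N/2}, well above \sqrt N/20. A quick check (my own sketch; the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200
v = rng.standard_normal(N)
v /= np.linalg.norm(v)          # a fixed unit vector

# smallest ||Av||_2 over 50 independent random sign matrices
worst = min(np.linalg.norm(rng.choice([-1.0, 1.0], size=(N // 2, N)) @ v)
            for _ in range(50))
print(worst, np.sqrt(N) / 20)
```

Even the worst of many trials should sit an order of magnitude above the \sqrt N/20 threshold.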

Using Theorem 1 and Corollary 1, we can verify (2):

Theorem 2: For any v \in \mathbb R^N, we have

\displaystyle \Pr\left[\mathrm{dist}(v, \mathrm{ker}(A)) \leq \frac{\|v\|_2}{200}\right] \leq 3 e^{-N/100}.

Proof: By Theorem 1 and Corollary 1, we may assume that \|A\| \leq 10 \sqrt{N} and \|Av\|_2 \geq \frac{\sqrt{N}}{20} \|v\|_2. Let x \in \mathrm{ker}(A) be the closest point to v. Then Ax = 0, so

\|A(v-x)\|_2 = \|Av\|_2 \geq \frac{\sqrt{N}}{20} \|v\|_2.

But since \|A\| \leq 10\sqrt{N}, we must have \|v-x\|_2 \geq \|v\|_2/200.
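Since \mathrm{ker}(A)^{\perp} is the row space of A, the distance from v to the kernel is just the norm of the projection of v onto the row space. This makes Theorem 2 easy to probe numerically (my own sketch; parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100
v = rng.standard_normal(N)
v /= np.linalg.norm(v)          # a fixed unit vector

dists = []
for _ in range(30):
    A = rng.choice([-1.0, 1.0], size=(N // 2, N))
    # dist(v, ker(A)) = norm of the projection of v onto the row space of A;
    # A A^T is invertible almost surely, so we can solve directly
    y = np.linalg.solve(A @ A.T, A @ v)
    dists.append(np.linalg.norm(A.T @ y))

print(min(dists), 1 / 200)
```

For a random (N/2)-dimensional row space, the projection of a fixed unit vector typically has norm near 1/\sqrt 2, far above the 1/200 threshold of Theorem 2.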

Let B_{\ell_1} and B_{\ell_2} be the unit balls of the \ell_1 and \ell_2 norms, respectively, in \mathbb R^N. Property (3) above follows from the well-known fact that \delta \sqrt{N} B_{\ell_1} can be covered by at most (1600)^{\delta N} copies of \frac{1}{400} B_{\ell_2} (see, e.g. Pisier’s book). We use this to finish the proof.

Theorem 3: There exists a constant C \geq 1 such that almost surely the following holds. Every x \in \mathrm{ker}(A) satisfies

\displaystyle \frac{\sqrt{N}}{C} \|x\|_2 \leq \|x\|_1 \leq \sqrt{N} \|x\|_2.

In other words, \Delta(\mathrm{ker}(A)) = O(1).

Proof: Choose \delta small enough so that k = (1600)^{\delta N} < \frac13 e^{N/100}, and let v_1, v_2, \ldots, v_k \in \mathbb R^N be such that

\displaystyle \bigcup_{i=1}^k B_{\ell_2}(v_i, \tfrac{1}{400}) \supseteq \delta \sqrt{N} B_{\ell_1}.

By a union bound, we can assume that \|A\| \leq 10 \sqrt{N}, and for every i = 1, 2, \ldots, k,

\mathrm{dist}(v_i, \mathrm{ker}(A)) \geq \frac{1}{200} \|v_i\|_2\quad(4)

Now suppose there exists an x \in \mathrm{ker}(A) with \frac{\sqrt{N} \|x\|_2}{\|x\|_1} \geq C. By rescaling x, we may assume that \|x\|_1 = \delta \sqrt{N}, and hence \|x\|_2 \geq \delta C. But then x \in \delta \sqrt{N} B_{\ell_1}, hence there exists a v_i such that \|v_i-x\|_2 \leq \frac{1}{400}. It follows that

\|v_i\|_2 \geq \|x\|_2 - \|v_i-x\|_2 \geq \delta C - \frac{1}{400}.

Choosing C = 1/\delta, we have \|v_i\|_2 \geq 1 - \frac{1}{400} > \frac12. But then (4) implies \mathrm{dist}(v_i, \mathrm{ker}(A)) \geq \frac{\|v_i\|_2}{200} > \frac{1}{400}, contradicting \|v_i-x\|_2 \leq \frac{1}{400} (since x \in \mathrm{ker}(A), the distance from v_i to the kernel is at most \|v_i - x\|_2).
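Theorem 3 can be probed numerically, with the caveat (see the comments below) that no one knows how to compute \Delta(\mathrm{ker}(A)) efficiently; sampling random kernel vectors only gives a lower bound on it. A small sketch of this heuristic (my own, with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100
A = rng.choice([-1.0, 1.0], size=(N // 2, N))

# Orthonormal basis for ker(A): the last N/2 right singular vectors
# (A has full row rank almost surely).
_, _, Vt = np.linalg.svd(A)
K = Vt[N // 2:].T               # columns span ker(A)

ratios = []
for _ in range(2000):
    x = K @ rng.standard_normal(N // 2)    # a random vector in ker(A)
    ratios.append(np.sqrt(N) * np.linalg.norm(x) / np.linalg.norm(x, 1))

print(max(ratios))   # a heuristic lower bound on Delta(ker(A))
```

By Cauchy-Schwarz every ratio lies in [1, \sqrt N]; for random directions in the kernel it stays bounded by a small constant, consistent with \Delta(\mathrm{ker}(A)) = O(1).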

In the next post, we’ll see how such subspaces are related to error-correcting codes over the reals.

4 thoughts on “Kernels of random sign matrices”

  1. just a minor comment — in Lemma 2, all you need is that \mathbb EZ \geq 0, not that Z is non-negative necessarily; you can then apply the Chebyshev–Cantelli inequality (which will also lead to a slightly smaller denominator, but maybe Paley–Zygmund also leads to it).

  2. Hi James,

    Just came across this collection – it’s really nice. I hope you do keep up posting these. This is the kind of blog that’s actually worth reading.

    I’m not sure I follow
    dist(x’, ker(A)) >= dist(x,ker(A)) – |A||x-x’|
    but it doesn’t matter much.

    It seems that the equation you need (and use) is that
    |x-x’| >= |A(x-x’)|/|A|

    BTW I saw in your (great) PowerPoint slides that you mentioned something that Milman conjectures is impossible. I hope you expand on that, and whether he conjectures an information-theoretic lower bound, or that it’s hard to come up with an explicit construction.


  3. Boaz, thanks; you are right that the line

    dist(x’, ker(A)) >= dist(x,ker(A)) – |A||x-x’|

    doesn’t make any sense, and also that the proof doesn’t use it. I have changed the description of steps (1)-(3) to reflect what actually goes on in the proof.

    About Milman’s “conjecture” — it was made informally at a talk he gave last year; I believe he intended to guess that there is no completely explicit construction, but it’s not clear what that means. E.g. he would not be happy with an arbitrary deterministic polynomial time algorithm. Note that no one knows how to compute \Delta(X) efficiently. In fact, no one knows whether deciding \Delta(X) \geq C is in NP, or even how to give witnesses for \Delta(X) \approx \sqrt{N}. Or even how to generate random X‘s that have a witness with high probability.
