In the last post, we began proving that a random sign matrix almost surely satisfies . In other words, almost surely for every , we have
The proof follows roughly the lines of the proof that a uniformly random matrix with entries is almost surely the check matrix of a good linear code over . In other words, with high probability, there does not exist a non-zero vector with hamming weight , and which satisfies (this equation is considered over ). The proof of this is simple: Let be the rows of . If , then Therefore, for any non-zero , we have
(**)
On the other hand, for small enough, there are fewer than non-zero vectors of hamming weight at most , so a union bound finishes the argument.
The proof that proceeds along similar lines, except that we can no longer take a naive union bound since there are now infinitely many “bad” vectors. Of course the solution is to take a sufficiently fine discretization of the set of bad vectors. This proceeds in three steps:
- If is large, and is close to , then is also large.
- Any fixed vector with has large with high probability. This is the analog of (**) in the coding setting.
- There exists a small set of vectors that well-approximates the set of bad vectors.
Combining (1), (2), and (3) we will conclude that every bad vector has large (and, in particular, is not contained in the kernel).
To verify (1), we need to show that almost surely the operator norm is small.
From the last post, we have the following two statements.
Lemma 1: If and is chosen uniformly at random, then
The next theorem (see the proof from the last post) verifies (1) above.
Theorem 1: If is a random sign matrix, then
.
Now we turn to proving (2), which is the analog of (**). We need to upper bound the small ball probability, i.e. the probability that falls into a small ball (around 0) for any fixed . To start, we would like to say that for some fixed with , and a uniformly random sign vector , we have for example,
These kinds of estimates fall under the name of Littlewood-Offord type problems; see the work of Tao and Vu and of Rudelson and Vershynin for a detailed study of such random sums, and the beautiful connections with additive combinatorics. For the above estimate, we will need only the simple Paley-Zygmund inequality:
Lemma 2: If is a non-negative random variable and , then
Proof: By Cauchy-Schwarz,
Now observe that and rearrange.
We can use this to prove our estimate on random sums.
Lemma 3: For any , if is chosen uniformly at random, then for every ,
Proof: Apply Lemma 2 with . Note that by independence, we have . Furthermore, for any non-negative random variable and , we have , thus
where the final inequality follows from Lemma 1. It is easy to see that the final bound is at most .
Observing that , where are the rows of yields:
Corollary 1: For any ,
Proof: Apply Lemma 3 with , say, and then use a Chernoff bound to show that the conclusion of the Lemma holds for a constant fraction of the trials with overwhelming probability.
Using Theorem 1 and Corollary 1, we can verify (2):
Theorem 2: For any , we have
Proof: By Theorem 1 and Corollary 1, we may assume that and . Let be the closest point to . Then , so
.
But since , we must have .
Let and be the unit balls of the and norms, respectively, in . Property (3) above follows from the well-known fact that can be covered by at most copies of (see, e.g. Pisier’s book). We use this to finish the proof.
Theorem 3: There exists a constant such that almost surely the following holds. Every satisfies
In other words, .
Proof: Choose small enough so that , and let be such that
.
By a union bound, we can assume that , and for every ,
Now suppose there exists an with . We may assume that and . But then , hence there exists a such that . It follows that
Choosing , we have . But then (4) implies , contradicting .
In the next post, we’ll see how such subspaces are related to error-correcting codes over the reals.
just a minor comment — in Lemma 2, all you need is that EZ >= 0, not that Z is non-negative necessarily; you can apply the Chebyshev-Cantellin inequality then (which will also lead to a slightly smaller denominator, but maybe Paley-Zygmund also leads to it).
Hi James,
Just came across this collection – it’s really nice. I hope you do keep up posting these. This is the kind of blog that’s actually worth reading.
I’m not sure I follow
dist(x’, ker(A)) >= dist(x,ker(A)) – |A||x-x’|
but it doesn’t matter much.
It seems that the equation you need (and use) is that
|x-x’| >= |A(x-x’)|/|A|
BTW I saw in your (great) powerpoint slides that you mentioned something that Milman conjectures is impossible. I hope you expand on that, and whether he conjectures an information theoretic lower bound, or that it’s hard to come up with an explicit construciton.
Boaz
Boaz, thanks; you are right that the line
dist(x’, ker(A)) >= dist(x,ker(A)) – |A||x-x’|
doesn’t make any sense, and also that the proof doesn’t use it. I have changed the description of steps (1)-(3) to reflect what actually goes on in the proof.
About Milman’s “conjecture” — it was made informally at a talk he gave last year; I believe he intended to guess that there is no completely explicit construction, but it’s not clear what that means. E.g. he would not be happy with an arbitrary deterministic polynomial time algorithm. Note that no one knows how to compute efficiently. In fact, no one knows whether ? is in NP, or even how to give witnesses for . Or even how to generate random ‘s that have a witness with high probability.
James, you forgot a sqrt(N) in the statement of theorem 3.