Additive Combinatorics By Terence Tao and Van Vu读书报告 (chapter 1)

四月 4, 2008 at 5:46 上午 | In Maths(数学) | Leave a Comment

Hoeffding‘s inequality: (exercise 1.3.5) Let X_1, \cdots, X_n be jointly independent random variables, taking finitely many values, with a_i \leq X_i \leq b_i for all i and some real numbers a_i< b_i. Let X:=X_1+\cdots+X_n. Using the exponential moment method, show that

\mathbf P (|X-\mathbf E(X)| \geq \lambda (\sum_{i=1}^n|b_i-a_i|^2)^{1/2} )\leq 2 e^{-2\lambda^2}

Proof.By subtracting some constant from each X_i we can assume that E (X_1)=0 for each i. Observe that \mathbf P(|X| \geq \varepsilon)= \mathbf P(|X| \geq \varepsilon)+ \mathbf P (X \leq -\varepsilon). So it suffice to show that \mathbf P(X \geq \varepsilon) \leq e^{-2\varepsilon ^2/ (\sum_{i=1}^n(b_i-a_i)^2)}. Since \mathbf P(X\geq \varepsilon)=\mathbf P(e^{tX} \geq e^{t\varepsilon})\leq e^{-t\varepsilon}\mathbf E(e^{tX}) and the fact that \mathbf E (e^{tX})= \mathbf E(e^{tX_1})\cdots \mathbf E(e^{tX_n}) from the joint independence of X_is. We then try to compute \mathbf E(tX_i). \mathbf E(e^{tX_i})= P_1e^{tX_{i,1}}+\cdots P_{i_k}e^{tX_{i,i_k}}, where X_{i,1},\cdots,X_{i,i_k} are the finite values that X_i could take.Let f(t):=\log (P_1e^{tX_{i,1}}+\cdots P_{i_k}e^{tX_{i,i_k}}), then one easily sees that f(0)=f'(0)=0. We have

f''(t)=\frac{(P_1X_{i,1}^2e^{tX_{i,1}}+\cdots P_{i_k}X_{i,i_k}^2e^{tX_{i,i_k}}) (P_1e^{tX_{i,1}}+\cdots P_{i_k}e^{tX_{i,i_k}})-(P_1X_{i,1}e^{tX_{i,1}}+\cdots P_{i_k}X_{i,i_k}e^{tX_{i,i_k}})^2}{(P_1e^{tX_{i,1}}+\cdots P_{i_k}e^{tX_{i,i_k}})} .

and thus

f''(t) \leq \frac{(b_i-a_i)^2}{4}

The last inequality is from the fact that P_1X_{i,1}^2e^{tX_{i,1}}+\cdots P_{i_k}X_{i,i_k}^2e^{tX_{i,i_k}}\leq (b-a)(P_1X_{i,1}e^{tX_{i,1}}+\cdots P_{i_k}X_{i,i_k}e^{tX_{i,i_k}}) and some basic computations. Thus we get e^{f(t)} = e^{f(0)+f'(0)t+\frac{f''(\xi)}{2}t^2} =e^{\frac{f''(\xi)}{2}t^2}\leq e^{\frac{(b_i-a_i)^2}{8}t^2}, so \mathbf E (e^{tX}) \leq e^{\frac{t^2\sum_{i=1}^n(b_i-a_i)^2}{8}} for all t>0. By Choosing t=\frac{4\varepsilon}{\sum_{i=1}^n(b_i-a_i)^2}, we finally have \mathbf P(X\geq \varepsilon) \leq e^{-t\varepsilon+\frac{t^2\sum_{i=1}^n(b_i-a_i)^2}{8}}=e^{-2\varepsilon ^2/ (\sum_{i=1}^n(b_i-a_i)^2)}, completing the proof. In fact “taking only finite many values” could be omitted, the proof is somehow a little more complicated. \square

我对不等式的重视程度现在有了变化。原来觉得这东西无聊,其实是没有意识到其重要性。华罗庚曾经说过:“轻视往往到最后就变成恐惧,比如有的人轻视计算, 最后就变成了害怕计算”。显然,不等式计算集中而繁复,但是同时,不等式往往可以带来十分重要的而深刻的结果。本书第一章中,不等式就是估计,是对概率的 界定,是整个概率方法的中心,Erd\ddot{o}s将这种方法引入离散数学,得到的很多结果都没办法在直观上体会的到,很多经典的结论至今也没有其他的“非概率”方法。本书显然是为算术组合介绍了足够的这方面的知识,而较全面的书还是Alon 和 Spencer 的“The probabilistic method” .

Azuma’s inequality: (exercise 1.3.6) Let X_1,\cdots,X_n be random variables taking finitely many values with |X_i|\leq 1 for all i. We do not assume that the X_i are jointly independent, however we do require that the X_i form a martingale difference sequence, by which we mean that \mathbf E(X_i|X_1=x_1,\cdots,X_{i_1}=x_{i_1})=0 for all 1\leq i\leq n and all x_1,\cdots,x_{i-1}. Using the exponential moment method, establish the large deviation inequality \mathbf P (|X_1+\cdots+X_n|\geq \lambda \sqrt{n})\leq 2e^{-\lambda^2/2}

Proof. It suffice to prove that \mathbf P ( X_1+\cdots+X_n \geq \lambda \sqrt{n})\leq e^{-\lambda^2/2}. Firstly, we can write e^{tX_i}\leq \frac{X_i-(-1)}{2}e^t+ \frac{1-X_i}{2}e^{-t}, so \mathbf E (e^{tX_i})=\frac{e^t+e^{-t}}{2}\leq e^{\frac{t^2}{2}}, then \mathbf E(e^{tX_i}|X_{i-1},\cdots,X_1)\leq e^{\frac{t^2}{2}}. Therefore,

\mathbf P(X_1+\cdots+X_n\geq \lambda \sqrt{n}) = \mathbf P(e^{tX_1+\cdots+X_n}\geq e^{t \lambda \sqrt{n}} )

\leq \mathbf E(e^{t(X_n+\cdots+X_1)})e^{-t\lambda \sqrt{n}}=\mathbf E (e^{t(X_1+\cdots+X_{n-1})}\mathbf E(e^{t X_n}|X_{n-1},\cdots,X_1))e^{ t\lambda\sqrt{n}}

\leq \mathbf E(e^{t(X_{n-1}+\cdots+X_1)})e^{\frac{t^2}{2}}e^{ t\lambda\sqrt{n}}\leq e^{t^2n/2-t\lambda \sqrt{n}}

Take t=\frac{\lambda}{\sqrt{n}} and get our conclusion. \square

Borel-Cantelli 引理十分之重要,大概算作本章最重要的一个定理了。在用概率方法证明一个性质对足够大的n的都成立的时候使用。转述如下:

Lemma 1.2 Let E_1,E_2,\cdots be a sequence of events(possibly infinite or dependent), such that \sum_n \mathbf P(E_n) < \infty. Then with probability 1 at most finitely many of the events E_1,E_2,\cdots hold.

下面的定理是本章中一个典型的的例子,体现了概率方法,尤其大偏差估计在算术组合上的应用。注意Borel-Cantelli 引理的两次使用,注意一些估计的技巧,这些技巧实际上贯穿了整章。

Lemma(related with Theorem 1.39): For any \varepsilon >0, there exists a set A \in \Bbb Z ^+ with |A\cap [0,n]|=\Omega_h (n^{1/h-\varepsilon}) for all large n, which makes r_{h,A} (n) is uniformly bounded in n by some g=g_{h,\varepsilon}.

Proof. We construct A randomly, letting the events n\in A be independent with probability \mathbf P(n\in A)=n^{1/h-a-\varepsilon}. Let X_i be the indicator random variable of the event that i is contained in A, thus X=\sum_{k=1}^n X_k counts the number of k’s which are contained in A. We compute

\mathbf E (X) =\sum_{k=1}^n \mathbf P (k\in A)=\sum_{k=1}^n k^{1/h-1-\varepsilon}\leq C \int_1^n x^{1/h-1-\varepsilon} \leq C n^{1/h-\varepsilon}

We apply Chenoffls inequality, showing that \mathbf P (|A\cap [1,n]|\geq 100 C n^{1/h - \varepsilon }) \geq n^{-2} (actually, we can do much better than this, but it suffice.) We then apply Borel-Cantelli lemma, thus know that  |A\cap [1,n]|\geq 100 C n^{1/h - \varepsilon } =\Omega_h (n^{1/h- \varepsilon }) for all but finitely many n with probability 1, by the convergence of the series \sum_n n^{-2} . Thus it suffice to show that r_{h,A} (n) \leq g for some g with probability one. Let t_n be the indicator variables t_n=\mathbf  I(n\in A). For each m,we observe the random variable

Y_m=\sum_{n_1\leq \cdots \leq n_h:n_1+\cdots+n_h=m}t_{n_1}\cdots t_{n_h}

is a regular polynomial (square-free) of degree h in t_1\cdots t_m since t_i^a =t_i for a=2, 3, \cdots . Then it suffice to show that Y_m\leq g for all but finitely many m. By the Borel-Cantelli lemma (for the second time), it is enough to establish the upper tail estimate

\mathbf P (Y_m>g) \leq m^{-2}

for all large m. From linearity of expectation and independence we have

\mathbf E(Y_m)=\sum_{n_1\leq \cdots\leq n_h:n_1+\cdots+n_h=m}n_1^{1/h-1-\varepsilon}\cdots n_h^{1/h-1-\varepsilon}

\leq O_h(m^{1/h-\varepsilon}\sum_{n_1\leq \cdots\leq n_{h-1}\leq m} n_1^{1/h-1-\varepsilon}\cdots n_{h-1}^{1/h-1-\varepsilon})

\leq m^{1/h-1-\varepsilon} O_h((\sum_{n\leq m}n^{q/h-1-\varepsilon})^{h-1})

\leq O_h(m^{-h\varepsilon}).

最后,我们叙述并证明Erd\ddot{o}s 和 Rado 的“向日葵”定理,这是一个纯粹的组合结论,很简单,而且有趣的很。

Lemma 1.41 (Sunflower lemma) If \mathscr A is a collection of sets, each of size at most k, and |\mathscr A| > (l-1)^k k!, then \mathscr A contains l sets forming a sunflower. A collection of sets forms a sunflower if the pairwise intersections A_i \cap A_j for i\neq j are all the same (the A_i are called the padels of the flower).

Proof. Suppose for contradiction that \mathscr A does not contain l sets forming a sunflower. Then we can at most take l-1 sets A_1, \cdots, A_{l-1} from \mathscr A which are all disjoint, containing at most k(l-1) elements. Any other sets in \mathscr A must contain some of there k(l-1) elements, by the construction of A_is. We apply the pigeonhole principal to get there are at least |\mathscr A|/k(l-1)|>(k-1)!(l-1)^{k-1} sets containing some common element x_1. Thus among there must exist more than (k-2)!(l-1)^{k-2} sets which contain a second common x_2, and so forth. Finally we get more than one sets have all the k elements the same, which is an obvious contradiction. \square

No Comments Yet »

RSS方式表示的feed TrackBack URI

留下评论

XHTML: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <pre> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Blog at WordPress.com. | Theme: Pool by Borja Fernandez.
Entries and comments feeds.