This article discusses a very important inequality in statistics: Chebyshev's inequality.
Chebyshev's inequality is usually introduced when studying the expected value and variance of random variables. It looks complicated, but it has a very intuitive interpretation and effect, and it is also the fundamental lemma behind the law of large numbers, which will be discussed later. It really is very important.
Markov inequality
Before discussing Chebyshev's inequality, we first need to understand its more basic form, Markov's inequality, which is simpler and has a more intuitive interpretation.
Markov's inequality states the following: consider a random variable \( X \geq 0 \) with expected value \( E(X) \). Then for any \( a > 0 \), the following inequality holds:
$$ P(X \geq a) \leq {E(X) \over a} $$
That is, for any positive number \( a \), the probability that \( X \geq a \) is bounded above by a certain limit.
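As a quick sanity check (a fair six-sided die is just an illustrative example, not from any particular application), let \( X \) be the result of a fair die roll, so \( E(X) = 3.5 \). Taking \( a = 4 \):

$$ P(X \geq 4) = \frac{3}{6} = 0.5 \leq \frac{E(X)}{4} = \frac{3.5}{4} = 0.875 $$

The bound holds, though it is not tight; Markov's inequality only guarantees an upper limit.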
Why does this hold? At first glance it may seem surprising, but the proof is very simple. We only consider discrete random variables here (the continuous case works the same way):
$$ \begin{align} E(X) &= \sum_i{X_iP(X_i)}\\ &\geq \sum_{X_i \geq a}{X_iP(X_i)}\\ &\geq \sum_{X_i \geq a}{aP(X_i)} = a\sum_{X_i \geq a}P(X_i) = aP(X\geq a) \end{align} $$
therefore:
$$ \begin{align} & a \cdot P(X\geq a)\leq E(X)\\ & P(X \geq a) \leq {E(X) \over a} \end{align} $$
Don't be intimidated by the series of equations above; there is a more intuitive way to see it. What is the Markov inequality actually saying?
For any value \( a > 0 \), the distribution of \( X \) has a part that is smaller than \( a \) and a part that is greater than or equal to \( a \).
The overall expected value \( E(X) \) is then the sum of these left and right parts of \( X \), each weighted by its own probability:
$$ E(X) = \sum_{X < a}X_iP(X_i) + \sum_{X \geq a}X_iP(X_i) $$
Now consider only the right half, the part where \( X \geq a \). The proportion of this part (that is, the probability \( P(X \geq a) \)) multiplied by its lower limit \( a \) cannot exceed the weighted sum of this half of \( X \), and so it certainly cannot exceed the overall expected value \( E(X) \). Written mathematically:
$$ a\cdot P(X\geq a)\leq E(X) $$
Looked at this way, it turns out to be a simple and clear conclusion. It is actually not a strong inequality; its requirements are quite loose.
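To see the bound in action, here is a minimal simulation sketch (the exponential distribution, sample size, and use of NumPy are all illustrative choices, not part of the original argument): it estimates \( P(X \geq a) \) from samples and compares it with \( E(X)/a \).

```python
import numpy as np

rng = np.random.default_rng(0)

# A non-negative random variable: exponential with mean 2, so E(X) = 2.
samples = rng.exponential(scale=2.0, size=1_000_000)
mean = samples.mean()

for a in [1.0, 2.0, 4.0, 8.0]:
    tail = (samples >= a).mean()   # empirical estimate of P(X >= a)
    bound = mean / a               # Markov bound E(X) / a
    print(f"a = {a}: P(X >= a) ~ {tail:.4f} <= E(X)/a = {bound:.4f}")
```

For every \( a \), the empirical tail probability stays below the Markov bound, although the bound can be quite loose.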
Chebyshev inequality
Next we consider Chebyshev's inequality, which is essentially an applied version of Markov's inequality: it shows how the spread of a random variable's distribution is constrained by its variance.
For a random variable \( X \) (discrete or continuous) with expected value \( \mu \) and variance \( \sigma^2 \), Chebyshev's inequality states that for any \( a > 0 \):
$$ P(|X - \mu| \geq a) \leq {{\sigma^2} \over {a^2}} $$
or written as:
$$ a^2 \cdot P(|X - \mu| \geq a) \leq {\sigma^2} $$
This is very similar in form to the Markov inequality: both describe how the probability that a random variable (here \( |X - \mu| \)) is at least \( a \) is subject to an upper bound.
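As a quick numerical check (again a sketch; the uniform distribution and NumPy are illustrative assumptions), the same kind of simulation compares the empirical value of \( P(|X - \mu| \geq a) \) with the bound \( \sigma^2 / a^2 \).

```python
import numpy as np

rng = np.random.default_rng(1)

# Uniform on [0, 10]: mu = 5, sigma^2 = 100 / 12, roughly 8.33.
samples = rng.uniform(0.0, 10.0, size=1_000_000)
mu, var = samples.mean(), samples.var()

for a in [2.0, 3.0, 4.0]:
    tail = (np.abs(samples - mu) >= a).mean()  # empirical P(|X - mu| >= a)
    bound = var / a**2                         # Chebyshev bound sigma^2 / a^2
    print(f"a = {a}: P(|X - mu| >= a) ~ {tail:.4f} <= sigma^2/a^2 = {bound:.4f}")
```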
If you have a good grasp of Markov's inequality, the conclusion of Chebyshev's inequality should also be obvious, because the variance is itself an expected value: it is the expected value of the squared distance of \( X \) from the center point \( \mu \):
$$ \sigma^2 =\sum(X_i - \mu)^2 P(X_i) $$
Taking \( a \) as the dividing line, the above formula can also be written as:
$$ \sigma^2 = \sum_{|X_i-\mu| < a}(X_i - \mu)^2 P(X_i) + \sum_{|X_i-\mu| \geq a}(X_i - \mu)^2 P(X_i) $$
That is, the weighted sum over the part of the distribution of \( X \) whose distance from \( \mu \) is less than \( a \), plus the weighted sum over the part whose distance is greater than or equal to \( a \); in the graph, these are the white part and the shaded part.
Then obviously, the proportion of the part whose distance from \( \mu \) is at least \( a \) (the shaded part) is constrained by Markov's inequality and has an upper limit:
$$ a^2 \cdot P(|X-\mu| \geq a) \leq \sigma^2 $$
The key here is to see that \( |X - \mu| \) plays the role of \( X \) in the Markov inequality; if you fold the figure above in half along the dotted line at \( \mu \), you will find that it has exactly the same form as the Markov inequality.
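To make the substitution completely explicit, apply Markov's inequality to the non-negative random variable \( (X - \mu)^2 \) with threshold \( a^2 \); this one-line derivation is the standard way Chebyshev's inequality follows from Markov's:

$$ P(|X - \mu| \geq a) = P\left((X - \mu)^2 \geq a^2\right) \leq \frac{E\left[(X - \mu)^2\right]}{a^2} = \frac{\sigma^2}{a^2} $$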
Application
Both Markov's inequality and Chebyshev's inequality work as a kind of inverse statement: when the expected value \( E \) is known, the proportion of the overall probability distribution lying at or above \( a \) is constrained by an upper limit determined by \( E \); conversely, arguing by contradiction, if that proportion exceeded the constraint, the overall expected value could not be \( E \).
Chebyshev's inequality, in turn, uses the variance to constrain the original probability distribution: in the distribution of \( X \), the probability of the part whose distance from the mean \( \mu \) is at least \( a \) must be less than a certain value. In other words, it bounds the proportion of \( X \) that deviates too far from \( \mu \).
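A concrete way to read this: substituting \( a = k\sigma \) into the inequality shows that the probability of deviating from \( \mu \) by \( k \) or more standard deviations is at most \( 1/k^2 \), no matter what the distribution looks like:

$$ P(|X - \mu| \geq k\sigma) \leq \frac{\sigma^2}{(k\sigma)^2} = \frac{1}{k^2} $$

For example, at least \( 1 - 1/3^2 \approx 89\% \) of any distribution lies within three standard deviations of its mean.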
Beyond this, its most important application is to prove a central result in probability theory, the law of large numbers: another conclusion that seems obvious but is not easy to prove rigorously. That will be discussed in detail in the next article.