
Continuing from the article on Chebyshev's inequality, this article discusses a very important result in statistics, the law of large numbers, which is one of the foundational theorems of probability theory.

The law of large numbers matches our intuition very well. For example, if an ordinary coin is tossed enough times, the proportion of heads will come arbitrarily close to 50%; or, for a rigged coin whose theoretical probability of landing heads is 0.7, the proportions of heads and tails will come arbitrarily close to 70% and 30% when we toss it enough times.

This process of approaching the theoretical probability through a large number of repeated experiments is exactly what the law of large numbers describes: when the number of trials \(n\) is large enough, the observed frequency will come arbitrarily close to the theoretical probability.
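As a quick illustration (not part of the argument), here is a minimal Python simulation sketch; the function name `head_frequency` and the chosen toss counts are just for this example:

```python
import random

def head_frequency(p_heads, n_tosses, seed=42):
    """Toss a coin with theoretical probability `p_heads` of heads
    `n_tosses` times and return the observed frequency of heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

# The observed frequency drifts toward the theoretical probability 0.7
# as the number of tosses grows.
for n in (100, 10_000, 1_000_000):
    print(n, head_frequency(0.7, n))
```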

To a normally thinking person this seems self-evident, but this is mathematics: such a seemingly obvious conclusion is not an axiom, and it requires a rigorous proof.

Khinchin's Law of Large Numbers

The law of large numbers is an umbrella term for several theorems. The one we discuss here is the most fundamental version, and it serves as the basis for the other theorems that follow: Khinchin's law of large numbers.

Consider a random variable \(X\) that follows some probability distribution, with expected value \(E(X) = \mu\) and variance \(\sigma^2\). Usually we do not know the true values of \(\mu\) and \(\sigma^2\) exactly, and can only estimate them by sampling. Sampling a value of \(X\) repeatedly yields a series of sampled values:

$$ X_1, X_2, X_3 ... X_n $$

They are independent of each other and each follows the same distribution as the original \(X\).

Khinchin's law of large numbers states that when \(n\) is large enough, the average \(\overline X\) of these \(n\) sampled values will be arbitrarily close to the expected value \(\mu\).

However, this is only an intuitive statement. How do we define "arbitrarily close to the expected value" in rigorous mathematical language? Here we use a definition similar to the concept of a limit in calculus.

For any \(\epsilon>0\), we have:

$$ \lim\limits_{n\rightarrow+\infty}P(|\overline X - \mu| < \epsilon) = 1 $$

That is to say, no matter how small \(\epsilon\) is, as \(n\) tends to \(+\infty\), \(\overline X\) concentrates within a distance of at most \(\epsilon\) around the fixed value \(\mu\); this is called convergence of \(\overline X\) to \(\mu\) in probability.
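To make this definition concrete, the sketch below estimates \(P(|\overline X - \mu| < \epsilon)\) by Monte Carlo for the mean of \(n\) fair-die rolls (\(\mu = 3.5\)); the setup and parameters are illustrative choices, not part of the proof:

```python
import random

def prob_mean_within_eps(n, eps, trials=2000, seed=0):
    """Estimate P(|X_bar - mu| < eps), where X_bar is the mean of
    n fair-die rolls (mu = 3.5), by repeating the experiment `trials` times."""
    rng = random.Random(seed)
    mu = 3.5
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(x_bar - mu) < eps:
            hits += 1
    return hits / trials

# For a fixed eps, the estimated probability climbs toward 1 as n grows.
for n in (10, 100, 1000):
    print(n, prob_mean_within_eps(n, eps=0.2))
```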

With a rigorous mathematical definition in hand, let's think about how to prove this seemingly obvious conclusion.

Proof

Since the expected value of \(X\) is \(E(X) = \mu\) and its variance is \(\sigma^2\), let us now consider the expected value and variance of \(\overline X\). In fact we have the following conclusion:

$$ E(\overline X) = E(X) = \mu $$

That is, the expected value of \(\overline X\) is equal to the expected value of the original \(X\).
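For reference, a one-line sketch of why this holds, assuming linearity of expectation for the samples (it mirrors the variance calculation below):

$$ E(\overline X) = E\,[{1\over n}(X_1 + X_2 + ... + X_n)] = {1 \over n}[E(X_1) + E(X_2) + ... + E(X_n)] = {1 \over n} \cdot n\mu = \mu $$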

And since \(X_1, X_2 ... X_n\) are all independent and identically distributed, according to the relevant theory of variance, we have:

$$ D(X_1 + X_2 + ... + X_n) = D(X_1) + D(X_2) + ... + D(X_n) = n\sigma^2 $$

So the variance of \(\overline X\) can be calculated:

$$ \begin{align} D(\overline X) & = D\,[{1\over n}(X_1 + X_2 + ... + X_n)]\\ & = {1 \over n^2}[ D(X_1) + D(X_2) + ... + D(X_n)]\\ & = {1 \over n^2} \cdot n\sigma^2 = {\sigma^2 \over n} \end{align} $$

So we get the following conclusions:

$$ E(\overline X) = \mu, \,\,\,\,D(\overline X) ={\sigma^2 \over n} $$

Pay attention to these two formulas and do not take them for granted: they rely on the strict precondition that \(X_1, X_2 ... X_n\) are independent and identically distributed with \(X\), and proving them rigorously actually takes some work; it is not as obvious as it seems, and the proof can be found in a textbook. But that is not our focus here; we only need to know the conclusion.


With the above basic conclusions, we arrive at a very important fact: when we take enough sampled values \(X_i\), their mean \(\overline X\) has the same expected value \(\mu\) as the original distribution \(X\), but its variance is reduced from \(\sigma^2\) to \(\sigma^2 \over n\).

Intuitively, when we average the sampled data, the overall expected value is unchanged, but the variance is reduced, so the overall distribution becomes more concentrated. And the larger the number of samples \(n\), the smaller the variance and the more concentrated the data, as the sketch below illustrates.
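Here is a minimal simulation sketch of this effect, using fair-die rolls (\(\sigma^2 = 35/12\)); the function name, the die example, and the sample sizes are my own illustrative choices:

```python
import random
import statistics

def sample_mean_variance(n, trials=5000, seed=1):
    """Empirical variance of the sample mean of n fair-die rolls,
    to be compared against sigma^2 / n (sigma^2 = 35/12 for one die)."""
    rng = random.Random(seed)
    means = [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
    return statistics.variance(means)

sigma2 = 35 / 12  # variance of a single fair-die roll
for n in (5, 50, 500):
    print(n, round(sample_mean_variance(n), 4), round(sigma2 / n, 4))
```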

Since the variance measures how tightly the data cluster, the smaller the variance, the more closely the distribution concentrates around the expected value. Intuitively, we can infer that as \(n\) becomes larger and larger, \(\overline X\) will get closer and closer to the expected value \(\mu\).

Although we have the intuitive understanding, we still need a rigorous mathematical statement of this inference. This is where Chebyshev's inequality comes in. For any \(\epsilon>0\), we have:

$$ P(|\overline X - \mu| \geq \epsilon) \leq {D(\overline X) \over {\epsilon^2}} = {\sigma^2 \over {n \cdot \epsilon^2}} $$

Then when \(n\) tends to infinity:

$$ \lim\limits_{n\rightarrow+\infty}P(|\overline X - \mu| \geq \epsilon) \leq \lim\limits_{n\rightarrow+\infty}{\sigma^2 \over {n \cdot \epsilon^2}} = 0 $$

Take a moment to understand this expression: what exactly does it say?

Chebyshev's inequality bounds the probability that the data fall too far away from \(\mu\), and the bound is controlled by the variance. When \(n\) is large enough, the variance gets closer and closer to 0, so this upper bound also approaches 0, which means the probability that \(\overline X\) is farther than \(\epsilon\) from \(\mu\) approaches 0. In other words, no matter how small \(\epsilon\) is, as the number of samples \(n\) grows, \(\overline X\) is increasingly confined to the interval \([\mu - \epsilon, \mu + \epsilon]\) around \(\mu\); that is exactly what it means for \(\overline X\) to be arbitrarily close to \(\mu\).
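As a sanity check (again an illustrative sketch with fair-die rolls and arbitrary parameters, not part of the proof), we can compare the empirical tail probability with the Chebyshev bound \(\sigma^2 \over {n \cdot \epsilon^2}\):

```python
import random

def tail_prob_vs_chebyshev(n, eps, trials=5000, seed=2):
    """Compare the empirical P(|X_bar - mu| >= eps) for the mean of
    n fair-die rolls with the Chebyshev bound sigma^2 / (n * eps^2)."""
    rng = random.Random(seed)
    mu, sigma2 = 3.5, 35 / 12
    exceed = 0
    for _ in range(trials):
        x_bar = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(x_bar - mu) >= eps:
            exceed += 1
    return exceed / trials, sigma2 / (n * eps * eps)

# Both the empirical tail probability and the bound shrink toward 0 as n grows.
for n in (10, 100, 1000):
    empirical, bound = tail_prob_vs_chebyshev(n, eps=0.3)
    print(n, round(empirical, 4), round(bound, 4))
```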

We can also see this from the distribution of \(\overline X\) itself: when \(n\) becomes larger and larger, the variance of \(\overline X\) becomes smaller and smaller, and the distribution concentrates more and more around the expected value \(\mu\); when \(n\) tends to infinity, the variance approaches 0 and the distribution collapses to a vertical line at \(\mu\), which shows that \(\overline X\) is arbitrarily close to \(\mu\).

