Friends who build statistics-related systems will have met concepts such as the normal distribution, variance, and standard deviation. PHP has an extension developed specifically for these statistical calculations, and the stats extension library we look at today is exactly that. I have never built such a system myself and don't know these concepts well, so today's content is based only on my personal understanding and the little I have touched before. It is said that Python is stronger in this area: it is a general-purpose glue language, and it gained wide public acceptance after its success in statistics. Interested readers can study it on their own.
Installing the stats extension is simple: just use the normal extension installation method. It needs no additional support from other system components, which is very convenient.
Random numbers between 0 and 1
First, let's look at a function that has little to do with statistics.
var_dump(stats_rand_ranf()); // float(0.32371053099632)
The ordinary rand() and mt_rand() functions both return integers: rand() from 0 to getrandmax(), and mt_rand() from 0 to mt_getrandmax(). stats_rand_ranf(), in contrast, returns a float between 0 and 1. The extension has other functions prefixed with stats_rand_ that return random values drawn from distributions such as the normal distribution. If you know some statistics, you can look them up in the documentation.
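For comparison, here is a minimal sketch of getting a similar 0-to-1 float without the extension. The helper name random_unit_float() is my own invention, not part of stats, and it makes no claim to match the generator the extension uses.

```php
<?php
// Hypothetical helper (not part of the stats extension):
// scale mt_rand() into a float in the range [0, 1).
function random_unit_float(): float
{
    // Dividing by mt_getrandmax() + 1 keeps the result strictly below 1.
    return mt_rand() / (mt_getrandmax() + 1);
}

var_dump(random_unit_float());
```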
Variance, standard deviation
Variance and standard deviation are relatively simple and common concepts. My actual major, for example, is psychology, and psychological statistics includes variance and standard deviation calculations, which are also required for exams. The content of this section is very simple: after calling each function, we will also write out the calculation ourselves to show the formulas for variance and standard deviation.
// 1,3,9,12
// Mean: (1+3+9+12)/4 = 6.25
// Variance
var_dump(stats_variance([1,3,9,12])); // float(19.6875)
// Variance formula: ((1-6.25)^2+(3-6.25)^2+(9-6.25)^2+(12-6.25)^2)/4
var_dump((pow(1-6.25, 2)+pow(3-6.25, 2)+pow(9-6.25,2)+pow(12-6.25,2))/4); // float(19.6875)
The mean is useful in many statistical calculations and is one of the basic inputs of many algorithms, so we work it out first, mainly for the manual calculation that follows. Variance and standard deviation are in turn basic inputs for many other calculations.
The stats_variance() function calculates the variance of a set of data. It takes an array and works on the values in that array. The variance formula subtracts the mean from each value, squares the difference, adds the squares up, and divides by the number of values.
You can see that the result of the calculation is the same as the result of calling the stats_variance() function directly.
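The hard-coded arithmetic above can be generalized to any array. The following is only a sketch of the formula just described (divide by the number of values); the function name population_variance() is made up for illustration and is not part of the stats extension.

```php
<?php
// Hypothetical helper: variance computed by dividing by n.
function population_variance(array $values): float
{
    $n = count($values);
    $mean = array_sum($values) / $n;

    // Sum of squared differences from the mean.
    $sumOfSquares = 0.0;
    foreach ($values as $value) {
        $sumOfSquares += ($value - $mean) ** 2;
    }

    return $sumOfSquares / $n;
}

var_dump(population_variance([1, 3, 9, 12])); // float(19.6875)
```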
// Standard deviation
var_dump(stats_standard_deviation([1,3,9,12])); // float(4.4370598373247)
var_dump(stats_standard_deviation([1,3,9,12], true)); // float(5.1234753829798)
// Standard deviation: sqrt(((1-6.25)^2+(3-6.25)^2+(9-6.25)^2+(12-6.25)^2)/4)
// Sample standard deviation: sqrt(((1-6.25)^2+(3-6.25)^2+(9-6.25)^2+(12-6.25)^2)/(4-1))
var_dump(sqrt((pow(1-6.25, 2)+pow(3-6.25, 2)+pow(9-6.25,2)+pow(12-6.25,2))/4)); // float(4.4370598373247)
var_dump(sqrt((pow(1-6.25, 2)+pow(3-6.25, 2)+pow(9-6.25,2)+pow(12-6.25,2))/3)); // float(5.1234753829798)
The standard deviation is simply the square root of the variance. It has two forms: one divides the sum of squared differences by the number of values, and the other divides it by the number of values minus one. They are called the standard deviation and the sample standard deviation. Passing true as the second parameter of stats_standard_deviation() switches to the sample form, which is much more convenient than calculating by hand.
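To make the difference between the two forms concrete, here is a small sketch with a flag that mirrors the second parameter. The function name std_dev() and its $sample parameter are my own naming, written only to illustrate the formulas in the comments above.

```php
<?php
// Hypothetical helper: standard deviation in both forms.
function std_dev(array $values, bool $sample = false): float
{
    $n = count($values);
    $mean = array_sum($values) / $n;

    $sumOfSquares = 0.0;
    foreach ($values as $value) {
        $sumOfSquares += ($value - $mean) ** 2;
    }

    // Default form divides by n; the sample form divides by n - 1.
    $divisor = $sample ? $n - 1 : $n;

    return sqrt($sumOfSquares / $divisor);
}

var_dump(std_dev([1, 3, 9, 12]));       // approximately 4.4371
var_dump(std_dev([1, 3, 9, 12], true)); // approximately 5.1235
```

The sample form is always the larger of the two for the same data, because the same sum is divided by a smaller number.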
Mean deviation, harmonic mean, factorial
The mean deviation generally refers to the arithmetic mean of the absolute deviations between each value in a sequence and the arithmetic mean of that sequence. My goodness, what a mouthful; how do you statistics students cope? Of course, one function in the stats extension takes care of it.
// Mean deviation
var_dump(stats_absolute_deviation([1,3, 9, 12])); // 4.25
// ((6.25-1)+(6.25-3)+(9-6.25)+(12-6.25))/4
//(5.25+3.25+2.75+5.75)/4 = 4.25
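The formula in the comments above can also be written as a short function. This is only a sketch of that formula; the name mean_absolute_deviation() is hypothetical and not something the extension provides.

```php
<?php
// Hypothetical helper: mean of the absolute differences from the mean.
function mean_absolute_deviation(array $values): float
{
    $n = count($values);
    $mean = array_sum($values) / $n;

    $total = 0.0;
    foreach ($values as $value) {
        // abs() removes the sign, so values below the mean count positively too.
        $total += abs($value - $mean);
    }

    return $total / $n;
}

var_dump(mean_absolute_deviation([1, 3, 9, 12])); // float(4.25)
```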
The stats_absolute_deviation() function calculates the mean deviation. The concept above is just the formula I wrote in the comments: take the absolute value of each value minus the mean, add these up, and divide by the number of values. Isn't the formula much clearer than the definition? Next, let's look at the harmonic mean.
// Harmonic mean
var_dump(stats_harmonic_mean([1, 3, 9, 12])); // float(2.6181818181818)
// 4/(1/1+1/3+1/9+1/12) = 2.6181818181818
stats_harmonic_mean() calculates the harmonic mean of a set of data. As the formula in the comment shows, the harmonic mean adds up the reciprocal of each value and then divides the number of values by that sum.
Finally, something easier: a function that calculates a factorial directly.
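Written as a function, the same calculation might look like the sketch below. The name harmonic_mean() is my own, and the code assumes every value is non-zero, since a zero would cause a division by zero.

```php
<?php
// Hypothetical helper: number of values divided by the sum of reciprocals.
function harmonic_mean(array $values): float
{
    $reciprocalSum = 0.0;
    foreach ($values as $value) {
        $reciprocalSum += 1 / $value; // assumes $value is never zero
    }

    return count($values) / $reciprocalSum;
}

var_dump(harmonic_mean([1, 3, 9, 12])); // approximately 2.6182
```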
var_dump(stats_stat_factorial(6)); // float(720)
// 1*2*3*4*5*6 = 720
I believe this function does not need to be explained.
Kurtosis, skewness, cumulative normal distribution function, probability density
I haven't really worked with these concepts; I only tested that the functions run. There are many related functions: besides the ones for the normal distribution shown here, there are functions for the F distribution, t distribution, Cauchy distribution, chi-square distribution, and so on. I admit I have heard of only one or two of these names, and many of the others are completely new to me.
// Kurtosis
var_dump(stats_kurtosis([1, 3, 9, 12])); // float(-1.6960846560847)
// Skewness
var_dump(stats_skew([1, 3, 9, 12])); // float(0.091222998923078)
// Returns the cumulative distribution function of the normal distribution, its inverse, or one of its parameters
var_dump(stats_cdf_normal(14,5,10, 1));
// Returns the probability density at the first argument
var_dump(stats_dens_normal(14, 5, 10));
For the functions covering the other distributions, consult the documentation when you need them. I won't force an explanation here, since I would probably get it wrong.
Summary
Before browsing the official documentation, I really didn't know that PHP already had such an extension. I had assumed that building a statistical system in PHP would be troublesome, which is why everyone chooses other languages. In fact, the extension has existed for a long time. Whether or not it is any good, there really are not many examples of statistical systems built with PHP, so if you need one you will have to research it yourself. These calculations are, after all, just combinations of various formulas, and I believe there are also many useful packages available through Composer that can be used without installing a separate extension on the system.
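For readers who want to see what the kurtosis, skewness, and density numbers correspond to, here is a rough sketch. It assumes the extension divides by n when computing moments and reports excess kurtosis (the fourth standardized moment minus 3). The outputs above are consistent with that assumption, but I have not checked the extension's source, and all function names here are hypothetical.

```php
<?php
// Hypothetical helper: the k-th central moment, dividing by n.
function central_moment(array $values, int $k): float
{
    $n = count($values);
    $mean = array_sum($values) / $n;

    $total = 0.0;
    foreach ($values as $value) {
        $total += ($value - $mean) ** $k;
    }

    return $total / $n;
}

// Skewness: third central moment over the variance raised to 3/2.
function skewness(array $values): float
{
    return central_moment($values, 3) / (central_moment($values, 2) ** 1.5);
}

// Excess kurtosis: fourth central moment over the squared variance, minus 3.
function excess_kurtosis(array $values): float
{
    return central_moment($values, 4) / (central_moment($values, 2) ** 2) - 3;
}

// Density of a normal distribution with the given mean and
// standard deviation, evaluated at $x.
function normal_density(float $x, float $mean, float $sd): float
{
    $z = ($x - $mean) / $sd;

    return exp(-0.5 * $z * $z) / ($sd * sqrt(2 * M_PI));
}

var_dump(skewness([1, 3, 9, 12]));        // approximately 0.0912
var_dump(excess_kurtosis([1, 3, 9, 12])); // approximately -1.6961
var_dump(normal_density(14, 5, 10));      // approximately 0.0266
```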
Test code:
Reference documents:
https://www.php.net/manual/zh/book.stats.php
Search for [Hardcore Project Manager] on the major media platforms.