Abstract: The least squares method is a mathematical tool widely used in data processing fields such as error estimation, uncertainty analysis, system identification, prediction, and forecasting. It is simple and widely used in industry.

This article is shared from the HUAWEI CLOUD community post "Least Squares Introduction" by Yan.


However, many people may not know how the least squares method works or the story behind it, so today I will share both with you.

In 1801, the Italian astronomer Giuseppe Piazzi discovered Ceres, the first asteroid. After tracking it for 40 days, Piazzi lost Ceres when it passed behind the Sun. Scientists around the world then used Piazzi's observations to search for Ceres, but searches based on most people's calculations turned up nothing.

The 24-year-old Gauss also calculated the orbit of Ceres, and the German astronomer Heinrich Olbers rediscovered Ceres using the orbit Gauss had computed.

Gauss published the least squares method he had used in his 1809 book Theory of the Motion of the Heavenly Bodies (Theoria motus corporum coelestium). The French mathematician Adrien-Marie Legendre had independently discovered the method in 1806, but his work remained little known.

To make the least squares method easier to understand, let's work through an example.

Suppose height is a variable X and weight is a variable Y. We all know that height and weight are closely related: in general, taller people tend to weigh more. But this is only intuition, a very rough qualitative analysis.

In mathematics, however, we usually need rigorous quantitative calculation: can we compute a person's standard weight from his or her height using a formula?

We can sample the height and weight of a group of people, $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, where $x$ is height and $y$ is weight.

Everyday experience suggests that height and weight are approximately linearly related, and the simplest mathematical description of such a relationship is $y = \beta_0 + \beta_1 x$.

The task therefore becomes: how do we find $\beta_0$ and $\beta_1$?

To calculate $\beta_0$ and $\beta_1$, we adopt the following rule: $\beta_0$ and $\beta_1$ should minimize the sum of the squared differences between the values given by the fitted line and the observed values. As a formula:
$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_{ie} - y_i)^2$$

Here $y_{ie} = \beta_0 + \beta_1 x_i$ is the value estimated from $y = \beta_0 + \beta_1 x$, and $y_i$ is the actual observed value.

Substituting $y_{ie} = \beta_0 + \beta_1 x_i$, the cost function of the sample regression model follows directly:
$$Q(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
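The cost function is just the sum of squared residuals between observed and predicted weights, $\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$. As a minimal Python sketch (the function name `cost` and the height/weight values are hypothetical, made up for illustration rather than taken from the article), it can be evaluated for any candidate pair $(\beta_0, \beta_1)$:

```python
def cost(beta0, beta1, xs, ys):
    """Sum of squared residuals for the line y = beta0 + beta1 * x."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
weights = [52.0, 56.0, 61.0, 66.0, 70.0]

print(cost(-100.0, 0.95, heights, weights))  # first candidate line, ≈ 1.875
print(cost(-95.4, 0.92, heights, weights))   # second candidate line, ≈ 0.4
```

The second line fits these samples better because its cost is lower; least squares searches for the pair with the lowest possible cost.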

Now we need to determine the $\beta_0$ and $\beta_1$ that minimize the cost function. The natural approach is to take the partial derivative of the cost function with respect to each parameter and set it to zero:
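$$\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = -2\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$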

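Rearranging these two equations gives the normal equations $n\beta_0 + \beta_1 \sum x_i = \sum y_i$ and $\beta_0 \sum x_i + \beta_1 \sum x_i^2 = \sum x_i y_i$ (all sums run over $i = 1, \dots, n$). Solving them with Cramer's rule gives: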
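$$\beta_0 = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \beta_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$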

Substituting all the samples into these formulas gives the parameters.
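As a sketch of this closed-form solution, the Python function below (the name `fit_line` and the sample values are hypothetical) accumulates the four sums, writes the two normal equations as a 2x2 linear system, and applies Cramer's rule. It assumes at least two distinct $x$ values; otherwise the determinant is zero and the line is not unique.

```python
def fit_line(xs, ys):
    """Fit y = beta0 + beta1 * x by least squares.

    Solves the 2x2 normal equations
        n     * beta0 + sum_x  * beta1 = sum_y
        sum_x * beta0 + sum_xx * beta1 = sum_xy
    with Cramer's rule.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the coefficient matrix [[n, sum_x], [sum_x, sum_xx]].
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the line is not unique")
    # Cramer's rule: replace one column with the right-hand side each time.
    beta0 = (sum_xx * sum_y - sum_x * sum_xy) / det
    beta1 = (n * sum_xy - sum_x * sum_y) / det
    return beta0, beta1


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
weights = [52.0, 56.0, 61.0, 66.0, 70.0]
beta0, beta1 = fit_line(heights, weights)
print(f"weight ~ {beta0:.2f} + {beta1:.3f} * height")  # weight ~ -95.40 + 0.920 * height
```

Working with raw sums is fine for a small example like this one; for large or poorly scaled data, subtracting the mean of $x$ first reduces floating-point cancellation.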

More generally, suppose the model has several variables $x^1, x^2, \dots, x^m$ (note: a subscript, as in $x_i$, indexes a sample, while a superscript, as in $x^h$, indexes a model variable, so $x_i^h$ is the value of variable $h$ in sample $i$). The linear function can then be written as:

$$y(x^1, \dots, x^m; \beta_0, \dots, \beta_m) = \beta_0 + \beta_1 x^1 + \cdots + \beta_m x^m$$

For $n$ samples, this gives the following system of linear equations:
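$$\begin{cases} \beta_0 + \beta_1 x_1^1 + \cdots + \beta_m x_1^m = y_1 \\ \beta_0 + \beta_1 x_2^1 + \cdots + \beta_m x_2^m = y_2 \\ \qquad \vdots \\ \beta_0 + \beta_1 x_n^1 + \cdots + \beta_m x_n^m = y_n \end{cases}$$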

If the sample matrix $x_i^h$ (with a leading column of ones multiplying $\beta_0$) is denoted $A$, the parameters are collected in the vector $\beta$, and the observed values in the vector $Y$, the system above can be written as:
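$$\begin{bmatrix} 1 & x_1^1 & \cdots & x_1^m \\ 1 & x_2^1 & \cdots & x_2^m \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n^1 & \cdots & x_n^m \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$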

That is, $A\beta = Y$.

For least squares, the problem in matrix form is:

$$\min_{\beta} \lVert A\beta - Y \rVert_2$$
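To make the matrix form concrete, here is a small pure-Python sketch that builds $A$ from samples (prepending a 1 to each row for $\beta_0$) and evaluates the objective $\lVert A\beta - Y \rVert_2$ for a candidate $\beta$. The helper names `design_matrix` and `residual_norm` and the sample values are hypothetical, chosen for illustration.

```python
import math


def design_matrix(samples):
    """Build A: one row [1, x^1, ..., x^m] per sample (the 1 multiplies beta_0)."""
    return [[1.0] + list(row) for row in samples]


def residual_norm(A, beta, Y):
    """Euclidean norm ||A beta - Y||_2."""
    residuals = [sum(a * b for a, b in zip(row, beta)) - y for row, y in zip(A, Y)]
    return math.sqrt(sum(r * r for r in residuals))


# Hypothetical samples with m = 2 variables each, and their observed values.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
Y = [9.0, 8.0, 19.0, 18.5]

A = design_matrix(samples)
print(residual_norm(A, [1.0, 2.0, 3.0], Y))  # prints 0.5
```

Because there are more equations (4) than unknowns (3), $A\beta = Y$ usually has no exact solution; least squares instead looks for the $\beta$ that makes this norm as small as possible.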

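Setting the gradient of $\lVert A\beta - Y \rVert_2^2$, which is $2A^T(A\beta - Y)$, to zero gives the normal equations $A^TA\beta = A^TY$. When $A^TA$ is invertible, the optimal solution is: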

$$\beta = (A^T A)^{-1} A^T Y$$
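A minimal pure-Python sketch of this closed-form solution, assuming $A^TA$ is invertible (the columns of $A$ are linearly independent). Rather than forming $(A^TA)^{-1}$ explicitly, it solves the equivalent normal equations $A^TA\beta = A^TY$ by Gaussian elimination with partial pivoting. The function names are hypothetical; in practice a library routine such as NumPy's `numpy.linalg.lstsq`, which avoids forming $A^TA$ altogether, would usually be preferred.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    # Augmented matrix [M | v], copied so the inputs are not modified.
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: A^T A is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def least_squares(A, Y):
    """beta = (A^T A)^{-1} A^T Y, computed by solving A^T A beta = A^T Y."""
    At = transpose(A)
    AtA = matmul(At, A)
    AtY = [sum(a * y for a, y in zip(row, Y)) for row in At]
    return solve(AtA, AtY)


# Hypothetical data: rows are [1, x^1, x^2]; Y was generated from
# beta = (1, 2, 3) and then its last entry was perturbed by 0.5.
A = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 4.0], [1.0, 4.0, 3.0]]
Y = [9.0, 8.0, 19.0, 18.5]
print(least_squares(A, Y))  # approximately [0.8125, 2.1875, 2.9375]
```

With a single variable (rows $[1, x_i]$), this reduces to the same $\beta_0$ and $\beta_1$ as the Cramer's rule formulas for the height and weight example.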
