4

When doing data analysis, we often hear the concept of year-on-year and month-on-month. When various enterprises and organizations release statistical data, they usually like to use year-on-year and month-on-month comparisons with previous historical data to illustrate the changes in data. For example, the Bureau of Statistics announced that the CPI in January 2022 will increase by 0.9% year-on-year and 0.6% month-on-month.

In practice, in database-based data analysis scenarios, month-on-month and year-on-year are typical complex computing scenarios, especially before the emergence of analytical functions in commercial databases such as Oracle. Taking MySQL as an example, the Lag and Lead functions were only introduced in version 8.0. These two functions, combined with the windowing function, effectively improve the implementation efficiency of complex operations such as year-on-year and chain-comparison. In the 5.x series, MySQL needed to rely on multiple nested subqueries and self-associations to achieve such computations.

Let's take a simple example to look at it separately. MySql 5.x and 8.0 are calculated on a year-on-year basis and a chain-by-month basis.

For example data of see the table:

CREATE TABLE sales  (
  `产品ID` varchar(20),
      `销售数量` int(20) ,
  `销售时间` timestamp(6) NULL DEFAULT NULL
)
INSERT INTO sales VALUES ('C1001', 15, '2020-06-01 10:10:12');
INSERT INTO sales VALUES ('C1002',26, '2020-05-02 0:10:12');
INSERT INTO sales VALUES ('C1003', 21, '2020-04-03 0:10:12');
INSERT INTO sales VALUES ('C1003', 23, '2020-04-04 0:10:12');
INSERT INTO sales VALUES ('C1003', 0, '2020-03-05 0:10:12');
INSERT INTO sales VALUES ('C1001', 16, '2020-02-06 3:0:12');
INSERT INTO sales VALUES ('C1002', 32, '2020-01-07 0:10:12');
INSERT INTO sales VALUES ('C1001', 16, '2019-12-08 0:12:24');
INSERT INTO sales VALUES ('C1001', 32, '2019-06-09  0:12:24');
INSERT INTO sales VALUES ('C1002', 17, '2019-05-09 0:12:24');

1. MySQL 5.x: achieve year-on-year and proportion calculations through sub-queries and associations

Take the total sales value of different years by year and month, and calculate the month-on-month (total sales compared with the previous period) and year-on-year (total sales compared with the same period last year) as an example.

Example table structure and data

Calculate month-on-month and year-on-year through SQL:

select  year(c.销售时间) yy,month(c.销售时间) mm,     
concat(ifnull(abs(round((sum(c.销售数量)-ss1)/ss1*100,2)),0),'%') 同比,
concat(ifnull(abs(round((sum(c.销售数量)-ss2)/ss2*100,2)),0),'%')  环比
from sales c
left join (select month(a.销售时间) mm1,
                    year(a.销售时间) yy1,
                    sum(a.销售数量) ss1
          from sales a
          GROUP BY mm1,yy1) a
          on month(c.销售时间) = a.mm1 
          and a.yy1 = year(c.销售时间)-1     
 left join  (select month(a.销售时间) mm2,
                    year(a.销售时间) yy2,
                    sum(a.销售数量) ss2
             from sales a
           
              GROUP BY mm2,yy2) b
on (b.yy2 = year(c.销售时间) and b.mm2+1 = month(c.销售时间) OR (yy2=year(c.销售时间)-1 
AND b.mm2 = 12 AND month(c.销售时间) = 1))
 group by yy, mm
 order by yy,mm asc

Calculation results:

2. MySQL 8.0: achieve year-on-year and proportion calculations through analytical functions**

MySql8.0 supports the Lead and Lag analysis functions. Although it can greatly improve the efficiency of the same and chain calculations, it still needs to write SQL statements for processing.

2.1 Calculation year-on-year

select t2.年份,t2.月份,concat(round((t2.数量-t1.数量)/t1.数量,2)*100,'%') as 同比 from (
SELECT year(销售时间) as 年份,month(销售时间) as 月份,sum(销售数量) as 数量 from sales 
group by year(销售时间),month(销售时间) order by year(销售时间) desc, month(销售时间) desc
) t1
,(
SELECT year(销售时间) as 年份,month(销售时间) as 月份,sum(销售数量) as 数量 from sales 
group by year(销售时间),month(销售时间) order by year(销售时间) desc, month(销售时间) desc
) t2 where t1.年份=t2.年份-1 and t1.月份=t2.月份

2.2 Calculate the chain ratio

SELECT
    mm,
    CONCAT(
        ROUND(
            IFNULL(
                (xl - first_xl) / first_xl * 100,
                2
            ),
            0
        ),
        '%'
    ) AS 环比
FROM
    (
        SELECT
            mm,
            xl,
            lead (xl, 1) over (ORDER BY mm DESC) AS first_xl
        FROM
            (
                SELECT
                    DATE_FORMAT(销售时间, '%Y-%m') AS mm,
                    sum(销售数量) AS xl
                FROM
                    sales
                GROUP BY
                    DATE_FORMAT(销售时间, '%Y-%m')
            ) t
    ) a

After SqlServer2008R2 and Oracle10g, Lag and Lead analysis functions are provided. The specific calculation logic and usage are similar to those of MySQL 8.0 above.

3. Calculation engine using BI tools

For such complex computing scenarios, business intelligence BI data analysis tools provide more efficient solutions. Take Wyn Enterprise embedded business intelligence software as an example. Its built-in Wax analysis expression and fast calculation engine provide the ability to directly realize complex calculations such as year-on-year and chain-comparison, without the need to write complex and lengthy SQL.

3.1 Use the built-in fast calculation function of year-on-year and month-on-month**

Calculations such as year-on-year and chain ratios are generally standard functions of BI tools, and we can implement them directly through settings.

3.2 Using data analysis expressions

If the built-in fast calculations are not sufficient, more complex calculations can also be achieved by analyzing expressions. Analysis expression is a more flexible and powerful data calculation method. With rich functions, users can freely combine them like Excel formulas to achieve more powerful analysis capabilities. Analysis expressions perform business calculations based on data models, and use some defined functions to use correct syntax to complete a complex business logic calculation. In this way, users can use data more flexibly and make maximum use of data.

Dear bosses, by comparing the differences between SQL and BI data analysis tools in handling complex calculations such as year-on-year and month-on-month, we can find that professional tools are more convenient in data calculation and processing capabilities. In the future, if you have similar analysis and calculation requirements, it is more appropriate to choose BI analysis tool for processing.


葡萄城技术团队
2.7k 声望28.5k 粉丝

葡萄城创建于1980年,是专业的软件开发技术和低代码平台提供商。以“赋能开发者”为使命,葡萄城致力于通过各类软件开发工具和服务,创新开发模式,提升开发效率,推动软件产业发展,为“数字中国”建设提速。