Demystifying the "underlying logic" of your data processing: the formula engine's calculation explained in detail (1)

Background

In the information age, the most obvious change is the explosive growth of data: people are accumulating more and more of it. As this complex data piles up, many traditional tools for recording, querying, and summarizing data can no longer meet people's needs. Processing such large volumes effectively means letting computers understand what results people want from the data, which leads to more automated and intelligent ways of processing it.

To process these massive amounts of data, big data engines, search engines, computing engines, 3D engines, and others have emerged to solve problems that complex data puts beyond manual handling. The formula calculation engine is a comparatively basic one among them: it is the core part of a calculation program that is responsible for processing data. Next, we introduce the formula engine in terms of its basic principles, the calculation chain, and asynchronous functions, starting from the engine's basic concepts and using our spreadsheet component as an example to show how these can be implemented in JavaScript.

The calculation principle of the formula engine

The calculation engine handles statistics over data sources, data operations, and data management, and returns the appropriate calculation results on request. Different data-processing purposes call for different returned content, so calculation engines fall into many different categories.

For the computer to recognize the operations we want, what we write has to go through compilation, which translates it into a language the machine can recognize.

The compilation phase as a whole is divided into the stages shown in the following figure:

The two most critical stages are lexical analysis and syntax analysis. In these two stages, the input is gradually broken down into pieces the program can recognize.

Once the content has been entered, the compiler first performs lexical analysis on it. In this step, the compiler's task is to check whether the words in the source program are valid. The part of the compiler that implements this is generally called a lexical analyzer, and its output is usually a sequence of word symbols (tokens).

Taking JS as an example, there are three main parts in this process: analyzing function parameters, analyzing variable declarations, and analyzing function declarations. The purpose of the syntax analysis stage is to check whether the grammatical structure of the source program (that is, its sentences or phrases) is valid; syntax errors are usually found at this stage. Here the compiler works on the word symbols produced by lexical analysis.

In a formula calculation engine, the way data is processed closely resembles the way a language is processed in compiler theory. Starting from practical use, we can implement an engine that calculates formulas much as Excel does. The idea is to begin with lexical analysis, splitting the complete formula string into small pieces, then perform syntax analysis, and finally evaluate the resulting syntax tree. Next, let's look at how the details are implemented.

Implementation details of the formula engine

Let's start with the calculation of a formula. Calculating a formula means evaluating a formula string to obtain the result of its expression. For example, the formula "=1+10*11" evaluates to 111. A computer is not a person: for even such a simple expression to be calculated correctly and turned into the data we need, a glance is not enough. Implementing Excel-style formula calculation requires lexical analysis, syntax analysis, and evaluation of the syntax tree.

1. Lexical analysis


First comes lexical analysis, in which we split the formula string into an array of strings. In Excel formula calculation, the formula string of an expression contains only these categories: operators, symbols, strings, numbers, arrays, references, and names.

Name: sum

Operator: ( ) : / % +

Reference: A1 A11 B1

Number: 100
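The splitting step can be sketched in a few lines of JavaScript. This is only a minimal illustration, not the component's actual lexer: the `TOKEN_PATTERNS` table, the `tokenize` function, and the sample formula `=SUM(A1:A11)/B1%+100` are all assumptions, chosen so that the output contains the token kinds listed above. Arrays are left out, and a function name such as LOG10 would be misread as a reference.

```javascript
// Hypothetical token categories, tried in order; the first match wins.
const TOKEN_PATTERNS = [
  ["number", /^\d+(\.\d+)?/],
  ["string", /^"[^"]*"/],
  // Letters followed by digits (A1, $B$2). Tried before "name" so that
  // A1 is a reference while SUM, which has no digits, falls through.
  ["reference", /^\$?[A-Za-z]{1,3}\$?\d+/],
  ["name", /^[A-Za-z_][A-Za-z0-9_.]*/],
  ["operator", /^[()+\-*\/%^:,&=<>]/],
];

function tokenize(formula) {
  // Drop the leading "=" that marks the cell content as a formula.
  let rest = formula.startsWith("=") ? formula.slice(1) : formula;
  const tokens = [];
  while (rest.length > 0) {
    // Whitespace between tokens is skipped.
    const space = rest.match(/^\s+/);
    if (space) {
      rest = rest.slice(space[0].length);
      continue;
    }
    let matched = false;
    for (const [type, pattern] of TOKEN_PATTERNS) {
      const m = rest.match(pattern);
      if (m) {
        tokens.push({ type, text: m[0] });
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(`Unexpected character: ${rest[0]}`);
    }
  }
  return tokens;
}

// Prints: SUM ( A1 : A11 ) / B1 % + 100
console.log(tokenize("=SUM(A1:A11)/B1%+100").map((t) => t.text).join(" "));
```

Keeping the patterns in an ordered table means a new category can be added without touching the scanning loop.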

2. Syntax analysis

Once lexical analysis is complete, we run syntax analysis on its results. In calculation, syntax analysis is usually handled with either an expression tree or a stack (that is, reverse Polish notation).

Here we first introduce the expression tree method.

Syntax analysis-expression tree

Analysis with an expression tree starts from a binary tree. First, we build an expression tree from the lexical analysis results according to operator priority. The leaf nodes of the expression tree are operands, and the internal nodes are operators.

In this case, the colon has the highest priority, followed by the parentheses, and finally the division sign. Once this tree is formed, we are very close to the final calculation result.
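To make priority-driven tree building concrete, here is a minimal sketch that uses precedence climbing, one common way to build such a tree (the article does not say which technique the component uses). The names `PRECEDENCE` and `parseFormula` and the node shape are assumptions for illustration, and only numbers, the four arithmetic operators, and parentheses are handled, not ranges or function calls.

```javascript
// Binding strength of each binary operator: a higher number binds tighter.
const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

function parseFormula(formula) {
  // Minimal lexer: numbers, the four arithmetic operators, and parentheses.
  const tokens = formula.replace(/^=/, "").match(/\d+(\.\d+)?|[-+*\/()]/g) || [];
  let pos = 0;

  // An operand is a number (a leaf) or a parenthesized sub-expression.
  function parseOperand() {
    const token = tokens[pos++];
    if (token === "(") {
      const inner = parseExpression(1);
      if (tokens[pos++] !== ")") throw new Error("Expected ')'");
      return inner;
    }
    if (token === undefined || !/^\d/.test(token)) {
      throw new Error(`Unexpected token: ${token}`);
    }
    return { type: "number", value: Number(token) };
  }

  // Precedence climbing: only consume operators at least as strong as minPrec.
  function parseExpression(minPrec) {
    let left = parseOperand();
    while (pos < tokens.length && PRECEDENCE[tokens[pos]] >= minPrec) {
      const op = tokens[pos++];
      // The right side may only contain operators that bind tighter than op.
      const right = parseExpression(PRECEDENCE[op] + 1);
      left = { type: "operator", op, left, right };
    }
    return left;
  }

  const tree = parseExpression(1);
  if (pos < tokens.length) throw new Error(`Unexpected token: ${tokens[pos]}`);
  return tree;
}

// For "=1+10*11" the root is "+", and the "*" node hangs below it as the right child.
console.log(JSON.stringify(parseFormula("=1+10*11"), null, 2));
```

Because the tighter-binding operator ends up deeper in the tree, it is evaluated first when the tree is later walked from the root.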

We evaluate this tree with recursive calls, starting from the root node, moving on to sum, and recursing downward. When A1:A11 is reached, the first result is obtained, and the results are then returned layer by layer.
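The recursive walk can be sketched as follows. Everything here is assumed for illustration: the formula `=SUM(A1:A11)/B1`, the cell values, the node shapes, and the helper names `rangeValues` and `evaluate`. The tree is written out by hand so the sketch can focus on evaluation only.

```javascript
// Hypothetical sheet data: A1..A11 hold 1..11 and B1 holds 2.
const cells = { B1: 2 };
for (let row = 1; row <= 11; row++) cells[`A${row}`] = row;

// Hand-built tree for the hypothetical formula =SUM(A1:A11)/B1.
const tree = {
  type: "operator",
  op: "/",
  left: {
    type: "call",
    name: "SUM",
    args: [{ type: "range", from: "A1", to: "A11" }],
  },
  right: { type: "reference", address: "B1" },
};

// Expand a single-column range such as A1:A11 into its cell values.
function rangeValues(from, to) {
  const column = from.match(/[A-Z]+/)[0];
  const first = Number(from.match(/\d+/)[0]);
  const last = Number(to.match(/\d+/)[0]);
  const values = [];
  for (let row = first; row <= last; row++) {
    values.push(cells[`${column}${row}`] ?? 0); // empty cells count as 0
  }
  return values;
}

// Children are evaluated first; their results flow back up to the parent.
function evaluate(node) {
  switch (node.type) {
    case "number":
      return node.value;
    case "reference":
      return cells[node.address] ?? 0;
    case "range":
      return rangeValues(node.from, node.to);
    case "call": {
      const args = node.args.map(evaluate);
      if (node.name === "SUM") return args.flat().reduce((sum, v) => sum + v, 0);
      throw new Error(`Unknown function: ${node.name}`);
    }
    case "operator": {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      if (node.op === "+") return left + right;
      if (node.op === "-") return left - right;
      if (node.op === "*") return left * right;
      if (node.op === "/") return left / right;
      throw new Error(`Unknown operator: ${node.op}`);
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

console.log(evaluate(tree)); // 33, because SUM(1..11) = 66 and 66 / 2 = 33
```

The range node is the deepest one, so it is the first to produce a value, matching the order described above.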

This completes the walk-through of how a formula calculation is implemented.

Syntax analysis-reverse Polish algorithm

The reverse Polish algorithm builds a stack (that is, a reverse Polish expression) during the syntax analysis stage. Its core is converting the infix expression we normally write into a postfix expression. Parentheses only indicate the order of operations; they are not elements that actually take part in the calculation, so they can be dropped when converting infix to postfix.

Code is then written so that the computer can complete the calculation.

Here is the tree transformed into its corresponding reverse Polish form.
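A minimal sketch of the infix-to-postfix conversion and the stack-based calculation is given below. It uses the shunting-yard algorithm, a standard way to produce reverse Polish order; the article does not state that the component works this way. The names `toPostfix` and `evaluatePostfix` are assumptions, and only numbers, the four left-associative arithmetic operators, and parentheses are supported (no unary minus, ranges, or functions).

```javascript
const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

// Convert an infix formula to postfix (reverse Polish) order.
function toPostfix(formula) {
  const tokens = formula.replace(/^=/, "").match(/\d+(\.\d+)?|[-+*\/()]/g) || [];
  const output = [];
  const operators = [];
  for (const token of tokens) {
    if (/^\d/.test(token)) {
      output.push(token);
    } else if (token === "(") {
      operators.push(token);
    } else if (token === ")") {
      // Pop back to the matching "("; the parentheses themselves are discarded.
      while (operators.length && operators[operators.length - 1] !== "(") {
        output.push(operators.pop());
      }
      if (operators.pop() !== "(") throw new Error("Mismatched parentheses");
    } else {
      // Left-associative: first pop operators of equal or higher precedence.
      while (
        operators.length &&
        PRECEDENCE[operators[operators.length - 1]] >= PRECEDENCE[token]
      ) {
        output.push(operators.pop());
      }
      operators.push(token);
    }
  }
  while (operators.length) {
    const op = operators.pop();
    if (op === "(") throw new Error("Mismatched parentheses");
    output.push(op);
  }
  return output;
}

// Evaluate a postfix token list with a single value stack.
function evaluatePostfix(postfix) {
  const stack = [];
  for (const token of postfix) {
    if (/^\d/.test(token)) {
      stack.push(Number(token));
      continue;
    }
    const right = stack.pop();
    const left = stack.pop();
    if (token === "+") stack.push(left + right);
    else if (token === "-") stack.push(left - right);
    else if (token === "*") stack.push(left * right);
    else stack.push(left / right);
  }
  return stack[0];
}

console.log(toPostfix("=1+10*11").join(" ")); // 1 10 11 * +
console.log(evaluatePostfix(toPostfix("=1+10*11"))); // 111
console.log(toPostfix("(1+10)*11").join(" ")); // 1 10 + 11 *
```

The last line shows the parentheses disappearing from the output: their effect survives only as the order of the tokens.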

Binary tree recursion vs reverse Polish algorithm

Compared with recursive evaluation of a tree, reverse Polish notation is closer to the habits of mathematical calculation. But when formula calculation has to be handled in a real project, which of the two copes better with more complicated situations?

Let us look at a multi-level nested formula:

The usage scenario for this formula is summing multiple columns with the SUMIFS function, which is equivalent to the following:

=SUMIFS($C:$C,$B:$B,$A1)+SUMIFS($D:$D,$B:$B,$A1)+….

Clearly, the nested formula is the simpler of the two. With binary tree recursion, you only need to determine the parent and child nodes of the SUMIFS node, and the multi-column sum takes just one line of code.

With the reverse Polish algorithm, however, the code starts calculating as soon as it encounters SUM, and it is hard to tell that the SUM to be run at that point actually sits inside the innermost brackets. This can be solved, but it is not the easiest route.
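To make the expanded form =SUMIFS($C:$C,$B:$B,$A1)+SUMIFS($D:$D,$B:$B,$A1)+... concrete, here is a minimal sketch of what it computes. The sheet data and the helpers `sumifs` and `sumifsAcrossColumns` are assumptions for illustration; `sumifs` is a simplified stand-in that supports a single equality criterion, whereas Excel's SUMIFS accepts several criteria pairs and comparison operators.

```javascript
// Hypothetical sheet data, stored column by column.
const sheet = {
  B: ["east", "west", "east", "west"],
  C: [10, 20, 30, 40],
  D: [1, 2, 3, 4],
};

// Simplified single-criterion SUMIFS:
// add sumRange[i] wherever criteriaRange[i] equals the criterion.
function sumifs(sumRange, criteriaRange, criterion) {
  let total = 0;
  for (let i = 0; i < sumRange.length; i++) {
    if (criteriaRange[i] === criterion) total += sumRange[i];
  }
  return total;
}

// Equivalent of SUMIFS(C,B,x) + SUMIFS(D,B,x) + ... for any list of sum columns.
function sumifsAcrossColumns(sumColumns, criteriaColumn, criterion) {
  return sumColumns
    .map((column) => sumifs(sheet[column], sheet[criteriaColumn], criterion))
    .reduce((total, value) => total + value, 0);
}

console.log(sumifsAcrossColumns(["C", "D"], "B", "east")); // 44 = (10 + 30) + (1 + 3)
```

In a tree-based engine, each per-column SUMIFS would be one child result that the parent node adds up, which is what the `map` and `reduce` steps mimic here.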

Comparison results

Compared with the stack method, the tree solution is easier to extend and enhance, and it copes more easily with complex formulas. This is a distinct advantage when dealing with a large number of formulas and complex calculations.

Summary

Having covered the whole process of analyzing and calculating a formula, we will go on to introduce the calculation chain and asynchronous functions in the formula calculation engine: how to resolve the directed graph when dealing with complex formulas, what the calcOnDemand solution is, and some advanced uses of asynchronous functions in front-end and back-end calculation.

If you found this useful, leave a like before you go. More interesting content is on the way.

葡萄城技术团队 (GrapeCity technical team)
