Demystify the "low-level logic" of your data processing, and explain the calculation of the formula engine in detail (2)

In the previous article, we introduced the calculation principle of the calculation formula engine. In this issue, we will continue to take everyone to understand the realization principle of the formula engine in the Excel table.

background

After solving the logic of basic operations in the previous section, in some actual business scenarios, formula calculations are not independent operations performed by a single formula. We often need to decompose a large operation into small operations that are dependent on the front and back; at the same time, the calculations between these cells will have a lot of interdependence. The order of calculation is also a key issue to consider. We need to divide a series of sequential Similar operations are managed and executed in sequence.

In order to realize the management of this kind of calculation relationship, a calculation chain has emerged to manage the dependence and sequence of formulas, and to deal with the intricate dependence in the electronic form. It involves the processing of the graph, the calculation of the dirty value and so on. Next, we will start from the graph calculation, introduce the calculation of different graphs, on-demand calculation and dirty value processing, and have a deeper understanding of the calculation chain related issues in Excel table calculation.

Calculation chain

Let us start with two table calculation problems.

The first simple case:

In this series of calculation formulas, when C1 is assigned a value of 1, the result of A1 is 3. But if you modify the value of C1 at this time, C1=10, the content of B1 has not been modified yet, and A1 is still 3, and then B1=10+1=11, you will find that the content of A1 is calculated incorrectly.

The second, more complicated situation:

We use the nodes of the graph to represent the calculated content of the cell, and the arrows represent the dependencies. The node with zero in-degree does not depend on the content of other nodes at all, so the order of calculation should start from the node content that does not depend on other nodes. Cell A depends on F and E, D depends on C and B, and C and B depend on F and E respectively. The only content that does not depend on other nodes is E and F.

In this way, a stable and correct calculation sequence is obtained, which is called a calculation chain. In this example, the calculation chain is: F, C, E, A, D, B.

Calculation of Directed Acyclic Graph and Directed Directed Graph

In a graph, if you start from a node and finally return to this node, we call it a directed acyclic graph, otherwise it is called a directed acyclic graph.

No matter if you start from any node in the left figure, you cannot return to that node, so the left figure is a directed acyclic graph.

Point A in the right picture can go back to A node after departure, so the right picture is a directed cyclic graph.

After we disassemble the relationship between computing nodes and computing nodes into directed graphs, we can use the standard method of computing directed graphs to get a reliable calculation chain.

Calculation of directed acyclic graph

For each node there is the concept of in-degree and initial degree, in-degree: how many arrows point to the current node, for example, for node A, the in-degree is 1; out-degree: how many arrows in the current arrow point out, for example, for node B, the in-degree is 2 .

When calculating the directed acyclic graph, the in-degree in the graph is used as a priority ordering, and the nodes with an in-degree of 0 are calculated each time, and then removed.

Take the above picture as an example, the complete calculation process is as follows:

1. Initialization, statistical entry: A:1 B:2 C:1 D:2 E:0 F:0

2. Calculate the E node and update the in-degree: A:1 B:2 C:0 D:2 F:0

3. Calculate the F node and update the in-degree: A:1 B:2 C:0 D:1

4. Calculate the C node and update the in-degree: A:1 B:1 D:0

5. Calculate the D node and update the in-degree: A:1 B:0

6. Calculate node B and update the in-degree: A:0

7. Calculate the A node, and the calculation of the directed cyclic graph is finished

Calculation of directed ring graph

The calculation problem of the directed acyclic graph is solved. If the arrow between BC in the above figure is reversed at this time, the situation will become completely different. At this time, a ring is formed between the BCDs.

At this time, the calculation is still performed:

1. Initialization, statistical input: A:1 B:1 C:2 D:2 E:0 F:0

2. Calculate the E node and update the in-degree: A:1 B:1 C:1 D:2 F:0

3. Calculate the F node and update the in-degree: A:1 B:1 C:1 D:1

4. There is no node with an in-degree of 0, start iterative calculation

(Iterative calculation: Substitute the calculation result of the previous step into the calculation of this step. After multiple steps of this calculation, a closer result can be obtained.)

On-demand calculation

When solving this kind of interdependent and complex cell calculations, in addition to graph calculations, you can also use on-demand calculations. The function calcOnDemand is used here. The core working principle of this function is: push the stack and calculate the required node content, the node can be any content.

Use the content of the figure to illustrate the calculation process: here we choose A as the node we need

Push the stack and calculate A, need to calculate the result of B
Push the stack and calculate B, need to calculate the result of C
Push the stack and calculate C, need to calculate the result of E
Push the stack and calculate E, pop out of the stack after E is calculated
Calculate C, pop out of the stack after C is calculated
Calculate B, also need to calculate the result of D
Push the stack and calculate D, get the result of C, and need the result of F
Push the stack and calculate F, pop out of the stack after F is calculated
Calculate D, pop out of the stack after D is calculated
Calculate B, pop out of the stack after B is calculated
Calculate A, pop out of the stack after A is calculated
The stack is empty, the calculation is complete

This example takes the most complex A as the demand node. If D node is needed, it becomes D node into the stack, C node into the stack, E node into the stack, out of the stack after calculation, and out of the stack after calculation, and F node into the stack After the calculation, the stack is popped, and the D node is pushed into the stack, so that the correct value of D is obtained. This calculation method only calculates the required content.

Graph computing VS on-demand computing

Here is a distinction between graph calculation and on-demand calculation. These two algorithms have different efficiency in different situations.

The figure on the left is an accumulation from top to bottom. The calculation using on-demand calculation is very smooth. It can be calculated in order from top to bottom. There are no more than two elements to be calculated in the stack, and the calculation of the graph is calculated on demand. Faster.

The figure on the right is a reverse calculation. The content of the cells in the previous row depends on the content of the cells in the next row. The on-demand calculation requires 1000 steps. At this time, the on-demand calculation will be much slower than the graph calculation.

In general, graph calculation is more stable than on-demand calculation, and on-demand calculation has different performance in different situations. In actual use, we can adopt different calculation strategies according to specific usage scenarios.

Dirty value calculation

In this figure, if the value of a node is modified at this time, at this time, it is necessary to mark all nodes that need to be recalculated according to the propagation path.

Example: Modify the content of E

Mark ECBDA in turn according to the spread
Delete unmarked nodes
Start calculating the remaining nodes

After the marking is completed, we no longer need to pay attention to the unmarked nodes and the calculation is complete.

The whole process is shown in the figure below:

Expansion: Several questions about computing content

The entire content of the computing chain has been introduced. In order to help everyone better understand, here are a few questions to think about:

Can the graph calculation in the example be changed to on-demand calculation?
Can I calculate the value of E first, then mark and recalculate CBDA?
What are the advantages of recalculating nodes based on the transmission route marking?

Answer part:

The graph can be calculated on demand, because both on-demand calculation and graph calculation can correctly calculate the content of the graph.
In this figure, you cannot calculate E first, and then re-label CBDA, because once we calculate E first, then C still depends on the content of E, and C cannot be calculated before E. At this time, the value of ABCD node will become different. Reliable, this calculation reference chain will be completely destroyed.
In the processing of dirty data, only the nodes on the propagation path are processed. In actual application scenarios, the processing of hundreds of cell data can greatly reduce the content of calculations.

Summarize

All the cell calculation contents are all introduced for everyone. Let's review the contents of this chapter together: The calculation chain is to connect the calculation of each cell in series, divided into ordinary calculation and iterative calculation. Here we have introduced two different calculation methods-graph calculation and on-demand calculation. In the face of different situations, you can choose to adopt different calculation strategies. The calculation chain is not as obvious as the cell calculation in the whole calculation process, but it is more complicated than the cell calculation.

After understanding how the calculation formula performs lexical and grammatical analysis to quickly calculate the formula, and how the calculation chain handles the processing of large amounts of data in multiple cells, I will continue to introduce you to the fancy usage of asynchronous functions in front and back calculations.

See here and give me a thumbs up~ The follow-up version of Grape will also bring you more serious or interesting content.

Please indicate the source for reprinting: Grape City official website , Grape City provides developers with professional development tools, solutions and services, and empowers developers.

Demystify the "low-level logic" of your data processing, and explain the calculation of the formula engine in detail (2)

background

Calculation chain

Calculation of Directed Acyclic Graph and Directed Directed Graph

Calculation of directed acyclic graph

Calculation of directed ring graph

On-demand calculation

Graph computing VS on-demand computing

Dirty value calculation

Expansion: Several questions about computing content

Summarize

葡萄城技术团队

引用和评论

AI对低代码技术的影响

你可能不知道的图片加载相关知识

手写一个动态海洋和天空效果的vue hooks

使用CSS给标题添加书名号并超出省略

原生electron起步-从零到一完成构建和打包

Koa+Typescript起手式(空环境) 不用每次玩node都要搭环境了！

LRU算法，你别跑，我就要吃透你