Can interval algorithm problems be solved in seconds with a segment tree?

Background

Given two arrays, A = [1,2,3,4] and B = [5,6,7,8], find the following for the large array formed by merging them:

Maximum
Minimum
Sum

Isn't this simple? We can solve it directly in $O(m+n)$ time, where m and n are the sizes of arrays A and B, respectively.

What if I can modify some of the values in A and B, and I ask for the maximum, minimum, and sum many times?

The naive idea is to modify the array in place and then recompute in $O(m+n)$ time. Obviously, this does not reuse the previously computed results, so it is not efficient. Is there a more efficient way?
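As a baseline, the naive approach might look like the sketch below. The helper names `naive_stats` and `naive_update` are made up for illustration; the point is that every query rescans both arrays.

```python
from typing import List, Tuple


def naive_stats(a: List[int], b: List[int]) -> Tuple[int, int, int]:
    """Return (maximum, minimum, sum) of the merged array in O(m + n)."""
    merged = a + b
    return max(merged), min(merged), sum(merged)


def naive_update(arr: List[int], index: int, value: int) -> None:
    """Modify one element in place; nothing is cached, so later queries rescan."""
    arr[index] = value


A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
print(naive_stats(A, B))  # (8, 1, 36)
naive_update(A, 1, 10)
print(naive_stats(A, B))  # (10, 1, 44)
```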

Yes! A segment tree can solve this.

What is a segment tree?

A segment tree is essentially a tree. More precisely, it is a binary tree, and a balanced one at that. We will come back to why it is balanced later; for now, just keep this in mind.

Although it is a binary tree, we usually simulate the tree structure of a segment tree with an array instead of the traditional TreeNode definition.

On the one hand, this is easy to implement. On the other hand, a segment tree is very close to a complete binary tree, so simulating it directly with an array is very efficient. I explained why when covering the binary heap implementation in my earlier heap article, which you can get by replying "heap" to my official account.

What problem does it solve?

As its name suggests, a segment tree is about line segments (intervals). Each node of a segment tree stores information about one interval (segment). When that information satisfies certain properties, a segment tree can be used to improve performance.

So:

What are these properties?
How does it improve performance?

What are these properties?

The maximum, minimum, and sum mentioned earlier all satisfy these properties. That is, a statistic of the union of several (here, two) subsets can be derived from the same statistic of each subset.

In the example above, we can regard array A and array B as two sets. If the maximum of set A and the maximum of set B are known, we can obtain the maximum of their union directly as max(Amax, Bmax), where Amax and Bmax are the maximum values of set A and set B, respectively. The minimum and the sum work the same way, so I won't repeat them. Therefore, if a statistic satisfies this property, we can use a segment tree. Whether we should use one depends on whether it actually improves performance.
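The property can be illustrated with a small sketch. The names `Stats`, `stats_of`, and `merge` are hypothetical; the idea is only that the statistics of the union are computed from the two summaries, without looking at the elements again.

```python
from typing import List, NamedTuple


class Stats(NamedTuple):
    maximum: int
    minimum: int
    total: int


def stats_of(values: List[int]) -> Stats:
    """Summarise one set by scanning its elements once."""
    return Stats(max(values), min(values), sum(values))


def merge(left: Stats, right: Stats) -> Stats:
    """Derive the summary of the union from the two summaries alone."""
    return Stats(
        max(left.maximum, right.maximum),
        min(left.minimum, right.minimum),
        left.total + right.total,
    )


A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
merged = merge(stats_of(A), stats_of(B))
print(merged)  # Stats(maximum=8, minimum=1, total=36)
```

A statistic such as the median would not fit this pattern, because the medians of two subsets alone do not determine the median of their union.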

How does it improve performance?

I will come back to the performance improvement after we have covered the implementation.

Implementation

Take the sum from the beginning of the article as an example.

We can regard interval A and interval B as the left and right children of a tree's root, and store the interval sum of A and the interval sum of B in the left and right child nodes, respectively.

Next, divide the interval of A into left and right halves, and divide B in the same way. Continue this process until the intervals can no longer be divided.

To sum up, each interval is repeatedly split in two, and the interval information is stored in the left and right child nodes. For a sum, the interval information is the sum of the interval. The segment tree at this point looks like this:

The blue font indicates the interval sum.

Note that this tree has n leaf nodes in total (n is the length of the original array), and each leaf corresponds to one value of the original array.
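The splitting process can be sketched in a few lines. The helper `show` is hypothetical and exists only for display: it recomputes every sum with `sum()`, which a real segment tree avoids, and it uses 0-based indices over the merged array.

```python
from typing import List


def show(data: List[int], l: int, r: int, depth: int = 0) -> List[str]:
    """Return one line per node: its interval [l, r] and the interval sum."""
    lines = [f"{'  ' * depth}[{l},{r}] sum={sum(data[l:r + 1])}"]
    if l < r:
        mid = (l + r) // 2  # the left half ends at mid, the right half starts at mid + 1
        lines += show(data, l, mid, depth + 1)
        lines += show(data, mid + 1, r, depth + 1)
    return lines


data = [1, 2, 3, 4, 5, 6, 7, 8]
lines = show(data, 0, len(data) - 1)
print("\n".join(lines))
```

The first line printed is the root, `[0,7] sum=36`, followed by its two children `[0,3] sum=10` and `[4,7] sum=26` one level deeper, down to the eight single-element leaves.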

This is also easy to express in code: a post-order traversal solves the problem directly, because we need the statistics of the left and right children before we can compute the statistic of the current node.

If you are not familiar with post-order traversal, see my earlier tree article, which you can get by replying "tree" to my official account.

As with the binary heap, we can use an array to represent the tree, with 2 * i + 1 and 2 * i + 2 as the indices of the left and right children, where i is the index of the current node in the tree.

tree is the array used to build the segment tree, similar to a binary heap. The only difference is that tree[i] now stores interval information.
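The index arithmetic can be written out as tiny helpers. These function names are illustrative only, and `parent` is an addition not used elsewhere in the article; it is simply the inverse of the two child formulas with the root at index 0.

```python
def left_child(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i + 1


def right_child(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 2


def parent(i: int) -> int:
    """Index of the parent of node i (only meaningful for i > 0)."""
    return (i - 1) // 2


print(left_child(0), right_child(0))  # 1 2
print(left_child(2), right_child(2))  # 5 6
print(parent(5), parent(6))           # 2 2
```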

As described above, building the tree is clearly recursive, so we can build it recursively. Specifically, define a build(tree_index, l, r) method, where l and r are the left and right endpoints of the current interval (together they uniquely determine an interval), and tree_index marks the position in the tree array where the current interval's information should be written.

Since the interval information is stored in tree, we finally update it with tree[tree_index] = ....

Core code:

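def build(self, tree_index:int, l:int, r:int):
    '''
    Recursively build the segment tree
    tree_index : position of the segment tree node in the array
    l, r : left and right boundaries of the interval this node represents
    '''
    if l == r:
        self.tree[tree_index] = self.data[l]
        return
    mid = (l+r) // 2 # midpoint: the left child's interval ends at mid, the right child's starts at mid+1
    left, right = 2 * tree_index + 1, 2 * tree_index + 2 # indices of the left and right children of tree_index
    self.build(left, l, mid)
    self.build(right, mid+1, r)
    # typical post-order traversal
    # for an interval sum, addition is enough; change the line below for other statistics
    self.tree[tree_index] = self.tree[left] + self.tree[right]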

The array self.tree in the code above stores the interval sums shown in blue in the figure: each interval has its own position in tree, and its interval sum is stored there.

Complexity analysis

Time complexity: from the recurrence T(n) = 2*T(n/2) + 1, the time complexity is $O(n)$

Don't know how to get $O(n)$? You can read the first chapter of my "The Road to Algorithm Clearance". https://leetcode-solution.cn/book-intro

Space complexity: the size of tree is of the same order as n, so the space complexity is $O(n)$
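Both bounds can be checked empirically. The sketch below uses a hypothetical helper, `count_nodes`, that walks the same recursion as build without storing anything, and reports how many nodes exist and the largest array index used. Every internal node has exactly two children, so n leaves imply 2n - 1 nodes in total.

```python
from typing import Tuple


def count_nodes(tree_index: int, l: int, r: int) -> Tuple[int, int]:
    """Return (number of nodes, largest array index) for the subtree covering [l, r]."""
    if l == r:
        return 1, tree_index
    mid = (l + r) // 2
    left_count, left_max = count_nodes(2 * tree_index + 1, l, mid)
    right_count, right_max = count_nodes(2 * tree_index + 2, mid + 1, r)
    return 1 + left_count + right_count, max(left_max, right_max)


for n in (1, 4, 6, 8):
    nodes, max_index = count_nodes(0, 0, n - 1)
    print(n, nodes, max_index)
# 1 1 0
# 4 7 6
# 6 11 12
# 8 15 14
```

For n = 6 the largest index (12) is greater than the node count (11): the array has unused slots, which is why implementations commonly allocate 4n entries rather than 2n.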

The tree is finally built, but so far it has gained us nothing in efficiency. What we need is to handle intervals efficiently when updates and queries are frequent.

So, based on this segment tree, how do we update and query interval information?

Interval query

Let's first answer the simpler question: how does an interval query work?

To query the information of an interval, we can again use a post-order traversal. For example, suppose I want the interval sum of [l,r].

Then, if the interval of the current left child:

Falls completely within [l,r]. For example, [2,3] falls completely within [1,4]. We take the left child's interval sum directly from the tree and set it aside; call it lsum.
Falls partly within [l,r]. For example, [1,3] falls partly within [2,4]. In this case we keep recursing until an interval falls completely within [l,r] (the case above), and then take that node's interval sum directly from the tree.
The answer is the sum of all the values taken out this way.

The right child is handled the same way, so I won't repeat it.
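To see which nodes a query actually uses, here is a minimal sketch. The function `cover` is hypothetical: instead of summing, it returns the node intervals whose stored sums would be added together. It assumes 0-based indices and a query that lies within [l, r].

```python
from typing import List, Tuple


def cover(l: int, r: int, ql: int, qr: int) -> List[Tuple[int, int]]:
    """Return the node intervals whose stored sums are added for query [ql, qr]."""
    if ql <= l and r <= qr:
        # the node's interval falls completely within the query: use it as is
        return [(l, r)]
    mid = (l + r) // 2
    pieces: List[Tuple[int, int]] = []
    if ql <= mid:
        # part of the query lies in the left child
        pieces += cover(l, mid, ql, qr)
    if qr > mid:
        # part of the query lies in the right child
        pieces += cover(mid + 1, r, ql, qr)
    return pieces


data = [1, 2, 3, 4, 5, 6, 7, 8]
pieces = cover(0, len(data) - 1, 1, 6)
print(pieces)  # [(1, 1), (2, 3), (4, 5), (6, 6)]
print(sum(sum(data[a:b + 1]) for a, b in pieces))  # 27
```

Only four stored values are combined for a query spanning six elements; the rest of the tree is never touched.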

Complexity analysis

Time complexity: a query does not need to descend into both child nodes at every step. In fact, the number of nodes processed is roughly proportional to the height of the tree. The tree is balanced, so the complexity is $O(logn)$

Or, from the recurrence T(n) = T(n/2) + 1, the time complexity is $O(logn)$

Don't know how to get $O(logn)$? You can read the first chapter of my "The Road to Algorithm Clearance". https://leetcode-solution.cn/book-intro

This complexity is easier to understand together with the code later in the article.
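The claim can also be checked by brute force on a small tree. The sketch below counts the nodes a query visits, using a hypothetical helper `visited`, and compares the worst case over all queries against a multiple of the tree height. The factor of 4 is an assumption based on the usual argument that a query touches at most four nodes per level.

```python
import math


def visited(l: int, r: int, ql: int, qr: int) -> int:
    """Count the nodes visited when querying [ql, qr] on the node covering [l, r]."""
    if ql <= l and r <= qr:
        return 1
    mid = (l + r) // 2
    count = 1
    if ql <= mid:
        count += visited(l, mid, ql, qr)
    if qr > mid:
        count += visited(mid + 1, r, ql, qr)
    return count


n = 128
height = math.ceil(math.log2(n))
# try every possible query interval and keep the most expensive one
worst = max(visited(0, n - 1, ql, qr) for ql in range(n) for qr in range(ql, n))
print(worst, 4 * (height + 1))
```

The worst case stays within a small constant times the height, far below the n elements a linear scan would touch.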

Interval modification

So what if I change A[1] to 1?

If the tree is not modified, any query whose interval contains A[1] will obviously be wrong; for example, the sum of the interval [1,3] will come out wrong. Therefore, we have to modify the tree whenever A[1] is modified.

The question is which values in the tree to modify, and by how much.

First, which values in the tree should be modified?

We know that the leaf nodes of the segment tree hold the values of the original array; that is, the n leaf nodes correspond to the n elements of the original array. So we first need to find the leaf node corresponding to the modified position and change its value.

Is that all?

Not yet. In fact, every ancestor (parent, grandparent, and so on, if any) of the modified leaf node also needs to change. That is, we need to bubble up from this leaf node to the root and update the interval information along the way.

This process is similar to event bubbling in the browser's event model.

Next, the second question: what exactly is the modification?

For a sum, we first change the leaf node to the new value, and then add delta to the sum stored in every node on the path from the leaf to the root, where delta is the difference between the new and old values.
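A minimal sketch of the delta idea for sums, assuming the 2i+1 / 2i+2 array layout described earlier. The function `add_delta` is hypothetical, and it walks from the root down to the leaf rather than bubbling up; the set of nodes touched is the same path either way. The tree array is written out by hand for data = [1, 2, 3, 4].

```python
from typing import List


def add_delta(tree: List[int], tree_index: int, l: int, r: int,
              index: int, delta: int) -> None:
    """Add delta to every node on the path from the root to the leaf for `index`."""
    tree[tree_index] += delta
    if l == r:
        return
    mid = (l + r) // 2
    if index <= mid:
        add_delta(tree, 2 * tree_index + 1, l, mid, index, delta)
    else:
        add_delta(tree, 2 * tree_index + 2, mid + 1, r, index, delta)


data = [1, 2, 3, 4]
# nodes in array order: [0,3]=10, [0,1]=3, [2,3]=7, then the four leaves
tree = [10, 3, 7, 1, 2, 3, 4]

new_value = 1
delta = new_value - data[1]  # difference between the new and old values
data[1] = new_value
add_delta(tree, 0, 0, len(data) - 1, 1, delta)
print(tree)  # [9, 2, 7, 1, 1, 3, 4]
```

Only three of the seven nodes change: the root, the node covering [0,1], and the leaf itself.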

How should the maximum and minimum be updated? Think about it yourself.

With the questions of which nodes to modify and by how much both answered, the code is easy to write.

Complexity analysis

Time complexity: a modification does not need to descend into both child nodes at every step. In fact, the number of nodes processed is roughly the height of the tree. The tree is balanced, so the complexity is $O(logn)$

Or, from the recurrence T(n) = T(n/2) + 1, the time complexity is $O(logn)$

This complexity is easier to understand together with the code below.

Code

The segment tree code is also included in my problem-solving plug-in, which you can get by replying "plug-in" to the official account "Like Plus".


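from typing import List


class SegmentTree:
    def __init__(self, data:List[int]):
        '''
        data: the input array
        '''
        self.data = data
        self.n = len(data)
        # allocate 4 times the length of data to store the segment tree nodes
        self.tree = [None] * (4 * self.n) # the left child of index i is 2i+1, the right child is 2i+2
        if self.n:
            self.build(0, 0, self.n-1)

    # essentially a bottom-up update process,
    # so we can use post-order traversal, i.e. update the parent node when the recursive call returns
    def update(self, tree_index, l, r, index):
        '''
        tree_index: index of the root of the current subtree
        l, r : left and right boundaries of the interval this node represents
        index : index of the value to update
        '''
        if l == r == index:
            self.tree[tree_index] = self.data[index]
            return
        mid = (l+r)//2
        left, right = 2 * tree_index + 1, 2 * tree_index + 2
        if index > mid:
            # the position to update is in the right subtree
            self.update(right, mid+1, r, index)
        else:
            # the position to update is in the left subtree (index <= mid)
            self.update(left, l, mid, index)
        # recompute this node from its two children once the recursive call has returned
        # for an interval sum, addition is enough; change the line below for other statistics
        self.tree[tree_index] = self.tree[left] + self.tree[right]

    def updateSum(self,index:int,value:int):
        self.data[index] = value
        self.update(0, 0, self.n-1, index)

    def query(self, tree_index:int, l:int, r:int, ql:int, qr:int) -> int:
        '''
        Recursively query the value of the interval [ql,..,qr]
        tree_index : index of the root of the current subtree
        l, r : left and right boundaries of the interval this node represents
        ql, qr: left and right boundaries of the interval to query
        '''
        if l == ql and r == qr:
            return self.tree[tree_index]

        # midpoint: the left child's interval ends at mid, the right child's starts at mid+1
        mid = (l+r) // 2
        left, right = tree_index * 2 + 1, tree_index * 2 + 2
        if qr <= mid:
            # the query interval lies entirely in the left subtree
            return self.query(left, l, mid, ql, qr)
        elif ql > mid:
            # the query interval lies entirely in the right subtree
            return self.query(right, mid+1, r, ql, qr)

        # the query interval lies partly in the left subtree and partly in the right subtree
        # for an interval sum, addition is enough; change the line below for other statistics
        return self.query(left, l, mid, ql, mid) + self.query(right, mid+1, r, mid+1, qr)

    def querySum(self, ql:int, qr:int) -> int:
        '''
        Return the sum of the interval [ql,..,qr]
        '''
        return self.query(0, 0, self.n-1, ql, qr)

    def build(self, tree_index:int, l:int, r:int):
        '''
        Recursively build the segment tree
        tree_index : position of the segment tree node in the array
        l, r : left and right boundaries of the interval this node represents
        '''
        if l == r:
            self.tree[tree_index] = self.data[l]
            return
        mid = (l+r) // 2 # midpoint: the left child's interval ends at mid, the right child's starts at mid+1
        left, right = 2 * tree_index + 1, 2 * tree_index + 2 # indices of the left and right children of tree_index
        self.build(left, l, mid)
        self.build(right, mid+1, r)
        # for an interval sum, addition is enough; change the line below for other statistics
        self.tree[tree_index] = self.tree[left] + self.tree[right]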
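
The comments in the class say that only the merging line has to change for statistics other than the sum. One way to make that concrete is to pass the merge function in as a parameter. The class below, `GenericSegmentTree`, is a hypothetical, self-contained variant written for illustration and is not the article's class; it copies the input array and checks for full containment in `query` instead of clipping the query range.

```python
from typing import Callable, List, Optional


class GenericSegmentTree:
    """Illustrative segment tree with a pluggable merge function (sum, max, min, ...)."""

    def __init__(self, data: List[int], merge: Callable[[int, int], int]):
        self.data = list(data)  # private copy so several trees can share one input
        self.n = len(data)
        self.merge = merge
        self.tree: List[Optional[int]] = [None] * (4 * self.n)
        if self.n:
            self._build(0, 0, self.n - 1)

    def _build(self, i: int, l: int, r: int) -> None:
        if l == r:
            self.tree[i] = self.data[l]
            return
        mid = (l + r) // 2
        self._build(2 * i + 1, l, mid)
        self._build(2 * i + 2, mid + 1, r)
        self.tree[i] = self.merge(self.tree[2 * i + 1], self.tree[2 * i + 2])

    def update(self, index: int, value: int, i: int = 0, l: int = 0,
               r: Optional[int] = None) -> None:
        """Set data[index] = value and recompute every node on the path to the root."""
        if r is None:
            r = self.n - 1
        if l == r:
            self.data[index] = value
            self.tree[i] = value
            return
        mid = (l + r) // 2
        if index <= mid:
            self.update(index, value, 2 * i + 1, l, mid)
        else:
            self.update(index, value, 2 * i + 2, mid + 1, r)
        self.tree[i] = self.merge(self.tree[2 * i + 1], self.tree[2 * i + 2])

    def query(self, ql: int, qr: int, i: int = 0, l: int = 0,
              r: Optional[int] = None) -> int:
        """Return the merged statistic over [ql, qr] (0-based, inclusive)."""
        if r is None:
            r = self.n - 1
        if ql <= l and r <= qr:
            return self.tree[i]
        mid = (l + r) // 2
        if qr <= mid:
            return self.query(ql, qr, 2 * i + 1, l, mid)
        if ql > mid:
            return self.query(ql, qr, 2 * i + 2, mid + 1, r)
        return self.merge(self.query(ql, qr, 2 * i + 1, l, mid),
                          self.query(ql, qr, 2 * i + 2, mid + 1, r))


data = [1, 2, 3, 4, 5, 6, 7, 8]
sum_tree = GenericSegmentTree(data, lambda a, b: a + b)
max_tree = GenericSegmentTree(data, max)
min_tree = GenericSegmentTree(data, min)
print(sum_tree.query(1, 3), max_tree.query(1, 3), min_tree.query(1, 3))  # 9 4 2
max_tree.update(1, 10)
print(max_tree.query(0, 3))  # 10
```

Note that for the maximum and minimum, the update recomputes each ancestor from its two children; adding a delta along the path only works for statistics like the sum.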

Answering the earlier questions

Why is it a balanced binary tree?

The time complexities above all rest on the premise that the tree is a balanced binary tree. So is a segment tree necessarily balanced? Yes. This is because every node splits its interval in half, so the segment tree is nearly a complete binary tree, and such a tree is balanced.

Of course, there is another premise: the total number of nodes in the tree is of the same order as the length of the original array, that is, of order n. Why this holds is easy to work out; it follows directly from the recurrence.

Why can a segment tree improve performance?

Once you understand the implementation section, this is not hard to see. With a segment tree, both modification and query take $O(logn)$ time, whereas a brute-force query without a segment tree takes as much as $O(n)$. The price is the $O(n)$ extra space for the tree, so the segment tree is also a typical space-for-time trade-off.

Finally, back to the title question. Can interval algorithm problems be solved in seconds with a segment tree? The article has already answered this; it depends on whether two conditions are met:

Whether a statistic of the union of several (here, two) subsets can be derived from the same statistic of each subset.
Whether it improves performance (compared with the naive brute-force solution). Generally, when modifications are frequent, you can consider using a segment tree to reduce the time spent on modifications and queries.
