Hello everyone, my name is lucifer. Today I will share with you an algorithm used in Git.

This is the second article in this series, "Nearest Common Ancestor in Git". The first article is here.


git merge-base

git merge-base A B finds the most recent common ancestor of commits A and B. And since branches and tags in Git are just names that point to commits, A and B can also be branches or tags.

 
         o---o---o---B
        /
---o---1---o---o---o---A

For the commit history shown above, git merge-base A B will return the hash of commit 1.

For more details about the usage of merge-base, please refer to the official documentation.

How to find common ancestors?

We know that every time we commit, Git actually creates a new commit object, which records some metadata. For example:

  • Committer
  • Commit time
  • Hash
  • Reference to the previous (parent) commit
  • . . .

The reference to the previous commit makes Git commits form a linked list. And because Git supports creating branches and developing on them, the commit history is essentially a directed acyclic graph.
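As a rough picture of this structure, a commit can be modeled as a node that holds references to its parents. This is only a minimal sketch, not Git's actual object format: the `CommitNode` class, its field names, and the example committer names are all hypothetical.

```java
import java.util.List;

// Hypothetical, simplified model of a commit; NOT Git's real object format.
final class CommitNode {
    final String hash;               // identifies the commit
    final String committer;          // who made the commit
    final long commitTime;           // when it was made (epoch seconds)
    final List<CommitNode> parents;  // empty for the root, two entries for a merge commit

    CommitNode(String hash, String committer, long commitTime, List<CommitNode> parents) {
        this.hash = hash;
        this.committer = committer;
        this.commitTime = commitTime;
        this.parents = List.copyOf(parents);
    }
}

final class CommitNodeDemo {
    // Builds 1 <- 2 <- 3 on one branch, 2 <- 4 on another, then a merge of 3 and 4.
    static CommitNode buildExample() {
        CommitNode c1 = new CommitNode("1", "alice", 1L, List.of());
        CommitNode c2 = new CommitNode("2", "alice", 2L, List.of(c1));
        CommitNode c3 = new CommitNode("3", "alice", 3L, List.of(c2));
        CommitNode c4 = new CommitNode("4", "bob", 4L, List.of(c2));
        // A merge commit has two parents, which is what turns the list into a graph.
        return new CommitNode("m", "alice", 5L, List.of(c3, c4));
    }

    public static void main(String[] args) {
        CommitNode merge = buildExample();
        System.out.println("merge commit has " + merge.parents.size() + " parents");
    }
}
```

With a single parent per commit, the history is a linked list. As soon as one commit has two children (a branch) or two parents (a merge), it becomes a directed acyclic graph.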

Suppose we create a new branch dev based on commit 2. After developing on dev, we can merge it back into the main branch, master.

When we perform a merge operation, git will first use the merge-base algorithm to calculate the nearest common ancestor.

If the nearest common ancestor is the tip of the branch being merged into, a fast-forward can be performed. For example, if master has no new commits since dev was created, merging dev into master can fast-forward, as if the dev branch had never been created.
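To make the fast-forward condition concrete, here is a minimal sketch. It assumes a simplified history in which every commit has at most one parent, stored in a hypothetical `parentOf` map. `FastForwardSketch` and `canFastForward` are illustrative names, not part of Git.

```java
import java.util.HashMap;
import java.util.Map;

final class FastForwardSketch {
    // parentOf maps a commit id to its single parent id (an assumption of this
    // sketch: no merge commits). The root commit has no entry.
    static boolean canFastForward(Map<String, String> parentOf, String targetTip, String sourceTip) {
        // A fast-forward is possible when the target tip is itself an ancestor of
        // the source tip, i.e. the nearest common ancestor of both tips is the target tip.
        for (String c = sourceTip; c != null; c = parentOf.get(c)) {
            if (c.equals(targetTip)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("2", "1");
        parentOf.put("3", "2"); // dev: 1 <- 2 <- 3, while master still points at 2
        System.out.println(canFastForward(parentOf, "2", "3")); // true

        parentOf.put("4", "2"); // master moved on: 1 <- 2 <- 4
        System.out.println(canFastForward(parentOf, "4", "3")); // false
    }
}
```

In the second case the nearest common ancestor is commit 2, which is not the tip of master, so a real merge is needed.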

Finally, a more complex example. Suppose we run a merge between commit 3 and commit 6. What will happen?

The answer is that commit 2 will be found. So how do you find 2?

Starting from commit 6, keep following parent references until you reach commit 1, putting every commit visited along the way into a hash table. Next, starting from commit 3, follow parent references in the same way. As soon as a commit is found in the hash table, we have found a common ancestor. Since it is the first common ancestor encountered, it is the nearest common ancestor, and we return it directly. That's it.
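The walk just described can be sketched in a few lines. This is a simplified illustration, not Git's implementation: it assumes each commit has a single parent, and the history used in `main` (1 <- 2 <- 3 and 2 <- 4 <- 5 <- 6) is an assumed shape chosen so that commits 3 and 6 meet at commit 2. `MergeBaseSketch` and `parentOf` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MergeBaseSketch {
    // parentOf maps a commit id to its single parent id; the root has no entry.
    static String mergeBase(Map<String, String> parentOf, String a, String b) {
        // Step 1: walk from a to the root, recording every commit seen.
        Set<String> seen = new HashSet<>();
        for (String c = a; c != null; c = parentOf.get(c)) {
            seen.add(c);
        }
        // Step 2: walk from b; the first commit already seen is the nearest common ancestor.
        for (String c = b; c != null; c = parentOf.get(c)) {
            if (seen.contains(c)) {
                return c;
            }
        }
        return null; // the two commits share no history
    }

    public static void main(String[] args) {
        // Assumed history: 1 <- 2 <- 3 and 2 <- 4 <- 5 <- 6
        Map<String, String> parentOf = Map.of(
                "2", "1",
                "3", "2",
                "4", "2",
                "5", "4",
                "6", "5");
        System.out.println(mergeBase(parentOf, "6", "3")); // 2
    }
}
```

A commit counts as its own ancestor here, so `mergeBase(parentOf, "6", "4")` returns 4.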

LeetCode happens to have a problem on exactly this topic, so let's take a look.

LeetCode Problem

Problem link (236. Lowest Common Ancestor of a Binary Tree)

https://leetcode-cn.com/problems/lowest-common-ancestor-of-a-binary-tree/

Problem description

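Given a binary tree, find the nearest common ancestor of two given nodes in the tree.

Baidu Baike defines the nearest common ancestor as follows: "For two nodes p and q of a rooted tree T, the nearest common ancestor is the node x such that x is an ancestor of both p and q and the depth of x is as large as possible (a node can be its own ancestor)."

Example 1:

Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 1
Output: 3
Explanation: The nearest common ancestor of node 5 and node 1 is node 3.


Example 2:

Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 4
Output: 5
Explanation: The nearest common ancestor of node 5 and node 4 is node 5, because by definition a node can be its own ancestor.


Example 3:

Input: root = [1,2], p = 1, q = 2
Output: 1


Constraints:

The number of nodes in the tree is in the range [2, 10^5].
-10^9 <= Node.val <= 10^9
All Node.val are unique.
p != q
Both p and q exist in the given binary tree.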

Prerequisites

  • binary tree
  • tree traversal
  • hash table

Ideas

This problem gives you a binary tree, and you have to start from the root of the tree.

This is different from Git. In Git, we start from two commit nodes and look for their parents.

Does that mean the method above cannot be applied?

Not at all. We can record the parent-child relationships while traversing the binary tree, and then the problem turns into the previous one.

Code

  • Language Support: Java

Java Code:

 
import java.util.HashMap;
import java.util.HashSet;

class Solution {
    // Maps a node's value to its parent node (values are unique per the constraints)
    HashMap<Integer, TreeNode> map = new HashMap<>();
    HashSet<TreeNode> set = new HashSet<>();

    // Record the parent of every node below root
    public void dfs(TreeNode root) {
        if (root.left != null) {
            map.put(root.left.val, root);
            dfs(root.left);
        }
        if (root.right != null) {
            map.put(root.right.val, root);
            dfs(root.right);
        }
    }

    public TreeNode lowestCommonAncestor(TreeNode root, TreeNode p, TreeNode q) {
        dfs(root);
        // Starting from p, collect p and all of its ancestors into the HashSet
        while (p != null) {
            set.add(p);
            p = map.get(p.val);
        }
        // Starting from q, the first node found in the HashSet is the nearest common ancestor
        while (q != null) {
            if (set.contains(q)) {
                return q;
            }
            q = map.get(q.val);
        }
        return null;
    }
}

Complexity Analysis

Let n be the number of nodes in the tree (for Git, the number of commits traversed).

  • Time complexity: $O(n)$
  • Space complexity: $O(n)$

Optimization

In fact, this algorithm is not very efficient. If our repository has a lot of commits, that is, n is very large, it will be slow.

Is there any room for optimization?

Of course. And there are many angles to optimize from.

For example, one person thought of preprocessing (the link is here). That is, the node information is computed once and stored in a file. Then, when merge-base is executed later, there is no need to traverse nodes that have already been processed. **Theoretically, each node is traversed only once, no matter how many times we run merge-base.**
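The linked approach is not reproduced here. As one hypothetical way to picture the idea, the sketch below caches each commit's ancestor chain in memory (rather than in a file) so that repeated queries do not walk the same chain again. It assumes a single parent per commit, and `CachedMergeBaseSketch`, `parentOf`, and the `walks` counter are names made up for this illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class CachedMergeBaseSketch {
    private final Map<String, String> parentOf; // commit id -> single parent id
    private final Map<String, Set<String>> ancestorCache = new HashMap<>();
    int walks = 0; // how many ancestor chains were actually traversed

    CachedMergeBaseSketch(Map<String, String> parentOf) {
        this.parentOf = parentOf;
    }

    // Returns the commit and its ancestors, nearest first; walks the chain only on a cache miss.
    private Set<String> ancestors(String commit) {
        Set<String> cached = ancestorCache.get(commit);
        if (cached != null) {
            return cached;
        }
        walks++;
        Set<String> chain = new LinkedHashSet<>();
        for (String c = commit; c != null; c = parentOf.get(c)) {
            chain.add(c);
        }
        ancestorCache.put(commit, chain);
        return chain;
    }

    String mergeBase(String a, String b) {
        Set<String> ofA = ancestors(a);
        for (String c : ancestors(b)) { // iteration order is nearest ancestor first
            if (ofA.contains(c)) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CachedMergeBaseSketch repo = new CachedMergeBaseSketch(
                Map.of("2", "1", "3", "2", "4", "2", "5", "4", "6", "5"));
        System.out.println(repo.mergeBase("6", "3") + ", walks = " + repo.walks);
        System.out.println(repo.mergeBase("6", "3") + ", walks = " + repo.walks);
    }
}
```

The second query is answered entirely from the cache. The price is extra memory for the stored chains, and the cached data is only valid while the history stays unchanged.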

Is it really that ideal?

Sadly not. For example, what should happen if I run rebase, reset, or other operations that change the nodes? There are a lot of details here, so I won't expand on them. If you are interested, you can join my discussion group.

Summary

git merge-base is essentially an algorithm for finding the nearest common ancestor.

The simplest approach is to walk from one node and record every node visited in a hash table, then start traversing from the other node. The first node found in the hash table is the nearest common ancestor.

This algorithm also has room for optimization, but once optimized, various edge cases need to be considered, namely the cache invalidation problem.

That's all for this article. If you have any thoughts on this, please leave me a message; I will reply to them one by one when I have time. For more algorithm patterns, you can visit my LeetCode solution repository: https://github.com/azl397985856/leetcode . It has 40K stars now. You can also follow my public account "Likoujiajia", which will help you chew through the tough parts of algorithms.

Follow the public account, where I try to reconstruct problem-solving ideas in clear and straightforward language, with plenty of diagrams, to teach you to recognize patterns and solve problems efficiently.


lucifer