## Background
In the year or so since I joined the company, I have been working on our BU's own front-end release platform alongside my regular business work. Until now, our underlying build layer (CI/CD) mostly relied on the group's capabilities. Sometimes a build and deployment would fail with an error; you then had to open the build log in the group's system, where you would find a message like this:
There are new commits on the master branch, please merge the branches before continuing to deploy
This year we have to connect to a new business system, so we need our own CI/CD layer. Building this check into it took me quite a few detours at first, so I am writing the experience down.
## Why behind-master detection matters
Today, most simple front-end builds and deployments are managed with branches. For example, to deliver a requirement we pull an iteration branch from the master trunk, build and deploy the development and test environments from that branch, and merge the changes back into the trunk around the online release.
This is a very common process, but there are others. A more aggressive one: every time we deploy, a fresh deployment branch is pulled from master, the iteration branch is merged into it, and the result is deployed. That strategy is out of scope today, because a branch built this way can never fall behind master.
Back to the question: why do we need behind-master detection? Because an application is often maintained by several people, with several iterations (A and B) in flight at once. Assume both branches were pulled on 10.10 from the same commit on the trunk. A and B are developed and deployed independently. Iteration A goes online on 10.22, and its code is merged into master when the release completes. Iteration B goes online on 10.24, but its owner does not know that A has already been released. If the deployment system has no behind-master detection at this point, B goes online smoothly from branch B. What are the consequences?
- After B goes online, merging its branch back into the trunk fails (with high probability). If the merge is skipped, later iterations will not contain B's changes, which leads to failures further down the line;
- The features from iteration A disappear from production: an online accident (more serious, 3.25). Who takes the blame? A? B? The platform? In my opinion, the platform does.
You may ask: why not merge into master first, and then build and deploy from master? On the platform side there are two considerations:
- There is a grayscale (canary) stage before going fully online. If we merge into master first and then find a bug during grayscale, or another iteration has to go out first, backing out of grayscale becomes troublesome, and by then the main branch is already contaminated;
- Pulling a fresh branch from master, merging the code into it, and then building and deploying is tedious for developers if done by hand. Having the platform do it has a cost as well, and once the platform has that ability it might as well go one step further and adopt the aggressive approach mentioned earlier.
## Our approach
First, we need to be clear about what we mean by a branch being behind master. Consider the following picture:
Because feat/1.0.0 was merged into master after it was released, feat/1.0.1 now lags behind master. This lag has nothing to do with how many commits feat/1.0.1 itself contains; it depends only on whether master's commits have been synchronized into the branch.
I searched online for ways to compare branches and found that almost every solution is a shell script. One Stack Overflow post asks essentially the same question I had:
Is there a way to do a diff between my branch and master that excludes changes in master that have not been merged into my branch yet?
The top-voted answer suggests `git diff branch...master` and links to the official documentation for an explanation.
The documentation introduces a concept called merge-base, which is the common starting point of two branches. For the two branches above, that point is X.
`git diff master...feat/1.0.1` is equivalent to `git diff $(git merge-base master feat/1.0.1) feat/1.0.1`, which in turn is equivalent to `git diff commitX commitF`.
So when you run `git diff feat/1.0.1...master`, you see:
As the result in the figure above shows, the only differences listed come from the two commits A and B.
In other words, this comparison is equivalent to `git diff X B`.
That's it, we got it!
But our platform is built on the GitLab API, so it cannot run this kind of command line directly. Fortunately, a bit more searching turned up an API that does the same job: `gitlab.Repositories.compare`
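To make the merge-base idea concrete, here is a minimal sketch on an in-memory commit graph. The graph is a toy version of the example above: master runs X → A → B and the feature branch runs X → E → F, where E is a hypothetical intermediate commit added for illustration. The function names are made up for this sketch, and the "first common ancestor found" search is a simplification that is only adequate for simple histories like this one; it is not how git itself computes merge bases.

```typescript
// Toy commit graph: commit id -> parent ids.
type CommitGraph = Record<string, string[]>;

// Collect every commit reachable from `tip`, including `tip` itself.
function ancestors(graph: CommitGraph, tip: string): Set<string> {
  const seen = new Set<string>();
  const queue: string[] = [tip];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    if (seen.has(id)) continue;
    seen.add(id);
    queue.push(...(graph[id] ?? []));
  }
  return seen;
}

// Simplified merge-base: walk back from `b` breadth-first and return the
// first commit that is also reachable from `a`.
function mergeBase(graph: CommitGraph, a: string, b: string): string | undefined {
  const reachableFromA = ancestors(graph, a);
  const seen = new Set<string>();
  const queue: string[] = [b];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    if (reachableFromA.has(id)) return id;
    if (seen.has(id)) continue;
    seen.add(id);
    queue.push(...(graph[id] ?? []));
  }
  return undefined;
}

// Commits that master has and the branch lacks: a non-empty result means
// the branch is behind master.
function commitsBehind(graph: CommitGraph, branchTip: string, masterTip: string): string[] {
  const onBranch = ancestors(graph, branchTip);
  return Array.from(ancestors(graph, masterTip)).filter((id) => !onBranch.has(id));
}

// master: X -> A -> B, feat/1.0.1: X -> E -> F (E is hypothetical).
const exampleGraph: CommitGraph = { X: [], A: ["X"], B: ["A"], E: ["X"], F: ["E"] };
const base = mergeBase(exampleGraph, "B", "F");
const behind = commitsBehind(exampleGraph, "F", "B");
```

Note that the number of commits on the feature side (E, F) plays no role in `commitsBehind`; only the commits that master gained after the merge base count.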
// Repositories.compare(projectId: string | number, from: string, to: string, options?: Sudo)
const info: any = await gitlab.Repositories.compare(projectId, commitId, 'master');
Pay attention to the order of from and to: from is feat/1.0.1 and to is master. This is very important, because the result lists what to contains that from does not.
The info result looks like this:
{
"commit": {},
"commits": [],
"diffs": [],
"compare_timeout": false,
"compare_same_ref": false,
"web_url": "https://gitlab.example.com/thedude/gitlab-foss/-/compare/ae73cb07c9eeaf35924a10f713b364d32b2dd34f...0b4bc9a49b562e85de7cc9e834518ea6828729b9"
}
So if the branch is not behind master, that is, master has no new commits, then commit is null and commits and diffs are empty arrays. Also pay attention to compare_timeout: if the comparison is too large it may time out, in which case compare_timeout is true and the detection result is not valid either.
One more thing to note: the fourth parameter of this API is an options object. When options.straight is true, from and to are compared directly instead of from their merge base, so the diff is not what we expect here. Keep this in mind when calling it.
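The rules above can be sketched as a small pure function over the compare response. The interface only covers the fields shown in the sample result; the names `CompareResult`, `BehindStatus` and `classifyCompare` are hypothetical, and treating a timeout as "unknown" is just one way to express that the detection is invalid in that case.

```typescript
// Only the fields from the sample response that the check needs.
interface CompareResult {
  commit: unknown;
  commits: unknown[];
  diffs: unknown[];
  compare_timeout: boolean;
  compare_same_ref: boolean;
}

type BehindStatus = "up-to-date" | "behind" | "unknown";

// Interpret a compare(from = branch, to = master) response.
function classifyCompare(result: CompareResult): BehindStatus {
  // A timed-out comparison cannot be trusted either way.
  if (result.compare_timeout) return "unknown";
  // Any commit listed here exists on master but not on the branch.
  return result.commits.length > 0 ? "behind" : "up-to-date";
}

// The empty response from the sample above means the branch is up to date.
const sampleStatus = classifyCompare({
  commit: null,
  commits: [],
  diffs: [],
  compare_timeout: false,
  compare_same_ref: false,
});
```

A platform could then allow deployment only for "up-to-date" and ask the developer to merge master for "behind"; how to handle "unknown" (retry, or block to be safe) is a policy decision this sketch leaves open.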
## Our unorthodox first attempt
In reality, things did not go as smoothly as described above. At first, under time pressure, we did not find the compare API. Instead we used a recursive approach: walk back through the branch's commits, looking for one whose commit ID equals the current head of master. The search depth was limited to 8 levels. If no match was found within 8 levels, the branch was judged to be behind master; otherwise it was considered safe. Personally, I think this algorithm is no easier than a medium-difficulty LeetCode problem.
For simple iteration branches this algorithm still works, but for overly complex branch histories it either times out or exceeds the 8 levels. The timeout happens because every commit that does not match requires an API call to fetch the next batch of commits (its parents), and those API calls are very slow:
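// Partial implementation (the enclosing method and the declaration of `has` are omitted)
// Stop when the search is too deep or has already been marked as finished.
if (level > MaxLevel || this.globalFinish) {
  return false;
}
const records = [];
for (let i = 0; i < parentIds.length; i++) {
  const currentId = parentIds[i];
  // Skip commits that have already been visited.
  if (this.lookedIds.has(currentId)) {
    continue;
  }
  this.lookedIds.add(currentId);
  // Master's head commit is in the branch history: the branch is not behind.
  if (currentId === masterId) {
    has = true;
    break;
  }
  // No match: one API call per commit to fetch its parents (the slow part).
  const commitInfo = await gitlab.Commits.show(projectId, currentId);
  // Queue the parents for the next level of the recursion.
  records.push({
    gitlab,
    projectId,
    parentIds: commitInfo.parent_ids,
    masterId,
    level: level + 1,
    recursion: true
  });
}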
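For readers who want to try the idea end to end, below is a self-contained sketch of the same bounded search, written iteratively level by level. It is not the platform's code: `reachesMasterHead` and `GetParents` are hypothetical names, and the parent lookup is injected so that the function can be tried without a GitLab instance. In a real integration the lookup would presumably wrap `gitlab.Commits.show(projectId, id)` and return its `parent_ids`, as in the excerpt above.

```typescript
// Hypothetical lookup: resolves to the parent commit ids of a commit.
type GetParents = (commitId: string) => Promise<string[]>;

// Returns true if master's head commit is found in the branch history
// within `maxLevel` levels of the branch head, false otherwise.
async function reachesMasterHead(
  branchHead: string,
  masterHead: string,
  getParents: GetParents,
  maxLevel = 8,
): Promise<boolean> {
  const visited = new Set<string>();
  let frontier: string[] = [branchHead];
  for (let level = 0; level <= maxLevel && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const id of frontier) {
      if (id === masterHead) return true;
      if (visited.has(id)) continue;
      visited.add(id);
      // One lookup per unmatched commit; skipped on the last level
      // because its result would never be inspected.
      if (level < maxLevel) {
        next.push(...(await getParents(id)));
      }
    }
    frontier = next;
  }
  // Not found within the depth limit: treated as "behind master".
  return false;
}
```

The sketch shares the weakness described above: a branch that does contain master's head, but more than `maxLevel` levels back, is still reported as behind, and every unmatched commit costs one lookup.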
This algorithm ran for a day or two after the platform first launched and did not cause any problems. But once we learned about the compare API, we switched to it without hesitation and tested it online overnight, because the official API is more reliable.
That is all for this post. If you have read this far, I hope it was useful to you.