1. Overview
Today I'm going to walk you through how to start participating in open source projects and help you complete your first PR on GitHub.
Beyond the normal PR merge process, I'll also explain in detail how to handle the trickier situations that come up after a PR is submitted, such as conflicts, appending commits, and squashing commits.
The article is divided into 4 parts:
- Why you should participate in open source projects, and why I'm writing about how to submit a PR
- How to start participating in open source projects, i.e., how to find a suitable project and how to find contribution points
- How to get started with the PR process, i.e., the whole workflow from fork to push
- How to solve the common problems encountered after submitting a PR
OK, let's get started!
2. Why participate in open source projects
In this article, I don't intend to make a general case for open source or list all the benefits of participating. Instead, I want to talk about why to participate in open source projects from one angle: improving your coding ability.
I have a habit during interviews: if a candidate's resume says they are familiar with a certain language, I ask them:
Have you ever read the source code of an open source project? Going further, have you ever participated in an open source community or submitted a PR to an open source project?
If the answer is yes (say, the candidate has read the source code of some Kubernetes modules) and I can confirm that they really read and understood it, or really submitted a bugfix/feature PR, then I stop asking language-level questions. Being able to understand part of the source code of a mature open source project, or to submit a bugfix/feature PR to one, already says everything.
My own path to learning Golang fell roughly into two stages:
- Learning the basic syntax and writing projects until I could comfortably implement all kinds of business features;
- Reading the source code of some open source projects, which taught me a lot and took my coding to a new level.
It was mainly while reading the Kubernetes source code that I deeply realized the huge gap between a typical internal enterprise project and an open source project that brings together the wisdom of the best programmers in the world, and how much such projects matter for improving a programmer's coding level. (Of course, you could point out that Google also has excellent internal code that isn't open source. No doubt, but I don't think we need to discuss special cases today.)
If you read the source code of an open source project carefully, you will always find some small flaws. That's when you can submit a PR (Pull Request) so that your code is merged into the project and runs in "every corner of the world". How cool is that! Getting your first PR merged is often like opening Pandora's box: you enter another world, start engaging with the open source community, and feel the charm of open source!
3. Why I'm writing about how to submit a PR
Our company has open sourced 2 projects: DevStream and DevLake.
New contributors regularly submit PRs to DevStream and DevLake, but most of them run into one or more problems with their first PR, such as conflicts, too many or messy commits, unsigned commits, non-conforming commit messages, and various CI check failures.
When we see a new contributor submit a PR, we are naturally happy to welcome them and explain how to fix each problem. But as the number of contributors grows, our community ends up answering the same question almost every day: "How do I properly submit a PR?" You might wonder whether we simply lack documentation. In fact, we have detailed documents, but people are always a bit lazy. Most new contributors aren't willing to read the documents carefully before submitting a PR. Many have only just started with open source, are unfamiliar with the project layout and how its documentation is organized, and don't even think of looking for these documents. In short, for various reasons, most new contributors choose to "submit the PR first and figure it out later".
So today I want to explain "how to submit a PR correctly" thoroughly, walking through the whole PR process on GitHub along with the difficulties you may run into and how to solve them. I hope this helps newcomers contributing to an open source project for the first time, and further lowers the barrier to participating in the DevStream and DevLake communities.
4. I want to participate in open source projects. How do I get started?
Whatever your reason for participating in an open source project, whether learning, interest, a sense of achievement, or getting a feature you need merged into it, today you are determined to submit a PR to an open source project. Great, let's get started!
4.1. Find a suitable open source project
If you have already decided which open source community to join, skip this section.
If you want to start participating in open source but don't know which community to join, here are a few tips:
- Don't start with a particularly mature project. Take the Kubernetes community: on the one hand, there are so many contributors that it's hard to grab an entry-level issue for your first PR; on the other hand, with so many contributors your voice gets drowned out, and the maintainers don't really care whether there is one more or one less of you (of course no one will admit it, but believe it). If you submit a PR, run into various problems, and can't solve them on your own, your PR is quite likely to simply be closed after a while, and nobody will care whether you had a good experience;
- Don't start with an extremely small project. I hardly need to explain why, do I? Very early-stage open source projects can have many problems, such as inconsistent code, an irregular collaboration process, and frequent refactoring that isn't issue-driven, which leave outside contributors at a loss...
- Choose an incubating project of a well-known open source software foundation, such as The Linux Foundation, CNCF, etc. Such projects are not yet particularly mature, so they tend to be friendly to new contributors.
For example, you can look for projects that interest you in these places:
- CNCF Sandbox projects
- CNCF Incubating projects (the list also includes graduated projects)
- Apache projects (during incubation, the project name contains "Incubating")
Of course, you can also knock on the door of the open source world directly with the CNCF Sandbox project DevStream or the Apache incubating project Apache DevLake.
4.2. Find Contribution Points
There are many ways to participate in open source projects. The most typical is submitting a PR for feature development or a bug fix, but improving documentation, adding test cases, reporting bugs, and so on are also valuable contributions. This article, however, focuses on contributions made through PRs. Taking the DevStream project as an example (other projects are similar), the project's GitHub repository has an Issues tab that records the project's currently known bugs, proposals (think of them as new requirements), planned documentation, urgently needed unit tests (UTs), and so on, as shown below:
In Issues, we can usually find issues carrying the "good first issue" label. Clicking the label filters the list down to all good first issues. These are relatively simple, entry-level issues that the community reserves for new contributors:
From here, browse the good first issues to see whether an unassigned one interests you, then leave a comment on it and wait for a project maintainer to assign it to you before you start coding, like so:
As shown in the figure, if an issue hasn't been claimed yet, you can leave a comment, wait for a maintainer to assign the task to you, and then start developing.
5. I want to submit a PR. How do I get started?
The root directory of an open source repository usually contains a CONTRIBUTING.md, or a document with a similar name, describing how to start contributing, like this:
DevStream's Contributing document includes a Development Workflow section, which is essentially an introduction to the PR workflow. Today, though, I want to go through the PR workflow in more detail.
5.1. Step 1: Fork the project repository
Every project on GitHub has a Fork button. First, we fork the open source project into our own account. Take DevStream as an example:
Click the Fork button, then go back to your account, where you'll find the forked project:
This fork lives under your own account, which means you have permission to modify it freely. From here on, we push our code changes to the fork and then get the commits merged into the upstream project through a Pull Request.
5.2. Step 2: Clone the repository locally
The process is pretty much the same for any open source project. I've written out the commands so you can copy, paste, and run them directly, though some variables need to be adjusted to your own situation. For example, for the DevStream project we can set several environment variables like this:
- Environment variables:
# Use $HOME rather than "~": the shell does not expand "~" inside double quotes
export WORKING_PATH="$HOME/gocode"
export USER="daniel-hutao"
export PROJECT="devstream"
export ORG="devstream-io"
For DevLake, the commands become:
export WORKING_PATH="$HOME/gocode"
export USER="daniel-hutao"
export PROJECT="incubator-devlake"
export ORG="apache"
Remember to change USER to your GitHub username. WORKING_PATH is also up to you: set it to wherever you want to keep the code.
Then a few general-purpose commands take care of cloning and configuring the remotes:
- Clone and configure remotes:
mkdir -p ${WORKING_PATH}
cd ${WORKING_PATH}
# You can also use the url: git@github.com:${USER}/${PROJECT}.git
# if your ssh configuration is proper
git clone https://github.com/${USER}/${PROJECT}.git
cd ${PROJECT}
git remote add upstream https://github.com/${ORG}/${PROJECT}.git
# Never push to upstream locally
git remote set-url --push upstream no_push
If you have SSH configured for Git, you can of course change the URL in the git clone command to git@github.com:${USER}/${PROJECT}.git.
After this step, the remotes you see locally should look like this:
git remote -v
origin git@github.com:daniel-hutao/devstream.git (fetch)
origin git@github.com:daniel-hutao/devstream.git (push)
upstream https://github.com/devstream-io/devstream (fetch)
upstream no_push (push)
Remember: your local changes are only ever pushed to origin; you then open a Pull Request from origin to upstream.
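The effect of the no_push push URL can be checked without touching GitHub. The following is a minimal sketch meant to be run as a standalone Bash script; it assumes Git is installed and uses two throwaway local bare repositories, fork.git and upstream.git, as stand-ins for your fork and the upstream project (the paths and the "Demo User" identity are placeholders). Pushing to origin succeeds, while pushing to upstream fails because "no_push" is not a real repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: local bare repos stand in for GitHub; identity and paths are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Throwaway bare repositories play "your fork" and "the upstream project".
git init -q --bare fork.git
git init -q --bare upstream.git

git init -q work
cd work
git remote add origin "$demo_dir/fork.git"
git remote add upstream "$demo_dir/upstream.git"
# Same trick as in the clone step: make every push to upstream fail.
git remote set-url --push upstream no_push

git commit -q --allow-empty -m "chore: initial commit"

# Pushing to origin (your fork) works as usual.
git push -q origin HEAD:main

# Pushing to upstream is rejected, since "no_push" is not a real repository.
if git push -q upstream HEAD:main 2>/dev/null; then
  upstream_push_blocked=no
else
  upstream_push_blocked=yes
fi

echo "push to upstream blocked: $upstream_push_blocked"
git remote -v
```

The final git remote -v should show the same four-line shape as the output above, with local paths in place of the GitHub URLs.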
5.3. Step 3: Update the local branch code
If you've only just forked and cloned, your local code is certainly up to date. But "just" happens only once: every time you're about to start writing code, make sure your local branch is up to date, because developing on top of stale code will drag you into endless conflicts.
- Update the local main branch code:
git fetch upstream
git checkout main
git rebase upstream/main
That said, I don't recommend writing code directly on the main branch. Submitting your first PR from main works fine, but what if you need to submit 2 PRs at the same time? In short, it's better to create a descriptively named branch such as feat-xxx or fix-xxx for your development work.
- Create a branch:
git checkout -b feat-xxx
This gives us a feature branch, feat-xxx, whose code matches the upstream main branch, and we can happily start writing code!
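To see why the fetch-then-rebase routine matters, here is a minimal, self-contained simulation in Bash. It is a sketch under stated assumptions: a local bare repository plays "upstream", a separate clone named maintainer stands in for other contributors, and the identity is a placeholder. After someone else's change lands upstream, the update commands from this step bring local main back in line before feat-xxx is created.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: local repos stand in for GitHub; identity and names are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"

# A bare repository plays "upstream"; the "maintainer" clone plays other contributors.
git init -q --bare upstream.git
git clone -q upstream.git maintainer
git -C maintainer checkout -q -b main
git -C maintainer commit -q --allow-empty -m "chore: initial commit"
git -C maintainer push -q origin main

# Your local clone, with the project configured as the "upstream" remote.
git init -q work
cd work
git remote add upstream "$demo_dir/upstream.git"
git fetch -q upstream
git checkout -q -b main upstream/main

# Meanwhile, someone else's change is merged upstream, so local main is now stale.
git -C "$demo_dir/maintainer" commit -q --allow-empty -m "feat: change merged upstream"
git -C "$demo_dir/maintainer" push -q origin main

# The routine from this step: update main, then branch off it.
git fetch -q upstream
git checkout -q main
git rebase -q upstream/main
git checkout -q -b feat-xxx

git log --oneline -n 2
```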
5.4. Step 4: Write code
Nothing to say here: just write, write, write!
5.5. Step 5: Commit and Push
- General process:
git add <file>
git commit -s -m "some description here"
git push origin feat-xxx
Of course, you should understand what these commands and parameters mean and adapt them as needed. For example, you can use git add --all for the add step, and you can add the -f parameter when pushing to force-overwrite the remote branch (if it already exists but its commit history isn't what you want). But remember: the -s parameter of git commit must always be added!
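The effect of -s is easy to verify. This sketch (standalone Bash; a local bare repository fork.git stands in for your GitHub fork, and the "Demo User" identity is a placeholder) commits with -s, pushes the branch, and prints the resulting commit message, whose last line is a Signed-off-by trailer built from the committer's name and email.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: local bare repo "fork.git" stands in for your GitHub fork;
# identity, branch, and file names are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q --bare fork.git
git init -q work
cd work
git remote add origin "$demo_dir/fork.git"
git checkout -q -b feat-xxx

echo "hello" > README.md
git add README.md
# -s appends "Signed-off-by: <committer name> <committer email>" to the message.
git commit -q -s -m "docs: add README"
git push -q origin feat-xxx

git log -1 --format=%B
```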
If you prefer committing from an IDE, that works too, like this:
Pay attention to the commit message convention here. Requirements vary between open source projects; DevStream's convention, for example, follows this format:
<type>[optional scope]: <description>
[optional body]
[optional footer(s)]
To give a few examples:
- feat: some description here
- docs: some description here
- fix: some description here
- fix(core): some description here
- chore: some description here
- ...
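A message can be sanity-checked locally before pushing. The function below is a hypothetical helper, not DevStream's actual CI rule: the list of allowed types and the requirement that the description start with a lowercase letter or digit are assumptions modeled on the examples above (and on the feat: Xxx case discussed later).

```shell
#!/usr/bin/env bash
# Hypothetical helper: the allowed types and the lowercase-description rule are
# assumptions modeled on the examples above, not DevStream's actual CI config.
set -euo pipefail

check_commit_msg() {
  local header pattern
  # Only the first line (the header) of the message is checked.
  header="$(head -n 1 <<< "$1")"
  # <type>[optional scope]: <description starting with a lowercase letter or digit>
  pattern='^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9-]+\))?: [a-z0-9]'
  LC_ALL=C grep -Eq "$pattern" <<< "$header"
}

for msg in "feat: some description here" \
           "fix(core): some description here" \
           "feat: Some description here" \
           "update readme"; do
  if check_commit_msg "$msg"; then
    echo "OK    $msg"
  else
    echo "FAIL  $msg"
  fi
done
```

With these assumptions, the first two messages pass and the last two fail. Adapt the pattern to whatever your target project's contributing guide actually specifies.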
Commit and push can be done in one step in the IDE or separately. I prefer to do them separately to leave myself more room to maneuver, and I'm more used to the command line anyway:
git push origin feat-1
Counting objects: 80, done.
Delta compression using up to 10 threads.
Compressing objects: 100% (74/74), done.
Writing objects: 100% (80/80), 13.78 KiB | 4.59 MiB/s, done.
Total 80 (delta 55), reused 0 (delta 0)
remote: Resolving deltas: 100% (55/55), completed with 31 local objects.
remote:
remote: Create a pull request for 'feat-1' on GitHub by visiting:
remote: https://github.com/daniel-hutao/devstream/pull/new/feat-1
remote:
To github.com:daniel-hutao/devstream.git
* [new branch] feat-1 -> feat-1
At this point, the local commits have been pushed to the remote.
5.6. Step 6: Open a PR
After the push, open GitHub and you'll see a yellow banner telling you that you can open a Pull Request:
If you don't see this banner, switch to the feat-1 branch and click the "Contribute" button below to open a PR, or click Pull requests next to Issues to go to the corresponding page.
- By default, the Pull Request form looks like this:
Here we fill in an appropriate title (by default the same as the commit message) and then fill in the PR description according to the template. Every open source project has its own PR template, so read it carefully to avoid making basic mistakes.
For example, DevStream's template is currently divided into 4 parts:
- Pre-Checklist: three pre-check items remind PR submitters to read the Contributing documentation first, to give the code adequate comments or documentation, and to add test cases where possible;
- Description: the description of the PR, i.e., what your PR changes; describe here what problem it solves;
- Related Issues: Remember? Before writing code, we claimed an issue. Fill in that issue's ID here. If the issue you claimed is https://github.com/devstream-io/devstream/issues/796 and your PR completes it so that it can be closed, write "close #796" under Related Issues;
- New Behavior: in most cases, code needs to be tested after it's changed. Paste a screenshot of the test results here so reviewers can see that your code passed the tests and works as expected. This reduces the review workload and helps get the PR merged quickly.
This template isn't complicated; just fill it in.
- For example:
Then click "Create pull request" in the lower right corner to create the PR. I can't actually click the button here, because the change I'm using for the demo is meaningless and can't be merged into the upstream repository. But I still want to show you what a created PR looks like, so let's take pr655 as an example:
This is a PR I submitted last month, and it basically follows the template. Besides the template's sections, you may notice an extra Test section. Yes, the template isn't set in stone: it exists to reduce communication overhead, so you can adjust it as long as the result is clearer. Here I used the Test section to record detailed local test results, telling reviewers that I had tested thoroughly locally and that the PR can be merged with confidence.
After submitting the PR, you can find it in the PR list. At this point, also keep an eye on whether all CI checks pass; if any fail, fix them promptly. Taking DevStream as an example, the CI checks look roughly like this:
5.7. Step 7: PR merge
If your PR is perfect and uncontroversial, a project maintainer will merge it before long, and your PR's lifecycle comes to an end.
But, yes, there is a "but": the first PR is often not that smooth. Next, let's go through some common problems and their solutions in detail.
6. I submitted a PR, and then ran into problems A, B, C, D, E, F, G, ...😭
In most cases, a PR isn't merged right after it's submitted. Reviewers may request changes, the PR itself may not follow some conventions, or the CI checks may simply fail. How do you deal with these? Read on.
6.1. Reviewers requested changes. How do I update the PR?
Often, after submitting a PR, we need to add more commits. For example, we notice a problem in the code after submitting and want to fix it, or reviewers request changes and we need to update the code.
Generally, we follow a convention: before review starts, update the code without introducing new commits where possible, i.e., squash early so that the commit history stays clear and meaningful; after review starts, you can add new commits without squashing them into earlier ones, which makes the next round of review more targeted.
However, communities differ, and some open source projects require a PR to contain exactly one commit. Judge flexibly based on the situation.
To update the PR, just keep modifying the code locally and run the same commands as for the first commit:
git add <file>
git commit -s -m "some description here"
git push origin feat-xxx
At this point you don't need to do anything with the feat-xxx branch on origin: GitHub automatically appends all new commits to the open PR. That's right, just keep pushing and the PR updates itself.
As for how to squash commits, we'll cover that in the next section.
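Following the convention above of not adding new commits before review starts, one common way to update a PR is to amend the existing commit and force-push. A minimal sketch in Bash, assuming a local bare repository fork.git in place of your GitHub fork and placeholder file names and identity: it uses --force-with-lease rather than -f, which refuses the push if the remote branch has moved since you last fetched or pushed it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: local bare repo "fork.git" stands in for your GitHub fork;
# identity, file names, and messages are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q --bare fork.git
git init -q work
cd work
git remote add origin "$demo_dir/fork.git"

git checkout -q -b main
git commit -q --allow-empty -m "chore: initial commit"
git checkout -q -b feat-xxx
echo "first draft" > notes.txt
git add notes.txt
git commit -q -s -m "docs: add notes"
git push -q origin feat-xxx            # the PR is opened from this branch

# Before review starts: fold a follow-up fix into the existing commit ...
echo "second draft" > notes.txt
git add notes.txt
git commit -q --amend --no-edit -s
# ... and overwrite the remote branch. Unlike -f, --force-with-lease refuses
# to push if origin/feat-xxx has moved since it was last fetched or pushed.
git push -q --force-with-lease origin feat-xxx

git log --oneline main..feat-xxx       # still a single commit
```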
6.2. Too many commits, or a messy history? How do I squash commits?
In many cases we need to squash (merge) commits. For example, you changed 100 lines in your first commit, then noticed 1 line was wrong and made another commit just for it. That second commit is too "trivial" and should be squashed into the first.
6.2.1. Squashing commits with the Git command line
For example, here I have 2 commits with the same message, and the second one only changes a punctuation mark:
We can use the rebase command to squash the two commits:
git rebase -i HEAD~2
Running this command opens an editor (vim by default) with content roughly like this:
pick 3114c0f docs: just for test
pick 9b7d63b docs: just for test
# Rebase d640931..9b7d63b onto d640931 (2 commands)
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
Change the second pick to s (squash), then save and exit (vim's :wq command):
pick 3114c0f docs: just for test
s 9b7d63b docs: just for test
This opens a second editor page:
# This is a combination of 2 commits.
# This is the 1st commit message:
docs: just for test
Signed-off-by: Daniel Hu <tao.hu@merico.dev>
# This is the commit message #2:
docs: just for test
Signed-off-by: Daniel Hu <tao.hu@merico.dev>
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# ...
This page is for editing the squashed commit's message. Delete the redundant part and keep only these lines:
docs: just for test
Signed-off-by: Daniel Hu <tao.hu@merico.dev>
Save and exit vim again, and you'll see this log:
[detached HEAD 80f5e57] docs: just for test
Date: Wed Jul 6 10:28:37 2022 +0800
1 file changed, 2 insertions(+)
Successfully rebased and updated refs/heads/feat-1.
Now use the git log command to check whether the commit history is as expected:
Once we've confirmed locally that the commits are squashed, we can push to the remote to update the PR:
git push -f origin feat-xxx
The -f parameter is required to force the update: squashing rewrites history, so the new commit conflicts with the old commits on the remote, and those old commits have to be overwritten.
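If you would rather not drive the interactive editor, git reset --soft reaches the same result for the last N commits. This is a swapped-in alternative to the rebase -i flow above, shown as a minimal Bash sketch in a throwaway repository (branch names, file contents, and identity are placeholders): it moves the branch back two commits while keeping their changes staged, then records them as one commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: throwaway repo; identity, branch names, and contents are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q work
cd work
git checkout -q -b main
git commit -q --allow-empty -m "chore: initial commit"
git checkout -q -b feat-1

# Two commits with the same message; the second only fixes punctuation.
printf 'Hello, world\n' > README.md
git add README.md
git commit -q -s -m "docs: just for test"
printf 'Hello, world!\n' > README.md
git commit -q -a -s -m "docs: just for test"

# Squash the last 2 commits: move the branch pointer back 2 commits while
# keeping all of their changes staged, then record them as one new commit.
git reset -q --soft HEAD~2
git commit -q -s -m "docs: just for test"

git log --oneline main..feat-1
```

As with rebase -i, the rewritten branch still has to be pushed with -f (or --force-with-lease) to update the PR.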
6.2.2. Squashing commits in an IDE
Of course, commits can also be squashed graphically.
- Steps:
- Click Git in the lower right corner
- Select the commits you want to squash
- Right-click, then click Squash Commits (and remember to say "Let's go!" out loud)
You'll then see this page:
This is the page for editing the commit message graphically. Change it to whatever you like, click the OK button in the lower right corner, and you're done.
Look: the 2 commits have "fused" into one new commit with a "makeover".
6.3. PR conflicts: how do I resolve them?
Conflicts can be resolved online or locally, let's look at each.
6.3.1. Online conflict resolution
We should avoid conflicts as much as possible by getting into the habit of updating local code every time before writing code. However, conflicts can't be avoided entirely. Sometimes your PR sits for a few days, someone else changes the same line of code and gets merged first, and now your PR has a conflict, like this (again, I can't actually create a conflict in the upstream project, so the conflict in this demo is in my own repo):
Every time I see this page, my heart skips a beat. Click the "Resolve conflicts" button to see the specific conflicts:
You can see the conflicting lines; next, we resolve the conflict. Delete all the <<<<<<<, >>>>>>> and ======= markers and keep only the final desired content, as follows:
Then click "Mark as Resolved" in the upper right corner:
Finally click "Commit merge":
This resolves the conflict, and you can see that a new merge commit has been created:
At this point, the conflict is resolved.
6.3.2. Local conflict resolution
More often, we need to resolve conflicts locally, especially when they are numerous or complex.
Again, let's construct a conflict and this time resolve it locally.
- First, look at the conflict online:
- Then run the following locally:
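# Switch back to the main branch first
git checkout main
# Fetch the upstream code (in a real scenario the conflict is with upstream; in this demo environment it is actually origin)
git fetch upstream
# Update local main (rebase would also work here, but reset always succeeds whether or not there are conflicts)
git reset --hard upstream/main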
At this point, the local main branch matches the remote (or upstream) main branch exactly. Next, we replay our feature branch on top of main and resolve the conflicts along the way:
git checkout feat-1
git rebase main
- At this time, you will see this log:
First, rewinding head to replay your work on top of it...
Applying: docs: conflict test 1
Using index info to reconstruct a base tree...
M README.md
Falling back to patching base and 3-way merge...
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
error: Failed to merge in the changes.
Patch failed at 0001 docs: conflict test 1
The copy of the patch that failed is found in: .git/rebase-apply/patch
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Now resolve the conflict: open README.md, find the conflicting section, and edit it. This is no different from the online conflict resolution in the previous section, so I won't go into details.
Again, keep only the final content, then continue with the git commands the log suggests: git add README.md, then git rebase --continue.
If you're still not at ease, use the git log command to look at the commit history:
Here, "conflict test 2" is the commit I made on the main branch. It was made a little later than "conflict test 1" but merged first. After the rebase, it comes first and our feature branch's "conflict test 1" follows it, which looks nice and tidy. Now push the change to the remote with a command that has appeared many times already:
git push -f origin feat-xxx
If we go back to GitHub and look at the PR now, we can see that the conflict is resolved and no extra commits were created, so the PR's commit history is clean, as if the conflict had never happened:
As for when to resolve conflicts online and when locally, it depends on whether you think the record of resolving a conflict is worth keeping. Communities see this differently. Particularly mature open source communities may prefer local resolution, because the merge commit created by online resolution carries no real information. The DevStream and DevLake communities recommend the latter (local resolution) but don't require it.
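The whole local flow can be rehearsed in a throwaway repository before trying it on a real PR. In this minimal Bash sketch, a local main branch stands in for upstream/main (much as origin stood in for upstream in the demo above), and the file contents, branch names, and identity are placeholders. It creates a conflict on the same line of README.md, resolves it, checks that no <<<<<<<, ======= or >>>>>>> markers remain, and continues the rebase; setting GIT_EDITOR=true simply accepts the existing commit message instead of opening an editor.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: throwaway repo; local "main" stands in for upstream/main.
# Identity, branch names, and file contents are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q work
cd work
git checkout -q -b main
printf 'line one\n' > README.md
git add README.md
git commit -q -m "docs: initial README"

# The feature branch edits the line ...
git checkout -q -b feat-1
printf 'line one (from feat-1)\n' > README.md
git commit -q -a -s -m "docs: conflict test 1"

# ... and a different edit to the same line lands on main first.
git checkout -q main
printf 'line one (from main)\n' > README.md
git commit -q -a -s -m "docs: conflict test 2"

# Replay feat-1 on top of main; this stops with a conflict in README.md.
git checkout -q feat-1
if git rebase main >/dev/null 2>&1; then
  echo "unexpected: no conflict" >&2
  exit 1
fi

# Resolve: write the final desired content, dropping all conflict markers.
printf 'line one (from main and feat-1)\n' > README.md

# Guard against leftover conflict markers before continuing.
if grep -nE '^(<<<<<<<|=======|>>>>>>>)' README.md; then
  echo "conflict markers remain" >&2
  exit 1
fi

git add README.md
GIT_EDITOR=true git rebase --continue >/dev/null

git log --oneline -n 3
```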
6.4. CI check failed: how do I fix commit message problems?
We covered the commit message convention earlier, but it's easy to get wrong in your first PR. For example, feat: xxx passes the CI check, but feat: Xxx does not. Suppose we've submitted a PR whose commit message doesn't follow the convention. How do we fix it?
- Simple, just run:
git commit --amend
This opens the editor, where you can update the commit message freely. After changing it, push again:
git push -f origin feat-xxx
This will update the commit message in the PR.
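If you prefer not to open an editor, the message can also be replaced in one command. A minimal Bash sketch in a throwaway repository (file name, messages, and identity are placeholders): note that -m replaces the whole message, so -s is passed again to keep the sign-off.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: throwaway repo; identity, file name, and messages are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q work
cd work
git checkout -q -b feat-xxx

echo "x" > file.txt
git add file.txt
# An uppercase description like "Add" is the kind of message the CI check rejects.
git commit -q -s -m "feat: Add file"

# Rewrite the most recent commit's message without opening an editor.
# -m replaces the whole message, so pass -s again to keep the sign-off.
git commit -q --amend -s -m "feat: add file"

git log -1 --format=%B
```

This only rewrites the most recent commit; for an older commit, the reword command in the git rebase -i editor shown in section 6.2.1 serves the same purpose.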
6.5. CI check failed: how do I fix the DCO (sign-off) problem?
Many open source projects require every merged commit to contain a line like this:
Signed-off-by: Daniel Hu <tao.hu@merico.dev>
So the commit message looks like this:
feat: some description here
Signed-off-by: Daniel Hu <tao.hu@merico.dev>
This line acts as the author's signature on the commit. Adding it is of course simple: just add the -s parameter to the git commit command, as in git commit -s -m "some description here", and the commit will carry your signature.
But what if you forgot to add Signed-off-by to the commits in your first PR? If the project has a DCO check configured, the CI check will flag your PR as not correctly signed.
Again, let's first construct an unsigned commit:
I can't push it to the DevStream repository just to demonstrate the DCO error, but if I submitted a PR, this is what I'd see:
Let's see how to fix it:
git commit --amend -s
This simple command adds the Signed-off-by line to the most recent commit. It then opens the commit message editor, which by default shows:
docs: dco test
Signed-off-by: Daniel Hu <tao.hu@merico.dev>
You can also edit the commit message at this point; if you don't need to, just save and exit, and the signature is added automatically.
What comes after signing? A force push, of course:
git push -f origin feat-xxx
And the DCO error in your PR is fixed.
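git commit --amend -s only signs the most recent commit. When several commits in a PR lack the sign-off, one way to fix them all, sketched here in Bash with placeholder names and identity, is git rebase --exec, which runs the given command after each replayed commit. --force-rebase is included so that Git re-creates the commits even though the branch is already based on main.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: throwaway repo; identity, branch names, and files are placeholders.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q work
cd work
git checkout -q -b main
git commit -q --allow-empty -m "chore: initial commit"
git checkout -q -b feat-xxx

# Two commits made without -s, so a DCO check would flag both.
echo "a" > a.txt
git add a.txt
git commit -q -m "feat: add a"
echo "b" > b.txt
git add b.txt
git commit -q -m "feat: add b"

# Replay every commit on feat-xxx that is not on main and, right after each
# one, amend it with a Signed-off-by trailer. --force-rebase makes Git
# re-create the commits even though feat-xxx is already based on main.
git rebase -q --force-rebase --exec 'git commit -q --amend --no-edit -s' main

for c in $(git rev-list --reverse main..feat-xxx); do
  git log -1 --format='--- %h%n%B' "$c"
done
```

In a real PR the base would typically be upstream/main (after a git fetch upstream), followed by the same force push as above.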
7. Finally
This article turned out a bit long. Okay, we're done!
- You're welcome to visit my personal website or my WeChat public account "Nushu Cloud Native" for more of my articles;
- Follow the DevStream community and do open source with me;
- Visit the official DevStream blog for more articles from the DevStream team.