[Git series] Git basic concepts

Version control system

A version control system is software that helps developers collaborate and keep a complete history of their work. A version control system should provide the following basic functions:

Allow developers to work concurrently;
Prevent one developer from overwriting another developer's changes;
Keep the complete version history.

Version control systems can be divided into the following two categories:

Centralized version control system;
Decentralized (distributed) version control system.

Git is a distributed version control system. In this chapter we focus on distributed version control systems, and on Git in particular.

Distributed Version Control System

A centralized version control system stores all files on a central server, and all collaboration happens through that server. Its main weakness is that the central server is a single point of failure: if it is down for an hour, nobody can collaborate for that hour. In the worst case, if the central server's disk fails before a backup has been made, the entire history of the project stored on it is lost. This is where a distributed version control system helps.

In a distributed version control system, clients do not just check out the latest snapshot of the project directory; they mirror the entire repository. If the server goes down, the repository copy on any client can be used to restore it, because every checkout is a full backup of the repository. Git does not depend on a central server, so developers can keep working offline: they can commit, create branches, view logs and so on, and only need a network connection when they want to publish their changes or fetch the latest changes from others.
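A minimal sketch of this, assuming a POSIX shell with Git installed: a bare repository in a temporary directory stands in for the central server, and the author identity "Jerry" and the branch name offline-work are placeholders. After the "server" is deleted, the clone can still show history, create a branch and commit, because its .git directory holds the full repository.

```shell
#!/bin/sh
# Sketch: working "offline" after the server disappears.
# Assumption: a local bare repository stands in for the remote server.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

work=$(mktemp -d)
cd "$work"

# Build a small project and publish it as a bare "server" repository.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of init.defaultBranch
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed server.git

# Clone from the "server", then simulate an outage by deleting it.
git clone -q server.git client
rm -rf server.git

# These still work, because the clone holds the full history locally.
cd client
git log --oneline
git checkout -q -b offline-work
echo "/* offline change */" >> main.c
git commit -q -am "Commit made while the server is unreachable"
git log --oneline | wc -l    # 2 commits, both available locally
```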

Git advantages

Free and open source

Git is released under the GPL (version 2) open source license and is freely available on the Internet. You can use Git to manage proprietary projects without spending a penny, and because it is open source, you can also download its source code and modify it to suit your needs.

Fast and light

Because most operations are performed locally, Git is fast: it does not depend on a central server, so not every operation has to talk to a remote server. The core of Git is written in C, which avoids the runtime overhead of higher-level languages. And although Git mirrors the entire repository, the amount of data on the client stays small, which shows how efficiently Git compresses and stores data.

Default backup

With many copies of the repository in existence, the chance of losing data is greatly reduced. The data on any client is a mirror of the repository and can be used for recovery if the server crashes or its disk is damaged.
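As a sketch under similar assumptions (temporary directories, placeholder identity, a local bare repository playing the server), the server copy can be rebuilt from a developer's clone with `git clone --bare`. The result contains the branches present in that clone, and `git fsck` checks that every object is intact.

```shell
#!/bin/sh
# Sketch: rebuilding a lost "server" repository from a developer's clone.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

work=$(mktemp -d)
cd "$work"

# A developer's repository with some history (created directly for brevity).
git init -q client
cd client
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Change $n"
done
cd ..

# The server's disk failed: create a fresh bare repository from the client copy.
git clone -q --bare client restored.git

# Verify the restored repository's objects and look at its history.
git --git-dir=restored.git fsck
git --git-dir=restored.git log --oneline master
```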

Security

Git uses a cryptographic hash function called the Secure Hash Algorithm (SHA-1) to name and identify objects in its database. Every file and every commit is checksummed, and the checksum is verified whenever the data is retrieved. This means it is impossible to change file contents, commit messages or anything else in the Git database without Git noticing.
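The sketch below, assuming a throwaway repository in a temporary directory, shows both halves of this. Each object is filed under the SHA-1 of its content, and if the stored data is altered behind Git's back (simulated here by overwriting one object file with another), `git fsck` reports the mismatch and exits with a non-zero status.

```shell
#!/bin/sh
# Sketch: Git names objects by a SHA-1 hash of their content, so stored data
# that no longer matches its name is reported by `git fsck`.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store two blobs; each object's name is the SHA-1 of its content.
a=$(echo "hello" | git hash-object -w --stdin)
b=$(echo "world" | git hash-object -w --stdin)
echo "hello -> $a"
echo "world -> $b"

# Loose objects live at .git/objects/<first 2 hex digits>/<remaining 38>.
path_of() {
  printf '.git/objects/%s/%s' "$(echo "$1" | cut -c1-2)" "$(echo "$1" | cut -c3-)"
}

git fsck && echo "fsck: repository is consistent"

# Tamper with the database: overwrite the "hello" object with the "world" data.
chmod u+w "$(path_of "$a")"    # object files are created read-only
cp "$(path_of "$b")" "$(path_of "$a")"

# The data filed under $a now hashes to $b, which fsck detects.
if git fsck; then
  echo "fsck: no problems found"
else
  echo "fsck: corruption detected"
fi
```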

Low hardware resource requirements

With a centralized version control system, the central server must be powerful enough to serve requests from every team member. This is not a problem for a small team, but as the team grows, the server's hardware becomes a bottleneck. In a distributed version control system, developers only interact with the server when they push or pull changes, and the heavy work is done on the clients, so the server's hardware requirements are modest.

Simpler branch management

A centralized version control system implements branches with a simple copy: creating a branch copies all of the project's code into the new branch, which is slow and inefficient. Deleting and merging branches in such a system is also complicated and time-consuming. Branch management is much simpler in Git: creating, deleting and merging branches takes very little time.
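A rough illustration, assuming Git's default file-based reference storage and a hypothetical branch name feature-x: creating a branch writes one small file containing a commit hash, and deleting the branch removes that file. No project files are copied.

```shell
#!/bin/sh
# Sketch: a Git branch is a small file holding one commit hash.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git branch feature-x                  # create a branch: writes one ref file
cat .git/refs/heads/feature-x         # the file holds only the commit hash
wc -c < .git/refs/heads/feature-x     # 41 bytes: 40 hex digits plus a newline

git branch -d feature-x               # delete the branch: removes the ref file
git branch --list
```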

Terminology in Distributed Version Control Systems

Local Repository

Every version control tool provides a private workspace as a working copy. Developers make changes in their private workspace, and after they commit, these changes become part of the repository. Git goes one step further and gives each developer a private copy of the whole repository, on which they can perform any operation: adding, deleting and moving files, committing changes, and so on.
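For example, in a throwaway repository (placeholder file names and identity), files can be added, renamed and deleted, and the changes committed, entirely within the local repository:

```shell
#!/bin/sh
# Sketch: adding, moving and deleting files in the private local repository.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "notes" > notes.txt
echo "tmp" > scratch.txt
git add notes.txt scratch.txt          # add files
git commit -q -m "Add notes and scratch files"

git mv notes.txt NOTES.md              # move/rename a tracked file
git rm -q scratch.txt                  # delete a tracked file
git commit -q -m "Rename notes, remove scratch file"

git ls-files                           # NOTES.md is now the only tracked file
git log --oneline                      # both commits exist in this local repository
```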

Working Directory and Staging Area (Index)

The working directory is where files are checked out or created. In a centralized system, developers usually make their changes and commit them directly to the repository. Git works differently: it does not automatically put every modified file into a commit. When you commit, Git looks only at the staging area (also called the index); only the changes that have been added to the staging area, not all modified files, are included in the commit.

Let's take a look at the basic workflow of Git:

Step 1: modify a file in the working directory;
Step 2: add the file to the staging area;
Step 3: commit. The commit takes the staged snapshot and stores it permanently in the local repository; a later push copies it to a remote repository.

Suppose you have modified two files, sort.c and search.c, and want to commit each change separately. You can add one file to the staging area and commit it, then do the same with the other file. For example (-m supplies the commit message):

# First commit: add sort.c to the staging area, then commit it
[jerry@CentOS ~]$ git add sort.c
[jerry@CentOS ~]$ git commit -m "Added sort operation"

# Second commit: add search.c to the staging area, then commit it
[jerry@CentOS ~]$ git add search.c
[jerry@CentOS ~]$ git commit -m "Added search operation"
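To see that only staged changes are committed, the sketch below (run in a temporary repository with a placeholder identity) modifies both files but stages only sort.c. After the commit, search.c is still listed as modified and uncommitted.

```shell
#!/bin/sh
# Sketch: only changes in the staging area go into a commit.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Start from a commit that already contains both files.
echo "/* sort */" > sort.c
echo "/* search */" > search.c
git add sort.c search.c
git commit -q -m "Initial version"

# Modify both files in the working directory ...
echo "void sort(void);" >> sort.c
echo "void search(void);" >> search.c

# ... but stage only sort.c.
git add sort.c
git status --short              # "M  sort.c" (staged), " M search.c" (not staged)
git diff --cached --name-only   # lists only sort.c

git commit -q -m "Added sort operation"
git show --name-only --format= HEAD   # this commit touched sort.c only
git status --short                    # search.c is still modified
```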

Binary Large Objects (Blobs)

Blob stands for Binary Large Object. Each version of a file is stored as a blob. A blob holds the file's data but none of its metadata, such as its name. In the Git database, a blob is named by the SHA-1 hash of its content; in other words, Git addresses files by their content, not by their names.
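The naming rule can be checked by hand. Assuming GNU coreutils' sha1sum is available (shasum on macOS), the sketch hashes the header `blob <size>` plus a NUL byte, followed by the file's bytes, and compares the result with `git hash-object`. Two files with identical content but different (placeholder) names get the same hash.

```shell
#!/bin/sh
# Sketch: a blob's name is SHA-1("blob <size>\0" + file bytes);
# the file name plays no part in it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'test content\n' > a.txt
cp a.txt b.txt                # same content under a different name

# Git's own computation for each file.
git hash-object a.txt         # d670460b4b4aece5915caf5c68d12f560a9fe3e4
git hash-object b.txt         # identical: names do not matter

# The same hash by hand: header "blob 13", a NUL byte, then the content.
size=$(wc -c < a.txt | tr -d ' ')
{ printf 'blob %s\000' "$size"; cat a.txt; } | sha1sum | cut -d' ' -f1
```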

Trees

A tree is an object that represents a directory. It holds blobs as well as other subdirectories (trees). A tree stores references to these blobs and trees, and is itself named by the SHA-1 hash of its content.
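A short sketch, assuming a temporary repository with placeholder file names: the root tree of a commit lists a blob for README and a tree for the src directory, which in turn lists the blob for main.c.

```shell
#!/bin/sh
# Sketch: inspecting the tree objects behind a commit.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

mkdir src
echo "readme" > README
echo "int main(void) { return 0; }" > src/main.c
git add README src/main.c
git commit -q -m "Add README and src/main.c"

# Root tree: a blob entry (README) and a tree entry (src).
git cat-file -p 'HEAD^{tree}'

# The src subdirectory is itself a tree that contains the main.c blob.
git ls-tree HEAD src/
git cat-file -t "$(git rev-parse HEAD:src)"    # prints "tree"
```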

Commits

A commit records the current state of the repository and is also named by a SHA-1 hash. You can think of commit objects as the nodes of a linked list: each commit holds a pointer to its parent commit. Starting from any commit, you can walk back through the parent pointers to view its history. If a commit has more than one parent, it was created by merging branches.
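A sketch, assuming a temporary repository and a hypothetical branch name topic: `git cat-file -p HEAD` shows the parent line stored in a commit, `git log` walks the parent chain, and after merging a side branch the merge commit has two parents.

```shell
#!/bin/sh
# Sketch: commits are chained through parent pointers; a merge has two parents.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "one" > file.txt; git add file.txt; git commit -q -m "First"
echo "two" >> file.txt; git commit -q -am "Second"

# A commit object records its tree, its parent(s), author and message.
git cat-file -p HEAD

# Walk back through the parent pointers, newest first.
git log --format='%h %s (parents: %p)'

# Diverge on two branches, then merge to get a commit with two parents.
git checkout -q -b topic
echo "topic" > topic.txt; git add topic.txt; git commit -q -m "Topic work"
git checkout -q master
echo "three" >> file.txt; git commit -q -am "Third"
git merge -q --no-edit topic
git rev-list --parents -n 1 HEAD    # merge commit hash followed by two parent hashes
```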

Branches

Branches are used to create a separate line of development. By default Git has a master branch, which plays the same role as trunk in Subversion, another version control tool. Typically a branch is created to develop a new feature; once the feature is finished, the branch is merged back into master and then deleted. Each branch has its own head, which points to the latest commit on that branch, and whenever you commit, the head of the current branch is updated to the new commit.
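The workflow just described, sketched in a temporary repository with a hypothetical feature-login branch. Because master gets no new commits while the feature is developed, the merge here is a fast-forward.

```shell
#!/bin/sh
# Sketch of the feature-branch workflow: branch, commit, merge into master, delete.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Start a new line of development for the feature.
git checkout -q -b feature-login
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login feature"
git rev-parse feature-login    # the branch head moved to the new commit

# Feature finished: merge it into master, then delete the branch.
git checkout -q master
git merge -q feature-login     # fast-forward, since master has no new commits
git branch -d feature-login
git log --oneline
```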

Clone

Cloning creates an instance of a repository. A clone does not just provide a working copy; it mirrors the complete repository. Users can perform any operation on their local repository and only need a network connection when synchronizing repository instances.

Pull

Pull copies changes from a remote repository instance into the local one, keeping the two instances in sync. Git's pull is comparable to the update operation in SVN.
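A sketch, assuming two local repositories stand in for the remote and local instances (the names upstream and local are hypothetical). `--ff-only` is used so that the pull succeeds only when the new commits can be applied as a fast-forward.

```shell
#!/bin/sh
# Sketch: pull brings commits from another repository instance into the local one.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the remote repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
cd ..

git clone -q upstream local

# Someone adds a commit to the remote repository.
cd upstream
echo "v2" >> app.txt
git commit -q -am "Second commit"
cd ..

# pull = fetch the new commits, then integrate them into the current branch.
cd local
git pull -q --ff-only
git log --oneline
```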

Push

Push copies changes from the local repository instance to a remote one. It is used to store local changes permanently in the shared Git repository. Git's push is comparable to the commit operation in SVN.
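A sketch with a bare repository in a temporary directory playing the remote (the path server.git and the identity are placeholders): the new commit exists only locally until `git push` copies it to the remote.

```shell
#!/bin/sh
# Sketch: push copies local commits to the remote repository.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

work=$(mktemp -d)
cd "$work"

# Create a bare repository that stands in for the shared server.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed server.git

git clone -q server.git local
cd local
echo "v2" >> app.txt
git commit -q -am "Local change"

# Before the push, the server's master lacks the new commit ...
git --git-dir=../server.git log --oneline master
git push -q origin master
# ... afterwards it points at the same commit as the local master.
git --git-dir=../server.git log --oneline master
```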

HEAD

HEAD is a pointer that normally points to the current branch and, through it, to that branch's latest commit. Whenever you commit, the branch head moves to the new commit, so HEAD always reflects the most recent commit on the current branch. The branch heads are stored in the .git/refs/heads/ directory.

[jerry@CentOS ~]$ ls -1 .git/refs/heads/
master

[jerry@CentOS ~]$ cat .git/refs/heads/master
570837e7d58fa4bccd86cb575d884502188b0c49
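Going one step further, the sketch below (temporary repository, placeholder identity) shows that .git/HEAD normally holds a reference to the current branch rather than a hash, while the branch file holds the commit hash. After a new commit, the branch file and HEAD both resolve to the new hash.

```shell
#!/bin/sh
# Sketch: HEAD names the current branch; the branch ref names a commit.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

cat .git/HEAD                  # ref: refs/heads/master
git symbolic-ref HEAD          # refs/heads/master
cat .git/refs/heads/master     # commit hash of the branch head
git rev-parse HEAD             # the same hash, resolved through HEAD

echo "v2" >> app.txt
git commit -q -am "Second commit"
git rev-parse HEAD             # a new hash: the branch head moved forward
```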

Revision

A revision is a version of the source code. In Git, revisions are represented by commits, which are identified by their SHA-1 hashes.
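A sketch, assuming a temporary repository with three placeholder commits, of common ways to refer to a revision: the full SHA-1, an abbreviated hash, and a position relative to HEAD.

```shell
#!/bin/sh
# Sketch: naming revisions by full hash, short hash, and relative to HEAD.
set -e
export GIT_AUTHOR_NAME="Jerry" GIT_AUTHOR_EMAIL="jerry@example.com"
export GIT_COMMITTER_NAME="Jerry" GIT_COMMITTER_EMAIL="jerry@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  echo "revision $n" > version.txt
  git add version.txt
  git commit -q -m "Revision $n"
done

git rev-parse HEAD            # full 40-character SHA-1 of the latest revision
git rev-parse --short HEAD    # abbreviated form, accepted wherever it is unambiguous
git rev-parse HEAD~2          # the revision two commits earlier ("Revision 1")
git show HEAD~2:version.txt   # file content at that revision: revision 1
```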

URL

A URL identifies the location of a Git repository. It is stored in the Git configuration file .git/config:

[gituser@CentOS ~]$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = gituser@git.server.com:project.git
fetch = +refs/heads/*:refs/remotes/origin/*
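The same entry can be created and read back with Git commands. In this sketch the URL is the example from the configuration above; no network access is needed, because `git remote add` only writes to .git/config.

```shell
#!/bin/sh
# Sketch: the remote URL is stored as remote.<name>.url in .git/config.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register the remote; this only edits the configuration file.
git remote add origin gituser@git.server.com:project.git

git config --get remote.origin.url   # gituser@git.server.com:project.git
git remote -v                         # the URL listed for both fetch and push
cat .git/config                       # contains the [remote "origin"] section
```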
