Preface
Git is a tool we use every day, so everyone in the industry must be very familiar with its basic usage, for example:
- Use git-blame to find out which colleague introduced the bug on a certain line, and make him take the blame;
- Use git-merge to merge other people's code into your own flawless branch, and then find that the unit tests no longer pass;
- Use git push -f to overwrite all the commits of everyone else on the team.
In addition, Git is actually a key-value database with versioning:
- All committed content is stored under the .git/objects/ directory;
- There are blob objects that store file contents, tree objects that store file metadata, commit objects that store commit records, and so on;
- Git provides key-value style read and write commands: git-cat-file and git-hash-object.
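To make the key-value idea concrete, here is a minimal sketch of how the key of a blob is derived. It assumes a repository that uses the default SHA-1 object format; git_blob_id and loose_object_path are hypothetical helper names for illustration, not Git commands.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID that git-hash-object gives a blob (SHA-1 object format)."""
    # Git hashes the header "blob <size>\0" followed by the raw file content.
    header = b'blob ' + str(len(content)).encode('ascii') + b'\x00'
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path under which a loose (not yet packed) object is stored."""
    # The first two hex digits name the directory, the rest name the file.
    return '.git/objects/{}/{}'.format(object_id[:2], object_id[2:])


if __name__ == '__main__':
    key = git_blob_id(b'hello world\n')
    print(key)
    print(loose_object_path(key))
```

The printed key should match what `echo 'hello world' | git hash-object --stdin` reports in such a repository; the value stored under that key is the zlib-compressed object.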
Friends who have read my previous article "What exactly are we ..." should know that if a merge is not a fast-forward, a new commit object is generated, and it has two parent commit objects. Take the repository of gin, the well-known Go web framework, as an example. The commit with hash e38955615a14e567811e390c87afe705df957f3a was generated by a merge, so its content contains two parent lines:
➜ gin git:(master) git cat-file -p 'e38955615a14e567811e390c87afe705df957f3a'
tree 93e5046e502847a6355ed26223a902b4de2de7c7
parent ad087650e9881c93a19fd8db75a86968aa998cac
parent ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c
author Javier Provecho Fernandez <javiertitan@gmail.com> 1499534953 +0200
committer Javier Provecho Fernandez <javiertitan@gmail.com> 1499535020 +0200
Merge pull request #520 from 178inaba/travis-import_path
Through the parent attribute of each commit, all commit objects form a directed acyclic graph. But, clever as you are, you will have noticed that the output of git-log is linear, so Git must use some kind of graph traversal algorithm.
Check man git-log and, in the Commit Ordering section, you can see:
By default, the commits are shown in reverse chronological order.
Clever as you are, you must already know how to implement this graph traversal algorithm.
Write your own git-log
Parse the commit object
If you want to print commit objects in the correct order, you have to parse them first. We don't need to open the file, read the byte stream, and decompress the content from scratch; just call git-cat-file as above. Two pieces of the content printed by git-cat-file need to be extracted for later use:
- The lines starting with parent. The hash on such a line is used to locate a node in the directed acyclic graph;
- The line starting with committer. The UNIX timestamp on this line is the sort key that decides which node comes next.
You can write a class in Python to parse a commit object:
import re
import subprocess
from typing import List


class CommitObject:
    """The parsed result of a commit object in Git."""

    def __init__(self, *, commit_id: str) -> None:
        self.commit_id = commit_id
        file_content = self._cat_file(commit_id)
        self.parents = self._parse_parents(file_content)
        self.timestamp = self._parse_commit_timestamp(file_content)

    def _cat_file(self, commit_id: str) -> str:
        # Let Git locate and decompress the object for us.
        cmd = ['git', 'cat-file', '-p', commit_id]
        return subprocess.check_output(cmd).decode('utf-8')

    def _parse_commit_timestamp(self, file_content: str) -> int:
        """Parse the UNIX timestamp of the commit."""
        lines = file_content.split('\n')
        for line in lines:
            if line.startswith('committer '):
                m = re.search(r'committer .+ <[^ ]+> ([0-9]+)', line.strip())
                if m is not None:
                    return int(m.group(1))
        raise ValueError('no committer timestamp found in ' + self.commit_id)

    def _parse_parents(self, file_content: str) -> List[str]:
        """Parse the hashes of all parent commits."""
        lines = file_content.split('\n')
        parents: List[str] = []
        for line in lines:
            if line == '':
                # The header ends at the first blank line; the rest is the
                # commit message, which must not be mistaken for parent lines.
                break
            if line.startswith('parent '):
                m = re.search(r'parent (.*)', line.strip())
                parent_id = m.group(1)
                parents.append(parent_id)
        return parents
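The parsing logic can be tried without a repository by feeding it the git-cat-file output of the gin merge commit shown earlier. The sketch below is a standalone function, not part of the class; parse_commit_text is a hypothetical name, and it assumes the header ends at the first blank line.

```python
import re
from typing import List, Tuple

SAMPLE = """\
tree 93e5046e502847a6355ed26223a902b4de2de7c7
parent ad087650e9881c93a19fd8db75a86968aa998cac
parent ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c
author Javier Provecho Fernandez <javiertitan@gmail.com> 1499534953 +0200
committer Javier Provecho Fernandez <javiertitan@gmail.com> 1499535020 +0200

Merge pull request #520 from 178inaba/travis-import_path
"""


def parse_commit_text(text: str) -> Tuple[List[str], int]:
    """Return (parent hashes, committer UNIX timestamp) from cat-file output."""
    # Only look at the header; the commit message follows the first blank line.
    header = text.split('\n\n', 1)[0]
    parents = re.findall(r'^parent ([0-9a-f]+)$', header, flags=re.MULTILINE)
    m = re.search(r'^committer .* (\d+) [+-]\d{4}$', header, flags=re.MULTILINE)
    if m is None:
        raise ValueError('no committer line found')
    return parents, int(m.group(1))


if __name__ == '__main__':
    parents, timestamp = parse_commit_text(SAMPLE)
    print(parents)
    print(timestamp)
```

Note that the sort key is the committer timestamp (1499535020 here), not the author timestamp on the line above it.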
Traverse the graph formed by commit objects: the max-heap
Congratulations, the data structures you have learned can finally come in handy.
Suppose the CommitObject class above parses the gin commit with hash e38955615a14e567811e390c87afe705df957f3a; its parents attribute will then contain two strings:
- ad087650e9881c93a19fd8db75a86968aa998cac;
- ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c.
Of these:
- The commit with hash ad087650e9881c93a19fd8db75a86968aa998cac was made on Sat Jul 8 12:31:44;
- The commit with hash ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c was made on Jan 28 02:32:44.
Obviously, if the log is printed in reverse chronological order, the next node printed should be ad087650e9881c93a19fd8db75a86968aa998cac (you can confirm this with git-log). After printing ad087650e9881c93a19fd8db75a86968aa998cac, the next commit object to print is chosen from among its parent commits and ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c. Obviously, this is a cyclical process:
- Among the commit objects waiting to be printed, find the one with the largest commit timestamp;
- Print its message;
- Add the parents of that commit to the pool of objects waiting to be printed, and return to the first step.
This process continues until there are no commit objects left to print. All the commit objects waiting to be printed form a priority queue, which can be implemented with a max-heap.
However, I do not intend to actually implement a heap in this short demonstration; I will use insertion sort instead.
class MyGitLogPrinter:
    def __init__(self, *, commit_id: str, n: int) -> None:
        # Commits waiting to be printed, kept sorted by timestamp, newest first.
        self.commits: List[CommitObject] = []
        # IDs of every commit ever queued, so none is printed twice.
        self.seen = set()
        self.times = n
        commit = CommitObject(commit_id=commit_id)
        self._enqueue(commit)

    def run(self) -> None:
        i = 0
        while len(self.commits) > 0 and i < self.times:
            commit = self.commits.pop(0)
            for parent_id in commit.parents:
                parent = CommitObject(commit_id=parent_id)
                self._enqueue(parent)
            print('{} {}'.format(commit.commit_id, commit.timestamp))
            i += 1

    def _enqueue(self, commit: CommitObject) -> None:
        # A commit can be reached through several children; queue it only once,
        # even if it has already been popped and printed.
        if commit.commit_id in self.seen:
            return
        self.seen.add(commit.commit_id)
        # Insertion sort: first find the index i to insert at, then shift
        # everything from i to the end toward the tail, and place the new
        # node at index i.
        i = 0
        while i < len(self.commits):
            if commit.timestamp > self.commits[i].timestamp:
                break
            i += 1
        self.commits = self.commits[0:i] + [commit] + self.commits[i:]
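If you do want a real priority queue, Python's heapq module is one option. The sketch below is only an illustration under stated assumptions: heapq is a min-heap, so timestamps are negated to pop the newest commit first; the graph, IDs, and timestamps are made up; and walk is a hypothetical helper name that does not call Git at all.

```python
import heapq
from typing import Dict, List, Tuple

# A toy commit graph: id -> (timestamp, parent ids). All values are invented.
GRAPH: Dict[str, Tuple[int, List[str]]] = {
    'm': (50, ['a', 'b']),  # a merge commit with two parents
    'a': (40, ['c']),
    'b': (30, ['c']),
    'c': (20, ['d']),
    'd': (10, []),
}


def walk(start: str, limit: int) -> List[str]:
    """Return up to `limit` commit ids reachable from `start`, newest first."""
    # Each heap entry is (negated timestamp, id), so the newest pops first.
    heap = [(-GRAPH[start][0], start)]
    seen = {start}
    order: List[str] = []
    while heap and len(order) < limit:
        _, commit_id = heapq.heappop(heap)
        order.append(commit_id)
        for parent_id in GRAPH[commit_id][1]:
            # 'c' is reachable from both 'a' and 'b'; push it only once.
            if parent_id not in seen:
                seen.add(parent_id)
                heapq.heappush(heap, (-GRAPH[parent_id][0], parent_id))
    return order


if __name__ == '__main__':
    print(walk('m', 20))
```

Pushing and popping on a heap cost O(log n) each, whereas the insertion sort used here costs O(n) per insert; for the 20 commits of this demonstration the difference does not matter.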
Finally, provide an entry-point function so that you can try it out:
import click  # third-party package used for command-line parsing


@click.command()
@click.option('--commit-id', required=True)
@click.option('-n', default=20)
def cli(commit_id: str, n: int):
    MyGitLogPrinter(commit_id=commit_id, n=n).run()


if __name__ == '__main__':
    cli()
Telling the real Monkey King from the fake one
To check whether the commit objects printed by the code above are correct, I first redirect its output to a file:
➜ gin git:(master) python3 ~/SourceCode/python/my_git_log/my_git_log.py --commit-id 'e38955615a14e567811e390c87afe705df957f3a' -n 20 > /tmp/my_git_log.txt
Then print the log in the same format with git-log:
➜ gin git:(master) git log --pretty='format:%H %ct' 'e38955615a14e567811e390c87afe705df957f3a' -n 20 > /tmp/git_log.txt
Finally, let the diff command tell us whether the two files differ:
➜ gin git:(master) diff /tmp/git_log.txt /tmp/my_git_log.txt
20c20
< 2521d8246d9813d65700650b29e278a08823e3ae 1499266911
\ No newline at end of file
---
> 2521d8246d9813d65700650b29e278a08823e3ae 1499266911
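Apart from the newline at the end of the file, which git-log does not emit after the last entry when the format: style is used, the two outputs are exactly the same.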
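In the diff output above, the only reported difference is that one file has no newline at its end. A minimal sketch of a comparison that ignores this detail; same_log is a hypothetical helper, and the strings are shortened stand-ins for the contents of the two files.

```python
def same_log(a: str, b: str) -> bool:
    """Compare two log outputs line by line, ignoring trailing newlines."""
    return a.rstrip('\n').split('\n') == b.rstrip('\n').split('\n')


if __name__ == '__main__':
    git_log = 'aaa 3\nbbb 2\nccc 1'       # no newline at end of file
    my_git_log = 'aaa 3\nbbb 2\nccc 1\n'  # print() ends every line with a newline
    print(same_log(git_log, my_git_log))
    print(same_log(git_log, 'aaa 3\nccc 1\n'))
```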