Author | Duo Xiaodong (Ant nickname: Yishan)

Lead of KusionStack and senior technical expert at Ant Group

Has worked extensively in infrastructure technology, focusing on cloud-native networking, operations and maintenance, and programming languages

This article: 2,580 words, about a 6-minute read

|Foreword|

This article was written on the eve of KusionStack's open-source release. In it, the author looks back on the team's arduous journey, from the start of Kusion development to its open-source release today. He describes why he and his team started the Kusion project and what it has achieved so far, and expresses his sincere gratitude to the team.

What is KusionStack?

KusionStack is an open-source, programmable cloud-native protocol stack!

The name Kusion derives from "fusion". Through a one-stop technology stack, we hope to bring together the many roles in the operations and maintenance (O&M) system, improve the openness and extensibility of O&M infrastructure, and reduce costs and increase efficiency overall. By defining a cloud-native programmable access layer, KusionStack provides a complete solution that includes the configuration language KCL, model interfaces, automation tools, and best practices. It connects cloud-native infrastructure with business applications, connects the teams involved, and links the development, testing, integration, and release stages of the application lifecycle. The goal is to support the construction of cloud-native automation systems and accelerate cloud-native adoption.

PART. 1

In pursuit of an ideal O&M system

In the autumn of 2019, work on MOSN had been underway for nearly two years, and during that time we had gradually validated its architecture on Alipay's core links. Throughout the process, MOSN faced its own technical challenges. Beyond those, the so-called dividends of cloud-native technology were held back in practice by the inefficiency of an ossified O&M system.

One day my supervisor invited me to dinner (it turned out to be a setup), and over the meal he described his ideal O&M system:

He wanted SREs to express their requirements in a dedicated language, defining the desired state of the infrastructure in code rather than spending enormous effort on a check, find, fix cycle. By providing open, programmable languages and tools that serve SRE teams with differing needs, the infrastructure team would achieve a higher overall ROI.

I immediately saw the similarity to HashiCorp's Terraform approach (HashiCorp later went public in late 2021, making it the largest open-source IPO to date, with a market cap of over $15 billion). Unlike IaaS delivery scenarios, however, Ant faces many larger-scale and more complex cloud-native PaaS scenarios. This reminded me of how Google uses specialized languages, tools, and related technologies internally to open up Borg [1] operations capabilities [2]. At the time, I felt this would be an interesting and challenging undertaking [3].

At the table, we discussed some ideas and a few challenges we were unsure about. He asked whether I wanted to give it a try, and said it was fine if I didn't. I didn't think it over much at the time, and I agreed before the meal was finished.

PART. 2

A long stretch of study, exploration, and practice

As the saying goes, different fields are separated as if by mountains.

We had no experience in language design and development, nor in designing open automation systems. At the start of the project, we found ourselves in a difficult predicament.

After a long period of repeated cycles of study, exploration, and practice, the project still showed no significant improvement. Making things harder, we not only had to deal with the complex, tightly coupled scenarios and problems inside Ant, but also faced persistent doubts about "whether this kind of highly engineered transformation approach has the soil to survive at Ant."

As the saying goes, when it rains, it pours. During this period we went through some personnel changes, which was regrettable and left us feeling helpless. At the same time, for various reasons, the project ran into one difficulty after another. Throughout 2020, we lived with uncertainty, inner conflict, and helplessness...

Thank you to Lingxi, Tingjian, and my supervisor for not giving up on this project and for sticking with me.

PART. 3

A painful yet happy incubation journey

Through continuous evangelism and communication, we gradually found more like-minded partners in the infrastructure technology team and the SRE team.

At the same time, on the technical side, we moved past our confusion, truly got the Kusion project started, and successfully progressed from the PoC stage to the MVP stage.

Eventually, with "non-standard" applications as our entry point, we began a painful yet happy incubation journey.

Thanks to Lingzhi, Qinghe, Zibo, Li Feng, Wuya, Xiangye, Dayuan... there are too many to list here. Thank you for your persistence in turning this idea into reality.

PART. 4

Breakthroughs and progress

Setting aside the many explorations and experiments along the way and looking back on the whole process: over the past year or so, we have combined compiler technology with O&M and platform technology and built an O&M system on the Kusion programmable technology stack.

In terms of business scenarios, the project covers a large number of O&M scenarios, from IaaS to SaaS. To date, more than 800 applications have been onboarded, spanning 9 BGs (business groups) and 21 BUs (business units). In typical cases, delivery O&M efficiency has improved by more than 90%. This is also the first time Ant has brought a large number of heterogeneous applications into a single, complete O&M technology stack.

At Ant, we have explored DevOps and CI/CD practices in depth on top of cloud-native container and microservice technologies. In doing so, we have improved Ant's cloud-native technology system, gradually unlocked the efficiency dividends of cloud native, and formed a virtual O&M R&D team of nearly 300 people.

Participants from different functions and teams came together to solve their own problems, contributing more than 30,000 commits and 350,000 lines of code, and some of them became Kusion developers of their own accord. I believe the culture and domain knowledge these engineers have accumulated bring value that goes far beyond the O&M business itself.

In addition, Kusion has become the foundational technology for a new generation of O&M products, including programmable baseline products, cloud-native O&M products, and multi-cloud delivery products. It is now part of the upgrade of Ant's O&M system architecture.

Staying true to our original intention, we hope to make collaboration among O&M participants more rational through technical means, through automation built on an open technology stack, and through the accumulation of O&M data and knowledge. Our aim is to keep improving the efficiency of collaborative O&M as a whole.

Ant also has a large number of internal O&M scenarios with complex links. Each link needs close involvement from the SREs who understand the O&M business best, working together with platform and application R&D. All of these links then combine into one complete O&M system. Under this model, open technology will become increasingly important.

The code that platform R&D, SREs, application R&D, and other roles write together is both accumulated data and accumulated business knowledge. This data and knowledge will open up more possibilities in the future.

PART. 5

On the road to open source

After a period of internal exploration, we decided to open source KusionStack to the technical community. We realized that other companies and teams face the same problems we do. Through open source, we hope our team's work can help more people.

Of course, we are also limited by our own capabilities and by the energy and resources we can invest. We hope more friends will join us in improving KusionStack. Whether you work on cloud native, O&M automation, programming languages, compilers, or any other field, you are warmly welcome to join us.

PART. 6

Looking forward to growing with you

This experience has been extremely valuable to me. Part of the reason is that I once again explored new territory and achieved breakthroughs, both in a new technical field and in Ant's technology upgrade. More importantly, I spent this time with a group of people, many of them young teammates born after 1995, turning ideas into reality together. It has been a wonderful process.

From now on, Kusion's circle of friends will no longer be limited to Ant, because the project is now open source. We look forward to more friends from the community growing with us on KusionStack!

Learn more...

Give KusionStack a Star ✨:
https://github.com/KusionStack

We hope the open-source KusionStack will be useful to everyone, and that we can improve it together with more friends. If you are interested in cloud native, O&M automation, programming languages, or compilers, you are welcome to take part in building the community. Together we can explore new technical fields, make breakthroughs, and realize more new ideas.

【References】

[1] "Large-scale cluster management at Google with Borg": https://pdos.csail.mit.edu/6.824/papers/borg.pdf

[2] "Configuration Specifics" (Google SRE Workbook): https://sre.google/workbook/configuration-specifics/

[3] "Borg, Omega, and Kubernetes": https://queue.acm.org/detail.cfm?id=2898444


SOFAStack

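SOFAStack™ (Scalable Open Financial Architecture Stack) is a suite of middleware for rapidly building financial-grade distributed architectures. It is also a set of best practices refined in financial scenarios.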