Reprinted from: the "OceanBase Database Planet" service account

"DBA 100 People" is a series of interviews that OceanBase conducts with senior DBAs. It aims to share their stories, their career development experience, the technical problems and practical cases they meet in daily work, and their views on future technology trends, in the hope of offering advice and food for thought to DBAs in every industry.


Editor's note

How does a DBA quickly locate faults? How should performance be optimized? How can technology selection and the construction of a technical system be done well?

Today, the first issue of "DBA 100 People" introduces a senior database expert and operation and maintenance professional with more than ten years of workplace experience: the manager of the operation and maintenance center of the information technology department of a bank in Yunnan. He has been responsible for hosts, middleware, and databases; has participated in the development of core and payment systems; and has led the construction of the operation and maintenance system, the emergency disaster recovery system, and the monitoring system. We hope his experience offers you a useful reference.

In 2001, when Li Jianming was filling out his university application, he knew nothing about any of this. Unlike many people who "like playing with computers" and think "programming is fun", Li Jianming never came into contact with computers in high school, because computers were not yet common in schools across the country at that time. He had only heard people say that studying computers would help him find a good job after graduation.

In this way, Li Jianming became one of the first undergraduates in China to study information security technology. During his time in school, he also worked part-time developing office software, forums, blogs, and so on. For the first time, he felt the fun of programming: "I really enjoy the process of building a product step by step, from scratch." Information security covers a relatively broad body of knowledge, and Li Jianming, who wanted a steady career in the computer field, felt he had to go deep into projects and lay a solid foundation in coding. He therefore chose to major in system analysis and integration when applying for graduate studies. On choosing a major, Li Jianming believes an important factor is "whether the road ahead is wide or narrow. In layman's terms, when you choose a career later, will the market demand for the direction you chose be large or small? You can also take your own interests into account; things may go more smoothly in an area you are interested in."

This choice paved the way for Li Jianming to enter the database field and become a DBA.

Growing DBA: Quickly locate faults and understand system operation mechanism

In 2008, after graduate school, Li Jianming joined a credit union during a period of rapid progress in its move to electronic systems, and took part in the maintenance and development of its core system and payment system. As early as 2004, the State Council had called for deeper reform of rural credit cooperatives. In response, the credit union established a science and technology settlement center and built an information system in 2005, which brought customers faster and more convenient financial services. However, rising transaction volume challenged the system, and keeping it stable became steadily more difficult.

"Fast" is a big challenge

The science and technology settlement center where Li Jianming worked was responsible for building the credit union's information system. From the time he joined, he was therefore responsible for maintaining the system while also optimizing it, and he taught himself DBA-related knowledge in his spare time. As a system maintainer, Li Jianming needed to locate a fault promptly and repair it quickly whenever the system failed. Being "fast", however, is very challenging for a "technical novice" who is new to the job.

Li Jianming's approach was to figure out the operating mechanism and execution process of the system. Then, when the system reports an error, you can quickly find out which accounts are wrong and under what circumstances such errors occur.

Generally speaking, operation and maintenance personnel feel that application systems always lack documentation, since developers may not write comprehensive documents. Operation and maintenance personnel therefore need to study the system more while maintaining it. For example, they should understand the system architecture, the business architecture, and the technical implementation path: understanding the system's table structure largely reveals the business structure, and understanding operating system commands reveals the system structure. Beyond that, they can study individual system components further, for example by delving into the database's table structures.

By continually deepening his understanding of every layer of the system and working upward layer by layer, he gradually came to understand the system's operating mechanism and execution process, getting at the truth by posing hypotheses and verifying answers again and again. Once enough "experiments" have been done, the ability to quickly locate and repair faults comes naturally. In addition, Li Jianming spent seven or eight months reading through the main source code of the system. The research and self-study of this period not only enabled him to find problems quickly when the system went wrong, but also rapidly improved his technical ability and code quality.

In 2010, after two years of excellent performance, Li Jianming officially became a system administrator and database administrator (DBA). He was responsible for the stable and efficient operation of the system, and managed and maintained databases, operating systems, application software, middleware, and so on. "There were two system administrators at the time, and we needed to step in whenever there was a technical failure." His study and exploration of the system over the previous two years had given him a solid understanding of it and let him handle the new tasks with ease.

Explore, hit pitfalls, learn

These problems are not difficult for a mature DBA. A growing DBA, however, has to keep exploring, running into pitfalls, and learning in practice.

For example, when the database connection pool fails, a less experienced DBA may assume there are not enough connections and keep creating more, yet the problem remains and the pool fills up again. A full connection pool is only a symptom; to solve the underlying problem, you need to look past the symptom to its cause. In some cases business volume really is too high, but in most cases business volume does not suddenly increase. The more likely reason for a full pool is that a newly launched business function is inefficient, so connections are blocked or not released in time. In that situation, blindly adding connections only delays the blockage and does not solve the problem at its root.

As another example, when a long transaction causes a business interruption, a DBA who is not familiar with the underlying principles will simply wait for the transaction to roll back. With some experience, you learn that you can let the transaction roll back while the business keeps running normally by continuing to add transaction log files.
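The point about connection pools can be illustrated with a rough back-of-the-envelope sketch based on Little's law (average connections in use is roughly the request rate multiplied by the time each request holds a connection). The workload numbers, pool sizes, and the function name below are hypothetical and do not come from the interview; they only show why a slow query, not the pool size, is usually the real problem.

```python
import math

def connections_needed(requests_per_second: int, hold_ms: int) -> int:
    """Little's law estimate: connections in use = arrival rate x hold time."""
    return math.ceil(requests_per_second * hold_ms / 1000)

# Hypothetical workload: business volume stays flat at 200 requests/s.
RATE = 200
POOL_SIZE = 50

# Before the new feature: each request holds a connection for 50 ms.
before = connections_needed(RATE, hold_ms=50)
# After an inefficient new feature: each request holds a connection for 2 s.
after = connections_needed(RATE, hold_ms=2000)

print(f"before: need {before}, fits in pool: {before <= POOL_SIZE}")
print(f"after:  need {after}, fits in pool: {after <= POOL_SIZE}")

# Doubling the pool does not help while the hold time stays at 2 s.
print(f"fits in doubled pool: {after <= POOL_SIZE * 2}")
```

In this sketch the request rate never changes, yet the demand for connections grows from 10 to 400 purely because each request holds its connection longer. Enlarging the pool only moves the point at which it fills up; shortening the hold time addresses the cause.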

"The work of a system administrator has given me a deeper understanding of software systems. From theory to practice, I have accumulated rich experience in troubleshooting and performance optimization, and improved my technical confidence," said Li Jianming. At the same time, he summarized his experience in ensuring system stability, locating faults, and optimizing performance.

As a DBA, ensuring system stability is the top priority. Li Jianming believes that three tasks need to be done well.

First, prevention: plan the infrastructure, such as network, storage, computing resources, and operating systems, and make sure the underlying framework stays relatively uniform.

Second, while the system is running normally, watch from a business perspective for abnormal behavior or a sub-healthy state; build observability for early warning and emergency response; and improve the disaster recovery system and standardized operation management.

Third, emergency response: locate the fault through the system architecture, then use the monitoring system to help analyze the problem. For fault location and performance optimization, there are three things to do:

1) Confirm the system architecture and business architecture, and troubleshoot from a global perspective; 2) Watch the various indicators, such as CPU, memory, I/O, load, business volume, business success rate, and response time; 3) Under load, check the performance of each module one by one from front to back, or examine summary and detailed information about current activity, including the operations the system is performing at that moment: which functions are being called, which SQL is being executed, what each thread is doing, and even the response efficiency and latency of packets on the network.

In addition, to keep the owners of individual modules from shifting blame onto one another, a technical integration team is needed, so that in an emergency someone can take charge of the overall situation and problems can be handled quickly.
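The second step, watching indicators, can be sketched as a simple threshold check. This is only a minimal illustration: the indicator names, the threshold values, and the `abnormal_indicators` helper are all assumptions made for the example, not the monitoring system described in the interview.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threshold:
    limit: float
    higher_is_worse: bool = True  # False for rates where low values are bad

# Hypothetical thresholds; real values depend on the system being monitored.
THRESHOLDS = {
    "cpu_percent": Threshold(85.0),
    "memory_percent": Threshold(90.0),
    "io_wait_percent": Threshold(20.0),
    "response_time_ms": Threshold(500.0),
    "business_success_rate": Threshold(99.0, higher_is_worse=False),
}

def abnormal_indicators(sample: dict) -> list:
    """Return the names of indicators in `sample` that breach their threshold."""
    flagged = []
    for name, threshold in THRESHOLDS.items():
        if name not in sample:
            continue  # indicator not collected in this sample
        value = sample[name]
        if threshold.higher_is_worse:
            breached = value > threshold.limit
        else:
            breached = value < threshold.limit
        if breached:
            flagged.append(name)
    return flagged

# Hypothetical sample: CPU and memory look fine, but I/O wait and response
# time are high and the business success rate has dropped.
sample = {
    "cpu_percent": 42.0,
    "memory_percent": 63.0,
    "io_wait_percent": 35.0,
    "response_time_ms": 1200.0,
    "business_success_rate": 97.5,
}
print(abnormal_indicators(sample))
```

A sample like this one points toward storage or slow SQL rather than CPU, which is the kind of narrowing-down that then leads into the module-by-module check in step 3.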

Experienced DBA: Choose the best-fit database, not the most expensive one

During his ten years at the credit union, Li Jianming grew from an inexperienced newcomer into a seasoned system administrator and senior engineer. The credit union's electronic system likewise grew from supporting millions of accounting transactions to supporting tens of millions. If the credit union was the cornerstone of Li Jianming's career and the cradle of his professional skills, then the bank became the advanced testing ground for ten years of accumulated skills and experience.

In 2018, Li Jianming joined a bank in Yunnan, where he has been responsible for managing the operation and maintenance team and for building the operation and maintenance system, the emergency disaster recovery system, and the monitoring system. In addition, in 2021 he took part in a task that seems far removed from a DBA's work: database selection.

Technology selection that ignores business needs is meaningless. To do it well, you must first know which databases on the market can serve as candidates. Li Jianming believes that an excellent database product should be as stable and concise as Informix and, like Oracle, should have rich built-in system tables and high-performance features. It should also meet the ACID requirements (Atomicity, Consistency, Isolation, Durability). In addition, being transparent to applications is an indispensable capability. The database should also give the DBA enough control, so that the DBA can see which tools a user uses, how much CPU is consumed, how long operations take, and what operations were performed while interacting with the database.

Secondly, select the "optimal solution" according to the characteristics of the business. Take the technology selection Li Jianming went through when his bank changed databases as an example.

Most banks use the relatively mature Oracle database. The advantages of Oracle are that its documentation, books, and other materials are relatively complete, its tools are rich, and its ecosystem is mature; when problems arise, you can reuse other people's solutions or quickly find developers who understand Oracle. However, as the volume of bank data grows, Oracle's data processing capability is stretched, and it also shows shortcomings in multi-site active-active deployments. Because of Oracle's global cache mechanism, customer A's data has to be processed in data center A and customer B's data in data center B; if access crosses data centers, performance degrades sharply. In addition, hardware investment and software licensing costs keep increasing. Yet replacing a database is easier said than done, so Li Jianming's bank faced a dilemma in database selection.

In fact, what the bank really needs is a database with better performance and lower cost that can scale out and in flexibly, removing nodes when business volume is low and adding nodes to handle transaction peaks. Compared with a traditional centralized database such as Oracle, a distributed database appears to be the new direction.

From a system point of view, having compared the leading distributed databases currently on the market, Li Jianming believes that a database's data distribution should be determined by the database itself, rather than requiring upper-layer applications and DBAs to pay constant attention to its data distribution mechanism. This rules out some database options, and OceanBase has the advantage here thanks to its all-in-one architecture. In terms of scaling out and in, there are database-level, table-level, and row-level granularities. Database-level scaling is not fine-grained enough, and row-level scaling is not needed for the current business transaction volume. OceanBase, which scales at the table level, is therefore a good choice, and its "three locations, five centers" deployment can prevent data loss. In terms of performance and proven cases, OceanBase is an outstanding database product that has stood the test of large-scale events: it has stably supported the "Double 11" event for ten years and has been validated by Alipay and MYbank. In the performance test, it reached 8,000 TPS (transactions per second).

Because the bank in Yunnan is adopting a distributed database on top of a traditional core system, OceanBase was, after comprehensive consideration, judged to be the more suitable choice. Li Jianming, who took part in this database selection, summed up four lessons:

First, the product and technical strength of domestic databases is gradually improving, so they can be included in the selection.

Second, in the face of ever larger data volumes, a distributed database is the better choice.

Third, when choosing a distributed database, consider its ACID guarantees.

Fourth, consider how much the application must be modified and how compatible it is. Also consider database performance, capacity, and compatibility with hardware.

Seven Growth Tips for DBAs

At the end of the interview, when talking about what qualities or abilities an excellent DBA should have, Li Jianming shared his views and gave seven suggestions based on his more than ten years of workplace experience:

  1. Possess a solid foundation in database theory, for example an introduction to database systems, the core concepts of databases, and the principles of distributed databases. Theory provides macro-level guidance for practice at work.
  1. Be familiar with software development fundamentals and technical architecture. DBAs may not need to write good code. But a DBA who is unfamiliar with code, who does not know how the code is written, how load balancing is done, or how the application connects to the database, and who is unclear about common frameworks, may only be able to say "I think there is no problem with the database" when troubleshooting, let alone guarantee system stability from a global perspective.
  1. Be familiar with operating system operation and performance tuning. The database ultimately runs on an operating system. Proficiency in operating the system can be accumulated through daily work, and performance tuning can be mastered by reading the official documentation, for example to understand what parameters mean and what effect changing them has, and by doing more hands-on work day to day.
  1. Be proficient in database operation and maintenance. In particular, a DBA needs to be tested by high concurrency and large data volumes. Operational proficiency depends largely on accumulated practice, and whether you encounter high-concurrency scenarios depends on your company's business. For example, an Oracle database supporting a small business volume often runs very well with default parameters. The DBA faces no major challenges, at most expanding storage space, so it is difficult to accumulate experience in this area.
  1. The harder a theory is, the harder you should work to master it. With so much technical knowledge available, first learn one area to a certain depth before branching out horizontally. If you are familiar with many skills but each only at a surface level, it is difficult to reach a high level in the technical field.
  1. Stay curious about knowledge and persist in lifelong learning. For technical people: to learn IT theory, you can read technical books; to learn practical experience with systems, you can use Geek Time; to learn technical knowledge in a specialized field, you can read the vendor's official documentation; when you run into a hard-to-diagnose problem, you can browse CSDN; for disciplinary and common-sense content, you can use the App; and for strongly theoretical and academic knowledge, you can read papers.
  1. Develop your reverse thinking and structured thinking skills. Break the inertia of thinking and imagine multiple possibilities, especially thinking in two extreme directions. Keep asking questions to challenge your own assumptions and test new ones.

Asked what advice he would give his younger self, Li Jianming said, "Life is limited, but knowledge has no limit. You must be willing to give things up, find the direction you are most interested in or best at, work hard, and hone it into a signature skill. You should also learn some basic principles across disciplines, broaden your knowledge, improve your ability to think in multiple dimensions, think diligently, and plan and practice more." He shares this advice with fellow database practitioners.


About the author

Li Jianming

He is currently the manager of the operation and maintenance center of the information technology department of a bank in Yunnan. He holds qualifications including system analyst, Elasticsearch certified engineer, Kubernetes certified administrator, DevOps Master, Oracle certified expert, and OceanBase certified expert.

