
Golang

Let's talk about back-end development first. The commonly used languages are Java, Golang, and Python. Java programmers are currently the most numerous on the market, and Java is still the choice of many companies: a back-end project developed in Java tends to have good project structure and conventions, which suits complex business logic. I work in Golang. Golang is well suited to building microservices and is characterized by fast development and high efficiency. Many companies (ByteDance's main language is Go, Bilibili mainly uses Go, Tencent favors Go, and so on) have begun to develop in Golang, because compared with Java and Python its biggest advantages are a small memory footprint and strong support for high concurrency, and the language itself is concise, efficient, and easy to learn.

There are many back-end web frameworks in Golang, such as Gin (recommended), Beego, and Iris. For relational database access there is GORM. These are the basic tools a Golang back-end developer should master.
Common microservice frameworks include go-zero and Kratos.
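
As a quick illustration, here is a minimal sketch of a Gin HTTP service; the route and port are arbitrary choices for the example, not anything prescribed above:

```go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default() // router with logging and recovery middleware

	// A trivial health-check style endpoint.
	r.GET("/ping", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{"message": "pong"})
	})

	// Listen on :8080 (an arbitrary choice for the example).
	_ = r.Run(":8080")
}
```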

Golang has some distinctive features, such as the goroutine, a coroutine that is more lightweight and efficient than a thread. Channels are the idiomatic way for goroutines to communicate: rather than communicating by sharing memory, goroutines share memory by communicating over channels. When multiple goroutines consume data from the same channel, the channel's internal locking guarantees that each message is delivered to exactly one goroutine. Internally, Golang has a whole goroutine scheduling mechanism called GMP, where G stands for Goroutine, M for Machine (an OS thread), and P for Processor. Roughly, runnable goroutines are kept in a global run queue plus a local run queue per P, and the Ps coordinate to dispatch goroutines onto the limited set of Ms for execution.
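
A minimal sketch of the "many goroutines consuming one channel" behaviour described above; the worker count and job values are arbitrary:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup

	// Three consumer goroutines read from the same channel;
	// each value is received by exactly one of them.
	for w := 1; w <= 3; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := range jobs {
				fmt.Printf("worker %d got job %d\n", id, j)
			}
		}(w)
	}

	for j := 0; j < 9; j++ {
		jobs <- j
	}
	close(jobs) // closing the channel ends the range loops above
	wg.Wait()
}
```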

Golang is a garbage-collected language. Internally it uses a tri-color mark-and-sweep algorithm: roughly, reachable objects are marked by following references from the roots, and the remaining unmarked objects are reclaimed. In older versions the impact of STW was larger. STW refers to the pause during GC: because concurrent access to memory while marking would be unsafe and could make the marking inaccurate, the runtime uses barriers and briefly stops the program while objects are being processed, then lets it continue. This short pause is called STW (Stop The World), and during it the program cannot serve requests. The same phenomenon exists in Java. With successive Golang releases the GC algorithm has been continuously optimized, and the STW pauses have become shorter and shorter.

One thing to pay attention to: when defining a map, try not to store pointers as values, because a map full of pointers gives the GC more objects to scan and makes GC take longer.

Another point is that Golang's map is unordered. If you need to traverse a map in a deterministic order, copy the keys (or entries) into a slice and sort it; only then will the data come back in the same order every time. A map is also unsafe for lock-free concurrent access. When using a map as an in-memory cache, you must consider the safety of multi-threaded access. There are two common approaches: protect the map with a read-write lock (sync.RWMutex), or use sync.Map. In write-heavy, read-light scenarios, the read-write lock is recommended, because sync.Map internally trades space for time: it keeps two maps, one serving reads and one serving writes, and when writes are too frequent the read map is constantly being rebuilt, which brings frequent GC work and a fairly large performance overhead.
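
A minimal sketch of the RWMutex-protected map cache mentioned above; the Cache type and its method names are illustrative, not from any particular library:

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a tiny concurrency-safe in-memory cache: a plain map
// guarded by a read-write lock.
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewCache() *Cache {
	return &Cache{data: make(map[string]string)}
}

// Get takes the read lock, so many readers can run in parallel.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// Set takes the write lock, which excludes all other readers and writers.
func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

func main() {
	c := NewCache()
	c.Set("lang", "golang")
	if v, ok := c.Get("lang"); ok {
		fmt.Println(v)
	}
}
```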

Goroutines launched in Golang are detached from the caller. If the main function needs to wait for a goroutine to finish, or to tell it to stop, there are usually three approaches. The first is sync.WaitGroup, with its Add, Done, and Wait methods; it is roughly analogous to CountDownLatch in Java. The second is the context package: pass the main function's context into the goroutine, cancel it (or let it time out) in main, and have the goroutine select on ctx.Done() to learn that it should stop. The third is to define a channel, pass it into the goroutine, and have main wait to read the termination message that the goroutine sends on the channel when it finishes.
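
A minimal sketch combining the first two approaches, sync.WaitGroup plus context cancellation; the timeout and the fake work loop are arbitrary:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-ctx.Done(): // main cancelled the context or the timeout fired
				fmt.Println("worker stopping:", ctx.Err())
				return
			default:
				time.Sleep(50 * time.Millisecond) // stand-in for real work
			}
		}
	}()

	wg.Wait() // block until the goroutine has finished
}
```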

Golang has no concept of inheritance, only composition. Each struct definition can be seen as a class, and structs can be composed and embedded in one another. In terms of software design principles, composition achieves looser coupling than class inheritance. Golang also has no explicit "implements" declaration: when a struct implements all the methods declared by an interface, it implements that interface implicitly. In a function call we typically pass in the concrete struct that implements the interface at the call site, while the function's parameter is declared with the interface type, so the called function can be reused for any implementation. This is how the object-oriented idea of polymorphism shows up in Go.
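
A minimal sketch of implicit interface satisfaction and an interface-typed parameter; Notifier, EmailSender, and Broadcast are invented names for the example:

```go
package main

import "fmt"

// Notifier is the behaviour the caller depends on.
type Notifier interface {
	Notify(msg string) error
}

// EmailSender implements Notifier implicitly: there is no "implements"
// keyword, it simply has the right method set.
type EmailSender struct {
	From string
}

func (e EmailSender) Notify(msg string) error {
	fmt.Printf("email from %s: %s\n", e.From, msg)
	return nil
}

// Broadcast accepts any Notifier, so it is reusable for every implementation.
func Broadcast(n Notifier, msg string) error {
	return n.Notify(msg)
}

func main() {
	_ = Broadcast(EmailSender{From: "ops@example.com"}, "deploy finished")
}
```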

In Golang, error handling is the most painful part: roughly nine out of ten function calls return an error, and each one must be handled or propagated upward. In business logic we usually define our own error values and error types. In the standard library, error is an interface with a single Error() method; the errors package (and fmt.Errorf with %w) provides ways to create errors, wrap them as they propagate upward, and print error information. Note that you should not compare errors by comparing their message strings: two errors created separately with the same message are still distinct values in memory, so compare sentinel errors with errors.Is (or inspect types with errors.As) instead. It is also common to use a defer at the top of a function to handle errors uniformly: defer marks code that runs just before the function returns, so a deferred anonymous function can intercept and decorate the error. (The github.com/pkg/errors package can also be used for wrapping errors with stack traces.)
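
A minimal sketch of a sentinel error, wrapping with %w, and checking with errors.Is; ErrNotFound and findUser are invented names:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a sentinel error that callers can test for with errors.Is.
var ErrNotFound = errors.New("record not found")

func findUser(id int) error {
	// Wrap the sentinel so the caller keeps both the context and the cause.
	return fmt.Errorf("findUser(%d): %w", id, ErrNotFound)
}

func main() {
	err := findUser(42)
	if errors.Is(err, ErrNotFound) { // true even though err is wrapped
		fmt.Println("not found:", err)
	}
}
```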

For Golang project structure there is a fairly well-known example layout on GitHub that you can refer to or imitate. One thing to note is that when an external project imports your code, it can only call exported functions or methods that live outside the internal package; code under internal cannot be imported by outside projects. This is a built-in code protection mechanism.


MySQL

Back-end projects are inseparable from creating, reading, updating, and deleting data, and MySQL is usually the database everyone touches most. The well-known "MySQL 实战 45 讲" (45 Lectures on MySQL in Practice) series is recommended reading.

The commonly used MySQL versions are 5.7 and 8.0. For compatibility reasons, most companies still run 5.7. In this version MySQL uses the InnoDB storage engine by default, whose key feature is transaction support, i.e. the ACID properties we often talk about.

Generally speaking, if you need to insert, update, or delete across multiple tables, you should use a transaction so that a partial failure does not leave the multi-step operation half applied: if any step fails, the transaction is rolled back and all of its operations are cancelled.
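
A minimal sketch of such a transaction using Go's database/sql; the DSN, table, and columns are made up for the example:

```go
package main

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql"
)

func transfer(db *sql.DB, fromID, toID, amount int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// Roll back everything if any step below fails.
	if _, err := tx.Exec("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, fromID); err != nil {
		_ = tx.Rollback()
		return err
	}
	if _, err := tx.Exec("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, toID); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit() // both updates succeed or neither does
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/demo")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	_ = transfer(db, 1, 2, 100)
}
```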

Transactions have four isolation levels: read uncommitted, read committed, repeatable read, and serializable. InnoDB's default isolation level in MySQL is Repeatable Read.

Note that for each transaction isolation level the storage engine provides a corresponding locking implementation, so when operating on data you need to watch out for deadlocks. For reads and writes, read-write locks are supported: a read lock is a shared lock, and multiple shared locks can be held at the same time; a write lock is an exclusive lock, and only one writer may operate on the data at a time. Different storage engines support different lock granularities, including table locks, row locks, and gap locks. When performing DELETE or UPDATE operations, always include a WHERE condition (ideally one that hits an index) to avoid deleting or updating the whole table, or triggering table-level locking that can lead to deadlocks.

Querying through an index is dramatically more efficient than a full table scan. The fundamental reason is that InnoDB builds a B+ tree over the indexed columns to speed up lookups. When writing queries, try to list exactly the fields you need: if a composite index already covers those fields, InnoDB can answer the query from the index alone and does not have to go back to the table. A secondary index's B+ tree stores the table's primary key in its leaf nodes; looking up the primary key in the secondary index and then fetching the full row from the B+ tree keyed by the primary key is what is called a "back to table" lookup, and it costs two B+ tree traversals.

Composite indexes follow the leftmost-prefix matching rule: if the WHERE condition does not match the leftmost prefix of the composite index, InnoDB falls back to a full table scan. Use the EXPLAIN statement to analyze and optimize index usage.

In table design, a table should not have too many fields, generally no more than 20. Each column's type should be as small as the actual data allows. For example, a UUID is 32 characters by default, so define it as varchar(32); defining it as varchar(255) wastes space.

For paging queries using LIMIT with page and pageSize, the larger the page number, the lower the query efficiency, because the server still has to scan and discard all the skipped rows. Therefore, try to design an auto-incrementing integer column; when the page offset is large, efficiency can be improved by adding a WHERE condition that filters on that auto-incrementing column instead.
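
A minimal sketch contrasting offset paging with filtering on the auto-increment id; the table and columns are hypothetical:

```go
package main

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql"
)

// pageByOffset gets slower as page grows: MySQL scans and throws away page*size rows.
func pageByOffset(db *sql.DB, page, size int) (*sql.Rows, error) {
	return db.Query("SELECT id, name FROM users ORDER BY id LIMIT ? OFFSET ?", size, page*size)
}

// pageByID stays fast: the WHERE clause seeks directly into the primary-key B+ tree.
func pageByID(db *sql.DB, lastID int64, size int) (*sql.Rows, error) {
	return db.Query("SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?", lastID, size)
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/demo")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	_, _ = pageByOffset(db, 100, 20)
	_, _ = pageByID(db, 2000, 20)
}
```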

MySQL is a single-node store by default. For read-heavy, write-light workloads it can be deployed in a master-slave setup with read-write splitting, reducing the load on the write node.

A single MySQL instance can handle at most a few thousand concurrent requests. For scenarios with a large volume of concurrent queries, it is recommended to put a cache service such as Redis or Memcached in front of it.

MySQL records data changes in its binlog, and component services such as Canal are commonly used to export that data to a message queue for analysis, dedicated search, user recommendation, and similar scenarios. If the MySQL server loses data, the binlog can also be used for recovery. However, because changes sit in memory for a while before being flushed to disk periodically, the binlog may not be able to recover absolutely all of the data.


Redis

When the number of users grows sharply and access becomes frequent, adding a cache service in front of MySQL to hold hot data can relieve the access pressure on the database. The most common cache service is Redis.

Redis is an in-memory distributed cache written in C. Its strengths are handling heavy read and write traffic and serving queries very efficiently.

Although Redis is a distributed cache, persistence is usually enabled so that data can be recovered from disk if the service goes down. Redis supports two persistence mechanisms: AOF and RDB. AOF records a log of every write, update, and delete command and restores data after a crash by replaying that log. RDB records point-in-time snapshots and restores all data up to the last snapshot after a crash. Each has its drawbacks: AOF recovery is slow, while RDB snapshots run on a schedule, so any writes between the last snapshot and the crash are lost. In practice the two are usually enabled together. It is also recommended not to set the RDB interval too short, because the bgsave executed for each snapshot can briefly prevent Redis from serving requests.

Although Redis can greatly reduce the access pressure on the database, it is not a silver bullet. If the database remains the source of truth, then every read and write has to consider the possibility of inconsistency between the cache and the database.

A common scheme for keeping Redis and MySQL consistent:
Read operation: if the data has expired in Redis, query it directly from MySQL.
Write operation: update MySQL first, then update Redis; if the Redis update fails, consider retrying.
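
A minimal sketch of the read path using the go-redis client; the key naming, TTL, and the loadFromMySQL helper are made up for the example, and refilling the cache on a miss is the usual cache-aside pattern rather than something mandated above:

```go
package main

import (
	"context"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

// loadFromMySQL is a hypothetical stand-in for the real database query.
func loadFromMySQL(id string) (string, error) { return "value-for-" + id, nil }

func getUser(ctx context.Context, rdb *redis.Client, id string) (string, error) {
	key := "user:" + id
	val, err := rdb.Get(ctx, key).Result()
	if err == nil {
		return val, nil // cache hit
	}
	if !errors.Is(err, redis.Nil) {
		return "", err // a real Redis error, not just a miss
	}
	// Cache miss or expired: fall back to MySQL, then refill the cache.
	val, err = loadFromMySQL(id)
	if err != nil {
		return "", err
	}
	_ = rdb.Set(ctx, key, val, 10*time.Minute).Err()
	return val, nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	_, _ = getUser(context.Background(), rdb, "42")
}
```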

If inconsistencies still occur despite the above, consider adding a backstop: subscribe to the MySQL binlog and push the change events into a Kafka queue to be consumed (and used to repair the cache).

Besides data inconsistency, introducing Redis can also bring cache avalanche, cache penetration, and cache breakdown problems. When populating the cache, give entries different expiration times so that a large batch of keys does not expire at the same moment and send all of that traffic to the database. Cache penetration can be addressed with a Bloom filter, and cache breakdown with a distributed lock.

Redis reads are fast because the data lives in memory. When a large amount of cached data is needed, a single machine's memory is not enough and Redis has to be deployed as a cluster. In Redis Cluster, 16384 hash slots are distributed evenly across the Redis servers, and each key is mapped by hashing to one of those slots, which determines the server that stores it. Scaling the cluster in or out triggers a fairly large data migration; try to do it while external traffic is stopped, otherwise some cached data may temporarily become unavailable.

Redis uses the sentinel mechanism to detect when nodes go online or offline. A common deployment pattern is one master, two slaves, and three sentinels.

Redis has many application scenarios, such as using a zset to implement leaderboards, a list to implement a lightweight message queue, a hash or set to implement Weibo-style likes, and so on.
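
A minimal sketch of a zset leaderboard using the go-redis client; the key name and scores are arbitrary:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Record some scores; the zset keeps members sorted by score.
	rdb.ZAdd(ctx, "leaderboard",
		redis.Z{Score: 310, Member: "alice"},
		redis.Z{Score: 280, Member: "bob"},
		redis.Z{Score: 420, Member: "carol"},
	)

	// Top 3 players, highest score first.
	top, err := rdb.ZRevRangeWithScores(ctx, "leaderboard", 0, 2).Result()
	if err != nil {
		panic(err)
	}
	for i, z := range top {
		fmt.Printf("#%d %v (%.0f)\n", i+1, z.Member, z.Score)
	}
}
```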

When storing data in Redis, pay attention to key design: avoid Chinese characters in keys, keep values from getting too large, and standardize key naming conventions according to the business.

Although Redis has 16 logical databases, they are only logically isolated: the cached data lives in the same instance, and reads and writes across different databases compete for the same resources.


Kafka

Next, let's talk about message queues; here we will only cover Kafka.

There is little need to elaborate on the application scenarios for message queues: decoupling upstream and downstream services, smoothing traffic peaks, asynchronous processing, and so on, applied as the actual scenario demands.

Let's go through some common problems with message queues: message loss, duplicate sends, retry mechanisms, message ordering, duplicate consumption, and so on.

Message loss in Kafka is extremely rare, because Kafka provides an at-least-once delivery guarantee: as long as an offset is below the high watermark (HW), the message has already been written to the replicas' logs, so consuming offsets within the HW will not encounter lost messages.

For sending messages, Kafka provides an ACK mechanism with three possible settings.

When acks=0, the producer does not wait for any acknowledgment from the broker: the message is considered sent as soon as it leaves the client. In this case it is very possible for a message to be "sent" but lost, for example if the request never reaches the leader or the leader's node goes down before the message is replicated.

When acks=1, the leader acknowledges as soon as the message is written to its own log, without waiting for the other replicas to synchronize it. If the leader's node goes down at that moment, the remaining replicas campaign for leadership, and Kafka's leader-epoch mechanism truncates log entries that the new leader never synchronized; if the new leader had not yet copied this message, it is lost.

When acks=-1 (all), the leader only acknowledges after all replicas in the partition's ISR set have stored the message. As long as more than one replica remains in the ISR, even if the leader's node goes down the message will not be lost, because the next leader is elected from the ISR set and every ISR member already holds the message.

In everyday use, acks=1 is the common default: with acks=0 messages are very likely to be lost, while with acks=-1 each send waits too long for confirmation and throughput drops.

For duplicate message sends, I recommend deduplicating on the consumer side. From the producer's point of view, if a message was actually delivered but the ACK was lost, the producer judges the send as failed and retries, so duplicates are hard to avoid at that layer. Kafka's transaction mechanism can be enabled to guarantee a message is delivered only once, but turning on transactions greatly reduces both produce and consume throughput, so it is generally not recommended.

In Kafka, every message sent by a producer is appended at an offset within a partition of the corresponding topic. A send must specify a topic; the partition may or may not be specified. When no partition is specified, messages for a topic are distributed across its partitions by load balancing. Because ordering is guaranteed only within a single partition, sending to a multi-partition topic without specifying a partition means messages can arrive out of order.
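
A minimal producer sketch using the segmentio/kafka-go client, showing how a message key keeps related messages in one partition (and therefore in order) and where the acks setting discussed above is configured; the broker address and topic are placeholders:

```go
package main

import (
	"context"

	"github.com/segmentio/kafka-go"
)

func main() {
	w := &kafka.Writer{
		Addr:         kafka.TCP("localhost:9092"),
		Topic:        "orders",
		Balancer:     &kafka.Hash{},    // same key -> same partition -> ordered
		RequiredAcks: kafka.RequireOne, // acks=1, the common default discussed above
	}
	defer w.Close()

	// Messages for the same user share a key, so they stay in order.
	err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("user-42"), Value: []byte("order created")},
		kafka.Message{Key: []byte("user-42"), Value: []byte("order paid")},
	)
	if err != nil {
		panic(err)
	}
}
```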

Kafka isolates messages logically by topic and physically by the partitions under each topic. The purpose of splitting a topic into multiple partitions is to increase consumption capacity: a partition can be consumed by only one consumer in a group, but one consumer can consume multiple partitions. Every consumer belongs to a consumer group. If a group contains a single consumer, that consumer receives all partitions of the topics the group subscribes to. If the number of consumers in the group is less than or equal to the number of partitions, each consumer is assigned one or more partitions. Conversely, if the group has more consumers than the topic has partitions, some consumers are assigned no partition and consume nothing.

In practice, the number of consumers in a group is usually set equal to the number of partitions, so that each consumer handles one partition and none sits idle.
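
A matching consumer sketch with segmentio/kafka-go: every instance started with the same GroupID joins one consumer group and is assigned its share of the partitions; names are placeholders:

```go
package main

import (
	"context"
	"fmt"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "order-workers", // all readers with this GroupID share the partitions
		Topic:   "orders",
	})
	defer r.Close()

	for {
		// ReadMessage blocks until a message arrives and commits offsets for the group.
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			break
		}
		fmt.Printf("partition %d offset %d: %s = %s\n", m.Partition, m.Offset, m.Key, m.Value)
	}
}
```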

Each server in a Kafka cluster is called a broker, and among the brokers a controller is elected via ZooKeeper to handle internal coordination and external administrative requests. The real data reads and writes, however, happen on partitions, and each partition belongs to a topic. To prevent data loss, each partition usually has multiple copies, called replicas. One of them is elected partition leader and handles all writes and reads; the other replicas interact with the leader and synchronize its data. The replicas of one partition are spread evenly across different brokers, so by design Kafka's message handling is load balanced and essentially every broker participates. The partition leader is elected from the ISR set by default. ISR stands for In-Sync Replica, the set of replicas whose logs are consistent with the leader's. If a replica falls too far behind the leader's offsets within a certain window, it cannot stay in the ISR: even if it was in the set before, it gets kicked out, and only after catching up does it get a chance to rejoin. Electing the leader from the ISR therefore ensures, to a large extent, that messages remain consistent and are not lost when a new leader is chosen.

The consumer-group mechanism greatly increases consumption capacity, but its rebalance mechanism can make consumption temporarily unavailable. Each group has a coordinator that is responsible for distributing partitions evenly among the group's consumers. When a consumer joins or leaves the group, partitions must be reassigned; during that time all consumers in the group stop consuming and wait for the coordinator to hand them new assignments. The more consumers and partitions there are, the longer this wait. For that reason it is not recommended to create too many partitions per topic, generally fewer than 20.

