Today I'm sharing a ByteDance Douyin e-commerce interview experience; I hope it helps~
Article directory:
HashMap put method
Put method process:
- If the table has not been initialized, initialize it first
- Compute the bucket index for the key with the hash algorithm
- If there is no element at that index, insert directly
- If there is an element at that index, traverse and insert. There are two cases: for a linked list, traverse to the tail and insert there; for a red-black tree, insert according to the tree structure
- If the linked list length exceeds the threshold of 8, the list is converted into a red-black tree (in JDK 8 this only happens when the table capacity is at least 64; otherwise the table is resized instead)
- After a successful insertion, check whether expansion is needed
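The put steps above can be sketched as a toy chained map (illustration only: SimpleMap is an invented name, and the real java.util.HashMap's resizing, treeification, and hash spreading are omitted):

```java
import java.util.Objects;

// Toy sketch of the put flow described above.
class SimpleMap<K, V> {
    static class Node<K, V> {
        final K key;
        V val;
        Node<K, V> next;
        Node(K key, V val) { this.key = key; this.val = val; }
    }

    private Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        if (table == null) table = new Node[16];            // lazy init on first put
        int i = (table.length - 1) & Objects.hashCode(key); // index from the hash
        Node<K, V> n = table[i];
        if (n == null) {                                    // empty bucket: insert directly
            table[i] = new Node<>(key, value);
            return null;
        }
        while (true) {                                      // otherwise walk the chain
            if (Objects.equals(n.key, key)) {               // existing key: replace value
                V old = n.val;
                n.val = value;
                return old;
            }
            if (n.next == null) {                           // reached the tail: tail-insert
                n.next = new Node<>(key, value);
                return null;
            }
            n = n.next;
        }
    }

    V get(K key) {
        if (table == null) return null;
        for (Node<K, V> n = table[(table.length - 1) & Objects.hashCode(key)]; n != null; n = n.next)
            if (Objects.equals(n.key, key)) return n.val;
        return null;
    }
}
```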
HashMap expansion process
JDK 1.8 expansion mechanism: when the number of elements exceeds the threshold, the map is resized, and an array with twice the capacity replaces the original one. The elements of the original array are copied into the new array using tail insertion, so after a 1.8 resize the relative order of linked-list elements is preserved, whereas a 1.7 resize reverses the linked lists.
Since the array capacity is a power of two and doubles on expansion, each entry ends up either at its original index or at its original index + oldCap after the resize. The reason is that the doubled length brings one more high-order bit into the index calculation. In other words, there is no need to recompute each element's position while copying: just check whether that newly participating bit of the hash is 0 or 1. If it is 0, the index is unchanged; if it is 1, the new index becomes original index + oldCap (the check is `(e.hash & oldCap) == 0`).
This saves recomputing hash positions, and since the newly examined bit can be considered randomly 0 or 1, the resize evenly spreads the previously colliding nodes across the new buckets.
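The rule can be checked with a few sample hashes (a standalone sketch, not JDK code; ResizeIndexDemo is an invented name):

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;              // old capacity, a power of two
        int newCap = oldCap << 1;     // doubled on resize
        for (int hash : new int[]{5, 21, 37, 200}) {
            int oldIdx = hash & (oldCap - 1);
            // Test only the newly significant bit instead of re-hashing:
            int newIdx = (hash & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
            // This must agree with a full recomputation against the new capacity.
            System.out.println(hash + ": " + oldIdx + " -> " + newIdx
                    + ", matches full recompute: " + (newIdx == (hash & (newCap - 1))));
        }
    }
}
```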
How to solve the TCP sticky packet problem with a custom protocol
- Fixed-length header. Prepend a custom fixed-length header to the byte stream; the header contains the length of the byte stream, and header and data are sent to the peer together. On receipt, the peer first reads the fixed-length header from the buffer, then reads the real data.
- Special delimiter as the end of a packet. When the agreed special character sequence is encountered in the byte stream, it is treated as the end of one packet.
- Header + body format. A packet in this format has two parts, a header and a body; the header has a fixed size and contains a field that indicates how large the body is.
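The header + body idea can be sketched with a 4-byte length prefix (a minimal illustration; FramingDemo is an invented name):

```java
import java.io.*;

// Length-prefixed ("header + body") framing sketch: a fixed 4-byte header
// carries the body length, so the receiver always knows where a packet ends
// even when several packets arrive back-to-back in one TCP byte stream.
public class FramingDemo {

    // Frame one message: 4-byte length header followed by the body bytes.
    static byte[] encode(byte[] body) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(body.length); // fixed-length header: the body size
        out.write(body);           // the body itself
        return bos.toByteArray();
    }

    // Read exactly one message: the header first, then exactly `len` body bytes.
    static byte[] decodeOne(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] body = new byte[len];
        in.readFully(body);
        return body;
    }

    // Write several messages into one continuous stream (as TCP would deliver
    // them) and read them back out one frame at a time.
    static String[] roundTrip(String... messages) {
        try {
            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            for (String m : messages) stream.write(encode(m.getBytes()));
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream.toByteArray()));
            String[] out = new String[messages.length];
            for (int i = 0; i < out.length; i++) out[i] = new String(decodeOne(in));
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : roundTrip("hello", "world!")) System.out.println(s);
    }
}
```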
LeetCode 129 (Find the sum of the numbers from the root node to the leaf node)
Depth-first search. Starting from the root, traverse each node while carrying the number formed along the path so far. When a leaf node is reached, add the number ending at that leaf to the total. If the current node is not a leaf, compute the number for each child and recurse into the children.
There are no difficult questions, only brave ones!
```java
// Input: [1,2,3]
//     1
//    / \
//   2   3
// Output: 25
class Solution {
    public int sumNumbers(TreeNode root) {
        if (root == null) {
            return 0;
        }
        return sumNumbersHelper(root, 0);
    }

    private int sumNumbersHelper(TreeNode node, int sum) {
        if (node == null) {
            return 0;
        }
        // Guard against int overflow while accumulating the path number
        if (sum > Integer.MAX_VALUE / 10 || (sum == Integer.MAX_VALUE / 10 && node.val > Integer.MAX_VALUE % 10)) {
            throw new IllegalArgumentException("exceed max int value");
        }
        sum = sum * 10 + node.val;
        if (node.left == null && node.right == null) {
            return sum;
        }
        return sumNumbersHelper(node.right, sum) + sumNumbersHelper(node.left, sum);
    }
}
```
MySQL index structure
The most commonly used index type in MySQL database is BTREE index, and the bottom layer is implemented based on the B+ tree data structure.
The B+ tree is implemented based on the B-tree and the sequential access pointers of the leaf nodes. It has the balance of the B-tree and improves the performance of the interval query by sequentially accessing the pointers.
In a B+ tree, the keys in each node are arranged in ascending order from left to right. If the keys adjacent to a pointer on the left and right are key i and key i+1, then all keys in the node that the pointer points to are greater than or equal to key i and less than or equal to key i+1.
A search starts with a binary search in the root node to find the pointer covering the key, then recursively searches the node that pointer points to, until a leaf node is reached; a binary search on the leaf node then locates the data item for the key.
Why use B+ tree
The feature of B+ tree is that it is short and fat enough, which can effectively reduce the number of visits to nodes and improve performance.
Binary tree: a binary search tree also has good search performance, O(log2 N), but when N is large the tree becomes deep. Query time is dominated by the number of disk I/Os, and the deeper the binary tree, the more lookups and the worse the performance; in the worst case it degenerates into a linked list. The B+ tree is therefore more suitable as a MySQL index structure.
B-tree: a B-tree stores data in its branch (internal) nodes as well, so scanning specific data in order requires an in-order traversal across levels. In a B+ tree the data lives only in the leaf nodes, the internal nodes are pure index, and the leaves are linked in order, so a range scan only needs to walk the leaf nodes. The B+ tree is therefore better for interval queries, and since range-based queries are very frequent in databases, the B+ tree suits database indexing better.
The role of having
HAVING specifies conditions that filter the results of a grouped query. Its function is similar to WHERE, but HAVING can only be used together with GROUP BY, and must appear after GROUP BY and before ORDER BY.
SELECT cust_id, COUNT(*) AS orders
FROM orders
GROUP BY cust_id
HAVING COUNT(*) >= 2;
Clustered index
The leaf nodes of a clustered index contain the entire table rows; InnoDB's primary key uses a clustered index. Queries through a clustered index are much more efficient than through a non-clustered index. The leaf nodes of a clustered index are logically continuous, connected by a doubly linked list and sorted in primary-key order, so sorting and range searches on the primary key are faster.
For InnoDB, the clustered index is normally the table's primary key index. If no primary key is explicitly defined, the first NOT NULL unique index in the table is chosen. If there is neither a primary key nor a suitable unique index, InnoDB generates a hidden primary key as the clustered index; it is 6 bytes long and its value increases as data is inserted.
Advantages of clustered index over non-clustered index
- Data access is faster, because the clustered index stores the index and data in the same B+ tree, so the data from the clustered index is faster than the non-clustered index;
- The storage of the leaf nodes of the clustered index is logically continuous, so the sort search and range search for the primary key will be faster.
Seven parameters of thread pool
The general constructor of ThreadPoolExecutor:
```java
public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler)
```
- corePoolSize: When there is a new task, if the number of threads in the thread pool does not reach the basic size of the thread pool, a new thread will be created to perform the task, otherwise the task will be placed in the blocking queue. When the number of surviving threads in the thread pool is always greater than corePoolSize, you should consider increasing corePoolSize.
- maximumPoolSize: When the blocking queue is full, if the number of threads in the thread pool does not exceed the maximum number of threads, a new thread will be created to run the task. Otherwise, the new task will be processed according to the rejection policy. Non-core threads are similar to temporarily borrowed resources. These threads should exit after their idle time exceeds keepAliveTime to avoid waste of resources.
- workQueue: the blocking queue that stores tasks waiting to be executed.
- keepAliveTime: how long an idle non-core thread is kept alive; this parameter only applies to non-core threads. Setting it to 0 means redundant idle threads are terminated immediately.
- unit: the time unit for keepAliveTime, one of:
  TimeUnit.DAYS, TimeUnit.HOURS, TimeUnit.MINUTES, TimeUnit.SECONDS, TimeUnit.MILLISECONDS, TimeUnit.MICROSECONDS, TimeUnit.NANOSECONDS
- threadFactory: whenever the thread pool creates a new thread, it does so through the thread factory. ThreadFactory defines a single method, newThread, which is called each time the pool needs a new thread.
```java
public class MyThreadFactory implements ThreadFactory {
    private final String poolName;

    public MyThreadFactory(String poolName) {
        this.poolName = poolName;
    }

    public Thread newThread(Runnable runnable) {
        // Pass the pool name to the constructor to distinguish threads of different pools
        return new MyAppThread(runnable, poolName);
    }
}
```
- handler: when both the queue and the thread pool are full, new tasks are handled according to the rejection policy.
  - AbortPolicy: the default; throws RejectedExecutionException directly
  - DiscardPolicy: does nothing and silently discards the task
  - DiscardOldestPolicy: discards the task at the head of the waiting queue, then executes the current task
  - CallerRunsPolicy: the calling thread runs the task itself
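As a sketch of how the seven parameters fit together (PoolDemo and its specific numbers are made up for illustration):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Wires the seven parameters together: 1 core thread, at most 2 threads, a
// bounded queue of size 1, and CallerRunsPolicy so that overflow tasks run in
// the submitting thread instead of being rejected.
public class PoolDemo {
    static int runTasks(int n) {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1,                                  // corePoolSize
                2,                                  // maximumPoolSize
                60L, TimeUnit.SECONDS,              // keepAliveTime + unit (non-core threads)
                new ArrayBlockingQueue<>(1),        // workQueue: bounded blocking queue
                Executors.defaultThreadFactory(),   // threadFactory
                new ThreadPoolExecutor.CallerRunsPolicy()); // handler: rejection policy
        for (int i = 0; i < n; i++) {
            pool.execute(done::incrementAndGet);    // counts each completed task
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // With CallerRunsPolicy no task is dropped, so all tasks complete.
        System.out.println(runTasks(5) + " tasks completed");
    }
}
```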
The running process of the thread pool
When a task is submitted: if the number of running threads is below corePoolSize, a new core thread is created to run it; otherwise the task is placed in the work queue; if the queue is full and the thread count is below maximumPoolSize, a new non-core thread is created to run the task; if the queue is full and the pool has already reached maximumPoolSize, the task is handled by the rejection policy.
Four isolation levels of mysql
First understand the following concepts: dirty reading, non-repeatable reading, and phantom reading.
- Dirty read: during one transaction, data from another, uncommitted transaction is read.
- Non-repeatable read: for a given row, multiple queries within one transaction return different values, because another transaction modified the data and committed in between the queries.
- Phantom read: while one transaction reads records in a certain range, another transaction inserts new rows into that range; when the first transaction reads the range again, phantom rows appear, like a hallucination, hence the name.
The difference between a non-repeatable read and a dirty read is that a dirty read sees uncommitted dirty data from another transaction, while a non-repeatable read sees data that the other transaction has committed.
Both phantom reads and non-repeatable reads involve data committed by another transaction; the difference is that a non-repeatable read concerns modification, while a phantom read concerns insertion or deletion.
Transaction isolation is to solve the problems of dirty reads, non-repeatable reads, and phantom reads mentioned above.
MySQL database provides us with four isolation levels:
- Serializable: forces transactions to be ordered so they cannot conflict with each other, which solves phantom reads.
- Repeatable read: MySQL's default transaction isolation level; it ensures that multiple reads within the same transaction see the same data rows under concurrency, which solves non-repeatable reads.
- Read committed: a transaction only sees changes made by already committed transactions, which avoids dirty reads.
- Read uncommitted: all transactions can see the uncommitted execution results of other transactions.
View the isolation level:
select @@transaction_isolation;
Set the isolation level:
set session transaction isolation level read uncommitted;
Dubbo's load balancing strategy
Dubbo provides several load-balancing strategies; the default is Random.
Random LoadBalance
Weight-based random load balancing.
- Random: selection probability is set by weight.
- Collisions are likelier over a small window, but the larger the call volume, the more even the distribution; applying weights by probability also stays even, which makes it convenient to adjust provider weights dynamically.
RoundRobin LoadBalance
Weight-based round-robin load balancing; it is generally not recommended.
- Round-robin: the rotation ratio follows the agreed weights.
- It has the problem of requests piling up on a slow provider. For example, if the second machine is slow but not down, requests routed to it get stuck there; over time, all requests pile up on that machine.
LeastActive LoadBalance
- Least active calls: among providers with the same active count, one is picked at random; the active count is the difference between the counts before and after a call (i.e. calls started minus calls finished).
- Slower providers receive fewer requests, because a slower provider accumulates a larger difference between the counts before and after its calls.
ConsistentHash LoadBalance
- Consistent hash: requests with the same parameters are always sent to the same provider. (Use this strategy when you want a class of requests pinned to one node rather than random load balancing.)
- When a provider goes down, the requests originally sent to it are spread over the other providers via virtual nodes, without causing drastic changes.
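The consistent-hash idea can be sketched with a TreeMap-based ring (a minimal illustration, not Dubbo's actual implementation; class and provider names are invented, and real implementations use a stronger hash such as MD5):

```java
import java.util.*;

// Minimal consistent-hash ring: each provider gets several virtual nodes on a
// ring; a request maps to the first node clockwise from its hash. Removing a
// provider only remaps the keys that were on that provider.
public class ConsistentHashDemo {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashDemo(List<String> providers, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String p : providers) add(p);
    }

    void add(String provider) {
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(provider + "#" + i), provider);
    }

    void remove(String provider) {
        for (int i = 0; i < virtualNodes; i++)
            ring.remove(hash(provider + "#" + i));
    }

    String select(String requestKey) {
        // First virtual node at or after the request's hash, wrapping around.
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(requestKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // keep it non-negative
    }

    public static void main(String[] args) {
        ConsistentHashDemo ch = new ConsistentHashDemo(
                Arrays.asList("providerA", "providerB", "providerC"), 160);
        String before = ch.select("order-42");
        ch.remove("providerB"); // one provider goes down
        System.out.println(before + " -> " + ch.select("order-42"));
    }
}
```

Requests whose keys did not map to the removed provider keep their original provider, which is the "no drastic change" property described above.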
Reference materials: https://blog.csdn.net/yjn1995/article/details/98845537
java dynamic proxy
Dynamic proxy: the proxy class is created while the program runs; a proxy object is generated in memory on the fly, and business methods are enhanced at runtime.
JDK implementation of proxy only needs to use the newProxyInstance method:
```java
static Object newProxyInstance(ClassLoader loader, Class<?>[] interfaces, InvocationHandler h)
```
Parameter Description:
- ClassLoader loader: Specify the class loader used by the current target object
- Class<?>[] interfaces: the type of interface implemented by the target object
- InvocationHandler h: when a method of the target object is called through the proxy object, the event handler's invoke() method is triggered
The following is the JDK dynamic proxy Demo:
```java
public class DynamicProxyDemo {
    public static void main(String[] args) {
        // The object being proxied
        MySubject realSubject = new RealSubject();
        // The invocation handler
        MyInvocationHandler handler = new MyInvocationHandler(realSubject);
        MySubject subject = (MySubject) Proxy.newProxyInstance(
                realSubject.getClass().getClassLoader(),
                realSubject.getClass().getInterfaces(),
                handler);
        System.out.println(subject.getClass().getName());
        subject.rent();
    }
}

interface MySubject {
    void rent();
}

class RealSubject implements MySubject {
    @Override
    public void rent() {
        System.out.println("rent my house");
    }
}

class MyInvocationHandler implements InvocationHandler {
    private final Object subject;

    public MyInvocationHandler(Object subject) {
        this.subject = subject;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        System.out.println("before renting house");
        // invoke intercepts method calls made on the proxy object
        Object o = method.invoke(subject, args);
        System.out.println("after renting house");
        return o;
    }
}
```
Where does Spring use dynamic proxy?
Spring AOP is implemented through dynamic proxy technology.
What is AOP?
AOP, aspect-oriented programming, complements object-oriented programming: it encapsulates common logic (transaction management, logging, caching, etc.) into aspects and separates it from business code, which reduces code duplication and the coupling between modules. Aspects are the pieces of common logic that have nothing to do with the business itself but that every business module calls.
How to implement dynamic proxy?
There are two ways to implement dynamic proxy technology:
- JDK dynamic proxy based on interface.
- Dynamic proxy based on inherited CGLib. In Spring, if the target class does not implement the interface, then Spring AOP will choose to use CGLIB to dynamically proxy the target class.
Talk about the CGLib dynamic proxy
CGLIB (Code Generation Library) is a powerful, high-performance code-generation library. It is widely used in AOP frameworks (Spring, dynaop) to provide method interception. The CGLIB proxy works mainly by manipulating bytecode, introducing a level of indirection for the object in order to control access to it.
Compared with the JDK dynamic proxy, the CGLib dynamic proxy has far fewer restrictions: the target object does not need to implement an interface. Under the hood it generates a subclass of the target as the proxy.
How does MQ ensure that messages will not be lost?
Message loss scenarios: messages from producer to RabbitMQ Server are lost, messages stored in RabbitMQ Server are lost, and messages from RabbitMQ Server to consumer are lost.
Message loss is solved from three aspects: producer confirmation mechanism, consumer manual confirmation of the message and persistence. The following implementation takes RabbitMQ as an example.
Producer confirmation mechanism
The producer sends a message to the queue but cannot be sure that the message reaches the server successfully.
Solution:
- Transaction mechanism. After a message is sent, the sender will be blocked, waiting for RabbitMQ's response, before it can continue to send the next message. Poor performance.
- Enable the producer confirm mechanism: as soon as a message is successfully delivered to the exchange, RabbitMQ sends an ack to the producer (it sends the ack even if no queue receives the message). If the message fails to reach the exchange, a nack is sent to indicate the failure.
In Spring Boot, confirm mode is enabled via the publisher-confirms property:
```yaml
spring:
  rabbitmq:
    # enable the confirm mechanism
    publisher-confirms: true
```
Provide a callback on the producer side: when the server confirms one or more messages, the callback is invoked, and the producer can follow up based on the result, for example by resending or logging.
```java
// Whether the message was successfully delivered to the Exchange
final RabbitTemplate.ConfirmCallback confirmCallback =
        (CorrelationData correlationData, boolean ack, String cause) -> {
            log.info("correlationData: " + correlationData);
            log.info("ack: " + ack);
            if (!ack) {
                log.info("handle the failure...");
            }
        };
rabbitTemplate.setConfirmCallback(confirmCallback);
```
Return message mechanism
The producer confirm mechanism only guarantees that the message reaches the exchange. A message that fails to be routed from the exchange to a queue is discarded, so it is lost.
Unroutable messages can be handled via the Return message mechanism.
The Return message mechanism provides a callback, ReturnCallback, which is invoked when a message fails to be routed from the exchange to a queue. You need to set mandatory to true to be notified of unroutable messages.
```yaml
spring:
  rabbitmq:
    # mandatory=true is required to trigger ReturnCallback; otherwise, when the
    # Exchange finds no matching Queue, the message is dropped silently without
    # triggering ReturnCallback
    template:
      mandatory: true
```
Monitor route unreachable messages through ReturnCallback.
```java
final RabbitTemplate.ReturnCallback returnCallback =
        (Message message, int replyCode, String replyText, String exchange, String routingKey) ->
                log.info("return exchange: " + exchange + ", routingKey: " + routingKey
                        + ", replyCode: " + replyCode + ", replyText: " + replyText);
rabbitTemplate.setReturnCallback(returnCallback);
```
When a message fails to be routed from the exchange to a queue, something like return exchange: , routingKey: MAIL, replyCode: 312, replyText: NO_ROUTE is logged.
Consumer manual message confirmation
The consumer may crash before it has finished processing a message, causing the message to be lost, because consumers use automatic ack by default: as soon as the consumer receives the message, it notifies the MQ server that the message has been processed, and MQ removes it.
Solution: set the consumer to acknowledge messages manually. After the consumer finishes its processing logic, it replies ack to the broker, indicating that the message was consumed successfully and may be deleted from the broker. When consumption fails, the consumer replies nack, and configuration determines whether the message re-enters the queue, is removed from the broker, or goes to the dead-letter queue. Until it receives the consumer's acknowledgment, the broker keeps the message, but it will not requeue it or deliver it to another consumer.
Consumers set manual ack:
```properties
# set the consumer to manual ack
spring.rabbitmq.listener.simple.acknowledge-mode=manual
```
After the message is processed, manually confirm:
```java
@RabbitListener(queues = RabbitMqConfig.MAIL_QUEUE)
public void onMessage(Message message, Channel channel) throws IOException {
    try {
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    long deliveryTag = message.getMessageProperties().getDeliveryTag();
    // Manual ack; the second parameter is `multiple`: true means all messages up to
    // and including deliveryTag are acknowledged, false means only this one message
    channel.basicAck(deliveryTag, true);
    System.out.println("mail listener receive: " + new String(message.getBody()));
}
```
When message consumption fails, the consumer replies nack to the broker. If the consumer sets requeue to false, after the nack the broker deletes the message or routes it to the dead-letter queue; otherwise the message re-enters the queue.
Persistence
If the RabbitMQ service is abnormally restarted, the message will be lost. RabbitMQ provides a persistence mechanism to persist messages in memory to the hard disk. Even if RabbitMQ is restarted, the messages will not be lost.
Message persistence needs to meet the following conditions:
- The message is marked persistent: before publishing, set its delivery mode to 2, indicating that the message needs to be persisted.
- The queue is declared durable.
- The exchange is declared durable.
When a message is published to the exchange, RabbitMQ writes it to the persistence log before sending a response to the producer. Once a message is consumed from the queue and acknowledged, RabbitMQ removes it from the persistence log. If RabbitMQ restarts before the message is consumed, the server automatically rebuilds the exchanges and queues and loads the messages from the persistence log into the corresponding queues or exchanges, ensuring that no message is lost.
The questions are from Nowcoder (Niuke.com); I compiled the answers myself. If you spot any problems, please point them out in the comments.
Link to original post: https://www.nowcoder.com/discuss/711241?channel=-1&source_id=profile_follow_post_nctrack