Spark

Parts of this post overlap with earlier posts in the Spark series, but the details of Standalone mode covered here were not explored in depth before.

Master start

For high availability, multiple Masters are started, but only one of them serves requests (the ALIVE state); the others stay in the STANDBY state.

The ZooKeeper cluster serves two purposes: it elects the ALIVE Master, and it persists Application, Driver, and Worker information so that state can be recovered after a failover.

Worker start

When a Worker starts, it registers with the Master. The registration carries the worker ID, host, port, number of CPU cores, and memory size.

Because the Master is deployed for high availability, the Worker does not know at first which Master is ALIVE, so it sends the registration request to every Master node.

The Master only schedules work on a Worker that has registered successfully, so the Worker keeps retrying until registration succeeds.

The Worker makes at most 16 attempts. To keep all Workers from sending registration requests at the same moment and overloading the Master, the interval between attempts is randomized.

The first 6 attempts are spaced 5 to 15 s apart, and the remaining 10 attempts are spaced 30 to 90 s apart.

If registration succeeds within these 16 attempts, the retry task is cancelled. If all 16 attempts fail, there is no point in running the Worker, and the Worker process exits.
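The retry schedule described above can be sketched in a few lines of Python. This is only an illustration, not Spark's actual code: the names `retry_intervals` and `register_with_retries` are hypothetical, and drawing a fresh random interval for every attempt is an assumption (the text only says the interval is random).

```python
import random

INITIAL_RETRIES = 6        # first 6 attempts use the short interval
TOTAL_RETRIES = 16         # 6 short attempts + 10 long attempts
SHORT_RANGE_S = (5, 15)    # seconds, from the text
LONG_RANGE_S = (30, 90)    # seconds, from the text


def retry_intervals(rng):
    """Return the wait (in seconds) scheduled before each of the 16 attempts."""
    intervals = []
    for attempt in range(1, TOTAL_RETRIES + 1):
        low, high = SHORT_RANGE_S if attempt <= INITIAL_RETRIES else LONG_RANGE_S
        intervals.append(rng.uniform(low, high))
    return intervals


def register_with_retries(try_register, intervals):
    """Call try_register once per scheduled attempt; stop on the first success."""
    for attempt, _wait in enumerate(intervals, start=1):
        # A real worker would sleep for _wait seconds before this attempt.
        if try_register():
            return attempt      # registered: the remaining retries are cancelled
    return None                 # every attempt failed: the worker process exits
```

Passing a seeded `random.Random` as `rng` makes the schedule reproducible, which is convenient when testing the retry logic without real delays.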

A retry differs slightly from the initial registration. A retry may be triggered for various reasons after the Worker already knows which Master is ALIVE; in that case the Worker registers directly with the ALIVE Master instead of sending a request to every Master.

When a Master receives a registration request, it first checks whether it is the ALIVE node. If it is in STANDBY, it replies that it is a STANDBY node, and the Worker takes no action on that reply.

Next, the Master checks whether the Worker is already registered, using the worker ID from the request. If it is, the Master replies that registration failed.

When the Worker receives a registration failure, it checks whether it has already registered successfully (a successful registration is recorded in a variable). If it has, it ignores the message, because the failure only means the request was a duplicate.

If the Worker has not registered successfully and the Master also reports a failure, registration really has failed, so the Worker process exits.

If the Master is ALIVE and the Worker is not yet registered, the Master saves the Worker's information, persists it, and tells the Worker that registration succeeded.

When the Worker receives the success reply, it sets the registration variable to true (used in the check above), records the Master's address (later requests go straight to that address), and cancels the registration retry task, since there is no need to register again.

Finally, the Worker sends its status to the Master. Because the Worker has only just started, it has no Drivers or Executors yet, so the Master has nothing to process.
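The decision logic on both sides of the registration can be modelled as below. This is a simplified sketch under stated assumptions: the `Master` and `Worker` classes, the reply strings, and the field names are all hypothetical and only mirror the steps in the text; persistence and networking are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    alive: bool
    workers: dict = field(default_factory=dict)   # worker_id -> registration info

    def register_worker(self, worker_id, host, port, cores, memory_mb):
        if not self.alive:
            return "STANDBY"            # the worker takes no action on this reply
        if worker_id in self.workers:
            return "FAILED"             # this worker ID is already registered
        self.workers[worker_id] = (host, port, cores, memory_mb)
        # A real master would also persist this record for recovery.
        return "REGISTERED"


@dataclass
class Worker:
    worker_id: str
    registered: bool = False
    exited: bool = False

    def handle_reply(self, reply):
        if reply == "REGISTERED":
            self.registered = True      # also: record the master, cancel retries
        elif reply == "FAILED" and not self.registered:
            self.exited = True          # never registered, so give up and exit
        # "STANDBY", or "FAILED" after an earlier success, is ignored.
```

Keeping the `registered` flag on the Worker is what lets it tell a harmless duplicate registration apart from a real failure.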

How does the Master know that the Worker is alive?

Once registered, the Worker's other important job is to send heartbeats so that the Master knows it is still alive. Each heartbeat carries only the worker ID.

When the Master receives a heartbeat, it first checks whether the Worker is registered. If not, it asks the Worker to re-register, and the registration process above runs again. If the Worker is registered, the Master updates its last heartbeat time.

The Master runs a scheduled task every 60 s that looks for Workers that have not sent a heartbeat for more than 60 s. It marks each such Worker as DEAD and removes it from the in-memory maps (idToWorker maps worker ID to Worker, and addressToWorker maps address to Worker).

If a Worker is already in the DEAD state, its information is removed completely after 960 s.

For example, if a Worker sends no heartbeat for 60 s, the Master removes it from the maps. If that Worker restarts and registers again within the 960 s, the Master deletes the DEAD record and adds the new Worker information.
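The timeout handling can be illustrated with a small in-memory model. The `WorkerTable` class and its method names are hypothetical, time is passed in explicitly instead of read from a clock, and measuring the 960 s from the last heartbeat is an assumption (the text does not say where that clock starts).

```python
WORKER_TIMEOUT_S = 60     # heartbeat timeout, from the text
DEAD_RETENTION_S = 960    # how long a DEAD worker is remembered, from the text


class WorkerTable:
    """Hypothetical stand-in for the Master's worker bookkeeping."""

    def __init__(self):
        self.alive = {}   # worker_id -> time of last heartbeat
        self.dead = {}    # worker_id -> last heartbeat seen before going DEAD

    def register(self, worker_id, now):
        self.dead.pop(worker_id, None)   # a new registration replaces a DEAD record
        self.alive[worker_id] = now

    def heartbeat(self, worker_id, now):
        """Return False when the worker is unknown and must re-register."""
        if worker_id not in self.alive:
            return False
        self.alive[worker_id] = now
        return True

    def check_timeouts(self, now):
        """Periodic task: mark silent workers DEAD, then forget old DEAD ones."""
        for worker_id, last_seen in list(self.alive.items()):
            if now - last_seen > WORKER_TIMEOUT_S:
                self.dead[worker_id] = self.alive.pop(worker_id)
        for worker_id, last_seen in list(self.dead.items()):
            if now - last_seen > DEAD_RETENTION_S:
                del self.dead[worker_id]
```

In this model a Worker that comes back while its DEAD record still exists simply goes through `register` again, which matches the example in the text.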

Driver start

After the Driver starts, it registers its Application with the Master. As with Worker registration, the Driver does not know which Master is ALIVE, so it registers with all of them.

The registration information includes the name of the Application, the maximum number of cores the Application requires, the memory size required by each Executor, the number of cores required by each Executor, and the command used to launch the Executor.

Registration is retried here as well: at most 3 times, 20 s apart. Once registration succeeds, the retry is cancelled.

When a Master receives the request and it is a STANDBY node, it neither processes the request nor replies. This differs from Worker registration, where a STANDBY Master does reply, although the Worker ignores that reply.

If it is not a STANDBY node, the Application information is stored in memory and persisted.

The Master then tells the Driver that registration succeeded. On receiving the message, the Driver records the Master's information and sets an internal flag marking the registration as successful, so no further retries are needed.
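A minimal sketch of the Driver-side loop follows. Everything here is illustrative: `register_application`, `standby_master`, and `alive_master` are hypothetical names, the application ID format is invented, and treating "3 times" as the total number of attempts is an assumption.

```python
MAX_ATTEMPTS = 3        # from the text; treated here as total attempts (assumption)
RETRY_INTERVAL_S = 20   # wait between attempts, from the text


def register_application(masters, app_name):
    """Try every master on each attempt; return (app_id, attempts_used).

    Each master is a callable that returns an application ID, or None to
    model a STANDBY master that never replies.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        for master in masters:
            app_id = master(app_name)
            if app_id is not None:
                return app_id, attempt   # success: further retries are cancelled
        # A real driver would wait RETRY_INTERVAL_S seconds before retrying.
    return None, MAX_ATTEMPTS


def standby_master(app_name):
    return None                          # STANDBY: no processing, no reply


def alive_master(app_name):
    return "app-" + app_name             # hypothetical ID format
```

Calling `register_application([standby_master, alive_master], "demo")` succeeds on the first attempt because one of the Masters answers; with only STANDBY Masters the function gives up after the last attempt.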

How does the Master know that the Driver is alive?

There is no heartbeat between the Driver and the Master. The Worker sends heartbeats regularly and the Master removes Workers whose heartbeats expire, so how does the Master learn that a Driver has quit?

There are two ways. In the first, the Driver actively notifies the Master. In the second, the Driver exits abnormally and the Master detects that it is gone. Both paths unregister the Application.

When the Master handles the unregistration, it removes the cached Application and the Driver information associated with it.

On the Driver side, Executors may still be running, so the Master sends the Driver a message to kill them, and the Driver stops the Application after receiving it.

On the Worker side, the Master broadcasts a message to all Workers saying that the Application has finished. Each Worker then cleans up the Driver's temporary files.

Finally, the Application information is persisted, and the remaining Workers are informed that the Application has finished.

Executor start

When the Master schedules resources, it tells a Worker to start an Executor.

When the Worker receives the message, it checks whether the sender is the ALIVE Master rather than a STANDBY one. If it is not, the Worker ignores the message.

If it is, the Worker creates a thread that starts an Executor process. Once the process is launched, the Worker replies to the Master that the Executor has started.

The Master then tells the Driver that an Executor has been started for it. Because the Executor has not finished, the Driver does nothing further with this message.

After the Executor starts, it registers with the Driver. The Driver first checks whether the Executor is already registered or is on the blacklist. If so, it returns a failure. If not, it saves the Executor's information and tells the Executor that registration succeeded.

When the Driver registers the Executor, it also posts the registration to the event bus. The Driver also contains a heartbeat receiver, which tracks the state of the Executors.

The heartbeat receiver listens to bus events; when it sees that an Executor was added, it records the Executor's ID and the time.
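The registration check and the event-bus hand-off can be sketched as follows. The classes `Driver`, `EventBus`, and `HeartbeatReceiver` here are simplified, hypothetical models rather than Spark's own classes, and keying the blacklist by host is an assumption.

```python
class HeartbeatReceiver:
    """Records when each executor was added (model of the listener in the text)."""

    def __init__(self):
        self.last_seen = {}   # executor_id -> time it was registered

    def on_executor_added(self, executor_id, now):
        self.last_seen[executor_id] = now


class EventBus:
    """Delivers executor-added events to every subscribed listener."""

    def __init__(self):
        self.listeners = []

    def post_executor_added(self, executor_id, now):
        for listener in self.listeners:
            listener.on_executor_added(executor_id, now)


class Driver:
    def __init__(self, bus, blacklisted_hosts=()):
        self.bus = bus
        self.blacklisted_hosts = set(blacklisted_hosts)   # assumption: keyed by host
        self.executors = {}                               # executor_id -> host

    def register_executor(self, executor_id, host, now):
        """Return True on success, False for a duplicate or blacklisted executor."""
        if executor_id in self.executors or host in self.blacklisted_hosts:
            return False
        self.executors[executor_id] = host
        self.bus.post_executor_added(executor_id, now)    # heartbeat receiver hears this
        return True
```

Routing the notification through a bus keeps the Driver's registration code unaware of who is listening; the heartbeat receiver is just one subscriber.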

How does the Worker know that the Executor is alive?

The thread that the Worker creates to start the Executor process does not exit right after launching it. Instead, it waits for the exit status of the Executor process.

Once it has the exit status, the thread sends it to the Worker. The Worker forwards the status to the Master and updates its own memory and CPU accounting.
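The "launch, then block on the exit status" pattern looks roughly like this when written with Python's standard `subprocess` and `threading` modules. The function `run_executor` and its `on_exit` callback are hypothetical names, and the command in the usage lines is a placeholder one-liner, not a real Executor command.

```python
import subprocess
import sys
import threading


def run_executor(command, on_exit):
    """Start `command` in a child process and report its exit code via on_exit."""

    def runner():
        process = subprocess.Popen(command)
        exit_code = process.wait()    # blocks until the child process ends
        on_exit(exit_code)            # e.g. the worker forwards this to the master

    thread = threading.Thread(target=runner)
    thread.start()
    return thread


# Usage with a placeholder child process that exits with status 3.
exit_codes = []
worker_thread = run_executor(
    [sys.executable, "-c", "import sys; sys.exit(3)"], exit_codes.append
)
worker_thread.join()
```

Because the thread only returns from `wait()` when the child ends, the Worker learns about normal completion and crashes through the same path.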

When the Master sees that the Executor has finished (whether it failed or succeeded), it updates its in-memory information and forwards the status to the Driver.

When the Driver receives the status and sees that the Executor has finished, it posts an executor-removed event to the event bus.

The heartbeat receiver listens to bus events and drops its record of the Executor when it sees that event.

How does the Executor know that the Worker is alive?

The Executor has a WorkerWatcher. If the Worker process exits, the connection drops, or the network fails, the WorkerWatcher detects it and terminates the Executor process.

How does the Driver know that the Executor is alive?

After the Executor process receives the message that it registered with the Driver successfully, it creates the Executor object. Once the object is instantiated, it starts sending heartbeats to the Driver. Because many Executors may start at once, a random value is added to the initial delay to avoid a burst of simultaneous requests.

When the heartbeat receiver receives a heartbeat, it first checks whether the Executor is registered. If not, it tells the Executor to re-register. If so, it updates the Executor's last reported time.

The heartbeat receiver also runs a scheduled task that scans the last reported time of each Executor. If an Executor has not sent a heartbeat for a certain period, the task removes the Executor's information from memory and submits a task to kill the Executor.

That task is eventually sent to the ClientEndpoint, which forwards it to the Master.
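The heartbeat bookkeeping on the Driver side can be modelled like this. It is a sketch under assumptions: `first_heartbeat_delay` and `ExecutorHeartbeats` are hypothetical names, the random extra delay is assumed to be up to one heartbeat interval, and the timeout is a parameter because the text does not state its value.

```python
import random


def first_heartbeat_delay(interval_s, rng):
    """Heartbeat interval plus a random extra so executors do not fire together."""
    return interval_s + rng.random() * interval_s


class ExecutorHeartbeats:
    """Hypothetical model of the heartbeat receiver's bookkeeping."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}   # executor_id -> time of last heartbeat

    def add(self, executor_id, now):
        self.last_seen[executor_id] = now

    def heartbeat(self, executor_id, now):
        """Return False when the executor is unknown and must re-register."""
        if executor_id not in self.last_seen:
            return False
        self.last_seen[executor_id] = now
        return True

    def expire(self, now):
        """Scheduled scan: forget silent executors and return the IDs to kill."""
        expired = [
            executor_id
            for executor_id, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        ]
        for executor_id in expired:
            del self.last_seen[executor_id]
        return expired
```

The list returned by `expire` stands in for the kill requests that, per the text, travel through the ClientEndpoint to the Master.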


大军
