Background
Spark is full of examples of asynchronous processing, and every one of them is worth studying: they help you understand Spark's internals and teach you to write elegant asynchronous code of your own.
Interpreting NettyRpcEnv.ask
What RpcEnv does
NettyRpcEnv is the only implementation of RpcEnv. Before diving in, take a look at the class-header comment of RpcEnv:
/**
* An RPC environment. [[RpcEndpoint]]s need to register itself with a name to [[RpcEnv]] to
* receives messages. Then [[RpcEnv]] will process messages sent from [[RpcEndpointRef]] or remote
* nodes, and deliver them to corresponding [[RpcEndpoint]]s. For uncaught exceptions caught by
* [[RpcEnv]], [[RpcEnv]] will use [[RpcCallContext.sendFailure]] to send exceptions back to the
* sender, or logging them if no such sender or `NotSerializableException`.
*
* [[RpcEnv]] also provides some methods to retrieve [[RpcEndpointRef]]s given name or uri.
*/
In a word, it is the environment for RPC. Its two most important operations are:
- you can register an RpcEndpoint
- you can obtain an RpcEndpointRef
So what are RpcEndpoint and RpcEndpointRef? I won't go into full detail here -- other articles will -- but let's briefly review them.
A brief review of RpcEndpoint and RpcEndpointRef
RpcEndpoint
As we all know, Spark has roles such as executor and driver, and they communicate with each other over Netty. Moreover, an executor or driver does not start just one Netty service: multiple Netty RPC services are opened for different purposes, distinguished by port number. Once a communication "letter" arrives, it is processed by many kinds of logical units, such as an Inbox or an EventLoop. Those are tool-level units; the large, pluggable, extensible logical module abstracted on top of them is what Spark calls an RpcEndpoint -- the module that handles messages sent over from the other side, be it a client or a server. RpcEndpoint itself is a trait and has multiple implementations.
RpcEndpointRef
Spark's network communication used to be built on Akka; after the rewrite it uses Netty. In Akka, two nodes communicate through the ActorRef of the destination: if ActorA wants to send a message to ActorB, it sends through BActorRef. After the move to Netty, an Endpoint can loosely be understood as the former Actor, so to send a message to another "Actor" you need a reference to its RpcEndpoint, i.e. an RpcEndpointRef. At first glance this concept seems a bit silly. Just imagine: to send a message from A to B, A must first hold a "reference" to B. That is hard to map onto an ordinary HTTP service -- normally a machine only needs to know the other side's IP and port, so why would it need a "stand-in" for the peer? Keep that question in mind as you read on. For now it is enough to know this: an RpcEndpointRef is used to access the B machine, and you can understand it as a packaged instance of the B machine's IP and port.
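To make the "reference" idea concrete, here is a minimal, Spark-free sketch. Every name in it (ToyEndpoint, ToyEndpointRef) is invented for illustration; the real RpcEndpointRef goes through Netty rather than a direct method call, but the shape is the same: the sender never touches the endpoint, only its ref.

```scala
// A toy model of the RpcEndpoint / RpcEndpointRef relationship.
// None of these types exist in Spark; they only mirror the concept.
trait ToyEndpoint {
  def receive(msg: String): String
}

// The "ref" wraps the address (host and port) of the remote endpoint.
// In real Spark the call below would be a network hop over Netty.
class ToyEndpointRef(host: String, port: Int, endpoint: ToyEndpoint) {
  val address: String = s"$host:$port"
  def ask(msg: String): String = endpoint.receive(msg) // stand-in for the RPC
}

object ToyRpc extends App {
  val e1 = new ToyEndpoint {
    def receive(msg: String): String = s"E1 got: $msg"
  }
  val e1Ref = new ToyEndpointRef("10.1.2.5", 13200, e1)
  // the sender only ever holds e1Ref, never e1 itself
  println(e1Ref.ask("hello"))
}
```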
RpcEndpoint and RpcEndpointRef, illustrated
[Figure 3] diagram
Machine A can be a physical machine or a virtual machine, and machine B can be the same physical or virtual machine (on a different port) or a different one (in Spark a node can even send a message to itself; more on that later). To send a message from A to B, A uses B's RpcEndpointRef and sends the message through it. The figure shows:
- [1] how it is accessed
- [2] how it works internally
- [3] what an instance of RpcEndpointRef actually is
A brief review of Driver and Executor
Ask, as the name suggests, means to ask: to say hello, to check whether the other side is available, to make an inquiry, and so on. That is exactly the purpose of NettyRpcEnv.ask. To explain its role, we first need to briefly cover some concepts and processes.
The Driver thread and the Executor process
First, two things need to be clarified. In a YARN environment:
- The Driver is a thread executed inside the ApplicationMaster. Strictly speaking that statement is not quite correct: the Driver is really the user-class thread, the one in which the SparkContext is formed from the user's code. Inside this thread the SparkEnv, the RpcEnv and so on are created, the Driver's Netty service is established, and communication with the Executors takes place.
- An Executor is a process, started on each node via a java command.
What YARN and the ApplicationMaster are will not be repeated here; they are covered in detail in other articles.
Second, for now it is enough to understand that the Driver itself is a coordinating, scheduling node: it assigns tasks to Executors and keeps track of them. "Assigning" means sending Tasks to an Executor; "keeping track" means knowing each Executor's status, and so on.
[Figure 4] diagram
As an example, take one Driver and two Executors communicating interactively. The Driver holds the RpcEndpointRef of each of the two Executors (call them E1 and E2), tentatively named E1Ref and E2Ref, and sends messages to the E1 and E2 nodes through these two refs; the two nodes process the incoming messages with their own RpcEndpoints. E1 and E2 in turn report their own status to the Driver on a regular schedule -- the heartbeat: each uses the DriverEndpointRef it holds internally to send the heartbeat to the Driver, and the Driver processes the heartbeat message with its own DriverEndpoint. All of these components, on every node, live inside that node's own NettyRpcEnv, the implementation of RpcEnv.
Example: creating a DriverEndpointRef in RpcEnv
Background
Finally we come to the subject of this article, interpreting NettyRpcEnv.ask. We need a scenario that calls NettyRpcEnv.ask, and establishing a DriverEndpointRef in the RpcEnv is exactly such a scenario.
Why is the DriverEndpointRef created in RpcEnv?
[Figure 4] above introduced the communication flow between Driver and Executor. In fact, while the ApplicationMaster is constructing the Driver, part of the communication has to go out through a DriverEndpointRef: it uses the DriverEndpointRef to send messages to the DriverEndpoint, and the DriverEndpoint processes them and responds.
[Figure 5] diagram
- The ApplicationMaster [runs] the Driver thread, and obtains the NettyRpcEnv from the Driver thread.
- With NettyRpcEnv's setupEndpointRef method it [gets] the DriverEndpointRef.
- It then [uses] this DriverEndpointRef to access the Driver's DriverEndpoint.
- One thing to note: the ApplicationMaster node is itself the Driver node, so in principle it could access the Driver's DriverEndpoint directly. The Spark source does not implement it that way, presumably for isolation, better encapsulation and reduced coupling: if in the future the Driver were executed as a separate process, no longer running inside the ApplicationMaster, less would have to change. So Netty's RPC access path is used here anyway.
Look at the source code
This part of the code lives in ApplicationMaster.scala; just follow the runDriver method.
[Figure 6] diagram
- (I) There is a server with IP 10.1.2.5, running the ApplicationMaster process.
- (II) Step a: the Driver thread is started on this node, the user's class is initialized, and a Netty service is started on the node; its IP and port are 10.1.2.5:13200.
- (III) Step b: the ApplicationMaster node goes on to call RpcEnv.setupEndpointRef, which aims to set up a DriverEndpointRef to the Driver in the RpcEnv. The setup consists of probing 10.1.2.5:13200; if the service answers, the DriverEndpointRef is constructed. This "visit" is exactly the NettyRpcEnv.ask method this article is about. The calling sequence is:
- (ApplicationMaster.scala) rpcEnv.setupEndpointRef ↓
- (NettyRpcEnv.scala) NettyRpcEnv.asyncSetupEndpointRefByURI ↓
- (NettyRpcEndpointRef.scala) NettyRpcEndpointRef.ask ↓
- (NettyRpcEnv.scala) NettyRpcEnv.ask ↓ (after multiple further steps, omitted here; other articles will cover them)
- 10.1.2.5:13200 netty service
The code is as follows:
private def runDriver(): Unit = {
  addAmIpFilter(None)
  /*
   Here, startUserApplication is called to execute the user's class -- our jar --
   invoking our main method, which starts the SparkContext and, internally, the
   schedulers, backends, task scheduler and all the other core machinery;
   other articles cover this in detail.
   */
  userClassThread = startUserApplication()

  // This a bit hacky, but we need to wait until the spark.driver.port property has
  // been set by the Thread executing the user class.
  logInfo("Waiting for spark context initialization...")
  val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
  try {
    /*
     Here we block, waiting for the SparkContext to come back from the Driver thread.
     */
    val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
      Duration(totalWaitTime, TimeUnit.MILLISECONDS))
    if (sc != null) {
      rpcEnv = sc.env.rpcEnv
      val userConf = sc.getConf
      val host = userConf.get("spark.driver.host")
      val port = userConf.get("spark.driver.port").toInt
      registerAM(host, port, userConf, sc.ui.map(_.webUrl))

      /*
       Here the main act plays out: NettyRpcEnv's setupEndpointRef method is used
       to obtain the driverRef. Internally it goes and asks the Driver "are you
       there?" -- whether the Driver's service exists. If it does, OK comes back
       and the Driver's ref is constructed.
       */
      val driverRef = rpcEnv.setupEndpointRef(
        RpcAddress(host, port),
        YarnSchedulerBackend.ENDPOINT_NAME)
      createAllocator(driverRef, userConf)
    } else {
      // Sanity check; should never happen in normal operation, since sc should only be null
      // if the user app did not create a SparkContext.
      throw new IllegalStateException("User did not initialize spark context!")
    }
    resumeDriver()
    userClassThread.join()
  } catch {
    case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
      logError(
        s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
          "Please check earlier log output for errors. Failing the application.")
      finish(FinalApplicationStatus.FAILED,
        ApplicationMaster.EXIT_SC_NOT_INITED,
        "Timed out waiting for SparkContext.")
  } finally {
    resumeDriver()
  }
}
Interpretation of NettyRpcEnv.ask
Looking back at Future
How should you understand Future? The literal meaning works well: Future means the future, and also futures (as in contracts). When it comes to futures, everything is uncertain, because after all it has not happened yet and no one knows what the future will be. So defining a Future means defining, from the time and space (thread) of the present, an event that has not yet occurred on another thread, in another time and space. Compared with Java's rather clumsy Future, Scala's Future is elegant and well-rounded; search my blog for a detailed introduction to Scala's Future.
- official article: https://docs.scala-lang.org/zh-cn/overviews/core/futures.html
- We will not build up the concepts of Future and Promise from the source-code perspective here; other articles will do that.
[Figure 7] diagram
Defining a thread the Java way is the correct approach, but in the Scala version on the left, using Future is a good deal more elegant.
The code:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

/**
 * Basics for understanding Future
 */
object DocFutureTest {

  def apply(): Unit = {
    println("I am DocFutureTest")
  }

  def main(args: Array[String]): Unit = {
    val sleeping = 3000;
    val main_thread = Thread.currentThread().getName;
    /*
     Define an event that happens on another thread.
     It is equivalent to the Java code block below. Looking at the overall
     conciseness, the Scala version is the more elegant: a single Future wraps
     everything that needs to be processed, and if error handling is needed later
     you can pattern match on Success and Failure.

     public class JavaThreading {
         public static void main(String[] args) throws InterruptedException {
             new Thread(
                 () -> System.out.println("this is a story happening on another thread called "
                     + Thread.currentThread().getName())
             ).start();
             System.out.println(Thread.currentThread().getName());
             Thread.sleep(3000);
         }
     }
     */
    var future_run = Future {
      Thread.sleep(1000)
      println("this is a story happening on another thread called " + Thread.currentThread().getName)
    }

    // the main thread rests for 3000 ms;
    // without this, the main thread would stop first, and the program would end
    // before the thread defined by the Future above ever got to run
    Thread.sleep(sleeping)
    println(s"$main_thread slept for $sleeping ms")
  }
}
Future + callback (excerpt)
case class ExceptionError(error: String) extends Exception(error)

def main(args: Array[String]): Unit = {
  val sleeping = 3000;
  val main_thread = Thread.currentThread().getName;
  // define an event happening on another thread
  var future_run = Future {
    Thread.sleep(1000)
    println("this is a story happening on another thread called " + Thread.currentThread().getName)
    // uncomment this line if you want onFailure to fire
    // throw ExceptionError("error")
  }
  future_run onFailure {
    case t => println("exception " + t.getMessage)
  }
  future_run onSuccess {
    case _ => println("success")
  }
}
Note
- When a Future is defined, the body to be executed on the other thread is defined with it, and execution starts immediately -- similar to Java defining a Thread and then calling start() on it right away.
- Scala's Try[] is used pervasively inside Future. If an exception occurs and no onFailure handling is attached, you may never see the exception thrown at all, which is quite different from Java.
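The second point is worth seeing once for yourself. The minimal, self-contained sketch below shows the exception vanishing into the Future's internal Try: without the onComplete callback (or an Await), the program would finish silently as if nothing had gone wrong.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

object SilentFailureDemo {
  def main(args: Array[String]): Unit = {
    // the throw below does NOT surface on the main thread;
    // it is captured into a Failure inside the Future
    val f = Future[Int] { throw new RuntimeException("boom") }

    // only an attached handler (or an Await on the result) reveals it
    f.onComplete {
      case Success(v)  => println(s"success: $v")
      case Failure(ex) => println(s"failure: ${ex.getMessage}")
    }

    // keep the main thread alive so the callback has a chance to run
    Await.ready(f, 3.seconds)
    Thread.sleep(500)
  }
}
```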
Review of Promise
Having talked about Future in a simple way, what is Promise? In fact the implementation of Future contains a Promise, which means that without Promise, a Future cannot run. Taken literally, a Promise is a promise: once the future of a Future is defined, an exact promise must be given before it can be carried out; otherwise it is just big talk that can never be fulfilled.
- official article: https://docs.scala-lang.org/zh-cn/overviews/core/futures.html
By this point, even after reading the introduction to Future above, many readers are surely still baffled -- I was too when I first encountered this. What I like to do is describe an abstract problem with the most intuitive diagrams and imagery possible. So without further ado, on to the picture.
[Figure 8] diagram: the relationship between Future and Promise
The meaning of Future
- The main line of your life is your Main Thread, which in Spark might be some particular thread.
- At the moment Now, you start down the road of becoming a star (the Thread of become star).
- At the moment Now, you also start down the road of becoming handsome (the Thread of become handsome).
- Once these two roads are opened, as long as your Main Thread is not over you can keep walking them until Success or Failure. That is a Future: you can understand it as opening up a new trajectory.
The meaning of Promise
- When you open two new "roads", I can give you a different promise at the end of each road.
- When you complete the Future with success, I promise you one result.
- When you complete the Future with failure, I promise you another result.
Future versus Promise
- A Future is a line: a line containing the whole execution process, to be walked along the timeline.
- A Promise is a point: a point to be triggered, and to reach that point a Future must first lay down a path to it.
How should you understand those two statements? Think of it this way: life (the Main Thread) is a number line, and if you want to keep advancing rightwards along the timeline you need a continuous road. That "road" is a Thread -- possibly one defined by a Future -- and we can only reach the destination by walking the road step by step; we cannot jump straight to the end. A Promise is like a milestone: if you only have a Promise and never define the "road", i.e. never define a Future (or Main Thread), the Promise can never be realized. A Promise alone, with no path (Future) considered, cannot be fulfilled; but a Future alone, with no Promise defined (in fact every Future has a default Promise built in), can be executed directly. As the figure below shows, two blank cheques have been written out: with no concrete route (no Future) defined, those two Promises can never be honored.
One thing to note: the figure only shows Promises being defined. If you want to honor such a Promise, you can use the methods on Promise to build a Future and execute it. The difference from a plain Future is that a Future runs as soon as it is defined, whereas with a Promise you must explicitly trigger the "build a Future" step.
Look at code where the Promise never executes
Here we define a Promise, and "promise" to call a map operation -- printing a "future: ..." line -- after the future corresponding to the Promise finishes. But when we run the following statements, we find that nothing is executed at all.
import scala.concurrent.Promise
import scala.util.{Failure, Success}

object PromiseTest {
  def main(args: Array[String]): Unit = {
    import scala.concurrent.ExecutionContext.Implicits.global
    val promise = Promise[String]
    promise.future.onComplete(v => println("onComplete " + v))
    promise.future.map(str => println("future: " + str + " ==> " + Thread.currentThread().getName))
    promise.future.failed.foreach(e => println(e + " ==> " + Thread.currentThread().getName))
    Thread.sleep(3000)
  }
}
Look at code that does execute
The only difference from the code above is one added call: promise.trySuccess.
The details are covered in other chapters; here you can understand it like this: adding trySuccess builds the Future road leading to the Promise and triggers that road to start executing.
As for the specifics of trySuccess, tryComplete and the rest, we can cover them when we talk about Scala multi-threading.
- promise.future.onComplete: the callback processing after the Future finishes; it runs whether the outcome is Success or Failure.
- promise.future.map: a further map transformation applied to the Future after promise.future.onComplete.
- promise.trySuccess: the trigger that sets the execution of the whole Future in motion.
import scala.concurrent.Promise
import scala.util.{Failure, Success}

object PromiseTest {
  def main(args: Array[String]): Unit = {
    import scala.concurrent.ExecutionContext.Implicits.global
    val promise = Promise[String]
    promise.future.onComplete(v => println("onComplete " + v))
    promise.future.map(str => println("future: " + str + " ==> " + Thread.currentThread().getName))
    promise.future.failed.foreach(e => println(e + " ==> " + Thread.currentThread().getName))
    // the one added line: complete the promise, which triggers the callbacks above
    promise.trySuccess("try success " + " --> " + Thread.currentThread().getName)
    Thread.sleep(3000)
  }
}
The ask code
In fact, after all the groundwork above, the ask code itself can be explained in a few sentences.
Ask returns a Future and is processed asynchronously.
[Figure 9]
A client machine at 10.1.1.1 accesses a Netty service at 10.1.1.2 over RPC. When the response comes back correctly, it is received by the TransportResponseHandler on the client machine, and the listener's onSuccess method is called. That onSuccess is the method defined in the ask code below; inside it, the promise's trySuccess is executed, which in turn triggers the execution of the promise's future.
private[netty] def ask[T: ClassTag](message: RequestMessage, timeout: RpcTimeout): Future[T] = {
  // define a Promise of type Any
  val promise = Promise[Any]()
  val remoteAddr = message.receiver.address

  def onFailure(e: Throwable): Unit = {
    if (!promise.tryFailure(e)) {
      e match {
        case e : RpcEnvStoppedException => logDebug (s"Ignored failure: $e")
        case _ => logWarning(s"Ignored failure: $e")
      }
    }
  }

  /*
   The onSuccess declared here is plugged into RpcResponseCallback's onSuccess; that
   RpcResponseCallback is the listener in [Figure 9] above. When we get a response
   back from the server -- note: any response that is not of type RpcFailure --
   control enters the "else if (message instanceof RpcResponse) {" branch
   of [Figure 9].
   */
  def onSuccess(reply: Any): Unit = reply match {
    case RpcFailure(e) => onFailure(e)
    case rpcReply =>
      /*
       Once the returned response is OK, onSuccess is called back, and the
       promise's trySuccess is called here as well. This is what was described
       above: a Future has been laid down for the promise, so the thread of
       that Future can now run.
       */
      if (!promise.trySuccess(rpcReply)) {
        logWarning(s"Ignored message: $reply")
      }
  }

  try {
    if (remoteAddr == address) {
      val p = Promise[Any]()
      p.future.onComplete {
        case Success(response) => onSuccess(response)
        case Failure(e) => onFailure(e)
      }(ThreadUtils.sameThread)
      dispatcher.postLocalMessage(message, p)
    } else {
      val rpcMessage = RpcOutboxMessage(message.serialize(this),
        onFailure,
        (client, response) => onSuccess(deserialize[Any](client, response)))
      postToOutbox(message.receiver, rpcMessage)
      /*
       If Failure was called back, this is executed.
       */
      promise.future.failed.foreach {
        case _: TimeoutException => rpcMessage.onTimeout()
        case _ =>
      }(ThreadUtils.sameThread)
    }

    val timeoutCancelable = timeoutScheduler.schedule(new Runnable {
      override def run(): Unit = {
        onFailure(new TimeoutException(s"Cannot receive any reply from ${remoteAddr} " +
          s"in ${timeout.duration}"))
      }
    }, timeout.duration.toNanos, TimeUnit.NANOSECONDS)
    /*
     When the promise's future has run, the onComplete here is invoked.
     */
    promise.future.onComplete { v =>
      timeoutCancelable.cancel(true)
    }(ThreadUtils.sameThread)
  } catch {
    case NonFatal(e) =>
      onFailure(e)
  }
  /*
   Use the partial function addMessageIfTimeout from RpcTimeout to pattern match
   once more on any Throwable produced:
   if it is an RpcTimeoutException, throw that exception directly;
   if it is a TimeoutException, wrap it in an RpcTimeoutException and throw that.
   */
  promise.future.mapTo[T].recover(timeout.addMessageIfTimeout)(ThreadUtils.sameThread)
}
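Stripped of the Netty and Dispatcher plumbing, the shape of ask can be reduced to a few lines of plain Scala. Everything below (MiniAsk, fakeTransport) is invented for illustration: it mirrors only the promise + callback + timeout pattern, not Spark's real transport layer.

```scala
import java.util.concurrent.{Executors, TimeUnit, TimeoutException}
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

// A minimal re-creation of the promise + callback + timeout shape of
// NettyRpcEnv.ask. fakeTransport stands in for the Outbox/Netty layer.
object MiniAsk {
  private val timeoutScheduler = Executors.newSingleThreadScheduledExecutor()

  def ask(message: String, timeoutMs: Long): Future[String] = {
    val promise = Promise[String]()

    def onSuccess(reply: String): Unit =
      if (!promise.trySuccess(reply)) println(s"Ignored message: $reply")
    def onFailure(e: Throwable): Unit =
      if (!promise.tryFailure(e)) println(s"Ignored failure: $e")

    // "send" the message; the callback plays the role of RpcResponseCallback
    fakeTransport(message, onSuccess)

    // arm the timeout, and cancel it as soon as the promise completes
    val timeoutTask = timeoutScheduler.schedule(new Runnable {
      override def run(): Unit =
        onFailure(new TimeoutException(s"no reply in $timeoutMs ms"))
    }, timeoutMs, TimeUnit.MILLISECONDS)
    promise.future.onComplete(_ => timeoutTask.cancel(true))

    promise.future
  }

  // pretend the server answers after 100 ms
  private def fakeTransport(msg: String, callback: String => Unit): Unit =
    new Thread(() => { Thread.sleep(100); callback(s"re: $msg") }).start()
}
```

With this sketch, Await.result(MiniAsk.ask("ping", 1000), ...) yields the simulated reply, while a transport that never calls back fails the future with a TimeoutException -- the same two exits the real ask provides.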
Summary
This article used a small amount of space to explain the org.apache.spark.rpc.netty.NettyRpcEnv.ask() method and walked through a small case of asynchronous processing in Spark. The case requires quite a lot of prior knowledge, so it may feel awkward if you suddenly land here. Learning accumulates bit by bit; if something is unclear, slowly build up your knowledge of the other modules and then come back to this walkthrough -- it will be more rewarding.