A strange Scala problem?

import scala.collection.mutable.ArrayBuffer

var calcUsers = new ArrayBuffer[Int]()
nRDD.foreach(item => {
  val arr = item.split(' ')
  val currUserId = arr(1).toInt
  calcUsers += currUserId  // append inside the foreach
  println("calcUsers", currUserId, calcUsers.length)
})
println("calcUsers", calcUsers.length)  // after the foreach

The first println shows the buffer's length growing steadily,

but the second println prints 0.

Why?

And how can I fix it?

1 Answer

An RDD is a distributed data structure. It does not actually live on the driver node (the node where your code runs); all RDD operations (map, foreach, etc.) are performed on the executor nodes. Spark therefore creates a closure of the function you pass in (you can think of it as an object containing copies of all required variables plus the function itself) and ships that object to each executor node, where it executes against the actual RDD partitions.

In simpler terms, Spark makes multiple copies of your calcUsers and sends one to each executor along with the function. Each executor then runs the function against its own copy of calcUsers; the driver-side calcUsers you print afterwards is never touched, which is why its length stays 0.
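To fix it, bring the data back to the driver instead of mutating a driver-side variable from the executors. Here is a minimal sketch of two common approaches, assuming nRDD is an RDD[String] shaped like the question's input (the user id in the second space-separated field) and sc is the SparkContext:

// Option 1: transform on the executors, then collect the results to the driver.
// collect() materializes the result as an Array[Int] on the driver.
val calcUsers = nRDD.map(_.split(' ')(1).toInt).collect()
println("calcUsers", calcUsers.length)

// Option 2: use a CollectionAccumulator; Spark merges per-task updates back to the driver.
val acc = sc.collectionAccumulator[Int]("calcUsers")
nRDD.foreach(item => acc.add(item.split(' ')(1).toInt))
println("calcUsers", acc.value.size) // acc.value is a java.util.List[Int]

collect() is the simplest option when the result fits in driver memory. Accumulator updates made inside an action like foreach are applied exactly once per task; if you update an accumulator inside a transformation instead, a retried task can apply its updates more than once.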
