A strange Scala problem?


Posted on
2021-10-21

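import scala.collection.mutable.ArrayBuffer

// nRDD is an RDD[String]; the user id is the second space-separated field
var calcUsers = new ArrayBuffer[Int]()
nRDD.foreach(item => {
  val arr = item.split(' ')
  val currUserId = arr(1).toInt
  calcUsers += currUserId
  println("calcUsers", currUserId, calcUsers.length)  // first println
})
println("calcUsers", calcUsers.length)                // second println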

The first println shows the array length growing steadily.

The second println, however, prints 0.

Why?

How can I fix it?

Tags: backend, scala, artificial intelligence, algorithms

1 answer

勇敢的少年

Posted on
2021-10-21

✓ Accepted answer

An RDD is a distributed data structure. It does not live on the driver node (the node where your program's main code runs). The functions you pass to RDD operations such as map and foreach are executed on the executor nodes. To do that, Spark builds a closure of the function (think of it as an object holding the function itself plus copies of every variable it references), serializes it, and sends it to each executor, where it runs against that executor's partitions of the RDD.
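The copying described above can be imitated without Spark. The following is only a sketch under stated assumptions: `AppendTask` and `shipToExecutor` are hypothetical names, not Spark APIs, and plain Java serialization stands in for what Spark does when it ships a task to an executor.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a closure: the function plus the variable it captured.
final class AppendTask(val captured: ArrayBuffer[Int]) extends Serializable {
  // Same parsing as in the question: the user id is the second space-separated field.
  def apply(item: String): Unit = captured += item.split(' ')(1).toInt
}

object ClosureCopyDemo {
  // Serialize and deserialize the task, imitating the trip from driver to executor.
  def shipToExecutor(task: AppendTask): AppendTask = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(task)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[AppendTask] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val calcUsers = new ArrayBuffer[Int]()                  // the "driver" buffer
    val remote = shipToExecutor(new AppendTask(calcUsers))  // the "executor" gets a copy
    Seq("a 1", "b 2", "c 3").foreach(item => remote(item))  // made-up sample lines
    println(s"executor copy length:   ${remote.captured.length}")
    println(s"driver original length: ${calcUsers.length}")
  }
}
```

The deserialized task appends to its own buffer, so the buffer created in `main` stays empty, which mirrors the 0 printed by the second println in the question.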

In simpler words, Spark makes multiple copies of your calcUsers and sends them to the executors along with the function. Each executor runs the function against its own copy of calcUsers. The calcUsers on the driver, which is the one your second println reads, is never modified, so its length stays 0.
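As for how to fix it, two common approaches are to return the values to the driver with collect(), or to use an accumulator. The sketch below assumes Spark is on the classpath and runs with a local master; the object name `CollectUserIds`, the helper `parseUserId`, and the sample lines are made up for illustration and are not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectUserIds {
  // Same parsing as in the question: the user id is the second space-separated field.
  def parseUserId(line: String): Int = line.split(' ')(1).toInt

  def main(args: Array[String]): Unit = {
    // Assumption: local master and hypothetical sample data, for illustration only.
    val conf = new SparkConf().setAppName("collect-user-ids").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val nRDD = sc.parallelize(Seq("click 101 x", "view 102 y", "click 103 z"))

      // Option 1: transform on the executors, then bring the results back to the driver.
      val calcUsers: Array[Int] = nRDD.map(parseUserId).collect()
      println(s"calcUsers via collect: ${calcUsers.length}")

      // Option 2: an accumulator, whose per-task updates Spark merges back on the driver.
      val acc = sc.collectionAccumulator[Int]("calcUsers")
      nRDD.foreach(line => acc.add(parseUserId(line)))
      println(s"calcUsers via accumulator: ${acc.value.size}")
    } finally {
      sc.stop()
    }
  }
}
```

With the three sample lines, both counts should be 3. Note that collect() pulls every element into driver memory, so it only suits results small enough to fit there; for large data, keep the work in RDD transformations and write the output with an action such as saveAsTextFile.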
