Elasticsearch 7.17
Spark 3.5
AWS EMR 7.3
24/10/24 02:45:56 ERROR NetworkClient: Node [192.168.83.87:9200] failed (java.net.BindException: Address already in use); selected next node [192.168.83.232:9200]
24/10/24 02:45:56 ERROR Executor: Exception in task 7506.0 in stage 1.0 (TID 17506)
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[192.168.83.232:9200, 192.168.83.87:9200, 192.168.83.26:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:160) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:441) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:437) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:397) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:401) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:177) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.hadoop.rest.request.GetAliasesRequestBuilder.execute(GetAliasesRequestBuilder.java:68) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:623) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:71) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.spark.rdd.EsSpark$.$anonfun$doSaveToEs$1(EsSpark.scala:108) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.elasticsearch.spark.rdd.EsSpark$.$anonfun$doSaveToEs$1$adapted(EsSpark.scala:108) ~[DataAnalysis_EMR_Spark3-1.0.jar:?]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) ~[spark-core_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:174) ~[spark-core_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:152) ~[spark-core_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:632) ~[spark-core_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) ~[spark-common-utils_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) ~[spark-common-utils_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96) ~[spark-core_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:635) [spark-core_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
# ES
es.nodes.wan.only = true
es.nodes.discovery = false
es.nodes.client.only = true
# Spark
spark.driver.bindAddress = 0.0.0.0
spark.driver.port = 0
spark.executor.bindAddress = 0.0.0.0
spark.executor.port = 0
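For context, a minimal PySpark sketch of how the settings above, together with the `repartition` fix described below, might be wired into a write job. The index name, input path, and partition count are illustrative assumptions, not taken from the original job; the node list reuses the addresses from the log.

```python
# Hypothetical sketch: writing to Elasticsearch via elasticsearch-hadoop
# from PySpark, applying the es.* / spark.* settings listed above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("es-write-sketch")
    .config("spark.driver.bindAddress", "0.0.0.0")
    .config("spark.driver.port", "0")
    .config("spark.executor.bindAddress", "0.0.0.0")
    .config("spark.executor.port", "0")
    .getOrCreate()
)

df = spark.read.parquet("s3://my-bucket/input/")  # placeholder input path

# Fewer partitions => fewer data files => fewer concurrent ES connections,
# so fewer source ports are bound at once. This is the fix that resolved
# the BindException; 64 is an illustrative value.
df = df.repartition(64)

(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", "192.168.83.232,192.168.83.87,192.168.83.26")
   .option("es.port", "9200")
   .option("es.nodes.wan.only", "true")
   .option("es.nodes.discovery", "false")
   .mode("append")
   .save("my-index"))  # hypothetical index name
```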
- The final diagnosis: several indices had too many data files, and when the job switched to a new index there were not enough fresh source ports left to bind. Reducing the number of data files with Spark's `repartition` resolved the problem.
- This seems odd, though: it looks as if each data file occupied a port, rather than each thread occupying one.
- Another possibility is a bind-before-connect pattern, where the client binds a local port first and only then connects; see 一次Commons-HttpClient的BindException排查 (a write-up on troubleshooting a Commons-HttpClient BindException). Neither explanation feels fully convincing.
- The error message itself is misleading: the source port failed to bind, yet the client reacts by switching to the next destination address.
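The bind-before-connect hypothesis can be checked without Spark or Elasticsearch. The standalone Python sketch below (assuming a Linux-like TCP stack) shows that when a client binds its source port before connecting, a port conflict surfaces as "Address already in use" before any destination is ever contacted, which would explain why a source-side failure gets reported against the destination node.

```python
import errno
import socket

# Bind-before-connect demo: the local (source) port is claimed first,
# and the failure fires at bind time, before connect() runs.
a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.bind(("127.0.0.1", 0))          # port 0: the kernel picks a free ephemeral port
port = a.getsockname()[1]

b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    b.bind(("127.0.0.1", port))   # same source port, still in use
except OSError as e:
    # "Address already in use" -- raised before any destination is contacted
    assert e.errno == errno.EADDRINUSE
finally:
    b.close()
    a.close()
```

A job opening tens of thousands of short-lived connections can exhaust the ephemeral port range the same way, producing this error class at scale.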
This article is from qbit snap.