Why is Spark not distributing jobs to all executors, but only to one executor?

My Spark cluster has 1 master and 3 workers (on 4 separate machines, each machine with 1 core). The other settings are shown in pic-1, where spark.cores.max is set to 3 and spark.executor.cores is also 3.
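In case pic-1 does not render, here is a minimal sketch of how I set these values; the master URL and app name are placeholders, only spark.cores.max and spark.executor.cores reflect my setup:

```scala
import org.apache.spark.SparkConf

// Sketch of the settings from pic-1; master URL and app name are placeholders.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // standalone master (placeholder host)
  .setAppName("MyStreamingJob")            // placeholder name
  .set("spark.cores.max", "3")             // total cores the application may use
  .set("spark.executor.cores", "3")        // cores requested per executor
```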

But when I submit my job to the Spark cluster, the Spark web UI shows that only one executor is used (judging by the used memory and RDD blocks in pic-2), not all of them. In this case the processing speed is much slower than I expected.

Since I've set the max cores to 3, shouldn't all the executors be used for this job?

How can I configure Spark to distribute the current job to all executors, instead of having only one executor run it?

Thanks a lot.

------------------pic-1: Spark settings

------------------pic-2: Spark web UI screenshot


You said you are running two receivers. What kind of receivers are they (Kafka, HDFS, Twitter, ...)?

Which Spark version are you using?

In my experience, if you are using any receiver other than the file receiver, it will occupy one core permanently. So when you say you have two receivers, two cores will be permanently used for receiving the data, leaving only one core to do the actual processing.
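To make the core accounting concrete, here is a minimal sketch assuming two socket receivers (the actual receiver type isn't stated in the question); hosts, ports and names are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumed setup: spark.cores.max = 3 and two non-file receivers.
// Each receiver holds one core for the lifetime of the stream,
// so 3 - 2 = 1 core is left for the actual processing tasks.
val conf = new SparkConf()
  .setAppName("TwoReceiverExample")
  .set("spark.cores.max", "3")
val ssc = new StreamingContext(conf, Seconds(10))

val stream1 = ssc.socketTextStream("host-a", 9999) // permanently occupies core 1
val stream2 = ssc.socketTextStream("host-b", 9999) // permanently occupies core 2
val merged  = stream1.union(stream2)               // processed on the single remaining core

merged.count().print()
ssc.start()
ssc.awaitTermination()
```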

Please also post a screenshot of the Spark master homepage and of the job's Streaming page.


In Spark Streaming, only one receiver is launched to get the data from the input source into an RDD.

Repartitioning the data after the first transformation can increase parallelism, as in the sketch below.
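A minimal sketch, assuming a single socket receiver; the host, port and partition count are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("RepartitionExample")
val ssc  = new StreamingContext(conf, Seconds(10))

// A single receiver materialises each batch on one executor.
val lines = ssc.socketTextStream("host-a", 9999)

// repartition() shuffles the records of every batch across the cluster,
// so the stages after it run on all executors instead of just one.
val words  = lines.repartition(3).flatMap(_.split(" "))
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

counts.print()
ssc.start()
ssc.awaitTermination()
```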
