I am trying to set up a DC/OS Spark-Kafka-Cassandra cluster using 1 master and 3 private AWS m3.xlarge instances (each having 4 processors and 15 GB RAM).
I have questions regarding some strange behaviour I encountered in a spike I did several days ago.
On each of the private nodes I have the following fixed resources reserved (I am speaking about CPU usage; memory is not the issue):
0.5 CPUs for Cassandra on each node
0.3 - 0.5 CPUs for Kafka on each node
0.5 CPUs is the Mesos overhead (I simply see in the DC/OS UI that 0.5 CPUs more are occupied than the sum of all the services running on a node, so this probably belongs to some sort of Mesos overhead)
the rest of the resources (around 2.5 CPUs) are available for running Spark jobs
Now, I want to run 2 streaming jobs, so that they run on every node of the cluster. This requires me to set, in the dcos spark run command, the number of executors to 3 (since I have 3 nodes in the cluster), as well as the number of CPU cores to 3 (it is impossible to set it to 1 or 2, because, as far as I can see, the minimum number of CPUs per executor is 1). Of course, for each of the streaming jobs, 1 CPU in the cluster is occupied by the driver program.
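For reference, a sketch of the kind of submit command I mean (the class name and jar URL are just placeholders):

# spark.cores.max caps the total cores the app takes from Mesos, and
# spark.executor.cores=1 asks for 1 core per executor, which should ideally give 3 executors.
dcos spark run --submit-args="--conf spark.cores.max=3 --conf spark.executor.cores=1 --class com.example.MyStreamingJob https://example.com/jobs/my-streaming-job.jar"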
The first strange situation I see is that, instead of running 3 executors with 1 core each, Mesos launches 2 executors on 2 nodes, where one has 2 CPUs and the other has 1 CPU. Nothing is launched on the 3rd node even though there were enough resources. How can I force Mesos to run 3 executors on the cluster?
Also, when I run 1 pipeline with 3 CPUs, I see that those CPUs are blocked and cannot be reused by the other streaming pipeline, even though they are not doing any work. Why can't Mesos shift available resources between applications? Isn't that the main benefit of using Mesos? Or are there simply not enough resources to be shifted?
EDITED
Also, the question is: can I assign less than one CPU per executor?
Kindest regards,
Srdjan
Related
I'm working with Spark and YARN on an Azure HDInsight cluster, and I have some trouble understanding the relationship between the workers' resources, executors and containers.
My cluster has 10 D13 v2 workers (8 cores and 56 GB of memory each), therefore I should have 80 cores available for Spark applications. However, when I try to start an application with the following parameters
"executorMemory": "6G",
"executorCores": 5,
"numExecutors": 20,
I see 100 cores available in the YARN UI (therefore, 20 more than what I should have). I ran a heavy query, and on the executors page of the YARN UI I see all 20 executors working, with 4 or 5 active tasks in parallel each. I also tried pushing numExecutors to 25, and I do see all 25 working, again with several tasks in parallel per executor.
It was my understanding that 1 executor core = 1 cluster core, but this is not compatible with what I observe. The official Microsoft documentation (for instance here) is not really helpful. It states:
An Executor runs on the worker node and is responsible for the tasks
for the application. The number of worker nodes and worker node size
determines the number of executors, and executor sizes.
but it does not say what the relation is. I suspect YARN is only bound by memory limits (i.e. I can run as many executors as I want, as long as I have enough memory), but I don't understand how this works in relation to the available CPUs in the cluster.
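For example, if the cluster's CapacityScheduler were using the DefaultResourceCalculator (an assumption on my part, I have not checked this cluster's configuration), only memory would be enforced and vcores could be oversubscribed; CPU would only be taken into account with the DominantResourceCalculator, along these lines:

<!-- capacity-scheduler.xml (hypothetical): DefaultResourceCalculator only checks memory; -->
<!-- DominantResourceCalculator makes YARN enforce vcores as well. -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>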
Do you know what I am missing?
My question: Is it true that when running Apache Spark applications on YARN as the master, with deploy-mode as either client or cluster, executor-cores should always be set to 1?
I am running an application processing millions of records on a cluster with 200 data nodes, each having 14 cores. It runs perfectly when I use 2 executor-cores and 150 executors on YARN, but one of the cluster admins is asking me to use 1 executor-core. He is adamant that Spark on YARN should be used with 1 executor core, because otherwise it will be stealing resources from other users. He points me to this page in the Apache docs, which says the default value for executor-cores is 1 for YARN.
https://spark.apache.org/docs/latest/configuration.html
So, is it true we should use only 1 for executor-cores?
If the executors use 1 core, aren't they single-threaded?
Kind regards,
When we run a Spark application using a cluster manager like YARN, there will be several daemons running in the background, such as the NameNode, Secondary NameNode, DataNode, ResourceManager and NodeManager. So, while specifying num-executors, we need to make sure that we leave aside enough cores (~1 core per node) for these daemons to run smoothly.
The ApplicationMaster is responsible for negotiating resources from the ResourceManager and working with the NodeManagers to execute and monitor the containers and their resource consumption. If we are running Spark on YARN, then we also need to budget in the resources that the AM would need.
Example
**Cluster Config:**
200 Nodes
14 cores per Node
Leave 1 core per node for Hadoop/Yarn daemons => Num cores available per node = 14-1 = 13
So, Total available of cores in cluster = 13 x 200 = 2600
Let’s assign 5 cores per executor => --executor-cores = 5 (for good HDFS throughput)
Number of available executors = (total cores/num-cores-per-executor) = 2600/5 = 520
Leaving 1 executor for the ApplicationMaster => --num-executors = 519
Please note: this is just a sample recommended configuration; you may wish to revise it based upon the performance of your application. Also, a better practice is to monitor the node resources while you execute your job; this gives a better picture of the resource utilisation in your cluster.
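Purely as an illustration, the configuration above translates into roughly the following submit command (the class name, jar and executor memory are placeholders, not derived from the example):

# Sketch of a submit command matching the sample configuration above.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 519 \
  --executor-cores 5 \
  --executor-memory 20G \
  --class com.example.MyApp \
  my-app.jar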
I am using a Spark 2.2.0 cluster configured in standalone mode. The cluster has 2 octa-core machines. This cluster is exclusively for Spark jobs and no other process uses them. I have around 8 Spark Streaming apps which run on this cluster.

I explicitly set SPARK_WORKER_CORES (in spark-env.sh) to 8 and allocate one core to each app using the total-executor-cores setting. This config reduces the capability to work in parallel on multiple tasks: if a stage works on a partitioned RDD with 200 partitions, only one task executes at a time. What I wanted Spark to do was to start a separate thread for each job and process them in parallel, but I couldn't find a separate Spark setting to control the number of threads.

So, I decided to play around and bloated the number of cores (i.e. SPARK_WORKER_CORES in spark-env.sh) to 1000 on each machine. Then I gave 100 cores to each Spark application. I found that Spark started processing 100 partitions in parallel this time, indicating that 100 threads were being used.

I am not sure if this is the correct method of impacting the number of threads used by a Spark job.
You mixed up two things:
Cluster manager properties - SPARK_WORKER_CORES - the total number of cores that a worker can offer. Use it to control the fraction of resources that should be used by Spark in total.
Application properties - --total-executor-cores / spark.cores.max - the number of cores that the application requests from the cluster manager. Use it to control in-app parallelism.
Only the second one directly controls app parallelism, as long as the first one is not limiting it.
Also, CORE in Spark is a synonym for thread. If you:
allocate one core to each app using total-executor-cores setting.
then you specifically assign a single data processing thread.
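A minimal sketch of the two levels, with illustrative values for the 2 x 8-core cluster described above (host, class and jar names are placeholders):

# spark-env.sh on each worker: what the worker offers to the cluster manager
export SPARK_WORKER_CORES=8   # physical cores per box; no need to inflate this to get more threads

# Per-application request: this is what actually bounds in-app parallelism
spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 4 \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar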
I deploy a Spark job in cluster mode with the following:
Driver core - 1
Executor cores - 2
Number of executors - 2.
My understanding is that this application should occupy 5 cores in the cluster (4 executor cores and 1 driver core), but I don't observe this in the RM and Spark UIs.
On the Resource Manager UI, I see only 4 cores used for this application.
Even in the Spark UI (on clicking the ApplicationMaster URL from the RM), under the Executors tab, the driver cores are shown as zero.
Am I missing something?
The cluster manager is YARN.
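For reference, the submit command looks roughly like this (class and jar names are placeholders):

# Hypothetical submit command corresponding to the settings above.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-cores 1 \
  --executor-cores 2 \
  --num-executors 2 \
  --class com.example.MyJob \
  my-job.jar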
My understanding is that this application should occupy 5 cores in the cluster (4 executor cores and 1 driver core)
That's the perfect situation in YARN where it could give you 5 cores off the CPUs it manages.
but i dont observe this in the RM and Spark UIs.
Since the perfect situation does not occur often, it's nice to take however many cores we can get from YARN so the Spark application can start at all.
Spark could just wait indefinitely for the requested cores, but that would not always be to your liking, would it?
That's why Spark on YARN has an extra check (aka minRegisteredRatio): by default it waits for a minimum of 80% of the requested resources to register before the application starts executing tasks. You can use the spark.scheduler.minRegisteredResourcesRatio Spark property to control the ratio. That would explain why you see fewer cores in use than requested.
Quoting the official Spark documentation (highlighting mine):
spark.scheduler.minRegisteredResourcesRatio
0.8 for YARN mode
The minimum ratio of registered resources (registered resources / total expected resources) (resources are executors in yarn mode, CPU cores in standalone mode and Mesos coarse-grained mode ['spark.cores.max' value is total expected resources for Mesos coarse-grained mode]) to wait for before scheduling begins. Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time it will wait before scheduling begins is controlled by config spark.scheduler.maxRegisteredResourcesWaitingTime.
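A hedged example of asking Spark to wait for all of the requested resources before it starts scheduling (application details are placeholders):

# Wait for 100% of requested resources, but give up waiting after 60 seconds.
spark-submit \
  --master yarn \
  --conf spark.scheduler.minRegisteredResourcesRatio=1.0 \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=60s \
  --class com.example.MyJob \
  my-job.jar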
Is there any advantage to starting more than one spark instance (master or worker) on a particular machine/node?
The Spark standalone documentation doesn't explicitly say anything about starting a cluster with multiple workers on the same node. It does seem to implicitly assume that one worker equals one node.
Their hardware provisioning page says:
Finally, note that the Java VM does not always behave well with more than 200 GB of RAM. If you purchase machines with more RAM than this, you can run multiple worker JVMs per node. In Spark’s standalone mode, you can set the number of workers per node with the SPARK_WORKER_INSTANCES variable in conf/spark-env.sh, and the number of cores per worker with SPARK_WORKER_CORES.
So aside from working with large amounts of memory or testing cluster configuration, is there any benefit to running more than one worker per node?
I think the obvious benefit is to improve the resource utilization of the hardware per box without losing performance. In terms of parallelism, one big executor with multiple cores seems to be the same as multiple executors with fewer cores.
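A minimal conf/spark-env.sh sketch for splitting one large box into two worker JVMs (the core and memory figures are assumptions for illustration only):

# Two worker JVMs on this node, each with its own heap well below the ~200 GB JVM comfort zone.
export SPARK_WORKER_INSTANCES=2   # number of worker JVMs on this node
export SPARK_WORKER_CORES=16      # cores offered by EACH worker (32 in total here)
export SPARK_WORKER_MEMORY=120g   # memory offered by EACH worker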