dse spark-submit to specific work pool instead of "default" - apache-spark

I am able to successfully build the example project from https://github.com/datastax/SparkBuildExamples/tree/master/scala/sbt/dse/src/main/scala/com/datastax/spark/example
I can also submit it with dse spark-submit. The program runs fine and produces the expected results:
dse spark-submit --class com.datastax.spark.example.WriteRead target/writeRead-0.1.jar
I now wish to submit the above job to an existing work pool, configured in dse.yaml as follows:
resource_manager_options:
    worker_options:
        cores_total: 6
        memory_total: 32G
        workpools:
            - name: alwayson_sql
              cores: 2
              memory: 4G
            - name: pool_1
              cores: 2
              memory: 16G
I cannot work out what changes I need to make, in the code or in the spark-submit command, to submit the application to the pool "pool_1".
The application is always submitted to the default pool.
Please help.

After some additional research, I figured out how to make dse spark-submit use the pool "pool_1":
bin/dse spark-submit \
--master dse://?workpool=pool_1 \
--conf spark.network.timeout=500 \
--class com.datastax.spark.example.WriteRead target/writeRead-0.1.jar
(Per input from Alex) See the DSE documentation.
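As a minimal sketch of the approach above: the work pool is selected through the workpool query parameter of the master URL. The helper name dse_master_url is hypothetical and only used for illustration. The sketch prints the command instead of running it, and it quotes the URL because "?" is a glob character in most shells.

```shell
# Build a DSE master URL that targets a named work pool.
# The pool name must match a "name" entry under workpools in dse.yaml.
# dse_master_url is a hypothetical helper, not part of DSE.
dse_master_url() {
  printf 'dse://?workpool=%s\n' "$1"
}

master="$(dse_master_url pool_1)"

# Print the command instead of running it, so the sketch works without DSE installed.
# The URL is single-quoted in the output because "?" is a glob character.
printf "dse spark-submit --master '%s' --class %s %s\n" \
  "$master" com.datastax.spark.example.WriteRead target/writeRead-0.1.jar
```

Swapping pool_1 for another pool name from dse.yaml (for example alwayson_sql) should target that pool instead.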

Related

What is happening when starting a Spark application on Kubernetes

I read this: Running Spark on Kubernetes.
I want to know more details about the interaction between Kubernetes Controller/Scheduler and Spark runtime when launching a Spark job on K8s.
Specifically, assume we launch a Spark app with:
bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--..............
My question is: K8s may not be able to allocate all 5 executors (that is, containers/pods) immediately, because cluster resources may be unavailable at the moment the Spark app is launched. Which approach does Spark take? (1) Spark starts running tasks as soon as at least one executor is allocated. (2) Spark won't launch any tasks until all 5 executors have been allocated.
If you know Hadoop YARN, it would be great if you could also answer the question for a Spark app running on Hadoop YARN (dynamic allocation disabled) and point out the difference.

Spark client mode - YARN allocates a container for driver?

I am running Spark on YARN in client mode, so I expect that YARN will allocate containers only for the executors. Yet, from what I am seeing, it seems like a container is also allocated for the driver, and I don't get as many executors as I was expecting.
I am running spark-submit on the master node. The parameters are as follows:
sudo spark-submit --class ... \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
--conf spark.yarn.am.cores=2 \
--conf spark.yarn.am.memory=8G \
--conf spark.executor.instances=5 \
--conf spark.executor.cores=3 \
--conf spark.executor.memory=10G \
--conf spark.dynamicAllocation.enabled=false \
While running this application, Spark UI's Executors page shows 1 driver and 4 executors (5 entries in total). I would expect 5, not 4 executors.
At the same time, YARN UI's Nodes tab shows that on the node that isn't actually used (at least according to Spark UI's Executors page...) there's a container allocated, using 9GB of memory. The rest of the nodes have containers running on them, 11GB of memory each.
Because in my spark-submit command the driver has 2GB less memory than the executors, I think that the 9GB container allocated by YARN is for the driver.
Why is this extra container allocated? How can I prevent this?
[Screenshots: Spark UI, YARN UI]
Update after answer by Igor Dvorzhak
I was wrongly assuming that the AM would run on the master node and that it would contain the driver app (so that the spark.yarn.am.* settings would apply to the driver process).
So I've made the following changes:
set the spark.yarn.am.* settings to defaults (512m of memory, 1 core)
set the driver memory through spark.driver.memory to 8g
did not try to set driver cores at all, since it is only valid for cluster mode
Because the AM on default settings takes up 512m plus 384m of overhead, its container fits into the spare 1GB of free memory on a worker node.
Spark gets the 5 executors it requested, and the driver memory is appropriate to the 8g setting. All works as expected now.
[Screenshots: Spark UI, YARN UI]
The extra container is allocated for the YARN application master:
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Even though the driver runs in the client process in client mode, the YARN application master still runs on YARN and requires a container allocation.
There is no way to prevent container allocation for the YARN application master.
For reference, a similar question was asked some time ago: Resource Allocation with Spark and Yarn.
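The container sizes reported in the question (11GB per executor, 9GB for the extra container, and an AM that fits in 1GB once it uses the defaults) are consistent with a simple sizing rule. The sketch below is only an illustration under stated assumptions: the memory overhead is taken as the larger of 384 MB and 10% of the requested memory, and YARN is assumed to round each request up to a multiple of 1024 MB (the usual default of yarn.scheduler.minimum-allocation-mb). The helper name container_mb is hypothetical.

```shell
# Sketch of YARN container sizing; all values are in MB.
# Assumptions: overhead = max(384, 10% of the requested memory), and YARN rounds
# each request up to a multiple of 1024 MB. container_mb is a hypothetical helper.
MIN_OVERHEAD_MB=384
YARN_INCREMENT_MB=1024

container_mb() (
  requested=$1
  overhead=$(( requested / 10 ))
  if [ "$overhead" -lt "$MIN_OVERHEAD_MB" ]; then
    overhead=$MIN_OVERHEAD_MB
  fi
  total=$(( requested + overhead ))
  # Round up to the next multiple of the allocation increment.
  echo $(( (total + YARN_INCREMENT_MB - 1) / YARN_INCREMENT_MB * YARN_INCREMENT_MB ))
)

container_mb 10240   # executor with spark.executor.memory=10G
container_mb 8192    # AM with spark.yarn.am.memory=8G
container_mb 512     # AM with the default 512m
```

Under these assumptions the three calls give 11264, 9216 and 1024 MB, that is 11GB, 9GB and 1GB, which matches what the YARN UI showed before and after the change.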
You can specify the driver memory and the number of executors in spark-submit as below.
spark-submit --jars..... --master yarn --deploy-mode cluster --driver-memory 2g --driver-cores 4 --num-executors 5 --executor-memory 10G --executor-cores 3
Hope it helps you.

spark executors on mesos are unbalanced when setting role quota

I want to run a Spark job on Mesos and limit the resources of a specific role.
I try to run 5 executors in my job:
spark-shell \
--master mesos://zk://host1:2181,host2:2181,host3:2181/mesos \
--conf spark.executor.cores=1 \
--conf spark.cores.max=5 \
--conf spark.mesos.role=myrole
When the quota setting is disabled, this works well: I receive several resource offers, so the executors can be distributed.
18/01/25 13:35:49 DEBUG MesosCoarseGrainedSchedulerBackend: Received 4 resource offers.
If I enable the quota setting (http://mesos.apache.org/documentation/latest/quota/), I always get only 1 resource offer.
18/01/25 13:36:31 DEBUG MesosCoarseGrainedSchedulerBackend: Received 1 resource offers.
I have no idea what is happening here.
My environment:
spark 2.2
mesos 1.4.1 (master*3, slave*5)
CentOS 7.3

Error: spark-assembly-1.4.1-hadoop2.6.0.jar does not exist

I'm trying to submit a Spark app from a terminal on my local machine to my cluster. I'm using --master yarn-cluster, because I need the driver program to run on the cluster too, not on the machine from which I submit the application (my local machine).
I'm using
bin/spark-submit \
--class com.my.application.XApp \
--master yarn-cluster --executor-memory 100m \
--num-executors 50 hdfs://name.node.server:8020/user/root/x-service-1.0.0-201512141101-assembly.jar \
1000
and getting error
Diagnostics: java.io.FileNotFoundException: File
file:/Users/nish1013/Dev/spark-1.4.1-bin-hadoop2.6/lib/spark-assembly-1.4.1-hadoop2.6.0.jar
does not exist
In my service list I can see that the following are already installed:
YARN + MapReduce2 2.7.1.2.3 Apache Hadoop NextGen MapReduce (YARN)
Spark 1.4.1.2.3 Apache Spark is a fast and general engine for large-scale data processing.
My spark-env.sh on the local machine:
export HADOOP_CONF_DIR=/Users/nish1013/Dev/hadoop-2.7.1/etc/hadoop
Has anyone encountered something similar before?
I think the right command is the following:
bin/spark-submit \
--class com.my.application.XApp \
--master yarn-cluster --executor-memory 100m \
--num-executors 50 --conf spark.yarn.jars=hdfs://name.node.server:8020/user/root/x-service-1.0.0-201512141101-assembly.jar \
1000
or you can add
spark.yarn.jars hdfs://name.node.server:8020/user/root/x-service-1.0.0-201512141101-assembly.jar
to your spark-defaults.conf file.

Spark on Mesos Cluster - Task Fails

I'm trying to run a Spark application in a Mesos cluster where I have one master and one slave. The slave has 8GB RAM assigned for Mesos. The master is running the Spark Mesos Dispatcher.
I use the following command to submit a Spark application (which is a streaming application).
spark-submit --master mesos://mesos-master:7077 --class com.verifone.media.ums.scheduling.spark.SparkBootstrapper --deploy-mode cluster scheduling-spark-0.5.jar
I see the following output, which shows that it was submitted successfully.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/01 12:52:38 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://mesos-master:7077.
15/09/01 12:52:39 INFO RestSubmissionClient: Submission successfully created as driver-20150901072239-0002. Polling submission state...
15/09/01 12:52:39 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20150901072239-0002 in mesos://mesos-master:7077.
15/09/01 12:52:39 INFO RestSubmissionClient: State of driver driver-20150901072239-0002 is now QUEUED.
15/09/01 12:52:40 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
"action" : "CreateSubmissionResponse",
"serverSparkVersion" : "1.4.1",
"submissionId" : "driver-20150901072239-0002",
"success" : true
}
However, this fails in Mesos, and when I look at the Spark Cluster UI, I see the following message.
task_id { value: "driver-20150901070957-0001" } state: TASK_FAILED message: "" slave_id { value: "20150831-082639-167881920-5050-4116-S6" } timestamp: 1.441091399975446E9 source: SOURCE_SLAVE reason: REASON_MEMORY_LIMIT 11: "\305-^E\377)N\327\277\361:\351\fm\215\312"
Seems like it is related to memory, but I'm not sure whether I have to configure something here to get this working.
UPDATE
I looked at the mesos logs in the slave, and I see the following message.
E0901 07:56:26.086618 1284 fetcher.cpp:515] Failed to run mesos-fetcher: Failed to fetch all URIs for container '33183181-e91b-4012-9e21-baa37485e755' with exit status: 256
I thought this could be caused by the Spark executor URI, so I modified the spark-submit command as follows and increased the memory for both driver and slave, but I still see the same error.
spark-submit \
--master mesos://mesos-master:7077 \
--class com.verifone.media.ums.scheduling.spark.SparkBootstrapper \
--deploy-mode cluster \
--driver-memory 1G \
--executor-memory 4G \
--conf spark.executor.uri=http://d3kbcqa49mib13.cloudfront.net/spark-1.4.1-bin-hadoop2.6.tgz \
scheduling-spark-0.5.jar
UPDATE 2
I got past this point by following @hartem's advice (see comments). Tasks are running now, but the actual Spark application still does not run in the cluster. When I look at the logs, I see the following. After the last line, Spark does not seem to proceed any further.
15/09/01 10:33:41 INFO SparkContext: Added JAR file:/tmp/mesos/slaves/20150831-082639-167881920-5050-4116-S8/frameworks/20150831-082639-167881920-5050-4116-0004/executors/driver-20150901103327-0002/runs/47339c12-fb78-43d6-bc8a-958dd94d0ccf/spark-1.4.1-bin-hadoop2.6/../scheduling-spark-0.5.jar at http://192.172.1.31:33666/jars/scheduling-spark-0.5.jar with timestamp 1441103621639
I0901 10:33:41.728466 4375 sched.cpp:157] Version: 0.23.0
I0901 10:33:41.730764 4383 sched.cpp:254] New master detected at master@192.172.1.10:7077
I0901 10:33:41.730908 4383 sched.cpp:264] No credentials provided. Attempting to register without authentication
I had a similar issue. The problem was that the slave could not find the jar required to run the class (SparkPi). When I gave the HTTP URL of the jar, it worked. The jar has to be placed somewhere the slaves can reach, such as a distributed file system or an HTTP server, not on the local file system.
/home/centos/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
--name SparkPiTestApp \
--class org.apache.spark.examples.SparkPi \
--master mesos://xxxxxxx:7077 \
--deploy-mode cluster \
--executor-memory 5G --total-executor-cores 30 \
http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.4.0-SNAPSHOT.jar 100
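Following the point above that the jar must be reachable from the slaves, a quick pre-flight check on the jar location can catch a local path before submitting in cluster mode. This is only a rough sketch: the helper name is_remote_jar is hypothetical, and the list of schemes is illustrative rather than a complete list of what a given Mesos setup can fetch.

```shell
# Succeeds (exit status 0) when the jar location uses a scheme that can be
# fetched over the network. is_remote_jar is a hypothetical helper, and the
# scheme list is illustrative, not exhaustive.
is_remote_jar() {
  case "$1" in
    http://*|https://*|hdfs://*) return 0 ;;
    *) return 1 ;;
  esac
}

for jar in scheduling-spark-0.5.jar \
           http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.4.0-SNAPSHOT.jar; do
  if is_remote_jar "$jar"; then
    echo "remote: $jar"
  else
    echo "local path, the slave may not be able to fetch it: $jar"
  fi
done
```

The check only looks at the URI scheme. It does not verify that the file exists or that the slaves can actually download it.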
Could you please do export GLOG_v=1 before launching the slave and see if there is anything interesting in the slave log? I would also look for stdout and stderr files under the slave working directory and see if they contain any clues.

Resources