Spark multiple jobs error

I am trying to submit multiple applications on Spark.
After the first application is completed, Spark allocates all the worker nodes to the drivers. As a result, no cores are left for execution.
My environment: 2 worker nodes, each with 1 core and 2 GB RAM; the drivers run on the worker nodes.
Spark submit command: ./spark-submit --class Main --master spark://ip:6066 --deploy-mode cluster /jarPath
So if I submit 3 jobs, after the first completes, the second and third each get one core for their drivers and no cores are left for execution.
Please suggest a way to resolve this.

Try killing the old instances of Spark:
http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>
You can find the driver ID through the standalone Master web UI at http://<master url>:8080.
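For example, filling in the template (the driver ID below is just a placeholder; the real ID has the form driver-<timestamp>-<sequence> and is listed on the Master web UI):
# master URL (legacy submission port 7077 assumed) and driver ID are illustrative placeholders
./bin/spark-class org.apache.spark.deploy.Client kill spark://ip:7077 driver-20230101123456-0000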

Related

Why is the executors entry not visible in the Spark web UI

I am running a Spark job, and even though I've set the --num-executors parameter to 3, I can't see any executors in the web UI Executors tab. Why is this happening?
Spark in local mode is non-distributed. The Spark process runs in a single JVM, and the driver also behaves as an executor.
You can only define the number of threads in the master URL (e.g. local[4]).
You can switch to standalone mode.
Start the master using the command below:
spark-class org.apache.spark.deploy.master.Master
And the worker using:
spark-class org.apache.spark.deploy.worker.Worker spark://<host>:7077
Now run the spark-submit command.
If you have 6 cores, just specifying --executor-cores 2 will create 3 executors, and you can check this on the Spark UI.
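For example, a submit against that standalone master could look like this (the class name and JAR path are hypothetical placeholders):
# assumes the master started above is reachable on the default port 7077
./bin/spark-submit --master spark://<host>:7077 --executor-cores 2 --class MyApp /path/to/my-app.jar
With 6 worker cores in total, the standalone scheduler takes all of them by default and splits them into 3 executors of 2 cores each.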

Does YARN allocate one container for the application master from the number of executors that we pass in our spark-submit command

Let's assume that I am submitting a Spark application in yarn-client mode. In spark-submit I am passing --num-executors as 10. When the client submits this Spark application to the ResourceManager,
does the ResourceManager allocate one executor container for the ApplicationMaster process out of the --num-executors (10), so that the remaining 9 are given to the actual executors?
or
Does it allocate a separate new container for the ApplicationMaster and give 10 containers to the executors alone?
--num-executors requests that number of executors from the cluster manager (which may be Hadoop YARN). That's Spark's requirement.
An ApplicationMaster (of a YARN application) is purely a YARN concept.
A Spark application can also be a YARN application. In that case, the Spark application gets 10 containers for executors plus one extra container for the AM.
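For illustration, a minimal yarn-client submit might look like this (the class name and JAR are hypothetical placeholders); YARN would then start 10 executor containers plus one extra container for the ApplicationMaster, i.e. 11 containers in total:
# a sketch; MyApp and my-app.jar are placeholders
./bin/spark-submit --master yarn --deploy-mode client --num-executors 10 --class MyApp my-app.jar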

How do I run multiple Spark applications in parallel on a standalone master

Using the Spark (1.6.1) standalone master, I need to run multiple applications on the same Spark master.
All applications submitted after the first one keep holding the 'WAITING' state. I also observed that the running one holds the sum of all the workers' cores.
I already tried limiting it by using SPARK_EXECUTOR_CORES, but that is a YARN config, while I am running a standalone master. I tried running many workers on the same master, but every time the first submitted application consumes all the workers.
I was having the same problem on a Spark standalone cluster.
What I found is that somehow it utilises all the resources for one single job. We need to cap the resources so that there is room to run other jobs as well.
Below is the command I am using to submit a Spark job:
bin/spark-submit --class classname --master spark://hjvm1:6066 --deploy-mode cluster --driver-memory 500M --conf spark.executor.memory=1g --conf spark.cores.max=1 /data/test.jar
A crucial parameter for running multiple jobs in parallel on a Spark standalone cluster is spark.cores.max. Note that spark.executor.instances, num-executors and spark.executor.cores alone won't allow you to achieve this on Spark standalone; all your jobs except a single active one will be stuck in WAITING status.
Spark-standalone resource scheduling:
The standalone cluster mode currently only supports a simple FIFO
scheduler across applications. However, to allow multiple concurrent
users, you can control the maximum number of resources each
application will use. By default, it will acquire all cores in the
cluster, which only makes sense if you just run one application at a
time. You can cap the number of cores by setting spark.cores.max ...
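As an alternative to passing --conf on every submit, the same cap can go into conf/spark-defaults.conf; a minimal sketch for two 1-core workers (the values here are only illustrative):
# conf/spark-defaults.conf
spark.cores.max        1
spark.executor.memory  512m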
I am assuming you run all the workers on one server and try to simulate a cluster. The reason for this assumption is that otherwise you could use one worker and one master to run a standalone Spark cluster.
The executor cores are something completely different from the normal cores. To set the number of executors you will need YARN to be turned on, as you said earlier. The executor cores are the number of concurrent tasks an executor can run (when using HDFS it is advisable to keep this below 5) [1].
The cores you want to limit the workers to are the “CPU cores”. These are specified in the configuration of Spark 1.6.1 [2]. In Spark there is the option to set the number of CPU cores when starting a slave [3]. This happens with -c CORES, --cores CORES, which defines the total CPU cores to allow Spark applications to use on the machine (default: all available); this option only exists on the worker.
The command to start a worker with this core limit would be something like this:
./sbin/start-slave.sh spark://<master>:7077 --cores 2
Hope this helps
In the configuration settings, add this line to the "./conf/spark-env.sh" file:
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"
The default number of cores per application will now be limited to 1 by the master.
If multiple Spark applications are running, each will then use only one core by default. Then define the number of workers and give the workers this setting:
export SPARK_WORKER_CORES=1
Each worker then offers one core as well. Remember this has to be set for every worker in the configuration settings.

Why does a Spark executor need to connect with the Worker

When I kicked off a Spark job, I found the executor startup command line to be as follows:
bin/java -cp /opt/conf/:/opt/jars/* -Xmx1024M -Dspark.driver.port=56559
org.apache.spark.executor.CoarseGrainedExecutorBackend
--driver-url spark://CoarseGrainedScheduler@10.1.140.2:56559
--executor-id 1 --hostname 10.1.140.5 --cores 2
--app-id app-20161221132517-0000
--worker-url spark://Worker@10.1.140.5:56451
In the above command you can see the argument --worker-url spark://Worker@10.1.140.5:56451, which is what I'm curious about: why does the executor need to communicate with the worker? In my mind an executor only needs to talk with other executors and the driver.
Executors are part of worker nodes, as these definitions show:
Application : User program built on Spark. Consists of a driver program and executors on the cluster.
Worker node : Any node that can run application code in the cluster
Executor : A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.
Source
The executor's fate is connected with the worker's fate. If the worker is abnormally terminated, its executors have to be able to detect this fact and stop themselves. Without this mechanism one could end up with "ghost" executors.

Running a distributed Spark Job Server with multiple workers in a Spark standalone cluster

I have a Spark standalone cluster running on a few machines. All workers are using 2 cores and 4GB of memory. I can start a job server with ./server_start.sh --master spark://ip:7077 --deploy-mode cluster --conf spark.driver.cores=2 --conf spark.driver.memory=4g, but whenever I try to start a server with more than 2 cores, the driver's state gets stuck at "SUBMITTED" and no worker takes the job.
I tried starting the spark-shell on 4 cores with ./spark-shell --master spark://ip:7077 --conf spark.driver.cores=4 --conf spark.driver.memory=4g and the job gets shared between 2 workers (2 cores each). The spark-shell gets launched as an application and not a driver though.
Is there any way to run a driver split between multiple workers? Or can I run the job server as an application rather than a driver?
The problem was resolved in the chat.
You have to change your JobServer .conf file to set the master parameter to point to your cluster:
master = "spark://ip:7077"
Also, the memory that the JobServer program uses can be set in the settings.sh file.
After setting these parameters, you can start JobServer with a simple call:
./server_start.sh
Then, once the service is running, you can create your context via REST, which will ask the cluster for resources and will receive an appropriate number of executors/cores:
curl -d "" '[hostname]:8090/contexts/cassandra-context?context-factory=spark.jobserver.context.CassandraContextFactory&num-cpu-cores=8&memory-per-node=2g'
Finally, every job sent via POST to JobServer on this created context will be able to use the executors allocated to the context and will be able to run in a distributed way.
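For example, assuming a test JAR has already been uploaded to JobServer under the app name test (the app name and class path below are placeholders taken from the JobServer examples), a job can be posted against the created context like this:
curl -d "input.string = a b c a b" '[hostname]:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=cassandra-context&sync=true'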
