I am running a Spark job, and even though I've set the --num-executors parameter to 3, I can't see any executors in the web UI's Executors tab. Why is this happening?
Spark in local mode is non-distributed. The whole Spark process runs in a single JVM, and the driver also behaves as an executor.
You can only define the number of threads in the master URL (e.g. local[3]).
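For example, a local-mode submission with 3 threads could look like this (a sketch; the class name and jar path are placeholders):
./bin/spark-submit --master local[3] --class Main /path/to/app.jar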
You can switch to standalone mode.
Start the master using the command below:
spark-class org.apache.spark.deploy.master.Master
And the worker using:
spark-class org.apache.spark.deploy.worker.Worker spark://<host>:7077
Now run the spark-submit command.
If you have 6 cores, just specifying --executor-cores 2 will create 3 executors, and you can check them on the Spark UI.
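For instance, the submission against the standalone master could look like this (a sketch; the host, class name, and jar path are placeholders):
./bin/spark-submit --master spark://<host>:7077 --executor-cores 2 --class Main /path/to/app.jar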
Related
Let's assume that I am submitting a Spark application in yarn-client mode. In spark-submit I am passing --num-executors as 10. When the client submits this Spark application to the ResourceManager:
Does the ResourceManager allocate one container for the ApplicationMaster process out of the 10 requested by --num-executors, so that the remaining 9 go to actual executors?
Or
Does it allocate a separate container for the ApplicationMaster and give 10 containers for executors alone?
--num-executors requests that number of executors from the cluster manager (which may be Hadoop YARN). That is Spark's requirement.
An ApplicationMaster (of a YARN application) is purely a YARN concept.
A Spark application can also be a YARN application; in that case, the Spark application gets 10 containers for executors plus one extra container for the AM.
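For example, such a submission could look like this (a sketch; the class name and jar path are placeholders), after which YARN holds 11 containers in total, 10 for executors and 1 for the AM:
./bin/spark-submit --master yarn --deploy-mode client --num-executors 10 --class Main /path/to/app.jar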
I am trying to submit multiple applications to Spark.
After the first application completes, Spark allocates the worker nodes to the remaining drivers. As a result, no cores are left for execution.
My environment: 2 worker nodes, each with 1 core and 2GB RAM; the drivers run on the worker nodes.
Spark submit command: ./spark-submit --class Main --master spark://ip:6066 --deploy-mode cluster /jarPath
So if I submit 3 jobs, after the first is completed, the second and third get one core each for their drivers and no cores are left for execution.
Please suggest a way to resolve this.
Try killing old instances of spark:
http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>
You can find the driver ID through the standalone Master web UI at http://<master-host>:8080.
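A hypothetical invocation could look like this (the master URL and driver ID are placeholders; substitute the values shown in your own Master UI):
./bin/spark-class org.apache.spark.deploy.Client kill spark://<master-host>:7077 driver-20170101000000-0000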
I have a Spark standalone cluster running on a few machines. All workers are using 2 cores and 4GB of memory. I can start a job server with ./server_start.sh --master spark://ip:7077 --deploy-mode cluster --conf spark.driver.cores=2 --conf spark.driver.memory=4g, but whenever I try to start a server with more than 2 cores, the driver's state gets stuck at "SUBMITTED" and no worker takes the job.
I tried starting the spark-shell on 4 cores with ./spark-shell --master spark://ip:7077 --conf spark.driver.cores=4 --conf spark.driver.memory=4g and the job gets shared between 2 workers (2 cores each). The spark-shell gets launched as an application and not a driver though.
Is there any way to run a driver split between multiple workers? Or can I run the job server as an application rather than a driver?
The problem was resolved in chat; summarizing the solution here:
You have to change your JobServer .conf file to set the master parameter to point to your cluster:
master = "spark://ip:7077"
Also, the memory that the JobServer program uses can be set in the settings.sh file.
After setting these parameters, you can start JobServer with a simple call:
./server_start.sh
Then, once the service is running, you can create your context via REST, which will ask the cluster for resources and will receive an appropriate number of executors/cores:
curl -d "" '[hostname]:8090/contexts/cassandra-context?context-factory=spark.jobserver.context.CassandraContextFactory&num-cpu-cores=8&memory-per-node=2g'
Finally, every job sent via POST to JobServer on this created context will be able to use the executors allocated to the context and will be able to run in a distributed way.
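As a follow-up, a hypothetical job submission against that context could look like this (the app name, job class, and input string are placeholders for your own uploaded jar and job):
curl -d "input = ..." '[hostname]:8090/jobs?appName=my-app&classPath=com.example.MyJob&context=cassandra-context'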
I use Spark 1.3.0 in a cluster of 5 worker nodes with 36 cores and 58GB of memory each. I'd like to configure Spark's Standalone cluster with many executors per worker.
I have seen the merged SPARK-1706; however, it is not immediately clear how to actually configure multiple executors.
Here is the latest configuration of the cluster:
spark.executor.cores = "15"
spark.executor.instances = "10"
spark.executor.memory = "10g"
These settings are set on a SparkContext when the Spark application is submitted to the cluster.
You first need to configure your Spark standalone cluster, then set the amount of resources needed for each individual Spark application you want to run.
In order to configure the cluster, you can try this:
In conf/spark-env.sh:
Set SPARK_WORKER_INSTANCES=10, which determines the number of Worker instances (and hence executors) per node (its default value is 1)
Set SPARK_WORKER_CORES=15, the number of cores that one Worker can use (default: all cores; in your case, 36)
Set SPARK_WORKER_MEMORY=55g, the total amount of memory that can be used on one machine (Worker node) for running Spark programs
Copy this configuration file to all Worker nodes, into the same folder (a sample spark-env.sh is sketched after this list)
Start your cluster by running the scripts in sbin (sbin/start-all.sh, ...)
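A minimal conf/spark-env.sh sketch with the values above (note that, as shell environment variables, they must be written without spaces around the = sign):
# conf/spark-env.sh
export SPARK_WORKER_INSTANCES=10   # Worker instances (executors) per node
export SPARK_WORKER_CORES=15       # cores each Worker can use
export SPARK_WORKER_MEMORY=55g     # memory available to Spark programs on this node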
As you have 5 workers, with the above configuration you should see 5 (workers) * 10 (executors per worker) = 50 alive executors on the master's web interface (http://localhost:8080 by default)
When you run an application in standalone mode, by default, it will acquire all available Executors in the cluster. You need to explicitly set the amount of resources for running this application:
E.g.:
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster(...)
  .setAppName(...)
  .set("spark.executor.memory", "2g")
  .set("spark.cores.max", "10")
Starting in Spark 1.4 it should be possible to configure this:
Setting: spark.executor.cores
Default: 1 in YARN mode, all the available cores on the worker in standalone mode.
Description: The number of cores to use on each executor. For YARN and standalone mode only. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application will run on each worker.
http://spark.apache.org/docs/1.4.0/configuration.html#execution-behavior
As of Apache Spark 2.2, standalone cluster mode deployment still does not directly solve the problem of the number of EXECUTORS per WORKER, but there is an alternative: launch Spark executors manually:
[usr@lcl ~spark/bin]# ./spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@DRIVER-URL:PORT --executor-id val --hostname localhost-val --cores 41 --app-id app-20170914105902-0000-just-exemple --worker-url spark://Worker@localhost-exemple:34117
I hope that helps!
In standalone mode, by default, all the resources in the cluster are acquired when you launch an application. You need to specify how many executors you need using the --executor-cores and --total-executor-cores options.
For example, if there is 1 worker in your cluster (1 worker == 1 machine; it is good practice to have only 1 worker per machine) with 3 cores and 3G available in its pool (this is specified in spark-env.sh), and you submit an application with --executor-cores 1 --total-executor-cores 2 --executor-memory 1g, two executors are launched for the application, each with 1 core and 1g, as sketched below. Hope this helps!
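For completeness, that submission could look like this (a sketch; the master URL, class name, and jar path are placeholders):
./bin/spark-submit --master spark://<host>:7077 --executor-cores 1 --total-executor-cores 2 --executor-memory 1g --class Main /path/to/app.jar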