Im using Spark 2.1.1 Standalone cluster,
Although I have 29 free cores in my cluster (Cores in use: 80 Total, 51 Used), when submitting new spark job with --total-executor-cores 16 this config is not taking affect and the job submitted only with 6 cores..
What am I missing?
(deleting checkpoints doesn't help)
Here is my spark-submit command:
PYSPARK_PYTHON="/usr/bin/python3.4"
PYSPARK_DRIVER_PYTHON="/usr/bin/python3.4" \
/opt/spark/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
--master spark://XXXX.XXXX:7077 \
--conf "spark.sql.shuffle.partitions=2001" \
--conf "spark.port.maxRetries=200" \
--conf "spark.executorEnv.PYTHONHASHSEED=0" \
--executor-memory 24G \
--total-executor-cores 16 \
--driver-memory 8G \
/home/XXXX/XXXX.py \
--spark_master "spark://XXXX.XXXX:7077" \
--topic "XXXX" \
--broker_list "XXXX" \
--hdfs_prefix "hdfs://XXXX"
My problem was the high number of memory I asked from spark (--executor-memory 24G) - spark tried to find worker nodes with 24G free memory and found only 2 nodes, each node had 3 free cores (that's why I saw only 6 cores).
When decreasing the number of memory to 8G, spark found the number of cores specified.
Related
Here is my cluster configuration:
Master nodes: 1 (16 vCPU, 64 GB memory)
Worker nodes: 2 (total of 64 vCPU, 256 GB memory)
Here is the Hive query I'm trying to run on the Spark SQL shell:
select a.*,b.name as name from (
small_tbl b
join
(select *
from large_tbl where date = '2019-01-01') a
on a.id = b.id);
Here is the query execution plan as shown on the Spark UI:
The configuration properties set while launching the shell are as follows:
spark-sql --conf spark.driver.maxResultSize=30g \
--conf spark.broadcast.compress=true \
--conf spark.rdd.compress=true \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=304857600 \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.instances=12 \
--conf spark.executor.memory=16g
--conf spark.executor.cores=5 \
--conf spark.driver.memory=32g \
--conf spark.yarn.executor.memoryOverhead=512 \
--conf spark.executor.extrajavaoptions=-Xms20g \
--conf spark.executor.heartbeatInterval=30s \
--conf spark.shuffle.io.preferDirectBufs=true \
--conf spark.memory.fraction=0.5
I have tried most of the solutions suggested here and here which is evident in the properties set above. As far as I know it's not a good idea to increase the maxResultSize property on the driver side since datasets may grow beyond driver's memory size and driver shouldn't be used to store data in this scale.
I have executed the query on Tez engine successfully which took around 4 minutes, whereas Spark takes more than 15 mins to execute and terminates abruptly with the lack of heap space issue.
I strongly believe there must be a way to speed up the query execution on Spark. Please suggest me a solution that works for this kind of queries.
I am running a spark job with 3 files each of 100MB size, for some reason my spark UI shows all dataset concentrated into 2 executors.This is making the job run for 19 hrs and still running.
Below is my spark configuration . spark 2.3 is the version used.
spark2-submit --class org.mySparkDriver \
--master yarn-cluster \
--deploy-mode cluster \
--driver-memory 8g \
--num-executors 100 \
--conf spark.default.parallelism=40 \
--conf spark.yarn.executor.memoryOverhead=6000mb \
--conf spark.dynamicAllocation.executorIdleTimeout=6000s \
--conf spark.executor.cores=3 \
--conf spark.executor.memory=8G \
I tried repartitioning inside the code which works , as this makes the file go into 20 partitions (i used rdd.repartition(20)). But why should I repartition , i believe specifying spark.default.parallelism=40 in the script should let spark divide the input file to 40 executors and process the file in 40 executors.
Can anyone help.
Thanks,
Neethu
I am assuming you're running your jobs in YARN if yes, you can check following properties.
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-vcores
yarn.nodemanager.resource.cpu-vcores
In YARN these properties would affect number of containers that can be instantiated in a NodeManager based on spark.executor.cores, spark.executor.memory property values (along with executor memory overhead)
For example, if a cluster with 10 nodes (RAM : 16 GB, cores : 6) and set with following yarn properties
yarn.scheduler.maximum-allocation-mb=10GB
yarn.nodemanager.resource.memory-mb=10GB
yarn.scheduler.maximum-allocation-vcores=4
yarn.nodemanager.resource.cpu-vcores=4
Then with spark properties spark.executor.cores=2, spark.executor.memory=4GB you can expect 2 Executors/Node so total you'll get 19 executors + 1 container for Driver
If the spark properties are spark.executor.cores=3, spark.executor.memory=8GB then you will get 9 Executor (only 1 Executor/Node) + 1 container for Driver
you can refer to link for more details
Hope this helps
My Spark Streaming job failed with the below exception
Diagnostics: Container is running beyond physical memory limits.
Current usage: 1.5 GB of 1.5 GB physical memory used; 3.6 GB of 3.1 GB
virtual memory used. Killing container.
Here is my spark submit command
spark2-submit \
--name App name \
--class Class name \
--master yarn \
--deploy-mode cluster \
--queue Queue name \
--num-executors 5 --executor-cores 3 --executor-memory 5G \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.locality.wait=10 \
--conf spark.task.maxFailures=8 \
--conf spark.ui.killEnabled=false \
--conf spark.logConf=true \
--conf spark.yarn.driver.memoryOverhead=512 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.yarn.max.executor.failures=40 \
jar path
I am not sure what's causing the above issue. Am I missing something in the above command or is it failing as I didn't set --driver-memory in my spark submit command?
I have following settings in my Spark job:
--num-executors 2
--executor-cores 1
--executor-memory 12G
--driver memory 16G
--conf spark.streaming.dynamicAllocation.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.streaming.receiver.writeAheadLog.enable=false
--conf spark.executor.memoryOverhead=8192
--conf spark.driver.memoryOverhead=8192'
My understanding is job should run with 2 executors however it is running with 3. This is happening to multiple of my jobs. Could someone please explain the reason?
I have 1 driver and 6 core instances with 16GB ram and 8 cores each.
I am running spark-submit with below options:
spark-submit --driver-memory 4g \
--executor-memory 6g \
--num-executors 12 \
--executor-cores 2 \
--conf spark.driver.maxResultSize=0 \
--conf spark.network.timeout=800 job.py
I am getting Java heap memory error multiple times, I think there is something wrong with the options can someone help me out with this.
Thanks