spark scalability: what am I doing wrong? - apache-spark

I am processing data with spark and it works with a day worth of data (40G) but fails with OOM on a week worth of data:
import pyspark
import datetime
import operator
sc = pyspark.SparkContext()
sqc = pyspark.sql.SQLContext(sc)
sc.union([sqc.parquetFile(hour.strftime('.....'))
              .map(lambda row: (row.id, row.foo))
          for hour in myrange(beg, end, datetime.timedelta(0, 3600))]) \
  .reduceByKey(operator.add).saveAsTextFile("myoutput")
The number of different IDs is less than 10k.
Each ID is a smallish int.
The job fails because too many executors fail with OOM.
When the job succeeds (on small inputs), "myoutput" is about 100k.
What am I doing wrong?
I tried replacing saveAsTextFile with collect (because I actually want to do some slicing and dicing in Python before saving); there was no difference in behavior, same failure. Is this to be expected?
I used to have reduce(lambda x,y: x.union(y), [sqc.parquetFile(...)...]) instead of sc.union - which is better? Does it make any difference?
The cluster has 25 nodes with 825GB RAM and 224 cores among them.
Invocation is spark-submit --master yarn --num-executors 50 --executor-memory 5G.
A single RDD has ~140 columns and covers one hour of data, so a week is a union of 168(=7*24) RDDs.

Spark very often suffers from out-of-memory errors when scaling up. In these cases, fine tuning is up to the programmer. Also recheck your code to make sure you are not doing anything excessive, such as collecting all of the big data in the driver, which is very likely to exceed the memoryOverhead limit no matter how high you set it.
To understand what is happening, you should know when YARN decides to kill a container for exceeding memory limits: that happens when the container goes beyond the memoryOverhead limit.
In the Scheduler you can check the Event Timeline to see what happened with the containers. If YARN has killed a container, it will appear red, and when you hover over or click on it, you will see a message like:
Container killed by YARN for exceeding memory limits. 16.9 GB of 16 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
So in that case, what you want to focus on are these configuration properties (the values are examples from my cluster):
# More executor memory overhead
spark.yarn.executor.memoryOverhead 4096
# More driver memory overhead
spark.yarn.driver.memoryOverhead 8192
# Max on my nodes
#spark.executor.cores 8
#spark.executor.memory 12G
# For the executors
spark.executor.cores 6
spark.executor.memory 8G
# For the driver
spark.driver.cores 6
spark.driver.memory 8G
The first thing to do is to increase the memoryOverhead.
In the driver or in the executors?
When you are looking at your application in the UI, you can click on the Attempt ID and check the Diagnostics Info, which should mention the ID of the container that was killed. If it is the same as your AM Container, then it is the driver; otherwise it is the executor(s).
That didn't resolve the issue, now what?
You have to fine tune the number of cores and the heap memory you are providing. PySpark does most of its work in off-heap memory, so you don't want to give too much space to the heap, since that would be wasted. You also don't want to give too little, because the garbage collector will then have issues. Recall that these are JVMs.
As described here, a worker can host multiple executors, so the number of cores used affects how much memory every executor gets; decreasing the number of cores might help.
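As a rough back-of-the-envelope illustration of that last point (a sketch with made-up numbers, not values taken from the question): every core in an executor runs one task at a time, and all of those concurrent tasks share the same executor heap:
# Assumed settings, purely for illustration.
executor_memory_gb = 8                 # spark.executor.memory
for executor_cores in (8, 6, 4, 2):    # candidate values for spark.executor.cores
    mem_per_task_gb = executor_memory_gb / float(executor_cores)
    print("%d cores -> ~%.1f G of heap per concurrent task" % (executor_cores, mem_per_task_gb))
# Fewer cores per executor means fewer tasks sharing the same heap at once,
# which is why lowering spark.executor.cores can relieve per-task memory pressure.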
I have written this up in more detail in memoryOverhead issue in Spark and Spark – Container exited with a non-zero exit code 143, mostly so that I won't forget! Another option, which I haven't tried, would be spark.default.parallelism and/or spark.storage.memoryFraction, which, in my experience, didn't help.
You can pass configuration flags as sds mentioned, or like this:
spark-submit --properties-file my_properties
where "my_properties" is something like the attributes I list above.
For non-numerical values, you could do this:
spark-submit --conf spark.executor.memory='4G'
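If you would rather set these from PySpark code than on the command line, a minimal sketch (reusing the example values from above) could look like this; note that in client mode the driver options still have to go through spark-submit, because the driver JVM is already running by the time this code executes:
import pyspark

conf = (pyspark.SparkConf()
        .set("spark.yarn.executor.memoryOverhead", "4096")
        .set("spark.yarn.driver.memoryOverhead", "8192")
        .set("spark.executor.cores", "6")
        .set("spark.executor.memory", "8G")
        .set("spark.driver.cores", "6")      # only honored in cluster mode
        .set("spark.driver.memory", "8G"))   # only honored in cluster mode
sc = pyspark.SparkContext(conf=conf)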

It turned out that the problem was not with Spark, but with YARN.
The solution is to run spark with
spark-submit --conf spark.yarn.executor.memoryOverhead=1000
(or modify the YARN config).

Related

spark - application returns different results based on different executor memory?

I am noticing some peculiar behaviour. I have a Spark job which reads the data, does some grouping, ordering and a join, and creates an output file.
The issue is when I run the same job on YARN with more memory than the environment has, e.g. the cluster has 50 GB and I spark-submit with close to 60 GB of executor memory and 4 GB of driver memory.
My results get truncated; it seems like one of the data partitions or tasks is lost while processing:
driver-memory 4g --executor-memory 4g --num-executors 12
I also notice this warning message on the driver:
WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
But when I run with limited executors and memory, e.g. 15 GB, it works and I get the exact rows/data, with no warning message:
driver-memory 2g --executor-memory 2g --num-executors 4
Any suggestions? Are we missing some settings on the cluster?
Please note that my job completes successfully in both cases.
I am using Spark version 2.2.
This is meaningless (except maybe for debugging): the plan is larger when there are more executors involved, and the warning just says that it is too big to be converted into a string. If you need it, you can set spark.debug.maxToStringFields to a larger number (as suggested in the warning message).
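For example, a quick sketch of raising the limit when building the session (the value 100 is arbitrary, just larger than the default):
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("my-job")                               # hypothetical app name
         .config("spark.debug.maxToStringFields", "100")  # default is much smaller
         .getOrCreate())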

Spark: executor memory exceeds physical limit

My input dataset is about 150G.
I am setting
--conf spark.cores.max=100
--conf spark.executor.instances=20
--conf spark.executor.memory=8G
--conf spark.executor.cores=5
--conf spark.driver.memory=4G
but since the data is not evenly distributed across executors, I kept getting
Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used
Here are my questions:
1. Did I not set up enough memory in the first place? I think 20 * 8G > 150G, but it's hard to achieve a perfect distribution, so some executors will suffer.
2. I am thinking about repartitioning the input dataFrame, so how can I determine how many partitions to set? The higher the better, or?
3. The error says "9.0 GB of 9 GB physical memory used", but I only set 8G as the executor memory; where does the extra 1G come from?
Thank you!
When using yarn, there is another setting that figures into how big to make the yarn container request for your executors:
spark.yarn.executor.memoryOverhead
It defaults to 0.1 * your executor memory setting. It defines how much extra overhead memory to ask for in addition to what you specify as your executor memory. Try increasing this number first.
Also, a YARN container won't give you memory of an arbitrary size. It will only return containers allocated with a memory size that is a multiple of its minimum allocation size, which is controlled by this setting:
yarn.scheduler.minimum-allocation-mb
Setting that to a smaller number will reduce the risk of you "overshooting" the amount you asked for.
I also typically set the key below to a value larger than my desired container size, to ensure that the Spark request controls how big my executors are, instead of YARN stomping on them. This is the maximum container size YARN will give out.
yarn.nodemanager.resource.memory-mb
The 9 GB is composed of the 8 GB of executor memory which you pass as a parameter, plus spark.yarn.executor.memoryOverhead, which defaults to 10% of the executor memory, so the total memory of the container is spark.executor.memory + (0.1 * spark.executor.memory), which is 8 GB + (0.1 * 8 GB) ≈ 9 GB.
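Written out as a small sketch (the 10% / 384 MB overhead default and the rounding up to yarn.scheduler.minimum-allocation-mb are standard Spark/YARN behaviour; 1024 MB is assumed as the YARN minimum allocation):
import math

executor_memory_mb = 8 * 1024                            # spark.executor.memory = 8G
overhead_mb = max(384, int(0.10 * executor_memory_mb))   # memoryOverhead default
requested_mb = executor_memory_mb + overhead_mb          # ~8.8 GB requested from YARN

# YARN rounds the request up to a multiple of its minimum allocation size.
minimum_allocation_mb = 1024                             # assumed yarn.scheduler.minimum-allocation-mb
container_mb = int(math.ceil(requested_mb / float(minimum_allocation_mb))) * minimum_allocation_mb
print(requested_mb, container_mb)                        # 9011 9216 -> reported as "9 GB"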
You could run the entire process using a single executor, but this would take ages. To understand this you need to know the notions of partitions and tasks. The number of partitions is defined by your input and your actions. For example, if you read a 150 GB CSV from HDFS and your HDFS block size is 128 MB, you will end up with 150 * 1024 / 128 = 1200 partitions, which map directly to 1200 tasks in the Spark UI.
Every single task will be picked up by an executor. You never need to hold the full 150 GB in memory. For example, when you have a single executor, you obviously won't benefit from Spark's parallelism, but it will just start at the first task, process its data, save it back to the DFS, and start working on the next task.
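To tie this back to question 2 above, here is a rough sketch of how you could reason about the partition count (the block size, the target partition size and the input path are assumptions, not details from the question):
input_size_mb = 150 * 1024                            # ~150 GB of input
hdfs_block_mb = 128                                   # typical dfs.blocksize default
initial_partitions = input_size_mb // hdfs_block_mb   # ~1200, as described above

# A common rule of thumb is to keep partitions around 100-200 MB each.
target_partition_mb = 200
target_partitions = input_size_mb // target_partition_mb
print(initial_partitions, target_partitions)

# df = spark.read.csv("hdfs:///path/to/input")        # hypothetical input path
# df = df.repartition(target_partitions)              # spreads the data evenly across partitions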
What you should check:
How big are the input partitions? Is the input file splittable at all? If a single executor has to load a massive amount of data, it will run out of memory for sure.
What kind of actions are you performing? For example, if you do a join on a key with very low cardinality, you end up with massive partitions, because all the rows with a specific value end up in the same partition.
Are you performing any very expensive or inefficient actions? Any Cartesian product, etc.?
Hope this helps. Happy sparking!

Spark shows different number of cores than what is passed to it using spark-submit

TL;DR
Spark UI shows different number of cores and memory than what I'm asking it when using spark-submit
more details:
I'm running Spark 1.6 in standalone mode.
When I run spark-submit I pass it 1 executor instance with 1 core for the executor and also 1 core for the driver.
What I would expect to happen is that my application will be run with 2 cores in total.
When I check the environment tab on the UI I see that it received the correct parameters I gave it, however it still seems like its using a different number of cores. You can see it here:
This is my spark-defaults.conf that I'm using:
spark.executor.memory 5g
spark.executor.cores 1
spark.executor.instances 1
spark.driver.cores 1
Checking the environment tab on the Spark UI shows that these are indeed the received parameters, but the UI still shows something else.
Does anyone have any idea what might cause Spark to use a different number of cores than what I pass it? I obviously tried googling it but didn't find anything useful on the topic.
Thanks in advance
TL;DR
Use spark.cores.max instead to define the total number of cores available, and thus limit the number of executors.
In standalone mode, a greedy strategy is used: as many executors will be created as the cores and memory available on your worker allow.
In your case, you specified 1 core and 5GB of memory per executor.
Spark will therefore calculate the following:
As there are 8 cores available, it will try to create 8 executors.
However, as there is only 30GB of memory available, it can only create 6 executors : each executor will have 5GB of memory, which adds up to 30GB.
Therefore, 6 executors will be created, and a total of 6 cores will be used with 30GB of memory.
Spark basically fulfilled what you asked for. In order to achieve what you want, you can make use of the spark.cores.max option documented here and specify the exact number of cores you need.
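The same calculation, written out as a sketch (8 cores and 30 GB are the worker resources assumed in this answer):
worker_cores, worker_mem_gb = 8, 30      # resources advertised by the standalone worker
executor_cores, executor_mem_gb = 1, 5   # spark.executor.cores / spark.executor.memory

executors = min(worker_cores // executor_cores, worker_mem_gb // executor_mem_gb)
print(executors)                         # 6 -> 6 executors, 6 cores, 30 GB used in total

# To end up with a single 1-core executor, cap the total cores instead:
# spark.cores.max 1   (in spark-defaults.conf, or --conf spark.cores.max=1)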
A few side-notes:
spark.executor.instances is a YARN-only configuration
spark.driver.cores already defaults to 1
I am also working on easing the notion of the number of executors in standalone mode; this might get integrated into a future release of Spark and hopefully help you figure out exactly how many executors you are going to have, without having to calculate it on the go.

How to set executor number by memory in YARN mode?

I did some testing on an r3.8xlarge cluster; each instance has 32 cores and 244 GB of memory.
If I set spark.executor.cores=16 and spark.executor.memory=94G, there are 2 executors per instance, but when I set spark.executor.memory larger than 94G, there will be only one executor per instance.
If I set spark.executor.cores=8 and spark.executor.memory=35G, there are 4 executors per instance, but when I set spark.executor.memory larger than 35G, there will be no more than 3 executors per instance.
So, my question is: how does the executor count follow from the memory setting? What's the formula? I thought Spark simply used 70% of the physical memory to allocate to the executors, but it seems I'm wrong...
In YARN mode you need to set the number of executors with num-executors and the executor memory with executor-memory. Here's an example:
spark-submit --master yarn-cluster --executor-memory 6G --num-executors 31 --executor-cores 32 example.jar Example
Now each executor requests a container from YARN with 6G plus memory overhead, and 1 core.
More info in the Spark documentation.
Regarding the behavior you're seeing, it sounds like the amount of memory available to your YARN NodeManagers is actually less than the 244 GB that is available to the OS. To verify this, take a look at your YARN ResourceManager Web UI and you can see how much memory is available in total across the cluster. This is determined by yarn.nodemanager.resource.memory-mb in yarn-site.xml.
To answer your question about how the number of executors is determined: in YARN, if you're using Spark with spark.dynamicAllocation.enabled set to true, the number of executors is bounded below by spark.dynamicAllocation.minExecutors and above by spark.dynamicAllocation.maxExecutors.
Other than that you're then subjected to YARN's resource allocation which, for most schedulers, will allocate resources to fill up a given queue that your job runs in.
In the situation where you have a totally unutilized cluster with one YARN queue and you submit a job to it, the Spark job will continue to add executors with the given number of cores and amount of memory until the entire cluster is full (or there are not enough cores/memory for an additional executor to be allocated).
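As a minimal sketch of the dynamic-allocation knobs mentioned above, set from PySpark (assuming Spark 2.x+; the min/max values are arbitrary placeholders, and the external shuffle service is required for dynamic allocation on YARN):
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.shuffle.service.enabled", "true")       # needed on YARN
         .config("spark.dynamicAllocation.minExecutors", "2")   # placeholder
         .config("spark.dynamicAllocation.maxExecutors", "20")  # placeholder
         .getOrCreate())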

How to control how many executors to run in yarn-client mode?

I have a Hadoop cluster of 5 nodes where Spark runs in yarn-client mode.
I use --num-executors to set the number of executors. The maximum number of executors I am able to get is 20. Even if I specify more, I get only 20 executors.
Is there any upper limit on the number of executors that can be allocated? Is it a configuration, or is the decision made based on the available resources?
Apparently your 20 running executors consume all the available memory. You can try decreasing the executor memory with the spark.executor.memory parameter, which should leave a bit more room for other executors to spawn.
Also, are you sure that you correctly set the executor count? You can verify your environment settings from the Spark UI by looking at the spark.executor.instances value in the Environment tab.
EDIT: As Mateusz Dymczyk pointed out in the comments, the limited number of executors may be caused not only by overused RAM, but also by CPU cores. In both cases the limit comes from the resource manager.
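As a rough sketch of how such a cap can fall out of the cluster resources (all the node sizes here are assumptions chosen to reproduce a 20-executor limit, not values from the question):
nodes = 5
node_mem_gb = 24          # assumed usable yarn.nodemanager.resource.memory-mb per node
executor_mem_gb = 5       # spark.executor.memory plus overhead, roughly
node_vcores = 16          # assumed yarn.nodemanager.resource.cpu-vcores per node
executor_cores = 4        # spark.executor.cores

limit_by_memory = nodes * (node_mem_gb // executor_mem_gb)   # 5 * 4 = 20
limit_by_cores = nodes * (node_vcores // executor_cores)     # 5 * 4 = 20
print(min(limit_by_memory, limit_by_cores))                  # the resource manager caps you at 20 either way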
