Spark driver memory exceeded the storage memory - apache-spark

I cannot understand why, in the Spark UI, the storage memory used by the driver (2.1 GB) exceeds the total available memory (1.5 GB).
When I run the same application with Spark 2.1.1 I don't see this behavior: the Spark driver memory is only a few MB. Also, the application becomes slower and slower with the same data.
My questions:
Is the used storage memory an accumulated total rather than the memory currently in use?
Is this a UI bug?
What are these 2.1 GB of data?

Related

Spark driver pod getting killed with 'OOMKilled' status

We are running a Spark Streaming application on a Kubernetes cluster using Spark 2.4.5.
The application receives a large volume of data through a Kafka topic (one message every 3 ms). It uses 4 executors and 4 Kafka partitions.
While running, the memory of the driver pod keeps increasing until K8s kills it with an 'OOMKilled' status. The executors show no memory issues.
When checking the driver pod resources using this command :
kubectl top pod podName
We can see that the memory increases until it reaches 1.4 GB, at which point the pod is killed.
However, when checking the storage memory of the driver in the Spark UI, we can see that the storage memory is far from fully used (50.3 KB / 434 MB). Is there a difference between the storage memory of the driver and the memory of the pod containing the driver?
Has anyone had experience with a similar issue before?
Any help would be appreciated.
Here are a few more details about the app:
Kubernetes version: 1.18
Spark version: 2.4.5
Batch interval of the Spark streaming context: 5 sec
Rate of input data: 1 Kafka message every 3 ms
Language: Scala
In brief, the Spark JVM heap consists of three parts:
Reserved memory (300 MB).
User memory ((all - 300 MB) * 0.4), used for data processing logic.
Spark memory ((all - 300 MB) * 0.6, where 0.6 is spark.memory.fraction), used for cache and shuffle in Spark.
Besides this, K8s needs extra memory for non-JVM usage: max(executor memory * 0.1, 384 MB), where 0.1 is spark.kubernetes.memoryOverheadFactor.
Increasing the executor memory limit in K8s by the memory overhead should fix the OOM.
You can also decrease spark.memory.fraction to allocate more RAM to user memory.
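The breakdown above can be worked through numerically. The following is a minimal sketch in Python, not Spark's own code: the function names are made up for illustration, and the 1024 MB heap is an assumed value, since the question does not state the driver's heap size. With that assumption, the Spark region comes out at about 434 MB and the pod total at 1408 MB, which is at least consistent with the figures quoted in the question.

```python
RESERVED_MB = 300  # reserved memory, as stated in the answer above


def heap_regions(heap_mb, memory_fraction=0.6):
    """Split a JVM heap (in MB) into reserved, user, and Spark memory."""
    usable = heap_mb - RESERVED_MB
    spark_mb = usable * memory_fraction   # spark.memory.fraction
    user_mb = usable - spark_mb           # the remaining 0.4 by default
    return {"reserved": RESERVED_MB, "user": user_mb, "spark": spark_mb}


def pod_memory_mb(heap_mb, overhead_factor=0.1, min_overhead_mb=384):
    """Heap plus the non-JVM overhead: max(heap * factor, 384 MB)."""
    overhead = max(heap_mb * overhead_factor, min_overhead_mb)
    return heap_mb + overhead


# Assumed heap size of 1024 MB (hypothetical, not given in the question).
regions = heap_regions(1024)
print(regions)
print(pod_memory_mb(1024))
```

The storage figure in the Spark UI only covers part of the "spark" region, while kubectl top reports the whole pod, so the two numbers are not expected to match.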

How to get Mesos Agents Framework Executor Memory

Inside Mesos Web UI I can see memory usage of my Spark executors in a table
Agents -> Framework -> Executors
There is a table listing all executors for my Spark driver and their memory usage is indicated in column Mem (Used / Allocated).
Is there a way to obtain this number directly via a link and if yes how?
For example I can obtain a bunch of Mesos metrics via http://IP/mesos/metrics/snapshot but memory usage of executors is not one of them.
The memory usage of executors is in fact tied to Mesos tasks, that is, how much memory the executors consume for each task.
If that is what you need, you can use the following REST API to get a JSON document and then parse the memory used from it.
http://mesos_ip:5050/master/tasks
FYI.
Found the answer myself. For each worker/agent on which executors may run, direct access to memory info is here:
http://IP_of_worker1:5051/slave(1)/monitor/statistics
http://IP_of_worker2:5051/slave(1)/monitor/statistics
etc
The response is JSON. The framework_id field lets you find the related executors and their memory consumption, CPU usage, and the other values shown in the table.
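As a rough illustration of the parsing step, here is a minimal Python sketch. The sample payload is hypothetical, and the field names executor_id, statistics, mem_rss_bytes and mem_limit_bytes are assumptions about the shape of the response; only framework_id is named in the answer above, so check the names against your own agent's output.

```python
import json

# Hypothetical sample, assumed to mirror the monitor/statistics response.
SAMPLE = """[
  {"executor_id": "0", "framework_id": "fw-A",
   "statistics": {"mem_rss_bytes": 1073741824, "mem_limit_bytes": 2147483648}},
  {"executor_id": "1", "framework_id": "fw-B",
   "statistics": {"mem_rss_bytes": 536870912, "mem_limit_bytes": 1073741824}},
  {"executor_id": "2", "framework_id": "fw-A",
   "statistics": {"mem_rss_bytes": 805306368, "mem_limit_bytes": 2147483648}}
]"""


def executor_memory(payload, framework_id):
    """Return (executor_id, used_bytes, limit_bytes) for one framework."""
    rows = []
    for entry in json.loads(payload):
        if entry.get("framework_id") != framework_id:
            continue  # skip executors that belong to other frameworks
        stats = entry.get("statistics", {})
        rows.append((entry.get("executor_id"),
                     stats.get("mem_rss_bytes"),
                     stats.get("mem_limit_bytes")))
    return rows


for executor_id, used, limit in executor_memory(SAMPLE, "fw-A"):
    print(executor_id, used, limit)
```

In practice the payload would come from the agent URL given above, for example via urllib.request.urlopen(url).read(), rather than from a hard-coded string.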

Spark execution memory monitoring [closed]

What I want is to be able to monitor Spark execution memory as opposed to storage memory available in SparkUI. I mean, execution memory NOT executor memory.
By execution memory I mean:
This region is used for buffering intermediate data when performing shuffles, joins, sorts and aggregations. The size of this region is configured through spark.shuffle.memoryFraction (default 0.2).
According to: Unified Memory Management in Spark 1.6
After an intensive search I found nothing but unanswered Stack Overflow questions, answers that relate only to storage memory, and vague answers along the lines of "use Ganglia" or "use the Cloudera console".
There seems to be demand for this information on Stack Overflow, and yet not a single satisfactory answer is available. Here are some of the top Stack Overflow posts returned when searching for "monitoring spark memory":
Monitor Spark execution and storage memory utilisation
Monitoring the Memory Usage of Spark Jobs
SPARK: How to monitor the memory consumption on Spark cluster?
Spark - monitor actual used executor memory
How can I monitor memory and CPU usage by spark application?
How to get memory and cpu usage by a Spark application?
Questions
Spark version > 2.0
Is it possible to monitor Execution memory of Spark job? By monitoring I mean at minimum see used/available just like for storage memory per executor in Executor tab of SparkUI. Yes or No?
Could I do it with SparkListeners (#JacekLaskowski?) How about the history server? Or is the only way through external tools such as Grafana, Ganglia, or others? If external tools are required, could you please point to a tutorial or provide more detailed guidelines?
I saw SPARK-9103 (Tracking Spark's memory usage), which suggests that it is not yet possible to monitor execution memory. SPARK-23206 (Additional Memory Tuning Metrics) also seems relevant.
Is Peak Execution Memory a reliable estimate of the usage of execution memory in a task? If, for example, the Stage UI says that a task uses 1 GB at peak and I have 5 CPUs per executor, does it mean I need at least 5 GB of execution memory available on each executor to finish a stage?
Are there some other proxies we could use to get a glimpse of execution memory?
Is there a way to know when the execution memory starts to eat into storage memory? When my cached table disappears from Storage tab in SparkUI or only part of it remains, does it mean it was evicted by the execution memory?
Answering my own question for future reference:
We are using Mesos as cluster manager. In the Mesos UI I found a page that lists all executors on a given worker and there one can find a Memory usage of the executor. It seems to be a total memory usage storage+execution. I can clearly see that when the memory fills up the executor dies.
To access:
Go to Agents tab which lists all cluster workers
Choose worker
Choose Framework - the one with the name of your script
Inside you will have a list of executors for your job running on this particular worker.
For memory usage see: Mem (Used / Allocated)
The same can be done for the driver. For the framework, choose the one named Spark Cluster.
If you want to know how to extract this number programmatically, see my response to this question: How to get Mesos Agents Framework Executor Memory
I enabled Spark's internal metrics for the executors, and for my research I can get information about JVMHeapMemory, jvm.heap.usage, OnHeapExecutionMemory, OnHeapStorageMemory and OnHeapUnifiedMemory. Please refer to the doc (https://spark.apache.org/docs/3.0.0-preview/monitoring.html) for more information.
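One way to read such metrics is to parse the JSON that Spark's monitoring REST API returns for executors. The sketch below is in Python and rests on several assumptions: the sample payload is hypothetical, and it assumes each executor entry carries an id and a peakMemoryMetrics object keyed by the metric names listed above. Verify the field names and the endpoint against the monitoring doc for your Spark version.

```python
import json

# Hypothetical sample; the structure (id, peakMemoryMetrics) is an assumption.
SAMPLE = """[
  {"id": "driver",
   "peakMemoryMetrics": {"JVMHeapMemory": 300000000,
                         "OnHeapExecutionMemory": 0,
                         "OnHeapStorageMemory": 52000,
                         "OnHeapUnifiedMemory": 52000}},
  {"id": "1",
   "peakMemoryMetrics": {"JVMHeapMemory": 900000000,
                         "OnHeapExecutionMemory": 400000000,
                         "OnHeapStorageMemory": 1000000,
                         "OnHeapUnifiedMemory": 401000000}},
  {"id": "2"}
]"""


def peak_execution_memory(payload):
    """Map executor id to its peak on-heap execution memory in bytes.

    Executors without peak metrics map to None.
    """
    result = {}
    for executor in json.loads(payload):
        peaks = executor.get("peakMemoryMetrics") or {}
        result[executor["id"]] = peaks.get("OnHeapExecutionMemory")
    return result


print(peak_execution_memory(SAMPLE))
```

These are peak values, so they show how high execution memory went at some point, not how much is in use right now.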

Cassandra outofMemory

We have a 5-node cluster setup for Cassandra 3.0.9. We are seeing OutOfMemory exceptions in Cassandra. It uses the Thrift library API 0.9.2. These OutOfMemory exceptions occur every 2-3 days on random nodes in the cluster.
The max heap size for each Cassandra process is 8 GB and each node has 32 GB of RAM.
We analyzed the heap dump, and it shows that each Thrift thread object is 128 MB and that there are around 55 threads. Together these Thrift objects consume around 7 GB of memory.
We are not sure whether there is any Memory leak into the thrift API.
Any help would be appreciated.

What is and how to control Memory Storage in Executors tab in web UI?

I use Spark 1.5.2 for a Spark Streaming application.
What is the Storage Memory shown in the Executors tab of the web UI? How did it reach 530 MB? How can I change that value?
CAUTION: You use the very, very old and currently unsupported Spark 1.5.2 (which I noticed after I had posted the answer) and my answer is about Spark 1.6+.
The tooltip of Storage Memory may say it all:
Memory used / total available memory for storage of data like RDD partitions cached in memory.
It is part of Unified Memory Management feature that was introduced in SPARK-10000: Consolidate storage and execution memory management that (quoting verbatim):
Memory management in Spark is currently broken down into two disjoint regions: one for execution and one for storage. The sizes of these regions are statically configured and fixed for the duration of the application.
There are several limitations to this approach. It requires user expertise to avoid unnecessary spilling, and there are no sensible defaults that will work for all workloads. As a Spark user, I want Spark to manage the memory more intelligently so I do not need to worry about how to statically partition the execution (shuffle) memory fraction and cache memory fraction. More importantly, applications that do not use caching use only a small fraction of the heap space, resulting in suboptimal performance.
Instead, we should unify these two regions and let one borrow from another if possible.
Spark Properties
You can control the storage memory using the spark.driver.memory or spark.executor.memory Spark properties, which set the entire memory space of a Spark application (the driver and the executors respectively). The split between regions is controlled by spark.memory.fraction and spark.memory.storageFraction.
You should consider reading the slides Memory Management in Apache Spark by Andrew Or, the author of the feature, and watching his video Deep Dive: Apache Spark Memory Management.
You may want to read how the Storage Memory values (in web UI and internally) are calculated in How does web UI calculate Storage Memory (in Executors tab)?
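To see how the properties named above interact, here is a minimal arithmetic sketch in Python. It is an illustration, not Spark's implementation: the function name is made up, the 300 MB reserved amount and the defaults of 0.6 and 0.5 are assumed to match recent Spark 2.x and later releases (Spark 1.6 used a different default for spark.memory.fraction), and the 4096 MB heap is an arbitrary example value.

```python
def unified_memory_split(heap_mb, memory_fraction=0.6,
                         storage_fraction=0.5, reserved_mb=300):
    """Return (unified, storage, execution) sizes in MB.

    unified   : region shared by storage and execution
                (heap minus reserved, times spark.memory.fraction)
    storage   : part of it set aside by spark.memory.storageFraction
    execution : the rest of the unified region
    """
    unified = (heap_mb - reserved_mb) * memory_fraction
    storage = unified * storage_fraction
    return unified, storage, unified - storage


# Example with an assumed spark.executor.memory of 4 GB (4096 MB).
unified, storage, execution = unified_memory_split(4096)
print(round(unified, 1), round(storage, 1), round(execution, 1))
```

Because the two regions can borrow from each other, these numbers describe the configured boundary rather than a hard cap on either side, so the total shown in the web UI may differ from the storage value computed here.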
