I've noticed that RAM stays occupied even after stopping/quitting the spark-shell instance. How can I free that memory?
Even after you quit your spark-shell instance, the Spark application may still be running. To release the RAM, try killing the application instance, either through the web UI endpoint or directly from your shell.
You can find the application process with the following:
ps -aef | grep spark-shell
Once you find the process, you can kill it with:
kill -9 <pid>
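If pgrep/pkill are available on your system, the two steps can be collapsed; check what the pattern matches first so you don't kill the wrong process:
pgrep -f spark-shell
pkill -9 -f spark-shell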
Related
I am new to EMR and I am running an EMR cluster with 1 master (32 GB) and 5 core nodes (16 GB each). I launch 11 apps. The apps have to be kept separate in case one of them fails (all of them are streaming apps). I should also mention that I have Elasticsearch running on the cluster.
After some time the master node runs out of memory and stops responding, and some apps start to fail. In the process overview I found many smaller Hadoop processes that each occupy 1-1.3 GB of RAM. I guess these are the driver processes of each app. I tried to reduce the driver memory via "spark.driver.memory" to 512 MB, but it is still at 1.3 GB after relaunching the apps. Is this because of YARN?
Elasticsearch alone allocates about 6.5 GB of RAM on the master node.
I had to specify the driver memory in the spark-submit command like this:
spark-submit --driver-memory 500M
because setting it inside the Python file is too late when you run the driver in client mode: the driver JVM allocates its memory before your code runs.
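For reference, a fuller client-mode submission might look like the sketch below; the application file name is a placeholder and the YARN master is an assumption, so adjust both to your setup:
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 512m \
  --executor-memory 1g \
  my_streaming_app.py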
I have a micro test cluster: 3 nodes with 1 core / 1 GB each, using the standalone cluster manager.
When I launch a second spark-shell while the first is running, the first one gets killed abruptly.
I would like to understand the underlying mechanism.
The standalone cluster manager is supposed to take a preemptive FIFO approach to resource acquisition.
To my understanding, the second spark-shell should be the one kicked out when resources are insufficient. Why is the existing spark-shell process the one that gets aborted?
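For context, on a standalone master each application's resource request can be capped explicitly so that several shells fit on a small cluster; this is only a sketch, and the master URL is a placeholder:
spark-shell --master spark://<master-host>:7077 --total-executor-cores 1 --executor-memory 512m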
I'm running Spark using Docker on DC/OS. When I submit Spark jobs, I use the following memory configuration:
Driver: 2 GB
Executor: 2 GB
Number of executors: 3
The spark-submit works fine, but after about an hour the Docker container (the worker container) crashes with an OOM (exit code 137), even though my Spark logs show that 1 GB+ of memory is still available.
The strange thing is that the same jar that runs in the container runs normally for 20+ hours in standalone mode.
Is this normal behaviour for Spark containers, or am I doing something wrong? Are there any extra configurations I need for the Docker container?
Thanks
It looks like I have a similar issue. Have you looked at the cache/buffer memory usage on the OS?
Using the command below you can get some info on the type of memory usage on the OS:
free -h
In my case the buffer/cache kept growing until there was no more memory available in the container. The VM was a CentOS machine running on AWS, and it crashed entirely when this happened.
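To keep an eye on this over time, something like the following can help (the ps flags below are the common GNU/Linux ones):
watch -n 10 free -h
ps aux --sort=-rss | head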
Is your Spark job calling a REST endpoint? If yes, try closing the connections.
I'm trying to set up a Spark cluster and I've come across an annoying bug...
When I submit a Spark application it runs fine on the workers until I kill one (for example by using stop-slave.sh on the worker node).
When the worker is killed, Spark tries to relaunch an executor on an available worker node, but it fails every time (I know because the web UI displays either FAILED or LAUNCHING for the executor; it never succeeds).
I can't seem to find any help, even in the documentation, so can someone confirm that Spark can and will try to relaunch an executor on an available node if a worker is killed (either on the same node where the worker previously ran, or on another available node if that node is unreachable)?
Here's the output from the worker node :
Spark worker error
Thank you for your help!
Right now I am running multiple instances of a jar (code written in Scala) at the same time on a cluster with 24 cores and 64 GB of memory, running Ubuntu 11.04 (GNU/Linux 2.6.38-15-generic x86_64). I observe heavy memory usage that grows super-linearly with the number of instances I run. To be more specific, here is what I am doing:
Write the code in Scala and use sbt to package it into a jar.
Log in to the cluster and use screen to open a new screen session.
Open multiple windows in this screen session.
In each window, run java -cp myjar.jar main.scala.MyClass
What I observe is that when I run only 7 instances, about 10 GB of memory is used and everything is fine. When I run 14 instances, memory is quickly eaten up, all 64 GB gets occupied, and the machine slows down so dramatically that it is even difficult to log in. Monitoring the machine with htop, I can see that only a few cores are active at a time. Can anyone tell me what is happening to my program and how to fix it so that I can use the computational resources efficiently? Thanks!
To use the computational resources efficiently, you would have to start one jar that runs multiple threads inside a single JVM. If you start 14 instances of the same jar, you have 14 isolated JVMs running.
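If you do keep separate JVMs, note that without an explicit heap setting each HotSpot JVM's heap can by default grow to roughly a quarter of physical memory, so capping the heap per instance is worth trying; the 4g below is just an illustrative value:
java -Xmx4g -cp myjar.jar main.scala.MyClass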
Get the ids of all Java processes using jps.
Find the most heavyweight process using jmap.
Get a heap dump of that process with the same jmap.
Analyze the heap usage with jhat (concrete commands are sketched below).
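In concrete terms the sequence might look like this; <pid> is a placeholder, the exact jmap flags vary by JDK version, and jhat was removed after JDK 8:
jps -l
jmap -heap <pid>
jmap -dump:live,format=b,file=heap.hprof <pid>
jhat heap.hprof
jhat then serves the dump on http://localhost:7000.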
Alternatively, you could copy the dump locally and explore it with a tool like the Eclipse Memory Analyzer (an open source project).
If, after solving this issue, you end up loving these shell-like tools (as I do), go through the complete list of Java troubleshooting tools - it will save you a lot of time, so you can head to the pub earlier instead of staying late debugging memory/CPU issues.