I've written some JavaScript code which runs under node.js to back up a large (~20 MB) file to an Azure blob. This works when run from a bash shell, but fails with the following error when run as a cron job (run as root in both contexts):
FATAL ERROR: v8::Context::New() V8 is no longer usable
Presumably this means that it runs out of heap space, but where is the limit set for cron jobs?
(This is on a 64-bit RHEL 6.2 server, with 8GB RAM and 426 GB free disk space. Node.js is version 0.8.1 and Azure is from the file azure-2012-06.tar.gz.)
Thanks
Keith
When you say large, do you really mean ~20 MB is a large file? How long does your cron job run before it hits the OOM error? Also, when you run the cron job, have you checked the memory usage to verify that this is actually the cause?
As for a memory/heap limit in cron jobs: cron just acts as a job-scheduling engine, so node.js still uses its own memory/heap settings; nothing here is specific to the cron scheduler. If you want to change the V8 memory limit, you can use the --max-old-space-size option to raise it to a higher value.
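For example, a crontab entry along these lines would raise the old-generation limit to roughly 2 GB (the schedule, node path and script path here are just placeholders for whatever you actually use):

# hypothetical root crontab entry: nightly backup with a larger V8 old space
0 2 * * * /usr/local/bin/node --max-old-space-size=2048 /opt/backup/azure-backup.js >> /var/log/azure-backup.log 2>&1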
I am interested to see how you run your node.js code as a cron job. There is some possibility that, the way the cron job is written, it consumes a lot of memory while making calls to Azure Storage.
I'm trying to fetch some data from Cloudera's Quick Start Hadoop distribution (a Linux VM for us) into our SAP HANA database using SAP Spark Controller. Every time I trigger the job in HANA it gets stuck, and I see the following warning being logged continuously every 10-15 seconds in Spark Controller's log file unless I kill the job.
WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Although it's logged as a warning, it looks like a problem that prevents the job from executing on Cloudera. From what I've read, it's either an issue with resource management on Cloudera or an issue with blocked ports. In our case we don't have any blocked ports, so it must be the former.
Our Cloudera is running a single node and has 16GB RAM with 4 CPU cores.
Looking at the overall configuration I have a bunch of warnings, but I can't determine if they are relevant to the issue or not.
Here's also how the RAM is distributed on Cloudera
It would be great if you can help me pinpoint the cause for this issue because I've been trying various combinations of things over the past few days without any success.
Thanks,
Dimitar
You're trying to use the Cloudera Quickstart VM for a purpose beyond its capacity. It's really meant for someone to play around with Hadoop and CDH and should not be used for any production-level work.
Your Node Manager only has 5 GB of memory to use for compute resources. In order to do any work, you need to create an Application Master (AM) and a Spark executor, and then have memory in reserve for your executors, which you won't have on a Quickstart VM.
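To give a rough sense of the arithmetic (class and jar names are placeholders, and YARN adds a per-container overhead of at least a few hundred MB on top of each requested heap), about the most you could hope to squeeze into a 5 GB Node Manager is something like:

spark-submit --master yarn --deploy-mode cluster \
  --driver-memory 1g --executor-memory 1g \
  --num-executors 2 --executor-cores 1 \
  --class com.example.MyJob myjob.jar

Even that leaves very little headroom once the AM and container overheads are accounted for, which is why anything beyond toy workloads tends to stall on the Quickstart VM. With SAP Spark Controller the equivalent memory settings would have to go into the controller's own configuration rather than a manual spark-submit.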
I have a daily pipeline running on Spark Standalone 2.1. It's deployed to and runs on AWS EC2 and uses S3 for its persistence layer. For the most part the pipeline runs without a hitch, but occasionally the job hangs on a single worker node during a reduceByKey operation. When I log into the worker, I notice that the CPU (as seen via top) is pegged at 100%. My remedy so far is to reboot the worker node so that Spark re-assigns the task, and the job proceeds fine from there.
I would like to be able to mitigate this issue. I gather that I could prevent CPU pegging by switching to YARN as my cluster manager, but I wonder whether I could configure Spark Standalone to prevent it, perhaps by limiting the number of cores that get assigned to the Spark job? Any suggestions would be greatly appreciated.
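In case it matters, something along these lines is what I had in mind (the master URL, class and numbers are placeholders); as I understand it, --total-executor-cores (equivalently spark.cores.max) only caps how many cores the application claims and doesn't by itself stop a single task from spinning at 100% on one of them:

spark-submit --master spark://my-master:7077 \
  --total-executor-cores 8 --executor-cores 2 \
  --class com.example.DailyPipeline pipeline.jar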
I have a very weird memory issue (which is what a lot of people will most likely say ;-)) with Spark running in standalone mode inside a Docker container. Our setup is as follows: we have a Docker container in which a Spring Boot application runs Spark in standalone mode. This Spring Boot app also contains a few scheduled tasks (managed by Spring). These tasks trigger Spark jobs. The Spark jobs scrape a SQL database, shuffle the data a bit and then write the results to a different SQL table (writing the results doesn't go through Spark). Our current data set is very small (the table contains a few million rows).
The problem is that the Docker host (a CentOS VM) that runs the Docker container crashes after a while because the memory gets exhausted. I have currently limited the Spark memory usage to 512 MB (I have set both executor and driver memory), and in the Spark UI I can see that the largest job only takes about 10 MB of memory. I know that Spark runs best if it has 8 GB of memory or more available. I have tried that as well but the results are the same.
After digging a bit further I noticed that Spark eats up all the buffer/cache memory on the machine. After clearing this manually by forcing Linux to drop caches (echo 2 > /proc/sys/vm/drop_caches, which clears the dentries and inodes), the cache usage drops considerably, but if I don't keep doing this regularly the cache usage slowly climbs again until all memory is used in buffer/cache.
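For now the only stopgap I have is to automate that drop, e.g. with a root cron entry like the one below, though I realise this only hides the symptom rather than fixing whatever is producing the cache pressure:

# stopgap only: drop dentries/inodes every hour
0 * * * * /bin/sync && /bin/echo 2 > /proc/sys/vm/drop_caches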
Does anyone have an idea what I might be doing wrong / what is going on here?
Big thanks in advance for any help!
I'm running Spark using Docker on DC/OS. I submit the Spark jobs with the following memory configuration:
Driver: 2 GB
Executor: 2 GB
Number of executors: 3
The spark-submit works fine, but after about an hour the Docker container (the worker container) crashes due to OOM (exit code 137), even though my Spark logs show that 1 GB+ of memory is still available.
The strange thing is that the same jar that runs in the container runs normally for almost 20+ hours in standalone mode.
Is this normal behaviour for Spark containers, or is there something I'm doing wrong? Or are there any extra configurations I need to use for the Docker container?
Thanks
It looks like I have a similar issue. Have you looked at the cache/buffer memory usage on the OS?
Using the command below you can get some info on the type of memory usage on the OS:
free -h
In my case the buffer/cache kept growing until there was no more memory available in the container. The VM was a CentOS machine running on AWS, and it crashed entirely when this happened.
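To see whether you're hitting the same thing, it's worth watching both the host view and the container view; the cgroup path below assumes cgroup v1 and may differ on your setup:

free -h                                           # on the host: used vs. buff/cache
docker stats --no-stream                          # per-container usage as Docker accounts it
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # inside the container (cgroup v1)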
Is your Spark job calling a REST endpoint? If so, try closing the connections.
Right now I am running multiple instances of a jar (written in Scala) at the same time on a cluster with 24 cores and 64 GB of memory, running Ubuntu 11.04 (GNU/Linux 2.6.38-15-generic x86_64). I am seeing heavy memory usage that grows super-linearly with the number of instances I run. To be more specific, here is what I am doing:
Write the code in Scala and use sbt to package it into a jar.
Log in to the cluster and use screen to open a new screen session.
Open multiple windows in this screen session.
In each window, run java -cp myjar.jar main.scala.MyClass
What I observe is that when I only run 7 instances, about 10 GB of memory is used and everything is fine. When I run 14 instances, memory is quickly eaten up, all 64 GB is occupied, and the machine slows down so dramatically that it is even difficult to log in. Monitoring the machine through htop, I can see that only a few cores are running at a time. Can anyone tell me what is happening to my program and how to fix it so that I can use the computational resources efficiently? Thanks!
To use the computational resources efficiently, you would have to start one jar which starts multiple threads in one JVM. If you start 14 instances of the same jar, you have 14 isolated JVMs running.
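Also note that if you don't pass -Xmx, HotSpot picks a maximum heap relative to physical RAM (roughly a quarter of it on a 64-bit server-class machine), so 14 unconstrained JVMs are allowed to grow far beyond 64 GB between them. If you do keep launching separate instances for now, at least cap each heap explicitly; the 3 GB figure below is only an illustration, not a recommendation:

java -Xms512m -Xmx3g -cp myjar.jar main.scala.MyClass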
Get the ids of all Java processes using jps.
Find the most heavyweight process using jmap (for example by checking each heap with jmap -heap).
Take a heap dump of that process with the same jmap.
Analyze the heap dump with jhat.
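On a JDK 7/8 installation the concrete commands look roughly like this (replace <pid> with the id reported by jps; the dump file name and port are arbitrary):

jps -l                                            # list running Java processes
jmap -heap <pid>                                  # heap summary, to spot the heavy one
jmap -dump:live,format=b,file=heap.hprof <pid>    # write a heap dump
jhat -port 7000 heap.hprof                        # then browse http://localhost:7000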
Alternatively, you could copy the dump locally and explore it with tools like the Eclipse Memory Analyzer Open Source Project.
If, after solving this issue, you totally love these shell-like tools (as I do), go through the complete list of Java troubleshooting tools; it will save you a lot of time, so you can head to the pub earlier instead of staying late debugging memory/CPU issues.