I am running a Node program that performs a long-running data migration job. After about an hour of processing, the Node process is terminated by the ABRT daemon and a core dump is created.
Looking into the reason, I see this:
node process was killed by signal 6 (SIGABRT)
Any ideas why the Node process is killed and how to deal with it?
It turned out to be a memory leak in the strong-oracle module I am using. I increased the Node.js process memory limit to 4 GB and it is working fine now.
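For context, a common way to raise the Node.js heap limit to roughly 4 GB is V8's --max-old-space-size flag; a minimal sketch (migrate.js is only a placeholder for the real entry point):
# Allow the V8 old-generation heap to grow to about 4 GB
node --max-old-space-size=4096 migrate.js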
I am trying to run a Next.js server on a DigitalOcean virtual machine. The server works, but when I run npm run start, the logs say Killed after ~1 minute.
Here is an example log of what happens:
joey@mydroplet:~/Server$ sudo node server
info - SWC minify release candidate enabled. https://nextjs.link/swcmin
event - compiled client and server successfully in 3.3s (196 modules)
wait - compiling...
event - compiled client and server successfully in 410 ms (196 modules)
> Ready on https://localhost:443
> Ready on http://localhost:8080
wait - compiling / (client and server)...
event - compiled client and server successfully in 1173 ms (261 modules)
Killed
joey@mydroplet:~/Server$
After some research, I came across a couple of threads describing a server lacking enough memory/resources to continue the operation. I upgraded the memory from 512 MB to 1 GB, but this still happens.
Do I need to further upgrade the memory?
This is the plan that I am on:
It was the memory. Upgrading the memory of the server from 1 GB to 2 GB solved this problem.
This is the plan that worked for me:
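Independently of the plan, it is worth confirming that the kernel's OOM killer is what is terminating the process before upgrading again; a quick check, assuming standard Linux tooling on the droplet:
# Look for OOM-killer activity around the time the process was killed
dmesg | grep -iE 'killed process|out of memory'
# See how much memory and swap are actually available
free -h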
On my device I have enabled a software watchdog to monitor a file which is updated every 5 seconds by an application. I have configured the software watchdog as below:
file = /data/file_name_to_watch
change = 10
The watchdog is started at bootup using the command below:
/usr/sbin/watchdog.sh -f -v -c watchdog.conf
The application responsible for updating the file (file_name_to_watch) is started after the watchdog daemon during bootup, and it updates the monitored file every 5 seconds.
The problem is that the watchdog reboots the system when it is started at bootup; the same problem does not occur when the watchdog is not started at bootup but started manually after the application is launched.
dmesg shows "Watchdog did not stop"
Also, changing the watchdog configuration to the following didn't help:
file = /data/file_name_to_watch
change = 20
I have checked that the file is updated within 10 seconds of the watchdog being launched during bootup.
Any pointers for debugging this problem would be appreciated.
The watchdog code I am using: https://layers.openembedded.org/layerindex/recipe/122/
I debugged this and found the problem to be time(NULL) returning a huge number in src/file_stat.c.
This was happening because the date had not yet been set that early during bootup.
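One way to avoid this, assuming the board has a usable hardware RTC, is to make sure the system clock is set before the watchdog daemon is launched in the boot sequence; a minimal boot-script sketch (the ordering is the point, the watchdog command is the one from the question):
#!/bin/sh
# Set the system clock from the hardware RTC before the watchdog starts,
# so time(NULL) is sane when file timestamps are compared
hwclock --hctosys
/usr/sbin/watchdog.sh -f -v -c watchdog.conf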
I am creating a shell script which will be executed from Jenkins, because we have many streaming jobs and it seems easier to manage them from Jenkins. So I have created the script below.
#!/bin/bash
# Launch spark-submit in the background and remember its PID
spark-submit "spark parameters here" > /dev/null 2>&1 &
processId=$!
echo $processId
# Give spark-submit enough time to submit the application, then stop it
sleep 5m
kill $processId
If I don't have the sleep, the spark-submit process is killed immediately and no Spark application is submitted. With the sleep, the spark-submit process gets enough time to submit the Spark application.
My question is: is there a better way to know that the Spark application is in the RUNNING state, so that the spark-submit process can then be killed?
Spark 1.6.0 with YARN
You should spark-submit your Spark application and use yarn application -status <ApplicationId>, as described in the application section of the YARN commands documentation:
Prints the status of the application.
You could get <ApplicationId> from the logs of spark-submit (in client deploy mode) or use yarn application -list -appType SPARK -appStates RUNNING.
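For example, a small polling loop along these lines could replace the fixed sleep; this is only a sketch, and the application-id extraction as well as the exact text of the yarn application -status report may need adjusting to your setup:
#!/bin/bash
# Submit in the background and keep the driver output in a log file
spark-submit "spark parameters here" > submit.log 2>&1 &
submitPid=$!

# The YARN application id shows up in the spark-submit log
appId=""
while [ -z "$appId" ]; do
  sleep 5
  appId=$(grep -o 'application_[0-9_]*' submit.log | head -n 1)
done

# Wait until YARN reports the application as RUNNING, then stop babysitting
until yarn application -status "$appId" 2>/dev/null | grep -q 'State : RUNNING'; do
  sleep 5
done
kill "$submitPid"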
I don't know what Spark version you are using or if you are running in standalone mode, but anyway, you can use the REST API for submitting/killing your apps. The last time I checked it was pretty much undocumented, but it worked properly.
When you submit an application, you will get a submissionId which you can use later for either getting the current state or killing it. The possible states are documented here:
// SUBMITTED: Submitted but not yet scheduled on a worker
// RUNNING: Has been allocated to a worker to run
// FINISHED: Previously ran and exited cleanly
// RELAUNCHING: Exited non-zero or due to worker failure, but has not yet started running again
// UNKNOWN: The state of the driver is temporarily not known due to master failure recovery
// KILLED: A user manually killed this driver
// FAILED: The driver exited non-zero and was not supervised
// ERROR: Unable to run or restart due to an unrecoverable error (e.g. missing jar file)
This is especially useful for long-running apps (e.g. streaming), since you don't have to babysit the shell script.
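As a rough illustration, assuming a standalone master with the REST server enabled on its default port 6066 (the host name and submission id below are placeholders):
# Ask the master for the current state of a submitted driver
curl http://spark-master:6066/v1/submissions/status/driver-20160101000000-0000
# Kill the driver if needed
curl -X POST http://spark-master:6066/v1/submissions/kill/driver-20160101000000-0000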
I have a BitBake build process that runs in a Docker container (CentOS 7). The build fails during recipe gcc-cross-i586-5.2.0-r0, task do_compile, on every run I try.
An example of bitbake's output:
NOTE: recipe gcc-cross-i586-5.2.0-r0: task do_compile: Started
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
NOTE: Tasks Summary: Attempted 1538 tasks of which 17 didn't need to be rerun and all succeeded.
Is this a problem with recipe gcc-cross-i586-5.2.0-r0: task do_compile? Perhaps an out-of-memory error? I don't know what the -9 refers to or how to find out more information about it.
Try:
$ bitbake -c cleansstate gcc-cross ; bitbake -k gcc-cross
How much RAM do you have?
Please post the error log here.
This worked for me:
Edit conf/local.conf (under the build directory) and decrease the number of BitBake worker threads by adding the following:
BB_NUMBER_THREADS = "6"
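If memory pressure is the cause, it may also help to cap make-level parallelism alongside the BitBake threads; an illustrative conf/local.conf sketch (the values are arbitrary and should be tuned to the build host's RAM and cores):
BB_NUMBER_THREADS = "4"
PARALLEL_MAKE = "-j 4"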
Just a long shot: -9 in kernel land means EBADF (bad file number). Is it possible you have done some operations as root and some files are not accessible during the build? Is the issue reproducible, i.e. can you rm -rf tmp and does it happen again? Make sure you don't have any permission issues in your project directory and associated file system(s).
I am stuck on a problem where my C++ server program does not produce a core dump when it terminates abnormally. The program runs in daemon mode with chdir to '/'.
I have done the following things:
ulimit -c unlimited, so core dumps are enabled.
echo "/tmp/coredump/core.%e.%p.%t" > /proc/sys/kernel/core_pattern, and chmod a+w on the /tmp/coredump directory, so there is permission to write the core file.
and I have tried the following:
sending SIGABRT via kill -6: it does produce a core dump.
in dmesg, I cannot find any info about the abnormally terminated process.
running the program not in daemon mode.
My OS version: CentOS release 6.4 (Final), x86_64
P.S. The server program installs a signal handler (sigaction() with the SA_RESETHAND flag) to catch the signals {SIGHUP, SIGINT, SIGQUIT, SIGTERM} for normal termination (to free resources), so signal shielding can be ruled out.
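For what it's worth, a quick sanity check is whether the daemonized process actually inherited the core-size limit and which core pattern the kernel will use; a sketch, assuming the server binary is called myserver (a placeholder name):
# A process daemonized at boot may not inherit an interactive shell's ulimit -c,
# so check the daemon's effective limits directly
cat /proc/$(pidof myserver)/limits | grep -i core
# Check the core pattern the kernel will apply
cat /proc/sys/kernel/core_pattern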