Presto worker process mysteriously killed and restarted sometime - presto

In our presto cluster (0.212) with ~200 nodes (EC2 instances), sometime (like once per day) a few presto worker processes mysteriously restart (usually around the same time when it happens). The EC2 instances are fine and memory % metrics indicate 70% memory was used.
Does presto worker has any kind of suicide and restart logic (like restart if >= M consecutive errors in a row)? Or can presto coordinator restart worker under some situations? What else might kill a few worker process around the same time?
Here is one example of the server log that shows the restart.
2018-11-14T23:16:28.78011 2018-11-14T23:16:28.776Z INFO Thread-63 io.airlift.bootstrap.LifeCycleManager Life cycle stopping...
2018-11-14T23:16:29.17181 ThreadDump 4524
2018-11-14T23:16:29.17182 ForceSafepoint 414
2018-11-14T23:16:29.17182 Deoptimize 66
2018-11-14T23:16:29.17182 CollectForMetadataAllocation 11
2018-11-14T23:16:29.17182 CGC_Operation 272
2018-11-14T23:16:29.17182 G1IncCollectionPause 2900
2018-11-14T23:16:29.17183 EnableBiasedLocking 1
2018-11-14T23:16:29.17183 RevokeBias 6248
2018-11-14T23:16:29.17183 BulkRevokeBias 272
2018-11-14T23:16:29.17183 Exit 1
2018-11-14T23:16:29.17183 931 VM operations coalesced during safepoint
2018-11-14T23:16:29.17184 Maximum sync time 197 ms
2018-11-14T23:16:29.17184 Maximum vm operation time (except for Exit VM operation) 2599 ms
2018-11-14T23:16:29.52968 ./finish: line 37: kill: (3700) - No such process
2018-11-14T23:16:29.52969 ./finish: line 37: kill: (3702) - No such process
2018-11-14T23:16:31.53563 ./finish: line 40: kill: (3704) - No such process
2018-11-14T23:16:31.53564 ./finish: line 40: kill: (3706) - No such process
2018-11-14T23:16:32.25948 2018-11-14T23:16:32.257Z INFO main io.airlift.log.Logging Logging to stderr
2018-11-14T23:16:32.26034 2018-11-14T23:16:32.260Z INFO main Bootstrap Loading configuration
2018-11-14T23:16:32.33800 2018-11-14T23:16:32.337Z INFO main Bootstrap Initializing logging
......
2018-11-14T23:16:35.75427 2018-11-14T23:16:35.754Z INFO main io.airlift.bootstrap.LifeCycleManager Life cycle starting...
2018-11-14T23:16:35.75556 2018-11-14T23:16:35.755Z INFO main io.airlift.bootstrap.LifeCycleManager Life cycle startup complete. System ready.
If relevant, these "./finish: ..." lines in the log are related to the the /etc/service/presto/finish file below.
1 #!/bin/bash
2 set -e
3 exec 2>&1
4 exec 3>>/var/log/runit/runit.log
5
6 STATSD_PREFIX="runit.presto"
7 source /etc/statsd/functions
8
9 function error_handler() {
10 echo "$(date +"%Y-%m-%dT%H:%M:%S.%3NZ") Error occurred in run file at line: $1."
11 echo "$(date +"%Y-%m-%dT%H:%M:%S.%3NZ") Line exited with status: $2"
12 incr "finish.error"
13 }
14 trap 'error_handler $LINENO $?' ERR
15 echo "$(date +"%Y-%m-%dT%H:%M:%S.%3NZ") process=presto status=stopped exitcode=$1 waitcode=$2" >&3
16 # treat non-zero exit codes as a crash
17 # waitcode contains the signal if there's one (ex. 11 - SIGSEGV)
18 if [ $1 -ne 0 ]; then
19 incr "finish.crash"
20 fi
21
22
23 # ensure that we kill the entire process group.
24 # When sv force-restart runs, it will try to TERM the runit processes. If
25 # this doesn't work, it will kill (-9) the process. In case of haproxy,
26 # apache, gunicorn, etc., the master process will be killed (-9). Child processes
27 # (ie apache workers, gunicorn workers) will *not* be killed and will be
28 # around for minutes (if not hours). These child workers will keep
29 # listening on the socket, preventing the new master apache/gunicorn
30 # processes from binding to the socket. The new master process will keep
31 # crashing and be restarted by runit until the old child processes are
32 # gone.
33
34 # determine the process group id. it's the group id of the current (finish) proces.
35 PGID=$(ps -o pgid= $$ | grep -o [0-9]*)
36 # kill all processes, except ourself and the PGID (which is the main process)
37 kill $(pgrep -g $PGID | egrep -v "$PGID|$$" ) || true
38 sleep 2
39 # kill -9 to be sure
40 kill -9 $(pgrep -g $PGID | egrep -v "$PGID|$$" ) || true
41
42 echo "$(date +"%Y-%m-%dT%H:%M:%S.%3NZ") process=presto status=finished" >&3
43 incr "finish.count"
44 timing "finish.duration"

Our continuous pull deploy (salt based) restarts the presto server process under some conditions (dependency or config change). It was undesirable and unintentional and related listen_in sections have been removed.

Related

pm-utils hook-script failed on pkill signal (Beagle-Bone)

I want my application be notified (signaled) if my BBB Board is
going to suspend or resume.
So I added a hook-script in
#! /bin/sh
#/etc/pm/sleep.d/15_myapp
case "$1" in
suspend)
pkill -SIGUSR1 myapp>/dev/null 2>&1
;;
resume)
pkill -SIGALRM myapp >/dev/null 2>&1
;;
*)
;;
esac
exit $?
So far so good, every time I try to suspend the board
ajava#debainBBB:~# pm-suspend
I get immediately the Message on the same Consule with a description of
my pkill signal:
User defined signal 1
the suspend-process is then broken.
So I checked the /var/log/pm-suspend.log
Running hook /usr/lib/pm-utils/sleep.d/000kernel-change suspend suspend:
/usr/lib/pm-utils/sleep.d/000kernel-change suspend suspend: success.
Running hook /usr/lib/pm-utils/sleep.d/00logging suspend suspend:
Linux jetMaster 3.12.19-rt30+ #29 PREEMPT RT Wed Jun 25 15:02:55 CEST 2014 armv7l GNU/Linux
Module Size Used by
rfcomm 35643 0
bluetooth 238755 3 rfcomm
usb_f_acm 7016 2
u_serial 11485 1 usb_f_acm
usb_f_mass_storage 45500 2
libcomposite 42382 12 usb_f_acm,usb_f_mass_storage
musb_dsps 7540 0
at25 4594 0
lm75 4802 0
rtc_ds1307 8243 0
musb_am335x 1680 0
total used free shared buffers cached
Mem: 506180 66696 439484 0 7104 32204
-/+ buffers/cache: 27388 478792
Swap: 0 0 0
/usr/lib/pm-utils/sleep.d/00logging suspend suspend: success.
Running hook /usr/lib/pm-utils/sleep.d/00powersave suspend suspend:
/usr/lib/pm-utils/sleep.d/00powersave suspend suspend: success.
Running hook /etc/pm/sleep.d/15_myapp suspend suspend:
As you can see there is no success after calling my script.
But I already have checked the exit-status of my hook-script its 0 (i .e success)
Have any of you folks any idea what's going on here?
Why am I getting the signal description as out put of pm-suspend?
Thanks.
--- UPDATE --------------------------------------------------
The Problem seems to be the standard behavior of the Signal defined here
signal(7)
Apparently every Signal ending with terminating the process causes
the pm-suspend to fail.
The Problem seems to be the standard behavior of the Signal defined here signal(7) Apparently every Signal ending with terminating the process causes the pm-suspend to fail.

background process log shows big gaps in message timestamps

I started a long running job under nohup in the background over a weekend. When looking at the output after it finished, I noticed that there were large gaps between the timestamps of some log messages. Some gaps were as long as 10 hrs. I had no way of finding out what was going on with my job at that time.
I ran it on a standard Red hat linux server machine at work.
Is this behavior caused by nohup command ? If not what could be possible causes ?
One such long running job was as the script below -
#!/bin/bash
while true
do
echo "`date` `top -n 1 -b | grep progname`"
done
And here one such gap from the log -
Mon May 26 04:29:42 PDT 2014 27685 user 18 0 2883m 2.8g 1732 S 0.0 3.9 29:05.54 progname
Tue May 27 03:20:35 PDT 2014 27685 user 18 0 3371m 3.3g 1732 S 0.0 4.6 34:23.21 progname
Ok.
pid is a variable that is the pid of the process you want to monitor--
Try this for starters (Courtesy of S Chazelas):
export pid=$(ps -ef | grep progname | awk '{$print $2}')
while rss=$(ps -o rss= -p "${pid}")
do
printf '%d %s\n' "$rss" "$(date)"
sleep 60;
done > t.lis
#
echo "Done $(date)" >> t.lis
rss is the resident set size (memory allocated to the process) in pages.
getconf PAGESIZE
will show how many bytes are in a page of memory.

ksh child process not ignoring SIGTERM

My ksh version is ksh93-
=>rpm -qa | grep ksh
ksh-20100621-3.fc13.i686
I have a simple script which is as below - #cat test_sigterm.sh -
#!/bin/ksh
trap 'echo "removing"' QUIT
while read line
do
sleep 20
done
I am Executing the script From Terminal 1 -
1. The ksh is started from /bin/ksh as below :
# exec /bin/ksh
2. The script is executed from this ksh-
# ./test_sigterm.sh&
[1] 12136
and Sending a "SIGTERM" From Terminal 2 -
# ps -elf | grep ksh
4 S root 12136 30437 0 84 4 - 1345 poll_s 13:09 pts/0 00:00:00 /bin/ksh ./test_sigterm.sh
0 S root 18952 18643 0 80 0 - 1076 pipe_w 13:12 pts/5 00:00:00 grep ksh
4 S root 30437 30329 0 80 0 - 1368 poll_s 10:04 pts/0 00:00:00 /bin/ksh
# kill -15 12136
I can see that my test_sigterm.sh is getting killed on receiving the "SIGTERM" in either case, when run in background (&) and foreground.
But the ksh man pages say -
Signals.
The INT and QUIT signals for an invoked command are ignored if the command is followed by & and the monitor option is not active.
Otherwise, signals have the values inherited by the shell from its parent (but see also the trap built-in command below).
Is it a know or default behaviour of ksh to NOT IGNORE SIGTERM? or is an issue with ksh child SIGTERM signal handling?
I believe that this is normal behaviour.
While it says that signals are normally inherited by background processes, the
action of the TERM signal is determined by whether the shell is interactive or not. (See the '-i' option in the ksh man page under Invocation.)
If you need the script to ignore SIGTERM, then you can add this line to it:
trap '' TERM

How to measure CPU usage

I would like to log CPU usage at a frequency of 1 second.
One possible way to do it is via vmstat 1 command.
The problem is that the time between each output is not always exactly one second, especially on a busy server. I would like to be able to output the timestamp along with the CPU usage every second. What would be a simple way to accomplish this, without installing special tools?
There are many ways to do that. Except top another way is to you the "sar" utility. So something like
sar -u 1 10
will give you the cpu utilization for 10 times every 1 second. At the end it will print averages for each one of the sys, user, iowait, idle
Another utility is the "mpstat", that gives you similar things with sar
Use the well-known UNIX tool top that is normally available on Linux systems:
top -b -d 1 > /tmp/top.log
The first line of each output block from top contains a timestamp.
I see no command line option to limit the number of rows that top displays.
Section 5a. SYSTEM Configuration File and 5b. PERSONAL Configuration File of the top man page describes pressing W when running top in interactive mode to create a $HOME/.toprc configuration file.
I did this, then edited my .toprc file and changed all maxtasks values so that they are maxtasks=4. Then top only displays 4 rows of output.
For completeness, the alternative way to do this using pipes is:
top -b -d 1 | awk '/load average/ {n=10} {if (n-- > 0) {print}}' > /tmp/top.log
You might want to try htop and atop. htop is beautifully interactive while atop gathers information and can report CPU usage even for terminated processes.
I found a neat way to get the timestamp information to be displayed along with the output of vmstat.
Sample command:
vmstat -n 1 3 | while read line; do echo "$(date --iso-8601=seconds) $line"; done
Output:
2013-09-13T14:01:31-0700 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
2013-09-13T14:01:31-0700 r b swpd free buff cache si so bi bo in cs us sy id wa
2013-09-13T14:01:31-0700 1 1 4197640 29952 124584 12477708 12 5 449 147 2 0 7 4 82 7
2013-09-13T14:01:32-0700 3 0 4197780 28232 124504 12480324 392 180 15984 180 1792 1301 31 15 38 16
2013-09-13T14:01:33-0700 0 1 4197656 30464 124504 12477492 344 0 2008 0 1892 1929 32 14 43 10
To monitor the disk usage, cpu and load i created a small bash scripts that writes the values to a log file every 10 seconds.
This logfile is processed by logstash kibana and riemann.
# #!/usr/bin/env bash
# Define a timestamp function
LOGPATH="/var/log/systemstatus.log"
timestamp() {
date +"%Y-%m-%dT%T.%N"
}
#server load
while ( sleep 10 ) ; do
echo -n "$(timestamp) linux::systemstatus::load " >> $LOGPATH
cat /proc/loadavg >> $LOGPATH
#cpu usage
echo -n "$(timestamp) linux::systemstatus::cpu " >> $LOGPATH
top -bn 1 | sed -n 3p >> $LOGPAT
#disk usage
echo -n "$(timestamp) linux::systemstatus::storage " >> $LOGPATH
df --total|grep total|sed "s/total//g"| sed 's/^ *//' >> $LOGPATH
done

Getting CPU utilization information

How could I get the CPU utilization with time info of a process in linux? Basically I want to let my application run overnight. At the same time, I would like to monitor the CPU utilization during the period the application is run.
I tried top | grep appName >& log, it does not seem to return me anything in the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature; I would suggest either setting up the number of times manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4230 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours.
install sysstat package and run sar
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
use the top or watch command
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
Alternatively, you can launch your application with the time command, which will print the total amount of CPU time used by your program at the end of its execution. Just type time ./my_app instead of just ./my_app
For more info, man 1 time

Resources