ksh child process not ignoring SIGTERM - linux

My ksh version is ksh93:
=> rpm -qa | grep ksh
ksh-20100621-3.fc13.i686
I have a simple script, which is as below:
# cat test_sigterm.sh
#!/bin/ksh
trap 'echo "removing"' QUIT
while read line
do
sleep 20
done
I am executing the script from Terminal 1:
1. The ksh is started from /bin/ksh as below:
# exec /bin/ksh
2. The script is executed from this ksh:
# ./test_sigterm.sh&
[1] 12136
and sending a SIGTERM from Terminal 2:
# ps -elf | grep ksh
4 S root 12136 30437 0 84 4 - 1345 poll_s 13:09 pts/0 00:00:00 /bin/ksh ./test_sigterm.sh
0 S root 18952 18643 0 80 0 - 1076 pipe_w 13:12 pts/5 00:00:00 grep ksh
4 S root 30437 30329 0 80 0 - 1368 poll_s 10:04 pts/0 00:00:00 /bin/ksh
# kill -15 12136
I can see that my test_sigterm.sh is getting killed on receiving the SIGTERM in either case, whether run in the background (&) or in the foreground.
But the ksh man pages say -
Signals.
The INT and QUIT signals for an invoked command are ignored if the command is followed by & and the monitor option is not active.
Otherwise, signals have the values inherited by the shell from its parent (but see also the trap built-in command below).
Is it a known or default behaviour of ksh to NOT IGNORE SIGTERM, or is it an issue with ksh's SIGTERM handling for child processes?

I believe that this is normal behaviour.
While the man page says that signals are normally inherited by background processes, the action of the TERM signal is determined by whether the shell is interactive or not. (See the '-i' option in the ksh man page under Invocation.)
If you need the script to ignore SIGTERM, then you can add this line to it:
trap '' TERM
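For example, a minimal sketch of the script from the question with the extra line added (the QUIT handler stays as it was):
#!/bin/ksh
trap 'echo "removing"' QUIT   # original handler
trap '' TERM                  # ignore SIGTERM so a plain kill no longer terminates the script
while read line
do
sleep 20
done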

Related

Change process title/name in bash script

There is a field in the process object in node.js: process.title
That field allows you to change the process name displayed by top or the ps command on Linux.
Is there some way to do this for, and from within, a bash script as well?
Changing the name reported for a running process is possible on Linux via the /proc filesystem:
$ ps
PID TTY TIME CMD
106 tty4 00:00:01 bash
719 tty4 00:00:00 ps
$ echo "toto" > /proc/106/comm
$ ps
PID TTY TIME CMD
106 tty4 00:00:01 toto
719 tty4 00:00:00 ps
$
And yes, it's not the prettiest way to do so.
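If you want to do it from inside the script itself, the same trick works on the script's own PID. A minimal sketch, assuming a Linux kernel that exposes a writable /proc/PID/comm (the name is truncated to 15 characters, and only the comm field changes, not the full command line shown by ps -ef):
#!/bin/bash
echo -n "toto" > /proc/$$/comm   # rename the current shell process
ps -p $$ -o pid,comm             # should now report "toto"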

Bash: multiple redirection

Early in a script, I see this:
exec 3>&2
And later:
{ $app $conf_file &>$app_log_file & } 1>&3 2>&1
My understanding of this looks something like this:
Create fd 3
Redirect fd 3 output to stderr
(Upon app execution) redirect stdout to fd 3, then redirect stderr to stdout
Isn't that some kind of circular madness? 3>stderr>stdout>3>etc?
I'm especially concerned as to the intention/implications of this line because I'd like to start running some apps using this script with valgrind. I'd like to see valgrind's output interspersed with the app's log statements, so I'm hoping that the default output of stderr is captured by the confusing line above. However, in some of the crashes that have led me to wanting to use valgrind, I've seen glibc errors output straight to the terminal, rather than captured in the app's log file.
So, the question(s): What does that execution line do, exactly? Does it capture stderr? If so, why do I see glibc output on the command line when an app crashes? If not, how should I change it to accomplish this goal?
You misread the 3>&2 syntax. It means open fd 3 and make it a duplicate of fd 2. See Duplicating File Descriptors.
In the same way 2>&1 does not mean make fd 2 point to the location of fd 1 it means re-open fd 2 as a duplicate of fd 1 (mostly the same net effect but different semantics).
Also remember that all redirections occur as they happen and that there are no "pointers" here. So 2>&1 1>/dev/null does not redirect standard error to /dev/null it leaves standard error attached to wherever standard output had been attached to (probably the terminal).
So the code in question does this:
Open fd 3 as a duplicate of fd 2
Re-open fd 1 as a duplicate of fd 3
Re-open fd 2 as a duplicate of fd 1
Effectively those lines send everything to standard error (or wherever fd 2 was attached when the initial exec line ran). If the redirections had been 2>&1 1>&3 then they would have swapped locations. I wonder if that was the original intention of that line since, as written, it is fairly pointless.
Not to mention that with the redirection inside the brace list the redirections on the outside of the brace list are fairly useless.
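To see that the order of redirections matters, here is a quick sketch (not from the original script) contrasting the two orderings described above:
# "err" still appears on the terminal: fd 2 was duplicated from fd 1 before fd 1 moved
bash -c 'echo out; echo err >&2' 2>&1 1>/dev/null
# nothing appears: fd 1 moves to /dev/null first, then fd 2 duplicates it
bash -c 'echo out; echo err >&2' 1>/dev/null 2>&1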
Ok, well let's see what happens in practice:
peter@tesla:/tmp/test$ bash -c 'exec 3>&2; { sleep 60m &>logfile & } 1>&3 2>&1' > stdout 2>stderr
peter@tesla:/tmp/test$ psg sleep
peter 22147 0.0 0.0 7232 836 pts/14 S 15:51 0:00 sleep 60m
peter@tesla:/tmp/test$ ll /proc/22147/fd
total 0
lr-x------ 1 peter peter 64 Jul 8 15:51 0 -> /dev/null
l-wx------ 1 peter peter 64 Jul 8 15:51 1 -> /tmp/test/logfile
l-wx------ 1 peter peter 64 Jul 8 15:51 2 -> /tmp/test/logfile
l-wx------ 1 peter peter 64 Jul 8 15:51 3 -> /tmp/test/stderr
I'm not sure exactly why the author of your script ended up with that line of code. Presumably it made sense to them when they wrote it. The redirections outside the curly braces happen before the redirections inside, so they're both overridden by the &>logfile. Even errors from bash, like command not found, would end up in the logfile.
You say you see glibc messages on your terminal when the app crashes. I think your app must be using fd 3 after it starts. i.e., it was written to be started from a script that opened fd 3 for it, or else it opens /dev/tty or something.
BTW, psg is a function I define in my .bashrc:
psg(){ ps aux | grep "${@:-$USER}" | grep -v grep; }
recently updated to:
psg(){ local pids=$(pgrep -f "${@:--u$USER}"); [[ $pids ]] && ps u -p $pids; }
psgw(){ local pids=$(pgrep -f "${@:--u$USER}"); [[ $pids ]] && ps uww -p $pids; }
You need a context first, as in @Peter Cordes' example. He provided the context by setting >stdout and 2>stderr first.
I have modified his example a bit.
$ bash -c 'exec 3>&2; { sleep 60m & } 1>&3 2>&1' >stdout 2>stderr
$ ps aux | grep sleep
logan 272163 0.0 0.0 8084 580 pts/2 S 19:22 0:00 sleep 60m
logan 272165 0.0 0.0 8912 712 pts/2 S+ 19:23 0:00 grep --color=auto sleep
$ ll /proc/272163/fd
total 0
dr-x------ 2 logan logan 0 Aug 25 19:23 ./
dr-xr-xr-x 9 logan logan 0 Aug 25 19:23 ../
lr-x------ 1 logan logan 64 Aug 25 19:23 0 -> /dev/null
l-wx------ 1 logan logan 64 Aug 25 19:23 1 -> /tmp/tmp.Vld71a451u/stderr
l-wx------ 1 logan logan 64 Aug 25 19:23 2 -> /tmp/tmp.Vld71a451u/stderr
l-wx------ 1 logan logan 64 Aug 25 19:23 3 -> /tmp/tmp.Vld71a451u/stderr
First, exec 3>&2 makes fd 3 point to the stderr file. Then 1>&3 makes fd 1 point to the stderr file as well. Lastly, 2>&1 makes fd 2 point to the stderr file too. (Don't confuse fd 2, the process's standard error, with the file that happens to be named stderr here; in this example it is just a file name.)
The reason fd0 is set to /dev/null, I'm guessing, is because the command is run in a non-interactive shell.
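That guess can be verified: in a non-interactive shell (no job control), a background command's standard input is redirected from /dev/null unless you redirect it explicitly. A quick sketch:
bash -c 'sleep 60 & ls -l /proc/$!/fd/0; kill $!'
# prints something like: ... 0 -> /dev/null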

how to kill the tty in unix

This is the result of the finger command (run today (Monday), when I (Vidya) logged in):
sekic1083 [6:14am] [/home/vidya] -> finger
Name Tty Idle Login Time Where
Felix pts/0 - Thu 10:06 sekic2594.rnd.ki.sw.
john pts/1 2d Fri 15:43
john *pts/2 2d Fri 15:43
john *pts/3 4 Fri 15:44
john *pts/7 - Thu 16:25
Vidya pts/0 - Mon 06:14
Vidya *pts/5 - Mon 06:14
Vidya *pts/6 - Tue 10:13
Vidya *pts/9 - Wed 05:39
Vidya *pts/10 - Wed 10:23
Under the Tty column, pts/0 and pts/5 are the currently active terminals.
Apart from those two pts/6, pts/9 and pts/10 are also present and I had logged into these last week. But the idle time for them is showing as "-" (not idle).
How can I kill these 6,9 and 10 terminals?
You can run:
ps -ft pts/6 -t pts/9 -t pts/10
This would produce an output similar to:
UID PID PPID C STIME TTY TIME CMD
Vidya 772 2701 0 15:26 pts/6 00:00:00 bash
Vidya 773 2701 0 16:26 pts/9 00:00:00 bash
Vidya 774 2701 0 17:26 pts/10 00:00:00 bash
Grab the PID from the result.
Use the PIDs to kill the processes:
kill <PID1> <PID2> <PID3> ...
For the above example:
kill 772 773 774
If the process doesn't terminate gracefully, as a last resort you can forcefully kill it by sending a SIGKILL:
kill -9 <PID>
I had the same question, but I wanted to kill the GNOME Terminal I was in. I read the manual for who and found that the '-a' option lists all of the sessions logged into your computer, and the '-l' option prints the system login processes.
who -la
You should get a list of the sessions. Then all you have to do is kill the process with the kill command:
kill <PID>
or, to target a terminal directly, for example pts/0:
pkill -9 -t pts/0
Try this:
skill -KILL -v pts/6
skill -KILL -v pts/9
skill -KILL -v pts/10
I had the same problem today.
I had NO remaining processes, but a leftover finger entry for user "xxx",
which prevented me from deleting this user using "userdel xxx".
Error message was: userdel: account `xxx' is currently in use.
It looked like a crashed terminal session. So I rebooted, but the issue remained.
last xxx
xxx pts/5 10.1.2.3 Fri Feb 7 10:25 - crash (01:27)
So I (re)moved the /var/run/utmp file:
mv /var/run/utmp /var/run/utmp.save ; touch /var/run/utmp
This cleared all finger entries. Unfortunately, this way the entries for currently running sessions are cleared as well. If that is an issue for you, you have to reboot after you (re)move the utmp file.
However in my case this was the solution. Afterwards I was able to successfully delete the user, using "userdel xxx".
You do not need to know the pts number; just type:
ps all | grep bash
then:
kill pid1 pid2 pid3 ...
The simplest way is with the pkill command.
In your case:
pkill -9 -t pts/6
pkill -9 -t pts/9
pkill -9 -t pts/10
Regarding tty sessions, the commands below are always useful:
w - shows active terminal sessions
tty - shows your current terminal session (so you won't close it by accident)
last | grep logged - shows currently logged users
Sometimes we want to close all sessions of an idle user (i.e. when connections are lost abruptly).
pkill -u username - kills all sessions of 'username' user.
And sometimes we want to kill all our own sessions except the current one, so I made a script for it. There are some cosmetics and some interactivity (to avoid running the script accidentally).
#!/bin/bash
MYUSER=`whoami`
MYSESSION=`tty | cut -d"/" -f3-`
OTHERSESSIONS=`w $MYUSER | grep "^$MYUSER" | grep -v "$MYSESSION" | cut -d" " -f2`
printf "\e[33mCurrent session\e[0m: $MYUSER[$MYSESSION]\n"
if [[ ! -z $OTHERSESSIONS ]]; then
    printf "\e[33mOther sessions:\e[0m\n"
    w $MYUSER | egrep "LOGIN@|^$MYUSER" | grep -v "$MYSESSION" | column -t
    echo ----------
    read -p "Do you want to force close all your other sessions? [Y]Yes/[N]No: " answer
    answer=`echo $answer | tr A-Z a-z`
    confirm=("y" "yes")
    if [[ "${confirm[@]}" =~ "$answer" ]]; then
        for SESSION in $OTHERSESSIONS
        do
            pkill -9 -t $SESSION
            echo Session $SESSION closed.
        done
    fi
else
    echo "There are no other sessions for the user '$MYUSER'".
fi
You can use the killall command as well.
-o, --older-than
Match only processes that are older (started before) the time specified. The time is specified as a float then a unit. The units are s,m,h,d,w,M,y for seconds, minutes, hours, days, weeks, months and years respectively.
-e, --exact
Require an exact match for very long names.
-r, --regexp
Interpret process name pattern as an extended regular expression.
This worked like a charm. If you want to close the tty of a specific user together with all of their processes, the command above is the easiest; you can use:
killall -u user_name
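The options quoted above can also be combined. A hypothetical example (the user name and process name are placeholders) that kills only vidya's bash sessions older than two days:
killall --older-than 2d -u vidya bash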
In addition to AIXroot's answer, there is also a logout function that can be used to write a utmp logout record. So if you don't have any processes for user xxxx, but userdel says "userdel: account xxxx is currently in use", you can add a logout record manually. Create a file logout.c like this:
#include <stdio.h>
#include <utmp.h>

int main(int argc, char *argv[])
{
    if (argc == 2) {
        /* logout() returns 1 on success and 0 on failure,
           so invert it for the exit status */
        return logout(argv[1]) ? 0 : 1;
    }
    else {
        fprintf(stderr, "Usage: logout device\n");
        return 1;
    }
}
Compile it:
gcc -o logout logout.c -lutil
Then run it, passing as a parameter whatever finger reports in its "On since" line(s):
# finger xxxx
Login: xxxx Name:
Directory: /home/xxxx Shell: /bin/bash
On since Sun Feb 26 11:06 (GMT) on 127.0.0.1:6 (messages off) from 127.0.0.1
On since Fri Feb 24 16:53 (GMT) on pts/6, idle 3 days 17:16, from 127.0.0.1
Last login Mon Feb 10 14:45 (GMT) on pts/11 from somehost.example.com
Mail last read Sun Feb 27 08:44 2014 (GMT)
No Plan.
# userdel xxxx
userdel: account `xxxx' is currently in use.
# ./logout 127.0.0.1:6
# ./logout pts/6
# userdel xxxx
no crontab for xxxx

How is it possible that kill -9 for a process on Linux has no effect?

I'm writing a plugin to highlight text strings automatically as you visit a web site. It's like the highlight search results but automatic and for many words; it could be used for people with allergies to make words really stand out, for example, when they browse a food site.
But I have problem. When I try to close an empty, fresh FF window, it somehow blocks the whole process. When I kill the process, all the windows vanish, but the Firefox process stays alive (parent PID is 1, doesn't listen to any signals, has lots of resources open, still eats CPU, but won't budge).
So two questions:
How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?
Is there anything I can do but a reboot?
[EDIT] This is the offending process:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
digulla 16688 4.3 4.2 784476 345464 pts/14 D Mar28 75:02 /opt/firefox-3.0/firefox-bin
Same with ps -ef | grep firefox
UID PID PPID C STIME TTY TIME CMD
digulla 16688 1 4 Mar28 pts/14 01:15:02 /opt/firefox-3.0/firefox-bin
It's the only process left. As you can see, it's not a zombie, it's running! It doesn't listen to kill -9, no matter if I kill by PID or name! If I try to connect with strace, then the strace also hangs and can't be killed. There is no output, either. My guess is that FF hangs in some kernel routine but which?
[EDIT2] Based on feedback by sigjuice:
ps axopid,comm,wchan
can show you in which kernel routine a process hangs. In my case, the offending plugin was the Beagle Indexer (openSUSE 11.1). After disabling the plugin, FF was a quick and happy fox again.
As noted in comments to the OP, a process status (STAT) of D indicates that the process is in an "uninterruptible sleep" state. In real-world terms, this generally means that it's waiting on I/O and can't/won't do anything - including dying - until that I/O operation completes.
Processes in a D state will normally only be there for a fraction of a second before the operation completes and they return to R/S. In my experience, if a process gets stuck in D, it's most often trying to communicate with an unreachable NFS or other remote filesystem, trying to access a failing hard drive, or making use of some piece of hardware by way of a flaky device driver. In such cases, the only way to recover and allow the process to die is to either get the fs/drive/hardware back up and running so the I/O can complete or to give up and reboot the system. In the specific case of NFS, the mount may also eventually time out and return from the I/O operation (with a failure code), but this is dependent on the mount options and it's very common for NFS mounts to be set to wait forever.
This is distinct from a zombie process, which will have a status of Z.
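To spot processes stuck in D and see which kernel routine they are blocked in, something like this sketch works (wchan is the kernel symbol the task is sleeping in):
ps -eo pid,stat,wchan:30,comm | awk 'NR==1 || $2 ~ /^D/'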
Double-check that the parent-id is really 1. If not, and this is firefox, first try sudo killall -9 firefox-bin. After that, try killing the specific process IDs individually with sudo kill -9 [process-id].
How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?
If a process has gone <defunct> and then becomes a zombie with a parent of 1, you can't kill it manually; only init can. Zombie processes are already dead and gone - they've lost the ability to be killed as they are no longer processes, only a process table entry and its associated exit code, waiting to be collected. You need to kill the parent, and you can't kill init for obvious reasons.
But see here for more general information. A reboot will kill everything, naturally.
Is it possible, that this process is restarted (for example by init) just at the time you kill it?
You can check this easily. If the PID is the same after kill -9 PID then the process wasn't killed, but if it has changed the process has been restarted.
I recently got trapped by the double-fork pitfall and landed on this page before finally finding my answer. The symptoms are identical even if the problem is not the same:
WYKINWYT: What You Kill Is Not What You Thought
The minimal test code is shown below, based on an example for an SNMP daemon.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char* argv[])
{
    /* We omit the -f option (do not fork) to reproduce the problem */
    char *options[] = {"/usr/local/sbin/snmpd", /* "-f", */ "-d", "--master=agentx",
                       "-Dagentx", "--agentXSocket=tcp:localhost:1706", "udp:10161", (char*) NULL};
    pid_t pid = fork();
    if (0 > pid) return -1;
    switch (pid)
    {
    case 0:
    {   /* Child launches the SNMP daemon */
        execv(options[0], options);
        exit(-2);
        break;
    }
    default:
    {
        sleep(10);            /* Simulate "long" activity */
        kill(pid, SIGTERM);   /* kill what should be the child,
                                 i.e. the SNMP daemon, I assume */
        printf("Signal sent to %d\n", pid);
        sleep(10);            /* Simulate "long" operation before closing */
        waitpid(pid, NULL, 0);
        printf("SNMP should be now down\n");
        getchar();            /* Blocking (for observation only) */
        break;
    }
    }
    printf("Bye!\n");
}
During the first phase, the main process (7699) launches the SNMP daemon (7700), but we can see that this one is now defunct (a zombie). Besides it, we can see another process (7702) with the options we specified:
[nils@localhost ~]$ ps -ef | tail
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7699 2832 0 23:11 pts/0 00:00:00 ./main
nils 7700 7699 0 23:11 pts/0 00:00:00 [snmpd] <defunct>
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7727 3706 0 23:11 pts/1 00:00:00 ps -ef
nils 7728 3706 0 23:11 pts/1 00:00:00 tail
After the simulated 10 seconds we try to kill the only process we know about (7700), which we finally reap with waitpid(). But process 7702 is still here:
[nils@localhost ~]$ ps -ef | tail
root 7431 2 0 23:00 ? 00:00:00 [kworker/u256:1]
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7699 2832 0 23:11 pts/0 00:00:00 ./main
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7751 3706 0 23:12 pts/1 00:00:00 ps -ef
nils 7752 3706 0 23:12 pts/1 00:00:00 tail
After giving a character to the getchar() function, our main process terminates, but the SNMP daemon with pid 7702 is still here:
[nils@localhost ~]$ ps -ef | tail
postfix 7399 1511 0 22:58 ? 00:00:00 pickup -l -t unix -u
root 7431 2 0 23:00 ? 00:00:00 [kworker/u256:1]
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7765 3706 0 23:12 pts/1 00:00:00 ps -ef
nils 7766 3706 0 23:12 pts/1 00:00:00 tail
Conclusion
The fact that we ignored the double-fork mechanism made us think that the kill action did not succeed, but in fact we simply killed the wrong process!
By adding the -f option (do not fork), everything goes as expected.
ps -ef | grep firefox
and you can see 3 processes; kill them all.
sudo killall -9 firefox
should work.
EDIT: [PID] changed to firefox
You can also do a pstree and kill the parent. This makes sure that you get the entire offending process tree and not just the leaf.
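A sketch of that approach, using the PID from the question:
pstree -p 16688   # show the offending tree with PIDs; 16688 is the firefox process above
kill 16688        # then signal the parent at the root of that tree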

Redirect STDERR / STDOUT of a process AFTER it's been started, using command line?

In the shell you can do redirection, > <, etc., but how about AFTER a program is started?
Here's how I came to ask this question: a program running in the background of my terminal keeps outputting annoying text. It's an important process, so I have to open another shell to avoid the text. I'd like to be able to >/dev/null it or use some other redirection so I can keep working in the same shell.
Short of closing and reopening your tty (i.e. logging off and back on, which may also terminate some of your background processes in the process) you only have one choice left:
attach to the process in question using gdb, and run:
p dup2(open("/dev/null", 0), 1)
p dup2(open("/dev/null", 0), 2)
detach
quit
e.g.:
$ tail -f /var/log/lastlog &
[1] 5636
$ ls -l /proc/5636/fd
total 0
lrwx------ 1 myuser myuser 64 Feb 27 07:36 0 -> /dev/pts/0
lrwx------ 1 myuser myuser 64 Feb 27 07:36 1 -> /dev/pts/0
lrwx------ 1 myuser myuser 64 Feb 27 07:36 2 -> /dev/pts/0
lr-x------ 1 myuser myuser 64 Feb 27 07:36 3 -> /var/log/lastlog
$ gdb -p 5636
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Attaching to process 5636
Reading symbols from /usr/bin/tail...(no debugging symbols found)...done.
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x7f3c8f5a66e0 (LWP 5636)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
(no debugging symbols found)
0x00007f3c8eec7b50 in nanosleep () from /lib/libc.so.6
(gdb) p dup2(open("/dev/null",0),1)
[Switching to Thread 0x7f3c8f5a66e0 (LWP 5636)]
$1 = 1
(gdb) p dup2(open("/dev/null",0),2)
$2 = 2
(gdb) detach
Detaching from program: /usr/bin/tail, process 5636
(gdb) quit
$ ls -l /proc/5636/fd
total 0
lrwx------ 1 myuser myuser 64 Feb 27 07:36 0 -> /dev/pts/0
lrwx------ 1 myuser myuser 64 Feb 27 07:36 1 -> /dev/null
lrwx------ 1 myuser myuser 64 Feb 27 07:36 2 -> /dev/null
lr-x------ 1 myuser myuser 64 Feb 27 07:36 3 -> /var/log/lastlog
lr-x------ 1 myuser myuser 64 Feb 27 07:36 4 -> /dev/null
lr-x------ 1 myuser myuser 64 Feb 27 07:36 5 -> /dev/null
You may also consider:
using screen; screen provides several virtual TTYs you can switch between without having to open new SSH/telnet/etc. sessions
using nohup; this allows you to close and reopen your session without losing any background processes in the... process.
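For example, a sketch of both approaches (myprog and its log path are placeholders):
nohup myprog > myprog.log 2>&1 &   # survives the terminal closing, output goes to a file
screen -S myjob                    # or run the program inside a screen session, detach with Ctrl-a d
screen -r myjob                    # reattach later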
This will do:
strace -ewrite -p $PID
It's not that clean (shows lines like: write(#,<text you want to see>) ), but works!
You might also dislike the fact that arguments are abbreviated. To control that use the -s parameter that sets the maximum length of strings displayed.
It catches all streams, so you might want to filter that somehow:
strace -ewrite -p $PID 2>&1 | grep "write(1"
shows only descriptor 1 calls. 2>&1 is to redirect STDERR to STDOUT, as strace writes to STDERR by default.
Redirect output from a running process to another terminal, file, or screen:
tty
ls -l /proc/20818/fd
gdb -p 20818
Inside gdb:
p close(1)
p open("/dev/pts/4", 1)
p close(2)
p open("/tmp/myerrlog", 1)
q
Detach a running process from the bash terminal and keep it alive:
[Ctrl+z]
bg %1 && disown %1
[Ctrl+d]
Explanation:
20818 - just an example of running process PID
p - print result of gdb command
close(1) - close standard output
/dev/pts/4 - terminal to write to
close(2) - close error output
/tmp/myerrlog - file to write to
q - quit gdb
bg %1 - run stopped job 1 on background
disown %1 - detach job 1 from terminal
[Ctrl+z] - stop the running process
[Ctrl+d] - exit terminal
riffing off vladr's (and others') excellent research:
create the following two files in the same directory, something in your path, say $HOME/bin:
silence.gdb, containing (from vladr's answer):
p dup2(open("/dev/null",0),1)
p dup2(open("/dev/null",0),2)
detach
quit
and silence, containing:
#!/bin/sh
if [ "$0" -a "$1" ]; then
gdb -p $1 -x $0.gdb
else
echo Must specify PID of process to silence >&2
fi
chmod +x ~/bin/silence # make the script executable
Now, next time you forget to redirect firefox, for example, and your terminal starts getting cluttered with the inevitable "(firefox-bin:5117): Gdk-WARNING **: XID collision, trouble ahead" messages:
ps # look for process xulrunner-stub (in this case we saw the PID in the error above)
silence 5117 # run the script, using PID we found
You could also redirect gdb's output to /dev/null if you don't want to see it.
Not a direct answer to your question, but it's a technique I've been finding useful over the last few days: Run the initial command using 'screen', and then detach.
This is a bash script based on the previous answers which redirects the log file of a running process during execution; it is used as a post-rotate script in the logrotate process.
#!/bin/bash
pid=$(cat /var/run/app/app.pid)
logFile="/var/log/app.log"
reloadLog()
{
if [ "$pid" = "" ]; then
echo "invalid PID"
else
gdb -p $pid >/dev/null 2>&1 <<LOADLOG
set scheduler-locking on
p close(1)
p open("$logFile", 1)
p close(2)
p open("$logFile", 1)
q
LOADLOG
LOG_FILE=$(ls /proc/${pid}/fd -l | fgrep " 1 -> " | awk '{print $11}')
echo "log file set to $LOG_FILE"
fi
}
reloadLog
Updated: for gdb v7.11 and later, set scheduler-locking on (or one of the other options mentioned here) is required, because after attaching, gdb does not stop all running threads and you may not be able to close/open your log file because it is still in use.
Dupx is a simple *nix utility to redirect standard output/input/error of an already running process.
https://www.isi.edu/~yuri/dupx/
You can use reredirect (https://github.com/jerome-pouiller/reredirect/).
Type
reredirect -m FILE PID
and outputs (standard and error) will be written to FILE.
The reredirect README also explains how to restore the original state of the process, how to redirect to another command, and how to redirect only stdout or stderr.
reredirect also provides a script called relink that allows you to redirect to the current terminal:
relink PID
relink PID | grep usefull_content
(reredirect seems to have the same features as Dupx, described in another answer, but it does not depend on gdb).
