How to debug a futex contention shown in strace? - linux

I'm debugging an issue in a multi-threaded linux process, where a certain thread appears to not execute for few seconds. Looking at strace output revealed it waits for futex e.g.
1673109 14:36:28.600329 futex(0x44b8d20, FUTEX_WAIT_PRIVATE,
1673109 14:36:33.221850 <... futex resumed> ) = 0 <4.621514>
How can I find out out what this futex(0x44b8d20) refers to in user space, i.e. how to map this to a locking construct which is using futex internally.

I would use a simple systemtap script so that would help you to quickly find out addresses of contended futex locks. When I say address, I am referring to the first argument of futex() syscall.
You can download the simple system tap script that finds the contended user space locks here:
https://sourceware.org/systemtap/examples/process/futexes.stp
If you have systemtap installed on your system,
just start this system tap script: stap futexes.stp
Capture the strace output as you did before.
If you end the system tap script execution by simply doing Ctrl-C,
you will get the output of contended futexes.
Finally in your strace output,
search for futex calls in which the second argument (operation type) is FUTEX_WAIT.
For example : futex(0x7f58a31999d0, FUTEX_WAIT, 4508, NULL) = 0
Then you can search for the first argument in system tap script output.
Something like : ome[4489] lock 0x7f58a31999d0 contended 1 times, 7807 avg us
If you look at this system tap script,
it nicely prints the process name and process/thread id for you.
This makes it easy to find what you are looking for.
However one note is that executing the systemtap script will actually hook a syscall system wide.

Related

How to find out who is calling futex system call on Linux?

I start several pthreads and every thread keep running function like below:
void test_spin() {
pthread_spin_lock(&spinlock);
pthread_spin_unlock(&spinlock);
}
a simple 'strace -c -f' shows me that there are 3 futex system calls. So my question here is where does these system call come from? And how can I trace who is calling the futex system call?

DIFFERENT TASKS ASSIGNED TO DIFFERENT INSTANCES OF FORK() OF A PROCESS IN C

Can I assign different task to different instances of fork() of a process in C ?
like for example:
program.c has been forked 3 times
int main()
{
pid_t pid;
pid = fork();
pid = fork();
pid = fork();
}
now to every instance of fork() I want to do different thing, Can I do this? with forks ? or any other method if favorable? :)
PS: I am testing Real Time Linux and want to check the performance of the Context Switching through forks through Time Constraint.
You can use posix process..
posix_spawn( &Pid,ProgramPath.c_str(), & FileActions,& SpawnAttr,argv,envp);
Check its documentation here.
You always have to test the result of fork(2) (in particular, to handle error cases), and do different things for 0 result (successful in child process), positive result (successful in parent process), negative result (failure, so use perror). So according to that result you can do different things. Often you end up invoking execve(2) for the child process (when fork gives 0), and you usually setup things (e.g. for IPC thru pipe(7)-s) before calling fork.
So to assign a different task after a fork just execute different code according to result of fork
You should read Advanced Linux Programming. It has several chapters explaining all that (so I won't take the time to explain it here).
You could be interested in pthreads (implemented using clone(2) and futex(7), which you should not use directly unless you are implementing your thread library, which is not reasonable).
Try also to strace(1) several programs (including some shell and some basic commands). It will tell which syscalls(2) they are calling. See also intro(2).

Linux schedule task when another is done

I have a task/process currently running. I would like to schedule another task to start when the first one finished.
How can I do that in linux ?
(I can't stop the first one, and create a script to start one task after the other)
Somewhat meager spec, but something along the line of
watch -n 1 'pgrep task1 || task2'
might do the job.
You want wait.
Either the system call in section 2 of the manual, one of it's varients like waitpid or the shell builtin which is designed explicitly for this purpose.
The shell builtin is a little more natural because both processes are childred of the sell, so you write a script like:
#!/bin/sh
command1 arguments &
wait
command2 args
To use the system calls you will have to write a program that forks, launches the first command in the child then waits before execing the second program.
The manpage for wait (2) says:
wait() and waitpid()
The wait() system call suspends execution of the current process until one of its children terminates. The call wait(&status) is equivalent to:
waitpid(-1, &status, 0);
The waitpid() system call suspends execution of the current process until a child
specified by pid argument has changed state.

Can AIO run without creating thread?

I would like aio to signal to my program when a read operation completes, and according to this page, such notification can be received by either a signal sent by the kernel, or by starting a thread running a user function. Either behavior can be selected by setting the right value of sigev_notify.
I gave it a try and soon discover that even when set to receive the notification by signal, another thread was created.
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ffff7ff9700 (LWP 6347) "xnotify" 0x00007ffff7147e50 in gettimeofday () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fc3720 (LWP 6344) "xnotify" 0x0000000000401834 in update (this=0x7fffffffdc00)
The doc also states that: The implementation of these functions can be done using support in the kernel (if available) or using an implementation based on threads at userlevel.
I would like to have no thread at all, is this possible?
I checked on my kernel, and that looks okay:
qdii#localhost /home/qdii $ grep -i aio /usr/src/linux/.config
CONFIG_AIO=y
Is it possible to run aio without any (userland) thread at all (apart from the main one, of course)?
EDIT:
I digged deeper into it. librt seems to provide a collection of aio functions: looking through the glibc sources exposed something fishy: inside /rt/aio_read.c is a function stub :
int aio_read (struct aiocb *aiocbp)
{
__set_errno (ENOSYS);
return -1;
}
stub_warning (aio_read)
I found a first relevant implementation in the subdirectory sysdeps/pthread, which directly called __aio_enqueue_request(..., LIO_READ), which in turn created pthreads. But as I was wondering why there would be a stup in that case, I thought maybe the stub could be implemented by the linux kernel itself, and that pthread implementation would be some sort of fallback code.
Grepping aio_read through my /usr/src/linux directory gives a lot of results, which I’m trying to understand now.
I found out that there are actually two really different aio libraries: one is part of glibc, included in librt, and performs asynchronous access by using pthreads. The other aio library implements the same interface as the first one, but is built upon the linux kernel itself and can use signals to run asynchronously.

when a process is killed is this information recorded anywhere?

Question:
When a process is killed, is this information recorded anywhere (i.e., in kernel), such as syslog (or can be configured to be recorded syslog.conf)
Is the information of the killer's PID, time and date when killed and reason
update - you have all giving me some insight, thank you very much|
If your Linux kernel is compiled with the process accounting (CONFIG_BSD_PROCESS_ACT) option enabled, you can start recording process accounting info using the accton(8) command and use sa(8) to access the recorded info. The recorded information includes the 32 bit exit code which includes the signal number.
(This stuff is not widely known / used these days, but I still remember it from the days of 4.x Bsd on VAXes ...)
Amended:
In short, the OS kernel does not care if the process is killed. That is dependant on whether the process logs anything. All the kernel cares about at this stage is reclaiming memory. But read on, on how to catch it and log it...
As per caf and Stephen C's mention on their comments...
If you are running BSD accounting daemon module in the kernel, everything gets logged. Thanks to Stephen C for pointing this out! I did not realize that functionality as I have this switched off/disabled.
In my hindsight, as per caf's comment - the two signals that cannot be caught are SIGKILL and SIGSTOP, and also the fact that I mentioned atexit, and I described in the code, that should have been exit(0);..ooops Thanks caf!
Original
The best way to catch the kill signal is you need to use a signal handler to handle a few signals , not just SIGKILL on its own will suffice, SIGABRT (abort), SIGQUIT (terminal program quit), SIGSTOP and SIGHUP (hangup). Those signals together is what would catch the command kill on the command line. The signal handler can then log the information stored in /var/log/messages (environment dependant or Linux distribution dependant). For further reference, see here.
Also, see here for an example of how to use a signal handler using the sigaction function.
Also it would be a good idea to adopt the usage of atexit function, then when the code exits at runtime, the runtime will execute the last function before returning back to the command line. Reference for atexit is here.
When the C function exit is used, and executed, the atexit function will execute the function pointer where applied as in the example below. - Thanks caf for this!
An example usage of atexit as shown:
#include <stdlib.h>
int main(int argc, char **argv){
atexit(myexitfunc); /* Beginning, immediately right after declaration(s) */
/* Rest of code */
return 0;
exit(0);
}
int myexitfunc(void){
fprintf(stdout, "Goodbye cruel world...\n");
}
Hope this helps,
Best regards,
Tom.
I don't know of any logging of signals sent to processes, unless the OOM killer is doing it.
If you're writing your own program you can catch the kill signal and write to a logfile before actually dying. This doesn't work with kill -9 though, just the normal kill.
You can see some details over thisaway.
If you use sudo, it will be logged. Other than that, the killed process can log some information (unless it's being terminated with extreme prejudice). You could even hack the kernel to log signals.
As for recording the reason a process was killed, I've yet to see a psychic program.
Kernel hacking is not for the weak of heart, but hella fun. You'd need to patch the signal dispatch routines to log information using printk(9) when kill(3), sigsend(2) or the like is called. Read "The Linux Signals Handling Model" for more information on how signals are handled.
If the process is getting it via kill(2), then unless the process is already logging the only external trace would be a kernel mod. It's pretty simple; just do a printk(), it's like printf(). Find the output in dmesg.
If the process is getting it via /bin/kill, then it would be a relatively easy matter to install a wrapper executable that did logging. But this (signal delivery via /bin/kill) is unlikely because kill is also a bash built-in.
By the way, if a process is killed with a signal is announced by the kernel to the parent process via de wait(2) system call. The value returned by this call is the exit status of the child (the lower byte) and some signal related info in the upper byte in case this process has been killed. See wait(2) for more information.

Resources