How to find out who is calling futex system call on Linux? - linux

I start several pthreads and every thread keep running function like below:
void test_spin() {
pthread_spin_lock(&spinlock);
pthread_spin_unlock(&spinlock);
}
a simple 'strace -c -f' shows me that there are 3 futex system calls. So my question here is where does these system call come from? And how can I trace who is calling the futex system call?

Related

How to debug a futex contention shown in strace?

I'm debugging an issue in a multi-threaded linux process, where a certain thread appears to not execute for few seconds. Looking at strace output revealed it waits for futex e.g.
1673109 14:36:28.600329 futex(0x44b8d20, FUTEX_WAIT_PRIVATE,
1673109 14:36:33.221850 <... futex resumed> ) = 0 <4.621514>
How can I find out out what this futex(0x44b8d20) refers to in user space, i.e. how to map this to a locking construct which is using futex internally.
I would use a simple systemtap script so that would help you to quickly find out addresses of contended futex locks. When I say address, I am referring to the first argument of futex() syscall.
You can download the simple system tap script that finds the contended user space locks here:
https://sourceware.org/systemtap/examples/process/futexes.stp
If you have systemtap installed on your system,
just start this system tap script: stap futexes.stp
Capture the strace output as you did before.
If you end the system tap script execution by simply doing Ctrl-C,
you will get the output of contended futexes.
Finally in your strace output,
search for futex calls in which the second argument (operation type) is FUTEX_WAIT.
For example : futex(0x7f58a31999d0, FUTEX_WAIT, 4508, NULL) = 0
Then you can search for the first argument in system tap script output.
Something like : ome[4489] lock 0x7f58a31999d0 contended 1 times, 7807 avg us
If you look at this system tap script,
it nicely prints the process name and process/thread id for you.
This makes it easy to find what you are looking for.
However one note is that executing the systemtap script will actually hook a syscall system wide.

linux:use clone's function performance without use clone() or sys_clone, can I do that?

I would like to try execute clone's performance, but without calling clone() in linux(error said library not implemented in that structure).
I try to check sys_clone but found it impossible since the parameters are some registers information;
Also tried to use pthread to mimic clone's performance but since:
pthread_create(&tid, NULL,foo,para)
and in the function foo
if I use
void* foo(void* para){
execvp(...)
}
execvp can't be executed since pthread creates thread and thread can't call that exec.
Anyone can tell me how I can achieve clone without using system call clone or whether I can do that in pthread?
Or Is there any other function in linux that call sys_clone without using the register parameter but the clone_flags?
Thanks!

Can AIO run without creating thread?

I would like aio to signal to my program when a read operation completes, and according to this page, such notification can be received by either a signal sent by the kernel, or by starting a thread running a user function. Either behavior can be selected by setting the right value of sigev_notify.
I gave it a try and soon discover that even when set to receive the notification by signal, another thread was created.
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ffff7ff9700 (LWP 6347) "xnotify" 0x00007ffff7147e50 in gettimeofday () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fc3720 (LWP 6344) "xnotify" 0x0000000000401834 in update (this=0x7fffffffdc00)
The doc also states that: The implementation of these functions can be done using support in the kernel (if available) or using an implementation based on threads at userlevel.
I would like to have no thread at all, is this possible?
I checked on my kernel, and that looks okay:
qdii#localhost /home/qdii $ grep -i aio /usr/src/linux/.config
CONFIG_AIO=y
Is it possible to run aio without any (userland) thread at all (apart from the main one, of course)?
EDIT:
I digged deeper into it. librt seems to provide a collection of aio functions: looking through the glibc sources exposed something fishy: inside /rt/aio_read.c is a function stub :
int aio_read (struct aiocb *aiocbp)
{
__set_errno (ENOSYS);
return -1;
}
stub_warning (aio_read)
I found a first relevant implementation in the subdirectory sysdeps/pthread, which directly called __aio_enqueue_request(..., LIO_READ), which in turn created pthreads. But as I was wondering why there would be a stup in that case, I thought maybe the stub could be implemented by the linux kernel itself, and that pthread implementation would be some sort of fallback code.
Grepping aio_read through my /usr/src/linux directory gives a lot of results, which I’m trying to understand now.
I found out that there are actually two really different aio libraries: one is part of glibc, included in librt, and performs asynchronous access by using pthreads. The other aio library implements the same interface as the first one, but is built upon the linux kernel itself and can use signals to run asynchronously.

when a process is killed is this information recorded anywhere?

Question:
When a process is killed, is this information recorded anywhere (i.e., in kernel), such as syslog (or can be configured to be recorded syslog.conf)
Is the information of the killer's PID, time and date when killed and reason
update - you have all giving me some insight, thank you very much|
If your Linux kernel is compiled with the process accounting (CONFIG_BSD_PROCESS_ACT) option enabled, you can start recording process accounting info using the accton(8) command and use sa(8) to access the recorded info. The recorded information includes the 32 bit exit code which includes the signal number.
(This stuff is not widely known / used these days, but I still remember it from the days of 4.x Bsd on VAXes ...)
Amended:
In short, the OS kernel does not care if the process is killed. That is dependant on whether the process logs anything. All the kernel cares about at this stage is reclaiming memory. But read on, on how to catch it and log it...
As per caf and Stephen C's mention on their comments...
If you are running BSD accounting daemon module in the kernel, everything gets logged. Thanks to Stephen C for pointing this out! I did not realize that functionality as I have this switched off/disabled.
In my hindsight, as per caf's comment - the two signals that cannot be caught are SIGKILL and SIGSTOP, and also the fact that I mentioned atexit, and I described in the code, that should have been exit(0);..ooops Thanks caf!
Original
The best way to catch the kill signal is you need to use a signal handler to handle a few signals , not just SIGKILL on its own will suffice, SIGABRT (abort), SIGQUIT (terminal program quit), SIGSTOP and SIGHUP (hangup). Those signals together is what would catch the command kill on the command line. The signal handler can then log the information stored in /var/log/messages (environment dependant or Linux distribution dependant). For further reference, see here.
Also, see here for an example of how to use a signal handler using the sigaction function.
Also it would be a good idea to adopt the usage of atexit function, then when the code exits at runtime, the runtime will execute the last function before returning back to the command line. Reference for atexit is here.
When the C function exit is used, and executed, the atexit function will execute the function pointer where applied as in the example below. - Thanks caf for this!
An example usage of atexit as shown:
#include <stdlib.h>
int main(int argc, char **argv){
atexit(myexitfunc); /* Beginning, immediately right after declaration(s) */
/* Rest of code */
return 0;
exit(0);
}
int myexitfunc(void){
fprintf(stdout, "Goodbye cruel world...\n");
}
Hope this helps,
Best regards,
Tom.
I don't know of any logging of signals sent to processes, unless the OOM killer is doing it.
If you're writing your own program you can catch the kill signal and write to a logfile before actually dying. This doesn't work with kill -9 though, just the normal kill.
You can see some details over thisaway.
If you use sudo, it will be logged. Other than that, the killed process can log some information (unless it's being terminated with extreme prejudice). You could even hack the kernel to log signals.
As for recording the reason a process was killed, I've yet to see a psychic program.
Kernel hacking is not for the weak of heart, but hella fun. You'd need to patch the signal dispatch routines to log information using printk(9) when kill(3), sigsend(2) or the like is called. Read "The Linux Signals Handling Model" for more information on how signals are handled.
If the process is getting it via kill(2), then unless the process is already logging the only external trace would be a kernel mod. It's pretty simple; just do a printk(), it's like printf(). Find the output in dmesg.
If the process is getting it via /bin/kill, then it would be a relatively easy matter to install a wrapper executable that did logging. But this (signal delivery via /bin/kill) is unlikely because kill is also a bash built-in.
By the way, if a process is killed with a signal is announced by the kernel to the parent process via de wait(2) system call. The value returned by this call is the exit status of the child (the lower byte) and some signal related info in the upper byte in case this process has been killed. See wait(2) for more information.

What are the POSIX cancellation points?

What are the POSIX cancellation points? I'm looking for a definitive list of POSIX cancellation points.
I'm asking because I have a book that says accept() and select() are cancellation points, but I've seen sites on the internet claim that they are not.
Also, if Linux cancellation points are different than POSIX cancellation points I want a list of them too.
The POSIX 1003.1-2003 standard gives a list in the System Interfaces section, then General Information, then Threads (direct link courtesy of A. Rex).
(Added: POSIX 1003.1-2008 is now available on the web (all 3872 pages of it, in PDF and HTML). You have to register (free). I got to it from the Open Group Bookstore.)
Cancellation Points
Cancellation points shall occur when a thread is executing the following functions:
accept()
aio_suspend()
clock_nanosleep()
close()
connect()
creat()
fcntl() (When the cmd argument is F_SETLKW)
fdatasync()
fsync()
getmsg()
getpmsg()
lockf()
mq_receive()
mq_send()
mq_timedreceive()
mq_timedsend()
msgrcv()
msgsnd()
msync()
nanosleep()
open()
pause()
poll()
pread()
pselect()
pthread_cond_timedwait()
pthread_cond_wait()
pthread_join()
pthread_testcancel()
putmsg()
putpmsg()
pwrite()
read()
readv()
recv()
recvfrom()
recvmsg()
select()
sem_timedwait()
sem_wait()
send()
sendmsg()
sendto()
sigpause()
sigsuspend()
sigtimedwait()
sigwait()
sigwaitinfo()
sleep()
system()
tcdrain()
usleep()
wait()
waidid()
waitpid()
write()
writev()
A cancellation point may also occur when a thread is executing the following functions:
access()
asctime()
asctime_r()
catclose()
catgets()
catopen()
closedir()
closelog()
ctermid()
ctime()
ctime_r()
dbm_close()
dbm_delete()
dbm_fetch()
dbm_nextkey()
dbm_open()
dbm_store()
dlclose()
dlopen()
endgrent()
endhostent()
endnetent()
endprotoent()
endpwent()
endservent()
endutxent()
fclose()
fcntl() (For any value of the cmd argument. [Presumably except F_SETLKW which is listed.]
fflush()
fgetc()
fgetpos()
fgets()
fgetwc()
fgetws()
fmtmsg()
fopen()
fpathconf()
fprintf()
fputc()
fputs()
fputwc()
fputws()
fread()
freopen()
fscanf()
fseek()
fseeko()
fsetpos()
fstat()
ftell()
ftello()
ftw()
fwprintf()
fwrite()
fwscanf()
getaddrinfo()
getc()
getc_unlocked()
getchar()
getchar_unlocked()
getcwd()
getdate()
getgrent()
getgrgid()
getgrgid_r()
getgrnam()
getgrnam_r()
gethostbyaddr()
gethostbyname()
gethostent()
gethostid()
gethostname()
getlogin()
getlogin_r()
getnameinfo()
getnetbyaddr()
getnetbyname()
getnetent()
getopt() (if opterr is non-zero.)
getprotobyname()
getprotobynumber()
getprotoent()
getpwent()
getpwnam()
getpwnam_r()
getpwuid()
getpwuid_r()
gets()
getservbyname()
getservbyport()
getservent()
getutxent()
getutxid()
getutxline()
getwc()
getwchar()
getwd()
glob()
iconv_close()
iconv_open()
ioctl()
link()
localtime()
localtime_r()
lseek()
lstat()
mkstemp()
mktime()
nftw()
opendir()
openlog()
pathconf()
pclose()
perror()
popen()
posix_fadvise()
posix_fallocate()
posix_madvise()
posix_openpt()
posix_spawn()
posix_spawnp()
posix_trace_clear()
posix_trace_close()
posix_trace_create()
posix_trace_create_withlog()
posix_trace_eventtypelist_getne
posix_trace_eventtypelist_rewin
posix_trace_flush()
posix_trace_get_attr()
posix_trace_get_filter()
posix_trace_get_status()
posix_trace_getnext_event()
posix_trace_open()
posix_trace_rewind()
posix_trace_set_filter()
posix_trace_shutdown()
posix_trace_timedgetnext_event(
posix_typed_mem_open()
printf()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_wrlock()
putc()
putc_unlocked()
putchar()
putchar_unlocked()
puts()
pututxline()
putwc()
putwchar()
readdir()
readdir_r()
remove()
rename()
rewind()
rewinddir()
scanf()
seekdir()
semop()
setgrent()
sethostent()
setnetent()
setprotoent()
setpwent()
setservent()
setutxent()
stat()
strerror()
strerror_r()
strftime()
symlink()
sync()
syslog()
tmpfile()
tmpnam()
ttyname()
ttyname_r()
tzset()
ungetc()
ungetwc()
unlink()
vfprintf()
vfwprintf()
vprintf()
vwprintf()
wcsftime()
wordexp()
wprintf()
wscanf()
An implementation shall not introduce cancellation points into any other functions specified in this volume of IEEE Std 1003.1-2001.
The side effects of acting upon a cancellation request while suspended during a call of a function are the same as the side effects that may be seen in a single-threaded program when a call to a function is interrupted by a signal and the given function returns [EINTR]. Any such side effects occur before any cancellation cleanup handlers are called.
Whenever a thread has cancelability enabled and a cancellation request has been made with that thread as the target, and the thread then calls any function that is a cancellation point (such as pthread_testcancel() or read()), the cancellation request shall be acted upon before the function returns. If a thread has cancelability enabled and a cancellation request is made with the thread as a target while the thread is suspended at a cancellation point, the thread shall be awakened and the cancellation request shall be acted upon. However, if the thread is suspended at a cancellation point and the event for which it is waiting occurs before the cancellation request is acted upon, it is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution.
Ugh! Can't get the table to work very well it looked OK in preview and nothing like a table afterwards. Look at the URL for the information!
There are a lot of possible cancellation points.
See the pthread_cancel man page for further and fast info.
Additional Info: since kernel 2.6, Linux has used the NPTL thread library which is POSIX compliant, so cancellation points should be as above for recent Linux implmentations.
http://www.ddj.com/linux-open-source/184406204

Resources