C++: pthread status monitoring on Linux

I have an application running 8 independent threads, created through a wrapper class around pthreads. All threads run in an infinite while loop with a cycle time of 1 second each. From the main thread (the main function, also running in an infinite while loop), I want to monitor whether each thread has become blocked for some reason. Is there any way of doing this through system calls for monitoring thread status?

GDB is the best option. Attach gdb to the running process:
gdb -p <pid>
info threads
This will display all the threads in the application and the status of each thread.

You can access process status information in the proc filesystem. Using the thread IDs (TIDs), you can look up each thread's status in /proc/[PID]/task/[TID]/status.
The contents of the status file look like:
ubuntu@ip-172-30-1-159:/proc/1151$ cat status
Name: systemd-logind
State: S (sleeping)
Tgid: 1151
Ngid: 0
Pid: 1151
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
...
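For monitoring from inside the program itself, the same information can be polled programmatically: each thread's entry lives under /proc/[PID]/task/[TID]/ (TIDs can be enumerated by listing that directory, or obtained inside the thread via syscall(SYS_gettid)). A minimal C++ sketch of reading one thread's state, assuming PID and TID are already known (error handling trimmed):

#include <fstream>
#include <string>
#include <sys/types.h>

// Return the State: field of one thread, e.g. "S (sleeping)",
// "R (running)", or "D (disk sleep)" for a thread blocked in the kernel.
std::string thread_state(pid_t pid, pid_t tid) {
    std::ifstream status("/proc/" + std::to_string(pid) +
                         "/task/" + std::to_string(tid) + "/status");
    std::string line;
    while (std::getline(status, line))
        if (line.compare(0, 6, "State:") == 0)
            return line.substr(6);  // keeps the leading tab
    return "?";                     // thread exited or file unreadable
}

A watchdog in the main loop could call this once per cycle and flag any thread that stays in D (uninterruptible sleep) longer than expected.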

Related

Unfair Linux thread scheduling in a single process

I have a process with two threads.
The first thread does async work - it waits for IO on descriptors and timer events in epoll_wait.
The second thread does a lot of IO/memory work - it reads data from disk, processes it in memory, allocates a lot of new memory, writes it back to disk, and so on.
The problem is that the first thread blocks in epoll_wait for much longer than the timeout it requested (e.g. the timeout was specified as 1500 ms, but the return from epoll_wait actually occurs after 10 seconds).
I can reliably reproduce this behavior in a virtual machine (VirtualBox with Ubuntu 16.04).
Example of behavior from GDB:
Thread 2.1 "se.real" hit Breakpoint 1, boost::asio::detail::epoll_reactor::run (this=0x826ebe0, block=true, ops=...) at /opt/com/include/boost/158/boost/asio/detail/impl/epoll_reactor.ipp:392
392 in /opt/com/include/boost/158/boost/asio/detail/impl/epoll_reactor.ipp
16:36:38.986826839
$17 = 1945
Thread 2.1 "se.real" hit Catchpoint 3 (call to syscall epoll_wait), 0xf7fd8be9 in __kernel_vsyscall ()
16:36:38.992081396
<INSIDE KERNEL>
Thread 2.1 "se.real" hit Catchpoint 3 (returned from syscall epoll_wait), 0xf7fd8be9 in __kernel_vsyscall ()
16:36:54.681444938
Breakpoint 1 is set on the instruction just before the call to epoll_wait; the printed argument is the timeout value (1945 ms).
The printed time comes from the shell command date +"%T.%N".
Catchpoint 3 is a syscall catchpoint for the epoll_wait syscall (first hit on entry, second on return).
We can easily see that we spent ~16 seconds in the kernel when 1945 ms were requested.
I have gathered a perf record with -e 'sched:*' events from another reproduction, and I can clearly see:
se.real 4277 [001] 113049.144027: sched:sched_switch: prev_comm=se.real prev_pid=4277 prev_prio=120 prev_state=t|K ==> next_comm=strace next_pid=4142 next_prio=120
se.real 4277 [001] 113056.407952: sched:sched_stat_runtime: comm=se.real pid=4277 runtime=153767 [ns] vruntime=409222246640 [ns]
There is no other sched event for thread 4277 (the first thread, with async IO and epoll_wait) for ~7 seconds. In the meantime there is a lot of sched activity between these two events. That activity includes the second thread (the one doing a lot of IO/memory work), swapper/kswapd, and other userspace processes.
The question is: what can I do to give the first thread a chance to run?
Update: changing the scheduling policy to SCHED_FIFO for the process doesn't solve the problem - I can still reliably reproduce the issue.
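For reference, one variant worth ruling out is raising the priority of only the epoll thread instead of the whole process; pthread_setschedparam does this per thread. A minimal sketch (the priority value is an arbitrary example, root or CAP_SYS_NICE is required, and whether this changes the VirtualBox reproduction is untested):

#include <pthread.h>
#include <sched.h>
#include <cstdio>

// Promote the calling thread (e.g. the epoll thread) to SCHED_FIFO,
// leaving the IO/memory-heavy thread at the default SCHED_OTHER.
void promote_current_thread() {
    sched_param sp{};
    sp.sched_priority = 10;  // example value; SCHED_FIFO allows 1..99
    int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setschedparam failed: %d\n", rc);
}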

Mule HTTP connector active threads remain in the thread pool

I use mule 2.2.1 with the following http incoming receiver configuration.
<http:connector name="abc.connector.http">
    <receiver-threading-profile maxThreadsActive="500"
        maxThreadsIdle="50" threadTTL="60000"
        poolExhaustedAction="WAIT" maxBufferSize="100" />
</http:connector>
On the production server, the JVM frequently crashes. The JVM dump created as "hs_err_pid.log" contains threads like: 0x07990c00 JavaThread "ActiveMQ Session Task" [_thread_blocked, id=69807, stack(0x08770000,0x087b0000)].
There are around 2100 to 2300 threads in this crash every time.
My Question is:
Why does it show _thread_blocked?
When there is no load on the server, the thread count does not drop below 2000. Why is that? I use jstack -l PID to check the number of running threads and prstat | grep PID to monitor the NLWP on Solaris. It gives results like:
17725 application_pprd 3409M 2593M sleep 59 0 0:10:51 0.1% java/2375
How can I remove these unused/inactive threads from the pool to avoid the crash?
How can I increase the NLWP limit for the Java process?

Not all threads finishing, maybe due to not locking STDOUT

I am experimenting with threads in Perl. The following code basically creates n threads and assigns the same function to them (which they should execute in parallel).
Twist: the function just prints something, which means they can't really do it in parallel. I am honestly fine with that since I am just starting out with threads; however, not all threads seem to finish. I suppose it is because I haven't locked STDOUT and conflicts occur, but that may not be the reason. In any case, a different number of threads fails to finish each time.
If I am correct, how can I lock STDOUT (I get an error when I try to use the lock function)?
If I am wrong, why are not all threads finishing, and how can I fix that?
The code:
use strict;
use threads ('yield',
             'stack_size' => 64*4096,
             'exit' => 'threads_only',
             'stringify');
use threads::shared;

sub PrintTestMessage()
{
    print "Hello world\n";
}
my @t;
push @t, threads->new(\&PrintTestMessage) for 1..10;
I get "Hello world" printed 10 times; however, after the program finishes, the output differs from run to run:
Perl exited with active threads:
1 running and unjoined
9 finished and unjoined
0 running and detached
Perl exited with active threads:
8 running and unjoined
2 finished and unjoined
0 running and detached
Perl exited with active threads:
5 running and unjoined
5 finished and unjoined
0 running and detached
Why haven't all threads finished? (The "unjoined" is expected, because I never join them in the code.)
You have to join the threads; otherwise the main thread can (as in your example) finish before its child threads:
$_->join for @t;
From perldoc threads:
$thr->join()
This will wait for the corresponding thread to complete its execution. When the thread finishes, ->join() will return the return value(s) of the entry point function.

Process / thread scheduling on Linux: X server not running on other cpu cores?

I am unable to understand what (I think) is a peculiar situation with regard to process/thread scheduling on Linux.
[Env: Ubuntu 12.10, kernel ver 3.5.0-... ]
A 'test' application (call it sched_pthread) will have a total of three threads - 'main' plus two others; main() will spawn the two new threads:
Thread 1 [main()]:
Runs as SCHED_NORMAL (or SCHED_OTHER). It:
Creates two threads (Thread 2 and Thread 3 below); they will automatically inherit the scheduling policy and priority of main.
Prints the character “m” to the terminal in a loop.
Terminates.
Thread 2 [t2]:
Sleeps for 2 seconds.
Changes its scheduling policy to SCHED_FIFO, setting its real-time priority to the value passed on the command line.
Prints the character “2” to the terminal in a loop.
Terminates.
Thread 3 [t3]:
Changes its scheduling policy to SCHED_FIFO, setting its real-time priority to the value passed on the command line plus 10.
Sleeps for 4 seconds.
Prints the character “3” to the terminal in a loop.
Terminates.
We run it as root.
As per the scheduling policy, we should first see main() print 'm' for about 2s; then it should
get preempted by t2 (as it awakens after 2s) and we should see '2' appearing on the terminal for about 2s, after which t3 wakes up (it was asleep for 4s); it should now preempt everyone else and emit '3' to the display. After it dies, we should see '2's until t2 dies, then 'm's until main() dies.
So okay, this works when I test it in console mode (no X server).
Of course, I take care to run it as:
sudo taskset 02 ./sched_pthrd 8
so that in effect it runs on only 1 processor core.
When I run the same thing in graphical mode (with X), after the initial 'm's from main() there is a long-ish pause (a few seconds) during which nothing appears on the screen; then all of a sudden the 2's and 3's and m's get slapped onto the screen at once!
This can be explained: the X server (Xorg) was preempted by the SCHED_FIFO threads and hence could not 'paint' pixels on the screen.
However - here's the question at last - how come the Xorg process was not scheduled/migrated onto some other core (so that it could continue updating the screen in parallel with the RT threads)?
taskset verifies that the cpu affinity mask of Xorg is 'f' (1111b); I have 4 cores on my laptop.
Any ideas?
Here's the source code:
https://dl.dropboxusercontent.com/u/9301413/code_shared/so_sched_pthrd.c
-or-
http://goo.gl/PLHBrC
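In case the links go stale, here is a compact sketch reconstructed purely from the description above (the linked so_sched_pthrd.c is the authoritative version; loop bounds and output counts here are arbitrary):

// Build: g++ -O2 -pthread -o sched_pthrd sched_pthrd.cpp
// Run pinned to one core, as root: sudo taskset 02 ./sched_pthrd 8
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

static int rtprio;  // real-time priority taken from the command line

static void set_fifo(int prio) {
    sched_param sp{};
    sp.sched_priority = prio;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

static void* t2(void*) {
    sleep(2);
    set_fifo(rtprio);                      // goes real-time after 2s
    for (int i = 0; i < 50000; ++i) putchar('2');
    return nullptr;
}

static void* t3(void*) {
    set_fifo(rtprio + 10);                 // higher RT priority than t2
    sleep(4);
    for (int i = 0; i < 50000; ++i) putchar('3');
    return nullptr;
}

int main(int argc, char** argv) {
    rtprio = (argc > 1) ? std::atoi(argv[1]) : 8;
    pthread_t a, b;                        // both inherit main's SCHED_OTHER
    pthread_create(&a, nullptr, t2, nullptr);
    pthread_create(&b, nullptr, t3, nullptr);
    for (int i = 0; i < 50000; ++i) putchar('m');
    pthread_join(a, nullptr);
    pthread_join(b, nullptr);
    return 0;
}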
TIA!
-Kaiwan.

Preventing threaded subprocess.Popen from terminating my main script when the child is killed?

Python 2.7.3 on Solaris 10
Questions
When my subprocess has an internal segmentation fault (core) or a user externally kills it from the shell with SIGTERM or SIGKILL, my main program's signal handler handles a SIGTERM (-15) and my parent program exits. Is this real, or is it a bad Python build?
Background and Code
I have a Python script that first spawns a worker management thread. The worker management thread then spawns one or more worker threads. I have other stuff going on in my main thread that I cannot block. My management and worker threads are rock-solid; my services run for years without restarts. But then we have this subprocess.Popen scenario:
In the run method of the worker thread, I am using:
class workerThread(threading.Thread):
    def __init__(self):
        super(workerThread, self).__init__()
        ...
    def run(self):
        ...
        atempfile = tempfile.NamedTemporaryFile(delete=False)
        myprocess = subprocess.Popen(['third-party-cmd', 'with', 'arguments'],
                                     shell=False, stdin=subprocess.PIPE,
                                     stdout=atempfile, stderr=subprocess.STDOUT,
                                     close_fds=True)
...
I need to use myprocess.poll() to check for process termination, because I need to scan atempfile until I find relevant information (the file may be > 1 GiB) and I need to terminate the process on user request or if it has been running too long. Once I find what I am looking for, I stop checking the stdout temp file. I clean it up after the external process is dead and before the worker thread terminates. I need the stdin PIPE in case I have to inject a response to something interactive in the child's stdin stream.
In my main program, I set SIGINT and SIGTERM handlers to perform cleanup if my main Python program is terminated with SIGTERM, or with SIGINT (Ctrl-C) when running from the shell.
Does anyone have a solid 2.x recipe for child signal handling in threads?
ctypes sigprocmask, etc.
Any help would be very appreciated. I am just looking for an 'official' recipe or the BEST hack, if one even exists.
Notes
I am using a restricted build of Python. I must use 2.7.3. Third-party-cmd is a program I do not have source for - modifying it is not possible.
There are many things in your description that look strange. First: you have a couple of different threads and processes. Who is crashing, who is receiving SIGTERM, who is receiving SIGKILL, and due to which operations?
Second: why does your parent receive SIGTERM? It can't be sent implicitly. Someone is calling kill on your parent process, either directly or indirectly (for example, by killing the whole parent group).
Third: how is your program terminating when you're handling SIGTERM? By definition, the program terminates if the signal is not handled; if it is handled, the program is not terminated. What's really happening?
Suggestions:
$ cat crsh.c
#include <stdio.h>
int main(void)
{
int *f = 0x0;
puts("Crashing");
*f = 0;
puts("Crashed");
return 0;
}
$ cat a.py
import subprocess, sys
print('begin')
p = subprocess.Popen('./crsh')
a = raw_input()
print(a)
p.wait()
print('end')
$ python a.py
begin
Crashing
abcd
abcd
end
This works - no signal is delivered to the parent. Have you isolated the problem in your program?
If the problem is a signal being sent to multiple processes: can you use setpgid to set up a separate process group for the child? (See the sketch at the end of this answer.)
Is there any reason for creating the temporary file? That's a file of over 1 GiB being created in your temporary directory - why not pipe stdout?
If you're really sure you need to handle signals in your parent program (why didn't you try/except KeyboardInterrupt, for example?): could signal()'s unspecified behavior in multithreaded programs be causing those problems (for example, dispatching a signal to a thread that does not handle signals)?
NOTES
The effects of signal() in a multithreaded process are unspecified.
Anyway, try to explain with more precision what the threads and processes of your program are, what they do, how the signal handlers were set up and why, who is sending signals, who is receiving them, etc.
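To illustrate the setpgid suggestion: the child is moved into its own process group immediately after fork, so group-directed signals (like the SIGINT a shell sends to its whole foreground group) no longer propagate between parent and child. A minimal C++/POSIX sketch of the mechanism (in Python, the usual way to get the same effect is to call setpgrp in a pre-exec hook of Popen):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Spawn a command in its own process group so that signals aimed at
// the parent's process group do not reach the child, and vice versa.
pid_t spawn_in_own_group(char* const argv[]) {
    pid_t pid = fork();
    if (pid == 0) {            // child
        setpgid(0, 0);         // become leader of a brand-new group
        execvp(argv[0], argv);
        _exit(127);            // only reached if exec failed
    }
    return pid;                // parent: reap later with waitpid()
}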
