How to inspect scheduling of a multi-threaded user application using LTTng?

I am new to tracing in Linux. I have a multi-threaded C++ user application whose threads wake up periodically (driven by an OS timer), do some processing, and go back to sleep. I want to visualise:
1) when the threads start and stop running;
2) which cores the threads are running on.
I have installed LTTng and Trace Compass on an Ubuntu 14.04 LTS machine, but I don't know how to use these tools to achieve my objective.
I have read the following lttng doc section:
http://lttng.org/docs/#doc-tracing-your-own-user-application
In order to collect my trace, must I define custom LTTng tracepoint definitions (in a tracepoint provider header file) and insert tracepoints into my user application, or is there a simpler way of achieving my goal?
Best regards
David

You can take a kernel trace, enabling at least the sched_switch event (e.g. lttng enable-event --kernel sched_switch), to obtain information about which thread is running on which CPU. Opening such a trace in Trace Compass and looking at the Control Flow View will show the status of all threads, so you can search for the ones that correspond to your application.
In addition, you could also instrument your application with userspace tracepoints, as you mentioned. This would let you track userspace states, going further than what is available in the kernel trace alone.
You might be interested in this example/tutorial, which shows how to instrument a simple application and how to write a Trace Compass configuration file to display application states graphically.
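If you do go the userspace-instrumentation route, the boilerplate is fairly small. Here is a minimal sketch following the linked LTTng docs; the provider name my_app, the event name wake, and the integer field are placeholders of mine, and the whole thing builds with g++ ... -llttng-ust -ldl:

// ---- my_provider.h: the tracepoint provider header ----
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER my_app

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./my_provider.h"

#if !defined(MY_PROVIDER_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define MY_PROVIDER_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
    my_app, wake,                  /* provider name, event name */
    TP_ARGS(int, thread_id),
    TP_FIELDS(ctf_integer(int, thread_id, thread_id))
)

#endif /* MY_PROVIDER_H */

#include <lttng/tracepoint-event.h>

// ---- main.cpp: the instrumented application ----
#define TRACEPOINT_CREATE_PROBES
#define TRACEPOINT_DEFINE
#include "my_provider.h"

int main() {
    tracepoint(my_app, wake, 42);  // emit one event with thread_id = 42
    return 0;
}

You can then enable these events with lttng enable-event --userspace 'my_app:*' in the same session as the kernel events, so both appear on the same timeline in Trace Compass.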

Related

In Linux, how to run a piece of code without getting preempted, in user mode

thread-stop-preemption
//code to run
thread-start-preemption
A piece of code is running in a thread. Also, are atomic functions available in user mode?
Linux doesn't offer very good behavior for real-time applications.
Unless your application really is real-time, you should change your code to use normal synchronization primitives (e.g. mutexes, condition variables, etc.); a minimal sketch follows below.
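To make that concrete, here is a minimal sketch of the conventional alternative: guard the critical code with a mutex instead of trying to stop preemption, and note that atomic operations are indeed available in user mode via std::atomic (the names here are mine):

#include <atomic>
#include <mutex>

std::mutex m;
int shared_value = 0;
std::atomic<int> counter{0};              // atomics work fine in user mode

void update() {
    std::lock_guard<std::mutex> lock(m);  // this thread may still be preempted,
    ++shared_value;                       // but no other thread can enter here
}

void bump() {
    counter.fetch_add(1, std::memory_order_relaxed);  // lock-free increment
}

The thread can still be preempted inside update(), but correctness no longer depends on it running uninterrupted, which is usually what is really needed.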
But if you really think you need your thread not to be interrupted, you might get away (but not really) with the real-time policies mentioned in sched(7), e.g. SCHED_FIFO. If you choose to go down that route, you can influence a thread's scheduling using sched_setattr(2), as sketched after this paragraph.
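A minimal sketch of putting the calling thread under SCHED_FIFO. Since sched_setattr(2) has no glibc wrapper, pthread_setschedparam is used here instead; priority 50 is an arbitrary choice of mine, and you need CAP_SYS_NICE or a suitable RLIMIT_RTPRIO for this to succeed:

#include <pthread.h>
#include <sched.h>
#include <cstdio>

bool make_fifo(int prio) {
    sched_param sp{};
    sp.sched_priority = prio;   // 1..99 for SCHED_FIFO
    int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_setschedparam failed: %d\n", rc);
        return false;
    }
    return true;                // now preempts all non-real-time threads
}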
One more warning
Before using this for anything with hard real-time constraints, consider that a vanilla Linux kernel itself is probably not the tool for the job: although the scheduler will try to keep your thread running, I don't think it guarantees it.

Any possibility to disable some SMX in a GPU?

In a single GPU such as the P100 there are 56 SMs (streaming multiprocessors), and different SMs may have little correlation with one another. I would like to know how application performance varies with the number of SMs used. So is there any way to disable some SMs for a given GPU? I know CPUs offer a corresponding mechanism, but I haven't found a good one for GPUs yet. Thanks!
There are no CUDA-provided methods to disable an SM (streaming multiprocessor). With varying degrees of difficulty and behavior, some possibilities exist to approximate this using indirect methods:
1) Use CUDA MPS, and launch an application that fully "occupies" one or more SMs by carefully controlling the number of blocks launched and the resource utilization of those blocks. With CUDA MPS, another application can run on the same GPU, and the kernels can run concurrently, assuming sufficient care is taken. This might require no direct modification of the application code under test (but an additional application launch is needed, as well as MPS). The dummy kernel's duration will need to be "long", so that it occupies the SMs while the application under test is running.
2) In your application code, effectively re-create the behavior of item 1 by launching the "dummy" kernel from the same application as the code under test, and having the dummy kernel "occupy" one or more SMs. The application under test can then launch the desired kernel. This should allow for kernel concurrency without MPS.
3) In your application code, for the kernel under test itself, modify the kernel's block scheduling behavior, probably using the %smid special register via inline PTX, to cause the kernel to do work only on certain SMs, effectively reducing the total number in use; see the sketch after this list.
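As a rough illustration of item 3, here is a sketch of reading %smid; blocks that land on a disallowed SM simply exit. The 28-SM cutoff and the launch configuration are arbitrary assumptions of mine, and since the hardware may keep scheduling new blocks onto the "disabled" SMs, a real version would oversubscribe with extra blocks or use a persistent-blocks scheme:

#include <cuda_runtime.h>

__device__ unsigned my_smid() {
    unsigned smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));  // SM this block is resident on
    return smid;
}

__global__ void restricted_kernel(unsigned max_sm) {
    if (my_smid() >= max_sm)
        return;   // blocks on disallowed SMs retire immediately
    // ... real work for blocks on allowed SMs ...
}

int main() {
    restricted_kernel<<<256, 128>>>(28);  // oversubscribe so allowed SMs stay busy
    cudaDeviceSynchronize();
    return 0;
}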

Monitor multiple thread performance

I have created a Windows service with multiple threads (approximately 4-5). In this service, threads are created at specific intervals and then aborted; once created, a thread performs some I/O and database operations.
I have a GUI for this service that provides the configuration it requires. In this GUI I want to add one more piece of functionality: showing the performance of the Windows service with respect to all of its threads. I want to show CPU utilization (per core, if a multicore processor is available) along with memory utilization.
Windows Task Manager shows per-core CPU plus memory utilization; I want to build the same thing, but only for the threads run by my Windows service.
Can anybody help me out with how to get CPU % and memory utilization per thread?
Memory utilization is accounted per process, not per thread, so I don't think you can get it for individual threads; per-thread CPU time, however, can be sampled. Failing that, you can get both figures for your service as a whole.
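If the service is native code, here is a minimal sketch of sampling one thread's CPU time with the Win32 GetThreadTimes API; dividing the delta between two samples by the elapsed wall-clock time gives a CPU percentage (for a .NET service, ProcessThread.TotalProcessorTime would be the analogous route):

#include <windows.h>

// Returns the total CPU time (kernel + user) consumed by the thread,
// in seconds, or -1.0 on failure.
double thread_cpu_seconds(HANDLE thread) {
    FILETIME create, exit_time, kernel, user;
    if (!GetThreadTimes(thread, &create, &exit_time, &kernel, &user))
        return -1.0;
    ULARGE_INTEGER k{}, u{};
    k.LowPart = kernel.dwLowDateTime; k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;   u.HighPart = user.dwHighDateTime;
    return (k.QuadPart + u.QuadPart) / 1e7;  // FILETIME ticks are 100 ns
}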
My question is: why would you need to build your own functionality, when Sysinternals Process Explorer already gives you more detail? Any specific needs?
If you need to monitor the thread activities, you would do better to log some information using log4net or another logging tool. This will give you an idea of the threads and what they are doing.
To be more specific, you could publish the logs using the TelnetAppender, which your application can then receive. This will help you look into the process in real time.

Need weird advice on how to allow a Linux process to ONLY create and use a single pipe

Hi.
I am working on an experiment that allows users to use 1% of my CPU. It's like your own web server, but a big dynamic remote-execution framework (don't ask about that), and I don't want users to use API functions like creating files: no sockets, no threads, no console output, nothing.
Update 1: People will be sending me binaries, so a raw int 0x80 is possible. Therefore... the kernel?
I need to limit a process so it cannot do anything but use a single pipe. Through that pipe the process will use my own wrapped and controlled API.
Is that even possible? I was thinking of something like a Linux kernel module.
The issues of limiting RAM and CPU are not primary here; for those there is plenty on Google.
Thanks in advance!
The ptrace facility will allow your program to observe and control the operation of another process. Using the PTRACE_SYSCALL request, you can stop the child process before every system call and decide whether you want to allow that call to proceed.
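A minimal x86-64 sketch of that approach. The syscall check via orig_rax, the example policy, and killing the child outright are all simplifications of mine; a real filter would overwrite the syscall number with -1 to deny a single call, and these days seccomp-BPF is usually the better tool for this job:

#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <signal.h>
#include <unistd.h>

int main() {
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);   // ask to be traced
        execl("/bin/echo", "echo", "hello", nullptr);  // stand-in for the untrusted binary
        _exit(1);
    }
    int status;
    waitpid(child, &status, 0);   // child stops at the exec
    while (true) {
        ptrace(PTRACE_SYSCALL, child, nullptr, nullptr);  // run to next syscall stop
        waitpid(child, &status, 0);
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, nullptr, &regs);
        if (regs.orig_rax == SYS_socket)   // example policy: no sockets
            kill(child, SIGKILL);
    }
    return 0;
}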
You might want to look at what Google is doing with their Native Client technology and the seccomp sandbox. The Native Client (NaCl) stuff is intended to let x86 binaries supplied by a web site run inside a user's local browser. The problem of malicious binaries is similar to what you face, so most of the technology/research probably applies directly.

Debugging utilities for Linux process hang issues?

I have a daemon process which does configuration management; all the other processes interact with this daemon for their functioning. But when I execute a large action, after a few hours the daemon process becomes unresponsive for 2 to 3 hours, and after that it works normally again.
How can I find out at what point the Linux process hangs?
strace can show the last system calls and their results.
lsof can show open files.
The system log can be very effective when log messages are written to track progress; it lets you box the problem into smaller areas. Also, correlating log messages with messages from other systems often turns up interesting results.
wireshark, if the app uses sockets, makes the wire chatter visible.
ps ax and top can show whether your app is in a busy loop (i.e. running all the time), sleeping, or blocked in I/O, and how much CPU and memory it is consuming.
Each of these may give a little bit of information; together they build up a picture of the issue.
When using gdb, it can be useful to trigger a core dump when the app is blocked (e.g. with gcore, or generate-core-file from within gdb). Then you have a static snapshot which you can analyze with post-mortem debugging at your leisure. You can have these triggered by a script, so you quickly build up a set of snapshots that can be used to test your theories.
One option is to use gdb and its attach command to attach to the running process (gdb -p <pid> also works). You will need to load a file containing the symbols of the executable in question (using the file command).
There are a number of different ways to do this:
Listen on a UNIX domain socket to handle status requests. An external application can then inquire whether the application is still OK; if it gets no response within some timeout period, it can assume that the application being queried has deadlocked or died.
Periodically touch a file with a preselected path. An external application can look at the file's timestamp, and if it is stale, it can assume the application is dead or deadlocked.
You can use the alarm syscall repeatedly, having the signal terminate the process (set this up with sigaction). As long as you keep calling alarm (i.e. as long as your program is making progress) the process keeps running; once you stop, the signal fires. A minimal sketch follows below.
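A minimal sketch of that self-watchdog (the 10-second timeout is an arbitrary choice of mine):

#include <csignal>
#include <cstdlib>
#include <unistd.h>

constexpr unsigned kTimeout = 10;  // seconds without progress before we give up

void on_alarm(int) { _exit(EXIT_FAILURE); }  // async-signal-safe bail-out

int main() {
    struct sigaction sa {};
    sa.sa_handler = on_alarm;
    sigaction(SIGALRM, &sa, nullptr);
    alarm(kTimeout);
    for (;;) {
        // ... one unit of real work ...
        alarm(kTimeout);  // re-arm: proves the loop is still making progress
        sleep(1);         // placeholder for the real work
    }
}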
You can seamlessly restart your process when it dies, using fork and waitpid as described in this answer; see the sketch below. It does not cost any significant resources, since the OS will share the memory pages.
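A minimal sketch of that supervisor pattern; run_daemon is a placeholder of mine for your real entry point:

#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

void run_daemon() { pause(); }  // placeholder: your daemon's main loop goes here

int main() {
    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {            // child shares pages copy-on-write with the parent
            run_daemon();
            _exit(0);
        }
        int status;
        waitpid(pid, &status, 0);  // block until the child exits or crashes
        std::fprintf(stderr, "child died (status %d), restarting\n", status);
        sleep(1);                  // brief back-off before restarting
    }
}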
