Measuring context switch time - Linux

I wonder if any of you know how to use the function get_timer() to measure the time for a context switch.
How do I find the average? When should it be displayed?
Could someone help me out with this? Is there an expert who knows this?

One fairly straightforward way would be to have two threads communicating through a pipe. One thread would do (pseudo-code):
struct timespec now;
for (int n = 1000; n--; ) {
    clock_gettime(CLOCK_MONOTONIC_RAW, &now);
    write(pipe_fd[1], &now, sizeof now);
    usleep(1000); /* 1 ms: make sure the other thread blocks again on the pipe read */
}
Another thread would do:
long long context_switch_times[1000]; /* nanoseconds */
struct timespec sent, now;
for (int n = 1000; n--; ) {
    read(pipe_fd[0], &sent, sizeof sent);
    clock_gettime(CLOCK_MONOTONIC_RAW, &now);
    context_switch_times[n] = (now.tv_sec - sent.tv_sec) * 1000000000LL
                            + (now.tv_nsec - sent.tv_nsec);
}
That is, it measures the duration between the moment one thread writes the data into the pipe and the moment the other thread wakes up and reads it. A histogram of the context_switch_times array would show the distribution of context switch times.
The times include the overhead of the pipe read and write and of getting the time; still, this gives a good sense of how big the context switch times are.
In the past I did a similar test using stock Fedora 13 kernel and real-time FIFO threads. The minimum context switch times I got were around 4-5 usec.
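To answer the "how to find the average" part, here is a minimal sketch that reduces the samples collected above to an average and a minimum (the array and its size are the ones from the reader loop):
/* Average and minimum of the collected samples, in nanoseconds. */
long long sum = 0, min = context_switch_times[0];
for (int i = 0; i < 1000; i++) {
    sum += context_switch_times[i];
    if (context_switch_times[i] < min)
        min = context_switch_times[i];
}
printf("avg %lld ns, min %lld ns\n", sum / 1000, min);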

I don't think we can measure this time accurately from user space, since in the kernel you never know when your process will be picked up again after its time slice expires. So whatever you get in user space includes scheduling delays as well. From user space you can get a close measurement, but never an exact one; even a jiffy of delay matters.
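One way to reduce those scheduling delays from user space (this is what the real-time FIFO threads mentioned in the answer above refer to) is to give the measuring threads a real-time scheduling policy. A minimal sketch, assuming root privileges:
#include <sched.h>
#include <stdio.h>

/* Switch the calling thread to the SCHED_FIFO real-time policy so the
   scheduler wakes it with minimal queueing delay (requires root). */
struct sched_param sp = { .sched_priority = 50 };
if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
    perror("sched_setscheduler");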

I believe LTTng can be used to capture detailed traces of context switch timings, among other things.

Related

Are avcodec_send_packet and avcodec_receive_frame thread safe?

I am trying to implement a video decoding application with the libav decoder.
Most libav examples are built like this (pseudocode):
while (true) {
    AVPacket *packet = receive_packet_from_network();
    avcodec_send_packet(ctx, packet);
    AVFrame *frame = av_frame_alloc();
    int r = avcodec_receive_frame(ctx, frame);
    if (r == 0) {
        send_to_render(frame);
    }
}
Anyway, with this traditional cycle, I wait for the frame to be received, then wait for rendering to complete, and then wait for the next data to arrive from the network; meanwhile the decoder's input buffer runs empty. There is no HW decoder pipelining, and decode performance is low.
An additional constraint in my application: I know that one packet received from the network corresponds to exactly one decoded frame.
Besides that, I would like to make the solution faster. To that end, I want to split this cycle into two different threads, like this:
// thread one
while (true) {
    AVPacket *packet = receive_packet_from_network();
    avcodec_send_packet(ctx, packet);
}

// thread two
while (true) {
    AVFrame *frame = av_frame_alloc();
    int r = avcodec_receive_frame(ctx, frame);
    if (r == 0) {
        send_to_render(frame);
    }
}
The purpose of splitting the cycle into two threads is to keep the decoder's input buffer fed, mostly full. Only in that case, I guess, will the HW decoder I expect to use be able to work in a constantly pipelined fashion. Of course, I need thread synchronization mechanisms; they are not shown here for simplicity. Of course, when AVERROR(EAGAIN) is returned from avcodec_send_packet() or avcodec_receive_frame(), I need to wait for the other thread to do its job of feeding the input buffer or fetching ready frames. That is another story.
Apart from that, this threaded solution does not work for me: it crashes with random segmentation faults. Unfortunately, I cannot find any libav documentation saying explicitly whether such an approach is acceptable, that is, whether avcodec_send_packet() and avcodec_receive_frame() are thread safe or not.
So, what is the best way to keep the HW decoder pipeline loaded? To me it is obvious that the traditional polling cycles shown in the libav examples are not effective.
No, threading like this is not allowed in libavcodec.
But FFmpeg and libavcodec do support threading and hardware pipelining. This works at a much lower level and requires you, as the user, to let FFmpeg/libavcodec do its thing and not worry about it:
don't call avcodec_send_packet() and avcodec_receive_frame() from different threads;
set AVCodecContext.thread_count for threading;
let the hardware wrappers in FFmpeg take care of pipelining internally; they know much better than you what to do. (I can ask experts for more info if you're interested; I'm not 100% knowledgeable in this area, but can refer you to people who are.)
if avcodec_send_packet() returns AVERROR(EAGAIN), call avcodec_receive_frame() first;
if avcodec_receive_frame() returns AVERROR(EAGAIN), call avcodec_send_packet() next.
With the correct thread_count, FFmpeg/libavcodec will decode multiple frames in parallel and use multiple cores.
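Put together, a minimal sketch of that single-threaded loop, assuming ctx is an opened AVCodecContext and that receive_packet_from_network() and send_to_render() are the placeholders from the question:
/* Before avcodec_open2(): ctx->thread_count = 0;  0 = let libavcodec pick. */
AVFrame *frame = av_frame_alloc();
AVPacket *packet = NULL;
int ret;
for (;;) {
    if (!packet)
        packet = receive_packet_from_network();
    ret = avcodec_send_packet(ctx, packet);
    if (ret == 0) {
        av_packet_free(&packet);   /* accepted; fetch a new one next turn */
    } else if (ret != AVERROR(EAGAIN)) {
        break;                     /* real error */
    }                              /* on EAGAIN: keep the packet, drain first */

    /* Drain every frame that is ready; send_to_render() must consume or
       ref the frame before the next avcodec_receive_frame() call. */
    while ((ret = avcodec_receive_frame(ctx, frame)) == 0)
        send_to_render(frame);
    if (ret != AVERROR(EAGAIN))
        break;                     /* AVERROR_EOF or a real error */
}
av_frame_free(&frame);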

Measuring Semaphore wait times with Micrometer

We have a throttling implementation that essentially boils down to:
Semaphore s = new Semaphore(1);
...
void callMethod() throws Exception {
    s.acquire();
    timer.recordCallable(() -> expensiveMethod()); // call the expensive method
    s.release();
}
I would like to gather metrics about the impact the semaphore has on the overall response time of the method. For example, I would like to know the number of threads that were waiting in acquire(), the time spent waiting, and so on. What I am looking for, I guess, is a gauge that also captures timing information.
How do I measure these Semaphore stats?
There are multiple things you can do depending on your needs and situation.
LongTaskTimer is a timer that measures tasks that are currently in progress. The in-progress part is key here: after a task has finished, you will not see its effect on the timer. That's why it is for long-running tasks; I'm not sure it fits your use case.
The other thing you can do is have a Timer and a Gauge, where the timer measures the time it took to acquire the Semaphore, while with the gauge you increment/decrement the number of threads that are currently waiting on it.
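A minimal sketch of that Timer + Gauge combination (the metric names and the expensiveMethod() placeholder are made up for illustration):
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

MeterRegistry registry = new SimpleMeterRegistry();
Semaphore s = new Semaphore(1);
Timer acquireTimer = registry.timer("throttle.acquire"); // time spent waiting
Timer callTimer = registry.timer("throttle.call");       // time in the method
AtomicInteger waiting = registry.gauge("throttle.waiting", new AtomicInteger(0));

void callMethod() throws Exception {
    waiting.incrementAndGet();                // one more thread queued
    Timer.Sample sample = Timer.start(registry);
    try {
        s.acquire();                          // blocks: this is what we measure
    } finally {
        sample.stop(acquireTimer);
        waiting.decrementAndGet();            // no longer queued
    }
    try {
        callTimer.recordCallable(() -> expensiveMethod());
    } finally {
        s.release();
    }
}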

ostream::flush context

I am using both the ostream::write and ostream::flush operations in a multithreaded application, in the following sequence:
// <<-- start time measurement
{
    stream.write(buffer, size);
    stream.flush();
}
// <<-- end time measurement
The issue is that when I measure the time for the above sequence, I get a very short time (~10 ms), yet the time between thread entrances becomes very large (~400 ms), only because of adding the ostream::flush and ostream::write calls.
Only once in a while does the time difference become larger, and I am not sure whether that is because of a context switch.
I am testing on a Linux machine with a dual-core CPU.
This confuses me: I had assumed that both of these functions are blocking. Or is the writing actually done only after flush()?
EDIT:
Only one thread does the writing to the file.
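For reference, a sketch of how the start/end markers above could be implemented; std::chrono and the variable names are assumptions, since the question does not say how the time was measured:
#include <chrono>
#include <iostream>
#include <ostream>

void timed_write(std::ostream& out, const char* buf, std::streamsize n) {
    auto start = std::chrono::steady_clock::now(); // <<-- start time measurement
    out.write(buf, n);
    out.flush();   // pushes the buffered bytes to the OS, not to the disk
    auto end = std::chrono::steady_clock::now();   // <<-- end time measurement
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cerr << "write+flush took " << us.count() << " us\n";
}
Note that flush() only hands the buffered bytes to the operating system; it does not wait for them to reach the disk, which is one reason the measured interval can stay short while the actual I/O happens later.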

Is there a way to know the real CPU time elapsed in a Scala future?

I want to know how much CPU time is spent in a future.
ManagementFactory.getThreadMXBean.getCurrentThreadCpuTime is able to give this time, but not for a future that is running on another thread.
I think the following code is able to give the end time, but not the start time on the same thread:
future.map { result =>
  val end = ManagementFactory.getThreadMXBean.getCurrentThreadCpuTime
  (result, end - start)
}
ThreadMXBean is also capable of giving the CPU time of a given thread, but I think there is no way to know the id of the thread before the future is run.
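One workaround (a sketch, not an official API) is to take both readings inside the Future itself, so that they are guaranteed to run on the thread that executes the body:
import java.lang.management.ManagementFactory
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Both CPU-time readings happen on the thread that runs the body, so the
// difference is the body's own CPU time. This only holds as long as the
// body itself does not hop to another thread.
def timedFuture[T](body: => T): Future[(T, Long)] = Future {
  val bean   = ManagementFactory.getThreadMXBean
  val start  = bean.getCurrentThreadCpuTime
  val result = body
  (result, bean.getCurrentThreadCpuTime - start) // CPU nanoseconds
}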

How can I improve performance with FutureTasks

The problem seems simple: I have a (huge) number of operations to run, and the main thread can only proceed when all of them have returned their results. I tried running them in a single thread, where each operation took roughly 2 to 10 seconds, and the whole run took about 2.5 minutes. Then I tried FutureTasks, submitting them all to an ExecutorService. All of them were processed at once, but each then took roughly 40 to 150 seconds, and the whole run took about 2.1 minutes.
If I understand correctly, all the threads did was execute everything at once while sharing the processor's power, whereas what I expected was the processor working heavily to execute all the tasks at the same time, each taking about as long as it takes in a single thread.
Question: Is there a way I can achieve this? (Maybe not with FutureTasks, maybe with something else, I don't know.)
Detail: I don't need them to finish at exactly the same time; what really matters to me is the performance.
You might have created way too many threads. As a consequence, the CPU was constantly switching between them, generating a noticeable overhead.
You probably need to limit the number of running threads; then you can simply submit your tasks and they will execute concurrently.
Something like:
ExecutorService es = Executors.newFixedThreadPool(8);
List<Future<?>> futures = new ArrayList<>(runnables.size());
for (Runnable r : runnables) {
    futures.add(es.submit(r)); // keep the Future so we can wait for it
}
// wait until they all finish:
for (Future<?> f : futures) {
    f.get();
}
// all done
es.shutdown();
