Are avcodec_send_packet and avcodec_receive_frame thread safe?

I am trying to implement a video decoding application with the libav decoder.
Most libav examples are built like this (pseudocode):
while (true) {
    auto packet = receive_packet_from_network();
    avcodec_send_packet(codec_ctx, packet);
    auto frame = alloc_empty_frame();
    int r = avcodec_receive_frame(codec_ctx, frame);
    if (r == 0) {
        send_to_render(frame);
    }
}
With this traditional loop, I wait for a frame to be decoded, then wait for rendering to complete, then wait for the next packet to arrive from the network, so the decoder's input buffer runs empty. There is no HW decoder pipelining, and decode performance is low.
An additional constraint in my application: I know that each packet received from the network corresponds to exactly one decoded frame.
Besides that, I would like to make the solution faster. To that end, I want to split this loop into two threads like this:
// thread one
while (true) {
    auto packet = receive_packet_from_network();
    avcodec_send_packet(codec_ctx, packet);
}

// thread two
while (true) {
    auto frame = alloc_empty_frame();
    int r = avcodec_receive_frame(codec_ctx, frame);
    if (r == 0) {
        send_to_render(frame);
    }
}
The purpose of splitting the loop into two threads is to keep the decoder's input buffer fed, ideally mostly full. Only then, I assume, will the HW decoder I expect to use stay constantly pipelined. Of course, I need thread synchronization mechanisms, not shown here for simplicity. Of course, when AVERROR(EAGAIN) is returned from avcodec_send_packet() or avcodec_receive_frame(), I need to wait for the other thread to do its job, feeding the input buffer or fetching ready frames. But that is another story.
The problem is that this threaded solution does not work for me: I get random segmentation faults. Unfortunately, I cannot find any libav documentation saying explicitly whether such a method is acceptable: are avcodec_send_packet() and avcodec_receive_frame() thread safe or not?
So, what is the best way to keep a HW decoder pipeline loaded? To me it seems obvious that the traditional polling loop shown in the libav examples is not efficient.

No, threading like this is not allowed in libavcodec.
FFmpeg and libavcodec do support threading and hardware pipelining, but at a much lower level, and this requires you, as the user, to let FFmpeg/libavcodec do its thing and not worry about it:
don't call avcodec_send_packet() and avcodec_receive_frame() from different threads;
set AVCodecContext.thread_count for threading;
let the hardware wrappers in FFmpeg take care of pipelining internally; they know much better than you what to do (I can ask experts for more info if you're interested; I'm not 100% knowledgeable in this area, but can refer you to people who are);
if avcodec_send_packet() returns AVERROR(EAGAIN), call avcodec_receive_frame() first;
if avcodec_receive_frame() returns AVERROR(EAGAIN), call avcodec_send_packet() next.
With the correct thread_count, FFmpeg/libavcodec will decode multiple frames in parallel and use multiple cores.
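Putting those rules together, a minimal sketch of the recommended single-threaded loop could look like this. This is a sketch only: receive_packet_from_network() and send_to_render() are the app-specific stubs from the question, the thread_count value is arbitrary, and setup and error handling are elided.

codec_ctx->thread_count = 4;           // let libavcodec decode frames in parallel;
avcodec_open2(codec_ctx, codec, NULL); // must be set before opening the codec

AVFrame *frame = av_frame_alloc();
while (running) {
    AVPacket *packet = receive_packet_from_network();   // app-specific stub
    avcodec_send_packet(codec_ctx, packet);
    av_packet_free(&packet);
    // Drain every frame the decoder has ready before feeding more input;
    // AVERROR(EAGAIN) here simply means it wants another packet.
    int ret;
    while ((ret = avcodec_receive_frame(codec_ctx, frame)) == 0) {
        send_to_render(frame);                           // app-specific stub
        av_frame_unref(frame);
    }
}
av_frame_free(&frame);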

Related

Web Audio API Processor result

I am doing some audio processing in JS using the Web Audio API, so I've created a custom AudioWorkletProcessor in which I process some audio.
Here is a small example.
class MyProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const someProcessedNumber = cppApiProcessor.process(inputs, outputs, parameters);
    return true; // to keep the processor alive
  }
}
You see, the variable someProcessedNumber comes from a C++ API, and I don't know how to let the outer JS world know about it, since process() returns a boolean (whether to keep the node alive or not), and I cannot touch the data in outputs. (I don't want to change the outgoing audio, just process it and produce a number.)
How can I do that? Is there a better way to do this?
You can use the port of an AudioWorkletProcessor to send data back to the main thread (or any other thread).
this.port.postMessage(someProcessedNumber);
Every AudioWorkletNode has a port as well which can be used to receive the message.
Using the MessagePort will generate some garbage on the audio thread which makes the garbage collection run from time to time. It's also not the most performant way to transfer data.
If that's an issue, you can use a SharedArrayBuffer instead, which the AudioWorkletProcessor writes the data to and the AudioWorkletNode reads the data from.
ringbuf.js is a library which aims to make this process as easy as possible.

FreeRTOS suspend task from another function

So I have a half-duplex bus driver where I send something and then always have to wait a long time for a response. During this wait time I want the processor to do something valuable, so I'm thinking about using FreeRTOS and vTaskDelay() or something like it.
One way to do it would be to split the driver up into a send part and a receive part. After sending, it returns to the caller. The caller then suspends, and does the reception part after a certain period of time.
But the level of abstraction would be finer if it remains one task from the user's point of view, as it is today. Therefore I was wondering: is it possible for a function within a task to suspend the task itself? Like this:
void someTask(void *params)
{
    while (true) {
        someFunction(/* some reference to someTask itself */, arg1, arg2, ...);
        otherStuff();
    }
}

void someFunction(someSortOfReferenceToWhateverTaskCalled, arg1, arg2, ...)
{
    if (something)
    {
        /* Use the reference to suspend the task that called this function */
    }
}
Have a look at the FreeRTOS API reference for vTaskSuspend, http://www.freertos.org/a00130.html
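As it happens, vTaskSuspend() treats a NULL handle as "the calling task", so a function can suspend whichever task invoked it without being handed a reference at all. A minimal sketch (the function name and condition are made up for illustration):

#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical helper: suspends the task that called it. */
void someFunction(int arg1, int arg2)
{
    if (arg1 > arg2)  /* stand-in for "something" */
    {
        /* NULL means "suspend the currently running task", i.e. the caller.
           Execution stops here until another task calls vTaskResume(). */
        vTaskSuspend(NULL);
    }
}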
However, I am not sure you are going about controlling the flow of the program in the correct way. Tasks can block on queues, events, delays, etc.
For example, in serial comms you might have a task that feeds data into a queue (but blocks if it is full) and an interrupt that takes data out of the queue and transmits it; or an interrupt putting data in a queue, or sending an event to a task to say there is data ready for it to process, so that the task can wake up and process the data or take it out of the queue (see the sketch at the end of this answer).
One thing I think is important, though (in my opinion), is to have only one suspend point in any task. This is not a strict rule, but it will make your life a lot easier in most situations.
There are numerous other task control mechanisms that are common to most RTOSes.
Have a good look around the FreeRTOS website and play with a few demos. There are also plenty of generic RTOS tutorials on the web. It is worth learning how to use the basic features of most RTOSes. It is actually not that complicated.
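For concreteness, here is a minimal sketch of that queue-driven pattern, assuming FreeRTOS; all names are hypothetical, and read_hw_register() stands in for whatever your bus hardware provides:

#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

static QueueHandle_t rxQueue;   /* created elsewhere with xQueueCreate() */

/* The driver task has a single blocking point: the queue read. */
void busTask(void *params)
{
    uint8_t byte;
    for (;;) {
        /* Sleeps here until the ISR enqueues data; no explicit suspend needed. */
        if (xQueueReceive(rxQueue, &byte, portMAX_DELAY) == pdTRUE) {
            /* process the received byte */
        }
    }
}

/* Called from the bus receive interrupt. */
void busRxISR(void)
{
    BaseType_t higherPriorityTaskWoken = pdFALSE;
    uint8_t byte = read_hw_register();  /* hypothetical hardware access */
    xQueueSendFromISR(rxQueue, &byte, &higherPriorityTaskWoken);
    portYIELD_FROM_ISR(higherPriorityTaskWoken);
}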

using MPI_Send_variable many times in a row before MPI_Recv_variable

To my current understanding, after calling MPI_Send, the calling thread should block until the variable is received, so my code below shouldn't work. However, I tried sending several variables in a row and receiving them gradually while doing operations on them and this still worked... See below. Can someone clarify step by step what is going on here?
MATLAB code (because I am using a MATLAB MEX wrapper for the MPI functions):
% send
if mpirank==0
    % arguments to MPI_Send_variable are (variable, destination, tag)
    MPI_Send_variable(x,0,'A_22') % rank 0 should block here!
    MPI_Send_variable(y,0,'A_12')
    MPI_Send_variable(z,1,'A_11')
    MPI_Send_variable(w,1,'A_21')
end
% receive
if mpirank==0
    % arguments to MPI_Recv_variable are (source, tag)
    a=MPI_Recv_variable(0,'A_12')*MPI_Recv_variable(0,'A_22');
end
if mpirank==1
    c=MPI_Recv_variable(0,'A_21')*MPI_Recv_variable(0,'A_22');
end
MPI_SEND is a blocking call only in the sense that it blocks until it is safe for the user to reuse the buffer provided to it. The important text to read here is Section 3.4 of the MPI standard:
The send call described in Section 3.2.1 uses the standard communication mode. In this mode, it is up to MPI to decide whether outgoing messages will be buffered. MPI may buffer outgoing messages. In such a case, the send call may complete before a matching receive is invoked. On the other hand, buffer space may be unavailable, or MPI may choose not to buffer outgoing messages, for performance reasons. In this case, the send call will not complete until a matching receive has been posted, and the data has been moved to the receiver.
The part you're running up against is the sentence "MPI may buffer outgoing messages." If your message is sufficiently small (and there are sufficiently few of them), MPI will copy your send buffers to an internal buffer and keep track of things internally until the message has been received remotely. There's no guarantee that when MPI_SEND is done, the message has been received.
On the other hand, if you do want to know that the message was actually received, you can use MPI_SSEND. That function will synchronize (hence the extra S) both sides before allowing them to return from the MPI_SSEND and the matching receive call on the other end.
In a correct MPI program, you cannot do a blocking send to yourself without first posting a nonblocking receive. So a correct version of your program would look something like this:
MPI_Request req1, req2;
MPI_Irecv(..., &req1);
MPI_Irecv(..., &req2);
MPI_Send(... to self ...);
MPI_Send(... to self ...);
MPI_Wait(&req1, MPI_STATUS_IGNORE);
/* do work */
MPI_Wait(&req2, MPI_STATUS_IGNORE);
/* do more work */
Your code is technically incorrect, but the reason it works is that the MPI implementation uses internal buffers to buffer your send data before it is transmitted to the receiver (or matched to a later receive operation, in the case of self sends). An MPI implementation is not required to have such buffers (generally called "eager buffers"), but most implementations do.
Since the data you are sending is small, the eager buffers are generally sufficient to buffer it temporarily. If you send large enough data, the MPI implementation will not have enough eager buffer space and your program will deadlock. Try sending, for example, 10 MB instead of a double in your program to notice the deadlock.
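To make that concrete, here is a minimal, self-contained C sketch of the safe self-send pattern: the receives are posted before the blocking sends, so correctness does not depend on eager buffering. The values, counts, and tags are arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double x = 3.14, y = 2.71, a, b;
    MPI_Request reqs[2];

    /* Post the nonblocking receives first... */
    MPI_Irecv(&a, 1, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&b, 1, MPI_DOUBLE, rank, 1, MPI_COMM_WORLD, &reqs[1]);

    /* ...then the blocking sends to self cannot deadlock, regardless
       of whether the implementation buffers eagerly. */
    MPI_Send(&x, 1, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD);
    MPI_Send(&y, 1, MPI_DOUBLE, rank, 1, MPI_COMM_WORLD);

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d received %g and %g\n", rank, a, b);

    MPI_Finalize();
    return 0;
}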
I assume that there is just an MPI_Send() behind MPI_Send_variable() and an MPI_Recv() behind MPI_Recv_variable().
How can a process ever receive a message that it sent to itself if both the send and the receive operations are blocking? Either the send to self or the receive from self must be non-blocking, or you will get a deadlock; I had thought that sending to self was simply forbidden.
Following the answer of @Greginozemtsev to Is the behavior of MPI communication of a rank with itself well-defined?, the MPI standard states that send to self and receive from self are allowed. I guess it implies that it's non-blocking in this particular case.
In MPI 3.0, Section 3.2.4 (Blocking Receive), page 59, the words have not changed since MPI 1.1:
Source = destination is allowed, that is, a process can send a message to itself. (However, it is unsafe to do so with the blocking send and receive operations described above, since this may lead to deadlock. See Section 3.5.)
I read Section 3.5, but it's not clear enough for me...
I guess the parentheses are there to tell us that talking to oneself is not a good practice, at least for MPI communications!

context switch measure time

I wonder if any of you know how to use the function get_timer() to measure the time of a context switch.
How do I find the average?
When should I display it?
Could someone help me out with this? Is there any expert who knows this?
One fairly straightforward way would be to have two threads communicating through a pipe. One thread would do (pseudo-code):
for (n = 1000; n--;) {
    now = clock_gettime(CLOCK_MONOTONIC_RAW);
    write(pipe, now);
    sleep(1 msec); // to make sure the other thread blocks again on the pipe read
}
Another thread would do:
uint64_t context_switch_times[1000];
for (n = 1000; n--;) {
    time = read(pipe);
    now = clock_gettime(CLOCK_MONOTONIC_RAW);
    context_switch_times[n] = now - time;
}
That is, it would measure the time duration between when the data was written into the pipe by one thread and the time when the other thread woke up and read that data. A histogram of context_switch_times array would show the distribution of context switch times.
The times would include the overhead of the pipe read and write and of getting the time; however, it gives a good sense of how big the context switch times are.
In the past I did a similar test using stock Fedora 13 kernel and real-time FIFO threads. The minimum context switch times I got were around 4-5 usec.
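A more concrete version of that pseudo-code might look like the following C sketch. Assumptions: Linux, CLOCK_MONOTONIC_RAW, a raw nanosecond timestamp passed through the pipe, and compilation with -pthread; all names are illustrative.

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>

#define ITERS 1000
static int fds[2];                      /* pipe: fds[0] read end, fds[1] write end */
static int64_t deltas_ns[ITERS];

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void *writer(void *arg)
{
    for (int n = 0; n < ITERS; n++) {
        int64_t t = now_ns();
        write(fds[1], &t, sizeof t);    /* timestamp taken just before the wakeup */
        usleep(1000);                   /* let the reader block on the pipe again */
    }
    return NULL;
}

int main(void)
{
    pipe(fds);
    pthread_t th;
    pthread_create(&th, NULL, writer, NULL);
    for (int n = 0; n < ITERS; n++) {
        int64_t sent;
        read(fds[0], &sent, sizeof sent);   /* blocks until the writer wakes us */
        deltas_ns[n] = now_ns() - sent;     /* wakeup latency incl. the switch */
    }
    pthread_join(th, NULL);

    int64_t sum = 0;
    for (int n = 0; n < ITERS; n++) sum += deltas_ns[n];
    printf("average wakeup latency: %lld ns\n", (long long)(sum / ITERS));
    return 0;
}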
I don't think we can actually measure this time from user space, as in the kernel you never know when your process will be picked up after its time slice expires. So whatever you get in user space includes scheduling delays as well. However, from user space you can get a close measurement, just not always an exact one. Even a jiffy of delay matters.
I believe LTTng can be used to capture detailed traces of context switch timings, among other things.

Pass by reference TCL - threading?

I'm using the Snack audio processing kit along with TCL.
I want to cut up part of the sound and give this section to another thread to work with.
My question is how to pass something by reference between threads in TCL.
proc a {} {
    snack::sound snd
    thread::send -async $Thread [list B snd]
}

set Thread [thread::create {
    proc B {snd} {
        # ... do something with snd
    }
}]
That's not going to work. Tcl threads are designed to be strongly isolated from each other, since this massively reduces the amount of locking required for normal processing. The downside is that passing things between threads is non-trivial (other than for short messages containing commands, which audio data isn't!). But there is a way forward…
If you can send the data as a chunk of bytes (at the script level) then I recommend transferring it between threads using the tsv package, which is parceled up with the thread package so you'll already have it. That will let you transport the data between threads relatively simply.
Be aware that the snack package is not thread-aware in its script-level interface, so the data transfers are still going to involve copying, and Tk (like a great many GUI toolkits, FWIW) does not support multi-threaded use (well, not without techniques for another time), so if you're doing waveform visualization you've got some work ahead. (OTOH, modern CPUs have loads of time to spare too.)
