I have a Nuxt SPA. It has two img elements, each pointing at a streaming endpoint (via GET). Each img element has a counter attached to it, and the counter is incremented on every "onload" event. The endpoint lives on a Flask server and works by streaming frames (PNG images) extracted from a video file (using OpenCV) with generators, yield, and the MIME type multipart/x-mixed-replace. Basically the same as the method described in: https://blog.miguelgrinberg.com/post/video-streaming-with-flask
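The endpoint looks roughly like this (a simplified sketch; the route and file paths are illustrative, not my exact code):

```python
# Simplified sketch of the current streaming endpoint (paths/names are illustrative)
import cv2
from flask import Flask, Response

app = Flask(__name__)

def generate_frames(path):
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".png", frame)
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/png\r\n\r\n" + buf.tobytes() + b"\r\n")

@app.route("/stream/<name>")
def stream(name):
    return Response(generate_frames(f"videos/{name}.mp4"),
                    mimetype="multipart/x-mixed-replace; boundary=frame")
```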
It works with no issues; my problem now is performance. If there is only one stream running (one GET request, one connection), everything is fine performance-wise. But when two streams run in parallel (two GET requests, two incoming connections at the same time), the app struggles and lags very hard: it freezes on one frame and stops updating the counter, then after a while the counter jumps by 100 or so and it displays a frame 100 or so frames ahead of the previous one. This happens in an alternating fashion between the two img elements: element 1 will load 100 or so frames then freeze, then element 2 will do the same and freeze, rinse and repeat.
Does anyone have an idea what could cause this? I need to improve the performance so both "streams" can run at the same time without having such insane lags.
I think it might be the two connections competing against each other, so I have been thinking of sending multiple GET requests instead. The response would be a batch (array) of, let's say, 100-200 frames. A function in the app would then play the frames at 30 FPS or so. Once the array is almost empty, it would make a new GET request for the next batch of frames. Rinse and repeat, alternating between the two img elements (rough server-side sketch below).
Do you think this will alleviate the issue? Or am I solving the wrong problem?
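For reference, something like this is what I have in mind on the server side (just a sketch; the endpoint name, batch size, and base64-encoding of the frames are assumptions I haven't tested):

```python
# Hedged sketch of the proposed batch endpoint: return N frames per request
# as base64 strings so the client can play them locally at ~30 FPS.
import base64
import cv2
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/frames/<name>")
def frame_batch(name):
    start = int(request.args.get("start", 0))
    count = int(request.args.get("count", 100))
    cap = cv2.VideoCapture(f"videos/{name}.mp4")
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)   # seek to the requested frame index
    frames = []
    for _ in range(count):
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode("ascii"))
    cap.release()
    return jsonify({"start": start, "frames": frames})
```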
It's hard to answer in detail without looking at the exact HW and SW being used on the client, as there are many factors, such as bandwidth, the ability to run multiple threads in parallel, fetching optimisations in browsers, HW vs SW codecs, etc., that might affect performance.
However, one high-level thing that may help, if it fits your use case, would be to leverage the existing optimisations that many servers and clients have for video streaming and playback.
In other words, given that your source is essentially a stream of video frames, if you can combine them into an 'actual' video stream then the browser can simply request that video stream and the server can simply serve a video.
This allows you to leverage all the built-in mechanisms to download the video in 'chunks', using range requests and/or adaptive bitrate streaming.
The client will also be able to leverage existing buffering and playback mechanisms for video.
Most common laptop/desktop machines should be able to handle two videos being played back in parallel, so this could take away the playback pain you are seeing, at the cost of more work on the server side to package the frames into video streams.
I want to use one Camera for two processes / threads, e.g.
a) live streaming and
b) image processing at the same time.
Use case:
An application which can handle multiple requests, based on what the user asks for.
a) The user can request: detect cam-1 and do live streaming.
b) Later, the user can request: motion detection / image processing using the same cam-1, while process (a) is still doing the live streaming.
The challenge I see is accessing the same camera from two different processes at the same time. Is there a way to reroute the camera data / pointers to a different process?
Note: OS -Windows
Any help will be appreciated !!
Regards, AK
Well, doable. But ..
Given the above, there are a few things to respect when designing the target software. One of these is the fact that the camera is a device, which restricts it to having a single "commander-in-charge", rather than permitting a schizophrenic "duty" under several concurrent bosses.
That said, the solution lies in a smarter design of the acquired data stream, which can then be delivered to several concurrent consuming processes.
For more hints on such a design concept, read this Answer to a similarly motivated Question.
Avoid letting two threads access the camera at the same time.
If the driver allows it, you may work with multiple buffers, used in a round-robin fashion to store the live stream. Their content can be continuously sent to the display, but when desired you can leave one on the side and reserve it to allow for longer processing.
If this is not possible, you can copy every desired image to a processing buffer when needed.
If your system must be very responsive and process the images in real time, there is probably no need for two threads!
In any case, if you are working with two threads, there is no need to "reroute the pointers", you simply let the threads access the buffers.
If they are processes rather than threads, then you can establish the buffers in a shared memory section.
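As a minimal sketch of that last option, assuming Python 3.8+ (for multiprocessing.shared_memory) and a fixed, known frame size; all names and sizes here are illustrative:

```python
# One process owns the camera and writes frames into a named shared-memory
# block; other processes attach to the same block by name and copy out the
# latest frame for streaming or motion detection.
import numpy as np
from multiprocessing import shared_memory

SHAPE, DTYPE = (480, 640, 3), np.uint8   # assumed camera resolution

# producer (owns the camera)
shm = shared_memory.SharedMemory(name="cam1_frame", create=True,
                                 size=int(np.prod(SHAPE)))
frame_view = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
# ... inside the capture loop: frame_view[:] = frame_from_camera

# consumer (streaming or image-processing process)
shm_c = shared_memory.SharedMemory(name="cam1_frame")
latest = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm_c.buf)
snapshot = latest.copy()   # copy before processing so the producer can keep writing
# remember to close()/unlink() the shared-memory blocks when done
```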
How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere, the application the stream originates from doesn't block even if nothing is reading the stream, but the programs reading the stream do block if there's no output from the first program?
I've been trying to Google around for a while now, but all I find is methods where the program does block if nothing is reading the stream.
How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream
Your requirements as stated cannot possibly be satisfied without some form of buffer.
The most straightforward option is to write the output to a file and let consumers read that file.
Another option is to have a ring buffer in the form of a memory-mapped file. As the capacity of a ring buffer is normally fixed, there needs to be a policy for dealing with slow consumers. Options are: block the producer; terminate the slow consumer; let the slow consumer somehow recover when it has missed data.
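A rough sketch of that memory-mapped ring-buffer option, in Python; the slot size, slot count, and file name are arbitrary, the "publish after write" counter is a simplification, and a real implementation still needs one of the slow-consumer policies above:

```python
# Fixed-capacity ring buffer backed by a memory-mapped file.
import mmap
import os
import struct

SLOTS, SLOT_SIZE = 64, 4096
HEADER = 8   # one 64-bit write counter at the start of the file

def open_ring(path="ring.buf"):
    size = HEADER + SLOTS * SLOT_SIZE
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    os.ftruncate(fd, size)
    return mmap.mmap(fd, size)

def write_record(ring, payload: bytes):
    seq = struct.unpack_from("<Q", ring, 0)[0]
    off = HEADER + (seq % SLOTS) * SLOT_SIZE
    ring[off:off + SLOT_SIZE] = payload.ljust(SLOT_SIZE, b"\0")[:SLOT_SIZE]
    struct.pack_into("<Q", ring, 0, seq + 1)   # "publish" the new sequence number

def read_record(ring, seq):
    off = HEADER + (seq % SLOTS) * SLOT_SIZE
    return bytes(ring[off:off + SLOT_SIZE]).rstrip(b"\0")
```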
Many years ago I wrote something like what you describe for an audio stream processing app (http://hewgill.com/nwr/). It's on github as splitter.cpp and has a small man page.
The splitter program currently does not support dynamically changing the set of output programs. The output programs are fixed when the command is started.
Without knowing exactly what sort of data you are talking about (how large it is, what format it is in, etc.) it is hard to come up with a concrete answer. Let's say, for example, you want a "ticker-tape" application that sends out information about share purchases on the stock exchange: you could quite easily have a server that accepts a socket from each application, starts a thread, and sends the relevant data as it appears from the recorder at the stock market. I'm not aware of any "multiplexer" that exists today (but Greg's one may be a starting point). If you use (for example) XML to package the data, you could send the second half of a packet and the client code would detect that it's not complete and throw it away.
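As a hedged sketch of that ticker-tape idea (in Python; the port number, newline framing, and queue-per-client layout are my own assumptions, not a known multiplexer):

```python
# A thread per connected consumer; every new record is fanned out to all of
# them, and a consumer that goes away is simply dropped.
import queue
import socket
import threading

client_queues = []
lock = threading.Lock()

def client_thread(conn, q):
    try:
        while True:
            conn.sendall(q.get() + b"\n")   # blocks until the next record
    except OSError:
        pass                                # client went away
    finally:
        with lock:
            client_queues.remove(q)
        conn.close()

def accept_loop(port=9000):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        q = queue.Queue()
        with lock:
            client_queues.append(q)
        threading.Thread(target=client_thread, args=(conn, q), daemon=True).start()

def publish(record: bytes):
    with lock:
        for q in client_queues:
            q.put(record)   # call this each time the recorder produces data

threading.Thread(target=accept_loop, daemon=True).start()
```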
If, on the other hand, you are sending out high detail live update weather maps for the whole country, the data is probably large enough that you don't want to wait for a full new one to arrive, so you need some sort of lock'n'load protocol that sets the current updated map, and then sends that one out until (say) 1 minute later you have a new one. Again, it's not that complex to write some code to do this, but it's quite a different set of code to the "ticker tape" solution above, because the packet of data is larger, and getting "half a packet" is quite wasteful and completely useless.
If you are streaming live video from the 2016 Olympics in Brazil, then you probably want yet another solution, as timing is everything with video: you need the client to buffer, pick up key-frames, throw away "stale" frames, etc., and the server will have to be different.
I am designing an application that allows users to use animated emoticons (defined using external SWF files) and display them inside another SWF file. This works as long as there are only a very small number of emoticons at a time, but if the number increases significantly, performance slows to a crawl... The bottleneck isn't the network, as there are only a few emoticons to choose from; we are just having issues displaying them simultaneously.
How does the Flash threading model handle playing external SWFs? Can we attempt to play them on a separate thread, or will that cause issues (like it does in Swing, Cocoa, and the like)?
It's hard to say. When you say loading multiple small SWFs, are they copies, or are you creating new instances with new properties, etc.?
I have managed a similar thing, making a 20-line poker machine that uses SWFs for all the images and animations. These files are each about 600 KB and there are 12 of them on screen at one time, animated (well, depending on the win).
But the way I did this was to load them into an array whenever I needed them and access them from there. I didn't seem to have any issues.
It's not much of an answer, but can you explain your loading method, or put up a sample HTML page so I could see exactly what your problem is?
I have built a standalone app version of a project that until now was just a VST/audiounit. I am providing audio support via rtaudio.
I would like to add MIDI support using rtmidi but it's not clear to me how to synchronise the audio and MIDI parts.
In VST/audiounit land, I am used to MIDI events that have a timestamp indicating their offset in samples from the start of the audio block.
rtmidi provides a delta time in seconds since the previous event, but I am not sure how I should grab those events and how I can work out their time in relation to the current sample in the audio thread.
How do plugin hosts do this?
I can understand how events can be sample accurate on playback, but it's not clear how they could be sample accurate when using realtime input.
rtaudio gives me a callback function. I will run at a low block size (32 samples). I guess I will pass a pointer to an rtmidi instance as the userdata part of the callback, and then call midiin->getMessage( &message ); inside the audio callback, but I am not sure whether this is sensible from a threading point of view.
Many thanks for any tips you can give me
In your case, you don't need to worry about it. Your program should send the MIDI events to the plugin with a timestamp of zero as soon as they arrive. I think you have perhaps misunderstood the idea behind what it means to be "sample accurate".
As #Brad noted in his comment to your question, MIDI is indeed very slow. But that's only part of the problem... when you are working in a block-based environment, incoming MIDI events cannot be processed by the plugin until the start of a block. When computers were slower and block sizes of 512 (or god forbid, >1024) were common, this introduced a non-trivial amount of latency which results in the arrangement not sounding as "tight". Therefore sequencers came up with a clever way to get around this problem. Since the MIDI events are already known ahead of time, these events can be sent to the instrument one block early with an offset in sample frames. The plugin then receives these events at the start of the block, and knows not to start actually processing them until N samples have passed. This is what "sample accurate" means in sequencers.
However, if you are dealing with live input from a keyboard or some sort of other MIDI device, there is no way to "schedule" these events. In fact, by the time you receive them, the clock is already ticking! Therefore these events should just be sent to the plugin at the start of the very next block with an offset of 0. Sequencers such as Ableton Live, which allow a plugin to simultaneously receive both pre-sequenced and live events, simply send any live events with an offset of 0 frames.
Since you are using a very small block size, the worst-case scenario is a latency of .7ms, which isn't too bad at all. In the case of rtmidi, the timestamp does not represent an offset which you need to schedule around, but rather the time which the event was captured. But since you only intend to receive live events (you aren't writing a sequencer, are you?), you can simply pass any incoming MIDI to the plugin right away.
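To make that concrete, here is a rough sketch using the python-rtmidi binding rather than the C++ API (the callback signature and the hand-off to the plugin are placeholders):

```python
# Drain whatever MIDI arrived since the last audio block and hand it to the
# plugin immediately, tagged with a sample offset of 0 for that block.
import rtmidi

BLOCK_SIZE = 32                 # samples per audio callback, as in the question
midiin = rtmidi.MidiIn()
if midiin.get_port_count():
    midiin.open_port(0)

def audio_callback(out_buffer, frames):
    events = []
    msg = midiin.get_message()          # returns (data, delta_time) or None
    while msg:
        data, _delta = msg              # the delta time is not needed for live input
        events.append((0, data))        # (sample offset, raw MIDI bytes)
        msg = midiin.get_message()
    # ... feed `events` to the synth/plugin, then render `frames` samples
```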
I'm currently working on a personal project: creating a library for realtime audio synthesis in Flash. In short: tools to connect wave generators, filters, mixers, etc. with each other and supply the sound card with raw (realtime) data. Something like Max/MSP or Reaktor.
I already have some working stuff, but I'm wondering if the basic setup that I wrote is right. I don't want to run into problems later on that force me to change the core of my app (although that can always happen).
Basically, what I do now is start at the end of the chain, at the place where the (raw) sound data goes 'out' (to the sound card). To do that, I need to write chunks of bytes (ByteArrays) to an object, and to get such a chunk I ask whatever module is connected to my 'Sound Out' module to give me its chunk. That module makes the same request to the module connected to its input, and that keeps happening until the start of the chain is reached.
Is this the right approach? I can imagine running into problems if there's a feedback loop, or if there's a module with no output: if I were to connect a spectrum analyzer somewhere, that would be a dead end in the chain (a module with no outputs, just an input). In my current setup, such a module wouldn't work, because I only start calculating from the sound-output module.
Has anyone experience with programming something like this? I'd be very interested in some thoughts about the right approach. (For clarity: I'm not looking for specific Flash implementations, which is why I didn't tag this question under flash or actionscript.)
I did a similar thing a while back, and I used the same approach as you do: start at the virtual line out and trace the signal back to the top. I did this per sample though, not per buffer; if I were to write the same application today, I might choose per-buffer instead, because I suspect it would perform better.
The spectrometer was designed as an insert module, that is, it would only work if both its input and its output were connected, and it would pass its input to the output unchanged.
To handle feedback, I had a special helper module that introduced a 1-sample delay and would only fetch its input once per cycle.
Also, I think doing all your internal processing with floats, and thus arrays of floats as the buffers, would be a lot easier than byte arrays, and it would save you the extra effort of converting between integers and floats all the time.
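As an illustration of that per-buffer, float-based pull approach (a Python sketch with invented module names, not the actual application):

```python
import math

class Sine:
    """Source module: generates one buffer of samples on each pull."""
    def __init__(self, freq=440.0, rate=44100):
        self.freq, self.rate, self.phase = freq, rate, 0.0

    def pull(self, n):
        out = [math.sin(self.phase + 2 * math.pi * self.freq * i / self.rate)
               for i in range(n)]
        self.phase += 2 * math.pi * self.freq * n / self.rate
        return out

class Gain:
    """Insert module: pulls from its input, scales, passes the buffer on."""
    def __init__(self, source, amount=0.5):
        self.source, self.amount = source, amount

    def pull(self, n):
        return [s * self.amount for s in self.source.pull(n)]

# The virtual "line out" sits at the end and pulls one buffer per callback.
chain = Gain(Sine(220.0), amount=0.3)
buffer = chain.pull(512)
```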
In later versions you may have different packet rates in different parts of your network. One example would be if you extend it to transfer data to or from disk. Another example would be that low-data-rate control variables, such as one controlling echo delay, may later become a part of your network. You probably don't want to process control variables with the same frequency that you process audio packets, but they are still 'real time' and part of the function network. They may, for example, need smoothing to avoid sudden transitions.
As long as you are calling all your functions at the same rate, and all the functions are essentially taking constant time, your pull-the-data approach will work fine. There will be little to choose between pulling data and pushing. Pulling is somewhat more natural for playing audio, pushing is somewhat more natural for recording, but either works and ends up making the same calls to the underlying audio processing functions.
For the spectrometer you've got the issue of multiple sinks for data, but it is not a problem. Introduce a dummy link to it from the real sink. The dummy link can cause a request for data that is not honoured. As long as the dummy link knows it is a dummy and does not care about the lack of data, everything will be OK. This is a standard technique for reducing multiple sinks or sources to a single one.
With this kind of network you do not want to do the same calculation twice in one complete update. For example if you mix a high-passed and low-passed version of a signal you do not want to evaluate the original signal twice. You must do something like record a timer tick value with each buffer, and stop propagation of pulls when you see the current tick value is already present. This same mechanism will also protect you against feedback loops in evaluation.
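A small sketch of that tick mechanism, assuming pull-style modules like the sketch earlier but extended with a tick argument (names invented for illustration):

```python
class Const:
    """Trivial source used only to make the example runnable."""
    def pull(self, n, tick):
        return [1.0] * n

class CachedModule:
    """Evaluates at most once per update tick; repeated pulls in the same tick
    (e.g. from a high-pass and a low-pass branch) return the cached buffer."""
    def __init__(self, source):
        self.source = source
        self.last_tick = -1
        self.cached = []

    def pull(self, n, tick):
        if tick == self.last_tick:
            return self.cached
        self.last_tick = tick                                       # marking first also stops a
        self.cached = self.process(self.source.pull(n, tick), n)    # feedback loop from recursing
        return self.cached

    def process(self, in_buf, n):
        return in_buf                    # identity; real modules override this

shared = CachedModule(Const())
a = shared.pull(64, tick=0)
b = shared.pull(64, tick=0)              # same tick: no second evaluation
```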
So, those two issues of concern to you are easily addressed within your current framework.
Rate matching where there are different packet rates in different parts of the network is where the problems with the current approach will start. If you are writing audio to disk then for efficiency you'll want to write large chunks infrequently. You don't want to be blocking your servicing of the more frequent small audio input and output processing packets during those writes. A single rate pulling or pushing strategy on its own won't be enough.
Just accept that at some point you may need a more sophisticated way of updating than a single rate network. When that happens you'll need threads for the different rates that are running, or you'll write your own simple scheduler, possibly as simple as calling less frequently evaluated functions one time in n, to make the rates match. You don't need to plan ahead for this. Your audio functions are almost certainly already delegating responsibility for ensuring their input buffers are ready to other functions, and it will only be those other functions that need to change, not the audio functions themselves.
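A very small sketch of the "one time in n" scheduling idea (the function names are placeholders):

```python
def make_scheduler(render_block, update_controls, every_n=8):
    """Run the audio path on every call, the control-rate path once per N calls."""
    count = 0
    def tick(n_frames):
        nonlocal count
        render_block(n_frames)
        count += 1
        if count % every_n == 0:
            update_controls()            # e.g. smooth the echo-delay setting
    return tick

# tick = make_scheduler(render_block=..., update_controls=...)
# call tick(buffer_size) once per audio callback
```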
The one thing I would advise at this stage is to be careful to centralise audio buffer allocation, noticing that buffers are like fenceposts. They don't belong to an audio function; they lie between the audio functions. Centralising the buffer allocation will make it easy to retrospectively modify the update strategy for different rates in different parts of the network.
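A sketch of what centralised buffer allocation could look like, e.g. a simple free-list pool that owns the buffers sitting between modules (details are illustrative):

```python
class BufferPool:
    """Owns the buffers that sit between modules, like fenceposts."""
    def __init__(self, frames):
        self.frames = frames
        self.free = []

    def acquire(self):
        return self.free.pop() if self.free else [0.0] * self.frames

    def release(self, buf):
        for i in range(len(buf)):
            buf[i] = 0.0                 # clear before reuse
        self.free.append(buf)

pool = BufferPool(512)
buf = pool.acquire()
# ... one module fills buf, the next module reads it ...
pool.release(buf)
```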