OpenCV FPS Optimisation - Linux

How can I increase OpenCV video FPS in Linux on an Intel Atom? The video seems to lag when processing with OpenCV libraries.
Furthermore, I'm trying to execute a program/file with OpenCV:
system("/home/file/image.jpg");
However, it shows "Access Denied".

There are several things you can do to improve performance: using OpenGL, using the GPU, or even just disabling certain functions within OpenCV. When you capture video you can also change the default FPS, which is sometimes set low. If you are getting Access Denied on that file I would check its permissions, but without seeing the full error it is hard to diagnose.
The first line below disables the automatic colour conversion and the second sets the desired FPS. I think these defines were renamed in OpenCV 3, though.
cap.set(CV_CAP_PROP_CONVERT_RGB, false); // skip OpenCV's automatic colour conversion
cap.set(CV_CAP_PROP_FPS, 60);            // request 60 FPS from the capture driver
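In OpenCV 3 and later the same properties are spelled cv::CAP_PROP_*. Whether the driver actually honours a request depends on the camera and capture backend, so it is worth checking the return value of set() and reading the property back. A minimal sketch (the camera index and FPS value are just examples):

#include <iostream>
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);                  // default camera; adjust the index as needed
    if (!cap.isOpened()) return 1;

    cap.set(cv::CAP_PROP_CONVERT_RGB, 0);     // skip the automatic colour conversion
    bool ok = cap.set(cv::CAP_PROP_FPS, 60);  // ask the driver for 60 FPS

    std::cout << "FPS request " << (ok ? "accepted" : "ignored")
              << ", camera reports " << cap.get(cv::CAP_PROP_FPS) << " FPS\n";
    return 0;
}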

From your question, it seems the problem is that your frame buffer is collecting more frames than you can clear out before reaching the real-time frame, i.e. a frame captured now is processed several seconds later. Am I correct in that understanding?
In this case, I'd suggest a couple of things:
Use a separate thread to grab the frames from VideoCapture and then push these frames into a queue of limited size (a minimal sketch follows after these suggestions). Of course this will lead to missed frames, but if you are interested in real-time processing then this cost is often justified.
If you are using OOP, then I suggest using a separate thread for each object, as this significantly speeds up the processing. You can see a several-fold increase depending on the application and the functions used.
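A minimal sketch of the first suggestion, assuming OpenCV's C++ API; the queue size, camera index, and class names are my own choices for illustration. The grab thread drops the oldest frame when the queue is full, so the processing loop always works on something close to the latest frame:

#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <opencv2/opencv.hpp>

// Bounded frame queue: push() drops the oldest frame when full, pop() blocks until a frame arrives.
class FrameQueue {
public:
    explicit FrameQueue(size_t max_size) : max_size_(max_size) {}

    void push(const cv::Mat& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() >= max_size_) frames_.pop_front();  // drop the oldest frame
        frames_.push_back(frame.clone());                      // clone: VideoCapture reuses its buffer
        cond_.notify_one();
    }

    cv::Mat pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !frames_.empty(); });
        cv::Mat frame = frames_.front();
        frames_.pop_front();
        return frame;
    }

private:
    std::deque<cv::Mat> frames_;
    const size_t max_size_;
    std::mutex mutex_;
    std::condition_variable cond_;
};

int main() {
    cv::VideoCapture cap(0);
    FrameQueue queue(2);                       // keep at most 2 frames buffered
    std::atomic<bool> running(true);

    std::thread grabber([&] {                  // capture thread: grab as fast as the camera allows
        cv::Mat frame;
        while (running && cap.read(frame)) queue.push(frame);
    });

    for (int i = 0; i < 1000; ++i) {           // processing loop: always close to real time
        cv::Mat frame = queue.pop();
        // ... heavy per-frame processing goes here ...
    }

    running = false;
    grabber.join();
    return 0;
}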

Related

Is it practical to use the "rude big hammer" approach to parallelize a MacOS/CoreAudio real-time audio callback?

First, some relevant background info: I've got a CoreAudio-based low-latency audio processing application that does various mixing and special effects on audio that is coming from an input device on a purpose-dedicated Mac (running the latest version of MacOS) and delivers the results back to one of the Mac's local audio devices.
In order to obtain the best/most reliable low-latency performance, this app is designed to hook in to CoreAudio's low-level audio-rendering callback (via AudioDeviceCreateIOProcID(), AudioDeviceStart(), etc) and every time the callback-function is called (from the CoreAudio's realtime context), it reads the incoming audio frames (e.g. 128 frames, 64 samples per frame), does the necessary math, and writes out the outgoing samples.
This all works quite well, but from everything I've read, Apple's CoreAudio implementation has an unwritten de-facto requirement that all real-time audio operations happen in a single thread. There are good reasons for this which I acknowledge (mainly that outside of SIMD/SSE/AVX instructions, which I already use, almost all of the mechanisms you might employ to co-ordinate parallelized behavior are not real-time-safe and therefore trying to use them would result in intermittently glitchy audio).
However, my co-workers and I are greedy, and nevertheless we'd like to do many more math-operations per sample-buffer than even the fastest single core could reliably execute in the brief time-window that is necessary to avoid audio-underruns and glitching.
My co-worker (who is fairly experienced at real-time audio processing on embedded/purpose-built Linux hardware) tells me that under Linux it is possible for a program to requisition exclusive access for one or more CPU cores, such that the OS will never try to use them for anything else. Once he has done this, he can run "bare metal" style code on that CPU that simply busy-waits/polls on an atomic variable until the "real" audio thread updates it to let the dedicated core know it's time to do its thing; at that point the dedicated core will run its math routines on the input samples and generate its output in a (hopefully) finite amount of time, at which point the "real" audio thread can gather the results (more busy-waiting/polling here) and incorporate them back into the outgoing audio buffer.
My question is, is this approach worth attempting under MacOS/X? (i.e. can a MacOS/X program, even one with root access, convince MacOS to give it exclusive access to some cores, and if so, will big ugly busy-waiting/polling loops on those cores (including the polling-loops necessary to synchronize the CoreAudio callback-thread relative to their input/output requirements) yield results that are reliably real-time enough that you might someday want to use them in front of a paying audience?)
It seems like something that might be possible in principle, but before I spend too much time banging my head against whatever walls might exist there, I'd like some input about whether this is an avenue worth pursuing on this platform.
can a MacOS/X program, even one with root access, convince MacOS to give it exclusive access to some cores
I don't know about that, but you can use as many cores / real-time threads as you want for your calculations, using whatever synchronisation methods you need to make it work, and then pass the audio to your IOProc using a lock-free ring buffer, like TPCircularBuffer (a rough sketch of the idea is below).
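TPCircularBuffer is a small C library, so I won't reproduce its exact API here; as an illustration of the general pattern (my own simplified code, not the library's), here is a single-producer/single-consumer ring buffer that a worker thread can fill and the IOProc can drain without taking locks or allocating:

// Simplified single-producer/single-consumer lock-free ring buffer (illustrative only,
// not TPCircularBuffer's actual API). One thread calls write(), one thread calls read().
#include <atomic>
#include <cstddef>
#include <vector>

class SpscRingBuffer {
public:
    explicit SpscRingBuffer(size_t capacity) : data_(capacity), head_(0), tail_(0) {}

    // Called from the worker thread feeding processed audio.
    bool write(const float* samples, size_t count) {
        size_t head = head_.load(std::memory_order_relaxed);
        size_t tail = tail_.load(std::memory_order_acquire);
        size_t free_space = data_.size() - (head - tail);
        if (count > free_space) return false;               // buffer full: caller decides what to do
        for (size_t i = 0; i < count; ++i)
            data_[(head + i) % data_.size()] = samples[i];
        head_.store(head + count, std::memory_order_release);
        return true;
    }

    // Called from the real-time IOProc; never blocks, never allocates.
    size_t read(float* out, size_t count) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        size_t head = head_.load(std::memory_order_acquire);
        size_t available = head - tail;
        size_t n = count < available ? count : available;
        for (size_t i = 0; i < n; ++i)
            out[i] = data_[(tail + i) % data_.size()];
        tail_.store(tail + n, std::memory_order_release);
        return n;                                            // may be short; caller fills the rest with silence
    }

private:
    std::vector<float> data_;
    std::atomic<size_t> head_;   // total samples written
    std::atomic<size_t> tail_;   // total samples read
};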
But your question reminded me of a new macOS 11/iOS 14 API I've been meaning to try, the Audio Workgroups API (2020 WWDC Video).
My understanding is that this API lets you "bless" your non-IOProc real-time threads with audio real-time-thread properties, or at least lets them cooperate better with the audio thread.
The documentation distinguishes between threads working in parallel (this sounds like your case) and working asynchronously (this sounds like my proposal); I don't know which case is better for you.
I still don't know what happens in practice when you use Audio Workgroups, whether they opt you in to good stuff or opt you out of bad stuff, but if they're not the hammer you're seeking, they may have some useful hammer-like properties.

Circular buffer filling up faster than AVAudioSourceNode render block can read data from it

I am experimenting with AVAudioSourceNode, having connected it to the mixer node for output to the speaker. I am a bit of a newbie to iOS and audio programming so I apologize if this question is ignorant or unclear, but I will do my best to explain.
In the AVAudioSourceNode render block, I am attempting to retrieve received stream data that has been stored in a circular buffer (I currently use a basic implementation of a FIFO buffer but am considering moving to a TPCircularBuffer). I check whether the buffer has enough bytes to fill the audio buffer with, and if so I grab those bytes for output; if not, I either wait, or take what I can and fill the missing bytes with zeros.
In debugging, it appears I am running into a situation where the circular buffer fills up a lot faster than the render block calls into it to retrieve data. Understandably, after running OK for a few moments, once the circular buffer is full (I'm not even certain how large I should realistically make it, but I guess that's another question), the output becomes garbage.
It is as if the act of filling the circular buffer with streaming data (and probably other tasks as well) takes priority over the calls made within the render block. I thought that audio operations involving the audio nodes would automatically be prioritized, but it may be that I haven't done what is needed to make this happen.
I have read these threads:
iOS - Streaming and receiving audio from a device to another ends in one only sending and the other only receiving
Synchronising with Core Audio Thread
which appear to raise similar issues in substance, but a little more current guidance and explanation for my level of understanding and situation would be helpful and very much appreciated!
For playing, the audio system will only ask for data at the specified sample rate. If you fill a circular buffer faster than that sample rate for an extended period of time, it will overflow.
So you have to make sure your sample generator or incoming data stream complies with the sample rate for which the audio system is configured, no more and no less (other than strictly bounded bursting or latency jitter). The circular buffer needs to be sized large enough to cover the maximum burst size plus maximum latency jitter plus any pre-fill plus a safety margin.
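As a rough worked example of that sizing rule (all the numbers here are assumptions for illustration, not taken from the question):

#include <cstdio>

int main() {
    const double sample_rate     = 44100.0;  // Hz
    const int    bytes_per_frame = 2 * 2;    // 16-bit PCM, stereo
    const double burst_ms        = 30.0;     // largest incoming burst expected
    const double jitter_ms       = 20.0;     // worst-case latency jitter
    const double prefill_ms      = 20.0;     // data buffered before playback starts
    const double safety_factor   = 2.0;      // margin for everything not predicted

    const double total_ms = (burst_ms + jitter_ms + prefill_ms) * safety_factor;
    const double bytes    = sample_rate * bytes_per_frame * total_ms / 1000.0;
    std::printf("circular buffer: ~%.0f bytes (covers %.0f ms)\n", bytes, total_ms);
    return 0;
}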
Another possible bug is trying to do too much inside the render block callback. This is why Apple recommends not using any code that requires memory management, locks, or semaphores inside real-time audio callbacks.

Using pyarrow to stream an openCV image to multiple processes

I'm using OpenCV in Python to load a video stream from a camera. I need to do multiple processing jobs on this stream, so, for instance, I might want to find objects in the image, do edge detection, colour changes, etc., all on the same stream. I'd like to do it in parallel in many processes. The easiest solution would be to pickle the image frames and send them to all the processes, but for high-quality video this can be very costly.
I would like to read a frame, store it in memory with pyarrow, and then have every process access this same frame in memory to do its trick, then read another frame, and so on. A couple of problems: (i) how to access the frame from all processes with pyarrow (I understand from the docs that this should be possible, but I could not figure out how); (ii) how to make sure that all processes are done with the frame before overwriting it with another one.
Thanks!
Plasma might be a good place to start for sharing the data.
As for image replacement/deletion with distributed workers: there isn't one answer to this, and any solution will have trade-offs. You might try using something like Celery as a starting point.

j2me out of memory exception

I'm making a game using J2ME and sometimes while testing it I get an out-of-memory exception. I know what it means, but how can I solve it? If I call System.gc() in my game loop every time, will it help somehow? Any tips on how to prevent this would be appreciated!
I see you've also asked j2me wtk find memory leak
In my experience, memory leaks don't cause OutOfMemoryExceptions. All they do is slowly use up all the memory in the device. And when it's nearly all used up, the device is forced to call System.gc() itself.
System.gc() is a blocking call, meaning it'll make your whole game stall for some microseconds, which of course is annoying. And this is why people go hunting for memory leaks - to prevent the automatic call to System.gc().
An OutOfMemoryException may occur if, e.g., you have 1 MB of memory left while trying to load a 2 MB resource. And while a memory leak may dramatically increase the chances of running into a situation like that, your problem is not the memory leak itself, but more likely that you're using resources that are too big.
Are you using mp3 files for music? Or big images for backgrounds or maps?
You could try calling System.gc() just before loading big resources, and it might reduce the problem. But the problem doesn't have to be related to your game alone. It could also matter what other apps are running on the device at the same time, and how much memory they use.
You could also try replacing mp3 music with MIDI music, if only just to test if it makes a difference. (Find JavaME optimized MIDI music at IndieGameMusic.com).
And if you do use big images, make sure you optimize them with tools like PNGout or Optipng.
If the original file is not too big, there is no need to decrease its size; you can use JPEG instead. You can also put a specific limit on your local buffer size. And before System.gc() you can use Thread.sleep(), for testing purposes, to check the effect and to give the GC time to run. Also check with the WTK's performance monitor to find where the actual peak occurs.

Fast Audio Input/Output

Here's what I want to do:
I want to allow the user to give my program some sound data (through a mic input), then hold it for 250ms, then output it back out through the speakers.
I have done this already using Java Sound API. The problem is that it's sorta slow. It takes a minimum of about 1-2 seconds from the time the sound is made to the time the sound is heard again from the speakers, and I haven't even tried to implement delay logic yet. Theoretically there should be no delay, but there is. I understand that you have to wait for the sound card to fill up its buffer or whatever, and the sample size and sampling rate have something to do with this.
My question is this: Should I continue down the Java path trying to do this? I want to get the delay down to like 100ms if possible. Does anyone have experience using the ASIO driver with Java? Supposedly it's faster..
Also, I'm a .NET guy. Does this make sense to do with .NET instead? What about C++? I'm looking for the right technology to use here, and maybe a good example of how to read/write to audio input/output streams using your suggested technology platform. Thanks for your help!
I've used JavaSound in the past and found it wonderfully flaky (and it keeps changing between VM releases). If you like C#, use it, just use the DirectX APIs. Here's an example of doing kind of what you want to do using DirectSound and C#. You could use the Effects plugins to perform your 250 ms echo.
http://blogs.microsoft.co.il/blogs/tamir/archive/2008/12/25/capturing-and-streaming-sound-by-using-directsound-with-c.aspx
You may want to look into JACK, an audio API designed for low-latency sound processing. Additionally, Google turns up this nifty presentation [PDF] about using JACK with Java.
Theoretically there should be no delay, but there is.
Well, it's impossible to have zero delay. The best you can hope for is an unnoticeable delay (in terms of human perception). It might help if you describe your basic algorithm for reading & writing the sound data, so people can identify possible problems.
A potential issue with using a garbage-collected language like Java is that the GC will periodically run, interrupting your processing for some arbitrary amount of time. However, I'd be surprised if it's >100ms in normal usage. If GC is a problem, most JVMs provide alternate collection algorithms you can try.
If you choose to go down the C/C++ path, I highly recommend using PortAudio ( http://portaudio.com/ ). It works with almost everything on multiple platforms and it gives you low-level control of the sound drivers without actually having to deal with the various sound driver technology that is around.
I've used PortAudio on multiple projects, and it is a real joy to use. And the license is permissive.
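As a sketch of what that looks like in practice, here is a mono input-to-output pass-through with a fixed 250 ms delay; the sample rate, channel count, and frames-per-buffer value are assumptions, and error checking is omitted for brevity. The delay line lives in user memory and the callback never allocates or locks, which is what keeps the latency bounded:

#include <portaudio.h>
#include <vector>

struct DelayLine {
    std::vector<float> buffer;
    size_t pos = 0;
};

static int callback(const void* input, void* output, unsigned long frames,
                    const PaStreamCallbackTimeInfo*, PaStreamCallbackFlags,
                    void* userData) {
    auto* delay = static_cast<DelayLine*>(userData);
    const float* in = static_cast<const float*>(input);
    float* out = static_cast<float*>(output);

    for (unsigned long i = 0; i < frames; ++i) {
        out[i] = delay->buffer[delay->pos];            // play what was recorded 250 ms ago
        delay->buffer[delay->pos] = in ? in[i] : 0.0f; // store the current input sample
        delay->pos = (delay->pos + 1) % delay->buffer.size();
    }
    return paContinue;
}

int main() {
    const double sampleRate = 44100.0;
    DelayLine delay;
    delay.buffer.assign(static_cast<size_t>(sampleRate * 0.250), 0.0f);  // 250 ms of silence

    Pa_Initialize();
    PaStream* stream = nullptr;
    Pa_OpenDefaultStream(&stream, 1 /*input ch*/, 1 /*output ch*/, paFloat32,
                         sampleRate, 256 /*frames per buffer*/, callback, &delay);
    Pa_StartStream(stream);
    Pa_Sleep(10000);                                   // run for 10 seconds
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}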
If low latency is your goal, you can't beat C.
libsoundio is a low-level C library for real-time audio input and output. It even comes with an example program that does exactly what you want - piping the microphone input to the speakers output.
It's possible with JavaSound to get end-to-end latency in the ballpark of 100-150ms.
The primary cause of latency is the buffer sizes of the capture and playback lines. The bufferSize is set when opening the lines:
capture: TargetDataLine#open(AudioFormat format, int bufferSize)
playback: SourceDataLine#open(AudioFormat format, int bufferSize)
If the buffer is too big it will cause excess latency, but if it's too small it will cause stuttery playback. So you need to find a balance that suits your application's needs and your computing power.
The default buffer size can be checked with DataLine#getBufferSize when the line is opened via #open(AudioFormat format), the overload without an explicit size. The default size varies based on the AudioFormat and seems to be geared toward high-latency, stutter-free playback applications (e.g. internet streaming). If you're developing a low-latency application, the default buffer size is much too large and should be changed.
In my testing with a 16-bit PCM AudioFormat, a buffer size of 1024 bytes has been pretty close to ideal for low latency.
The second and often overlooked cause of audio latency is any other activity being done in the capture or playback threads. For example, logging messages to the console can introduce tens of milliseconds of latency. Turn it off.
