AVAssetWriter getting raw bytes makes corrupt videos on device (works on sim) - cgcontext

So my goal is to add CVPixelBuffers into my AVAssetWriter / AVAssetWriterInputPixelBufferAdaptor at very high speed. My previous solution used CGContextDrawImage, but it is very slow (about 0.1 s per draw). The reason seems to be color matching and conversion, but that's another question, I think.
My current approach is to read the bytes of the image directly, skipping the draw call. I do this:
// Grab the raw backing bytes of the CGImage without drawing it.
CGImageRef cgImageRef = [image CGImage];
CGImageRetain(cgImageRef);
CVPixelBufferRef pixelBuffer = NULL;
CGDataProviderRef dataProvider = CGImageGetDataProvider(cgImageRef);
CGDataProviderRetain(dataProvider);
CFDataRef da = CGDataProviderCopyData(dataProvider);

// Wrap the copied bytes in a CVPixelBuffer (no release callback, no attributes).
CVPixelBufferCreateWithBytes(NULL,
                             CGImageGetWidth(cgImageRef),
                             CGImageGetHeight(cgImageRef),
                             kCVPixelFormatType_32BGRA,
                             (void *)CFDataGetBytePtr(da),
                             CGImageGetBytesPerRow(cgImageRef),
                             NULL,
                             0,
                             NULL,
                             &pixelBuffer);

[writerAdaptor appendPixelBuffer:pixelBuffer withPresentationTime:presentTime];
-- releases here --
This works fine on my simulator and inside an app. But when I run the code inside the SpringBoard process, the frames come out corrupted, as shown below. Running it outside the sandbox is a requirement, since it is meant for jailbroken devices.
I have tried playing around with, for example, pixel format types, but it mostly just produces differently corrupted images.
The proper image/video file looks fine, but the broken state produces visibly corrupted frames (screenshots omitted).

Answering my own question, as I think I found the answer(s). The resolution difference was a simple code error: I was not using the device bounds in the latter captures.
The color issues: in short, the CGImages I got when running outside of the sandbox used more bytes per pixel, 8 bytes, while the images I got when running inside the sandbox used 4 bytes. So basically I was simply writing the wrong data into the buffer.
So, instead of simply slapping all of the bytes from the larger image into the smaller buffer, I loop through the pixel buffer row by row, byte by byte, and pick the RGBA values for each pixel. I essentially had to skip every other byte from the source image to get the right data into the right place within the buffer.
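For clarity, the copy ends up looking roughly like the sketch below, assuming the out-of-sandbox CGImage is 16 bits per channel (8 bytes per pixel) and the destination buffer is 32BGRA (4 bytes per pixel); variable names are illustrative rather than the exact shipped code.

size_t width  = CGImageGetWidth(cgImageRef);
size_t height = CGImageGetHeight(cgImageRef);
size_t srcBytesPerRow = CGImageGetBytesPerRow(cgImageRef);
const uint8_t *src = CFDataGetBytePtr(da);

// Create an empty 32BGRA buffer instead of wrapping the source bytes directly.
CVPixelBufferRef pixelBuffer = NULL;
CVPixelBufferCreate(NULL, width, height, kCVPixelFormatType_32BGRA, NULL, &pixelBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
uint8_t *dst = (uint8_t *)CVPixelBufferGetBaseAddress(pixelBuffer);
size_t dstBytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);

for (size_t y = 0; y < height; y++) {
    const uint8_t *srcRow = src + y * srcBytesPerRow;  // 8 bytes per pixel (16-bit channels)
    uint8_t *dstRow = dst + y * dstBytesPerRow;        // 4 bytes per pixel (8-bit channels)
    for (size_t x = 0; x < width; x++) {
        // Keep one byte of each 16-bit channel, i.e. skip every other source byte.
        // Which of the two bytes to keep depends on the source image's byte order.
        dstRow[x * 4 + 0] = srcRow[x * 8 + 0];
        dstRow[x * 4 + 1] = srcRow[x * 8 + 2];
        dstRow[x * 4 + 2] = srcRow[x * 8 + 4];
        dstRow[x * 4 + 3] = srcRow[x * 8 + 6];
    }
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
// append to the writer adaptor and release as before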

Related

Decoding incomplete audio file

I was given an uncompressed .wav audio file (360 MB) which seems to be broken. The file was recorded using a small USB recorder (I don't have more information about the recorder at this moment). It was unreadable by any player, and I've tried GSpot (https://www.headbands.com/gspot/) to detect whether it was perhaps a different format than WAV, but to no avail. The file is big, which hints at it being in some uncompressed format. It is missing the RIFF-WAVE characters at the start of the file, though, which can be an indication that this is some other format or, perhaps more likely in this case, that the header is missing.
I've tried converting the bytes of the file directly to audio, and this creates a VERY noisy audio file, though voices could be made out, and I was able to determine the sample rate was probably 22050 Hz (given a sample size of 8 bits) and a file length of about 4 hours and 45 minutes. Running it through some filters in Audition resulted in a file that was understandable in some places, but still way too noisy in others.
Next I tried running the data through some Java code that produces an image out of the bytes, and it showed me lots of noise, but also 3-byte separations every 1024 bytes: first a byte close to either 0 or 255 (but not 100% of the time), then a byte representing a number distributed somewhere around 25 (but with some variation), and then a 00000000 (always, 100%). The first 'chunk header' (as I suppose these are) is located 513 bytes into the file, again close to a power of two, like the chunk size. Seems a bit too perfect to be a coincidence, so I'm mentioning it as it could be important. https://imgur.com/a/sgZ0JFS: the first image shows a 1024x1024 image of the first 1 MB of the file (row-wise) and the second image shows the distribution of the 3 'chunk header' bytes.
Next to these headers, the file also has areas that clearly show structure, almost wave-like structures. I suppose this is the actual audio I'm after, but it's riddled with noise: https://imgur.com/a/sgZ0JFS, third image, showing a region of the file with audio structures.
I also created a histogram for the entire file (ignoring the 3-byte 'chunk headers'): https://imgur.com/a/sgZ0JFS, fourth image. I've flipped the lower half of the range as I think audio data should be centered around some mean value, but correct me if I'm wrong. Maybe the non-symmetric nature of the histogram has something to do with signed/unsigned data or two's-complement. Perhaps the data representation is in 8-bit floats or something similar, I don't know.
I've run into a wall now. I have no idea what else I can try. Is there anyone out there who sees something I missed? Perhaps someone can give me some pointers on what else to try. I would really like to extract the audio data from this file, as it contains some important information.
Sorry for the bother. I've been able to track down the owner of the voice recorder and had him record a minute of audio with it and send me that file. I was able to determine the audio was IMA 4-bit ADPCM encoded, 16-bit audio at 48000 Hz. Looking at the structure of the file, I realized simply placing the header of the good file in front of the data of the bad file should be possible, and lo and behold, I had a working file again :)
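For anyone in a similar situation, the transplant itself is just a byte-level concatenation; a minimal sketch in C is below, where HEADER_SIZE (the offset at which the audio data starts in the good file) and the file names are placeholders you would determine yourself. The size fields copied over from the good header will not match the longer file, but many tools tolerate that.

#include <stdio.h>

#define HEADER_SIZE 60  /* placeholder: byte offset where audio data starts in the good file */

int main(void) {
    FILE *good = fopen("good.wav", "rb");
    FILE *bad  = fopen("bad.wav", "rb");
    FILE *out  = fopen("repaired.wav", "wb");
    if (!good || !bad || !out) return 1;

    unsigned char buf[4096];
    size_t n;

    /* Copy the header from the good recording... */
    size_t remaining = HEADER_SIZE;
    while (remaining > 0 &&
           (n = fread(buf, 1, remaining < sizeof buf ? remaining : sizeof buf, good)) > 0) {
        fwrite(buf, 1, n, out);
        remaining -= n;
    }

    /* ...then append the raw, headerless data from the broken recording. */
    while ((n = fread(buf, 1, sizeof buf, bad)) > 0)
        fwrite(buf, 1, n, out);

    fclose(good); fclose(bad); fclose(out);
    return 0;
}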
I'm still very much interested in how that ADPCM works and whether I can write my own decoder, but that's for another day when I'm strolling through Wikipedia again. Have a great day everyone!

OpenCV stereo cameras capture and framerate limits

I am trying to get pairs of images out of a Minoru stereo webcam, currently through OpenCV on Linux.
It works fine when I force a low resolution:
import cv2

left = cv2.VideoCapture(0)
left.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 320)
left.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 240)

right = cv2.VideoCapture(0)
right.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 320)
right.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 240)

while True:
    _, left_img = left.read()
    _, right_img = right.read()
    ...
However, I'm using the images for creating depth maps, and a bigger resolution would be good. But if I try leaving the default, or forcing resolution to 640x480, I'm hitting errors:
libv4l2: error turning on stream: No space left on device
I have read about USB bandwidth limitations, but:
- this happens on the first iteration (the first read() from right)
- I don't need anywhere near 60 or even 30 FPS, but I couldn't manage to reduce the "requested FPS" via VideoCapture parameters (if that makes sense)
- adding sleeps doesn't seem to help, even between the left/right reads
- strangely, if I do much processing (in the while loop), I start noticing "lag": things happening in the real world get shown much later in the images read. This would suggest that there actually is a buffer somewhere that can and does accumulate several images (a lot)
I tried a workaround of creating and releasing a separate VideoCapture for each image read, but this is a bit too slow overall (< 1 FPS), and more importantly, the images are too far out of sync for stereo matching to work.
I'm trying to understand why this fails, in order to find solutions. It looks like v4l is allocating a single global too-small buffer, used by the 2 capture objects somehow.
Any help would be appreciated.
I had the same problem and found this answer - https://superuser.com/questions/431759/using-multiple-usb-webcams-in-linux
Since both the Minoru cameras report the format as 'YUYV', this is likely a USB bandwidth issue. I lowered the frames per second to 20 (it didn't work at 24) and I can now see both 640x480 images.
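For reference, the frame rate can be requested on the capture before the first read; a minimal sketch using OpenCV's legacy C API follows (the device indices and the use of the C API are illustrative; in the question's Python code the equivalent should be left.set(cv2.cv.CV_CAP_PROP_FPS, 20), and whether the request is honored depends on the V4L2 backend).

#include <opencv2/highgui/highgui_c.h>

int main(void) {
    CvCapture *left  = cvCreateCameraCapture(0);
    CvCapture *right = cvCreateCameraCapture(1);

    /* Request the resolution and a lower frame rate before reading, so both
       streams fit into the available USB bandwidth. */
    cvSetCaptureProperty(left,  CV_CAP_PROP_FRAME_WIDTH,  640);
    cvSetCaptureProperty(left,  CV_CAP_PROP_FRAME_HEIGHT, 480);
    cvSetCaptureProperty(left,  CV_CAP_PROP_FPS,          20);  /* 20 worked, 24 did not */
    cvSetCaptureProperty(right, CV_CAP_PROP_FRAME_WIDTH,  640);
    cvSetCaptureProperty(right, CV_CAP_PROP_FRAME_HEIGHT, 480);
    cvSetCaptureProperty(right, CV_CAP_PROP_FPS,          20);

    IplImage *left_img  = cvQueryFrame(left);   /* owned by the capture, do not release */
    IplImage *right_img = cvQueryFrame(right);
    /* ... stereo matching ... */

    cvReleaseCapture(&left);
    cvReleaseCapture(&right);
    return 0;
}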

VTK efficiency of vtkRenderWindow::GetZbufferData and vtkWindowToImageFilter::Update

I have a snippet that converts a VTK (off-screen) rendering to 1) a point cloud and 2) a color image. The implementation is correct; it's just that the speed/efficiency is an issue.
At the beginning of every iteration, I update my rendering by calling:
renderWin->Render ();
For the point cloud, I get the depth using the following lines and then convert it to a point cloud (code not posted).
float *depth = new float[width * height];
renderWin->GetZbufferData (0, 0, width - 1, height - 1, &(depth[0]));
For color image, I use vtkWindowToImageFilter to get current color rendered image:
windowToImageFilter->Modified(); // Must have this to get updated rendered image
windowToImageFilter->Update(); // this line takes a lot of time
render_img_vtk = windowToImageFilter->GetOutput();
The above program runs sequentially in the same thread. The render window size is about 1000x1000. There is not a lot of polydata that needs to be rendered. VTK was compiled with OpenGL2 support.
Issue:
This code only runs at about 15-20 Hz; when I disable/comment out the windowToImageFilter part (vtkWindowToImageFilter::Update() takes a lot of time), the framerate goes up to about 30 Hz.
When I disable/comment out vtkRenderWindow::GetZbufferData as well, it goes up to 50 Hz (which is how fast I call my loop and update the rendering).
I had a quick look at the VTK source of these two functions; I see they copy data using GL commands. I am not sure how I can speed this up.
Update:
After some searching, I found that the glReadPixels call inside GetZbufferData causes the delay, as it tries to synchronize the data. Please see this post: OpenGL read pixels faster than glReadPixels.
In that post, it is suggested that a PBO should be used. VTK has a class vtkPixelBufferObject, but I could not find an example of using it to avoid blocking the pipeline when doing glReadPixels().
So how can I do this within the VTK pipeline?
My answer is just about the GetZbufferData portion.
vtkOpenGLRenderWindow already uses glReadPixels with little overhead, from what I can tell (here).
What happens after that, I believe, can introduce overhead. The main thing to note is that vtkOpenGLRenderWindow has 3 method overloads for GetZbufferData. You are using the overload with the same signature as the one used in vtkWindowToImageFilter (here).
I believe you are copying that part of the implementation from vtkWindowToImageFilter, which makes total sense. What do you do with the float pointer depthBuffer after you get it? Looking at the vtkWindowToImageFilter implementation, I see that it has a for loop that calls memcpy (here). I believe its memcpy has to be in a for loop in order to deal with spacing, because of the variables inIncrY and outIncrY. For your situation you should only have to call memcpy once and then free the array pointed to by depthBuffer, unless you are just using the pointer; then you have to think about who deletes that float array, because it was created with new.
I think the better option is to use the method with this signature: int GetZbufferData( int x1, int y1, int x2, int y2, vtkFloatArray* z )
In Python that looks like this:
import vtk
# create render pipeline (not shown)
# define image bounds (not shown)
vfa = vtk.vtkFloatArray()
ib = image_bounds
render_window.GetZbufferData(ib[0], ib[1], ib[2], ib[3], vfa)
The major benefit is that the pointer for the vtkFloatArray gets handed straight to glReadPixels. Also, VTK will take care of garbage collection of the vtkFloatArray if you create it with a vtkSmartPointer (not needed in Python).
My Python implementation runs at about 150 Hz for a single pass, on a 640x480 render window.

Trouble with capturing images from webcam with Arch Linux and OpenCV (node.js)

I want to grab a single frame from my webcam with node.js and OpenCV. The first frame captured is as expected. But if I move in front of the camera and take a second one, I get an image that was obviously taken right after the first one and doesn't show my movement. I have to take 5 pictures to see the move. Searching the net gave me a hint about a problem with the camera buffer, which holds 4 images (OS dependent).
Here is an example of someone having the same issue:
http://opencvarchive.blogspot.de/2010/05/opencv-arm-linux-servo-frame-delay.html
At the moment I'm doing a workaround: I capture 5 images within a loop and then save only the last image to disk. That way the buffer is cleared and the really current image is taken.
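For illustration, the workaround looks roughly like this with OpenCV's C API (the actual code uses the node.js bindings; the file name and the count of 5 are just the values described above).

#include <opencv2/highgui/highgui_c.h>

int main(void) {
    CvCapture *capture = cvCreateCameraCapture(0);
    if (!capture) return 1;

    /* Grab several frames so the driver's internal buffer is drained... */
    IplImage *frame = NULL;
    for (int i = 0; i < 5; i++)
        frame = cvQueryFrame(capture);  /* frame is owned by the capture, do not release it */

    /* ...and keep only the last (really current) one. */
    if (frame)
        cvSaveImage("current.png", frame, 0);

    cvReleaseCapture(&capture);
    return 0;
}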
Does anyone know a better solution? Taking five images instead of one takes too much time for my application...
Thanks in advance! :)

Texture Buffer for OpenGL Video Player

I am using OpenGL, FFmpeg and SDL to play videos and am currently optimizing the process of getting frames, decoding them, converting them from YUV to RGB, uploading them to a texture and displaying the texture on a quad. Each of these stages is performed by a separate thread, and they write to shared buffers which are controlled by SDL mutexes and conditions (except for the upload and display of the textures, as they need to be in the same context).
I have the player working fine with the decode, convert and OpenGL context on separate threads, but realised that because the video is 25 frames per second, I only get a converted frame from the buffer, upload it to OpenGL and bind/display it every 40 milliseconds in the OpenGL thread. The render loop goes round about 6-10 times without showing the next frame for every frame it shows, due to this 40 ms gap.
Therefore I decided it might be a good idea to have a buffer for the textures too, and set up an array of textures created and initialised with glGenTextures() and the GL parameters I needed, etc.
When it hasn't been 40 ms since the last frame refresh, a method is run which grabs the next converted frame from the convert buffer and uploads it to the next free texture in the texture buffer by binding it and then calling glTexSubImage2D(). When it has been 40 ms since the last frame refresh, a separate method is run which grabs the next GLuint texture from the texture buffer and binds it with glBindTexture(). So effectively, I am just splitting up what was being done before (grab from convert buffer, upload, display) into separate methods (grab from convert buffer, upload to texture buffer | and | grab from texture buffer, display) to make use of the wasted time between 40 ms refreshes.
Does this sound reasonable? Because when run, the video halts all the time in a sporadic manner: sometimes about 4 frames are played when they are supposed to be (every 40 ms), but then there is a 2 second gap, then 1 frame is shown, then a 3 second gap, and the video is totally unwatchable.
The code is near identical to how I manage the convert thread (grabbing decoded frames from the decode buffer, converting them from YUV to RGB and then putting them into the convert buffer), so I can't see where the massive bottleneck could be.
Could the bottleneck be on the OpenGL side of things? Is the problem that I am storing new image data in 10 different textures, so that when a new texture is grabbed from the texture buffer, the raw data could be a million miles away from the last one in terms of location in video memory? That's my only attempt at an answer, but I don't know much about how OpenGL works internally, so that's why I am posting here.
Anybody have any ideas?
I'm no OpenGL expert, but my guess at the bottleneck is that the textures are initialized properly in system memory but are sent to video memory at the 'wrong' time (like all at once instead of as soon as possible), stalling the pipeline. When using glTexSubImage2D you have no guarantees about when a texture arrives in video memory until you bind it.
Googling around, it seems pixel buffer objects give you more control over when they end up in video memory: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=262523
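In sketch form, the idea from that thread is to stream each frame through a pixel-unpack PBO so the driver can perform the upload asynchronously; WIDTH, HEIGHT, frame_data and the texture array below are placeholders for the poster's own variables, and an extension loader such as GLEW is assumed.

// One-time setup: allocate a pixel-unpack buffer big enough for one RGB frame.
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, WIDTH * HEIGHT * 3, NULL, GL_STREAM_DRAW);

// Per frame: copy the converted RGB data into the PBO...
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
memcpy(ptr, frame_data, WIDTH * HEIGHT * 3);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

// ...then let glTexSubImage2D source from the bound PBO (the last argument is a
// byte offset into the buffer, not a client pointer), so the call can return
// without waiting for the data to reach video memory.
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glBindTexture(GL_TEXTURE_2D, textures[next_texture]);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT, GL_RGB, GL_UNSIGNED_BYTE, (const GLvoid *)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);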
