Encoding a video-only FLV - Linux

I am trying to generate a video-only FLV file. I am using:
libx264 + ffmpeg
30 fps (fixed)
playback via VLC 2.0.1 and Flowplayer
When playing the FLV, the frame rate appears to be ~1 frame per second. This is how I configure ffmpeg:
AVOutputFormat* fmtOutput = av_oformat_next(0);
while ((0 != fmtOutput) && (0 != strcmp(fmtOutput->name, "flv")))
    fmtOutput = av_oformat_next(fmtOutput);

m_pFmtCtxOutput = avformat_alloc_context();
m_pFmtCtxOutput->oformat = fmtOutput;

AVStream* pOutVideoStream = av_new_stream(m_pFmtCtxOutput, pInVideoStream->id);
AVCodec* videoEncoder = avcodec_find_encoder(CODEC_ID_H264);

pOutVideoStream->codec->width = 640;
pOutVideoStream->codec->height = 480;
pOutVideoStream->codec->level = 30;
pOutVideoStream->codec->pix_fmt = PIX_FMT_YUV420P;
pOutVideoStream->codec->bit_rate = 3000000;
pOutVideoStream->cur_dts = 0;
pOutVideoStream->first_dts = 0;
pOutVideoStream->index = 0;
pOutVideoStream->avg_frame_rate = (AVRational){ 30, 1 };
pOutVideoStream->time_base =
    pOutVideoStream->codec->time_base = (AVRational){ 1, 30000 };
pOutVideoStream->codec->gop_size = 30;

// ... some libx264-specific settings elided ...

m_dVideoStep = 1000; // packet dts/pts is incremented by this amount each frame

pOutVideoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
avcodec_open(pOutVideoStream->codec, videoEncoder);
The resulting file seems OK, except for the playback frame rate. Keep in mind that:
pOutVideoStream->avg_frame_rate = (AVRational){ 30, 1 };
pOutVideoStream->time_base = (AVRational){ 1, 30000 };
pOutVideoStream->codec->time_base = (AVRational){ 1, 30000 };
and that for each frame I increment the dts/pts by 1000.
What am I doing wrong here? Why does the file play back choppy (~1 fps)?
Any help will be appreciated.
Nadav at Sophin

Stepping through the FLV muxer code with a debugger, I found that the ffmpeg implementation supports PTS at no resolution other than milliseconds; that is, the stream must have time_base = (AVRational){ 1, 1000 }.
Also, AVStream::r_frame_rate must be set in order for the FLV muxer to properly resolve the frame rate.
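Putting both findings together, a corrected setup might look like the sketch below. This is only a sketch: the frame-counter rescaling and the pkt/frameIndex names are my additions, not part of the original code.

// FLV stores timestamps in milliseconds, so the stream time_base must be
// 1/1000, and r_frame_rate must be filled in for the muxer.
pOutVideoStream->time_base = (AVRational){ 1, 1000 };
pOutVideoStream->r_frame_rate = (AVRational){ 30, 1 };
pOutVideoStream->avg_frame_rate = (AVRational){ 30, 1 };
pOutVideoStream->codec->time_base = (AVRational){ 1, 30 }; // one tick per frame

// With a 1/1000 time_base the per-frame step is ~33 ms, not 1000 ticks.
// Deriving each packet's timestamps from a frame counter keeps the
// 1/30 s -> 1/1000 s rounding consistent:
int64_t frameIndex = 0; // incremented once per encoded frame
pkt.pts = pkt.dts = av_rescale_q(frameIndex, (AVRational){ 1, 30 },
                                 pOutVideoStream->time_base);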

Related

Song position difference between mp3 and flac. How to fix this? (C#)

I am using WinForms and WindowsMediaPlayer to create a music player. The form has a PictureBox with a waveform rendered by NAudio, which is saved to an image and loaded on song change.
On mouseDown, the song's currentPosition changes to the mouse position over the waveform image.
With mp3 this works very accurately, but when I play a flac file the position is very inaccurate.
(The waveform image for mp3 and flac is identical.)
Could it have something to do with bitrate?
mp3 = 256 Kbps - Length = 07:44
flac = 892 Kbps - Length = 07:44
For instance:
flac (x) | mp3 (y)
[x and y represent the mouse position and the current song position, which should correspond to the same real song position (what you hear)]
Somewhere near the beginning:
mousePosition: (x) 30 | (y) 44
currentSongPosition: (x) 109 | (y) 159
Near the end:
mousePosition: (x) 453 | (y) 450
currentSongPosition: (x) 1623 | (y) 1607
This is my mouseDown event:
private void pbWaveForm_MouseDown(object sender, MouseEventArgs e)
{
    pnlWaveScrub.Height = pbWaveForm.Height;
    double mousePosition = e.X;
    double dur = musicPlayer.currentMedia.duration;
    double ratio = mousePosition / pbWaveForm.Width;
    double timePos = ratio * dur;
    musicPlayer.controls.currentPosition = (int)timePos;
    //MessageBox.Show(Convert.ToString((int)timePos) + " " + mousePosition.ToString());
}
I am planning an if statement with:
FileInfo f = new FileInfo(songPath); // songPath: path of the currently loaded song
if (f.Extension.Equals(".flac", StringComparison.OrdinalIgnoreCase))
{
    // code to calculate a different timePos
    timePos = differentTimePos;
    musicPlayer.controls.currentPosition = (int)timePos;
}
If anyone can improve my code, I would be delighted. Thanks!

Azure Kinect: How to get a depth video recording in color?

I am trying to extract the depth video from a recording (mkv), but the problem is that it's extracted in grayscale (b16g) format. Is it possible to extract or obtain the depth video in color, as viewed in the Azure Kinect Viewer? The camera used is an Azure Kinect DK.
Thanks, any feedback is appreciated.
These are the steps I used:
ffmpeg -i output.mkv -map 0:1 -vsync 0 depth%03d.png
This extracts the depth track as a sequence of 16-bit PNGs.
Source: https://learn.microsoft.com/en-us/azure/kinect-dk/record-file-format
Then:
ffmpeg -r 30 -i depth%03d.png -c:v libx264 -vf "fps=30,format=yuv420p" depth.mp4
This recreates the depth video from the PNG images, but the output video is in grayscale.
Source: How to create a video from images with FFmpeg?
The viewer normalizes the depth based on the depth mode's min and max depth, so that the entire 16-bit depth range is used. Then it uses the following code to colorize:
static inline BgraPixel ColorizeBlueToRed(const DepthPixel &depthPixel,
                                          const DepthPixel &min,
                                          const DepthPixel &max)
{
    constexpr uint8_t PixelMax = std::numeric_limits<uint8_t>::max();

    // Default to opaque black.
    //
    BgraPixel result = { 0, 0, 0, PixelMax };

    // If the pixel is actually zero and not just below the min value, make it black.
    //
    if (depthPixel == 0)
    {
        return result;
    }

    uint16_t clampedValue = depthPixel;
    clampedValue = std::min(clampedValue, max);
    clampedValue = std::max(clampedValue, min);

    // Normalize to [0, 1]
    //
    float hue = (clampedValue - min) / static_cast<float>(max - min);

    // The 'hue' coordinate in HSV is a polar coordinate, so it 'wraps'.
    // Purple starts after blue and is close enough to red to be a bit unclear,
    // so we want to go from blue to red. Purple starts around .6666667,
    // so we want to normalize to [0, .6666667].
    //
    constexpr float range = 2.f / 3.f;
    hue *= range;

    // We want blue to be close and red to be far, so we need to reflect the
    // hue across the middle of the range.
    //
    hue = range - hue;

    float fRed = 0.f;
    float fGreen = 0.f;
    float fBlue = 0.f;
    ImGui::ColorConvertHSVtoRGB(hue, 1.f, 1.f, fRed, fGreen, fBlue);

    result.Red = static_cast<uint8_t>(fRed * PixelMax);
    result.Green = static_cast<uint8_t>(fGreen * PixelMax);
    result.Blue = static_cast<uint8_t>(fBlue * PixelMax);

    return result;
}
https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/95f1d95f1f335b57a350a80a3a62e98e1ee4258d/tools/k4aviewer/k4adepthpixelcolorizer.h#L35
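A rough way to reproduce this outside the viewer is to colorize each extracted 16-bit PNG before re-encoding. The sketch below uses OpenCV rather than the SDK's code, and COLORMAP_JET only approximates the viewer's blue-to-red ramp; the min/max values are assumptions that should be replaced with the range of your depth mode, and unlike the viewer this maps invalid zero pixels to the near color rather than black.

// colorize_depth.cpp: colorize the 16-bit depth PNGs extracted above,
// writing color%03d.png for re-encoding with ffmpeg.
#include <opencv2/opencv.hpp>

int main()
{
    const double minDepth = 500.0;  // assumed near limit in mm; depends on depth mode
    const double maxDepth = 3860.0; // assumed far limit in mm (e.g. NFOV unbinned)

    for (int i = 1; ; ++i)
    {
        cv::Mat depth = cv::imread(cv::format("depth%03d.png", i),
                                   cv::IMREAD_UNCHANGED); // 16-bit, one channel
        if (depth.empty())
            break;

        // Clamp to the depth mode's range, then scale to 8 bits.
        cv::Mat clamped = cv::max(depth, minDepth);
        clamped = cv::min(clamped, maxDepth);
        cv::Mat gray8;
        clamped.convertTo(gray8, CV_8U, 255.0 / (maxDepth - minDepth),
                          -255.0 * minDepth / (maxDepth - minDepth));

        // JET runs blue (near) to red (far), roughly matching the viewer.
        cv::Mat color;
        cv::applyColorMap(gray8, color, cv::COLORMAP_JET);
        cv::imwrite(cv::format("color%03d.png", i), color);
    }
    return 0;
}

Then re-run the second ffmpeg command against color%03d.png instead of depth%03d.png.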

How do I swap stereo channels in raw PCM audio data on OS X?

I'm writing audio from an external decoding library on OS X to an AIFF file, and I am able to swap the endianness of the data with OSSwapInt32().
The resulting AIFF file (16-bit PCM stereo) does play, but the left and right channels are swapped.
Would there be any way to swap the channels as I am writing each buffer?
Here is the relevant loop:
do
{
    xmp_get_frame_info(writer_context, &writer_info);
    if (writer_info.loop_count > 0)
        break;

    writeModBuffer.mBuffers[0].mDataByteSize = writer_info.buffer_size;
    writeModBuffer.mBuffers[0].mNumberChannels = inputFormat.mChannelsPerFrame;

    // Set up our buffer to do the endianness swap (buffer_size is already in bytes)
    void *new_buffer = malloc(writer_info.buffer_size);
    int *ourBuffer = writer_info.buffer;
    int *ourNewBuffer = new_buffer;

    memset(new_buffer, 0, writer_info.buffer_size);

    // buffer_size is a byte count, so iterate over 32-bit elements, not bytes
    int i;
    for (i = 0; i < writer_info.buffer_size / sizeof(int); i++)
    {
        ourNewBuffer[i] = OSSwapInt32(ourBuffer[i]);
    }

    writeModBuffer.mBuffers[0].mData = ourNewBuffer;
    frame_size = writer_info.buffer_size / inputFormat.mBytesPerFrame;
    err = ExtAudioFileWrite(writeModRef, frame_size, &writeModBuffer);
} while (xmp_play_frame(writer_context) == 0);
This solution is very specific to 2-channel audio. I chose to do it in the same loop that changes the byte ordering, to avoid an extra pass: the loop runs half as many times and processes two samples per iteration. The samples are interleaved, so I copy odd sample indexes into even sample indexes and vice versa.
int sampleCount = writer_info.buffer_size / sizeof(int);
for (i = 0; i < sampleCount / 2; i++)
{
    ourNewBuffer[i * 2]     = OSSwapInt32(ourBuffer[i * 2 + 1]);
    ourNewBuffer[i * 2 + 1] = OSSwapInt32(ourBuffer[i * 2]);
}
An alternative is to use a table lookup for channel mapping.
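For 16-bit interleaved stereo, that lookup might look like the sketch below. This is only an illustration: swapChannels16 and channelMap are made-up names, and OSSwapInt16() is used so each 16-bit sample's bytes are swapped individually (a 32-bit swap on 16-bit stereo data reorders the two channels as a side effect).

// Sketch: channel-map copy for 16-bit interleaved stereo.
// channelMap[out] = in; { 1, 0 } swaps left and right.
#include <libkern/OSByteOrder.h>
#include <stdint.h>
#include <stddef.h>

static void swapChannels16(const uint16_t *src, uint16_t *dst,
                           size_t frameCount, const int channelMap[2])
{
    for (size_t f = 0; f < frameCount; f++)
    {
        dst[f * 2 + 0] = OSSwapInt16(src[f * 2 + channelMap[0]]);
        dst[f * 2 + 1] = OSSwapInt16(src[f * 2 + channelMap[1]]);
    }
}

// Usage with the buffers above (buffer_size is in bytes, 4 bytes per frame):
//   const int map[2] = { 1, 0 };
//   swapChannels16((const uint16_t *)writer_info.buffer, (uint16_t *)new_buffer,
//                  writer_info.buffer_size / 4, map);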

When reading a WAV file, dataID is printed as "fact" and not "data"

I'm new to audio playback and have spent the day reading over the WAV file specification. I wrote a simple program to extract the header of a file, but right now my program always returns false, as the dataID keeps coming back as "fact" instead of "data".
There are a few reasons I believe this could be happening:
The file I am reading in has a format size of 18, whereas this resource states a valid PCM file should have a format size of 16.
The format code of the file I am reading is 6, meaning it has probably been compressed.
The value of dataSize is far too small (only 4), even though the file has 30 seconds of playback when run through VLC or Windows Media Player.
The code I am using is as follows:
using (var reader = new BinaryReader(File.Open(wavFile, FileMode.Open)))
{
    // Read all descriptor info into variables to be passed
    // to an ASWAVFile instance.
    var chunkID = reader.ReadBytes(4);       // Should contain "RIFF"
    var chunkSize = reader.ReadBytes(4);
    var format = reader.ReadBytes(4);        // Should contain "WAVE"
    var formatID = reader.ReadBytes(4);      // Should contain "fmt " (4 bytes, trailing space)
    var formatSize = reader.ReadBytes(4);    // 16 for PCM format
    var formatCode = reader.ReadBytes(2);    // Determines linear quantization - 1 = PCM, else it has been compressed
    var channels = reader.ReadBytes(2);      // mono = 1, stereo = 2
    var sampleRate = reader.ReadBytes(4);    // 8000, 44100, etc.
    var byteRate = reader.ReadBytes(4);      // SampleRate * Channels * BitsPerSample / 8
    var blockAlign = reader.ReadBytes(2);    // Channels * BitsPerSample / 8
    var bitsPerSample = reader.ReadBytes(2); // 8, 16, etc. bits per sample

    var padding = byteToInt(formatSize);

    // Read any extra values so we can jump to the data chunk - extra padding
    // should only be present here if formatSize is 18
    byte[] fmtExtraSize = new byte[2];
    if (padding == 18)
    {
        fmtExtraSize = reader.ReadBytes(2);
    }

    // Read the final header information in
    var dataID = reader.ReadBytes(4);   // Should contain "data"
    var dataSize = reader.ReadBytes(4); // Samples * Channels * BitsPerSample / 8

    // Check if the file is in the correct format
    if (
        System.Text.Encoding.ASCII.GetString(chunkID) != "RIFF" ||
        System.Text.Encoding.ASCII.GetString(format) != "WAVE" ||
        System.Text.Encoding.ASCII.GetString(formatID) != "fmt " ||
        System.Text.Encoding.ASCII.GetString(dataID) != "data"
    )
    {
        return false;
    }

    //file = new ASWAVFile();
}
If I dump the values of chunkID, format, formatID and dataID I get:
RIFF, WAVE, fmt, fact
causing the method to return false. Why is this happening?
The RIFF specification doesn't require the 'data' chunk to immediately follow the 'fmt ' chunk. In your case the next chunk is a 'fact' chunk, which compressed (non-PCM) formats use to record the number of sample frames; the 4 bytes you then read as dataSize are actually that chunk's size (4), which explains the tiny value you saw. You may also see files that write a 'pad' chunk after the 'fmt ' chunk to ensure page alignment for better streaming.
http://en.wikipedia.org/wiki/WAV
Also, the format code indicates the audio compression type, as you noted. Valid format codes are listed in mmreg.h (on Windows); format 6 is WAVE_FORMAT_ALAW, indeed a compressed type.
http://www-mmsp.ece.mcgill.ca/documents/audioformats/wave/Docs/MMREG.H
Your best bet is to write code that reads each chunk header, checks whether it is the chunk type you want, and skips past it to the next chunk if it isn't, as sketched below.
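A minimal sketch of that chunk-walking loop (written in C++ for brevity; the same logic maps directly onto the BinaryReader code above, seekToDataChunk is a made-up name, and a little-endian host is assumed):

#include <cstdint>
#include <cstring>
#include <fstream>

// Walks the RIFF chunk list until the "data" chunk is found, skipping
// "fact", "pad ", or any other chunk along the way.
bool seekToDataChunk(std::ifstream &in, uint32_t &dataSize)
{
    char riff[4], wave[4];
    uint32_t riffSize = 0;
    in.read(riff, 4);
    in.read(reinterpret_cast<char *>(&riffSize), 4);
    in.read(wave, 4);
    if (!in || std::memcmp(riff, "RIFF", 4) != 0 || std::memcmp(wave, "WAVE", 4) != 0)
        return false;

    char id[4];
    uint32_t size = 0;
    while (in.read(id, 4) && in.read(reinterpret_cast<char *>(&size), 4))
    {
        if (std::memcmp(id, "data", 4) == 0)
        {
            dataSize = size; // the stream is now positioned at the sample data
            return true;
        }
        // Skip this chunk's payload; chunk payloads are padded to an even size.
        in.seekg(size + (size & 1), std::ios::cur);
    }
    return false;
}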

libfaac: Queue input is backward in time

I am using libav along with libfaac to encode audio to AAC. The logic is as follows:
frames[n]
i = 0;
while (there are frames)
{
    cur_frame = frames[i];
    av_encode_audio(cur_frame, ...., &frame_finished);
    if (frame_finished)
    {
        i++;
    }
}
But for a few frames I am getting this annoying warning: "queue input is backward in time".
The answer is very simple: you are not supposed to pass the same frame to libfaac again, so even if frame_finished is not 1 you should still move on to the next frame. It should be as follows:
frames[n]
i = 0;
while (there are frames)
{
    cur_frame = frames[i];
    av_encode_audio(cur_frame, ...., &frame_finished);
    i++;
}
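In terms of the real API of that era, the same pattern with avcodec_encode_audio2() might look like this sketch (encCtx, frames and n are placeholders; error handling is minimal):

// Always advance to the next input frame, whether or not the encoder
// produced a packet on this iteration.
AVPacket pkt;
int got_packet = 0;

for (int i = 0; i < n; i++)
{
    av_init_packet(&pkt);
    pkt.data = NULL; // let the encoder allocate the packet buffer
    pkt.size = 0;

    if (avcodec_encode_audio2(encCtx, &pkt, frames[i], &got_packet) < 0)
        break;

    if (got_packet)
    {
        // ... write pkt to the output, e.g. av_interleaved_write_frame() ...
        av_free_packet(&pkt);
    }
    // No retry path: the next iteration always feeds frames[i + 1].
}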
