ffmpeg audio encoder frame size - audio

I need to convert audio data from AV_CODEC_ID_PCM_S16LE to AV_CODEC_ID_PCM_ALAW and I am using this code as an example. The example code does essentially this (error checking omitted for brevity):
const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_MP2);
AVCodecContext* c = avcodec_alloc_context3(codec);
c->bit_rate = 64000;
c->sample_fmt = AV_SAMPLE_FMT_S16;
c->sample_rate = select_sample_rate(codec);
c->channel_layout = select_channel_layout(codec);
c->channels = av_get_channel_layout_nb_channels(c->channel_layout);
avcodec_open2(c, codec, NULL);
AVFrame* frame = av_frame_alloc();
frame->nb_samples = c->frame_size;
frame->format = c->sample_fmt;
frame->channel_layout = c->channel_layout;
The example code subsequently uses c->frame_size in a for loop.
My code is similar to the above with the following differences:
const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_PCM_ALAW);
c->sample_rate = 8000;
c->channel_layout = AV_CH_LAYOUT_MONO;
c->channels = 1;
After calling avcodec_open2, c->frame_size is zero. The example code never sets the frame size so I assume that it expects either avcodec_alloc_context3 or avcodec_open2 to set it. Is this a correct assumption? Is the setting of the frame size based on the codec being used? If I have to set the frame size explicitly, is there a recommended size?
EDIT:
Based on the-kamilz's answer, it appears that the example code is not robust. The example assumes that c->frame_size will be set, but that appears to depend on the codec. In my case, codec->capabilities did in fact include AV_CODEC_CAP_VARIABLE_FRAME_SIZE. So I modified my code to check c->frame_size and use it only if it is non-zero. If it is zero, I just picked an arbitrary one second's worth of data for frame->nb_samples, as sketched below.
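A minimal sketch of that check (my own code, not from the original example; the one-second fallback is an arbitrary choice, nothing the API requires):
// Pick a frame size that works for both fixed- and variable-frame-size encoders.
// Assumes "c" has already been opened with avcodec_open2() as above.
int nb_samples;
if (c->frame_size > 0) {
    nb_samples = c->frame_size;      // fixed-frame-size codec (e.g. MP2): must use exactly this
} else {
    nb_samples = c->sample_rate;     // AV_CODEC_CAP_VARIABLE_FRAME_SIZE: one second, arbitrary
}
AVFrame* frame = av_frame_alloc();
frame->nb_samples = nb_samples;
frame->format = c->sample_fmt;
frame->channel_layout = c->channel_layout;
av_frame_get_buffer(frame, 0);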

In the FFmpeg documentation, AVCodecContext::frame_size is described as follows:
int AVCodecContext::frame_size
Number of samples per channel in an audio frame.
encoding: set by libavcodec in avcodec_open2(). Each submitted frame except the last must contain exactly frame_size samples per channel.
May be 0 when the codec has AV_CODEC_CAP_VARIABLE_FRAME_SIZE set, then the frame size is not restricted.
decoding: may be set by some decoders to indicate constant frame size
Hope that helps.

You don't control the frame size explicitly; it is set by the encoder depending on the codec provided at initialization (opening) time.
Once avcodec_open2() is successful, you can retrieve the frame's buffer size with av_samples_get_buffer_size().
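For example (a sketch, assuming the encoder context c and the frame from the question have been set up as above):
// Number of bytes required to hold one frame of nb_samples samples per channel.
int buffer_size = av_samples_get_buffer_size(NULL, c->channels,
                                             frame->nb_samples,
                                             c->sample_fmt, 0);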

Related

Create valid h264 from partial stream of h264 video data and wrap as Mp4

Let's say that I am reading from a data stream, and that stream is sending the content of an H.264 video feed. I read from that stream and have some amount of data consisting of an indeterminate number of frames (NALs?). Given that I know the framerate and size of the originating video, how would I go about converting this snippet into an MP4 that I could view? The video does not contain audio.
I want to do this using Node.js. My attempts to do so have produced nothing resembling a valid H.264 file to convert into MP4. My thoughts so far were to strip any data preceding the first start code found in the data, feed that into a file, and use ffmpeg (currently just testing on the command line) to convert the file to MP4.
What's the correct way to go about doing this?
i.e. something like this (it's in TypeScript, but it's the same thing):
// We assume here that by the time this while loop exits, at least one full frame of data will have been read and written to disk
let stream: WriteStream = fs.createWriteStream("./test.h264")
while (someDataStream.available()) { // just an example, not real code
    let data: Buffer = someDataStream.readSomeData() // just an example, not a real method call
    let file = null;
    try {
        file = fs.statSync("./test.h264");
    } catch (error) {
        console.error(error)
    }
    if (!stream.writable) {
        console.error("stream not writable")
    } else if (file == null || file.size <= 0) {
        let index = data.indexOf(0x7C)
        console.log("index: " + index)
        if (index > 0) {
            console.log("index2: " + data.slice(index).indexOf(0x7c))
            stream.write(data.slice(index))
        }
    } else {
        stream.write(data)
    }
}
To handle a data stream, you'll need to emit fragmented MP4. Like all MP4, fMP4 streams begin with a preamble containing ftyp, moov, and styp boxes. Then each frame is encoded with a moof / mdat box pair.
In order to generate a useful preamble from your H.264 bitstream, you need to locate an SPS / PPS pair of NALUs in the H.264 data to set up the avc1 box within the moov box. Those two NALUs are often immediately followed by an I-frame (a key frame). The first frame in a stream must be an I-frame, and subsequent ones can be P- or B-frames.
It's a fairly complex task involving lots of bit-banging and buffer-shuffling (those are technical terms ;-).
I've been working on a piece of JavaScript code to extract H.264 from WebM and put it into fMP4. It's not yet complete. It's backed up by another piece of code that decodes the parts of the H.264 stream that are needed to pack it properly into fMP4.
I wish I could write, "here are the ten lines of code you need" but those formats (fMP4 and H264) aren't simple enough to make that possible.
I don't know why none of these questions has an easy answer. Here you go, a Node.js solution; the i argument is there in case you need to offset the search:
// 4-byte Annex B start code
const soi = Buffer.from([0x00, 0x00, 0x00, 0x01]);

function findStartFrame(buffer, i = -1) {
    while ((i = buffer.indexOf(soi, i + 1)) !== -1) {
        // NAL unit type 7 is an SPS, the start of a decodable sequence
        if ((buffer[i + 4] & 0x1F) === 7) return i
    }
    return -1
}

How to display unpacked UDP data properly

I'm trying to write a small program that displays data received from a game (F1 2019) over UDP.
The F1 2019 game sends out data via UDP. I have been able to receive the packets, separate the header from the data, and unpack the data according to the structure in which it is sent, using the rawutil module.
The struct in which the packets are sent can be found here:
https://forums.codemasters.com/topic/38920-f1-2019-udp-specification/
I'm only interested in the telemetry packet.
import socket
import cdp
import struct
import array
import rawutil
from pprint import pprint

# settings
ip = '0.0.0.0'
port = 20777

# listen for packets
listen_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listen_socket.bind((ip, port))

while True:
    # Receiving data
    data, address = listen_socket.recvfrom(65536)
    header = data[:20]
    telemetry = data[20:]

    # decode the header
    packetFormat, = rawutil.unpack('<H', header[:2])
    gameMajorVersion, = rawutil.unpack('<B', header[2:3])
    gameMinorVersion, = rawutil.unpack('<B', header[3:4])
    packetVersion, = rawutil.unpack('<B', header[4:5])
    packetId, = rawutil.unpack('<B', header[5:6])
    sessionUID, = rawutil.unpack('<Q', header[6:14])
    sessionTime, = rawutil.unpack('<f', header[14:18])
    frameIdentifier, = rawutil.unpack('<B', header[18:19])
    playerCarIndex, = rawutil.unpack('<B', header[19:20])

    # print all info (just for now)
    ## print('Packet Format : ', packetFormat)
    ## print('Game Major Version : ', gameMajorVersion)
    ## print('Game Minor Version : ', gameMinorVersion)
    ## print('Packet Version : ', packetVersion)
    ## print('Packet ID : ', packetId)
    ## print('Unique Session ID : ', sessionUID)
    ## print('Session Time : ', sessionTime)
    ## print('Frame Number : ', frameIdentifier)
    ## print('Player Car Index : ', playerCarIndex)
    ## print('\n\n')

    # start getting the packet data for each packet, starting with telemetry data
    if packetId == 6:
        speed, = rawutil.unpack('<H', telemetry[2:4])
        throttle, = rawutil.unpack('<f', telemetry[4:8])
        steer, = rawutil.unpack('<f', telemetry[8:12])
        brake, = rawutil.unpack('<f', telemetry[12:16])
        gear, = rawutil.unpack('<b', telemetry[17:18])
        rpm, = rawutil.unpack('<H', telemetry[18:20])
        print(speed)
The UDP specification states that the speed of the car is sent in km/h. However, when I unpack the packet, the speed is a multiple of 256, so 10 km/h comes out as 2560, for example.
I want to know whether I'm unpacking the data in the wrong way, or whether something else is causing this.
There is also a problem with the steering, for example: the spec says it should be between -1.0 and 1.0, but the actual values are either very large or very small.
screengrab here: https://imgur.com/a/PHgdNrx
Appreciate any help with this.
Thanks.
I recommend you don't use the unpack method, as with big structures (e.g. MotionPacket has 1343 bytes) your code will immediately get very messy.
However, if you desperately want to use it, call unpack only once, such as:
fmt = "<HBBBBQfBB"
size = struct.calcsize(fmt)
arr = struct.unpack(fmt, header[:size])
Alternatively, have a look at the ctypes library, especially ctypes.LittleEndianStructure, where you can set the _fields_ attribute to a sequence of ctypes types (such as c_uint8, etc.) without having to translate them to format symbols as with unpack; see the sketch at the end of this answer.
https://docs.python.org/3.8/library/ctypes.html#ctypes.LittleEndianStructure
Alternatively alternatively, have a look at namedtuples.
Alternatively alternatively alternatively, there's a bunch of python binary IO libs, such as binio where you can declare a structure of ctypes, as this is a thin wrapper anyway.
To fully answer your question, the structure seems to be:
struct PacketHeader
{
uint16 m_packetFormat; // 2019
uint8 m_gameMajorVersion; // Game major version - "X.00"
uint8 m_gameMinorVersion; // Game minor version - "1.XX"
uint8 m_packetVersion; // Version of this packet type, all start from 1
uint8 m_packetId; // Identifier for the packet type, see below
uint64 m_sessionUID; // Unique identifier for the session
float m_sessionTime; // Session timestamp
uint m_frameIdentifier; // Identifier for the frame the data was retrieved on
uint8 m_playerCarIndex; // Index of player's car in the array
};
Meaning that the sequence of symbols for unpack should be <H4BQfLB, because m_frameIdentifier is a uint (i.e. a uint32), where you read only a uint8.
I also replaced BBBB with 4B. Hope this helps.
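For illustration, here is a rough, untested sketch of the ctypes approach mentioned above, with the field names taken from the spec quoted earlier:
import ctypes

class PacketHeader(ctypes.LittleEndianStructure):
    _pack_ = 1  # the game sends the struct packed, with no padding
    _fields_ = [
        ("m_packetFormat", ctypes.c_uint16),
        ("m_gameMajorVersion", ctypes.c_uint8),
        ("m_gameMinorVersion", ctypes.c_uint8),
        ("m_packetVersion", ctypes.c_uint8),
        ("m_packetId", ctypes.c_uint8),
        ("m_sessionUID", ctypes.c_uint64),
        ("m_sessionTime", ctypes.c_float),
        ("m_frameIdentifier", ctypes.c_uint32),
        ("m_playerCarIndex", ctypes.c_uint8),
    ]

# usage:
# header = PacketHeader.from_buffer_copy(data[:ctypes.sizeof(PacketHeader)])
# if header.m_packetId == 6: ...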
Haider, I wanted to read the car speed from Formula 1 2019 too. I found your question, got some tips from it, and solved my issue, so now I think I should pay it back. The reason you get the speed multiplied by 256 is that you start from the wrong byte, and this data is little endian. The code you shared starts reading the speed from the 22nd byte; if you start from the 23rd byte, you will get the correct speed data.

Get duration of remote MP3 file without full download

In a node.js script I need to get the duration of a remote MP3 file.
Currently I use the download and mp3-duration modules to download the file and then compute the duration. It works, but the download phase is very long.
I noticed that media players can display the duration almost instantly. Would it somehow be possible to get the duration without downloading the entire file first?
ID3 metadata
MP3 files optionally come with metadata called "ID3 tags":
ID3 is a de facto standard for metadata in MP3 files
ID3v1 enhanced tag
The ID3v1 tag occupies 128 bytes, beginning with the string TAG 128 bytes from the end of the file.
The Enhanced tag is 227 bytes long, and placed before the ID3v1 tag.
The enhanced tag contains the start-time and end-time fields as the last 6 + 6 bytes of the tag, formatted like mmm:ss. So it gives you the duration of the song.
The solution would be to download the last 355 bytes of the file, check if the enhanced tag (TAG+) is present, then look at the last 12 bytes of the enhanced tag.
ID3v2 TLEN
Nowadays, we mostly use ID3v2, which allows us to embed up to 256 MB of information:
An ID3v2 tag is composed of frames. Each frame represents one piece of information. Max frame size is 16 MB. Max tag size is 256 MB.
The frame you're interested in is the TLEN frame, which represents the length of the song.
But there is no guarantee that a given MP3 file will have an ID3v2 tag, or that the TLEN frame will be stored in its ID3v2 tag.
/!\ If the MP3 file does not contain metadata, the only solution is the one you already came up with: estimating the duration with the mp3-duration module. /!\
How to get this metadata without downloading the entire file?
Range requests
If the server accepts range requests, then we just have to tell it which bytes we want!
ID3v1
// (inside an async function)
const request = require('request-promise');
const NodeID3 = require('node-id3');

const mp3FileURL = "http://musicSite/yourmp3File.mp3";
// resolveWithFullResponse exposes the response headers
const mp3FileHEAD = await request.head({ uri: mp3FileURL, resolveWithFullResponse: true });
const serverAcceptRangeReq = mp3FileHEAD.headers['accept-ranges'] && mp3FileHEAD.headers['accept-ranges'].toLowerCase() != "none";
// here we assume that the server accepts range requests
const mp3FileSize = Number(mp3FileHEAD.headers['content-length']);
// ask only for the last 355 bytes (enhanced tag + ID3v1 tag)
const tagBytesHeader = {'Range': `bytes=${mp3FileSize - 355}-${mp3FileSize - 1}`};
const tagBytes = await request.get(mp3FileURL, {headers: tagBytesHeader});
/!\ I didn't test the code, it just serves as a demonstration /!\
Then parse the response and check if tagBytes.includes('TAG+'), and if it does you've got your duration.
ID3v2
You can use the same method to check if the ID3v2 tags are at the beginning of the file and then use a module like node-id3 to parse the tags.
GET request to check for ID3v2
According to the documentation,
The ID3v2 tag header [...] should be the first information in the file, is 10 bytes
So even if the server does not accept range requests, you could come up with a solution using a simple GET request and "stopping" it once you have received more than 10 bytes:
const request = require('request');

let buff = Buffer.alloc(0);
const r = request
    .get(mp3FileURL)
    .on('data', chunk => {
        buff = Buffer.concat([buff, chunk]);
        if (buff.length > 10) {
            r.abort();
            // deal with your buff
        }
    });
/!\ I didn't test the code, it just serves as a demonstration /!\
Wrapping it in a function that returns a Promise would be cleaner.

Is there any way to sample audio using OpenSL on android with different sampling rates and buffer sizes?

I have downloaded the audio-echo app for OpenSL from the Android NDK portal. Due to the lack of documentation, I'm not able to work out how to change the sampling rate and buffer size of the audio input and output.
If anybody has any idea on how to:
Change the buffer size and sampling rate in OpenSL
Read the buffers so they can be fed to C code for processing
Feed the processed data to the output module of OpenSL so it is played on the speakers
Another alternative, I feel, is to read at the preferred sampling rate and buffer size, then downsample or upsample in the code itself and use a circular buffer to get the desired data. But how do we read and feed the data in OpenSL?
In the OpenSL ES API, there are calls to create either a Player or a Recorder:
SLresult (*CreateAudioPlayer) (
    SLEngineItf self,
    SLObjectItf *pPlayer,
    SLDataSource *pAudioSrc,
    SLDataSink *pAudioSnk,
    SLuint32 numInterfaces,
    const SLInterfaceID *pInterfaceIds,
    const SLboolean *pInterfaceRequired
);

SLresult (*CreateAudioRecorder) (
    SLEngineItf self,
    SLObjectItf *pRecorder,
    SLDataSource *pAudioSrc,
    SLDataSink *pAudioSnk,
    SLuint32 numInterfaces,
    const SLInterfaceID *pInterfaceIds,
    const SLboolean *pInterfaceRequired
);
Note that both of these take SLDataSource *pAudioSrc and SLDataSink *pAudioSnk parameters.
To use a custom playback or recording rate, you have to set up the PCM format on the appropriate side: the data source for playback, and (on Android) the buffer-queue data sink for recording.
I use an 11.025 kHz playback rate using this code:
// Configure data format.
SLDataFormat_PCM pcm;
pcm.formatType = SL_DATAFORMAT_PCM;
pcm.numChannels = 1;
pcm.samplesPerSec = SL_SAMPLINGRATE_11_025;
pcm.bitsPerSample = SL_PCMSAMPLEFORMAT_FIXED_16;
pcm.containerSize = 16;
pcm.channelMask = SL_SPEAKER_FRONT_CENTER;
pcm.endianness = SL_BYTEORDER_LITTLEENDIAN;
// Configure Audio Source.
SLDataSource source;
source.pFormat = &pcm;
source.pLocator = &bufferQueue;
To feed data to the speakers, a buffer queue is used that is filled by a callback. To set this callback, use SLAndroidSimpleBufferQueueItf; the underlying interface is documented in section 8.12, SLBufferQueueItf, of the OpenSL ES specification.
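A rough sketch of the remaining wiring, assuming a player object has already been created and realized with the source above (the names bufferQueue, playerObject, context, firstBuffer, bqPlayerCallback, nextFilledBuffer, and BUFFER_SIZE_IN_BYTES are illustrative, not part of the original answer):
// Locator for the buffer queue referenced by source.pLocator above; two buffers is an arbitrary choice.
SLDataLocator_AndroidSimpleBufferQueue bufferQueue;
bufferQueue.locatorType = SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE;
bufferQueue.numBuffers = 2;

// Called each time the queue finishes a buffer; refill and re-enqueue here.
void bqPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
{
    short *buf = nextFilledBuffer(context); // hypothetical helper that produces the next chunk
    (*bq)->Enqueue(bq, buf, BUFFER_SIZE_IN_BYTES);
}

// After creating and realizing the player object:
SLAndroidSimpleBufferQueueItf bqItf;
(*playerObject)->GetInterface(playerObject, SL_IID_ANDROIDSIMPLEBUFFERQUEUE, &bqItf);
(*bqItf)->RegisterCallback(bqItf, bqPlayerCallback, context);
(*bqItf)->Enqueue(bqItf, firstBuffer, BUFFER_SIZE_IN_BYTES); // prime the queue so playback starts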

Raw audio playback in Allegro 5

I'm writing a MOD player and trying to play back a sample using Allegro 5's raw stream capabilities, but I can't figure out the exact init parameters for the stream to play the sample data loaded from the MOD file.
This is what I have:
xf::ModLoader ml;
ml.loadFromFile("C:\\Users\\bubu\\Downloads\\agress.mod");
// getSampleLength() returns # of data words
int sample_length = ml.getSampleLength(1) * 2;
const int8_t* sample_data = ml.getSampleData(1);
ALLEGRO_MIXER* mixer = al_get_default_mixer();
ALLEGRO_AUDIO_STREAM* stream = al_create_audio_stream(1, sample_length, 8363, ALLEGRO_AUDIO_DEPTH_INT8, ALLEGRO_CHANNEL_CONF_1);
al_attach_audio_stream_to_mixer(stream, mixer);
al_set_audio_stream_gain(stream, 0.7f);
al_set_audio_stream_playmode(stream, ALLEGRO_PLAYMODE_ONCE);
al_set_audio_stream_playing(stream, true);
al_set_audio_stream_fragment(stream, (void*)sample_data);
al_drain_audio_stream(stream);
First of all, the freq param is hardcoded for the test (8363 Hz), but playing back at the specified frequency I don't get what I expect, and al_drain_audio_stream() gets stuck forever, playing garbage in a loop...
Any help would be appreciated.
At the very least, you need to be calling al_get_audio_stream_fragment before you call al_set_audio_stream_fragment. Typically you'd feed these streams in a while loop while responding to the ALLEGRO_EVENT_AUDIO_STREAM_FRAGMENT event. See the ex_saw example in Allegro's source for some sample code: https://github.com/liballeg/allegro5/blob/master/examples/ex_saw.c
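A rough sketch of that loop for the code in the question (not taken from ex_saw); it assumes the stream was created with a fragment size of frag_samples samples (the second argument to al_create_audio_stream) and that sample_data / sample_length are the 8-bit mono data from the question:
// Feed the sample into the stream one fragment at a time.
ALLEGRO_EVENT_QUEUE* queue = al_create_event_queue();
al_register_event_source(queue, al_get_audio_stream_event_source(stream));

int pos = 0; // read offset into sample_data, in samples (== bytes for 8-bit mono)
while (pos < sample_length) {
    ALLEGRO_EVENT ev;
    al_wait_for_event(queue, &ev);
    if (ev.type == ALLEGRO_EVENT_AUDIO_STREAM_FRAGMENT) {
        int8_t* frag = (int8_t*)al_get_audio_stream_fragment(stream);
        if (!frag)
            continue;
        int n = (sample_length - pos < frag_samples) ? (sample_length - pos) : frag_samples;
        memcpy(frag, sample_data + pos, n);
        memset(frag + n, 0, frag_samples - n); // pad the last fragment with silence
        al_set_audio_stream_fragment(stream, frag);
        pos += n;
    }
}
al_drain_audio_stream(stream);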
