How to capture raw audio samples after every 10MS using FFmpeg programmatically?

How to capture raw audio samples after every 10MS using FFmpeg programmatically? - audio

I am using FFmpeg to record audio using pulse audio at a 48k sampling rate and 32k bitrate. Eventually, I perform encoding these recorded samples through opus which is configured on default settings such as frame_size set to 120, initial padding set to 120 and etc. I am not sure about the recording callback settings for pulse audio but I have read in opus RFC that it uses 120ms by default and can support 10, 20, 40, 60, and any other which is a multiple of 2.5.
I have a project where I have added WebRTC p2p connection support to send FFmpeg recorded audio/video on the audio/video channel of WebRTC to the next connected peer. In the case of sending plain audio as well as plain video which is recorded through FFmpeg, there is no issue with sending these over an audio/video channel (obviously) however, in the case of FFmpeg encoded data, I have to modify WebRTC audio/video encoder factories (clear to me).
I want to capture raw audio samples using FFmpeg after every 10MS callback.

Related

Mux segmented mpegts audio and video to single clip with error correction

I have a recording as a collection of files in mpegts format, like
audio: a-1.ts, a-2.ts, a-3.ts, a-4.ts
video: v-1.ts, v-2.ts, v-3.ts
I need to make a single video clip in mp4 or mkv format.
However, there are two problems:
audio and video segments have different duration each, number of audio segments is different from number of video segments. Total duration of audio and video matches. Hence I can not concat pairwise audio video segments using mpeg and merge them afterwards, I get sync issues increasing progressively
few segments are corrupt or missing. So if I concat audio and video streams separately using ffmpeg I get streams of different lengths. When I merge these streams using ffmpeg I have correct a/v synchronization until time when first missing packet is encountered.
It's OK if video freezes for a while or there is silence for a while as long as most of the video is in sync with audio.
I've checked with tsduck and PCR seems to be present in all audio and video segments yet I could not find a way to merge streams using mpegTS PCR as sync reference. Please advise how can I achieve this.

How is the AAC encoder priming delay handled in HLS?

As per Apple, in AAC encoding 2112 priming samples are added at the beginning of audio. When creating HLS stream with AAC audio, will these priming samples be added to the beginning of each HLS segment or only to the first HLS segment? And, how does this AAC encoder delay affect HLS DISCONTINUITY tags later in the HLS stream?
https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html

I depends on the AAC you use.
For 'old-style' AAC-LC you only have priming samples at the beginning of the stream and not at the beginning of each segment.
But the delay is carried through the entire stream.
Typically a new piece of media is displayed after a DISCONTINUITY tag - for example an advertisement - so you will receive another set of priming samples.
Your AAC audio decoder needs to discard the priming samples (first 2112) PCM output samples after startup and after DISCONTINUITY.
If you use the more modern xHE-AAC - you don't have to worry about priming samples anymore.
Another wrinkle - in the early days it was just assumed that AAC-LC has 2112 priming samples.
Now the number can be different and it can be signaled in the MP4 container as Edit-List.

ffmpeg to auto lower/fade audio volume of one audio stream when microphone voice detected?

I want to do live audio translation via microphone, to get streamed live vid/audio from Facebook, plug the mic into laptop and do live translation by mixing existing audio stream with one coming from the mic (translation). This is OK, somehow I got this part by using audio filter "amix" and mix two audio streams together into one. Now I want to add more perfection to it, is it possible to (probably is) upon mic voice detection to automatically decrease/fade down 20% volume of input/original audio stream to hear translation (mic audio) more loudly and then when mic action/voice stops for lets say 3-5 seconds the volume of original audio stream fades up/goes up to normal volume... is this too much, i can play with sox or similar?

How to add a 5.1 .flac audio track to a .ts file with already 3 audio tracks?

I want to add a 5.1 .flac audio track to a .ts file that already has three audio tracks. I tried with tsMuxer and ffmpeg with unsuccessful results. In tsMuxeR the .flac track is not recognized and in ffmpeg everything seems to work fine until the very last moment when I check the file and the .flac audio track is not included in the "output.ts". The .flac track is about 3GB and its lenght is around two and a half hours.
Thank you so much.

I don't think you'll find any existing software that maps FLAC into a MPEG-2 Transport Stream.
This gives you an idea what sort of issues you run into: https://xiph.org/flac/ogg_mapping.html
Let's say you came up with a reasonable way of mapping FLAC into a MPEG-2 Transport Stream - there won't be anything reading it.
Unless there is a specified way of mapping FLAC into a MPEG-2 Tranport Stream - you are on your own.
But PCM is supported in a MPEG-2 Transport Stream (for example Blu-Ray).
I'd use ffmpeg to transcode your audio from FLAC to PCM and then mux it into your transport stream.
Your audio transcode (FLAC to PCM) is lossless.

RTP AAC Packet Depacketizer

I asked earlier about H264 at RTP H.264 Packet Depacketizer
My question now is about the audio packets.
I noticed via the RTP packets that audio frames like AAC, G.711, G.726 and others all have the Marker Bit set.
I think frames are independent. am I right?
My question is: Audio is small, but I know that I can have more than one frame per RTP packet. Independent of how many frames I have, they are complete? Or it may be fragmented between RTP packets.

The difference between audio and video is that audio is typically encoded either in individual samples, or in certain [small] frames without reference to previous data. Additionally, amount of data is small. So audio does not typically need complicated fragmentation to be transmitted over RTP. However, for any payload type you should again refer to RFC that describes the details:
AAC - RTP Payload Format for MPEG-4 Audio/Visual Streams
G.711 - RTP Payload Format for ITU-T Recommendation G.711.1
G.726 - RTP Profile for Audio and Video Conferences with Minimal Control
Other

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to capture raw audio samples after every 10MS using FFmpeg programmatically? - audio

Related

Mux segmented mpegts audio and video to single clip with error correction

How is the AAC encoder priming delay handled in HLS?

ffmpeg to auto lower/fade audio volume of one audio stream when microphone voice detected?

How to add a 5.1 .flac audio track to a .ts file with already 3 audio tracks?

RTP AAC Packet Depacketizer

Categories

Resources