What is the difference between a stream and a channel in ffmpeg? - audio

The documentation refers to streams, but it is not clear whether an audio channel is a stream or whether a stream can contain multiple audio channels.
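A quick way to see the relationship is to inspect a file with ffprobe (input.mp4 below is just a placeholder): each audio stream is listed once, and its channels field tells you how many channels that one stream carries.
ffprobe -v error -select_streams a -show_entries stream=index,codec_name,channels,channel_layout -of default input.mp4
A stereo AAC track, for example, shows up as a single stream with channels=2, i.e. a stream is the container-level track and it can contain several audio channels.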

Related

Mux segmented mpegts audio and video to single clip with error correction

I have a recording as a collection of files in mpegts format, like
audio: a-1.ts, a-2.ts, a-3.ts, a-4.ts
video: v-1.ts, v-2.ts, v-3.ts
I need to make a single video clip in mp4 or mkv format.
However, there are two problems:
audio and video segments each have a different duration, and the number of audio segments differs from the number of video segments, although the total durations of audio and video match. Hence I cannot concat audio and video segments pairwise using ffmpeg and merge them afterwards; I get sync issues that increase progressively
a few segments are corrupt or missing, so if I concat the audio and video streams separately using ffmpeg I get streams of different lengths. When I merge these streams using ffmpeg, a/v synchronization is correct only until the first missing packet is encountered.
It's OK if video freezes for a while or there is silence for a while as long as most of the video is in sync with audio.
I've checked with tsduck and the PCR seems to be present in all audio and video segments, yet I could not find a way to merge the streams using the MPEG-TS PCR as a sync reference. Please advise how I can achieve this.
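One approach worth sketching, assuming the original PTS/DTS inside the segments are intact (the PCR you saw in tsduck suggests they are): byte-concatenate the .ts files with ffmpeg's concat protocol, which MPEG-TS supports, and mux with -copyts so the original timestamps rather than regenerated ones drive the sync. Using the file names from the question:
ffmpeg -i "concat:v-1.ts|v-2.ts|v-3.ts" -i "concat:a-1.ts|a-2.ts|a-3.ts|a-4.ts" -map 0:v -map 1:a -c copy -copyts output.mkv
Whether this copes with the corrupt or missing segments depends on how badly the timestamps are damaged; adding -fflags +genpts before an input is sometimes needed when packets are missing PTS.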

How do I create two separate audio streams using the same virtual audio driver?

I'm currently trying to develop a Node.js application on macOS that routes audio from a camera RTSP stream to a virtual audio driver (SoundPusher) so it can be played through the Zoom mic as one stream, and also grabs audio from the Zoom output through the virtual audio driver into an output RTSP stream as a separate stream:
1. Camera RTSP/Audio Element (SoundPusher Speaker) -> Zoom Input (SoundPusher Mic)
2. Zoom Output (SoundPusher Speaker) -> Pipe audio to Output RTSP from SoundPusher Mic
1. The implementation that I have right now is that the audio from the camera RTSP stream is piped to an HTTP server with ffmpeg. On the client side, I create an audio element streaming the audio from the HTTP server through HLS. I then run setSinkId on the audio element to direct the audio to the SoundPusher input and have my microphone in Zoom set to SoundPusher output.
const audio = document.createElement('audio') as any;
audio.src = 'http://localhost:9999';
audio.setAttribute('type', 'audio/mpeg');
await audio.setSinkId(audioDriverSpeakerId);
audio.play();
2. I also have the SoundPusher input set as the output for my audio in Zoom, so I can obtain the audio from Zoom and then pipe it to the output RTSP stream from the SoundPusher output.
ffmpeg -f avfoundation -i "none:SoundPusher Audio" -c:a aac -f rtsp rtsp://127.0.0.1:5554/audio
The problem is that the audio from my camera is being mixed in with the audio from Zoom in the output RTSP stream but I'm expecting to hear only the audio from Zoom. Does anyone know of a way to separate the audio from both streams but use the same audio driver? I'd like to route the audio streams so that the stream from the audio element to Zoom is separate from the stream from Zoom to the output rtsp.
I'm very new to audio streaming so any advice would be appreciated.
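For what it's worth, avfoundation only sees whatever the loopback driver publishes as a single capture device, so anything played into the SoundPusher speaker comes back out of its mic mixed; keeping the two paths apart would most likely need a second loopback device rather than the same one for both directions. You can list the available devices (and the index to capture by) with:
ffmpeg -f avfoundation -list_devices true -i ""
and then capture a specific audio device by index, e.g. -i ":1" instead of the device name.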

Transcode webm audio stream on-the-fly using ffmpeg

I want to transcode an audio stream from YouTube (webm) to PCM on the fly using a buffer, but ffmpeg can only process the first received buffer due to the lack of metadata in subsequent buffers. Is there any way to make this work? I've thought about attaching metadata to other chunks but couldn't make this work. Maybe there's a better approach?
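A sketch of the usual workaround, assuming a Node.js setup: keep one long-lived ffmpeg process and write every incoming WebM chunk into its stdin in order, rather than starting ffmpeg per chunk. The WebM/EBML header only exists at the start of the stream, which is why an ffmpeg instance fed a later chunk on its own has no metadata to work with.
import { spawn } from 'child_process';

// One ffmpeg process for the whole stream: WebM in on stdin, raw PCM out on stdout.
const ffmpeg = spawn('ffmpeg', [
  '-i', 'pipe:0',   // read the WebM stream from stdin
  '-f', 's16le',    // raw signed 16-bit PCM
  '-ar', '48000',   // sample rate and channel count are assumptions; match your consumer
  '-ac', '2',
  'pipe:1',         // write PCM to stdout
]);

ffmpeg.stdout.on('data', (pcm: Buffer) => {
  // consume decoded PCM here
});

// Feed every downloaded chunk into the same stdin, in arrival order.
function onWebmChunk(chunk: Buffer) {
  ffmpeg.stdin.write(chunk);
}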

RTP AAC Packet Depacketizer

I asked earlier about H264 at RTP H.264 Packet Depacketizer
My question now is about the audio packets.
I noticed via the RTP packets that audio payloads like AAC, G.711, G.726 and others all have the marker bit set.
I think the frames are independent. Am I right?
My question is: audio frames are small, but I know I can have more than one frame per RTP packet. Regardless of how many frames there are, are they complete? Or can a frame be fragmented across RTP packets?
The difference between audio and video is that audio is typically encoded either as individual samples or as [small] frames that do not reference previous data. Additionally, the amount of data is small, so audio typically does not need complicated fragmentation to be transmitted over RTP. However, for any payload type you should again refer to the RFC that describes the details:
AAC - RTP Payload Format for MPEG-4 Audio/Visual Streams
G.711 - RTP Payload Format for ITU-T Recommendation G.711.1
G.726 - RTP Profile for Audio and Video Conferences with Minimal Control
Other
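To make the fragmentation point concrete for AAC: in the mpeg4-generic payload of RFC 3640 (the AU-header format most RTSP cameras use, signalled in the SDP fmtp line; the RFC linked above describes a different MPEG-4 format), each packet starts with a 16-bit AU-headers-length field followed by one 16-bit header per access unit, and the frames are normally complete. Only an access unit too large for one packet gets fragmented, and then the marker bit is set on the packet carrying the final fragment. A minimal parsing sketch for the common AAC-hbr mode (sizeLength=13, indexLength=3), which is an assumption about your stream's configuration:

// Split an RTP payload in RFC 3640 AAC-hbr mode into complete AAC access units.
function splitAacFrames(rtpPayload: Buffer): Buffer[] {
  const auHeadersLengthBits = rtpPayload.readUInt16BE(0);     // total AU-header length, in bits
  const auHeadersLengthBytes = Math.ceil(auHeadersLengthBits / 8);
  const numAUs = auHeadersLengthBits / 16;                    // 16 bits per AU header in AAC-hbr
  const frames: Buffer[] = [];
  let offset = 2 + auHeadersLengthBytes;                      // access units follow the headers
  for (let i = 0; i < numAUs; i++) {
    const auHeader = rtpPayload.readUInt16BE(2 + i * 2);
    const auSize = auHeader >> 3;                             // top 13 bits: AU size in bytes
    frames.push(rtpPayload.subarray(offset, offset + auSize));
    offset += auSize;
  }
  return frames;
}

G.711 and G.726 payloads are simpler still: the payload is raw sample data, so every packet is self-contained.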

How can I programmatically mux multiple RTP audio streams together?

I have several RTP streams coming in from the network, and since RTP can only handle one stream in each direction, I need to be able to merge a couple to send back to another client (it could be one that is already sending an RTP stream, or not... that part isn't important).
My guess is that there is some algorithm for mixing audio bytes.
RTP Stream 1 ---------------------\
                                   \_____________________ (1 MUXED 2) RTP Stream Out
                                   /
RTP Stream 2 ---------------------/
There is an IETF draft for RTP stream muxing which might help you; the link is here: http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-tanigawa-rtp-multiplex-01.txt
If you want to use only one stream, then perhaps send the data from the multiple streams as different channels; this link gives an overview of how audio channels are multiplexed in WAV files. You can adopt a similar strategy.
I think you are talking about a VoIP conference.
The mediastreamer2 library, I think, supports a conference filter.
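On the "algorithm for mixing audio bytes": the usual approach in a conference mixer is to decode each incoming RTP stream to raw PCM, sum the samples with clamping, then re-encode and re-packetize the result as a new RTP stream with its own SSRC, sequence numbers and timestamps (the part a library like mediastreamer2 handles for you). A minimal sketch of the summing step, assuming both inputs are already decoded to 16-bit signed little-endian PCM at the same sample rate:

// Mix two buffers of signed 16-bit LE PCM by summing samples and clamping to the int16 range.
function mixPcm16(a: Buffer, b: Buffer): Buffer {
  const len = Math.min(a.length, b.length) & ~1;  // whole 16-bit samples only
  const out = Buffer.alloc(len);
  for (let i = 0; i < len; i += 2) {
    const sum = a.readInt16LE(i) + b.readInt16LE(i);
    out.writeInt16LE(Math.max(-32768, Math.min(32767, sum)), i);
  }
  return out;
}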
