RTP AAC Packet Depacketizer - audio

I asked earlier about H.264 in "RTP H.264 Packet Depacketizer".
My question now is about the audio packets.
I noticed from the RTP packets that audio frames for codecs such as AAC, G.711, G.726 and others all have the marker bit set.
I think the frames are independent. Am I right?
My question is: audio frames are small, and I know that one RTP packet can carry more than one frame. Regardless of how many frames a packet carries, are they always complete, or can a frame be fragmented across RTP packets?

The difference between audio and video is that audio is typically encoded either as individual samples or as small frames that do not reference previous data. Additionally, the amount of data is small, so audio does not typically need complicated fragmentation to be transmitted over RTP. However, for any payload type you should again refer to the RFC that describes the details:
AAC - RTP Payload Format for MPEG-4 Audio/Visual Streams
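G.711 - RTP Payload Format for ITU-T Recommendation G.711.1 (this covers the G.711.1 wideband extension; plain G.711, i.e. PCMU/PCMA, is described in the RTP Profile for Audio and Video Conferences with Minimal Control)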
G.726 - RTP Profile for Audio and Video Conferences with Minimal Control
Other
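
As a rough illustration of what this means for a depacketizer, the sketch below parses the fixed RTP header (RFC 3550) and splits an AAC payload into access units (frames). It assumes the `mpeg4-generic` AAC-hbr mode of RFC 3640 (16-bit AU headers: 13-bit size, 3-bit index), which is a different payload format from the MPEG-4 Audio/Visual one listed above, so check the SDP of your stream before relying on it. In that mode a packet carries either one or more complete frames or a single fragment of one frame. The function names are made up for this example.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header (RFC 3550) and return its fields plus the payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)          # skip the CSRC list
    if b0 & 0x10:                          # header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        ext_words = struct.unpack("!H", packet[offset + 2:offset + 4])[0]
        offset += 4 + 4 * ext_words
    end = len(packet)
    if b0 & 0x20:                          # padding: last byte holds the pad count
        end -= packet[-1]
    return {
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:end],
    }


def split_aac_hbr(payload: bytes):
    """Split an AAC-hbr payload (assumed RFC 3640 layout) into access units.

    Returns (units, is_fragment). is_fragment is True when the packet holds
    only part of one access unit, which must be joined with later packets.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    headers_bits = struct.unpack("!H", payload[:2])[0]
    if headers_bits == 0 or headers_bits % 16:
        raise ValueError("unexpected AU-headers-length for AAC-hbr")
    count = headers_bits // 16
    data_start = 2 + 2 * count
    if len(payload) < data_start:
        raise ValueError("truncated AU header section")
    # Each AU header: upper 13 bits = AU size in bytes, lower 3 bits = index.
    sizes = [struct.unpack("!H", payload[2 + 2 * i:4 + 2 * i])[0] >> 3
             for i in range(count)]
    data = payload[data_start:]
    if count == 1 and sizes[0] > len(data):
        return [data], True                # a single fragment of a larger AU
    if sum(sizes) > len(data):
        raise ValueError("AU sizes exceed payload length")
    units, pos = [], 0
    for size in sizes:
        units.append(data[pos:pos + size])
        pos += size
    return units, False
```

For sample-based codecs such as G.711 there is no per-frame header at all: the payload is simply a run of 8-bit samples, so the payload length gives the sample count.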

Related

Can Linux determine the parameters of an S/PDIF stream?

I have an external audio source that transmits audio data to my computer's sound card via S/PDIF. The sound card has an S/PDIF input. With "arecord" or "audacity" I can record from this input without any problems.
The audio source delivers the data at different sample rates (32 kHz, 44.1 kHz, 48 kHz), which I cannot influence. I also can't tell from the source which sample rate it has selected.
When recording, I would very much like to keep the original sample rate and not have it converted (apparently by the sound card).
Now finally my question: can I somehow detect, with the help of Linux, in which format and with which parameters the S/PDIF stream is encoded?

Save audio from RTP stream that contains RFC 2833 RTP events

I'm trying to extract audio from a telephone session captured with Wireshark. The capture was sent to us by the telephone provider for debugging/analysis. I have 3 files: signalling, and two files with UDP data, one for each direction. After merging two of these files (one direction with signalling), Wireshark provides RTP stream analysis. What I observe (as I do for a second session capture) is that Wireshark isn't able to export the RTP stream audio (Payload type: ITU-T G.711 PCMA (8)) for one direction. This happens to be an RTP stream containing "RFC 2833 RTP events" (Payload type: telephone-event (106)). These events seem to transport DTMF tones out-of-band; for each DTMF tone, there is a run of 7 consecutive RTP events of this type. Wireshark produces an 8 GB *.au file for an audio stream of less than two minutes. For the opposite-direction stream I get an audio file that is 2 MB in size.
I have to admit that this is just guesswork: I am connecting the error with a feature that I can see. I'm a bit confused that Wireshark obviously knows these events but fails to save the corresponding audio stream. Do I maybe need some plugin for that?
I tried to search the web for this issue but without success.
This question was previously asked on Network Engineering but turned out to be off-topic there.
You can filter the DTMF events out of the Wireshark capture (pcap) with the display filter rtp.p_type != 106 and then save only the G.711 data in a separate file.
Then run the RTP analysis on that file and save the audio payload in .au/.raw format.
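
If you would rather do the same filtering outside Wireshark, a minimal sketch could look like the following. It assumes you have already extracted the raw RTP packets of one direction as a list of byte strings in capture order, that the packets carry no header extension or padding, and that the audio is G.711 A-law at 8 kHz mono as in the question. The helper names are hypothetical.

```python
import struct


def extract_payload(rtp_packets, wanted_payload_type=8):
    """Concatenate payloads of packets with the wanted payload type.

    Packets with any other payload type (e.g. telephone-event, 106 in the
    question) are skipped. Assumes no RTP header extension and no padding.
    """
    chunks = []
    for packet in rtp_packets:
        if len(packet) < 12:
            continue                              # too short to be RTP
        payload_type = packet[1] & 0x7F           # low 7 bits of second byte
        if payload_type != wanted_payload_type:
            continue
        header_len = 12 + 4 * (packet[0] & 0x0F)  # fixed header + CSRC list
        chunks.append(packet[header_len:])
    return b"".join(chunks)


def au_file_bytes(alaw_payload: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap 8-bit A-law samples in a Sun/NeXT .au header (encoding 27 = A-law)."""
    header = struct.pack(
        ">IIIIII",
        0x2E736E64,          # magic ".snd"
        24,                  # header size in bytes
        len(alaw_payload),   # data size in bytes
        27,                  # 8-bit G.711 A-law
        sample_rate,
        1,                   # mono
    )
    return header + alaw_payload
```

This sketch does not reorder packets or fill gaps for lost ones, so the result is only as continuous as the capture itself.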

MPEG Transport Stream Audio data information

I am writing code to extract AAC audio data from an MPEG transport stream. I want to get stream properties such as sampling frequency, number of channels, audio type and audio profile type from the transport stream without decoding the actual data. How much of this information is available from the stream?
I also want to know whether there is any way to find the total duration of the stream without actually finding the last PTS value in the file.
Thanks
AAC frames packed in TS use ADTS headers. The header is 7 bytes (or 9 with a CRC) and very easy to parse. The ADTS header format is well documented online.
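
To make that concrete, here is a minimal sketch of an ADTS header parser. It assumes `data` starts exactly at an ADTS syncword (i.e. you have already reassembled the PES payload from the TS packets); the function name and the returned dictionary keys are just illustrative.

```python
# Sampling frequencies indexed by the 4-bit sampling_frequency_index field.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(data: bytes) -> dict:
    """Decode the fixed and variable parts of an ADTS header (7 or 9 bytes)."""
    if len(data) < 7:
        raise ValueError("need at least 7 bytes for an ADTS header")
    # 12-bit syncword 0xFFF, then ID (1 bit), layer (2 bits, always 0).
    if data[0] != 0xFF or (data[1] & 0xF6) != 0xF0:
        raise ValueError("ADTS syncword not found")
    protection_absent = data[1] & 0x01            # 1 means no CRC follows
    profile = (data[2] >> 6) & 0x03               # audio object type minus 1
    sf_index = (data[2] >> 2) & 0x0F
    channel_config = ((data[2] & 0x01) << 2) | (data[3] >> 6)
    # 13-bit frame length, counted in bytes and including the header itself.
    frame_length = ((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5)
    if sf_index >= len(SAMPLE_RATES):
        raise ValueError("reserved sampling_frequency_index")
    return {
        "mpeg_version": 2 if data[1] & 0x08 else 4,
        "audio_object_type": profile + 1,         # 2 = AAC-LC
        "sample_rate": SAMPLE_RATES[sf_index],
        "channel_config": channel_config,
        "frame_length": frame_length,
        "header_size": 7 if protection_absent else 9,
    }
```

For example, the header bytes `FF F1 50 80 2E 7F FC` decode to AAC-LC, 44100 Hz, channel configuration 2, with a 371-byte frame.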

How to find AAC-LC (non-ADTS) audio packet length

I have an AAC-LC audio stream coming directly from an audio encoder.
It's a raw stream: no ADTS headers and no container data, because I want to stream the encoded audio directly as it arrives (before the file gets saved).
I want to determine the frame boundaries/frame lengths/packet lengths in the incoming raw AAC stream. (AAC has variable packet lengths.)
Can I search for any fixed frame headers/patterns so that I can determine frame boundaries?
Is that possible with AAC?
Thanks in advance for your valuable inputs.
If you are taking AAC-encoded data directly from the encoder, then it's up to the encoder to deliver it frame by frame. It should not send "packets", but single frames. Otherwise I don't see a way to parse the stream for frames.
I'd first check whether it really sends more than one frame at a time.
If it does, one solution would be to tell the encoder to emit ADTS headers, parse the frame length from each ADTS header, and finally strip the ADTS header from each frame and stream the frame as raw AAC.
Does that help?
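
A minimal sketch of that last approach, assuming the encoder has been switched to ADTS output and that `stream` begins at a frame boundary. The function name is made up; it returns the raw frames found so far plus any trailing bytes that do not yet form a complete frame, so it can be called again once more data arrives.

```python
def strip_adts(stream: bytes):
    """Split an ADTS byte stream into raw AAC frames.

    Returns (frames, leftover), where leftover is the incomplete tail that
    should be prepended to the next chunk of encoder output.
    """
    frames = []
    pos = 0
    while pos + 7 <= len(stream):
        # 12-bit syncword 0xFFF followed by a layer field that must be 0.
        if stream[pos] != 0xFF or (stream[pos + 1] & 0xF6) != 0xF0:
            raise ValueError(f"lost ADTS sync at offset {pos}")
        # protection_absent = 1 means a 7-byte header, otherwise 2 CRC bytes follow.
        header_size = 7 if stream[pos + 1] & 0x01 else 9
        # 13-bit frame length in bytes, header included.
        frame_length = (((stream[pos + 3] & 0x03) << 11)
                        | (stream[pos + 4] << 3)
                        | (stream[pos + 5] >> 5))
        if frame_length < header_size:
            raise ValueError(f"invalid frame length at offset {pos}")
        if pos + frame_length > len(stream):
            break                                  # frame not fully received yet
        frames.append(stream[pos + header_size:pos + frame_length])
        pos += frame_length
    return frames, stream[pos:]
```

The frame lengths you were looking for are simply `len(frame)` for each returned frame.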

How can I programmatically mux multiple RTP audio streams together?

I have several RTP streams coming in from the network, and since RTP can only handle one stream in each direction, I need to be able to merge a couple of them to send back to another client (which could be one that is already sending an RTP stream, or not... that part isn't important).
My guess is that there is some algorithm for mixing audio bytes.
RTP Stream 1 ---------------------
                                  \_____________________ (1 MUXED 2) RTP Stream Out
                                  /
RTP Stream 2 ---------------------
There is an IETF draft for RTP stream muxing which might help you; the link is http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-tanigawa-rtp-multiplex-01.txt
If you want to use only one stream, then perhaps send the data from the multiple streams as different channels. This link gives an overview of how audio channels are multiplexed in WAV files; you can adopt a similar strategy.
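
As a sketch of both ideas, the snippet below works on 16-bit little-endian linear PCM. That is an assumption: compressed payloads such as G.711 would first have to be decoded to linear PCM, and the result re-encoded before being sent out. `mix_pcm16` sums the two streams sample by sample with clipping (the usual way to mix for a conference), while `interleave_pcm16` keeps them separate as left/right channels, the way WAV files lay out stereo data. Both function names are invented for this example.

```python
import struct


def _unpack_pair(a: bytes, b: bytes):
    """Decode two PCM16 buffers, truncated to their common sample count."""
    n = min(len(a), len(b)) // 2
    fmt = f"<{n}h"
    return struct.unpack(fmt, a[:2 * n]), struct.unpack(fmt, b[:2 * n])


def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Mix two mono PCM16 streams by adding samples and clipping to 16 bits."""
    sa, sb = _unpack_pair(a, b)
    mixed = [max(-32768, min(32767, x + y)) for x, y in zip(sa, sb)]
    return struct.pack(f"<{len(mixed)}h", *mixed)


def interleave_pcm16(left: bytes, right: bytes) -> bytes:
    """Combine two mono PCM16 streams into one stereo stream (L, R, L, R, ...)."""
    sl, sr = _unpack_pair(left, right)
    samples = [s for pair in zip(sl, sr) for s in pair]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Plain summation can clip when both parties are loud; scaling each input down before adding is a common alternative, at the cost of lower overall volume.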
I think you are talking about a VoIP conference.
I think the mediastreamer2 library supports a conference filter.
