Is GSM6.10 audio format block or stream based? - voip

I might be asking the wrong question, but my knowledge in this area is very limited.
I'm using acmStreamConvert to convert PCM to GSM (6.10).
Audio Format: 8khz, 16-bit, mono
For the PCM buffer size I'm using 640 bytes (320 samples). For GSM buffer I'm using 65 bytes. My understanding is that GSM "always" converts 320 samples to 65 bytes.
The reason I ask "block or stream" is I'm wondering if I can safely convert multiple audio streams (real-time) using the same acmStreamConvert handle? I see the function has some flags for ACM_STREAMCONVERTF_START and ACM_STREAMCONVERTF_END and ACM_STREAMCONVERTF_BLOCKALIGN, but is it required I use this start/end sequence for GSM? I understand that might be required for some formats that use head/tails, but I'm hoping this isn't required for GSM format?
I'm working on a group VOIP client, and each client sends GSM format, and then needs to convert to PCM before playing. I'm hoping I don't need one ACM handle per client.

Stream based, or at least the ACM API usage of it is. Trying to use the same ACM objects/handles for multiple streams will produce undesired results. I suspect this also means it doesn't handle lost packets as well as other codecs might (haven't confirmed that part yet).

Related

Can Linux determine the parameters of an S/PDIF stream?

I have an external audio source that transmits audio data to my computer's sound card via S/PDIF. The sound card has an S/PDIF input. With "arecord" or "audacity" I can record over this input without any problems.
The audio source offers the data in different sample rates (32 kHz, 44.1 kHz, 48 kHz) which I cannot influence. I also can't tell from the source which sample rate the audio source has selected.
For "recording" I would now very much like to keep the sample rate and not have it converted (apparently by the sound card).
Now finally my question: Can I somehow detect with the help of Linux in which format and with which parameters the S/PDIF stream is encoded

Save audio from RTP stream that contains RFC 2833 RTP events

I'm trying to extract audio from a telephone session captured with Wireshark. The capture as send to us from the telephone provider for debugging/analysis. I have 3 files: signalling, and two files with UDP data, one for each direction. After merging two of these files (one direction with signalling), Wireshark provides RTP stream analysis. What I observe (as I do for a second session capture) is that Wireshark isn't able to export RTP stream audio (Payload type: ITU-T G.711 PCMA (8)) for one direction. This happens to be an RTP stream containing "RTF 2833 RTP events" (Payload type: telephone-event (106)). These events seem to transport DTMF tunes out-of-band, for each DTMF tune, there is a section of 7 consecutive RTP events of this type. What Wireshark does is producing an 8 GB *.au file for an audio stream less than two minutes. For the opposite-direction stream I get an audio file that is 2 MB in size.
I have to admit that this is just guesswork: I connect the error with a feature that I can see, I'm a bit confused that Wireshark obviously knows these Events but fails on saving the corresponding audio stream. Do I maybe need some plugin for that?
I tried to search the web for this issue but without success.
This question was previously asked on Network Engineering but turned out to be off-topic there.
You can filter (rtp.p_type != 106) the DTMF events from the wireshark logs (pcap) and then save only the G.711 data in a separate file.
Then do the RTP analysis and save the audio payload in .au/.raw file format.

How to convert PCM audio stream for online play

I have access to an audio stream of PCM audio buffers. I should be clear I do not have access to the audio file. I only have access to a stream of 4096 byte chunks of the audio data.
The PCM buffers come in with the following format:
PCM Int 16
Little Endian
Two Channels
Interleaved
To support audio playback on a standard browser I need to convert the audio to the following format:
PCM Float 32
Big Endian
Two channels (at most)
Deinterleaved
This audio is coming from an iOS app so I have access to Swift and Objective C (although I am not very comfortable with Objective C...which makes Apple's Audio Converter Services almost impossible to use because Swift really doesn't like pointers).
Additionally the playback will occur on a browser so I could handle the conversion in client side Javascript or server sider. I am proficient enough in the following server side languages to do a conversion:
Java (preferred)
PHP
Node.js
Python
If anyone knows a way to do this in any of these languages please let me know. I have worked on this for long enough that I will probably understand even a very technical description of how to do this.
My current plan is to use bitwise operations to deinterleave the left and right channels, then cast the Int 16 Buffer to a Float 32 Buffer with the Web Audio API. Does this seem like a good plan?
Any help is appreciated, thank you.
My current plan is to use bitwise operations to deinterleave the left and right channels, then cast the Int 16 Buffer to a Float 32 Buffer with the Web Audio API. Does this seem like a good plan?
Yes, that is exactly what you need to do. I do the exact same thing in my applications, and this method works well and is really the only way that makes sense to do it. You don't want to send 32-bit float samples to the client from the server due to the amount of bandwidth. Do the conversion client-side.

Correct way to encode Kinect audio with lame.exe

I receive data from a Kinect v2, which is (I believe, information is hard to find) 16kHz mono audio in 32-bit floating point PCM. The data arrives in up to 4 "SubFrames", which contain 256 samples each.
When I send this data to lame.exe with -r -s 16 --bitwidth 32 -m m I get an output containing gaps (supposedly where the second channel should be). These command line switches should however take stereo and downmix it to mono.
I've also tried importing the raw data into Audacity, but I still can't figure out the correct way to get continuous audio out of it.
EDIT: I can get continuous audio when I only save the first SubFrame. The audio still doesn't sound right though.
In the end I went with Ogg Vorbis. A free format, so no problems there either. I use the following command line switches for oggenc2.exe:
oggenc2.exe --raw-format=3 --raw-chan=1 --raw-rate=16000 - --output=[filename]

Adding audio effects (reverb etc..) to a BackgroundAudioPlayer driven streaming audio app

I have a windows phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example, reverb or echo, etc...
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the pipeline of audio processing even through I've read much about WASAPI, XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing
raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the mp3 will be decoded on the CPU by your managed code, you can interfere / process however you like, then feed the PCM into the media stream source. Main downside - battery life: the hardware-provided codecs should be much more power-efficient.

Resources