Encode LINEAR16 audio to Twilio media audio/x-mulaw | NodeJS

I have been trying to stream a mu-law media stream back to Twilio. The requirement is that the payload must be encoded as audio/x-mulaw with a sample rate of 8000 Hz and base64 encoded.
My input is LINEAR16 audio from @google-cloud/text-to-speech (see the Google docs).
I tried the wavefile package.
This is how I encoded the response from @google-cloud/text-to-speech:
const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()
Then I send the result back to Twilio via WebSocket
twilioWebsocket.send(JSON.stringify({
  event: 'media',
  media: {
    payload: wav.toBase64(),
  },
  streamSid: meta.streamSid,
}))
The problem is that we only hear random noise on the other end of the Twilio call, so it seems the encoding is not right.
Secondly, I have checked the @google-cloud/text-to-speech output audio by saving it to a file, and it was proper and clear.
Can anyone please help me with the encoding?

I also had this same problem. The error is in wav.toBase64(), as this includes the WAV header. Twilio Media Streams expects raw audio data, which you can get from wav.data.samples, so your code would be:
const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()
const payload = Buffer.from(wav.data.samples).toString('base64');
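For completeness, the corrected media message is then sent exactly as in the question, just with this payload instead of wav.toBase64() (twilioWebsocket and meta.streamSid are the question's own variables):
twilioWebsocket.send(JSON.stringify({
  event: 'media',
  media: {
    payload, // raw mu-law samples, base64 encoded, no WAV header
  },
  streamSid: meta.streamSid,
}))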

I just had the same problem. The solution is that you need to convert the LINEAR16 data by hand to the corresponding mu-law codec.
You can use the code from a music library as a starting point.
I created a function out of this to convert a LINEAR16 byte array to mu-law:
short2ulaw(b: Buffer): Buffer {
  // Linear16 to linear8 -> the output buffer is half the size.
  // Because the input is LINEAR16, the length should ALWAYS be even.
  const returnbuffer = Buffer.alloc(b.length / 2)
  for (let i = 0; i < b.length / 2; i++) {
    // JavaScript has no 16-bit integer type; every number is a
    // double-precision 64-bit float, so we just work with plain numbers.
    let short = b.readInt16LE(i * 2)
    let sign = 0
    // Extract the sign of the 16-bit sample and keep its magnitude.
    if (short < 0) {
      sign = 0x80
      short = -short
    }
    // Clip, then add the mu-law bias (0x84).
    short = short > 32635 ? 32635 : short
    const sample = short + 0x84
    // Look up the segment (exponent) and take the 4-bit mantissa.
    const exponent = this.exp_lut[(sample >> 7) & 0xff]
    const mantissa = (sample >> (exponent + 3)) & 0x0f
    // Assemble and complement the mu-law byte; a result of 0 is remapped
    // to 0x02 (the CCITT "zero trap").
    let ulawbyte = ~(sign | (exponent << 4) | mantissa) & 0xff
    ulawbyte = ulawbyte === 0 ? 0x02 : ulawbyte
    returnbuffer.writeUInt8(ulawbyte, i)
  }
  return returnbuffer
}
You can now use this on raw PCM (LINEAR16). Just remember to strip the WAV header that Google prepends to the LINEAR16 stream before converting.
You can then base64 encode the resulting buffer and send it to Twilio.
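The snippet above refers to this.exp_lut without showing it. Here is a minimal sketch of the classic 256-entry G.711 segment table it pairs with, plus the header-strip step; the 44-byte header size and the standalone placement of exp_lut/short2ulaw are assumptions, so adjust them to your setup:
// Classic G.711 exponent (segment) lookup table: for a biased sample,
// exp_lut[(sample >> 7) & 0xff] yields the mu-law segment 0..7.
const exp_lut = new Uint8Array(256)
for (let seg = 1; seg <= 7; seg++) {
  for (let i = 1 << seg; i < Math.min(1 << (seg + 1), 256); i++) {
    exp_lut[i] = seg
  }
}

// Usage sketch: drop the (typically 44-byte) WAV header Google prepends to
// LINEAR16 output, convert to mu-law, then base64 encode the result.
// Assumes speechResponse.audioContent is a Buffer.
const pcm = speechResponse.audioContent.slice(44)
const mulaw = short2ulaw(pcm) // the function above
const payload = mulaw.toString('base64')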

Related

Audio Unit RemoteIO Setting interleaved float gives kAudioUnitErr_FormatNotSupported

I am working with the Audio Unit RemoteIO to obtain low-latency audio output. My problem is that, AFAIK, the audio unit only accepts certain audio formats depending on the hardware. I have a C++ DSP sound engine that works with interleaved float PCM, and I do not want to implement a format converter since it could slow things down in the RemoteIO callback. I tried obtaining a low-latency audio unit with the following format:
AudioStreamBasicDescription const audioDescription = {
.mSampleRate = defaultSampleRate,
.mFormatID = kAudioFormatLinearPCM,
.mFormatFlags = kAudioFormatFlagIsFloat,
.mBytesPerPacket = defaultSampleRate * STEREO_CHANNEL,
.mFramesPerPacket = 1,
.mBytesPerFrame = STEREO_CHANNEL * sizeof(Float32),
.mChannelsPerFrame = STEREO_CHANNEL,
.mBitsPerChannel = 8 * sizeof(Float32),
.mReserved = 0
};
status = AudioUnitSetProperty(audioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
kOutputBus,
&audioDescription,
sizeof(audioDescription));
This fails with the error code kAudioUnitErr_FormatNotSupported -10868. If I try to obtain a float PCM NON-interleaved audio stream with the following:
AudioStreamBasicDescription const audioDescription = {
.mSampleRate = defaultSampleRate,
.mFormatID = kAudioFormatLinearPCM,
.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved,
.mBytesPerPacket = sizeof(float),
.mFramesPerPacket = 1,
.mBytesPerFrame = sizeof(float),
.mChannelsPerFrame = STEREO_CHANNEL,
.mBitsPerChannel = 8 * sizeof(float),
.mReserved = 0
};
status = AudioUnitSetProperty(audioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
kOutputBus,
&audioDescription,
sizeof(audioDescription));
Everything works fine. However I want to obtain an interleaved audio stream for my DSP engine to work without format conversion. Is this possible at all?
PS. waiting for hotpaw2 to guide me :)
Your error is probably due to this line:
.mBytesPerPacket = defaultSampleRate * STEREO_CHANNEL,
For linear PCM with mFramesPerPacket = 1, mBytesPerPacket must equal mBytesPerFrame (the channel count times the bytes per sample), not the sample rate times the channel count.
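A sketch of a corrected interleaved-float description (reusing defaultSampleRate and STEREO_CHANNEL from the question; kAudioFormatFlagIsPacked is added here to match your working non-interleaved format). Whether the hardware side accepts interleaved float is still up to the audio unit, but at least the description is then self-consistent:
AudioStreamBasicDescription const audioDescription = {
    .mSampleRate       = defaultSampleRate,
    .mFormatID         = kAudioFormatLinearPCM,
    .mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked,
    .mBytesPerPacket   = STEREO_CHANNEL * sizeof(Float32), // one interleaved frame per packet
    .mFramesPerPacket  = 1,
    .mBytesPerFrame    = STEREO_CHANNEL * sizeof(Float32), // channels * bytes per sample
    .mChannelsPerFrame = STEREO_CHANNEL,
    .mBitsPerChannel   = 8 * sizeof(Float32),
    .mReserved         = 0
};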

android AudioTrack playback short array (16bit)

I have an application that plays back audio. It takes encoded audio data over RTP and decodes it into a 16-bit array. The decoded 16-bit array is converted to an 8-bit array (byte array), as this is required for some other functionality.
Even though audio playback is working, it breaks up continuously and the output is very hard to recognise. If I listen carefully I can tell it is playing the correct audio.
I suspect this is due to the fact that I convert the 16-bit data stream into a byte array and use write(byte[], int, int, AudioTrack.WRITE_NON_BLOCKING) of the AudioTrack class for playback.
Therefore I converted the byte array back to a short array and used the write(short[], int, int, AudioTrack.WRITE_NON_BLOCKING) method to see if it could resolve the problem.
However, now there is no audio at all. In the debug output I can see that the short array has data.
What could be the reason?
Here is the AudioTrack initialization:
sampleRate =AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
minimumBufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
AudioFormat.CHANNEL_OUT_STEREO,
AudioFormat.ENCODING_PCM_16BIT,
minimumBufferSize,
AudioTrack.MODE_STREAM);
Here is the code that converts the short array to a byte array:
for (int i = 0; i < internalBuffer.length; i++) {
    bufferIndex = i * 2;
    buffer[bufferIndex] = shortToByte(internalBuffer[i])[0];
    buffer[bufferIndex + 1] = shortToByte(internalBuffer[i])[1];
}
Here is the method that converts byte array to short array.
public short[] getShortAudioBuffer(byte[] b) {
    short[] audioBuffer = null;
    int index = 0;
    int audioSize = 0;
    ByteBuffer byteBuffer = ByteBuffer.allocate(2);
    if ((b == null) || (b.length < 2)) {
        return null;
    } else {
        audioSize = (b.length - (b.length % 2));
        audioBuffer = new short[audioSize / 2];
    }
    if ((audioSize / 2) < 2)
        return null;
    byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < audioSize / 2; i++) {
        index = i * 2;
        byteBuffer.put(b[index]);
        byteBuffer.put(b[index + 1]);
        audioBuffer[i] = byteBuffer.getShort(0);
        byteBuffer.clear();
        System.out.print(Integer.toHexString(audioBuffer[i]) + " ");
    }
    System.out.println();
    return audioBuffer;
}
Audio is decoded using the Opus library, and the configuration is as follows:
opus_decoder_ctl(dec,OPUS_SET_APPLICATION(OPUS_APPLICATION_AUDIO));
opus_decoder_ctl(dec,OPUS_SET_SIGNAL(OPUS_SIGNAL_MUSIC));
opus_decoder_ctl(dec,OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
opus_decoder_ctl(dec,OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_FULLBAND));
opus_decoder_ctl(dec,OPUS_SET_PACKET_LOSS_PERC(0));
opus_decoder_ctl(dec,OPUS_SET_COMPLEXITY(10)); // highest complexity
opus_decoder_ctl(dec,OPUS_SET_LSB_DEPTH(16)); // 16bit = two byte samples
opus_decoder_ctl(dec,OPUS_SET_DTX(0)); // default - not using discontinuous transmission
opus_decoder_ctl(dec,OPUS_SET_VBR(1)); // use variable bit rate
opus_decoder_ctl(dec,OPUS_SET_VBR_CONSTRAINT(0)); // unconstrained
opus_decoder_ctl(dec,OPUS_SET_INBAND_FEC(0)); // no forward error correction
Let's assume you have a short[] array which contains the 16-bit, one-channel data to be played.
Then each sample is a value between -32768 and 32767 which represents the signal amplitude at that exact moment, and the value 0 represents the middle point (no signal). This array can be passed to the audio track with the ENCODING_PCM_16BIT format encoding.
But things get weird when ENCODING_PCM_8BIT is used (see AudioFormat).
In this case each sample is encoded by one byte, but the byte is unsigned. That means its value is between 0 and 255, while 128 represents the middle point.
Java has no unsigned byte type; the byte type is signed, i.e. values -128...-1 represent actual values of 128...255. So you have to be careful when converting to the byte array, otherwise the result will be noise with a barely recognizable source sound.
short[] input16 = ... // the source 16-bit audio data;
byte[] output8 = new byte[input16.length];
for (int i = 0 ; i < input16.length ; i++) {
// To convert 16 bit signed sample to 8 bit unsigned
// We add 128 (for rounding), then shift it right 8 positions
// Then add 128 to be in range 0..255
int sample = ((input16[i] + 128) >> 8) + 128;
if (sample > 255) sample = 255; // strip out overload
output8[i] = (byte)(sample); // cast to signed byte type
}
To perform the backward conversion the approach is the same: each single input sample is converted to exactly one sample of the output signal:
byte[] input8 = // source 8-bit unsigned audio data;
short[] output16 = new short[input8.length];
for (int i = 0 ; i < input8.length ; i++) {
// to convert signed byte back to unsigned value just use bitwise AND with 0xFF
// then we need subtract 128 offset
// Then, just scale up the value by 256 to fit 16-bit range
output16[i] = (short)(((input8[i] & 0xFF) - 128) * 256);
}
The issue of not being able to convert data from a byte array to a short array was resolved by using bitwise operators instead of ByteBuffer. It could be that I did not set the correct parameters on the ByteBuffer, or that it is simply not suitable for such a conversion.
Nevertheless, implementing the conversion using bitwise operators resolved the problem. Since the original question has been resolved by this approach, please consider this the final answer.
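For reference, a bitwise little-endian byte[] to short[] conversion along those lines looks roughly like this (an illustrative sketch, not necessarily the exact code used):
public short[] getShortAudioBuffer(byte[] b) {
    if (b == null || b.length < 2) return null;
    short[] audioBuffer = new short[b.length / 2];
    for (int i = 0; i < audioBuffer.length; i++) {
        int lo = b[i * 2] & 0xFF;  // low byte as an unsigned value
        int hi = b[i * 2 + 1];     // high byte keeps the sign
        audioBuffer[i] = (short) ((hi << 8) | lo);
    }
    return audioBuffer;
}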
I will raise a separate topic for playback issue.
Thank you for all your support.

Sending a byte [] over javacard apdu

I send a byte[] from the host application to the Java Card applet. But when I try to retrieve it as a byte[] via buffer[ISO7816.OFFSET_CDATA], I am told that I cannot convert byte to byte[]. How can I send a byte[] via a command APDU from the host application and retrieve it as a byte[] on the other end (the Java Card applet)? It appears buffer[ISO7816.OFFSET_CDATA] returns a single byte. See my comment for where the error occurs.
My idea is as follows:
The host application sends a challenge as a byte[] to be signed by the Java Card applet. Note that the signature call requires the challenge to be a byte[]. The Java Card signs as follows:
private void sign(APDU apdu) {
    if (!pin.isValidated()) ISOException.throwIt(SW_PIN_VERIFICATION_REQUIRED);
    else {
        byte[] buffer = apdu.getBuffer();
        byte numBytes = buffer[ISO7816.OFFSET_LC];
        byte byteRead = (byte) (apdu.setIncomingAndReceive());
        if ((numBytes != 20) || (byteRead != 20))
            ISOException.throwIt(ISO7816.SW_WRONG_LENGTH);
        byte[] challenge = buffer[ISO7816.OFFSET_CDATA]; // error point: cannot convert from byte to byte[]
        byte[] output = new byte[64];
        short length = 64;
        short x = 0;
        Signature signature = Signature.getInstance(Signature.ALG_RSA_SHA_PKCS1, false);
        signature.init(privKey, Signature.MODE_SIGN);
        short sigLength = signature.sign(challenge, offset, length, output, x); // challenge must be a byte[]
        // This sequence of three methods sends the data contained in
        // 'output' with offset '0' and length 'output.length'
        // to the host application.
        apdu.setOutgoing();
        apdu.setOutgoingLength((short) output.length);
        apdu.sendBytesLong(output, (short) 0, (short) output.length);
    }
}
The challenge is sent by the host application as shown below:
byte[] card_signature = null;
SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
byte[] bytes = new byte[20];
random.nextBytes(bytes);
CommandAPDU challenge;
ResponseAPDU resp3;
challenge = new CommandAPDU(IDENTITY_CARD_CLA, SIGN_CHALLENGE, 0x00, 0x20, bytes);
resp3 = c.transmit(challenge);
if (resp3.getSW() == 0x9000) {
    card_signature = resp3.getData();
    String s = DatatypeConverter.printHexBinary(card_signature);
    System.out.println("signature: " + s);
} else {
    System.out.println("Challenge signature error " + resp3.getSW());
}
Generally, you send bytes over the APDU interface. A Java or Java Card byte[] is a construct that can hold those bytes. This is where the APDU buffer comes in: it is the byte array that holds the bytes sent over the APDU interface - or at least a portion of them after calling setIncomingAndReceive().
The challenge therefore is within the APDU buffer; instead of calling:
short sigLength = signature.sign(challenge, offset,length, output, x);
you can therefore simply call:
short sigLength = signature.sign(buffer, apdu.getOffsetCdata(), CHALLENGE_SIZE, buffer, START);
where CHALLENGE_SIZE is 20 and START is simply zero.
Then you can use:
apdu.setOutgoingAndSend(START, sigLength);
to send back the signed challenge.
If you require to keep the challenge for a later stage then you should create a byte array in RAM using JCSystem.makeTransientByteArray() during construction of the Applet and then use Util.arrayCopy() to move the byte values into the challenge buffer. However, since the challenge is generated by the offcard system, there doesn't seem to be any need for this. The offcard system should keep the challenge, not the card.
You should not use ISO7816.OFFSET_CDATA anymore; it will not return the correct offset if you use larger key sizes that generate larger signatures and therefore require extended length APDUs.
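Putting it together, a minimal sketch of the applet side inside the question's applet class (assuming the question's pin and privKey fields, and constants CHALLENGE_SIZE = 20 and START = 0 declared as shorts; in production the Signature object should be allocated once in the applet constructor rather than per call):
private void sign(APDU apdu) {
    if (!pin.isValidated()) ISOException.throwIt(SW_PIN_VERIFICATION_REQUIRED);

    byte[] buffer = apdu.getBuffer();
    short bytesRead = apdu.setIncomingAndReceive();
    if (bytesRead != CHALLENGE_SIZE) ISOException.throwIt(ISO7816.SW_WRONG_LENGTH);

    Signature signature = Signature.getInstance(Signature.ALG_RSA_SHA_PKCS1, false);
    signature.init(privKey, Signature.MODE_SIGN);

    // The challenge already sits in the APDU buffer; sign it from there and
    // write the signature back into the same buffer at offset START.
    short sigLength = signature.sign(buffer, apdu.getOffsetCdata(), CHALLENGE_SIZE, buffer, START);

    // Send the signature back to the host.
    apdu.setOutgoingAndSend(START, sigLength);
}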

AAC stream resampled incorrectly

I have a very particular problem and I wish I could find the answer to it.
I'm trying to read an AAC stream from a URL (an online streaming radio, e.g. live.noroc.tv:8000/radionoroc.aacp) with the NAudio library and get IEEE 32-bit floating point samples.
Besides that, I would like to resample the stream to a particular sample rate and channel count (rate 5512 Hz, mono).
Below is the code which accomplishes that:
int tenSecondsOfDownloadedAudio = 5512 * 10;
float[] buffer = new float[tenSecondsOfDownloadedAudio];
using (var reader = new MediaFoundationReader(pathToUrl))
{
    var ieeeFloatWaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(5512, 1); // mono
    using (var resampler = new MediaFoundationResampler(reader, ieeeFloatWaveFormat))
    {
        var waveToSampleProvider = new WaveToSampleProvider(resampler);
        int readSamples = 0;
        float[] tempBuffer = new float[5512]; // 1 second buffer
        while (readSamples <= tenSecondsOfDownloadedAudio)
        {
            int read = waveToSampleProvider.Read(tempBuffer, 0, tempBuffer.Length);
            if (read == 0)
            {
                Thread.Sleep(500); // allow streaming buffer to get loaded
                continue;
            }
            Array.Copy(tempBuffer, 0, buffer, readSamples, tempBuffer.Length);
            readSamples += read;
        }
    }
}
These particular samples are then written to a Wave audio file using the following simple method:
using (var writer = new WaveFileWriter("path-to-audio-file.wav", WaveFormat.CreateIeeeFloatWaveFormat(5512, 1)))
{
    writer.WriteSamples(samples, 0, samples.Length);
}
What I've encountered is that NAudio does not read 10 seconds of audio (as requested) but only 5, even though the buffer array gets fully loaded with samples (and at this rate and channel count it should then contain 10 seconds of audio).
Thus the final audio file plays the stream twice as slowly as it should (a 5-second stream is played over 10 seconds).
Is this somehow related to different bit depths (should I record at 64 bits per sample as opposed to 32)?
I do my testing on Windows Server 2008 R2 x64, with the MFT codecs installed.
Would really appreciate any suggestions.
The problem seems to be with MediaFoundationReader failing to handle HE-AACv2 in an ADTS container, which is the standard online radio stream format and most likely the one you are dealing with.
Adobe products have the same problem, mistreating this format in exactly the same way: stretching the first half of the audio to the whole duration. See: Corrupted AAC files recorded from online stream.
Supposedly, it has something to do with an HE-AACv2 stereo stream actually being a mono stream with an additional information channel for Parametric Stereo.

LibAV - what approach to take for realtime audio and video capture?

I'm using libav to encode raw RGB24 frames to h264 and mux them into FLV. This works all fine and I've streamed for more than 48 hours without any problems! My next step is to add audio to the stream. I'll be capturing live audio and I want to encode it in real time using Speex, MP3 or Nellymoser.
Background info
I'm new to digital audio and therefore I might be doing things wrong. But basically my application gets a float buffer with interleaved audio. This "audioIn" function gets called by the application framework I'm using. The buffer contains 256 samples per channel, and I have 2 channels. Because I might be mixing up terminology, this is how I use the data:
// input = array with audio samples
// bufferSize = 256
// nChannels = 2
void audioIn(float *input, int bufferSize, int nChannels) {
    // convert from float to S16
    short *buf = new signed short[bufferSize * 2];
    for (int i = 0; i < bufferSize; ++i) { // loop over all samples
        int dx = i * 2;
        buf[dx + 0] = (float)input[dx + 0] * numeric_limits<short>::max(); // convert frame of the first channel
        buf[dx + 1] = (float)input[dx + 1] * numeric_limits<short>::max(); // convert frame of the second channel
    }
    // add this to the libav wrapper.
    av.addAudioFrame((unsigned char *)buf, bufferSize, nChannels);
    delete[] buf;
}
Now that I have a buffer where each sample is 16 bits, I pass this short* buffer to my wrapper's av.addAudioFrame() function. In that function I create a buffer before I encode the audio. From what I have read, the AVCodecContext of the audio encoder sets the frame_size, and this frame_size must match the number of samples in the buffer when calling avcodec_encode_audio2(). I think this because of what is documented here, especially the line:
"If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last." (Please correct me if I'm wrong about this.)
After encoding I call av_interleaved_write_frame() to actually write the frame.
When I use MP3 as the codec, my application runs for about 1-2 minutes and then my server, which is receiving the video/audio stream (FLV over TCP), disconnects with the message "Frame too large: 14485504". This message is generated because the RTMP server is getting a frame that is way too big, and that is probably due to the fact that I'm not interleaving correctly with libav.
Questions:
There are quite a few bits I'm not sure of, even after going through the libav source code, so I hope someone has a working example of encoding audio that comes from a buffer from "outside" libav (i.e. your own application). How do you create a buffer that is large enough for the encoder? How do you make "realtime" streaming work when you need to wait for this buffer to fill up?
As I wrote above, I need to keep track of a buffer before I can encode. Does someone else have code which does this? I'm using AVAudioFifo now (a rough sketch of that pattern is included after this post). The function which encodes the audio and fills/reads the buffer is here too: https://gist.github.com/62f717bbaa69ac7196be
I compiled with --enable-debug=3 and disabled optimizations, but I'm not seeing any debug information. How can I make libav more verbose?
Thanks!
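A rough sketch of the accumulate-and-drain pattern behind the second question, using AVAudioFifo. Assumptions: audioCtx is the opened encoder AVCodecContext, fifo was created with av_audio_fifo_alloc() for the same sample format and channel count, and writePacket() is a hypothetical helper standing in for the avcodec_encode_audio2() / av_interleaved_write_frame() step described in the question:
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/audio_fifo.h>
#include <libavutil/frame.h>
}

// Push incoming interleaved samples into the FIFO, then encode in
// frame_size chunks; anything smaller than frame_size stays buffered
// until the next call, which decouples the "realtime" input blocks
// from the encoder's fixed frame size.
void addAudioFrame(AVCodecContext *audioCtx, AVAudioFifo *fifo,
                   uint8_t *samples, int nbSamples) {
    void *planes[1] = { samples }; // interleaved audio: a single data plane
    av_audio_fifo_write(fifo, planes, nbSamples);

    while (av_audio_fifo_size(fifo) >= audioCtx->frame_size) {
        AVFrame *frame = av_frame_alloc();
        frame->nb_samples     = audioCtx->frame_size;
        frame->format         = audioCtx->sample_fmt;
        frame->channel_layout = audioCtx->channel_layout;
        frame->sample_rate    = audioCtx->sample_rate;
        av_frame_get_buffer(frame, 0);

        // Pull exactly one encoder frame's worth of samples from the FIFO.
        av_audio_fifo_read(fifo, (void **)frame->data, audioCtx->frame_size);

        // Encode the frame and hand the resulting packet to the muxer.
        writePacket(audioCtx, frame); // hypothetical helper

        av_frame_free(&frame);
    }
}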
