Normally, I would expect that the following would be good enough to represent binary data in a Buffer:
new Buffer('01001000','binary')
but I am pretty certain Node.js/JS does not support this 'binary' encoding.
What is the best way then to write binary data to a buffer?
You can do binary encoding like this:
var binaryString = "\xff\xfa\xc3\x4e";
var buffer = new Buffer(binaryString, "binary");
console.log(buffer);
<Buffer ff fa c3 4e>
Types of encoding allowed, with size in bytes for comparison:

encoding    size (bytes)
base64      4,177,241
binary      4,162,398
hex         4,669,965
JSON        2,271,670
utf16le*    4,543,605
utf8*       3,640,132
ascii*      2,929,850
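If you want to produce this kind of size comparison for your own data, one approach is to encode the same Buffer with each encoding and measure the result. This is only a sketch: the file name 'sample.bin' and the use of string length as the size metric are assumptions, not taken from the answer above.

const fs = require('fs');

// Any binary input; 'sample.bin' is a placeholder name
const buf = fs.readFileSync('sample.bin');

for (const enc of ['base64', 'binary', 'hex', 'utf16le', 'utf8', 'ascii']) {
  console.log(enc, buf.toString(enc).length);
}

// JSON is not a Buffer encoding, but a Buffer can be serialized with JSON.stringify
console.log('JSON', JSON.stringify(buf).length);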
I have been trying to stream a mulaw media stream back to Twilio. The requirement is that the payload must be encoded as audio/x-mulaw with a sample rate of 8000 and base64 encoded.
My input is from @google-cloud/text-to-speech in LINEAR16 (see the Google docs).
I tried the wavefile package.
This is how I encoded the response from @google-cloud/text-to-speech:
const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()
Then I send the result back to Twilio via WebSocket
twilioWebsocket.send(JSON.stringify({
  event: 'media',
  media: {
    payload: wav.toBase64(),
  },
  streamSid: meta.streamSid,
}))
The problem is that we only hear random noise on the other end of the Twilio call, so it seems the encoding is not right.
Secondly, I have checked the @google-cloud/text-to-speech output audio by saving it to a file, and it was clear and correct.
Can anyone please help me with the encoding?
I also had this same problem. The error is in wav.toBase64(), as it includes the WAV header. Twilio Media Streams expects raw audio data, which you can get with wav.data.samples, so your code would be:
const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()
const payload = Buffer.from(wav.data.samples).toString('base64');
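Hooking that back into the WebSocket call from the question, the media message then looks roughly like this (a sketch reusing twilioWebsocket and meta.streamSid from the question):

twilioWebsocket.send(JSON.stringify({
  event: 'media',
  media: {
    payload,   // raw mu-law samples, base64 encoded, with no WAV header
  },
  streamSid: meta.streamSid,
}))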
I just had the same problem. The solution is that you need to convert the LINEAR16 audio by hand to the corresponding MULAW codec.
You can use the code from a music library.
I created a function out of this to convert a linear16 byte array to mulaw:
short2ulaw(b: Buffer): Buffer {
  // Linear16 to linear8 -> the output buffer is half the size.
  // Because of the nature of LINEAR16, the input length should ALWAYS be even.
  const returnbuffer = Buffer.alloc(b.length / 2)

  for (let i = 0; i < b.length / 2; i++) {
    // JavaScript has no 16-bit integer type; every number is a
    // double-precision 64-bit float, so we mask and shift by hand.
    let short = b.readInt16LE(i * 2)
    let sign = 0
    // Determine the sign of the 16-bit sample and work with its magnitude
    if (short < 0) {
      sign = 0x80
      short = -short
    }
    // Clip, then add the mu-law bias
    short = short > 32635 ? 32635 : short
    const sample = short + 0x84
    // this.exp_lut is the standard mu-law exponent lookup table,
    // assumed to be defined elsewhere on the class
    const exponent = this.exp_lut[sample >> 8] & 0x7f
    const mantissa = (sample >> (exponent + 3)) & 0x0f
    // Invert and keep the result in the unsigned byte range
    let ulawbyte = ~(sign | (exponent << 4) | mantissa) & 0xff
    ulawbyte = ulawbyte == 0 ? 0x02 : ulawbyte
    returnbuffer.writeUInt8(ulawbyte, i)
  }
  return returnbuffer
}
Now you can use this on raw PCM (LINEAR16). You just need to strip the bytes at the beginning of the Google stream first, since Google adds a WAV header.
You can then base64 encode the resulting buffer and send it to Twilio, as sketched below.
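A minimal sketch of those last two steps, assuming the LINEAR16 response carries a standard 44-byte PCM WAV header (header sizes can vary, so parsing the header is more robust) and that short2ulaw is available as above:

// Hypothetical constant: canonical PCM WAV headers are 44 bytes, but a robust
// implementation should parse the header rather than assume its size.
const WAV_HEADER_BYTES = 44

// speechResponse.audioContent is the LINEAR16 audio from @google-cloud/text-to-speech
const rawPcm = Buffer.from(speechResponse.audioContent).slice(WAV_HEADER_BYTES)

const mulaw = this.short2ulaw(rawPcm)      // the conversion function from above
const payload = mulaw.toString('base64')   // base64 payload for the Twilio media message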
I am using a Node function to compile a buffer made up of a header and a payload. The first element of the header is the total size of the buffer being sent to the server, the second element is the command being sent to the server, and then the payload is appended. I am using the bufferpack.js library from npm; when I calculate the length of the required buffer, it matches the length property of the buffer after it has been created.
The issue I am seeing is that once the buffer is more than 127 bytes, the number of bytes written does not match the number of bytes in the actual buffer; in my testing it is 2 bytes longer. This causes a problem on the server, as it doesn't know what to do with the extra bytes.
Here is the method I am using to convert the data to a buffer and send it to the server:
_sendall(cmd, payload='', callback){
  console.log('payload:', payload.length)
  let fmt = '>HB'
  let size = struct.calcLength(fmt) + payload.length
  console.log('size:', size)
  let header = [size, cmd]
  let buf = struct.pack(fmt, header) + payload
  console.log('buf:', buf.length)
  console.log('[INFO] Buffer Sent:', buf)
  this.sock.write(buf, 'binary', (err)=>{
    console.log('[INFO] Bytes written now:', this.sock.bytesWritten - this._bytesWrittenPrevious)
    console.log('[INFO] Total Bytes Written:', this.sock.bytesWritten)
    this._bytesWrittenPrevious = this.sock.bytesWritten
    if (err) {
      console.log('[ERROR] _sendall:', err)
      return callback(false)
    }
    return callback(true)
  })
}
Here is an example of the console output when the system works correctly and the bytes sent match the size of the buffer. The server responds and all is right in the world.
[INFO] Sending the input setup
payload: 20
size: 23
buf: 23
[INFO] Buffer Sent: Iinput_int_register_0
[INFO] Bytes written now: 23
[INFO] Total Bytes Written: 143
Here is what I see when the system is not working correctly: the server never responds, and the code hangs because there are callbacks that never fire.
[INFO] Sending the input setup
payload: 143
size: 146
buf: 146
[INFO] Buffer Sent: �Iinput_double_register_0,input_double_register_1,input_double_register_2,input_double_register_3,input_double_register_4,input_double_register_5
[INFO] Bytes written now: 148
[INFO] Total Bytes Written: 291
Any ideas what is happening? I have not been able to find anything regarding an issue such as this so any help is greatly appreciated.
UPDATE:
I made the changes recommended by @mscdex to the encoding in the sock.write call, and now I am writing the same number of bytes that I am sending, but I am still having an issue. I have narrowed it down to the size element, which is being encoded as an unsigned short (H) by the bufferpack.js library. Any time size is more than 127, I believe it is being encoded incorrectly; if I try to unpack the buffer, I get NaN back for the size. Still working on resolving the issue.
When you're sending binary data, you do not want to be specifying 'utf8' as the encoding. You really should be using Buffer objects instead of strings for binary data. However, to write a binary string, you can use the 'binary' encoding instead. That will keep the string from being interpreted as UTF-8.
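To see why the byte count inflates, note that any character code of 0x80 or above in a binary string expands to two bytes under UTF-8. A minimal illustration (the example byte is arbitrary, not taken from the question's data):

const s = '\x92';                             // a single "byte" held in a string
console.log(Buffer.byteLength(s, 'utf8'));    // 2 -> U+0092 is encoded as c2 92
console.log(Buffer.byteLength(s, 'binary'));  // 1 -> the byte passes through unchanged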
I was finally able to make the program work by ditching the bufferpack.js library for packing the buffer. It appears that it was not encoding the size of the buffer correctly whenever that size exceeded 127 bytes. The H in the fmt variable was supposed to format the value as an unsigned short, but it appears to treat it as a signed short; that is just conjecture at this point, since I got the same result using h, which is a signed short.
To solve the issue I switched to the native Buffer write methods. I will look into bufferpack.js, see if I can find the problem, and post a PR to the repo if I'm able to correct it.
Here is the new code that works as expected:
_sendall(cmd, payload='', callback){
  let fmt = '>HB'
  let size = struct.calcLength(fmt) + payload.length
  let header = [size, cmd]
  let buf = new Buffer(size)
  buf.writeUInt16BE(size, 0)
  buf.writeInt8(cmd, 2)
  buf.write(payload.toString(), 3)
  this.sock.write(buf, 'binary', (err)=>{
    console.log('[INFO] Bytes written now:', this.sock.bytesWritten - this._bytesWrittenPrevious)
    console.log('[INFO] Total Bytes Written:', this.sock.bytesWritten)
    this._bytesWrittenPrevious = this.sock.bytesWritten
    if (err) {
      console.log('[ERROR] _sendall:', err)
      return callback(false)
    }
    return callback(true)
  })
}
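As a quick sanity check, the size field of a buffer packed this way can be read back with readUInt16BE and compared against the buffer length. A small sketch with an illustrative payload (the payload and cmd value below are made up, not the question's actual data):

// Hypothetical payload, long enough that the size field exceeds 127
const payload = 'input_double_register_0,'.repeat(6)  // 150 characters
const size = 3 + payload.length                       // 2-byte size + 1-byte cmd + payload
const buf = Buffer.alloc(size)
buf.writeUInt16BE(size, 0)
buf.writeInt8(1, 2)                                   // cmd = 1, an arbitrary example value
buf.write(payload, 3)

console.log(buf.readUInt16BE(0), buf.length)          // 153 153 -> the size field round-trips correctly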
I have a buffer in node <Buffer 42 d9 00 00> that is supposed to represent the decimal 108.5. I am using this module to try and decode the buffer: https://github.com/feross/ieee754.
ieee754.read = function (buffer, offset, isLE, mLen, nBytes)
The arguments mean the following:
buffer = the buffer
offset = offset into the buffer
value = value to set (only for write)
isLE = is little endian?
mLen = mantissa length
nBytes = number of bytes
I try to read the value: ieee754.read(buffer, 0, false, 5832704, 4) but am not getting the expected result. I think I am calling the function correctly, although I am unsure about the mLen argument.
[I discovered that] the node Buffer class has that ability built in: buffer.readFloatBE(0).
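For reference, mLen is the mantissa length in bits, which is 23 for a 32-bit float, so both approaches recover 108.5 from that buffer (a small check using the ieee754 package from the question):

const ieee754 = require('ieee754')

const buf = Buffer.from([0x42, 0xd9, 0x00, 0x00])

// A 32-bit big-endian IEEE 754 float has a 23-bit mantissa and occupies 4 bytes
console.log(ieee754.read(buf, 0, false, 23, 4)) // 108.5

// Or simply use the built-in Buffer method
console.log(buf.readFloatBE(0))                 // 108.5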
I want to serialize a buffer to a string without any overhead (one character per byte) and be able to deserialize it into a buffer again.
var b = new Buffer(4);
var s = b.toString();
var b2 = new Buffer(s);
This produces the same result only for byte values below 128. I want to use the whole range 0-255.
I know I can write it in a loop with String.fromCharCode() when serializing and String.charCodeAt() when deserializing, but I'm looking for a native implementation if there is one.
You can use the 'latin1' encoding, but you should generally try to avoid it because converting a Buffer to a binary string has some extra computational overhead.
Example:
var b = Buffer.alloc(4);
var s = b.toString('latin1');
var b2 = Buffer.from(s, 'latin1');
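A quick round-trip check with byte values above 127 (the byte values are arbitrary, chosen to cover the full range):

var b = Buffer.from([0x00, 0x7f, 0x80, 0xff]);
var s = b.toString('latin1');          // 4 characters, one per byte
var b2 = Buffer.from(s, 'latin1');
console.log(b2);                       // <Buffer 00 7f 80 ff>, so the whole 0-255 range survives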
I'm attempting to display the character í from 0xed (237).
String.fromCharCode yields the correct result:
String.fromCharCode(0xed); // 'í'
However, when using a Buffer:
var buf = new Buffer(1);
buf.writeUInt8(0xed,0); // <Buffer ed>
buf.toString('utf8'); // '?', same as buf.toString()
buf.toString('binary'); // 'í'
Using 'binary' with Buffer.toString() is slated for deprecation, so I want to avoid it.
Second, I can also expect incoming data to be multibyte (i.e. UTF-8), e.g.:
String.fromCharCode(0x0512); // Ԓ - correct
var buf = new Buffer(2);
buf.writeUInt16LE(0x0512,0); // <Buffer 12 05>, [0x0512 & 0xff, 0x0512 >> 8]
buf.toString('utf8'); // Ԓ - correct
buf.toString('binary'); // Ô
Note that the two examples behave inconsistently with each other.
SO, what am I missing? What am I assuming that I shouldn't? Is String.fromCharCode magical?
Seems you might be assuming that Strings and Buffers use the same bit-length and encoding.
JavaScript Strings are 16-bit, UTF-16 sequences while Node's Buffers are 8-bit sequences.
UTF-8 is also a variable-length encoding, with code points consuming between 1 and 4 bytes. The UTF-8 encoding of í, for example, takes 2 bytes:
> new Buffer('í', 'utf8')
<Buffer c3 ad>
And, on its own, 0xed is not a valid byte in UTF-8 encoding, thus the ? representing an "unknown character." It is, however, a valid UTF-16 code for use with String.fromCharCode().
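So, to get 'í' out of the code 0xed without relying on 'binary', encode it as UTF-8 first (a small illustration, not part of the original answer):

var buf = Buffer.from(String.fromCharCode(0xed), 'utf8');
console.log(buf);                  // <Buffer c3 ad>, the two-byte UTF-8 form
console.log(buf.toString('utf8')); // 'í', decoded back correctly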
Also, the output you suggest for the 2nd example doesn't seem correct.
var buf = new Buffer(2);
buf.writeUInt16LE(0x0512, 0);
console.log(buf.toString('utf8')); // "\u0012\u0005"
You can detour with String.fromCharCode() to see the UTF-8 encoding.
var buf = new Buffer(String.fromCharCode(0x0512), 'utf8');
console.log(buf); // <Buffer d4 92>