convert audio file to Linear PCM 16-bit - node.js

I am trying to send an audio file through a websocket, and I realised that in order to do so I need to convert the mp3 file to Linear PCM 16-bit format, but I can't find a way to do so.
Here is what I want to do:

let mp3File = // the mp3 file, converted to 16-bit PCM

ws.on('message', async (msg) => {
    if (typeof msg === "string") {
        // handle text messages here
    } else if (recognizeStream) {
        recognizeStream.write(msg);
    }
    ws.send(mp3File); // <== stream back the audio file
});
Some background: the stream is a phone call (via the Vonage API), so my ws is connected to the phone call and hears the user input. After some logic on my server, I want to play the user an mp3 file that is a local file on my server, via ws.send().
-----------update--------
Now, if I send the PCM data from the stream (the raw audio from the phone call), it works (the server echoes the phone call). So I want to convert the mp3 file to the same format so I can send it via ws.send().
-----------update 2--------
After converting my audio file to the required format, which is:
"Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a 20ms frame size"
I am trying to send the file through the websocket, but I don't know how to do so. I have the file in the project folder, but I looked for how to send it via the websocket and didn't find anything.
I am trying to do what is specified here:

First let's understand what this means:
Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a
20ms frame size
They are talking about 2 things here:
The format of audio data, which is "Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate"
How you send this audio data to them and how they send it to you: in chunks of audio data worth 20ms frames
Based on the audio format, choosing "16-bit Linear PCM with a sample rate of 16K" implies:
samplerate = 16000
samplewidth = 16 bits = 2 bytes
So one second of audio will contain (16000 * 2) = 32000 bytes
This means a 20ms/0.02s frame of audio will be (32000 * 0.02) = 640 bytes
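As a quick sanity check, the same arithmetic in code (covering both supported sample rates):

const bytesPerSample = 2; // 16-bit samples
const frameBytes = rate => rate * bytesPerSample * 0.02; // bytes per 20ms frame
console.log(frameBytes(16000)); // 640
console.log(frameBytes(8000));  // 320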
There are 2 things needed:
Convert mp3 to wav. Install ffmpeg on your system and run this command:
ffmpeg -i filename.mp3 -ar 16000 -sample_fmt s16 output.wav
This converts your filename.mp3 to output.wav, which will be Linear PCM 16-bit at a 16K sample rate.
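If you'd rather run the conversion from Node itself, you can shell out to the same ffmpeg command with the built-in child_process module (a minimal sketch; it assumes ffmpeg is on your PATH):

const { execFile } = require('child_process');

// Run the exact ffmpeg command above: 16kHz sample rate, signed 16-bit samples
execFile('ffmpeg', ['-i', 'filename.mp3', '-ar', '16000', '-sample_fmt', 's16', 'output.wav'], (err) => {
    if (err) return console.error('ffmpeg failed:', err);
    console.log('output.wav is ready');
});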
In your code, when you send audio back, you need to stream it as chunks of 640 bytes, not the entire file data in one shot. There are 3 options:
Run a loop that writes all the audio to the websocket in chunks of 640 bytes.
But this has an issue: Nexmo will buffer only the first 20s of audio. Anything more than that will be discarded.
Start an async task that runs every 20ms and writes 640 bytes of data to the websocket.
Write when you get audio from Nexmo (this is the one I will show).
Since Nexmo will send you 640 bytes every 20ms, you can just send back 640 bytes at the same time.
I'm writing this example using the npm websocket package.
var fs = require('fs');

var binaryData = fs.readFileSync('output.wav');
var start = 44; // skip the 44-byte wav header
var chunkSize = 640;
...
// connection is a websocket connection object
connection.on('message', function(message) {
    if (message.type === 'utf8') {
        // handle a text message here
    }
    else if (message.type === 'binary') {
        // print length of audio sent by nexmo. will be 640 for 16K and 320 for 8K
        console.log('Received Binary Message of ' + message.binaryData.length + ' bytes');
        if (start < binaryData.length) {
            // slice a chunk and send
            var toSend = binaryData.slice(start, start + chunkSize);
            start = start + chunkSize;
            connection.sendBytes(toSend);
            console.log('Sent Binary Message of ' + toSend.length + ' bytes');
        }
    } ...
});
Remember, there will be some delay between the moment you send the audio from your server to Nexmo and the moment you hear it on the other side.
It can vary from half a second to even more, depending on the location of Nexmo's datacentre, the location of the server where you run your code, network speed, etc.
I have observed it to be close to 0.5 sec.

Related

Create valid h264 from partial stream of h264 video data and wrap as Mp4

Let's say that I am reading from a data stream, and that stream is sending the content of an h264 video feed. Given that I read from that stream and have some amount of data consisting of an indeterminate number of frames (NALs?), and given that I know the framerate and size of the originating video, how would I go about converting this snippet into an mp4 that I could view? The video does not contain audio.
I want to do this using nodejs. My attempts so far have produced nothing resembling a valid h264 file to convert into mp4. My thought so far was to strip any data preceding the first found start code in the data, feed that into a file, and use ffmpeg (currently just testing in the command line) to convert the file to mp4.
What's the correct way to go about doing this?
I.e. something like this (it's in TypeScript, but same thing):
// We assume that when this while loop exits, at least one full frame of data will have been read and written to disk
let stream: WriteStream = fs.createWriteStream("./test.h264")
while (someDataStream.available()) { // just an example, not real code
    let data: Buffer = someDataStream.readSomeData() // just an example, not a real method call
    let file = null;
    try {
        file = fs.statSync("./test.h264");
    } catch (error) {
        console.error(error)
    }
    if (!stream.writable) {
        console.error("stream not writable")
    } else if (file == null || file.size <= 0) {
        let index = data.indexOf(0x7C)
        console.log("index: " + index)
        if (index > 0) {
            console.log("index2: " + data.slice(index).indexOf(0x7c))
            stream.write(data.slice(index))
        }
    } else {
        stream.write(data)
    }
}
To handle a data stream, you'll need to emit fragmented MP4. Like all MP4, fMP4 streams begin with a preamble containing ftyp, moov, and styp boxes. Then each frame is encoded with a moof / mdat box pair.
In order to generate a useful preamble from your H.264 bitstream, you need to locate an SPS / PPS pair of NALUs in the H264 data to set up the avc1 box within the moov box. Those two NALUs are often immediately followed by an I-frame (a key frame). The first frame in a stream must be an I-frame, and subsequent ones can be P- or B-frames.
It's a fairly complex task involving lots of bit-banging and buffer-shuffling (those are technical terms ;-).
I've been working on a piece of js code to extract H.264 from webm and put it into fmp4. It's not yet complete. It's backed up by another piece of code to decode the parts of the H264 stream that are needed to pack it properly into fMP4.
I wish I could write, "here are the ten lines of code you need" but those formats (fMP4 and H264) aren't simple enough to make that possible.
I don't know why none of these questions has an easy answer. Here you go, a Node.js solution; the i argument is there in case you need to offset the search:
const soi = Buffer.from([0x00, 0x00, 0x00, 0x01]); // 4-byte NAL unit start code
function findStartFrame(buffer, i = -1) {
    while ((i = buffer.indexOf(soi, i + 1)) !== -1) {
        // NAL unit type lives in the low 5 bits; type 7 is an SPS
        if ((buffer[i + 4] & 0x1F) === 7) return i
    }
    return -1
}
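For example, with the asker's WriteStream and a Buffer read from the data stream, you could drop everything before the first SPS like this (a sketch; chunk is a hypothetical variable standing in for one read):

// Assuming `chunk` is a Buffer read from the incoming data stream
const offset = findStartFrame(chunk);
if (offset !== -1) {
    // Begin the file at the SPS NALU so ffmpeg can parse what follows
    stream.write(chunk.slice(offset));
}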

Get duration of remote MP3 file without full download

In a node.js script I need to get the duration of a remote MP3 file.
Currently I use the download and mp3-duration modules to download the file and then compute the duration. It works, but the download phase is very long.
I noticed that media players can display the duration almost instantly. Would it somehow be possible to get the duration without downloading the entire file first?
ID3 Metadata
mp3 files come (optionally) with metadata called "ID3 tags":
ID3 is a de facto standard for metadata in MP3 files
ID3v1 enhanced tag
The ID3v1 tag occupies 128 bytes, beginning with the string TAG 128 bytes from the end of the file.
The Enhanced tag is 227 bytes long, and placed before the ID3v1 tag.
The enhanced tag contains the start-time and end-time fields as the last 6 + 6 bytes of the tag, formatted like mmm:ss. So it gives you the duration of the song.
The solution would be to download the last 355 bytes of the file, check whether the enhanced tag (TAG+) is present, and then look at the last 12 bytes of the tag.
ID3v2 TLEN
Nowadays we mostly use ID3v2, which allows us to embed up to 256MB of information:
An ID3v2 tag is composed of frames. Each frame represents one piece of information. Max frame size is 16MB. Max tag size is 256MB.
The frame you're interested in is the TLEN frame, which represents the length of the song.
But there is no guarantee that a given mp3 file will have an ID3v2 tag or that a TLEN frame will be stored in its ID3v2 tag.
/!\ If the mp3 file does not contain metadata, the only solution is the one you already came up with: estimating the duration with the mp3-duration module.
How to get this metadata without downloading the entire file?
Range requests
If the server accepts Range Requests, then we just have to tell it which bytes we want!
ID3v1
const request = require('request-promise');
const NodeID3 = require('node-id3');

const mp3FileURL = "http://musicSite/yourmp3File.mp3";
const mp3FileHEAD = await request.head(mp3FileURL, {resolveWithFullResponse: true});
// Node lowercases response header names
const serverAcceptRangeReq = mp3FileHEAD.headers['accept-ranges'] && mp3FileHEAD.headers['accept-ranges'].toLowerCase() != "none";
// here we assume that the server accepts range requests
const mp3FileSize = Number(mp3FileHEAD.headers['content-length']);
const tagBytesHeader = {'Range': `bytes=${mp3FileSize - 355}-${mp3FileSize - 1}`};
const tagBytes = await request.get(mp3FileURL, {headers: tagBytesHeader, encoding: null}); // encoding: null yields a Buffer
/!\ I didn't test the code, it just serves as a demonstration /!\
Then parse the response and check whether tagBytes.includes('TAG+'); if it does, you've got your duration.
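Extracting the two time fields might look like this (a sketch, assuming tagBytes is a Buffer as above and using the 227-byte enhanced-tag layout described earlier):

// The last 355 bytes are: [227-byte enhanced tag][128-byte ID3v1 tag]
const tagPlus = tagBytes.indexOf('TAG+');
if (tagPlus !== -1) {
    const enhanced = tagBytes.slice(tagPlus, tagPlus + 227);
    // start-time and end-time are the final 6 + 6 bytes, formatted mmm:ss
    const startTime = enhanced.slice(215, 221).toString();
    const endTime = enhanced.slice(221, 227).toString();
    console.log('duration fields:', startTime, endTime);
}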
ID3v2
You can use the same method to check whether an ID3v2 tag is at the beginning of the file, and then use a module like node-id3 to parse the tags.
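A sketch of that approach (the 10-byte ID3v2 header stores the tag size at offset 6 as a 4-byte syncsafe integer, 7 significant bits per byte; node-id3 can read tags from a Buffer):

const headerBytes = await request.get(mp3FileURL, {headers: {'Range': 'bytes=0-9'}, encoding: null});
if (headerBytes.slice(0, 3).toString() === 'ID3') {
    // Decode the syncsafe tag size, then fetch the whole tag
    const tagSize = (headerBytes[6] << 21) | (headerBytes[7] << 14) | (headerBytes[8] << 7) | headerBytes[9];
    const tagBuffer = await request.get(mp3FileURL, {headers: {'Range': `bytes=0-${10 + tagSize - 1}`}, encoding: null});
    const tags = NodeID3.read(tagBuffer); // look for the TLEN frame in the result, if present
}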
GET request to check for ID3v2
According to the documentation,
The ID3v2 tag header [...] should be the first information in the file, is 10 bytes
So even if the server does not accept Range Requests, you could come up with a solution using a simple GET request and "stopping" it once you have received more than 10 bytes:
const request = require('request');

let buff = Buffer.alloc(0);
const r = request
    .get(mp3FileURL)
    .on('data', chunk => {
        buff = Buffer.concat([buff, chunk]);
        if (buff.length > 10) {
            r.abort();
            // deal with your buff
        }
    });
/!\ I didn't test the code, it just serves as a demonstration /!\
Wrapping it in a function that returns a Promise would be cleaner.
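For instance (a sketch along those lines; the function name is just illustrative):

function getID3v2Header(url) {
    return new Promise((resolve, reject) => {
        let buff = Buffer.alloc(0);
        const r = request
            .get(url)
            .on('data', chunk => {
                buff = Buffer.concat([buff, chunk]);
                if (buff.length >= 10) {
                    r.abort();
                    resolve(buff.slice(0, 10)); // the 10-byte ID3v2 tag header
                }
            })
            .on('error', reject);
    });
}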

Is there any way to sample audio using OpenSL on android with different sampling rates and buffer sizes?

I have downloaded the audio-echo app from the Android NDK portal for OpenSL. Due to the lack of documentation, I'm not able to work out how to change the sampling rate and buffer size of the audio in and out.
If anybody has any idea how to:
Change the buffer size and sampling rate in OpenSL
Read the buffers so they can be fed to C code for processing
Feed the result to the output module of OpenSL to go to the speakers
Another alternative, I feel, is to read at the preferred sampling rate and buffer size, but downsample and upsample in the code itself and use a circular buffer to get the desired data. But how do we read and feed the data in OpenSL?
In the OpenSL ES API, there are calls to create either a Player or a Recorder:
SLresult (*CreateAudioPlayer) (
    SLEngineItf self,
    SLObjectItf *pPlayer,
    SLDataSource *pAudioSrc,
    SLDataSink *pAudioSnk,
    SLuint32 numInterfaces,
    const SLInterfaceID *pInterfaceIds,
    const SLboolean *pInterfaceRequired
);

SLresult (*CreateAudioRecorder) (
    SLEngineItf self,
    SLObjectItf *pRecorder,
    SLDataSource *pAudioSrc,
    SLDataSink *pAudioSnk,
    SLuint32 numInterfaces,
    const SLInterfaceID *pInterfaceIds,
    const SLboolean *pInterfaceRequired
);
Note that both of these take an SLDataSource *pAudioSrc parameter.
To use a custom playback rate or recording rate, you have to set up this data source properly.
I use an 11.025kHz playback rate with this code:
// Configure data format.
SLDataFormat_PCM pcm;
pcm.formatType = SL_DATAFORMAT_PCM;
pcm.numChannels = 1;
pcm.samplesPerSec = SL_SAMPLINGRATE_11_025;
pcm.bitsPerSample = SL_PCMSAMPLEFORMAT_FIXED_16;
pcm.containerSize = 16;
pcm.channelMask = SL_SPEAKER_FRONT_CENTER;
pcm.endianness = SL_BYTEORDER_LITTLEENDIAN;
// Configure Audio Source.
SLDataSource source;
source.pFormat = &pcm;
source.pLocator = &bufferQueue;
To feed data to the speakers, a buffer queue is used that is filled by a callback. To set this callback, use SLAndroidSimpleBufferQueueItf, documented in section 8.12 SLBufferQueueItf of the OpenSL ES specification.

Play Raw File Using NAudio Library

Hello, I have the following code to play a raw file. My raw file's duration is 25 seconds. This code works fine, but after some time my program plays the raw file very slowly, at almost 50% speed, and the duration increases to 36 seconds. When I restart my PC and run my program, it works normally again, so I need to restart my PC every hour for it to work correctly. Please check what is wrong with my code. Here is my code:
Try
    Dim rawStream = File.OpenRead("C:\myFile.raw")
    Dim waveFormat = New NAudio.Wave.WaveFormat(8000, 16, 1)
    Dim rawSource = New RawSourceWaveStream(rawStream, waveFormat)
    Dim audioBufferSize = 320
    Dim buffer As Byte() = New Byte(audioBufferSize - 1) {}
    ' Read 320-byte (20ms at 8kHz, 16-bit mono) chunks and pace them out
    Dim bytesRead As Integer = rawSource.Read(buffer, 0, audioBufferSize)
    While bytesRead > 0
        msport.Write(buffer, 0, bytesRead)
        Thread.Sleep(20)
        bytesRead = rawSource.Read(buffer, 0, audioBufferSize)
    End While
Catch ex As Exception
    MsgBox(ex.ToString)
End Try
NAudio is not having any effect at all in your code sample. All you are doing is reading data from a file and sending it to the serial port. RawSourceWaveStream simply attaches a WaveFormat to the file stream, but nothing is reading that stream.
Whatever device is listening to the audio data you send over the serial port will have an audio format it expects; you need to find out what that is. Then you can use NAudio to convert the audio to the correct format if it is not already at the right sample rate / channel count (which would be the two most likely causes of audio playing at 50% speed).

ffmpeg audio decode data is zero

I tried to decode the audio using ffmpeg with the following code:
NSMutableData *finalData = [NSMutableData data];
......
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    if (packet.stream_index == videoStream)
    {
        int consumed = avcodec_decode_audio4(pCodecCtx, pFrame, &got_frame_ptr, &packet);
        if (got_frame_ptr)
        {
            [finalData appendBytes:(pFrame->data)[0] length:(pFrame->linesize)[0]];
        }
    }
    av_free_packet(&packet);
}
......
[finalData writeToFile:path atomically:YES];
But the saved file can't be played, even after I changed the file extension to wav. When I look at it in HexEdit (a hex editor), I see many zero bytes; for example, the content of the file before offset 0x970 is all zeros. Is there any error in my code? Any help will be appreciated.
Actually the decode result is good. The zero bytes in the file are normal, because the decoded result is raw PCM data. I imported the data into Adobe Audition and it played fine. FYI.
