Create valid h264 from partial stream of h264 video data and wrap as Mp4 - node.js

Let's say I am reading from a data stream, and that stream is sending the content of an H.264 video feed. Reading from that stream gives me some amount of data consisting of an indeterminate number of frames (NAL units?). Given that I know the framerate and frame size of the originating video, how would I go about converting this snippet into an MP4 that I could view? The video does not contain audio.
I want to do this using Node.js. My attempts so far have produced nothing resembling a valid H.264 file to convert into MP4. My idea was to strip any data preceding the first start code found in the data, write the rest to a file, and use ffmpeg (currently just testing on the command line) to convert that file to MP4.
What's the correct way to go about doing this?
i.e. something like this (it's in TypeScript, but same thing):
// We assume that by the time this while loop exits, at least one full frame of data will have been read and written to disk
let stream: WriteStream = fs.createWriteStream("./test.h264")
while (someDataStream.available()) { // just an example, not real code
    let data: Buffer = someDataStream.readSomeData() // just an example, not a real method call
    let file = null;
    try {
        file = fs.statSync("./test.h264");
    } catch (error) {
        console.error(error)
    }
    if (!stream.writable) {
        console.error("stream not writable")
    } else if (file == null || file.size <= 0) {
        let index = data.indexOf(0x7C)
        console.log("index: " + index)
        if (index > 0) {
            console.log("index2: " + data.slice(index).indexOf(0x7c))
            stream.write(data.slice(index))
        }
    } else {
        stream.write(data)
    }
}

To handle a data stream, you'll need to emit fragmented MP4. Like all MP4, fMP4 streams begin with a preamble containing ftyp, moov, and styp boxes. Then each frame is encoded with a moof / mdat box pair.
In order to generate a useful preamble from your H.264 bitstream, you need to locate an SPS / PPS pair of NALUs in the H.264 data, to set up the avc1 box within the moov box. Those two NALUs are often immediately followed by an I-frame (a key frame). The first frame in a stream must be an I-frame, and subsequent ones can be P- or B-frames.
It's a fairly complex task involving lots of bit-banging and buffer-shuffling (those are technical terms ;-).
I've been working on a piece of JS code to extract H.264 from WebM and put it into fMP4. It's not yet complete. It's backed up by another piece of code that decodes the parts of the H.264 stream that are needed to pack it properly into fMP4.
I wish I could write, "here are the ten lines of code you need" but those formats (fMP4 and H264) aren't simple enough to make that possible.
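If hand-rolling the boxes is more than you need, a common shortcut (and one that matches the ffmpeg experiment in the question) is to pipe the raw Annex B bytes into an ffmpeg child process and let it do the remuxing. This is only a minimal sketch: it assumes ffmpeg is on the PATH, assumes a 30 fps framerate (substitute the framerate you already know), and treats someDataStream as a normal Node Readable of Buffers that starts at (or before) the first SPS/PPS/IDR.
import { spawn } from "child_process"

// Remux a raw H.264 elementary stream (Annex B) into fragmented MP4 without re-encoding.
const ffmpeg = spawn("ffmpeg", [
    "-f", "h264",                            // input format: raw H.264 elementary stream
    "-framerate", "30",                      // assumed; use the framerate you already know
    "-i", "pipe:0",                          // read the bitstream from stdin
    "-c:v", "copy",                          // copy the video stream, no re-encode
    "-movflags", "frag_keyframe+empty_moov", // emit fragmented MP4
    "-f", "mp4",
    "./test.mp4",
])
ffmpeg.stderr.pipe(process.stderr)           // ffmpeg logs to stderr

someDataStream.pipe(ffmpeg.stdin)            // or call ffmpeg.stdin.write(buffer) per chunk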

I don't know why none of these questions actually has an easy answer. Here you go, a Node.js solution; the i argument is there just in case you need to offset the search:
const soi = Buffer.from([0x00, 0x00, 0x00, 0x01]); // Annex B 4-byte start code
function findStartFrame(buffer, i = -1) {
    // Scan each start code; return the offset of the first SPS NAL unit (type 7),
    // which is where a decodable stream can begin.
    while ((i = buffer.indexOf(soi, i + 1)) !== -1) {
        if ((buffer[i + 4] & 0x1F) === 7) return i
    }
    return -1
}
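A usage sketch, reusing the placeholders from the question (someDataStream is assumed here to be a normal Node Readable emitting Buffers): discard bytes until the first SPS is found, then append everything that follows. Note that the function above only matches the 4-byte start code, and an SPS that straddles two chunks would be missed, so a real implementation needs to buffer across chunk boundaries.
import * as fs from "fs"

const out = fs.createWriteStream("./test.h264")
let synced = false                     // becomes true once the first SPS has been seen

someDataStream.on("data", (chunk: Buffer) => {
    if (!synced) {
        const i = findStartFrame(chunk)
        if (i === -1) return           // no SPS yet, keep discarding
        chunk = chunk.slice(i)         // drop everything before the SPS
        synced = true
    }
    out.write(chunk)
})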

Related

convert audio file to Linear PCM 16-bit

I am trying to send an audio file through a WebSocket, and I realised that in order to do so I need to convert the MP3 file to Linear PCM 16-bit, but I can't find a way to do so.
Here is what I want to do:
let mp3File = // the 16-bit PCM file
ws.on('message', async (msg) => {
    if (typeof msg === "string") {
    } else if (recognizeStream) {
        recognizeStream.write(msg);
    }
    ws.send(mp3File) // <== stream back the audio file
});
Some background: the stream is a phone call (via the Vonage API), so my WebSocket is connected to the phone call and hears the user's input. After some logic on my server, I want to play the user an MP3 file that is stored locally on my server, via ws.send().
-----------update--------
Now, if I send the PCM data from the stream (the raw audio from the phone call) back,
it works (the server echoes the phone call).
So I want to convert the MP3 file to the same format so I can send it via ws.send().
-----------update 2--------
After converting my audio file to the right format, which is:
"Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a 20ms frame size"
I am trying to send the file through the WebSocket, but I don't know how to do so.
I have the file in the project folder, but I don't know how to send it via the WebSocket; I looked for how to do it but didn't find anything.
I am trying to do what is specified here:
First let's understand what this means:
"Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a 20ms frame size"
They are talking about 2 things here:
1. The format of the audio data, which is "Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate"
2. How you send this audio data to them and how they send it to you: in chunks of audio data worth 20ms each
Based on the audio format, choosing "16-bit Linear PCM with a sample rate of 16K" implies:
samplerate = 16000
samplewidth = 16 bits = 2 bytes
So an audio chunk of 1 second will contain (16000 * 2) = 32000 bytes
This means a 20ms / 0.02s frame of audio will be equivalent to (32000 * 0.02) = 640 bytes
There are 2 things needed:
First, convert the MP3 to WAV. Install ffmpeg on your system and run this command:
ffmpeg -i filename.mp3 -ar 16000 -sample_fmt s16 output.wav
This converts your filename.mp3 to output.wav, which will be Linear PCM 16-bit at a 16K sample rate.
Second, in your code, when you send audio back, you need to stream it as chunks of 640 bytes, not the entire file data in one shot. There are 3 options:
1. Run a loop that writes all the audio to the websocket, but in chunks of 640 bytes. This has an issue: Nexmo will buffer only the first 20s of audio; anything more than that will be discarded.
2. Start an async task that runs every 20ms and writes 640 bytes of data to the websocket (a sketch of this follows the example code below).
3. Write when you get audio from Nexmo (this is the one I will show).
Since Nexmo will send you 640 bytes every 20ms, you can just send back 640 bytes at the same time.
I'm writing this example using the npm websocket package.
var fs = require('fs');
var binaryData = fs.readFileSync('output.wav');
var start = 44 // skip the 44-byte WAV header
var chunkSize = 640
...
// connection is the websocket connection object
connection.on('message', function(message) {
    if (message.type === 'utf8') {
        // handle a text message here
    }
    else if (message.type === 'binary') {
        // print length of audio sent by nexmo. will be 640 for 16K and 320 for 8K
        console.log('Received Binary Message of ' + message.binaryData.length + ' bytes');
        if (start < binaryData.length) {
            // slice a chunk and send
            var toSend = binaryData.slice(start, start + chunkSize)
            start = start + chunkSize
            connection.sendBytes(toSend);
            console.log('Sent Binary Message of ' + toSend.length + ' bytes');
        }
    } ...
});
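For completeness, option 2 (a task that fires every 20ms) might look roughly like this. It is only a sketch, reusing the same connection object, output.wav file, 44-byte header skip and 640-byte chunk size as in the example above, and it has not been tested against Nexmo:
const fs = require('fs')
const binaryData = fs.readFileSync('output.wav')
const chunkSize = 640                  // 20ms of 16kHz, 16-bit, mono audio
let offset = 44                        // skip the 44-byte WAV header

const timer = setInterval(() => {
    if (offset >= binaryData.length) {
        clearInterval(timer)           // whole file sent, stop the timer
        return
    }
    connection.sendBytes(binaryData.slice(offset, offset + chunkSize))
    offset += chunkSize
}, 20)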
Remember, there will be some delay between the point you send the audio from your server to Nexmo and the moment you hear it on the other side.
It can vary from half a second to even more, depending on the location of Nexmo's datacentre, the location of the server where you run your code, network speed, etc.
I have observed it to be close to 0.5 sec.

ffmpeg audio encoder frame size

I need to convert audio data from AV_CODEC_ID_PCM_S16LE to AV_CODEC_ID_PCM_ALAW and I am using this code as an example. The example code does essentially this (error checking omitted for brevity):
const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_MP2);
AVCodecContext* c = avcodec_alloc_context3(codec);
c->bit_rate = 64000;
c->sample_fmt = AV_SAMPLE_FMT_S16;
c->sample_rate = select_sample_rate(codec);
c->channel_layout = select_channel_layout(codec);
c->channels = av_get_channel_layout_nb_channels(c->channel_layout);
avcodec_open2(c, codec, NULL);
AVFrame* frame = av_frame_alloc();
frame->nb_samples = c->frame_size;
frame->format = c->sample_fmt;
frame->channel_layout = c->channel_layout;
The example code subsequently uses c->frame_size in a for loop.
My code is similar to the above with the following differences:
const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_PCM_ALAW);
c->sample_rate = 8000;
c->channel_layout = AV_CH_LAYOUT_MONO;
c->channels = 1;
After calling avcodec_open2, c->frame_size is zero. The example code never sets the frame size so I assume that it expects either avcodec_alloc_context3 or avcodec_open2 to set it. Is this a correct assumption? Is the setting of the frame size based on the codec being used? If I have to set the frame size explicitly, is there a recommended size?
EDIT:
Based on #the-kamilz answer it appears that the example code is not robust. The example assumes that c->frame_size will be set but that appears to be dependent on the codec. In my case, codec->capabilities was in fact set to AV_CODEC_CAP_VARIABLE_FRAME_SIZE. So I modified my code to check c->frame_size and use it only if it is not zero. If it is zero, I just picked an arbitrary one second worth of data for frame->nb_samples.
In the FFmpeg documentation it is mentioned as:
int AVCodecContext::frame_size
Number of samples per channel in an audio frame.
encoding: set by libavcodec in avcodec_open2(). Each submitted frame except the last must contain exactly frame_size samples per channel.
May be 0 when the codec has AV_CODEC_CAP_VARIABLE_FRAME_SIZE set, then the frame size is not restricted.
decoding: may be set by some decoders to indicate constant frame size
Hope that helps.
You don't control the frame size explicitly; it is set by the encoder depending on the codec provided at initialization (opening) time.
Once avcodec_open2() is successful, you can retrieve the frame's buffer size with av_samples_get_buffer_size().

how to concatenate two wav audio files with 30 seconds of white sound using NAudio

I need to concatenate two WAV audio files with 30 seconds of white sound between them.
I want to use the NAudio library, or any other way that works.
How do I do it?
(The difference from any other question is that I need not only to make one audio file from two different audio files, I also need to add silence between them.)
Assuming your WAV files have the same sample rate and channel count, you can concatenate using FollowedBy and use SignalGenerator combined with Take to get the white noise.
var f1 = new AudioFileReader("ex1.wav");
var f2 = new SignalGenerator(f1.WaveFormat.SampleRate, f1.WaveFormat.Channels) { Type = SignalGeneratorType.White, Gain = 0.2f }.Take(TimeSpan.FromSeconds(5));
var f3 = new AudioFileReader("ex3.wav");
using (var wo = new WaveOutEvent())
{
    wo.Init(f1.FollowedBy(f2).FollowedBy(f3));
    wo.Play();
    while (wo.PlaybackState == PlaybackState.Playing) Thread.Sleep(500);
}

JAudioTagger Deleting First Few Seconds of Track

I've written a simple Groovy script (below) to set the values of four of the ID3v1 and ID3v2 tag fields in MP3 files using the JAudioTagger library. The script successfully makes the changes, but it also deletes the first 5 to 10 seconds of some of the files; other files are unaffected. It's not a big problem, but if anyone knows a simple fix, I would be grateful. All the files are from the same source, all have v1 and v2 tags, and I can find no obvious difference in the source files to explain it.
import org.jaudiotagger.*
java.util.logging.Logger.getLogger("org.jaudiotagger").setLevel(java.util.logging.Level.OFF)
Integer trackNum = 0
Integer totalFiles = 0
Integer invalidFiles = 0
validMP3File = true
def dir = new File(/D:\Users\Jeremy\Music\Speech Radio\Unlistened\Z Temp Files to MP3 Tagged/)
dir.eachFile({ curFile ->
    totalFiles ++
    try {
        mp3File = org.jaudiotagger.audio.AudioFileIO.read(curFile)
    } catch (org.jaudiotagger.audio.exceptions.CannotReadException e) {
        validMP3File = false
        invalidFiles ++
    }
    // Get the file name excluding the extension
    baseFilename = org.jaudiotagger.audio.AudioFile.getBaseFilename(curFile)
    // Check that it is an MP3 file
    if (validMP3File) {
        if (mp3File.getAudioHeader().getEncodingType() != 'mp3') {
            validMP3File = false
            invalidFiles ++
        }
    }
    if (validMP3File) {
        trackNum ++
        if (mp3File.hasID3v1Tag()) {
            curTagv1 = mp3File.getID3v1Tag()
        } else {
            curTagv1 = new org.jaudiotagger.tag.id3.ID3v1Tag()
        }
        if (mp3File.hasID3v2Tag()) {
            curTagv2 = mp3File.getID3v2TagAsv24()
        } else {
            curTagv2 = new org.jaudiotagger.tag.id3.ID3v23Tag()
        }
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        mp3File.setID3v1Tag(curTagv1)
        mp3File.setID3v2Tag(curTagv2)
        mp3File.save()
    }
})
println """$trackNum tracks created from $totalFiles files with $invalidFiles invalid files"""
I'm still investigating and it appears that there is no problem with JAudioTagger. Before setting the tags, I use Total Recorder to reduce the quality of the download from 128kbps, 44,100Hz to 56kbps, 22,050Hz. This reduces the file size to less than half and the quality is fine for speech radio.
If I run my script on the original files, none of the audio track is deleted. The deletion of the first part of the audio track only occurs with the files that have been processed by Total Recorder.
Looking at the JAudioTagger logging for these files, there does appear to be a problem with the header:
Checking further because the ID3 Tag ends at 0x23f9 but the mp3 audio doesnt start until 0x7a77
Confirmed audio starts at 0x7a77 whether searching from start or from end of ID3 tag
This check is not performed for files that have not been processed by Total Recorder.
The log of the header read operation also shows (for a 27 minute track):
trackLength:06:52
It looks as though I shall have to find a new MP3 file editor!
Instead of
mp3File.save()
could you try:
mp3File.commit()
No idea if it will help, but that seems to be the documented method?

ffmpeg audio decode data is zero

I tried to decode the audio using ffmpeg with the following code:
NSMutableData *finalData = [NSMutableData data];
......
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    if (packet.stream_index == videoStream)
    {
        int consumed = avcodec_decode_audio4(pCodecCtx, pFrame, &got_frame_ptr, &packet);
        if (got_frame_ptr)
        {
            [finalData appendBytes:(pFrame->data)[0] length:(pFrame->linesize)[0]];
        }
    }
    av_free_packet(&packet);
}
......
[finalData writeToFile:path atomically:YES];
But the saved file can't be played, even after I changed the file extension to .wav. When I looked into it in HexEdit (a hex editor), I found there are many zero bytes. For example, the content of the file before offset 0x970 is all zeros. Is there any error in my code? Any help will be appreciated.
Actually the decode result is good. The zero bytes in the file are normal, because the decode result is raw PCM data. I tried importing the data into Adobe Audition, and it can be played. FYI.
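As a side note, renaming the raw dump to .wav will never make it playable by itself: a WAV file is a 44-byte RIFF header followed by the samples. Purely as an illustration (sketched in Node.js rather than Objective-C; it assumes the decoded data is interleaved signed 16-bit PCM, the sample rate and channel count must match what the decoder actually produced, and decoded.pcm / decoded.wav are hypothetical filenames):
import { readFileSync, writeFileSync } from "fs"

// Prepend a minimal 44-byte RIFF/WAVE header to raw interleaved s16 PCM.
// sampleRate and channels are assumptions; match them to the decoder output.
function pcmToWav(pcm: Buffer, sampleRate = 44100, channels = 2): Buffer {
    const bytesPerSample = 2                                          // 16-bit samples
    const header = Buffer.alloc(44)
    header.write("RIFF", 0)
    header.writeUInt32LE(36 + pcm.length, 4)                          // file size - 8
    header.write("WAVEfmt ", 8)
    header.writeUInt32LE(16, 16)                                      // fmt chunk size
    header.writeUInt16LE(1, 20)                                       // format 1 = PCM
    header.writeUInt16LE(channels, 22)
    header.writeUInt32LE(sampleRate, 24)
    header.writeUInt32LE(sampleRate * channels * bytesPerSample, 28)  // byte rate
    header.writeUInt16LE(channels * bytesPerSample, 32)               // block align
    header.writeUInt16LE(16, 34)                                      // bits per sample
    header.write("data", 36)
    header.writeUInt32LE(pcm.length, 40)                              // data chunk size
    return Buffer.concat([header, pcm])
}

writeFileSync("decoded.wav", pcmToWav(readFileSync("decoded.pcm")))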
