I'm trying to convert an RTMP stream to an m3u8 (HLS) stream, using FFmpeg. Converting and downloading work fine, but FFmpeg writes lots of .ts files, such as channel0000.ts, channel0001.ts, channel0002.ts: one new .ts file every 10 seconds. At this point I want a single .ts file. In other words, I need overwriting, because I don't want to store the whole stream, only the last 10 seconds. When I try to write to the same file I get this error:
Invalid segment filename template 'channel.ts'
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Last message repeated 1 times
Here is my FFmpeg command.
ffmpeg -loglevel quiet -i rtmp://example -c:v libx264 -profile:v baseline -level 3.1 -c:a aac -strict experimental -f mpegts - | ffmpeg -i - -c copy -map 0 -f segment -segment_list channel.m3u8 -segment_format mpegts -segment_time 10 channel%04d.ts
Any suggestions?
In the FFmpeg documentation I found the "segment_wrap" option. When you add this option, the segment files are written in a loop. In my case I added "-segment_wrap 1" to the command and it now writes just a single file.
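The second ffmpeg invocation from my command above should then look something like this (the only change is the added -segment_wrap 1):
ffmpeg -i - -c copy -map 0 -f segment -segment_list channel.m3u8 -segment_format mpegts -segment_time 10 -segment_wrap 1 channel%04d.ts
With a wrap value of 1 the segment index stays at 0, so the same channel0000.ts is overwritten every 10 seconds.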
I had a similar issue. My solution was to pipe out the entire output of the HLS muxer except for the playlist, using the -hls_segment_filename pipe:1 option (so I'm only piping out the .ts segments and not the .m3u8 playlist).
On the receiving end of that pipe I put a piece of code I wrote that detects the header of a .ts file, which looks something like the bytes [71, 64, 17, 1X, 0, 66, 240, 34, 0, 1, 193, 0, 0, 255, 1, 255, 0, 1, 252, 128, 17, 72] (I recommend GHex to inspect your .ts files if you want more of the header to discriminate on). I then wrote a piece of code that cuts along this boundary (warning: the fourth byte can change its value, but it is the only one that does) and recomposes the file.
The only thing left to do for your application, I think, is to use a queue for the segment contents. If you want ten seconds and your segments are about 1 s each, you can use a queue of length 10 (see the sketch after the code below).
My code was in Go (because my whole project was), so here's a bit of Go that might help:
type tsFile struct {
	contents []byte
	finished bool
	child    *tsFile
}

// in reports whether a is present in array.
func in(a int, array []int) bool {
	for _, b := range array {
		if b == a {
			return true
		}
	}
	return false
}

// CutAdd appends piped-out bytes to curFile and, when it detects the start of
// a new .ts header, marks curFile as finished and continues into a child file.
func CutAdd(curFile *tsFile, pipedOut []byte) {
	header := []byte{}        // insert here the header you want
	unNeededIndex := []int{3} // insert here the byte indexes that change between files
	curHeaderPointer := 0
	headerCache := make([]byte, 0, len(header))
	for i, b := range pipedOut {
		if header[curHeaderPointer] == b || in(curHeaderPointer, unNeededIndex) {
			headerCache = append(headerCache, b)
			curHeaderPointer++
			if curHeaderPointer == len(header) {
				curFile.finished = true
				curFile.child = &tsFile{contents: headerCache}
				CutAdd(curFile.child, pipedOut[i+1:]) // continue after the last header byte
				return
			}
		} else {
			if curHeaderPointer != 0 {
				// we store what we thought was a header
				curFile.contents = append(curFile.contents, headerCache...)
				curHeaderPointer = 0
				headerCache = headerCache[:0] // reset the partial match
			}
			curFile.contents = append(curFile.contents, b) // we store the byte
		}
	}
}
It is a bit janky, but it works for me (there might also be mistakes, since I didn't paste the actual code I used back then, but it should give you a rough idea of what I mean).
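For the queue part, a rough sketch of what I mean (illustrative only, with made-up names, assuming segments of roughly one second each) could be:

type segmentQueue struct {
	segments [][]byte // raw bytes of each finished .ts segment, oldest first
	max      int      // how many segments to keep, e.g. 10 for ~10 seconds
}

// push appends a finished segment and drops the oldest one when the queue is full.
func (q *segmentQueue) push(seg []byte) {
	q.segments = append(q.segments, seg)
	if len(q.segments) > q.max {
		q.segments = q.segments[1:]
	}
}

// concat joins the kept segments into one buffer holding roughly the last ten seconds.
func (q *segmentQueue) concat() []byte {
	var out []byte
	for _, seg := range q.segments {
		out = append(out, seg...)
	}
	return out
}

Each time a file is finished (curFile.finished), push its contents; concat then gives you roughly the last ten seconds to serve or write out.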
I have a file that contains a list of about forty thousand space-delimited integers, each with a value between 0 and 255. It is this file here:
https://github.com/johnlai2004/sound-project/blob/master/integers.txt
If you connect a speaker to an ESP32 breakout board, then run this list of integers through the digital to analog converter at a frequency of 24kHz, you will hear the sentence, "That's not the post that you missed."
What I want to know is: how do you use FFmpeg to convert this list of integers into a sound file that other computers can play to hear the same phrase? I tried this command:
ffmpeg -f u8 -ac 1 -ar 24000 -i integers.txt -y audio.wav
But my audio.wav just sounds like white noise. I tried a few other values for -f and for -ar, but all I hear are different frequencies of white noise and maybe some extra buzzing.
Is it possible to use ffmpeg to translate my list of integers into an audio file for other computers to play? If so, what's the correct ffmpeg command to do this?
OTHER NOTES
If it helps, this is the sketch file that I upload to an ESP32 if I want to hear the audio:
https://github.com/johnlai2004/sound-project/blob/master/play-audio.ino
In short, the file looks like this:
#define speakerPin 25 //The pins to output audio on. (9,10 on UNO,Nano)
#define bufferTotal 1347
#define buffSize 32

byte buffer[bufferTotal][buffSize];
int buffItemN = 0;
int bufferN = 0;

hw_timer_t * timer = NULL;
portMUX_TYPE timerMux = portMUX_INITIALIZER_UNLOCKED;

void IRAM_ATTR onTimer() {
  portENTER_CRITICAL_ISR(&timerMux);
  byte v = buffer[bufferN][buffItemN];
  dacWrite(speakerPin, v);
  buffItemN++;
  if (buffItemN >= buffSize) { //If the buffer is empty, do the following
    buffItemN = 0;             //Reset the sample count
    bufferN++;
    if (bufferN >= bufferTotal)
      bufferN = 0;
  }
  portEXIT_CRITICAL_ISR(&timerMux);
}

void setup() {
  /* buffer records */
  buffer[0][0] = 88; // I split the long list of integers and load it into a 2D array
  buffer[0][1] = 88;
  buffer[0][2] = 86;
  buffer[0][3] = 85;
  //etc....
  buffer[1346][28] = 94;
  buffer[1346][29] = 92;
  buffer[1346][30] = 92;
  buffer[1346][31] = 95;
  /* end buffer records */

  timer = timerBegin(0, 80, true);
  timerAttachInterrupt(timer, &onTimer, true);
  timerAlarmWrite(timer, 41, true);
  timerAlarmEnable(timer);
}

void loop() {
}
The buffer assignments are the list of integers found in the integers.txt file.
As @Gyan suggested in the comments, I had to convert my list of integers to a binary file before running the ffmpeg command. So I created a Go program called main.go with this:
package main

import (
	"io/ioutil"
	"os"
	"strconv"
	"strings"
)

func main() {
	input := "./integers.txt"
	output := "./binary.raw"

	// Load the list of integers into memory
	contentbyte, _ := ioutil.ReadFile(input)
	content := strings.Split(string(contentbyte), " ")

	// Prepare to output a new binary file
	f, err := os.OpenFile(output, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	for _, val := range content {
		// Convert each integer to a binary value and write to output file
		i, _ := strconv.Atoi(val)
		if _, err = f.Write([]byte{byte(i)}); err != nil {
			panic(err)
		}
	}
}
I ran go run main.go to produce the binary.raw file. I then ran the ffmpeg command as posted in my question, like this: ffmpeg -f u8 -ar 24000 -ac 1 -i binary.raw -y audio.wav.
The audio.wav file sounds just like the output of my ESP32 + speaker, which is what I wanted.
So to preface my problem, I'll give some context.
In SDL2 you can load WAV files, for example as shown on the wiki:
SDL_AudioSpec wav_spec;
Uint32 wav_length;
Uint8 *wav_buffer;

/* Load the WAV */
if (SDL_LoadWAV("test.wav", &wav_spec, &wav_buffer, &wav_length) == NULL) {
    fprintf(stderr, "Could not open test.wav: %s\n", SDL_GetError());
} else {
    /* Do stuff with the WAV data, and then... */
    SDL_FreeWAV(wav_buffer);
}
The issue I'm getting from SDL_GetError is Complex WAVE files not supported
Now the wav file I'm intending to open has the following properties:
Playing test.wav.
Detected file format: WAV / WAVE (Waveform Audio) (libavformat)
ID_AUDIO_ID=0
[lavf] stream 0: audio (pcm_s24le), -aid 0
Clip info:
encoded_by: Pro Tools
ID_CLIP_INFO_NAME0=encoded_by
ID_CLIP_INFO_VALUE0=Pro Tools
originator_reference:
ID_CLIP_INFO_NAME1=originator_reference
ID_CLIP_INFO_VALUE1=
date: 2016-05-1
ID_CLIP_INFO_NAME2=date
ID_CLIP_INFO_VALUE2=2016-05-1
creation_time: 20:13:34
ID_CLIP_INFO_NAME3=creation_time
ID_CLIP_INFO_VALUE3=20:13:34
time_reference:
ID_CLIP_INFO_NAME4=time_reference
ID_CLIP_INFO_VALUE4=
ID_CLIP_INFO_N=5
Load subtitles in dir/
ID_FILENAME=dir/test.wav
ID_DEMUXER=lavfpref
ID_AUDIO_FORMAT=1
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_START_TIME=0.00
ID_LENGTH=135.53
ID_SEEKABLE=1
ID_CHAPTERS=0
Selected audio codec: Uncompressed PCM [pcm]
AUDIO: 48000 Hz, 2 ch, s24le, 2304.0 kbit/100.00% (ratio: 288000->288000)
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
ID_AUDIO_CODEC=pcm
From the wiki.libsdl.org/SDL_OpenAudioDevice page and the subsequent wiki.libsdl.org/SDL_AudioSpec#Remarks page, I can at least surmise that a WAV file with the following settings should work:
freq = 48000;
format = AUDIO_F32;
channels = 2;
samples = 4096;
The main problem I can see is that my wav file has the s16le format whereas it's not listed on the SDL_AudioSpec page.
This leads me to believe I need to reduce the quality of test.wav so it does not appear as "complex" in SDL.
When I search for "Complex WAVE files not supported", nothing helpful comes up, except that it appears in the SDL_mixer library, which as far as I know I'm not using.
Can the format be changed via ffmpeg to work in SDL2?
Edit: This appears to be the actual code in SDL2 where it complains. I don't really know enough about C to dig all the way through the vast SDL2 library, but I thought it might help if someone notices something just from the variable names and such:
/* Read the audio data format chunk */
chunk.data = NULL;
do {
    if ( chunk.data != NULL ) {
        SDL_free(chunk.data);
        chunk.data = NULL;
    }
    lenread = ReadChunk(src, &chunk);
    if ( lenread < 0 ) {
        was_error = 1;
        goto done;
    }
    /* 2 Uint32's for chunk header+len, plus the lenread */
    headerDiff += lenread + 2 * sizeof(Uint32);
} while ( (chunk.magic == FACT) || (chunk.magic == LIST) );

/* Decode the audio data format */
format = (WaveFMT *)chunk.data;
if ( chunk.magic != FMT ) {
    SDL_SetError("Complex WAVE files not supported");
    was_error = 1;
    goto done;
}
After a couple of hours of fun audio converting I got it working; I will have to tweak it to try to get better sound quality.
To answer the question at hand, converting can be done by:
ffmpeg -i old.wav -acodec pcm_s16le -ac 1 -ar 16000 new.wav
To find codecs on your version of ffmpeg:
ffmpeg -codecs
This format works with SDL.
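If you want to double-check the converted file before handing it to SDL, something like this should report pcm_s16le at 16000 Hz:
ffprobe new.wav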
Next, within SDL, when setting the desired SDL_AudioSpec, make sure to use the correct settings:
freq = 16000;
format = AUDIO_S16LSB;
channels = 2;
samples = 4096;
Finally, the main issue was most likely using the legacy SDL_MixAudio instead of the newer SDL_MixAudioFormat,
with the following settings (as can be found on the wiki):
SDL_MixAudioFormat(stream, mixData, AUDIO_S16LSB, len, SDL_MIX_MAXVOLUME / 2);
I have a very particular problem that I wish I could find the answer to.
I'm trying to read an AAC stream from a URL (online streaming radio, e.g. live.noroc.tv:8000/radionoroc.aacp) with the NAudio library and get IEEE 32-bit floating-point samples.
Besides that I would like to resample the stream to a particular sample rate and channel count (rate 5512, mono).
Below is the code which accomplishes that:
int tenSecondsOfDownloadedAudio = 5512 * 10;
float[] buffer = new float[tenSecondsOfDownloadedAudio];

using (var reader = new MediaFoundationReader(pathToUrl))
{
    var ieeeFloatWaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(5512, 1); // mono
    using (var resampler = new MediaFoundationResampler(reader, ieeeFloatWaveFormat))
    {
        var waveToSampleProvider = new WaveToSampleProvider(resampler);

        int readSamples = 0;
        float[] tempBuffer = new float[5512]; // 1 second buffer
        while (readSamples <= tenSecondsOfDownloadedAudio)
        {
            int read = waveToSampleProvider.Read(tempBuffer, 0, tempBuffer.Length);
            if (read == 0)
            {
                Thread.Sleep(500); // allow streaming buffer to get loaded
                continue;
            }
            Array.Copy(tempBuffer, 0, buffer, readSamples, tempBuffer.Length);
            readSamples += read;
        }
    }
}
These particular samples are then written to a Wave audio file using the following simple method:
using (var writer = new WaveFileWriter("path-to-audio-file.wav", WaveFormat.CreateIeeeFloatWaveFormat(5512, 1)))
{
    writer.WriteSamples(samples, 0, samples.Length);
}
What I've encountered is that NAudio does not read 10 seconds of audio (as it was requested) but only 5, though the buffer array gets fully loaded with samples (which at this rate and channel count should contain 10 seconds of audio samples).
Thus the final audio file plays the stream two times slower than it should (a 5-second stream is played over 10 seconds).
Is this somehow related to different bit depths (should I record at 64 bits per sample as opposed to 32)?
I do my testing on Windows Server 2008 R2 x64, with the MFT codecs installed.
Would really appreciate any suggestions.
The problem seems to be that MediaFoundationReader fails to handle HE-AACv2 in an ADTS container, which is a standard online radio stream format and most likely the one you are dealing with.
Adobe products have the same problem, mistreating this format in exactly the same way (stretching the first half of the audio to the whole duration): Corrupted AAC files recorded from online stream
Supposedly, it has something to do with an HE-AACv2 stereo stream actually being a mono stream with an additional info channel for Parametric Stereo.
I've discovered it is dangerous to assume that all PCM WAV audio files have 44 bytes of header data before the samples begin. Though this is common, many applications (ffmpeg, for example) will generate WAVs with a 46-byte header, and ignoring this fact while processing will result in a corrupt and unreadable file. But how can you detect how long the header actually is?
Obviously there is a way to do this, but I searched and found little discussion about it. A lot of audio projects out there assume 44 (or, conversely, 46) depending on the author's own context.
You should be checking all of the header data to see what the actual sizes are. Broadcast Wave Format files will contain an even larger extension subchunk. WAV and AIFF files from Pro Tools have even more extension chunks that are undocumented, as well as data after the audio. If you want to be sure where the sample data begins and ends, you need to actually look for the data chunk ('data' for WAV files and 'SSND' for AIFF).
As a review, all WAV subchunks conform to the following format:
Subchunk Descriptor (4 bytes)
Subchunk Size (4 byte integer, little endian)
Subchunk Data (size is Subchunk Size)
This is very easy to process. All you need to do is read the descriptor, if it's not the one you are looking for, read the data size and skip ahead to the next. A simple Java routine to do that would look like this:
//
// Quick note for people who don't know Java well:
// 'in.read(...)' returns -1 when the stream reaches
// the end of the file, so 'if (in.read(...) < 0)'
// is checking for the end of file.
//
public static void printWaveDescriptors(File file)
        throws IOException {
    try (FileInputStream in = new FileInputStream(file)) {
        byte[] bytes = new byte[4];

        // Read first 4 bytes.
        // (Should be RIFF descriptor.)
        if (in.read(bytes) < 0) {
            return;
        }
        printDescriptor(bytes);

        // First subchunk will always be at byte 12.
        // (There is no other dependable constant.)
        in.skip(8);

        for (;;) {
            // Read each chunk descriptor.
            if (in.read(bytes) < 0) {
                break;
            }
            printDescriptor(bytes);

            // Read chunk length.
            if (in.read(bytes) < 0) {
                break;
            }

            // Skip the length of this chunk.
            // Next bytes should be another descriptor or EOF.
            int length = (
                  Byte.toUnsignedInt(bytes[0])
                | Byte.toUnsignedInt(bytes[1]) << 8
                | Byte.toUnsignedInt(bytes[2]) << 16
                | Byte.toUnsignedInt(bytes[3]) << 24
            );
            in.skip(Integer.toUnsignedLong(length));
        }

        System.out.println("End of file.");
    }
}

private static void printDescriptor(byte[] bytes)
        throws IOException {
    String desc = new String(bytes, "US-ASCII");
    System.out.println("Found '" + desc + "' descriptor.");
}
For example here is a random WAV file I had:
Found 'RIFF' descriptor.
Found 'bext' descriptor.
Found 'fmt ' descriptor.
Found 'minf' descriptor.
Found 'elm1' descriptor.
Found 'data' descriptor.
Found 'regn' descriptor.
Found 'ovwf' descriptor.
Found 'umid' descriptor.
End of file.
Notably, here both 'fmt ' and 'data' legitimately appear in between other chunks because Microsoft's RIFF specification says that subchunks can appear in any order. Even some major audio systems that I know of get this wrong and don't account for that.
So if you want to find a certain chunk, loop through the file checking each descriptor until you find the one you're looking for.
The trick is to look at the "Subchunk1Size", which is a 4-byte little-endian integer beginning at byte 16 of the header. In a normal 44-byte WAV, this integer will be 16 (bytes 10 00 00 00 in hex). If it's a 46-byte header, this integer will be 18 (12 00 00 00 in hex), or maybe even higher if there is extra extensible metadata (rare?).
The extra data itself (if present) begins at byte 36.
So a simple C# program to detect the header length would look like this:
static void Main(string[] args)
{
    byte[] bytes = new byte[4];
    FileStream fileStream = new FileStream(args[0], FileMode.Open, FileAccess.Read);
    fileStream.Seek(16, 0);
    fileStream.Read(bytes, 0, 4);
    fileStream.Close();

    int Subchunk1Size = BitConverter.ToInt32(bytes, 0);

    if (Subchunk1Size < 16)
        Console.WriteLine("This is not a valid wav file");
    else
        switch (Subchunk1Size)
        {
            case 16:
                Console.WriteLine("44-byte header");
                break;
            case 18:
                Console.WriteLine("46-byte header");
                break;
            default:
                Console.WriteLine("Header contains extra data and is larger than 46 bytes");
                break;
        }
}
In addition to Radiodef's excellent reply, I'd like to add 3 things that aren't obvious.
The only rule for WAV files is the FMT chunk comes before the DATA chunk. Apart from that, you will find chunks you don't know about at the beginning, before the DATA chunk and after it. You must read the header for each chunk to skip forward to find the next chunk.
The FMT chunk is commonly found in 16 byte and 18 byte variations, but the spec actually allows more than 18 bytes as well.
If the FMT chunk's header size field says greater than 16, bytes 17 and 18 also specify how many extra bytes there are, so if they are both zero, you end up with an 18-byte FMT chunk identical to the 16-byte one.
It is safe to read in just the first 16 bytes of the FMT chunk and parse those, ignoring any more.
Why does this matter? - not much any more, but Windows XP's Media Player was able to play 16 bit WAV files, but 24 bit WAV files only if the FMT chunk was the Extended (18+ byte) version. There used to be a lot of complaints that "Windows doesn't play my 24 bit WAV files", but if it had an 18 byte FMT chunk, it would... Microsoft fixed that sometime during the early days of Windows 7, so 24 bit with 16 byte FMT files work fine now.
(Newly added) Chunks with odd sizes occur quite often, mostly when a 24-bit mono file is made. It is unclear from the spec, but the chunk size specifies the actual data length (the odd value) and a pad byte (zero) is added after the chunk, before the start of the next chunk. So chunks always start on even boundaries, but the chunk size itself is stored as the actual odd value.
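To make the pad-byte rule concrete, here is a rough sketch (my own illustration, written in Go; it assumes the stream is already positioned at the first subchunk, i.e. byte 12 of the file) that walks the chunks and rounds each skip up to an even offset:

package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// walkChunks prints each chunk ID and declared size, skipping the chunk bodies.
// Odd-sized chunks are followed by a pad byte, so the skip is rounded up.
func walkChunks(r io.ReadSeeker) error {
	hdr := make([]byte, 8)
	for {
		if _, err := io.ReadFull(r, hdr); err != nil {
			if err == io.EOF {
				return nil // clean end of file
			}
			return err
		}
		id := string(hdr[:4])
		size := binary.LittleEndian.Uint32(hdr[4:8])
		fmt.Printf("chunk %q, %d bytes\n", id, size)
		skip := int64(size)
		if size%2 == 1 {
			skip++ // pad byte keeps the next chunk on an even boundary
		}
		if _, err := r.Seek(skip, io.SeekCurrent); err != nil {
			return err
		}
	}
}

func main() {
	f, err := os.Open("test.wav") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()
	f.Seek(12, io.SeekStart) // skip the RIFF header: "RIFF", overall size, "WAVE"
	if err := walkChunks(f); err != nil {
		panic(err)
	}
}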
I'm using libav to encode raw RGB24 frames to h264 and muxing them to FLV. This all works fine and I've streamed for more than 48 hours without any problems! My next step is to add audio to the stream. I'll be capturing live audio and I want to encode it in real time using Speex, MP3 or Nellymoser.
Background info
I'm new to digital audio and therefore I might be doing things wrong. But basically my application gets a "float" buffer with interleaved audio. This "audioIn" function gets called by the application framework I'm using. The buffer contains 256 samples per channel,
and I have 2 channels. Because I might be mixing terminology, this is how I use the
data:
// input = array with audio samples
// bufferSize = 256
// nChannels = 2
void audioIn(float * input, int bufferSize, int nChannels) {
    // convert from float to S16
    short* buf = new signed short[bufferSize * 2];
    for (int i = 0; i < bufferSize; ++i) { // loop over all samples
        int dx = i * 2;
        buf[dx + 0] = (float)input[dx + 0] * numeric_limits<short>::max(); // convert frame of the first channel
        buf[dx + 1] = (float)input[dx + 1] * numeric_limits<short>::max(); // convert frame of the second channel
    }

    // add this to the libav wrapper.
    av.addAudioFrame((unsigned char*)buf, bufferSize, nChannels);

    delete[] buf;
}
Now that I have a buffer where each sample is 16 bits, I pass this short* buffer to my wrapper's av.addAudioFrame() function. In this function I create a buffer before I encode the audio. From what I read, the AVCodecContext of the audio encoder sets the frame_size. This frame_size must match the number of samples in the buffer when calling avcodec_encode_audio2(). Why I think this is because of what is documented here.
Then, especially the line:
If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last. (Please correct me here if I'm wrong about this.)
After encoding I call av_interleaved_write_frame() to actually write the frame.
When I use MP3 as the codec, my application runs for about 1-2 minutes and then my server, which is receiving the video/audio stream (FLV over TCP), disconnects with the message "Frame too large: 14485504". This message is generated because the RTMP server is getting a frame which is way too big. This is probably due to the fact that I'm not interleaving correctly with libav.
Questions:
There are quite a few bits I'm not sure of, even after going through the source code of libav, and therefore I'm hoping someone has a working example of encoding audio which comes from a buffer which comes from "outside" libav (i.e. your own application). For example: how do you create a buffer which is large enough for the encoder? How do you make the "realtime" streaming work when you need to wait for this buffer to fill up?
As I wrote above, I need to keep track of a buffer before I can encode. Does someone else have some code which does this? I'm using AVAudioFifo now. The functions which encode the audio and fill/read the buffer are here too: https://gist.github.com/62f717bbaa69ac7196be
I compiled with --enable-debug=3 and disabled optimizations, but I'm not seeing any debug information. How can I make libav more verbose?
Thanks!