Properly open audio files with libav/ffmpeg - audio

I am trying to decode audio samples from various file formats using ffmpeg, so I have started experimenting based on the code in this discussion: How to decode audio via FFmpeg in Android. I use the latest FFMPEG release (1.0) and compile it using https://github.com/halfninja/android-ffmpeg-x264
AVFormatContext *pFormatCtx = NULL; // must be NULL (or pre-allocated) before avformat_open_input()
AVCodec *codec = NULL;              // filled in by av_find_best_stream()
int audioStreamIndex;
int lError;

avcodec_register_all();
av_register_all();

if ((lError = avformat_open_input(&pFormatCtx, filename, NULL, NULL)) != 0) {
    LOGE("Error open source file: %d", lError);
    return;
}
if ((lError = avformat_find_stream_info(pFormatCtx, NULL)) < 0) {
    LOGE("Error find stream information: %d (Streams: %d)", lError, pFormatCtx->nb_streams);
    return;
}
LOGE("audio format: %s", pFormatCtx->iformat->name);
LOGE("audio bitrate: %d", pFormatCtx->bit_rate);
audioStreamIndex = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_AUDIO, -1, -1, &codec, 0);
//if (audioStreamIndex < 0 || audioStreamIndex >= pFormatCtx->nb_streams)
//    audioStreamIndex = 0;
LOGE("Stream: %d (total: %d)", audioStreamIndex, pFormatCtx->nb_streams);
// codec stays NULL when no decoder was found, so guard the dereference
LOGE("audio codec: %s", codec ? codec->name : "(none)");
FFMPEG is compiled with --enable-decoder for mp1, mp2, mp3, ogg, vorbis, wav, aac and theora, and without any external libraries (e.g. libmp3lame, libtheora, etc.).
Opening mp3 and wav files works without problems and produces the following output, for instance for mp3:
audio format: mp3
audio bitrate: 256121
stream: 0 (total: 1)
audio codec: mp3
But when I try to open an ogg file I get this:
Error find stream information: -1 (Streams: 1)
When I manually set audioStreamIndex=0 and comment out the return statement:
Error find stream information: -1 (Streams: 1)
audio format: mp3
audio bitrate: 0
stream: 0 (total: 1)
audio codec: mp3
For m4a (AAC) I get this:
audio format: mp3
audio bitrate: 288000
stream: 0 (total: 1)
audio codec: mp1
but later it fails in avcodec_decode_audio3.
I also tried to manually force a format without success:
AVInputFormat *pForceFormat= av_find_input_format("ogg");
if ((lError = avformat_open_input(&pFormatCtx, filename, pForceFormat, 0))
// continue
Is there something wrong with the loading code which makes it only work with mp3 and wav and fails for other formats?
Regards,

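The problem was a missing demuxer. The --enable-decoder options only enable codecs; the container formats themselves (for example ogg for Ogg files and mov for MP4/M4A files) also have to be enabled with --enable-demuxer. Without them, probing can only pick one of the demuxers that were compiled in, which would explain why the Ogg and M4A files were reported as mp3.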

Related

Writing a raw opus buffer to a file in NodeJS

I'm working on Discord voice and I made a voice receiver, but I can't seem to write the audio buffer to a file in a way that lets me play it. Here's my code:
stopRecording() {
    this.recording = false;
    let output = Buffer.concat(this.recordedBuffers);
    // Write the Buffer itself; output.toString() would decode it as UTF-8 and corrupt the binary audio data.
    fs.writeFileSync('./out.opus', output);
    this.recordedBuffers = [];
    console.log('done');
};
The file gets created but it is unplayable. Any ideas?
It may be worth confirming that recordedBuffers contains Ogg Opus file data. The first 32 bytes should show file header data indicating whether this is an Ogg, WebM, or MP4 container. Once the MIME type is confirmed, the file extension can be changed from .opus if needed:
/* .opus = 'OggS ... Opus'
* .webm = '... webm ...'
* .mp4 = '... mp4 ...'
* .wav = '... WAVE ...'
*/
new TextDecoder().decode(concatenatedBuffers.slice(0, 32))
FFmpeg could also inspect the downloaded file:
$ ffprobe audio.opus
Input #0, ogg, from 'audio.opus':
Duration: 00:06:00.11, start: 0.000000, bitrate: 99 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
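The header check suggested above can also be sketched in Python. This is only a rough heuristic based on well-known magic bytes; the function name sniff_container and the labels it returns are made up for this illustration.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container from the first bytes of a file (rough heuristic)."""
    if header[:4] == b"OggS":
        return "ogg"            # Ogg page capture pattern
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"            # RIFF chunk with a WAVE form type
    if header[4:8] == b"ftyp":
        return "mp4"            # ISO base media 'ftyp' box
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "webm/matroska"  # EBML magic, shared by WebM and Matroska
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 32 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(32))
```

If this reports "unknown" for the recorded data, one possibility (an assumption about this setup, not something the question confirms) is that the buffers hold bare Opus packets rather than a complete container file, in which case changing the extension alone would not make the file playable.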

Convert .mp4 to .mpeg4 using Converter

I have an MP4 file, which I would like to convert into an MPEG4 file. To do this, I have found the PythonVideoConverter package. On the PyPI page, the following code is given:
from converter import Converter
conv = Converter()
info = conv.probe('test/test1.avi')
PATH = 'C:/Users/.../'
convert = conv.convert(PATH + 'Demo.mp4', PATH + 'Demo.mpeg4', {
    'format': 'mpeg4',
    'audio': {
        'codec': 'aac',
        'samplerate': 11025,
        'channels': 2
    },
    'video': {
        'codec': 'hevc',
        'width': 720,
        'height': 400,
        'fps': 25
    }})
When I run this code, a convert object is created. However, there is no .mpeg4 video in the PATH directory.
Therefore, I have two questions:
Is the code above correct for converting a .mp4 file into a .mpeg4 file?
What do I need to run to save the converted video as a .mpeg4 file?
Based on Selcuk's comment, I ran the following code:
for timecode in convert:
    pass
This gives the error:
Traceback (most recent call last):
File "<ipython-input-60-14c9225c3ac2>", line 1, in <module>
for timecode in convert:
File "C:\Users\20200016\Anaconda3\lib\site-packages\converter\__init__.py", line 229, in convert
optlist = self.parse_options(options, twopass)
File "C:\Users\20200016\Anaconda3\lib\site-packages\converter\__init__.py", line 60, in parse_options
raise ConverterError(f'Requested unknown format: {str(f)}')
ConverterError: Requested unknown format: mpeg4
So, my suggested format seems incorrect. What can I do to convert a video into .mpeg4?
I don't think PythonVideoConverter is meant to be used on Windows.
I was getting the exception AttributeError: module 'signal' has no attribute 'SIGVTALRM', because SIGVTALRM is not a valid signal on Windows.
The default paths of the FFmpeg and FFprobe command-line tools also don't make sense for Windows.
We may still use the package on Windows, but it's recommended to set ffmpeg_path and ffprobe_path.
Example:
conv = Converter(ffmpeg_path=r'c:\FFmpeg\bin\ffmpeg.exe', ffprobe_path=r'c:\FFmpeg\bin\ffprobe.exe')
We also have to disable the timeout feature by passing the timeout=None argument to convert().
mpeg4 is not a valid FFmpeg format, but we can still use it as a file extension.
(In FFmpeg terminology, "format" usually refers to the container format.)
When a non-standard file extension is used, we have to set the format entry explicitly.
Setting 'format': 'mp4' creates an MP4 container, which may be written with the non-standard .mpeg4 file extension.
Complete code sample:
from converter import Converter
conv = Converter(ffmpeg_path=r'c:\FFmpeg\bin\ffmpeg.exe', ffprobe_path=r'c:\FFmpeg\bin\ffprobe.exe')
#info = conv.probe('test/test1.avi')
PATH = 'C:/Users/Rotem/'
convert = conv.convert(PATH + 'Demo.mp4', PATH + 'Demo.mpeg4', {
    'format': 'mp4',  #'format': 'mpeg4',
    'audio': {
        'codec': 'aac',
        'samplerate': 11025,
        'channels': 2
    },
    'video': {
        'codec': 'hevc',
        'width': 720,
        'height': 400,
        'fps': 25
    }},
    timeout=None)
# https://pypi.org/project/PythonVideoConverter/
for timecode in convert:
    print(f'\rConverting ({timecode:.2f}) ...')
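For comparison, the same idea (container chosen explicitly, independent of the file extension) can be sketched by calling the ffmpeg command-line tool directly. This is a minimal sketch, not what PythonVideoConverter does internally; the helper names build_ffmpeg_command and convert are hypothetical, and it assumes an ffmpeg build on the PATH that includes the libx265 and aac encoders.

```python
import subprocess


def build_ffmpeg_command(src, dst, container="mp4", vcodec="libx265", acodec="aac"):
    """Build an ffmpeg argument list that forces the container with -f.

    With -f given, ffmpeg does not need to guess the container from the
    output file extension, so a non-standard extension such as .mpeg4 works.
    """
    return [
        "ffmpeg", "-y",     # -y: overwrite the output file without prompting
        "-i", src,
        "-c:v", vcodec,     # video codec (assumed to be available in the build)
        "-c:a", acodec,     # audio codec
        "-f", container,    # container format, independent of the extension
        dst,
    ]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)

# Example call (not executed here): convert('Demo.mp4', 'Demo.mpeg4')
```

Passing the arguments as a list avoids shell quoting problems, which matters on Windows where paths often contain spaces.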
We may inspect the media information of Demo.mpeg4 using the MediaInfo tool:
General
Complete name : C:\Users\Rotem\Demo.mpeg4
Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/mp41)
File size : 207 KiB
Duration : 10 s 148 ms
Overall bit rate mode : Variable
Overall bit rate : 167 kb/s
Writing application : Lavf58.45.100
FileExtension_Invalid : braw mov mp4 m4v m4a m4b m4p m4r 3ga 3gpa 3gpp 3gp 3gpp2 3g2 k3g jpm jpx mqv ismv isma ismt f4a f4b f4v
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main#L3#Main
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 10 s 0 ms
Bit rate : 82.5 kb/s
Width : 720 pixels
Height : 400 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 25.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.011
Stream size : 101 KiB (49%)
Writing library : x265 3.4+28-419182243:[Windows][GCC 9.3.0][64 bit] 8bit+10bit+12bit
Encoding settings : ...
Color range : Limited
Codec configuration box : hvcC
Audio
ID : 2
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : mp4a-40-2
Duration : 10 s 148 ms
Duration_LastFrame : -70 ms
Bit rate mode : Variable
Bit rate : 79.1 kb/s
Maximum bit rate : 128 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 11.025 kHz
Frame rate : 10.767 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 98.0 KiB (47%)
Title : IsoMedia File Produced by Google, 5-11-2011
Language : English
Default : Yes
Alternate group : 1
In the MediaInfo output, the MP4 container is reported as the "MPEG-4" format.
Note:
The HEVC video format corresponds to the H.265 video codec. In most cases the codec is considered more relevant than the container.
'Requested unknown format: mpeg4'
*.mpeg4 is not a valid container. mpeg4 is a codec; *.something (avi, mp4, mov, mkv, ...) are containers.
Basically: codec.CONTAINER, e.g. your_mpeg4_video.mkv.
A video codec (like mpeg4) handles only the video. You usually need more than the picture: audio, often several audio tracks (eng, de, nl, 2.0, 5.1, 7.1 ...), subtitles, and so on, and all of that lives inside the container.
Install ffmpeg: https://ffmpeg.org/
Then try this basic script:
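import subprocess

input_file = 'Demo.mp4'
output_file = 'Demo.mkv'  # or .mp4, .mov, ...
# Pass the arguments as a list: no shell quoting is needed, and it also works on Windows,
# where cmd.exe does not treat single quotes as quoting characters.
subprocess.run(['ffmpeg', '-i', input_file, '-vcodec', 'libx265', output_file], check=True)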
I don't know exactly what you want to achieve, but if you are looking for a way to decrease the size of a video,
look here: https://github.com/MarcelSuleiman/convert_h264_to_h265
Simple.

Transcribe MP3 audio file with Bing Speech API (speech to text)

I have a long recording (hour+) in MP3 format. The following is the info I managed to get from FFMPEG about the audio file:
[mp3 @ 000001fe666da320] Skipping 0 bytes of junk at 58650.
[mjpeg @ 000001fe666effe0] Changing bps to 8
[mp3 @ 000001fe666da320] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '1.mp3':
Duration: 00:57:18.52, start: 0.000000, bitrate: 192 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, mono, s16p, 192 kb/s
Stream #0:1: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 1300x1370, 90k tbr, 90k tbn, 90k tbc
I would like to use the Bing Speech API (Microsoft Oxford - Cognitive Services - Speech API) to transcribe this file (speech to text).
I believe that this is achievable by using something like the code below, in one of two ways.
Option 1:
According to the documentation, before sending up any audio data, you must first send up a SpeechAudioFormat descriptor to describe the layout and format of your raw audio data via DataRecognitionClient's sendAudioFormat() method. Can you provide a code sample for this option?
Option 2: convert the file to a format the service accepts. I have done that with FFMPEG, and this is what I got:
Duration: 00:57:23.67, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
As I understand from the documentation, this should be acceptable: the audio must be PCM, mono, 16-bit samples, with a sample rate of 8000 Hz or 16000 Hz.
I tried to send the audio to the server but did not get any reply. Am I on the right track? What is the maximum buffer size?
Do you see another, maybe easier, option to get my audio file transcribed?
private void SendAudioHelper(string wavFileName)
{
    using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
    {
        int bytesRead = 0;
        byte[] buffer = new byte[1024];
        try
        {
            do
            {
                // Get more Audio data to send into byte buffer.
                bytesRead = fileStream.Read(buffer, 0, buffer.Length);
                // Send of audio data to service.
                this.dataClient.SendAudio(buffer, bytesRead);
            }
            while (bytesRead > 0);
        }
        finally
        {
            // We are done sending audio. Final recognition results will arrive in OnResponseReceived event call.
            this.dataClient.EndAudio();
        }
    }
}
There is a limit of 15 seconds of audio when you use the REST implementation. The SDK has a limit of 2 minutes.
Bing Speech team
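Given those limits, one way to handle an hour-long recording would be to cut the converted PCM data into pieces that each stay under the limit. The following is only a sketch of the arithmetic for the 16000 Hz, mono, 16-bit format from the question; the function name split_pcm is hypothetical, and whether the service accepts consecutive pieces as separate requests is an assumption that is not confirmed here.

```python
def split_pcm(total_bytes, sample_rate=16000, bytes_per_sample=2,
              channels=1, max_seconds=15):
    """Return (offset, length) pairs covering total_bytes of raw PCM.

    Each piece is at most max_seconds long, and every offset falls on a
    whole sample frame because the chunk size is a multiple of the frame size.
    """
    frame_bytes = bytes_per_sample * channels
    # 16000 Hz * 2 bytes * 1 channel = 32000 bytes/s, i.e. 256 kb/s,
    # which matches the bitrate ffmpeg reported for the converted file.
    chunk_bytes = sample_rate * frame_bytes * max_seconds
    return [(offset, min(chunk_bytes, total_bytes - offset))
            for offset in range(0, total_bytes, chunk_bytes)]
```

The offsets refer to the raw sample data only, so the WAV header would need to be skipped or rewritten for each piece. Cutting at fixed offsets can also split a word in half, so cutting at silences would probably give better transcripts.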

FFMPEG audio sample data

I'm a beginner with the FFMPEG API and I need to process audio samples.
I see that the audio sample data is stored in AVFrame->data[0], but I don't know how the samples are laid out inside an AVFrame.
For example:
There are 2 channels,
frame->nb_samples = 64,
frame->linesize[0] = 256.
How is the audio sample data arranged in frame->data[0]?
Thanks,
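The audio samples are pointed to by
frame->data[0]
frame->data[1]
and each plane is frame->linesize[0] bytes long (for audio, only linesize[0] is set).
The sample_fmt of your AVCodecContext will tell you the format of the samples, which will typically be one of the following:
AV_SAMPLE_FMT_FLTP
AV_SAMPLE_FMT_FLT
AV_SAMPLE_FMT_S16P
AV_SAMPLE_FMT_S16
For the planar formats (the ones ending in P), each channel has its own plane: data[0] holds the first channel, data[1] the second, and so on. For the packed formats, all channels are interleaved in data[0] (L R L R ... for stereo) and data[1] is unused.
For FLT and FLTP you cast the pointers to float*, and for S16 and S16P you cast to int16_t*.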
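The numbers in the question can be checked with a little arithmetic. The sketch below (Python, with hypothetical helper names) computes how many bytes of samples data[0] holds for a given format and shows how packed S16 data would be split into channels. It ignores any alignment padding and assumes little-endian samples, which is the native byte order on most current platforms.

```python
import struct

# Bytes per single sample for the four formats listed above.
BYTES_PER_SAMPLE = {"s16": 2, "s16p": 2, "flt": 4, "fltp": 4}


def plane_size(sample_fmt, channels, nb_samples):
    """Bytes of sample data in data[0], without alignment padding."""
    size = BYTES_PER_SAMPLE[sample_fmt] * nb_samples
    # Planar: one channel per plane. Packed: all channels share data[0].
    return size if sample_fmt.endswith("p") else size * channels


def deinterleave_s16(data0, channels):
    """Split packed little-endian S16 bytes (L R L R ...) into per-channel lists."""
    count = len(data0) // 2
    samples = struct.unpack("<%dh" % count, data0)
    return [list(samples[ch::channels]) for ch in range(channels)]
```

With 2 channels and 64 samples, both planar float (64 * 4 bytes per plane) and packed S16 (64 * 2 channels * 2 bytes) come out at 256 bytes, so linesize[0] = 256 alone does not identify the layout; the sample format has to be checked as well.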

How to decode Full Rate GSM Audio file?

I have to decode a full-rate GSM audio file. Full-rate GSM audio is normally decoded using libgsm. I have used MSVC++ with the Windows nightly builds of ffmpeg and libav, but I am unable to decode the file correctly. Can anyone tell me the reason? I have tried decoding using the following codecs:
/* various PCM "codecs" */
AV_CODEC_ID_FIRST_AUDIO = 0x10000,
AV_CODEC_ID_PCM_S16LE = 0x10000,
AV_CODEC_ID_PCM_S16BE,
AV_CODEC_ID_PCM_U16LE,
AV_CODEC_ID_PCM_U16BE,
AV_CODEC_ID_PCM_S8,
AV_CODEC_ID_PCM_U8,
AV_CODEC_ID_PCM_MULAW,
AV_CODEC_ID_PCM_ALAW,
AV_CODEC_ID_PCM_S32LE,
AV_CODEC_ID_PCM_S32BE,
AV_CODEC_ID_PCM_U32LE,
AV_CODEC_ID_PCM_U32BE,
AV_CODEC_ID_PCM_S24LE,
AV_CODEC_ID_PCM_S24BE,
AV_CODEC_ID_PCM_U24LE,
AV_CODEC_ID_PCM_U24BE,
AV_CODEC_ID_PCM_S24DAUD,
AV_CODEC_ID_PCM_ZORK,
AV_CODEC_ID_PCM_S16LE_PLANAR,
AV_CODEC_ID_PCM_DVD,
AV_CODEC_ID_PCM_F32BE,
AV_CODEC_ID_PCM_F32LE,
AV_CODEC_ID_PCM_F64BE,
AV_CODEC_ID_PCM_F64LE,
AV_CODEC_ID_PCM_BLURAY,
AV_CODEC_ID_PCM_LXF,
AV_CODEC_ID_S302M,
AV_CODEC_ID_PCM_S8_PLANAR,
/* various ADPCM codecs */
AV_CODEC_ID_ADPCM_IMA_QT = 0x11000,
AV_CODEC_ID_ADPCM_IMA_WAV,
AV_CODEC_ID_ADPCM_IMA_DK3,
AV_CODEC_ID_ADPCM_IMA_DK4,
AV_CODEC_ID_ADPCM_IMA_WS,
AV_CODEC_ID_ADPCM_IMA_SMJPEG,
AV_CODEC_ID_ADPCM_MS,
AV_CODEC_ID_ADPCM_4XM,
AV_CODEC_ID_ADPCM_XA,
AV_CODEC_ID_ADPCM_ADX,
AV_CODEC_ID_ADPCM_EA,
AV_CODEC_ID_ADPCM_G726,
AV_CODEC_ID_ADPCM_CT,
AV_CODEC_ID_ADPCM_SWF,
AV_CODEC_ID_ADPCM_YAMAHA,
AV_CODEC_ID_ADPCM_SBPRO_4,
AV_CODEC_ID_ADPCM_SBPRO_3,
AV_CODEC_ID_ADPCM_SBPRO_2,
AV_CODEC_ID_ADPCM_THP,
AV_CODEC_ID_ADPCM_IMA_AMV,
AV_CODEC_ID_ADPCM_EA_R1,
AV_CODEC_ID_ADPCM_EA_R3,
AV_CODEC_ID_ADPCM_EA_R2,
AV_CODEC_ID_ADPCM_IMA_EA_SEAD,
AV_CODEC_ID_ADPCM_IMA_EA_EACS,
AV_CODEC_ID_ADPCM_EA_XAS,
AV_CODEC_ID_ADPCM_EA_MAXIS_XA,
AV_CODEC_ID_ADPCM_IMA_ISS,
AV_CODEC_ID_ADPCM_G722,
AV_CODEC_ID_ADPCM_IMA_APC,
AV_CODEC_ID_VIMA = MKBETAG('V','I','M','A'),
/* AMR */
AV_CODEC_ID_AMR_NB = 0x12000,
AV_CODEC_ID_AMR_WB,
/* RealAudio codecs*/
AV_CODEC_ID_RA_144 = 0x13000,
AV_CODEC_ID_RA_288,
/* various DPCM codecs */
AV_CODEC_ID_ROQ_DPCM = 0x14000,
AV_CODEC_ID_INTERPLAY_DPCM,
AV_CODEC_ID_XAN_DPCM,
AV_CODEC_ID_SOL_DPCM,
/* audio codecs */
AV_CODEC_ID_MP2 = 0x15000,
AV_CODEC_ID_MP3, ///< preferred ID for decoding MPEG audio layer 1, 2 or 3
AV_CODEC_ID_AAC,
AV_CODEC_ID_AC3,
AV_CODEC_ID_DTS,
AV_CODEC_ID_VORBIS,
AV_CODEC_ID_DVAUDIO,
AV_CODEC_ID_WMAV1,
AV_CODEC_ID_WMAV2,
AV_CODEC_ID_MACE3,
AV_CODEC_ID_MACE6,
AV_CODEC_ID_VMDAUDIO,
AV_CODEC_ID_FLAC,
AV_CODEC_ID_MP3ADU,
AV_CODEC_ID_MP3ON4,
AV_CODEC_ID_SHORTEN,
AV_CODEC_ID_ALAC,
AV_CODEC_ID_WESTWOOD_SND1,
AV_CODEC_ID_GSM, ///< as in Berlin toast format
AV_CODEC_ID_QDM2,
AV_CODEC_ID_COOK,
AV_CODEC_ID_TRUESPEECH,
AV_CODEC_ID_TTA,
AV_CODEC_ID_SMACKAUDIO,
AV_CODEC_ID_QCELP,
AV_CODEC_ID_WAVPACK,
AV_CODEC_ID_DSICINAUDIO,
AV_CODEC_ID_IMC,
AV_CODEC_ID_MUSEPACK7,
AV_CODEC_ID_MLP,
AV_CODEC_ID_GSM_MS, /* as found in WAV */
AV_CODEC_ID_ATRAC3,
AV_CODEC_ID_VOXWARE,
AV_CODEC_ID_APE,
AV_CODEC_ID_NELLYMOSER,
AV_CODEC_ID_MUSEPACK8,
AV_CODEC_ID_SPEEX,
AV_CODEC_ID_WMAVOICE,
AV_CODEC_ID_WMAPRO,
AV_CODEC_ID_WMALOSSLESS,
AV_CODEC_ID_ATRAC3P,
AV_CODEC_ID_EAC3,
AV_CODEC_ID_SIPR,
AV_CODEC_ID_MP1,
AV_CODEC_ID_TWINVQ,
AV_CODEC_ID_TRUEHD,
AV_CODEC_ID_MP4ALS,
AV_CODEC_ID_ATRAC1,
AV_CODEC_ID_BINKAUDIO_RDFT,
AV_CODEC_ID_BINKAUDIO_DCT,
AV_CODEC_ID_AAC_LATM,
AV_CODEC_ID_QDMC,
AV_CODEC_ID_CELT,
AV_CODEC_ID_G723_1,
AV_CODEC_ID_G729,
AV_CODEC_ID_8SVX_EXP,
AV_CODEC_ID_8SVX_FIB,
AV_CODEC_ID_BMV_AUDIO,
AV_CODEC_ID_RALF,
AV_CODEC_ID_IAC,
AV_CODEC_ID_ILBC,
AV_CODEC_ID_FFWAVESYNTH = MKBETAG('F','F','W','S'),
AV_CODEC_ID_8SVX_RAW = MKBETAG('8','S','V','X'),
AV_CODEC_ID_SONIC = MKBETAG('S','O','N','C'),
AV_CODEC_ID_SONIC_LS = MKBETAG('S','O','N','L'),
AV_CODEC_ID_PAF_AUDIO = MKBETAG('P','A','F','A'),
AV_CODEC_ID_OPUS = MKBETAG('O','P','U','S')
Here is an MSVC version of the libgsm library:
http://code.google.com/p/vsmm/
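Before picking a decoder, it may help to check which flavour of GSM data the file actually contains, since the list above distinguishes AV_CODEC_ID_GSM ("as in Berlin toast format") from AV_CODEC_ID_GSM_MS ("as found in WAV"). The sketch below tests for raw full-rate frames as libgsm packs them: 33 bytes per frame, 160 samples per frame at 8 kHz, with the high nibble of the first byte set to 0xD. The function names are hypothetical and the check is a heuristic, not a full parser.

```python
GSM_FRAME_BYTES = 33         # one packed full-rate GSM frame
GSM_SAMPLES_PER_FRAME = 160  # 20 ms of audio per frame
GSM_SAMPLE_RATE = 8000       # Hz


def looks_like_raw_gsm(data: bytes) -> bool:
    """True if data is a whole number of 33-byte frames that all start with nibble 0xD."""
    if not data or len(data) % GSM_FRAME_BYTES:
        return False
    return all((data[i] >> 4) == 0xD
               for i in range(0, len(data), GSM_FRAME_BYTES))


def gsm_duration_seconds(n_bytes: int) -> float:
    """Playback duration of n_bytes of raw full-rate GSM frames."""
    frames = n_bytes // GSM_FRAME_BYTES
    return frames * GSM_SAMPLES_PER_FRAME / GSM_SAMPLE_RATE
```

If the check fails, the data may be wrapped in a container such as WAV or use the Microsoft variant, in which case a different codec ID (or demuxing first) would probably be needed.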

Resources