I have an original file 0.flac which I just open with librosa and then save with SoundFile as 1.flac:
import soundfile as sf
import librosa
in_path = "0.flac"
out_path = "1.flac"
sampling_rate = 16000
wav, source_sampling_rate = librosa.load(in_path, sr=None)
assert(source_sampling_rate == sampling_rate)
# 16bit -> PCM_16
sf.write(out_path, wav, sampling_rate, format='flac', subtype='PCM_16')
However, the file size and the file itself seem to have changed:
User#User-MacBook-Pro:~/check_librosa$ metaflac --list 0.flac
METADATA block #0
type: 0 (STREAMINFO)
is last: false
length: 34
minimum blocksize: 4096 samples
maximum blocksize: 4096 samples
minimum framesize: 107 bytes
maximum framesize: 5961 bytes
sample_rate: 16000 Hz
channels: 1
bits-per-sample: 16
total samples: 225360
MD5 signature: 41222a894966327db4f89afa74e5e1a1
METADATA block #1
type: 3 (SEEKTABLE)
is last: false
length: 36
seek points: 2
point 0: sample_number=0, stream_offset=0, frame_samples=4096
point 1: sample_number=159744, stream_offset=170830, frame_samples=4096
METADATA block #2
type: 4 (VORBIS_COMMENT)
is last: false
length: 40
vendor string: reference libFLAC 1.2.1 20070917
comments: 0
METADATA block #3
type: 1 (PADDING)
is last: true
length: 8192
User#User-MacBook-Pro:~/check_librosa$ metaflac --list 1.flac
METADATA block #0
type: 0 (STREAMINFO)
is last: false
length: 34
minimum blocksize: 4096 samples
maximum blocksize: 4096 samples
minimum framesize: 107 bytes
maximum framesize: 6040 bytes
sample_rate: 16000 Hz
channels: 1
bits-per-sample: 16
total samples: 225360
MD5 signature: 41222a894966327db4f89afa74e5e1a1
METADATA block #1
type: 4 (VORBIS_COMMENT)
is last: true
length: 40
vendor string: reference libFLAC 1.3.3 20190804
comments: 0
There are fewer metadata blocks and the maximum framesize is different.
What could be the reason for this? Is loading the file via librosa lossy?
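One thing worth checking is the MD5 signature, which is identical in both listings. In FLAC, the STREAMINFO MD5 field is computed over the unencoded audio data, so matching values suggest the decoded samples are the same even though the two encoders (libFLAC 1.2.1 and 1.3.3) produced different frames and metadata blocks. Below is a minimal sketch that reads this field with the standard library only. The function names are made up for illustration, and it assumes the file starts directly with the fLaC marker (no prepended tag). An all-zero signature means the encoder did not store one, in which case the comparison says nothing.

```python
def flac_audio_md5(header: bytes) -> str:
    """Return the STREAMINFO MD5 (hex) from the first 42 bytes of a FLAC file."""
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (or fewer than 42 bytes given)")
    block_type = header[4] & 0x7F  # low 7 bits: block type, 0 = STREAMINFO
    block_length = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_length != 34:
        raise ValueError("first metadata block is not a 34-byte STREAMINFO")
    streaminfo = header[8:42]
    return streaminfo[18:34].hex()  # last 16 bytes: MD5 of the unencoded audio


def same_decoded_audio(path_a: str, path_b: str) -> bool:
    """Compare the stored MD5 signatures of two FLAC files."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return flac_audio_md5(a.read(42)) == flac_audio_md5(b.read(42))
```

Calling same_decoded_audio("0.flac", "1.flac") would compare the two files from the question.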
I am trying to take a screenshot using fbgrab -d /dev/fb0 img0.png, but it always outputs a blank image.
Am I missing something here?
Judging from its histogram, the captured PNG contains only the value 255.
Environment:
Linux based custom OS built using buildroot running on i.MX6q based embedded custom board.
X.org is not enabled.
2 framebuffers(/dev/fb0, /dev/fb1) each being updated by different programs using QT eglfs.
root#hostname:/root# fbgrab -v -z 9 -d /dev/fb0 img0.png
frame buffer fixed info:
id: "DISP3 BG - DI1"
type: packed pixels
line length: 7680 bytes (1920 pixels)
frame buffer variable info:
resolution: 1920x1080
virtual resolution: 1920x4352
offset: 0x3264
bits_per_pixel: 32
grayscale: false
red: offset: 16, length: 8, msb_right: 0
green: offset: 8, length: 8, msb_right: 0
blue: offset: 0, length: 8, msb_right: 0
alpha: offset: 24, length: 8, msb_right: 0
pixel format: non-standard
Resolution: 1920x1080 depth 32
Converting image from 32
Now writing PNG file (compression 9)
root#hostname:/root# fbgrab -v -z 9 -d /dev/fb1 img1.png
frame buffer fixed info:
id: "DISP3 FG"
type: packed pixels
line length: 7680 bytes (1920 pixels)
frame buffer variable info:
resolution: 1920x1080
virtual resolution: 1920x1080
offset: 0x0
bits_per_pixel: 32
grayscale: false
red: offset: 16, length: 8, msb_right: 0
green: offset: 8, length: 8, msb_right: 0
blue: offset: 0, length: 8, msb_right: 0
alpha: offset: 24, length: 8, msb_right: 0
pixel format: standard
Resolution: 1920x1080 depth 32
Converting image from 32
Now writing PNG file (compression 9)
I also tried fbdump using fbdump > grab.ppm. The output is different, but it is not close to what is on the screen. I do not have the ppmtopng package installed, so I cannot use fbdump | ppmtopng > grab.png.
I have pre-trained weights for maskrcnn in caffe2 as a .pkl file, and its config file as YAML. If I try to load it directly, it throws Improper config format: . Is there a way to use it without installing caffe2?
Config.py
MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
  NUM_CLASSES: 6
  FASTER_RCNN: True
  MASK_ON: True
NUM_GPUS: 8
SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  # 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
  BASE_LR: 0.01
  GAMMA: 0.1
  MAX_ITER: 180000
  STEPS: [0, 120000, 160000]
FPN:
  FPN_ON: True
  MULTILEVEL_ROIS: True
  MULTILEVEL_RPN: True
MRCNN:
  ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # (output mask resolution) default 14
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill
TRAIN:
  # md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
  DATASETS: ('medline_train',)
  SCALES: (400,)
  MAX_SIZE: 512
  IMS_PER_BATCH: 1
  BATCH_SIZE_PER_IM: 512
  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level
  USE_FLIPPED: False
TEST:
  DATASETS: ('medline_val',)
  SCALE: 400
  MAX_SIZE: 512
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  RPN_POST_NMS_TOP_N: 1000
  FORCE_JSON_DATASET_EVAL: True
OUTPUT_DIR: .
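As a starting point for inspecting the weights without caffe2, here is a minimal sketch that treats the .pkl as an ordinary Python pickle. It rests on assumptions not confirmed by the question: that the file was pickled under Python 2 (hence encoding="latin1") and that the weights sit in a dict under a "blobs" key. The function name is hypothetical. Unpickling real weight arrays would also need NumPy installed, and pickle files should only be loaded from trusted sources.

```python
import pickle


def load_weight_blobs(path):
    """Load a pickled weights file and return its dict of named arrays."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read byte strings pickled by Python 2
        data = pickle.load(f, encoding="latin1")
    # Assumption: weights are stored under "blobs"; otherwise return the object as-is
    if isinstance(data, dict) and "blobs" in data:
        return data["blobs"]
    return data
```

Printing the keys and array shapes of the returned dict is usually enough to see which layers the file contains before mapping them into another framework.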
I'm trying to record sound using pyaudio and compute a spectrogram of the audio, but I get the following error: "Audio buffer is not finite everywhere".
This might be a duplicate, but I didn't find anything that solves the error. Here is my code:
import numpy as np
import pyaudio
import sounddevice as sd  # assumed: sd.play matches the sounddevice API
import librosa

CHUNK = 96000  # number of data points to read at a time
RATE = 16000  # time resolution of the recording device (Hz)
p = pyaudio.PyAudio()  # start the PyAudio class
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True,
                frames_per_buffer=CHUNK)  # uses default input device
# create a numpy array holding a single read of audio data
stop = 0
while not stop:  # do it a few times just to see
    print('Recording')
    audio = np.frombuffer(stream.read(CHUNK))
    print(type(audio[0]))
    print("max value: ", np.max(audio))
    print("min value: ", np.min(audio))
    sd.play(audio, RATE)
    S = librosa.feature.melspectrogram(audio, sr=RATE)
    S = 10 * np.log(S + 1e-15)
    #em=get_emotion_audio(audio,RATE)
    #print("[DETECTED] ",em)
    stop = 1
# close the stream gracefully
stream.stop_stream()
stream.close()
p.terminate()
Here is the error I am getting:
'''
Recording
<class 'numpy.float64'>
max value: nan
min value: nan
---------------------------------------------------------------------------
ParameterError Traceback (most recent call last)
<ipython-input-3-33fa263f625d> in <module>
19 print("min value: ",np.min(audio))
20 sd.play(audio,RATE)
---> 21 S = librosa.feature.melspectrogram(audio, sr=RATE)
22 S = 10 * np.log(S + 1e-15)
23 #em=get_emotion_audio(audio,RATE)
~\Anaconda3\lib\site-packages\librosa\feature\spectral.py in melspectrogram(y, sr, S, n_fft, hop_length, power, **kwargs)
1529
1530 S, n_fft = _spectrogram(y=y, S=S, n_fft=n_fft, hop_length=hop_length,
-> 1531 power=power)
1532
1533 # Build a Mel filter
~\Anaconda3\lib\site-packages\librosa\core\spectrum.py in _spectrogram(y, S, n_fft, hop_length, power)
1555 else:
1556 # Otherwise, compute a magnitude spectrogram from input
-> 1557 S = np.abs(stft(y, n_fft=n_fft, hop_length=hop_length))**power
1558
1559 return S, n_fft
~\Anaconda3\lib\site-packages\librosa\core\spectrum.py in stft(y, n_fft, hop_length, win_length, window, center, dtype, pad_mode)
159
160 # Check audio is valid
--> 161 util.valid_audio(y)
162
163 # Pad the time series so that frames are centered
~\Anaconda3\lib\site-packages\librosa\util\utils.py in valid_audio(y, mono)
168
169 if not np.isfinite(y).all():
--> 170 raise ParameterError('Audio buffer is not finite everywhere')
171
172 return True
ParameterError: Audio buffer is not finite everywhere
'''
The solution was to change these two lines so that the buffer is read as 16-bit integers and then converted to float32:
audio=np.frombuffer(stream.read(CHUNK),dtype=np.int16)
S = librosa.feature.melspectrogram(audio.astype('float32'), sr=RATE)
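The reason this helps is that np.frombuffer defaults to a float64 dtype, so without dtype=np.int16 every group of eight bytes (four 16-bit samples) is reinterpreted as one 64-bit float, and some bit patterns decode to NaN. The following sketch reproduces the effect with the standard library only. The sample values are made up for illustration, and little-endian byte order is an assumption.

```python
import math
import struct

# Four 16-bit samples as they might arrive from a paInt16 stream
# (values chosen for illustration; little-endian byte order assumed).
samples = [-1, -1, -1, -1]
raw = struct.pack("<4h", *samples)  # 8 bytes, all 0xFF

# Misreading: the same 8 bytes interpreted as a single 64-bit float.
(as_float64,) = struct.unpack("<d", raw)

# Correct reading: four signed 16-bit integers.
as_int16 = list(struct.unpack("<4h", raw))

print("as float64:", as_float64)  # nan
print("as int16:  ", as_int16)
```

The misread buffer also has a quarter of the expected number of elements, which is another quick way to spot the problem.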
What I want to achieve is to take screenshots continuously and pass them to a Node.js application, which processes each one separately and does some other stuff. I need this for Linux only. I picked ffmpeg with x11grab as the screenshot provider. The following command works just fine:
ffmpeg -t 10 -s 1366x768 -f x11grab -i :0.0+0,0 -vf fps=30 output_%d.png -y
It creates 300 consecutive frames of my screen over a 10-second period. Next, I want to redirect the output to my Node.js app instead of writing files to the hard drive, so I'm calling ffmpeg from node:
var spawn = require('child_process').spawn,
    fps = 30,
    duration = 10,
    screenSize = {w: 1366, h: 768},
    args = [
        '-t',
        duration,
        '-s',
        screenSize.w + 'x' + screenSize.h,
        '-f',
        'x11grab',
        '-i',
        ':0.0',
        '-vf',
        'fps=' + fps,
        '-f',
        'mjpeg',
        'pipe:1'
    ],
    ff = spawn('ffmpeg', args);
ff.stdout.on('data', function (data) {
    console.log('Data size: ' + data.length);
});
ff.stdout.on('end', function (data) {
    console.log('Stream end');
});
ff.stderr.on('data', function (data) {
    console.log('ff error: ' + data);
});
I apologize for the long log, but it's important:
ff error: ffmpeg version N-77455-g4707497 Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04)
configuration: --extra-libs=-ldl --prefix=/opt/ffmpeg --mandir=/usr/share/man --enable-avresample --disable-debug --enable-nonfree --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-decoder=amrnb --disable-decoder=amrwb --enable-libpulse --enable-libdcadec --enable-libfreetype --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-libvorbis --enable-libmp3lame --enable-libopus --enable-libvpx --enable-libspeex --enable-libass --enable-avisynth --enable-libsoxr --enable-libxvid --enable-libvo-aacenc --enable-libvidstab
libavutil 55. 11.100 / 55. 11.100
libavcodec 57. 20.100 / 57. 20.100
libavformat 57. 20.100 / 57. 20.100
libavdevice 57. 0.100 / 57. 0.100
libavfilter 6. 21.101 / 6. 21.101
libavresample 3. 0. 0 / 3. 0. 0
libswscale 4. 0.100 / 4. 0.100
libswresample 2. 0.101 / 2. 0.101
libpostproc 54. 0.100 / 54. 0.100
ff error: Input #0, x11grab, from ':0.0':
Duration: N/A, start: 1451414448.216650, bitrate: N/A
Stream #0:0: Video: rawvideo (BGR[0] / 0x524742), bgr0, 1366x768, 29.97 fps, 29.97 tbr, 1000k tbn,
ff error: 29.97 tbc
ff error: [swscaler # 0x34238a0] deprecated pixel format used, make sure you did set range correctly
ff error: Output #0, mjpeg, to 'pipe:1':
Metadata:
encoder :
ff error: Lavf57.20.100
Stream #0:0: Video: mjpeg, yuvj444p(pc), 1366x768, q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc
Metadata:
encoder : Lavc57.20.100 mjpeg
Side data:
unknown side data type 10 (24 bytes)
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> mjpeg (native))
Press [q] to stop, [?] for help
ff error: [swscaler # 0x34238a0] Warning: data is not aligned! This can lead to a speedloss
Data size: 65536
Data size: 65536
Data size: 21413
Data size: 65536
Data size: 65536
Data size: 45581
Data size: 65536
Data size: 65536
Data size: 62377
Data size: 65536
Data size: 65536
Data size: 45581
Data size: 65536
Data size: 65536
Data size: 21413
Data size: 65536
Data size: 60933
Data size: 65536
Data size: 49550
Data size: 65536
Data size: 36709
Data size: 65536
Data size: 27035
Data size: 65536
Data size: 20131
Data size: 65536
Data size: 15887
Data size: 65536
Data size: 15887
Data size: 65536
Data size: 15887
Data size: 65536
Data size: 15911
Data size: 65536
Data size: 15911
Data size: 65536
Data size: 15911
ff error: frame= 16 fps=0.0 q=24.8 size= 1819kB time=00:00:00.53 bitrate=27935.6kbits/s speed=1.04x
Data size: 65536
Data size: 15911
Data size: 65536
Data size: 15911
Data size: 65536
Data size: 15919
Data size: 65536
Data size: 15919
Data size: 65536
Data size: 15919
Data size: 65536
Data size: 15919
Data size: 65536
Data size: 15919
Data size: 65536
Data size: 15911
Data size: 65536
Data size: 15911
Data size: 65536
Data size: 15906
Data size: 65536
Data size: 15906
Data size: 65536
Data size: 15917
Data size: 65536
Data size: 15999
Data size: 65536
Data size: 15949
Data size: 65536
Data size: 15997
Data size: 65536
Data size: 15965
ff error: frame= 32 fps= 30 q=24.8 size= 3092kB time=00:00:01.06 bitrate=23743.7kbits/s speed=1.01x
Data size: 65536
Data size: 16025
Data size: 65536
Data size: 15978
Data size: 65536
Data size: 15963
Data size: 65536
Data size: 16028
Data size: 65536
Data size: 15976
Data size: 65536
Data size: 15958
Data size: 65536
Data size: 15940
Data size: 65536
Data size: 15992
Data size: 65536
Data size: 15962
Data size: 65536
Data size: 16010
Data size: 65536
Data size: 15941
Data size: 65536
Data size: 15941
Data size: 65536
Data size: 15973
Data size: 65536
Data size: 15943
Data size: 65536
Data size: 15947
Data size: 65536
Data size: 15947
ff error: frame= 48 fps= 30 q=24.8 size= 4365kB time=00:00:01.60 bitrate=22349.6kbits/s speed=1.01x
Data size: 65536
Data size: 15982
Data size: 65536
Data size: 15982
Data size: 65536
Data size: 15956
Data size: 65536
Data size: 15956
Data size: 65536
Data size: 15956
Data size: 65536
Data size: 16001
Data size: 65536
Data size: 15930
Data size: 65536
Data size: 15922
Data size: 65536
Data size: 15924
Data size: 65536
Data size: 15924
Data size: 65536
Data size: 15924
Data size: 65536
Data size: 15924
Data size: 65536
Data size: 15911
Data size: 65536
Data size: 15924
Data size: 65536
Data size: 15985
Data size: 65536
Data size: 15985
ff error: frame= 64 fps= 30 q=24.8 size= 5638kB time=00:00:02.13 bitrate=21651.3kbits/s speed=1.01x
Data size: 65536
Data size: 15985
Data size: 65536
Data size: 15924
Data size: 65536
Data size: 15976
Data size: 65536
Data size: 15976
Data size: 65536
Data size: 15958
Data size: 65536
Data size: 16319
Data size: 65536
Data size: 16558
Data size: 65536
Data size: 16576
Data size: 65536
Data size: 16564
Data size: 65536
Data size: 16582
Data size: 65536
Data size: 16589
Data size: 65536
Data size: 16587
Data size: 65536
Data size: 16446
Data size: 65536
Data size: 16450
Data size: 65536
Data size: 16450
Data size: 65536
Data size: 16450
ff error: frame= 80 fps= 30 q=24.8 size= 6918kB time=00:00:02.66 bitrate=21251.0kbits/s speed=1.01x
Data size: 65536
Data size: 16568
Data size: 65536
Data size: 16575
Data size: 65536
Data size: 16585
Data size: 65536
Data size: 18182
Data size: 65536
Data size: 17203
Data size: 65536
Data size: 16769
Data size: 65536
Data size: 16734
Data size: 65536
Data size: 16823
Data size: 65536
Data size: 16338
Data size: 65536
Data size: 16455
Data size: 65536
Data size: 16406
Data size: 65536
Data size: 16645
Data size: 65536
Data size: 16800
Data size: 65536
Data size: 16800
Data size: 65536
Data size: 16800
Data size: 65536
Data size: 16806
ff error: frame= 96 fps= 30 q=24.8 size= 8204kB time=00:00:03.20 bitrate=21001.8kbits/s speed= 1x
Data size: 65536
Data size: 16795
Data size: 65536
Data size: 16804
Data size: 65536
Data size: 16770
Data size: 65536
Data size: 16760
Data size: 65536
Data size: 16813
Data size: 65536
Data size: 16445
Data size: 65536
Data size: 16259
Data size: 65536
Data size: 16260
Data size: 65536
Data size: 16265
Data size: 65536
Data size: 16284
Data size: 65536
Data size: 16233
Data size: 65536
Data size: 16233
Data size: 65536
Data size: 16182
Data size: 65536
Data size: 16058
Data size: 60561
ff error: frame= 111 fps= 30 q=24.8 size= 9384kB time=00:00:03.70 bitrate=20776.1kbits/s speed= 1x
Data size: 61813
Data size: 61813
Data size: 61813
Data size: 61813
Data size: 61813
Data size: 61781
Data size: 61784
Data size: 61796
Data size: 61842
Data size: 61839
Data size: 61793
Data size: 61810
Data size: 61844
Data size: 61844
Data size: 61850
Data size: 61841
ff error: frame= 127 fps= 30 q=24.8 size= 10350kB time=00:00:04.23 bitrate=20027.8kbits/s speed= 1x
Data size: 61858
Data size: 61853
Data size: 61833
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
ff error: frame= 142 fps= 30 q=24.8 size= 11256kB time=00:00:04.73 bitrate=19480.5kbits/s speed= 1x
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
ff error: frame= 158 fps= 30 q=24.8 size= 12223kB time=00:00:05.26 bitrate=19011.4kbits/s speed= 1x
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61867
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
ff error: frame= 173 fps= 30 q=24.8 size= 13129kB time=00:00:05.76 bitrate=18650.3kbits/s speed= 1x
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
ff error: frame= 188 fps= 30 q=24.8 size= 14035kB time=00:00:06.26 bitrate=18346.6kbits/s speed= 1x
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
ff error: frame= 203 fps= 30 q=24.8 size= 14941kB time=00:00:06.76 bitrate=18087.8kbits/s speed= 1x
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
ff error: frame= 219 fps= 30 q=24.8 size= 15907kB time=00:00:07.30 bitrate=17850.8kbits/s speed= 1x
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
ff error: frame= 234 fps= 30 q=24.8 size= 16813kB time=00:00:07.80 bitrate=17658.1kbits/s speed= 1x
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
Data size: 61850
ff error: frame= 250 fps= 30 q=24.8 size= 17780kB time=00:00:08.33 bitrate=17478.0kbits/s speed= 1x
Data size: 61850
Data size: 61712
Data size: 61712
Data size: 61712
Data size: 61712
Data size: 61712
Data size: 61712
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
ff error: frame= 265 fps= 30 q=24.8 size= 18675kB time=00:00:08.83 bitrate=17319.4kbits/s speed= 1x
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
ff error: frame= 280 fps= 30 q=24.8 size= 19563kB time=00:00:09.33 bitrate=17171.2kbits/s speed= 1x
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
ff error: frame= 295 fps= 30 q=24.8 size= 20452kB time=00:00:09.83 bitrate=17038.0kbits/s speed= 1x
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
Data size: 60638
ff error: frame= 300 fps= 30 q=24.8 Lsize= 20748kB time=00:00:10.00 bitrate=16996.6kbits/s speed=0.998x
video:20748kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
Stream end
The error messages are weird too, but they are not the biggest problem for now, so let's ignore them.
Looking at the data coming from ffmpeg, I assume the images arrive as a single stream without any separation. I don't think that scanning the binary data for certain flag bytes is a good way to separate the images. The command line at the top of the post writes a separate file for each screenshot, which suggests it produces a separate writable stream per screenshot, but that does not happen for the process spawned from Node.js.
What am I missing?
The solution is to use png-streamer and adjust the ffmpeg args a little. png-streamer parses the incoming stream and recognizes the PNG images by the PNG binary header. Here's the final code:
var ffmpegArgs = [
        '-t',
        options.duration,
        '-s',
        options.width + 'x' + options.height,
        '-f',
        'x11grab',
        '-i',
        ':' + options.display + '+' + options.offsetX + ',' + options.offsetY,
        '-vf',
        'fps=' + options.fps,
        '-f', //<<
        'image2pipe', //<<
        '-vcodec', //<<
        'png', //<<
        'pipe:1'
    ],
    ffmpeg = spawn('ffmpeg', ffmpegArgs);
new pngStreamer(ffmpeg, callback);
Along with the helpful notes from the comments, the trick was that -vcodec png needs to be used as well. Otherwise the buffer comes in LAVC format.
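To illustrate how a stream of concatenated PNG images can be separated, here is a minimal Python sketch. It is not png-streamer's actual implementation, and the function name is made up. It relies only on the PNG file layout: an 8-byte signature followed by chunks (4-byte big-endian length, 4-byte type, data, 4-byte CRC), with IEND as the last chunk. CRCs are not verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def split_png_stream(buffer):
    """Return (complete_images, leftover_bytes) for concatenated PNG data."""
    images = []
    start = 0
    while len(buffer) - start >= 8:
        if buffer[start:start + 8] != PNG_SIGNATURE:
            raise ValueError("stream is not aligned on a PNG signature")
        pos = start + 8
        end = None
        # Walk chunk by chunk until IEND or until the data runs out.
        while len(buffer) - pos >= 12:
            length, chunk_type = struct.unpack(">I4s", buffer[pos:pos + 8])
            chunk_end = pos + 12 + length  # length + type + data + CRC
            if chunk_end > len(buffer):
                break  # chunk is not complete yet
            pos = chunk_end
            if chunk_type == b"IEND":
                end = pos
                break
        if end is None:
            break  # image is incomplete; wait for more data
        images.append(buffer[start:end])
        start = end
    return images, buffer[start:]
```

In use, each chunk read from the ffmpeg process's stdout would be appended to the leftover bytes from the previous call, so that images spanning several 64 KiB pipe reads are reassembled.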
You have a 30-second audio file sampled at a rate of 44.1 kHz and quantized using 8 bits; calculate the bit rate and the size of the mono and stereo versions of this file.
The bitrate is the number of bits per second.
bitrate = bitsPerSample * samplesPerSecond * channels
So in this case, for stereo, the bitrate is 8 * 44100 * 2 = 705,600 bps (705.6 kbps).
To get the file size, multiply the bitrate by the duration (in seconds), and divide by 8 (to get from bits to bytes):
fileSize = (bitsPerSample * samplesPerSecond * channels * duration) / 8;
So in this case 30 seconds of stereo will take up (8 * 44100 * 2 * 30) / 8 = 2,646,000 bytes
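The same arithmetic as a small Python sketch, assuming uncompressed PCM with no container overhead. The function names are chosen here for illustration.

```python
def pcm_bitrate(bits_per_sample, sample_rate, channels):
    """Bits per second of uncompressed PCM audio."""
    return bits_per_sample * sample_rate * channels


def pcm_size_bytes(bits_per_sample, sample_rate, channels, seconds):
    """Size in bytes: bitrate times duration, divided by 8 bits per byte."""
    return pcm_bitrate(bits_per_sample, sample_rate, channels) * seconds // 8


for name, channels in (("mono", 1), ("stereo", 2)):
    rate = pcm_bitrate(8, 44100, channels)
    size = pcm_size_bytes(8, 44100, channels, 30)
    print(f"{name}: {rate} bps, {size} bytes")
# mono: 352800 bps, 1323000 bytes
# stereo: 705600 bps, 2646000 bytes
```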
Assuming uncompressed PCM audio, the size in bytes is:
(time * sampleRate * bitsPerSample * channelCount) / 8
For 30 seconds of mono audio at 44.1 kHz and 8 bits per sample, that's 1,323,000 bytes. For stereo, that's two channels, so double it.
Formula: size in KB = (sample rate x bits per sample x number of channels x time in seconds) / (8 x 1024)
CD-quality sample rate = 44.1 kHz
Size of mono = (44,100 x 8 x 1 x 30) / (8 x 1024)
= 1291.99 KB
= 1.26 MB
Size of stereo = (44,100 x 8 x 2 x 30) / (8 x 1024)
= 2583.98 KB
= 2.52 MB
≈ 2.5 MB