I am trying to sample (convert analog to digital) mp3 files via the following Python code using the librosa library, but it takes too much time (around 4 seconds per file). I suspect this is because librosa doesn't support mp3 natively and hence falls back on the slower audioread package to decode it.
Code:
import time
import librosa
s = time.time()
for i in mp3_list[:10]:  # list of mp3 file paths; timing 10 files
    y, sr = librosa.load(i)
print('time taken =', time.time() - s)
time taken = 36.55561399459839
I also get this warning:
UserWarning: "PySoundFile failed. Trying audioread instead."
Obviously, this is too slow for any practical application. Are there better alternatives?
For comparison, loading wav conversions of the same 10 files took only around 1.2 seconds in total.
The warning already hints at the cause. The librosa developers addressed a similar question in a GitHub issue:
This warning will always occur when loading mp3 because libsndfile
does not (yet/currently) support the mp3 format. Librosa tries to use
libsndfile first, and if that fails, it will fall back on the
audioread package, which is a bit slower and more brittle, but
supports more formats.
This is confirmed in the librosa source, where the soundfile attempt is wrapped in try ... except RuntimeError ... before falling back to audioread.
So what you can do in this case is either implement your own load() that calls audioread directly, skipping the doomed soundfile attempt at the top of librosa.load(), or use a different library such as pydub (see the sketch below). Alternatively, you can use ffmpeg to convert your mp3 files to wav before loading them.
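For example, here is a minimal pydub-based loader, assuming ffmpeg is installed and that you want a mono float array similar to what librosa.load() returns (note that librosa also resamples to 22050 Hz by default, which this sketch skips):

import numpy as np
from pydub import AudioSegment

def load_mp3(path):
    # pydub hands the decoding to ffmpeg, which handles mp3 quickly
    audio = AudioSegment.from_mp3(path)
    samples = np.array(audio.get_array_of_samples(), dtype=np.float32)
    # scale the integer samples into [-1, 1]
    samples /= 1 << (8 * audio.sample_width - 1)
    if audio.channels == 2:
        # interleaved stereo -> mono, as librosa.load(mono=True) would do
        samples = samples.reshape(-1, 2).mean(axis=1)
    return samples, audio.frame_rate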
Related
I have the code below, which reads an audio file and prints its amplitude:
from scipy.io.wavfile import read
fs, amplitude = read('1.wav')
print(amplitude)
Now I am trying to read the file in such a way that I can process the audio second by second. As of now, it reads the whole file and then prints it, but I want to read, say, the first 10 seconds (or 1, 2, 3 seconds) and then print the amplitude, just like reading frames from a camera with OpenCV.
Is there any library available to achieve this?
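One option is the soundfile package, whose blocks() generator reads the file in fixed-size chunks instead of all at once. A minimal sketch, reading one second per iteration (soundfile returns float64 by default; the dtype='int16' here gives you the same integer amplitudes scipy does):

import soundfile as sf

fs = sf.info('1.wav').samplerate
# blocksize=fs frames means each block holds one second of audio
for second, block in enumerate(sf.blocks('1.wav', blocksize=fs, dtype='int16')):
    print('second', second, ':', block)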
So I'm making a program that corrects stereo imbalance in an audio file. I'm using pysoundfile to read/write the files. The code looks something like this:
import soundfile as sf

data, rate = sf.read("Input.wav")
for d in data:
    ...  # process audio
sf.write("Output.wav", data, rate, 'PCM_24')
The issue is that I'm working with DJ mixes that can be a couple of hours long, so loading the entire mix into RAM gets the program killed.
My question is how do I read/write the file in smaller sections vs loading the entire thing?
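soundfile supports exactly this: open both files as SoundFile objects and pump the audio through a fixed-size buffer. Here is a sketch of a streaming version of the code above (the 65536-frame block size is an arbitrary choice). If your imbalance correction needs global statistics such as per-channel RMS, make two passes: one to measure, one to rewrite.

import soundfile as sf

BLOCK = 65536  # frames per chunk; tune to taste

with sf.SoundFile("Input.wav") as infile, \
     sf.SoundFile("Output.wav", "w", samplerate=infile.samplerate,
                  channels=infile.channels, subtype="PCM_24") as outfile:
    while True:
        data = infile.read(BLOCK)
        if len(data) == 0:
            break
        # process this chunk of audio here
        outfile.write(data)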
The common situation in which an MP3 file's integrity is broken is when the file has been only partially uploaded to the server. In that case, the indicated audio duration doesn't correspond to what is really in the file: we can hear the beginning, but at some point playback stops, and the duration shown by the audio player is wrong.
I tried libraries like node-ffprobe, but it seems they just read metadata without comparing it against the real audio data in the file. Is there a way to efficiently detect a corrupted or incomplete MP3 file from node.js?
Note: the client uploading the MP3 files is a hardware device (an audio recorder) uploading to an FTP server, not a browser, so I'm not able to send potentially more useful data from the client.
MP3 files don't normally have a duration. They're just a series of MPEG frames. Sometimes, there is an ID3 tag indicating duration, but not always.
Players can determine duration by choosing one of a few methods:
Decode the entire audio file. This is the slowest method, but if you're going to decode the file anyway, you might as well go this route, as it gives you an exact duration.
Read the whole file, skimming through frame headers. You'll have to read the whole file from disk, but you won't have to decode it. Can be slow if I/O is slow, but it gives you an exact duration.
Read the first frame's bitrate and estimate duration by file size. Definitely the fastest method, and the one most commonly used by players. The duration is an estimate only, reasonably accurate for CBR but potentially wildly inaccurate for VBR. (The arithmetic is sketched after this list.)
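To make the third method concrete, the arithmetic is just file size over byte rate. A sketch in Python for brevity (parsing bitrate_kbps out of the first frame header is not shown, and any ID3 tag bytes are counted as audio, so treat the result as an estimate):

import os

def estimate_duration_seconds(path, bitrate_kbps):
    # total bits divided by bits per second; only trustworthy for CBR
    return os.path.getsize(path) * 8 / (bitrate_kbps * 1000)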
What I'm getting at is that these files might not actually be broken. They might just be VBR files that your player doesn't know the duration of.
If you're convinced they are broken (e.g. playback stops in the middle of the content), then you'll have to figure out how you want to handle it. There are probably only a couple of ways to determine this:
Ideally, there's an ID3 tag indicating duration, and you can decode the whole file and determine its real duration to compare.
Usually, that ID3 tag won't exist, so you'll have to check whether the last frame is complete (a sketch of this check follows below).
Beyond that, you don't really have a good way of knowing whether the stream is incomplete, since there is no outer container that specifies the number of frames to expect.
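For reference, here is a rough sketch of that last-frame check, restricted to the common MPEG-1 Layer III case and written in Python for brevity (a Node port is mechanical). It ignores ID3v2 tags at the front and the 128-byte ID3v1 tag that may sit at the end, and a false sync inside the audio data can fool it, so treat it as a starting point rather than a validator:

# MPEG-1 Layer III bitrate table (kbps), indexed by the 4-bit bitrate field
BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0]
SAMPLE_RATES = [44100, 48000, 32000, 0]

def is_truncated(path):
    data = open(path, 'rb').read()
    size, i = len(data), 0
    while i < size - 4:
        # 11-bit frame sync: 0xFF, then the top three bits of the next byte
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            bitrate = BITRATES[data[i + 2] >> 4]
            rate = SAMPLE_RATES[(data[i + 2] >> 2) & 0x03]
            padding = (data[i + 2] >> 1) & 0x01
            if bitrate and rate:
                # Layer III frame length in bytes
                frame_len = 144000 * bitrate // rate + padding
                if i + frame_len > size:
                    return True  # the last header promises more bytes than exist
                i += frame_len
                continue
        i += 1
    return False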
The expression for calculating the file size of an mp3 from duration and bitrate (from this answer) is quite simple:
x = length of song in seconds
y = bitrate in kilobits per second
(x * y) / (8 * 1024) = filesize (MB)
(The factor of 8 converts kilobits to kilobytes, and 1024 converts kilobytes to megabytes.)
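Plugging in numbers: a 240-second track at 128 kbps works out to (240 * 128) / (8 * 1024) = 3.75 MB. As a tiny sanity check in Python (the function name is mine):

def mp3_filesize_mb(seconds, bitrate_kbps):
    # kilobits -> kilobytes (/ 8) -> megabytes (/ 1024); CBR only
    return seconds * bitrate_kbps / 8 / 1024

print(mp3_filesize_mb(240, 128))  # 3.75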
There is also a javascript implementation for the Web Audio API in another answer on that same question. Perhaps that would be useful in your Node implementation.
mp3diags is some older open-source software for fixing mp3s that was great for batch processing things like this. The source is C++ and still available if you're feeling nosy and want to see how some of these features are implemented.
Worth a look, since it has some features that might be useful in your context:
What is MP3 Diags and what does it do?
low quality audio
missing VBR header
missing normalization data
Correcting files that show incorrect song duration
Correcting files in which the player cannot seek correctly
I have two short 2-3 minute .wav files that were recorded within 1 minute of each other. They could be anywhere from 0-60 seconds apart. I'd like to sync them. There is a sync tone that is played and present in both audio files. There is very little audio in them besides the loud sync tone; it is very obvious when viewed in Audacity.
I've tried every solution listed in Automatically sync two audio recordings in python, and none of them work. They all share the same problem: when they get to this method:
def find_freq_pairs(freqs_dict_orig, freqs_dict_sample):
    time_pairs = []
    for key in freqs_dict_sample.keys():  # iterate through freqs in sample
        if key in freqs_dict_orig:        # if the same freq occurs in the base
            for i in range(len(freqs_dict_sample[key])):  # determine time offset
                for j in range(len(freqs_dict_orig[key])):
                    time_pairs.append((freqs_dict_sample[key][i], freqs_dict_orig[key][j]))
    return time_pairs
Each time, the inner loops end up doing (500k ^ 2) iterations for each of the 512 keys in the freqs_dict dictionary. That would take months to run, and that's with two 3-4 second audio files; with 1-2 minute files it was (5M+ * 5M+) iterations. I suspect the code broke under Python 3 (dict.has_key() no longer exists, for a start), since everyone on that thread seemed happy with it...
Does anyone know a better way to sync two audio files with python?
Thank you
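One alternative worth trying: since the loud sync tone dominates both recordings, a plain FFT-based cross-correlation will usually find the offset directly in O(n log n). A sketch with scipy (the file names are placeholders; verify the sign of the lag against a known offset):

import numpy as np
import soundfile as sf
from scipy.signal import correlate

a, rate_a = sf.read('recording_a.wav')
b, rate_b = sf.read('recording_b.wav')
assert rate_a == rate_b, 'resample first if the rates differ'

# mix down to mono so we correlate a single channel per file
if a.ndim > 1:
    a = a.mean(axis=1)
if b.ndim > 1:
    b = b.mean(axis=1)

# FFT-based correlation is O(n log n), fine for minutes of audio
corr = correlate(a, b, mode='full', method='fft')
lag = int(np.argmax(np.abs(corr))) - (len(b) - 1)
print('b is offset by %+.3f seconds relative to a' % (lag / rate_a))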
I'm trying to optimise a piece of software for playing video, which internally uses the FFmpeg libraries for decoding. We've found that on some large (4K, 60fps) video, it sometimes takes longer to decode a frame than that frame should be displayed for; sadly, because of the problem domain, simply buffering/skipping frames is not an option.
However, it appears that the FFmpeg executable is able to decode the video in question fine, at about 2x speed, so I've been trying to work out what we're doing wrong.
I've written a very stripped-back decoder program for testing; the source is here (it's about 200 lines). From profiling it, it appears that the one major bottleneck during decoding is the avcodec_send_packet() function, which can take up to 50ms per call. However, measuring the same call in FFmpeg shows strange behaviour:
(these are the times taken for each call to avcodec_send_packet() in milliseconds, when decoding a 4K 25fps VP9-encoded video.)
Basically, it seems that when FFmpeg uses this function, only every Nth call takes any appreciable time to complete, where N is the number of threads being used for decoding. However, both my test decoder and the actual product use 4 threads for decoding, and this pattern doesn't appear; when using frame-based threading, the test decoder behaves like FFmpeg running with only 1 thread. This would seem to indicate that we're not using multithreading at all, but we've still seen performance improvements from using more threads.
FFmpeg's results average out to being about twice as fast overall as our decoders, so clearly we're doing something wrong. I've been reading through FFmpeg's source to try to find any clues, but it's so far eluded me.
My question is: what's FFmpeg doing here that we're not? Alternatively, how can we increase the performance of our decoder?
Any help is greatly appreciated.
I was facing the same problem. It took me quite a while to figure out a solution, which I want to share here for future reference:
Enable multithreading for the decoder. By default the decoder uses only one thread; depending on the decoder, multithreading can speed up decoding drastically.
Assuming you have AVFormatContext *format_ctx, a matching codec AVCodec* codec and AVCodecContext* codec_ctx (allocated using avcodec_alloc_context3).
Before opening the codec context (using avcodec_open2), you can configure multithreading. Check the capabilities of the codec to decide which kind of multithreading you can use:
// let FFmpeg determine how many threads suit the decoding job best
codec_ctx->thread_count = 0;

if (codec->capabilities & AV_CODEC_CAP_FRAME_THREADS)
    codec_ctx->thread_type = FF_THREAD_FRAME;
else if (codec->capabilities & AV_CODEC_CAP_SLICE_THREADS)
    codec_ctx->thread_type = FF_THREAD_SLICE;
else
    codec_ctx->thread_count = 1; // don't use multithreading
Another speed-up I found is the following: keep sending packets to the decoder (that's what avcodec_send_packet() does) until you get AVERROR(EAGAIN) as the return value. That means the internal decoder buffers are full, and you first need to collect the decoded frames (but remember to send this last packet again once the decoder has been drained). Now you can collect the decoded frames using avcodec_receive_frame() until you get AVERROR(EAGAIN) again.
Some decoders work much faster when they have multiple frames queued for decoding (that's what the decoder does when codec_ctx->thread_type = FF_THREAD_FRAME is set).
avcodec_send_packet() and avcodec_receive_frame() are wrapper functions; the most important thing they do is call the selected codec's decode function and return the decoded frame (or an error).
Try tweaking the codec options; for example, low-latency mode may not give you what you want. And sometimes the old API, avcodec_decode_video2() (I believe it's still around), outperforms the newer one, so you may try that too.