I have the below code, which reads an audio file and prints its amplitude values:
from scipy.io.wavfile import read

fs, amplitude = read('1.wav')  # fs: sampling rate in Hz, amplitude: array of samples
print(amplitude)
Now I am trying to read the file in such a way that I can process the audio every second. As of now, it reads the whole audio file and then shows it, but I want to read, let's say, the first 10 seconds (or 1, 2, 3 seconds) and then print their amplitude, just like reading frames from a camera with OpenCV.
Is there any library available to achieve this?
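To make the goal concrete, something like this reading pattern is what I'm after (a rough sketch with the standard-library wave module; the 16-bit PCM assumption and the file name are placeholders):

import wave
import numpy as np

with wave.open('1.wav', 'rb') as w:
    fs = w.getframerate()
    while True:
        frames = w.readframes(fs)  # one second of audio per call
        if not frames:
            break
        amplitude = np.frombuffer(frames, dtype=np.int16)  # assumes 16-bit PCM
        print(amplitude)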
Related
I have some long audio files. I want to split each of them into multiple short audio files using Python. For example: the audio is more than 1 hour long, and I want to split it into multiple 5 s files and extract features from each 5 s chunk of the whole file.
There are two issues in your question:
- Splitting the audio
- Extracting features
Both of them hinge on the same underlying key piece of information: the sampling frequency.
The duration of an audio signal, in seconds, and the sampling frequency used for the audio file define the number of samples that the file has. An audio sample is (in simplified terms) one value of the audio signal on your hard disk or in your computer's memory.
The number of audio samples in a typical wav file is calculated with the formula sr * dur, where sr is the sampling frequency in Hz (e.g. 44100 for a CD-quality signal) and dur is the duration of the audio file in seconds. For example, a CD-quality audio file of 2 seconds has 44100 * 2 = 88200 samples.
So:
To split an audio file in Python, you first have to read it into a variable. There are plenty of libraries and functions out there, for example (in random order):
scipy.io.wavfile.read
wave module
and others. You can check this SO post for more info on reading a wav file.
Then, you just have to get N samples, e.g. my_audio_1 = whole_audio_file[0:5*sr].
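For instance, a minimal sketch of that slicing, assuming the audio was read with scipy.io.wavfile.read (the file name long_audio.wav is a placeholder):

from scipy.io.wavfile import read

sr, whole_audio_file = read('long_audio.wav')  # placeholder file name

chunk_len = 5 * sr  # number of samples in a 5-second chunk
chunks = [whole_audio_file[i:i + chunk_len]
          for i in range(0, len(whole_audio_file), chunk_len)]
print(len(chunks), 'chunks of up to', chunk_len, 'samples each')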
BUT!!!
If you just want to extract features for every X seconds, then there is no need to split the audio manually. Most audio feature extraction libraries do that for you.
For example, in librosa you can control the number of FFT points, which roughly corresponds to the length of audio you want to extract features from. You can check, for example, here: https://librosa.org/doc/latest/feature.html
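As a rough illustration (the n_fft and hop_length values below are assumptions, not recommendations), a frame-based extractor such as librosa's MFCCs can be asked to produce one feature vector per 5-second window:

import librosa

y, sr = librosa.load('long_audio.wav', sr=None)  # placeholder file name

frame = 5 * sr  # 5 seconds' worth of samples per analysis frame
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_fft=frame, hop_length=frame)
print(mfccs.shape)  # one column of MFCCs per 5-second hop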
I am trying to sample (convert analog to digital) mp3 files via the following Python code using the librosa library, but it takes too much time (around 4 seconds per file). I suspect this is because librosa doesn't support mp3 and hence uses the slower audioread to sample it.
Code:
import time
import librosa
s = time.time()
for i in mp3_list[:10]:  # list of mp3 file paths, doing it for 10 files
    y, sr = librosa.load(i)
print('time taken =', time.time() - s)
time taken = 36.55561399459839
I also get this warning:
UserWarning: "PySoundFile failed. Trying audioread instead."
Obviously, this is too much time for any practical application. I want to know if there are better alternatives to this?
For comparison, it took only around 1.2 seconds in total to sample wav conversions of the same 10 files.
So the warning kind of hints at it. The librosa developers addressed a similar question in this GitHub issue:
This warning will always occur when loading mp3 because libsndfile
does not (yet/currently) support the mp3 format. Librosa tries to use
libsndfile first, and if that fails, it will fall back on the
audioread package, which is a bit slower and more brittle, but
supports more formats.
This is confirmed in the librosa code: try ... except RuntimeError ...
So what you can do in this case is either implement your own load() that uses audioread directly, avoiding the time wasted in the first block of librosa.load(), or use a different library such as pydub. Alternatively, you can use ffmpeg to convert your mp3s to wav before loading them.
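For the first option, a minimal sketch of a do-it-yourself loader that goes straight to audioread (assuming 16-bit PCM buffers, which is what audioread yields; the resampling that librosa.load does by default is left out for brevity):

import audioread
import numpy as np

def load_audio(path):
    with audioread.audio_open(path) as f:
        sr = f.samplerate
        n_channels = f.channels
        data = b''.join(buf for buf in f)  # each buf is interleaved 16-bit PCM bytes
    y = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
    if n_channels > 1:
        y = y.reshape(-1, n_channels).mean(axis=1)  # naive downmix to mono
    return y, sr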
I'm new to audio processing and to dealing with data that's streamed in real time. What I want to do is:
listen to a built-in microphone
chunk together samples into 0.1-second chunks
convert the chunk into a periodogram via the short-time Fourier transform (STFT)
apply some simple functions
convert back to time series data via the inverse STFT (ISTFT)
play back the new audio on headphones
I've been looking around for "real time spectrograms" to give me a guide on how to work with the data, but no dice. I have, however, discovered some interesting packages, including PortAudio.jl, DSP.jl and MusicProcessing.jl.
It feels like I'd need to use multiprocessing techniques just to store the incoming data into suitable chunks, while simultaneously applying some function to a previous chunk and also playing back another previously processed chunk. All of this feels overcomplicated, and it has been putting me off approaching this project for a while now.
Any help will be greatly appreciated, thanks.
As always, start with a simple version of what you really need ... ignore pulling in audio from a microphone for now; instead, write some code to synthesize a sine curve of a known frequency and use that as your input audio, or read in audio from a wav file - the benefit here is that it's known and reproducible, unlike microphone audio
this post shows how to use some of the libs you mention http://www.seaandsailor.com/audiosp_julia.html
You speak of a "real time spectrogram" ... this is simply repeatedly processing a window of audio, so let's initially simplify that as well ... once you are able to read in the wav audio file, send it into an FFT call, which will return that audio curve in its frequency-domain representation ... as you correctly state, this frequency-domain data can then be sent into an inverse FFT call to give you back the original time-domain audio curve
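A minimal sketch of that round trip, shown in Python/NumPy for concreteness (the same fft/ifft calls exist in Julia via FFTW.jl):

import numpy as np

sr = 44100                           # sample rate in Hz
t = np.arange(sr) / sr               # one second of timestamps
tone = np.sin(2 * np.pi * 440 * t)   # synthesized 440 Hz sine, the known input

spectrum = np.fft.rfft(tone)         # time domain -> frequency domain
# ... apply some simple functions to `spectrum` here ...
roundtrip = np.fft.irfft(spectrum, n=len(tone))  # frequency -> time domain
print(np.allclose(tone, roundtrip))  # True: the round trip recovers the curve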
After you get the above working, wrap it in a call which supplies a sliding window of audio samples, to give you the "real time" benefit of being able to parse incoming audio from your microphone ... keep in mind you typically use a power-of-2 number of audio samples in the window you feed into your FFT and IFFT calls ... let's say your window is 16384 samples ... your Julia server will need to juggle multiple demands: (1) pluck the next buffer of samples from your microphone feed, (2) send a window of samples into your FFT and IFFT calls ... be aware the number of audio samples in your sliding window will typically be wider than your incoming microphone buffer - hence the notion of a sliding window ... over time, add each mic buffer to the front of this window and remove the same number of samples from its tail end
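And a sketch of that sliding-window bookkeeping (again in Python for illustration; the 16384-sample window is from above, the microphone buffer size is a made-up value):

import numpy as np

WINDOW = 16384   # power-of-2 FFT window size, as assumed above
MIC_BUF = 1024   # hypothetical size of each incoming microphone buffer

window = np.zeros(WINDOW)  # the sliding window the FFT sees

def on_mic_buffer(samples):
    # Slide the window: drop the oldest samples, append the newest buffer,
    # then run the refreshed window through the FFT/IFFT processing step.
    global window
    window = np.concatenate([window[len(samples):], samples])
    spectrum = np.fft.rfft(window)
    # ... apply some simple functions to `spectrum` here ...
    return np.fft.irfft(spectrum, n=WINDOW)

# Simulated microphone feed: noise buffers stand in for a real device.
for _ in range(5):
    processed = on_mic_buffer(np.random.randn(MIC_BUF))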
I'm searching for a method that can help me create an audio file (of known length) composed by mixing some audio chunks at specific time positions.
For example:
I want to create song.wav with duration of 5 minutes, composed by:
- chunk1.wav (time 0:02)
- chunk3.wav (time 0:50)
- chunk2.wav (time 1:20)
I think I can create 5 minutes of silence and then mix the wave files over it, but I don't know how.
I tried SoX but haven't found a solution yet.
I'd also welcome an FFMPEG-based solution.
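For reference, the silence-then-overlay idea could look like this with pydub (pydub is an assumption here, since the question asks about SoX/FFMPEG; file names and offsets are the ones from the example above):

from pydub import AudioSegment  # pydub needs ffmpeg installed to decode/encode

song = AudioSegment.silent(duration=5 * 60 * 1000)  # 5 minutes of silence, in ms

# Overlay each chunk at its time position: 0:02, 0:50 and 1:20.
for name, position_ms in [('chunk1.wav', 2000),
                          ('chunk3.wav', 50000),
                          ('chunk2.wav', 80000)]:
    song = song.overlay(AudioSegment.from_wav(name), position=position_ms)

song.export('song.wav', format='wav')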
I want to retrieve the bit rate of an audio file, for the purpose of splitting the raw byte content of the audio file at a particular second of its complete length. Can anyone suggest a way to get the bit rate of an audio file on BlackBerry or J2ME?