I am trying to convert a numpy array of an audio file sampled at 44100 Hz into an AudioFileClip in MoviePy so I can overdub a VideoFileClip. The online documentation is unclear on this topic.
Any advice?
Thanks.
The relevant class is AudioArrayClip in AudioClip.py.
Here are a couple of examples of how to generate 2 seconds of mono and stereo random noise:
import numpy as np
from moviepy.audio.AudioClip import AudioArrayClip
rate = 44100 # Sampling rate in samples per second.
duration = 2 # Duration in seconds
data_mono = np.random.uniform(-1, 1, (int(duration*rate/2), 1))  # note the halved sample count; see the edit below
data_stereo = np.random.uniform(-1, 1, (rate*duration, 2))
audio_mono = AudioArrayClip(data_mono, fps=rate)
audio_stereo = AudioArrayClip(data_stereo, fps=rate)
audio_mono.write_audiofile('mono.mp3')
audio_stereo.write_audiofile('stereo.mp3')
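To actually overdub a video with one of these clips, the audio can be attached via set_audio; a minimal sketch, assuming a hypothetical input.mp4 sits next to the script:
from moviepy.editor import VideoFileClip
video = VideoFileClip("input.mp4")  # hypothetical input video
video_with_overdub = video.set_audio(audio_stereo)  # replace the original soundtrack
video_with_overdub.write_videofile("output.mp4")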
Edit: updated with a workaround (the halved sample count for the mono data above) to get the correct duration of the mono file (Python 3.7.5, MoviePy 1.0.0).
I know I can do this with scipy or numpy, but I want to do it with just built-in modules in this case. So far, I have come up with this code to generate samples of a sawtooth wave of a specific frequency, at a specific sampling rate (and plot it):
import math
import matplotlib.pyplot as plt
def sawtooth_sample(amplitude, freq, samplerate, i):
    value = math.atan(math.tan(2.0 * math.pi * float(freq) * (float(i) / float(samplerate))))
    return amplitude * value

def plot_samples(num_samples, frequency):
    # Generate the samples
    samples = []
    for i in range(num_samples):
        samples.append(sawtooth_sample(1.0, frequency, 44100, i))
    # Plot the samples
    plt.plot(samples)
    plt.show()
So, then I do this to test it:
# Generate 1000 samples of a 100Hz sawtooth wave, sampled at 44.1KHz
plot_samples(1000, 100)
And, I get this:
[plot of the 1000 samples, showing a clean sawtooth shape]
Great. Looks very sawtooth-y.
But, when I try with a higher frequency, like this:
# Generate 20 samples of a 10KHz sawtooth wave, sampled at 44.1KHz
plot_samples(20, 10000)
Then I get this:
[plot of the 20 samples, looking more like a triangle wave]
Not very sawtooth-y any more. Looks more like a triangle wave, with a low harmonic that is sawtooth-shaped. What am I doing wrong and/or missing here?
I am currently processing some audio data. I have an audio file that I created by splitting a larger file on silence using pydub.
However, if I take this audio file after exporting it with pydub, convert the AudioSegment's array to a numpy array, and re-write it using soundfile, the file that gets written plays at about half its original speed. What could be going wrong?
import soundfile as sf
import numpy as np
from pydub import AudioSegment, effects
from pydub.silence import split_on_silence
from pathlib import Path

desired_sample_rate = 16000  # target sample rate in Hz (implied by the conversions below)

# This code takes a large .mp3 file ("original_audio_mp3") with a sample rate of 44100 Hz
sound = AudioSegment.from_file(original_audio_mp3)
if sound.frame_rate != desired_sample_rate:
    sound = sound.set_frame_rate(desired_sample_rate)  # convert to a 16000 Hz sample rate
sound = effects.normalize(sound)  # normalize the audio file
dBFS = sound.dBFS  # get decibels relative to full scale
sound_chunks = split_on_silence(sound,
                                min_silence_len=200,  # measured in ms
                                silence_thresh=dBFS - 30  # 30 dB below the file's dBFS counts as "silence"
                                )
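# (The export of the chunks isn't shown in the question; presumably a
# hypothetical loop like this produced the "audio_segment_0.wav" used below.)
for i, chunk in enumerate(sound_chunks):
    chunk.export("audio_segment_{}.wav".format(i), format="wav")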
# this "audio_segment_0.wav" file came from the above code.
audio_file_path = Path("audio_segment_0.wav")
raw_audio = AudioSegment.from_file(audio_file_path).set_frame_rate(16000)
# append 200 ms of silence to beginning and end of file
raw_audio = effects.normalize(raw_audio)
silence = AudioSegment.silent(duration = 200, frame_rate = 16000)
raw_audio_w_silence = silence + raw_audio + silence
# export it
raw_audio_w_silence.export("pydub_audio.wav", format = 'wav') # the output from this sounds completely OK.
# read audio, manipulate and write with soundfile
new_audio = AudioSegment.from_file("pydub_audio.wav").set_frame_rate(16000)
new_audio_signal = np.array(new_audio.get_array_of_samples(), dtype = np.float32) / 32768.0 # scale to between [-1.0, 1.0]
# the output from down here using the scaled numpy array sounds about half the speed as the first.
sf.write("soundfile_export.wav", data = new_audio_signal, samplerate = new_audio.frame_rate, format = 'wav')
I am trying to separate voice from background noise in an audio file using Python and then extract MFCC features, but I get this error:
"librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(1025, 5341)"
Here's the code:
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import scipy
import soundfile as sf
from sklearn.preprocessing import normalize
from scipy.io.wavfile import read, write
from scipy.fftpack import rfft, irfft
y, sr = librosa.load('/home/osboxes/Desktop/AccentReco1/audio-files/egyptiansong.mp3', duration=124)
y = rfft(y)
# And compute the spectrogram magnitude and phase
S_full, phase = librosa.magphase(librosa.stft(y))
# We'll compare frames using cosine similarity, and aggregate similar frames
# by taking their (per-frequency) median value.
#
# To avoid being biased by local continuity, we constrain similar frames to be
# separated by at least 2 seconds.
#
# This suppresses sparse/non-repetitive deviations from the average spectrum,
# and works well to discard vocal elements.
S_filter = librosa.decompose.nn_filter(S_full,
                                       aggregate=np.median,
                                       metric='cosine',
                                       width=int(librosa.time_to_frames(2, sr=sr)))
# The output of the filter shouldn't be greater than the input
# if we assume signals are additive. Taking the pointwise minimum
# with the input spectrum forces this.
S_filter = np.minimum(S_full, S_filter)
# We can also use a margin to reduce bleed between the vocals and instrumentation masks.
# Note: the margins need not be equal for foreground and background separation.
margin_i, margin_v = 2, 10
power = 2
mask_i = librosa.util.softmask(S_filter,
                               margin_i * (S_full - S_filter),
                               power=power)
mask_v = librosa.util.softmask(S_full - S_filter,
                               margin_v * S_filter,
                               power=power)
# Once we have the masks, simply multiply them with the input spectrum
# to separate the components
S_foreground = mask_v * S_full
S_background = mask_i * S_full
# extract MFCC features from the data
mfccs = np.mean(librosa.feature.mfcc(y=S_foreground, sr=sr, n_mfcc=40).T, axis=0)
print(mfccs)
Any idea?
You are trying to get the MFCCs for a spectrogram.
You have to convert it back to an audio signal first, using the inverse STFT:
from librosa.core import istft
vocals = istft(S_foreground)
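After that, the MFCC line from the question works on the reconstructed waveform; a minimal sketch (assuming sr is the sample rate returned by librosa.load above):
import numpy as np
import librosa

# compute 40 MFCCs on the time-domain vocals signal, then average over frames
mfccs = np.mean(librosa.feature.mfcc(y=vocals, sr=sr, n_mfcc=40).T, axis=0)
print(mfccs.shape)  # (40,)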
In the following code I created a buffer which holds 10 frames of an audio file in each loop iteration.
import collections
import librosa
import wave

my_buffer = collections.deque(maxlen=10)
f = wave.open('Desktop/0963.wav', "rb")
num_frames = f.getnframes()
for frame in range(num_frames):
    my_buffer.append(f.readframes(frame))
Out of the buffer, I need to get a numpy array representing the audio amplitude of each sample point with librosa. Any idea?
If you use scipy.io.wavfile, it will directly read a wave file and load the data into a numpy array, which you can then slice as per your requirements.
scipy.io.wavfile.read reads a WAV file and returns the sample rate (in samples/sec) and the data from the WAV file:
>>> type(f)
<type 'tuple'>
>>> f
(44100, array([-36, 57, 156, ..., 66, 64, 77], dtype=int16))
>>>
Source Code
from scipy.io.wavfile import read
import numpy as np

f = read('your_audio.wav')
n = np.array(f[1], dtype=float)
for i in range(0, len(n), 10):
    my_buffer = n[i:i+10]
my_buffer contents:
>>>
[ -36. 57. 156. 198. 191. 126. 70. 42. 43. 62.]
[ 69. 71. 83. 117. 159. 177. 151. 89. 14. -27.]
[ -33. -4. 21. 38. 42. 66. 94. 134. 144. 142.]
[ 118. 115. 111. 132. 122. 123. 103. 119. 125. 134.]
.....
.....
Here we have my_buffer with 10 frames per iteration that you can feed into the next processing block.
As mentioned above, scipy.io.wavfile is a good module for reading in and handling audio. If you want to stick with librosa, you can use this to do the same:
import librosa
filepath = 'Desktop/0963.wav'
samplerate = 44100
audio, samplerate = librosa.load(filepath, sr=samplerate)
audio.shape
What I like about librosa.load is that you can specify any sample rate to downsample an audio file to. From there, the same 10-sample slicing shown above works directly on the array, for example:
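for i in range(0, len(audio), 10):
    my_buffer = audio[i:i+10]  # 10 amplitude values per iteration, as floats in [-1.0, 1.0]
Hope this helps.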
I want to resample a mono recording recorded at 40000 Hz to 44100 Hz.
The code below works, but librosa seems to save in stereo, making the file twice the size, which is not needed, and I have a lot of samples to process.
So I need to save the result in mono.
Code:
# resampling .wav files to a specific sample rate
import os
import librosa
import resampy

# this is the sample rate we want
sr_target = 44100

directory_in_str = '/home/hugo/test/'
directory = os.fsencode(directory_in_str)

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".wav"):
        file_path = os.path.join(directory_in_str, filename)
        print(file_path)
        # Load the audio file at its native sampling rate
        x, sr_orig = librosa.load(file_path, mono=True, sr=None)
        print("Original sample rate is : ", sr_orig)
        # x is now a 1-d numpy array, with `sr_orig` audio samples per second
        # We can resample this to any sampling rate we like
        y = resampy.resample(x, sr_orig, sr_target)
        file_path_new = os.path.join(directory_in_str + 'new/', filename)
        # write it back
        librosa.output.write_wav(file_path_new, y, sr_target)
    else:
        continue
Question: I want to save the resampled file in mono, but I get stereo and there is no option to save mono only...
Whether the output is mono or stereo depends on y. If y has the shape (n,), the output is mono; if y has the shape (2, n), the output is stereo. librosa.output.write_wav won't automatically turn a mono signal into stereo.
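For example (a toy illustration of the two shapes):
import numpy as np

y_mono = np.zeros(44100)         # shape (44100,)   -> written as mono
y_stereo = np.zeros((2, 44100))  # shape (2, 44100) -> written as stereo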
From your code, your output should actually be mono: x is loaded with mono=True, so y keeps a 1-D shape. The file being twice the size doesn't mean it's stereo; it is more likely caused by a different data type between the input and output audio (librosa loads samples as float32, 4 bytes each, so writing them back doubles the size of a 16-bit source).
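If the doubled size is the concern, one option (a sketch, assuming the soundfile package, since librosa.output.write_wav has no dtype option) is to write the resampled signal back as 16-bit PCM:
import soundfile as sf

# y is the 1-D (mono) resampled signal from the loop above
sf.write(file_path_new, y, sr_target, subtype='PCM_16')  # 2 bytes per sample instead of float32's 4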