numpy data sounds different than original sound_file.wav file - python-3.x

import wave
import numpy as np
from IPython.display import display, Audio
#sound 1
with wave.open('sound_file.wav', 'rb') as wf:
    signal = np.frombuffer(wf.readframes(nframes = wf.getnframes()), 'int' + str(int(16 * wf.getsampwidth())))
    display(Audio(data = signal, rate = wf.getframerate()))
#sound 2
display(Audio('sound_file.wav'))
Here sound 1 sounds different from sound 2; can anyone tell me what is happening there?
Also, please describe some usual sound-preprocessing practices that should be applied after getting an np array from a sound file.
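A likely cause, for the record: wave's getsampwidth() returns the sample width in bytes, so 'int' + str(16 * wf.getsampwidth()) yields 'int32' for 16-bit audio and misinterprets the raw bytes. A minimal corrected sketch (assuming a mono PCM file):
import wave
import numpy as np
from IPython.display import display, Audio

with wave.open('sound_file.wav', 'rb') as wf:
    # sample width is in bytes, so bits per sample = 8 * getsampwidth()
    dtype = 'int' + str(8 * wf.getsampwidth())
    signal = np.frombuffer(wf.readframes(wf.getnframes()), dtype=dtype)
    # note: for stereo files the channels are interleaved and would need reshaping
    display(Audio(data=signal, rate=wf.getframerate()))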

Related

Getting soundfile.LibsndfileError: Error opening 'speech.wav': Format not recognized when giving 2D numpy array to soundfile

I tried generating audio from tensors produced by an NVIDIA NeMo TTS model before running into the error.
Here is the code for it:
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel
spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")
text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
audio = audio.to('cpu').detach().numpy()
sf.write("speech.wav", audio, 22050)
Expected to get an audio file speech.wav
Looking at your example I see that your audio shape is (1, 173056).
Based on https://github.com/bastibe/python-soundfile/issues/309, I converted the audio to a 1D array of size 173056 and it worked fine.
Used code:
>>> import numpy as np
>>> sample_rate = 22050
>>> sf.write("speech.wav", np.ravel(audio), sample_rate)
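An equivalent alternative (assuming the (1, 173056) shape above): soundfile expects (frames,) or (frames, channels), so dropping the leading axis also works:
>>> sf.write("speech.wav", audio.squeeze(0), sample_rate)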

How to convert audio file to picture for every 0.1 seconds in python

I have an audio file of an engine. I would like to convert it into a spectrogram.
import matplotlib.pyplot as plot
from scipy import signal
from scipy.io import wavfile
samplingFrequency, signalData = wavfile.read('datamusic/car2.wav')
plot.subplot(111)
plot.specgram(signalData,Fs=samplingFrequency)
plot.savefig("datapicture/car2.png")
This is the code that I have found; however, it converts the whole file into a single spectrogram rather than one picture per 0.1 seconds.
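One possible approach (a sketch, not from the original post; the output file names are made up): slice the signal into 0.1-second chunks and save one spectrogram per chunk:
import matplotlib.pyplot as plot
from scipy.io import wavfile

samplingFrequency, signalData = wavfile.read('datamusic/car2.wav')
# if the file is stereo, keep a single channel: signalData = signalData[:, 0]
chunk = int(0.1 * samplingFrequency)  # samples per 0.1 s
for i, start in enumerate(range(0, len(signalData) - chunk + 1, chunk)):
    plot.figure()
    plot.specgram(signalData[start:start + chunk], Fs=samplingFrequency)
    plot.savefig("datapicture/car2_%04d.png" % i)  # hypothetical output names
    plot.close()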

Python 3 neopixel non-functional

import neopixel
import board
import sounddevice as sd
from numpy import linalg as LA
import numpy as np
def getvolume(indata, outdata, frames, time, status):
    volume_norm = np.linalg.norm(indata)*10

with sd.Stream(callback=getvolume):
    sd.sleep(250)

pixels = neopixel.NeoPixel(board.D18, 144)
pixels.fill((volume_norm, volume_norm, volume_norm))
I want the code to make the LED strip shine brighter the louder the sound is. It currently doesn't do anything. What am I doing wrong?
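A possible fix (a sketch; the hardware setup is assumed to work): volume_norm only exists inside the callback, so the main script never sees it, and the stream closes after 250 ms anyway. Updating the pixels from within the callback, with the value clamped to the 0-255 range, could look like this:
import neopixel
import board
import sounddevice as sd
import numpy as np

pixels = neopixel.NeoPixel(board.D18, 144)

def getvolume(indata, outdata, frames, time, status):
    # map the input level to 0-255 and light the whole strip
    level = int(min(np.linalg.norm(indata) * 10, 255))
    pixels.fill((level, level, level))

with sd.Stream(callback=getvolume):
    sd.sleep(10000)  # keep the stream (and the LEDs) running for 10 s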

Feature Extraction using MFCC

I want to know how to extract features from an audio signal (x.wav) using MFCC. I know the steps of audio feature extraction using MFCC; I want to see working Python code for it, to be used within a Django project.
This is one of the most important steps in building a speech recognizer: after converting the speech signal into the frequency domain, we must convert it into a usable feature vector.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from python_speech_features import mfcc, logfbank
frequency_sampling, audio_signal = wavfile.read("/home/user/Downloads/OSR_us_000_0010_8k.wav")
audio_signal = audio_signal[:15000]
features_mfcc = mfcc(audio_signal, frequency_sampling)
print('\nMFCC:\nNumber of windows =', features_mfcc.shape[0])
print('Length of each feature =', features_mfcc.shape[1])
features_mfcc = features_mfcc.T
plt.matshow(features_mfcc)
plt.title('MFCC')
filterbank_features = logfbank(audio_signal, frequency_sampling)
print('\nFilter bank:\nNumber of windows =', filterbank_features.shape[0])
print('Length of each feature =', filterbank_features.shape[1])
filterbank_features = filterbank_features.T
plt.matshow(filterbank_features)
plt.title('Filter bank')
plt.show()
Or you may use this code to extract the features:
import numpy as np
from sklearn import preprocessing
import python_speech_features as mfcc
def extract_features(audio, rate):
    """extract 20 dim mfcc features from an audio, performs CMS and combines
    delta to make it 40 dim feature vector"""
    mfcc_feature = mfcc.mfcc(audio, rate, 0.025, 0.01, 20, nfft=1200, appendEnergy=True)
    mfcc_feature = preprocessing.scale(mfcc_feature)
    delta = calculate_delta(mfcc_feature)  # calculate_delta is not defined in this snippet
    combined = np.hstack((mfcc_feature, delta))
    return combined
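Note that calculate_delta is not defined above; a minimal sketch of one common formulation (a simple centered frame-to-frame difference, not necessarily what the original author used):
import numpy as np

def calculate_delta(features):
    # delta coefficients as the centered difference between neighbouring frames
    rows, _ = features.shape
    deltas = np.zeros_like(features)
    for i in range(rows):
        prev_frame = features[max(i - 1, 0)]
        next_frame = features[min(i + 1, rows - 1)]
        deltas[i] = (next_frame - prev_frame) / 2.0
    return deltas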
You can use the following code to extract MFCC features from an audio file with the librosa package (it is easy to install and use):
import librosa
import librosa.display
audio_path = 'my_audio_file.wav'
x, sr = librosa.load(audio_path)
mfccs = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40)
print(mfccs.shape)
You can also display the MFCCs using the following code:
librosa.display.specshow(mfccs, sr=sr, x_axis='time')

Why is the plot in librosa different?

I am currently trying to use librosa to perform an STFT whose parameters match the STFT of a different framework (Kaldi).
The audio file is fash-b-an251
Kaldi does it using a sampling frequency of 16 kHz, window_size = 400 (25 ms), and hop_length = 160 (10 ms).
The spectrogram extracted from Kaldi looks like this: [Kaldi spectrogram screenshot not included]
I then tried to do the same using librosa:
import numpy as np
import sys
import librosa
import librosa.display
import os
import matplotlib.pyplot as plt
from matplotlib import cm

# Input parameter: path to the audio file
if len(sys.argv) < 2:
    print("Missing arguments!")
    print("python spectogram_librosa.py path_to_audio_file")
    sys.exit()

path = sys.argv[1]
abs_path = os.path.abspath(path)
spectogram_dnn = "/home/user/dnn/spectogram"
if not os.path.exists(spectogram_dnn):
    print("spectogram_dnn folder didn't exist!")
    os.makedirs(spectogram_dnn)
    print("Created!")

y, sr = librosa.load(abs_path, sr=16000)
D = librosa.amplitude_to_db(np.abs(librosa.stft(y, win_length=400, hop_length=160, window='hann', center=False)), ref=np.max)
librosa.display.specshow(D, sr=16000, hop_length=160, x_axis='time', y_axis='log', cmap=cm.jet)
plt.colorbar(format='%+2.0f dB')
plt.title('Log power spectrogram')
plt.show()
This is basically taken from an online example, in which I modified the STFT call so that it fits my parameters.
The problem is that it creates an entirely different plot.
So what am I doing wrong in librosa? Why is this plot so different from the one created in Kaldi? Am I missing something?
It has to do with the Hz scale: the first image uses a linear frequency axis while the second uses a logarithmic one. You can fix it by changing the scale of either image to match the other.
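For example (a sketch based on the code above): passing y_axis='linear' to specshow gives the librosa plot a linear frequency axis like Kaldi's:
# plot the same STFT matrix D with a linear frequency axis
librosa.display.specshow(D, sr=16000, hop_length=160, x_axis='time', y_axis='linear', cmap=cm.jet)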
