How to iterate through audio files when converting into mfccs - python-3.x

I am a beginner. I am converting audio files into MFCCs; I have done it for one file but don't know how to iterate over the whole dataset. I have multiple folders in a Training folder, one of which is 001(0), from which one wav file is converted. I want to convert the wav files in all folders present in the Training folder.
import os
import numpy as np
import matplotlib.pyplot as plt
from glob import glob
import scipy.io.wavfile as wav
from python_speech_features import mfcc, logfbank
# Read the input audio file
(rate,sig) = wav.read('Downloads/DataVoices/Training/001(0)/001000.wav')
# Take the first 10,000 samples for analysis
sig = sig[:10000]
features_mfcc = mfcc(sig,rate)
# Print the parameters for MFCC
print('\nMFCC:\nNumber of windows =', features_mfcc.shape[0])
print('Length of each feature =', features_mfcc.shape[1])
# Plot the features
features_mfcc = features_mfcc.T
plt.matshow(features_mfcc)
plt.title('MFCC')
# Extract the Filter Bank features
features_fb = logfbank(sig, rate)
# Print the parameters for Filter Bank
print('\nFilter bank:\nNumber of windows =', features_fb.shape[0])
print('Length of each feature =', features_fb.shape[1])
# Plot the features
features_fb = features_fb.T
plt.matshow(features_fb)
plt.title('Filter bank')
plt.show()

You can use glob recursively with wildcards to find all of the wav files:
import glob
for f in glob.glob(r'Downloads/DataVoices/Training/**/*.wav', recursive=True):
    (rate, sig) = wav.read(f)
    # Rest of your code
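A minimal sketch putting this together, assuming the Training layout from the question (the paths are from the question; the feature and label lists are illustrative):
import os
from glob import glob
import scipy.io.wavfile as wav
from python_speech_features import mfcc

features = []
labels = []
for path in glob(r'Downloads/DataVoices/Training/**/*.wav', recursive=True):
    rate, sig = wav.read(path)
    sig = sig[:10000]  # first 10,000 samples, as in the question
    features.append(mfcc(sig, rate))  # one (num_windows, 13) array per file
    labels.append(os.path.basename(os.path.dirname(path)))  # folder name, e.g. '001(0)', as the label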

Related

OSError: Unable to open file (file signature not found)

I am currently doing an assignment on deep learning, having downloaded the assignment files from GitHub.
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
%matplotlib inline
You are given a dataset ("data.h5") containing:
- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3), where 3 is for the 3 channels (RGB); thus each image is square (height = num_px and width = num_px)
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
I ran the setup.sh file too but the error doesn't seem to go away.
lr_utils.py file:
import numpy as np
import h5py

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
Kindly help!
I solved the issue by downloading uncorrupted .h5 files and putting them in the folder datasets/ in the same directory.
The files you downloaded are corrupted. You can visit https://github.com/abdur75648/Deep-Learning-Specialization-Coursera to download the uncorrupted files.
You can also download uncorrupted files from https://www.kaggle.com/datasets/muhammeddalkran/catvnoncat and replace the corrupted files in their directory.
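If you want to check whether a downloaded file is a valid HDF5 file before calling load_dataset(), h5py ships a signature test; here is a small sketch assuming the dataset paths from lr_utils.py:
import h5py

for path in ('datasets/train_catvnoncat.h5', 'datasets/test_catvnoncat.h5'):
    # is_hdf5() verifies the HDF5 file signature that the OSError complains about
    print(path, 'valid HDF5:', h5py.is_hdf5(path))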

Feature Extraction using MFCC

I want to know how to extract features from an audio signal (x.wav) using MFCC. I know the steps of audio feature extraction using MFCC; I want to know how to code it properly in Python, using the Django framework.
This is the most important step in building a speech recognizer: after converting the speech signal into the frequency domain, we must convert it into a usable feature vector.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from python_speech_features import mfcc, logfbank
frequency_sampling, audio_signal = wavfile.read("/home/user/Downloads/OSR_us_000_0010_8k.wav")
audio_signal = audio_signal[:15000]
features_mfcc = mfcc(audio_signal, frequency_sampling)
print('\nMFCC:\nNumber of windows =', features_mfcc.shape[0])
print('Length of each feature =', features_mfcc.shape[1])
features_mfcc = features_mfcc.T
plt.matshow(features_mfcc)
plt.title('MFCC')
filterbank_features = logfbank(audio_signal, frequency_sampling)
print('\nFilter bank:\nNumber of windows =', filterbank_features.shape[0])
print('Length of each feature =', filterbank_features.shape[1])
filterbank_features = filterbank_features.T
plt.matshow(filterbank_features)
plt.title('Filter bank')
plt.show()
Or you may use this code to extract the features:
import numpy as np
from sklearn import preprocessing
import python_speech_features as mfcc

def extract_features(audio, rate):
    """Extract 20-dim MFCC features from an audio signal, perform CMS, and
    combine deltas to make a 40-dim feature vector."""
    mfcc_feature = mfcc.mfcc(audio, rate, 0.025, 0.01, 20, nfft=1200, appendEnergy=True)
    mfcc_feature = preprocessing.scale(mfcc_feature)  # per-coefficient mean/variance normalization (CMS)
    delta = calculate_delta(mfcc_feature)  # see the sketch below for a possible implementation
    combined = np.hstack((mfcc_feature, delta))  # 20 MFCC + 20 delta = 40 dims
    return combined
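Note that calculate_delta() is not defined in this snippet. A minimal stand-in, assuming the standard delta computation that python_speech_features itself provides, could look like this:
from python_speech_features import delta

def calculate_delta(mfcc_feature, N=2):
    # First-order delta features computed over N preceding/following frames;
    # the output has the same shape as the input, so hstack yields 40 dims.
    return delta(mfcc_feature, N)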
You can use the following code to extract MFCC features from an audio file using the librosa package (it is easy to install and use):
import librosa
import librosa.display
audio_path = 'my_audio_file.wav'
x, sr = librosa.load(audio_path)
mfccs = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40)  # recent librosa versions require the keyword argument y=
print(mfccs.shape)
You can also display the MFCCs using the following code:
import matplotlib.pyplot as plt
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.show()

Getting error in path line, guide please

I am training my system for texture analysis using local binary patterns. Here I am training on images, with code taken from somewhere. I am getting an error when defining the path of the images.
# OpenCV bindings
import cv2
# To perform path manipulations
import os
# Local Binary Pattern function
from skimage.feature import local_binary_pattern
# To calculate a normalized histogram
from scipy.stats import itemfreq  # note: itemfreq is removed in recent SciPy; np.unique(..., return_counts=True) replaces it
from sklearn.preprocessing import normalize
# Utility package -- use pip install cvutils to install
import cvutils
# To read class labels from file
import csv

# Store the paths of the training images in train_images.
# cvutils.imlist() expects a single directory path, not a comma-separated string
# of file names; a raw string keeps the Windows backslashes from being read as escapes.
train_images = cvutils.imlist(r"C:\Users\Babar\MATLAB\isp\training images")
# Dictionary containing image names as keys and the corresponding label as value
train_dic = {'fire-image1': 0, 'fire-image2': 0, 'fire-image3': 0}
# Open in text mode ('r'), not binary ('rb'), for csv.reader under Python 3
with open(r'C:\Users\Babar\MATLAB\isp\class_train.txt', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        train_dic[row[0]] = int(row[1])
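Once the paths load correctly, the LBP step itself might look like this sketch for a single image (the image path is from the question; the radius, number of points, and histogram binning are illustrative choices):
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

im = cv2.imread(r'C:\Users\Babar\MATLAB\isp\training images\fire-image1.jpg')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)  # LBP operates on a single channel
radius = 3
n_points = 8 * radius
lbp = local_binary_pattern(gray, n_points, radius, method='uniform')
# Uniform LBP yields values in 0..n_points+1, hence n_points+2 histogram bins
hist, _ = np.histogram(lbp.ravel(), bins=np.arange(0, n_points + 3), density=True)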

Trying to generate a Keras model with my own data instead of cifar10

I have followed this example:
https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/
and had an issue with the following line (line #51):
((trainX, trainY), (testX, testY)) = cifar10.load_data()
as I would like to train it on my own data.
Is there any simple way to generate this kind of output without digging deep into cifar10's implementation?
I am pretty sure this is something people have already done, but I cannot find a sample/tutorial/example.
Thanks.
Assume you have your images in .jpg format and your labels in a csv file called label.csv, separated into two folders, a train folder and a test folder.
Then you can do the following to get x_train:
import cv2  # library for reading images
import numpy as np
import glob  # library for listing files in a folder

x_train = []
for file in glob.glob("train/*.jpg"):
    im = cv2.imread(file)  # read each image from the folder
    x_train.append(im)
x_train = np.array(x_train)
And you can do the following to get y_train:
import csv
y_train = []
with open('train/label.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        y_train.append([int(row[0])])  # convert the string to int (otherwise the csv data is read as strings)
y_train = np.array(y_train)
You can do the same for your test folder; just change the folder and variable names.
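If you want a drop-in replacement for the tutorial's cifar10.load_data() call, you can wrap the above in a function. This is a sketch under the folder layout assumed above; note that cifar10 images are 32x32, so your images may need resizing to fit the rest of the tutorial:
import csv
import cv2
import glob
import numpy as np

def load_data(image_size=(32, 32)):
    # Mimics the ((trainX, trainY), (testX, testY)) shape of cifar10.load_data()
    def load_split(folder):
        x = [cv2.resize(cv2.imread(f), image_size)  # resize to cifar10's 32x32
             for f in sorted(glob.glob(folder + '/*.jpg'))]
        with open(folder + '/label.csv', 'r') as csvfile:
            y = [[int(row[0])] for row in csv.reader(csvfile)]
        return np.array(x), np.array(y)
    return load_split('train'), load_split('test')

((trainX, trainY), (testX, testY)) = load_data()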

Why is the plot in librosa different?

I am currently trying to use librosa to perform an STFT such that the parameters resemble the STFT process from a different framework (Kaldi).
The audio file is fash-b-an251.
Kaldi does it using a sampling frequency of 16 kHz, window_size = 400 (25 ms), and hop_length = 160 (10 ms).
The spectrogram extracted from this looks like this:
I then tried to do the same using librosa:
import numpy as np
import sys
import os
import scipy.signal
import librosa
import librosa.display
import matplotlib.pyplot as plt
from matplotlib import cm

# Input parameter: relative path to the audio file
if len(sys.argv) < 2:  # argv[0] is the script name, so check < 2, not < 1
    print("Missing Arguments!")
    print("python spectogram_librosa.py path_to_audio_file")
    sys.exit()
path = sys.argv[1]
abs_path = os.path.abspath(path)
spectogram_dnn = "/home/user/dnn/spectogram"
if not os.path.exists(spectogram_dnn):
    print("spectogram_dnn folder didn't exist!")
    os.makedirs(spectogram_dnn)
    print("Created!")
y, sr = librosa.load(abs_path, sr=16000)
# librosa.logamplitude() was removed in later librosa releases; amplitude_to_db() replaces it
D = librosa.amplitude_to_db(np.abs(librosa.stft(y, win_length=400, hop_length=160,
                                                window=scipy.signal.windows.hann, center=False)),
                            ref=np.max)
librosa.display.specshow(D, sr=16000, hop_length=160, x_axis='time', y_axis='log', cmap=cm.jet)
plt.colorbar(format='%+2.0f dB')
plt.title('Log power spectrogram')
plt.show()
input()  # raw_input() in the original Python 2 code
sys.exit()
This is basically taken from here:
I've modified the STFT call so that it fits my parameters.
The problem is that it creates an entirely different plot.
So, what am I doing wrong in librosa? Why is this plot so different from the one created in Kaldi?
Am I missing something?
It has to do with the Hz scale. The one in the first image is linear, while the one in the second image is logarithmic. You can fix it by changing the scale in either of the images to match the other.
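A quick way to see the difference is to render the same STFT on both axis scales. This sketch assumes the 16 kHz parameters from the question and a hypothetical file name for the fash-b-an251 recording:
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load('fash-b-an251.wav', sr=16000)  # file name assumed
D = librosa.amplitude_to_db(np.abs(librosa.stft(y, win_length=400, hop_length=160)), ref=np.max)
fig, (ax1, ax2) = plt.subplots(2, 1)
librosa.display.specshow(D, sr=sr, hop_length=160, x_axis='time', y_axis='linear', ax=ax1)  # linear Hz axis, as in Kaldi's plot
librosa.display.specshow(D, sr=sr, hop_length=160, x_axis='time', y_axis='log', ax=ax2)  # log-frequency axis
plt.show()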
