From the documentation (https://pytorch.org/audio/stable/backend.html#torchaudio.backend.sox_io_backend.load), it seems there is no parameter for loading audio at a fixed sampling rate, which is important for training models.
How can I load an audio tensor with a fixed sampling rate using torchaudio?
You can use Resample from torchaudio.transforms.
import torchaudio
from torchaudio import transforms

waveform, sample_rate = torchaudio.load('test.wav', normalize=True)
transform = transforms.Resample(sample_rate, sample_rate / 10)  # e.g. resample to one tenth of the original rate
waveform = transform(waveform)
Alternatively, you can resample with torchaudio.functional.resample:
arr, org_sr = torchaudio.load('path')
new_sr = 16000  # target sampling rate; pick whatever rate your model expects
arr = torchaudio.functional.resample(arr, orig_freq=org_sr, new_freq=new_sr)
I checked PyAudio, but it offers the ability to record the input and manipulate it; I just want to take an action when audio input exists.
You can implement simple input-audio detection with PyAudio. You just need to decide what you mean by audio existence.
In the following example code I have used a simple root-mean-square (RMS) calculation with a threshold. Another option is a peak test, which simply compares the amplitude of each audio sample with a peak amplitude threshold; a minimal sketch of that variant follows the example below. Which approach is most useful for you depends on the application.
You can play around with the threshold value (i.e. the minimum amplitude or loudness of audio) and the chunk size (i.e. the latency of the audio detection) to get the behaviour you want.
import math
import struct

import pyaudio

RATE = 44100
CHUNK = 1024
AUDIO_EXISTENCE_THRESHOLD = 1000

def detect_input_audio(data, threshold):
    """Return True if the RMS amplitude of the samples exceeds the threshold."""
    if not data:
        return False
    rms = math.sqrt(sum(x ** 2 for x in data) / len(data))
    return rms > threshold

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, input=True,
                    rate=RATE, frames_per_buffer=CHUNK)

samples = []
# Keep reading chunks until audio is detected on the input.
while not detect_input_audio(samples, AUDIO_EXISTENCE_THRESHOLD):
    chunk = stream.read(CHUNK)
    # Unpack the raw 16-bit little-endian bytes into integer samples.
    samples = struct.unpack('<{}h'.format(len(chunk) // 2), chunk)

# Do something when input audio exists
# ...

stream.stop_stream()
stream.close()
audio.terminate()
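For comparison, a minimal sketch of the peak-test variant mentioned above could look like this (illustrative only; it assumes the same unpacked integer samples):

def detect_input_audio_peak(data, threshold):
    """Peak test: report audio as soon as any single sample exceeds the threshold."""
    return bool(data) and max(abs(x) for x in data) > threshold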
I'm building a machine learning model using Apache Spark's ML library, let's say a RandomForestClassifier.
I divide the dataset into training and test sets as below:
(tr, test) = dataframe.randomSplit([0.8, 0.2], seed=23)
apply the model
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

rf = RandomForestClassifier(numTrees=10, featuresCol="features",
                            labelCol="label")
model = rf.fit(tr)
prediction = model.transform(test)

evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction")
evaluator.evaluate(prediction)
I'm under the impression that this gives me AUC, which is not accuracy. How do I get the precision, recall, F1, and accuracy for this model?
My class variable is binary (0 or 1).
AUC is the area under the ROC curve. It has nothing to do with accuracy, but in my opinion it is a more useful metric: it gives a much better overview of your model's capability.
All the metrics you need are here:
https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#binary-classification
Note that all the metrics are computed with respect to one label (depending on whether your true positives are the 0s or the 1s). If you have a class imbalance and you compute your metrics for the majority class (let's say the 1s), your results might be misleading. So use the label that is more important for your model to classify correctly.
Please read the documentation carefully before using the metrics, to fully understand what they are all about.
Cheers.
You can use MulticlassMetrics to get precision and recall.
from pyspark.mllib.evaluation import MulticlassMetrics

# MulticlassMetrics expects an RDD of (prediction, label) pairs
predictionAndLabels = prediction.select("prediction", "label") \
                                .rdd.map(lambda row: (float(row[0]), float(row[1])))

# Instantiate the metrics object
multi_metrics = MulticlassMetrics(predictionAndLabels)
precision_score = multi_metrics.weightedPrecision
recall_score = multi_metrics.weightedRecall
Alternatively, you can get the confusion matrix and compute the remaining metrics yourself, as sketched below.
confusion_matrix = multi_metrics.confusionMatrix().toArray()
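For example, treating label 1.0 as the positive class (an assumption; swap the indices if 0 is your positive class), accuracy and F1 can be derived from that matrix like this:

cm = confusion_matrix                  # 2x2 array: rows = true label, columns = predicted label
tn, fp, fn, tp = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)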
I am looking to perform feature extraction for human accelerometer data to use for activity recognition. The sampling rate of my data is 100Hz.
From the various sources I have researched, an FFT is a favourable method to use. I have the data in a sliding-window format; the length of each window is 256 samples. I am using Python with the NumPy library. The code I have used to apply the FFT is:
import numpy as np

def fft_transform(window_data):
    fft_data = []
    fft_freq = []
    power_spec = []
    for window in window_data:
        fft_window = np.fft.fft(window)
        fft_data.append(fft_window)
        # frequency bins for a 100 Hz sampling rate (sample spacing d = 0.01 s)
        freq = np.fft.fftfreq(np.array(window).shape[-1], d=0.01)
        fft_freq.append(freq)
        # power spectrum = squared magnitude of the FFT
        fft_ps = np.abs(fft_window) ** 2
        power_spec.append(fft_ps)
    return fft_data, fft_freq, power_spec
This gives output that looks like this:
fft_data
array([ 2.92394828e+01 +0.00000000e+00j,
-6.00104665e-01 -7.57915977e+00j,
-1.02677676e+01 -1.55806119e+00j,
-7.17273995e-01 -6.64043705e+00j,
3.45758079e+01 +3.60869421e+01j,
etc..
freq_data
array([ 0. , 0.390625, 0.78125 , 1.171875, 1.5625 , etc...
power_spectrum
array([ 8.54947354e+02, 5.78037884e+01, 1.07854606e+02,
4.46098863e+01, 2.49775388e+03, etc...
I have also plotted the results using the code below, where fst_ps is the first array/window of power_spectrum and fst_freq is the first window/array of the fft_freq data.
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 6))  # (width, height) in inches; adjust to taste
fig1 = fig.add_subplot(221)
fig2 = fig.add_subplot(222)
fig1.plot(fst_freq, fst_ps)             # linear power spectrum
fig2.plot(fst_freq, np.log10(fst_ps))   # log-scale power spectrum
plt.show()
I am looking for some advice on what my next step is for extracting features. Thanks
Now that you have decomposed the signal into its spectrum, the next step is to work out which frequencies are relevant for your application. That is quite difficult to do from a single spectrum picture. Remember that one frequency bin of the spectrum is just the original signal confined to a narrow frequency range, and some frequencies may simply not be important for your task.
A better approach is the STFT (short-time Fourier transform), which lets you examine the signal's features in the time-frequency domain; for example, you may read this article about the STFT approach in Python. The method is usually applied to search for time-frequency patterns that can be treated as features. In a human voice recording (as in the article), for instance, you can see sustained, slowly drifting frequencies whose duration and frequency bounds serve as features. Compute the STFT of your signal and look for such patterns in the spectrogram to extract features for your task.
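As a minimal sketch of what that could look like for one of your 256-sample windows (assuming SciPy is available and the 100 Hz sampling rate from the question; nperseg and noverlap are illustrative values to tune):

import numpy as np
from scipy import signal

fs = 100                              # accelerometer sampling rate in Hz
window = np.asarray(window_data[0])   # one 256-sample window, as an example

# STFT over short overlapping segments inside the window gives a
# time-frequency view instead of a single averaged spectrum.
f, t, Zxx = signal.stft(window, fs=fs, nperseg=64, noverlap=32)
power = np.abs(Zxx) ** 2              # shape: (freq_bins, time_frames)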
I am trying to construct a plot spectrum of an audio sample, similar to the one created by Audacity. According to Audacity's wiki page, the plot spectrum (attached example) works as follows:
Plot Spectrum take the audio in blocks of 'Size' samples, does the
FFT, and averages all the blocks together.
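In plain NumPy terms, that averaging scheme amounts to roughly the following sketch (my own illustration, assuming a 1-D mono samples array and non-overlapping blocks of 512):

import numpy as np

def averaged_spectrum(samples, size=512):
    """Split the signal into blocks, window and FFT each block, then average the power."""
    n_blocks = len(samples) // size
    blocks = np.reshape(samples[:n_blocks * size], (n_blocks, size))
    spectra = np.abs(np.fft.rfft(blocks * np.hanning(size), axis=1)) ** 2
    return spectra.mean(axis=0)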
I was thinking I would use the STFT functionality recently provided by Tensorflow.
I am using audio blocks of size 512, and my code is as follows:
audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=4000,
    channel_count=1
)
stft = tf.contrib.signal.stft(
    waveform,
    512,  # frame_length
    512,  # frame_step
    fft_length=512,
    window_fn=functools.partial(tf.contrib.signal.hann_window, periodic=True),  # matches Audacity
    pad_end=True,
    name="STFT"
)
But the result of stft is just an empty array, when I expect the FFT results for each frame (of 512 samples).
What is wrong with the way that I am making this call?
I have verified that waveform audio data is being correctly read with just the regular tf.fft function.
audio_file = tf.placeholder(tf.string)
audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=sample_rate,  # sample rate of the .wav file
    channel_count=1                  # number of audio channels
)
stft = tf.contrib.signal.stft(
    # decode_audio returns shape [samples, channels]; stft expects the samples
    # on the last axis, so transpose to [channels, samples]
    tf.transpose(waveform),
    frame_length,
    frame_step,
    fft_length=fft_length,
    window_fn=functools.partial(tf.contrib.signal.hann_window,
                                periodic=False),  # matches Audacity
    pad_end=False,
    name="STFT"
)
I am building digit recognition classification using SVM. I have 10000 data and I split them to training and test data with a ratio 7:3. I use linear kernel.
The results show that the training accuracy is always 1 regardless of the number of training examples, while the test accuracy is only around 0.9 (I am expecting much better accuracy, at least 0.95). I think this indicates overfitting. However, tuning the parameters, like C and gamma, doesn't change the results very much.
Can anyone help me out with how to deal with overfitting in SVM? Thanks very much in advance for your time and help.
The following is my code:
from sklearn import svm, cross_validation

svc = svm.SVC(kernel='linear', C=10000, gamma=0.0, verbose=True).fit(sample_X, sample_y_1Num)
clf = svc
predict_y_train = clf.predict(sample_X)
predict_y_test = clf.predict(test_X)
accuracy_train = clf.score(sample_X, sample_y_1Num)
accuracy_test = clf.score(test_X, test_y_1Num)

# conduct cross-validation
cv = cross_validation.ShuffleSplit(sample_y_1Num.size, n_iter=10,
                                   test_size=0.2, random_state=None)
scores = cross_validation.cross_val_score(clf, sample_X, sample_y_1Num, cv=cv)
score_mean = scores.mean()
One way to reduce the overfitting is to add more training observations. Since your problem is digit recognition, it is easy to synthetically generate more training data by slightly changing the observations in your original data set: you can generate four new observations from each existing one by shifting the digit image one pixel left, right, up, and down. This will greatly increase the size of your training set and should help the classifier learn to generalize instead of learning the noise.
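A minimal sketch of that augmentation, assuming the digits are stored as flattened 28x28 images (height/width are assumptions; adjust to your data):

import numpy as np

def augment_with_shifts(X, y, height=28, width=28):
    """Add copies of each image shifted one pixel left, right, up, and down."""
    images = np.asarray(X).reshape(-1, height, width)
    shifted = []
    for dy, dx in [(0, -1), (0, 1), (-1, 0), (1, 0)]:
        # np.roll wraps pixels around the edge; digit borders are usually blank,
        # so this is normally harmless (use scipy.ndimage.shift for zero padding).
        moved = np.roll(np.roll(images, dy, axis=1), dx, axis=2)
        shifted.append(moved.reshape(-1, height * width))
    X_aug = np.vstack([np.asarray(X)] + shifted)
    y_aug = np.concatenate([np.asarray(y)] * 5)
    return X_aug, y_aug

sample_X_aug, sample_y_aug = augment_with_shifts(sample_X, sample_y_1Num)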