Why does librosa STFT show wrong frequencies?

I generated a 200 Hz sine wave using numpy and then used librosa's stft() and specshow() functions to show a spectrogram. However, the frequency it shows is not 200 Hz. When I use matplotlib's magnitude_spectrum() function, it shows exactly 200 Hz. Does anyone know why that might be? Am I doing something wrong? Any help will be much appreciated.
The results from librosa's spectrogram and matplotlib's frequency spectrum can be seen in the image below.
Minimal working example:
import matplotlib.pyplot as plt
from matplotlib import mlab
%matplotlib inline
import numpy as np
import librosa
import librosa.display
sr = 20000
freq1 = 200
n_fft=2000
x = np.linspace(0, 1, sr)
y = 0.5*np.sin(freq1 * 2 * np.pi * x)
no_window = np.linspace(1, 1, n_fft)
D = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=int(n_fft/2), window=no_window, center=False,))
plt.figure(figsize=(9, 4))
librosa.display.specshow(D, y_axis='linear')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.ylim(0, 250)
plt.tight_layout()
plt.show()
plt.figure(figsize=(9, 4))
plt.magnitude_spectrum(y, Fs=sr, color='C1', window=mlab.window_none)
plt.xlim(0, 250)
plt.xlabel('Frequency [Hz]')
plt.ylabel('Amplitude [-]')
plt.tight_layout()
plt.show()

Just passing the results to specshow is not enough. You also need to tell it what scale these results are on. You do this by passing the sample rate parameter sr, like this:
librosa.display.specshow(D, y_axis='linear', sr=sr)
If you don't, it defaults to sr=22050, hop_length=512, which is certainly not correct in your case.
This is similar to the answer given here.
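To see where the wrong labels come from, the sketch below uses only NumPy (no librosa) to map an STFT bin index to a frequency. It assumes, as in the question, sr = 20000 and n_fft = 2000, and it builds the time axis with np.arange so the samples are exactly 1/sr apart. The names true_hz and default_hz are illustrative, not part of any library.

```python
import numpy as np

sr = 20000      # actual sample rate of the signal
n_fft = 2000    # frame length used for the STFT
freq1 = 200     # sine frequency in Hz

t = np.arange(sr) / sr                      # one second, samples exactly 1/sr apart
y = 0.5 * np.sin(2 * np.pi * freq1 * t)

# Magnitude spectrum of the first frame (rectangular window, like no_window above).
frame = np.abs(np.fft.rfft(y[:n_fft]))
peak_bin = int(np.argmax(frame))

# Bin k sits at k * sr / n_fft Hz, so the label depends on the sr you assume.
true_hz = peak_bin * sr / n_fft             # using the real sample rate
default_hz = peak_bin * 22050 / n_fft       # using specshow's default sr=22050

print(peak_bin, true_hz, default_hz)
```

The peak bin is the same in both cases. Only the frequency attached to it changes, which is why passing sr to specshow is enough to fix the axis.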

Related

Get Tangent Point of a Curve

I have the following DataFrame (attached as CSV at the end of the post):
Then I plot two columns, PRICE and OG [%], and get something like this:
plt.plot(out["PRICE"], out["OG [%]"])
I want to get the tangent point (x, y) that optimizes the curve. In the image I can see that it is near (80, 0.160), but how can I get this coordinate automatically, considering that the curve could change in the future?
Thanks in advance!
DF in CSV:
,INCREASE [%],PRICE,INCREASE,QTY,GPS,NNS,OG [%]
0,0.0,47.69,0.0,239032932.10219583,11399480531.953718,9649069936.361042
1,0.1,52.458999999999996,4.769,267545911.79200616,14035190986.69685,11961949944.986732,0.27315694384293565
2,0.2,57.227999999999994,9.538,296058891.48181653,16942858241.721395,14546786753.89384,0.24307636032561325
3,0.30000000000000004,61.997,14.307000000000002,324571871.1716268,20122482297.027348,17403580363.082355,0.21857913428577896
4,0.4,66.76599999999999,19.076,353084850.8614371,23574063152.614704,20532330772.55227,0.198325906714522
5,0.5,71.535,23.845,381597830.5512475,27297600808.483486,23933037982.30361,0.18134997420002735
6,0.6000000000000001,76.304,28.614000000000004,410110810.2410579,31293095264.633682,27605701992.33637,0.16694472549220507
7,0.7000000000000001,81.07300000000001,33.383,438623789.93086815,35560546521.06528,31550322802.650528,0.1545858626459231
8,0.8,85.842,38.152,467136769.6206784,40099954577.778275,35766900413.246086,0.14387833953735796
9,0.9,90.61099999999999,42.921,495649749.3104888,44911319434.7727,40255434824.12307,0.13452003951711053
10,1.0,95.38,47.69,524162729.0002991,49994641092.04852,45015926035.28145,0.12627665505254082
11,1.1,100.149,52.459,552675708.6901095,55349919549.605774,50048374046.72126,0.11896408514089048
12,1.2000000000000002,104.918,57.22800000000001,581188688.3799199,60977154807.444435,55352778858.44248,0.11243592554246645
13,1.3,109.687,61.997,609701668.0697302,66876346865.56449,60929140470.44511,0.10657445172186328
14,1.4000000000000001,114.456,66.766,638214647.7595404,73047495723.96596,66777458882.729126,0.10128402946033532
15,1.5,119.225,71.535,666727627.4493507,79490601382.64883,72897734095.29456,0.09648623602161768
16,1.6,123.994,76.304,695240607.1391611,86205663841.61314,79289966108.14143,0.09211620281895366
17,1.7000000000000002,128.763,81.07300000000001,723753586.8289715,93192683100.85886,85954154921.26971,0.08811984166718287
18,1.8,133.53199999999998,85.842,752266566.5187817,100451659160.38594,92890300534.67935,0.08445171808362244
19,1.9000000000000001,138.301,90.611,780779546.208592,107982592020.19447,100098402948.37045,0.08107340396640193
20,2.0,143.07,95.38,809292525.8984023,115785481680.28442,107578462162.34296,0.07795218934826136
This particular curve does not have an inflection point or "knee" (elbow):
from kneed import KneeLocator
kn = KneeLocator(x = out['PRICE'], y = out['OG [%]'], curve='convex', direction='decreasing')
print(kn.knee)
None
But if it did, you would do it like this:
y = [7342, 6881, 6531,
6356, 6209, 6094,
5980, 5880, 5779,
5691, 5617, 5532,
5467, 5395, 5345,
5290, 5243, 5207,
5164]
x = range(1, len(y)+1)
import kneed
from kneed import KneeLocator
kn = KneeLocator(x, y, curve='convex', direction='decreasing')
print(kn.knee)
print(round(kn.knee_y, 3))
import matplotlib.pyplot as plt
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x, y, 'bx-')
plt.vlines(kn.knee, plt.ylim()[0], plt.ylim()[1], linestyles='dashed')
where
(print(kn.knee),print(round(kn.knee_y, 3)))
(5,6209)
gives you the coordinates of the knee.
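For intuition about what a knee detector looks for, here is a minimal NumPy-only sketch of a related heuristic: rescale both axes to [0, 1] and pick the point that lies farthest below the straight line joining the first and last points. This is a simplified stand-in and not kneed's exact algorithm; the names gap, knee_x and knee_y are illustrative.

```python
import numpy as np

y = np.array([7342, 6881, 6531, 6356, 6209, 6094, 5980, 5880, 5779, 5691,
              5617, 5532, 5467, 5395, 5345, 5290, 5243, 5207, 5164], dtype=float)
x = np.arange(1, len(y) + 1, dtype=float)

# Rescale both axes to [0, 1] so the result does not depend on units.
xn = (x - x.min()) / (x.max() - x.min())
yn = (y - y.min()) / (y.max() - y.min())

# For a decreasing convex curve the chord runs from (0, 1) to (1, 0);
# 1 - xn - yn is proportional to how far each point lies below that chord.
gap = 1.0 - xn - yn
i = int(np.argmax(gap))

knee_x, knee_y = x[i], y[i]
print(knee_x, knee_y)
```

On this sample data the heuristic lands on the same point as the KneeLocator output quoted above. On a curve that bends very gradually, like the PRICE vs OG [%] data, the maximum is much less pronounced, so the chosen point is less meaningful.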

Why does this polynomial function give drastically different results, depending on the amount of points I plot?

import matplotlib.pyplot as plt
import numpy as np
def func(x):
    return 5.61929612e+02 + 6.81573974e-01*x - 4.10728802e-03*x**2 + 1.87813061e-05*x**3 - 5.48199867e-08*x**4 + 8.41432160e-11*x**5 - 6.22733129e-14*x**6 + 1.76282052e-17*x**7
x_values = np.arange(0, 750, 1)
plt.plot(x_values, func(x_values))
plt.show()
When I run this code this is the result:
However when I increase the resolution:
x_values = np.arange(0, 750, 0.1)
plt.plot(x_values, func(x_values))
plt.show()
I get this:
I am using Python 3.7 and the up to date versions of pyplot and numpy. (16.06.2020)
I have used the fix of increasing the resolution but this does not make any sense to me. Is there someone who can help?
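One likely explanation is integer overflow rather than resolution. np.arange(0, 750, 1) returns an integer array, so x**7 is computed in integer arithmetic and wraps around silently for large x, whereas np.arange(0, 750, 0.1) returns floats. The sketch below assumes 64-bit integers (stated explicitly with dtype=np.int64); on a platform whose default integer is 32-bit the wrap-around would start at smaller x. The names mismatch and first_bad are illustrative.

```python
import numpy as np

def func(x):
    return (5.61929612e+02 + 6.81573974e-01*x - 4.10728802e-03*x**2
            + 1.87813061e-05*x**3 - 5.48199867e-08*x**4 + 8.41432160e-11*x**5
            - 6.22733129e-14*x**6 + 1.76282052e-17*x**7)

x_int = np.arange(0, 750, 1, dtype=np.int64)   # integer step -> integer array
x_float = x_int.astype(float)                  # same points as floats

y_int = func(x_int)      # x**7 is evaluated in int64 and wraps around silently
y_float = func(x_float)  # x**7 is evaluated in floating point

# Find the first x where the two evaluations disagree.
mismatch = ~np.isclose(y_int, y_float)
first_bad = int(x_int[np.argmax(mismatch)])
print(first_bad)
```

If this is the cause, keeping the step of 1 but using a float array, for example np.arange(0, 750, 1.0), should give the same curve as the higher-resolution plot.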

Why does matplotlib magnitude_spectrum function seem to show wrong magnitudes?

I created a 1 second long audio sample consisting of two sine waves and then used matplotlib's magnitude_spectrum function to plot the spectrum, and the results seem to be wrong. Averaged over the one second sample, the two waves contribute the same amount of signal, and yet the magnitudes are vastly different. This seemed weird to me, so I also used numpy's functions to plot the DFT, and there the magnitudes are exactly the same, as I think they should be. The resulting plots are shown in the image below. Does anyone know why that might be? Did I do anything wrong in my code? Any help will be much appreciated.
Minimal working example:
import matplotlib.pyplot as plt
import numpy as np
sr = 20000
freq1 = 200
freq2 = 100
duration = 1
x = np.linspace(0, duration, sr * duration)
y = np.concatenate([0.5*np.sin(freq1 * 2 * np.pi * x[:10000]) + 0.5*np.sin(freq2 * 2 * np.pi * x[:10000]), np.sin(freq1 * 2 * np.pi * x[10000:15000]), np.sin(freq2 * 2 * np.pi * x[15000:20000])])
fig, ax = plt.subplots(3, 1, figsize=(12, 10))
ax[0].plot(x, y)
ax[0].axis(xmin=0, xmax=1)
ax[0].set_xlabel('Time [s]')
ax[0].set_ylabel('Amplitude [-]')
ax[1].magnitude_spectrum(y, Fs=sr, color='C1')
ax[1].axis(xmin=0, xmax=500)
ax[1].set_xlabel('Frequency [Hz]')
ax[1].set_ylabel('Magnitude [-]')
ax[2].plot(np.fft.rfftfreq(sr, d=1/sr), np.abs(np.fft.rfft(y, norm='ortho'))/100)
ax[2].axis(xmin=0, xmax=500)
ax[2].set_xlabel('Frequency [Hz]')
ax[2].set_ylabel('Magnitude [-]')
plt.tight_layout()
plt.show()
I think it is related to the window matplotlib uses. By default, magnitude_spectrum applies a Hanning window, so change the window to window_none. The scaling is also different in the two cases. With the following changes, both plots match:
from matplotlib import mlab
ax[1].magnitude_spectrum(y, Fs=sr, color='C1', window=mlab.window_none)
ax[2].plot(np.fft.rfftfreq(sr, d=1/sr), np.abs(np.fft.rfft(y))/sr)
results in
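The effect of the window can be checked numerically without plotting. The sketch below is NumPy-only and rests on two assumptions: that the spectrum is normalised by the sum of the window (my reading of how magnitude_spectrum scales its output) and that the time axis uses np.arange so each tone falls exactly on a 1 Hz bin. The helper name spectrum is illustrative.

```python
import numpy as np

sr = 20000
n = sr                                   # one second of audio
t = np.arange(n) / sr                    # samples exactly 1/sr apart
s200 = np.sin(2 * np.pi * 200 * t)
s100 = np.sin(2 * np.pi * 100 * t)

# Same layout as the question: both tones, then 200 Hz alone, then 100 Hz alone.
y = np.concatenate([0.5 * s200[:10000] + 0.5 * s100[:10000],
                    s200[10000:15000],
                    s100[15000:]])

def spectrum(signal, window):
    # Assumed scaling: window the signal, then divide by the window's sum.
    return np.abs(np.fft.rfft(signal * window)) / window.sum()

flat = spectrum(y, np.ones(n))           # no window
hann = spectrum(y, np.hanning(n))        # Hann window

# With a 1 s signal the bin spacing is 1 Hz, so bin k is k Hz.
print(flat[100], flat[200])
print(hann[100], hann[200])
```

Without a window both peaks have the same height. With the Hann window the 100 Hz peak shrinks, because the stretch where the 100 Hz tone plays alone is at the end of the sample, where the window tapers towards zero.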

Using horizontal line to fit the model

I am writing Python code that uses a horizontal line to investigate under-fitting of the function sin(2πx) on the range [0, 1].
I first generate N data points by adding random Gaussian noise with mu=0 and sigma=1.
import matplotlib.pyplot as plt
import numpy as np
# generate N random points
N=30
X= np.random.rand(N,1)
y= np.sin(np.pi*2*X)+ np.random.randn(N,1)
I need to fit the model using a horizontal line and display it, but I don't know what to do next.
Could you help me figure this out? I'd appreciate it.
Assuming that you want to use the least squares loss function, by definition you are trying to find the value of yhat minimizing np.sum((y-yhat)**2). Differentiating with respect to yhat, you'll find that the minimum is achieved at yhat = np.sum(y)/N, which is of course nothing but y.mean(), as already pointed out by @ImportanceOfBeingErnest in the comments.
plt.scatter(X, y)
plt.plot(X, np.zeros(N) + np.mean(y))
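The claim that the mean is the least-squares horizontal line can be checked numerically. The sketch below regenerates data in the same way as the question (with a fixed seed, purely for reproducibility), compares the mean with a degree-0 np.polyfit, and scans nearby constants to confirm none has a smaller squared error. The helper sse and the candidate grid are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, only for reproducibility
N = 30
X = rng.random((N, 1))
y = np.sin(2 * np.pi * X) + rng.standard_normal((N, 1))

def sse(c):
    # Sum of squared errors of the horizontal line y = c.
    return float(np.sum((y - c) ** 2))

best = float(y.mean())

# A degree-0 polynomial fit is a horizontal line; it should agree with the mean.
fit = float(np.polyfit(X.ravel(), y.ravel(), 0)[0])

# Scan nearby candidates: none should beat the mean.
candidates = best + np.linspace(-1.0, 1.0, 201)
errors = [sse(c) for c in candidates]

print(best, fit, min(errors) >= sse(best) - 1e-12)
```

For plotting, plt.axhline(best) would draw the same line without needing X.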
From what I understand, you're generating a noisy sine wave and trying to fit a horizontal line?
import numpy as np
import matplotlib.pyplot as plt
# generate N random points
N=60
X= np.linspace(0.0,2*np.pi, num=N)
noise = 0.1 * np.random.randn(N)
y= np.sin(4*X) + noise
numer = sum([xi*yi for xi,yi in zip(X, y)]) - N * np.mean(X) * np.mean(y)
denum = sum([xi**2 for xi in X]) - N * np.mean(X)**2
b = numer / denum
A = np.mean(y) - b * np.mean(X)
y_ = b * X+ A
plt.plot(X,y)
plt.plot(X,y_)
plt.show()

Smooth curves in Python Plots [duplicate]

I've got the following simple script that plots a graph:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
plt.plot(T,power)
plt.show()
As it is now, the line goes straight from point to point, which looks OK but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth csplines.
Is there an easy way to do this in PyPlot? I've found some tutorials, but they all seem rather complex.
You could use scipy.interpolate.spline to smooth out your data yourself:
from scipy.interpolate import spline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
power_smooth = spline(T, power, xnew)
plt.plot(xnew,power_smooth)
plt.show()
spline was deprecated in scipy 0.19.0 and removed in scipy 1.3.0; use the BSpline class instead.
Switching from spline to BSpline isn't a straightforward copy/paste and requires a little tweaking:
from scipy.interpolate import make_interp_spline, BSpline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
spl = make_interp_spline(T, power, k=3) # type: BSpline
power_smooth = spl(xnew)
plt.plot(xnew, power_smooth)
plt.show()
Before:
After:
For this example spline works well, but if the function is not inherently smooth and you want a smoothed version, you can also try:
from scipy.ndimage import gaussian_filter1d
ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()
If you increase sigma, you get a smoother function.
Proceed with caution with this one. It modifies the original values and may not be what you want.
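The difference between the two approaches can be made concrete: an interpolating spline passes through the original points, while a Gaussian filter replaces each point with a weighted average of its neighbours. A minimal sketch on the T/power data from the question, assuming a SciPy version that provides make_interp_spline and scipy.ndimage.gaussian_filter1d:

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d

T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])

# Interpolating spline: evaluated at the original T it reproduces the data.
spl = make_interp_spline(T, power, k=3)
spline_at_data = spl(T)

# Gaussian filter: every output sample is a weighted average of its neighbours.
filtered = gaussian_filter1d(power, sigma=2)

print(np.allclose(spline_at_data, power))   # does the spline pass through the points?
print(np.abs(filtered - power).max())       # how far the filter moves them
```

Which behaviour is preferable depends on whether the data points are trusted measurements (interpolate) or noisy samples (filter).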
See the scipy.interpolate documentation for some examples.
The following example demonstrates its use, for linear and cubic spline interpolation:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Define x, y, and xnew to resample at.
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Define interpolators.
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')
# Plot.
plt.plot(x, y, 'o', label='data')
plt.plot(xnew, f_linear(xnew), '-', label='linear')
plt.plot(xnew, f_cubic(xnew), '--', label='cubic')
plt.legend(loc='best')
plt.show()
Slightly modified for increased readability.
One of the easiest implementations I found was to use the exponential moving average that TensorBoard uses:
from typing import List
def smooth(scalars: List[float], weight: float) -> List[float]:  # Weight between 0 and 1
    last = scalars[0]  # First value in the plot (first timestep)
    smoothed = list()
    for point in scalars:
        smoothed_val = last * weight + (1 - weight) * point  # Calculate smoothed value
        smoothed.append(smoothed_val)  # Save it
        last = smoothed_val  # Anchor the last smoothed value
    return smoothed
ax.plot(x_labels, smooth(train_data, .9), x_labels, train_data)
I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn't have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you're using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).
Here is a simple solution for dates:
from scipy.interpolate import make_interp_spline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
data = {
datetime(2016, 9, 26, 0, 0): 26060, datetime(2016, 9, 27, 0, 0): 23243,
datetime(2016, 9, 28, 0, 0): 22534, datetime(2016, 9, 29, 0, 0): 22841,
datetime(2016, 9, 30, 0, 0): 22441, datetime(2016, 10, 1, 0, 0): 23248
}
#create data
date_np = np.array(list(data.keys()))
value_np = np.array(list(data.values()))
date_num = dates.date2num(date_np)
# smooth
date_num_smooth = np.linspace(date_num.min(), date_num.max(), 100)
spl = make_interp_spline(date_num, value_np, k=3)
value_np_smooth = spl(date_num_smooth)
# print
plt.plot(date_np, value_np)
plt.plot(dates.num2date(date_num_smooth), value_np_smooth)
plt.show()
It's worth your time looking at seaborn for plotting smoothed lines.
The seaborn lmplot function will plot data and regression model fits.
The following illustrates both polynomial and lowess fits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
sns.lmplot(x='T', y='power', data=df, ci=None, lowess=True, truncate=False)
The order = 4 polynomial fit is overfitting this toy dataset. I don't show it here but order = 2 and order = 3 gave worse results.
The lowess = True fit is underfitting this tiny dataset but may give better results on larger datasets.
Check the seaborn regression tutorial for more examples.
Another way to go, which slightly modifies the function depending on the parameters you use:
from statsmodels.nonparametric.smoothers_lowess import lowess
def smoothing(x, y):
    lowess_frac = 0.15  # size of data (%) for estimation =~ smoothing window
    lowess_it = 0
    x_smooth = x
    y_smooth = lowess(y, x, is_sorted=False, frac=lowess_frac, it=lowess_it, return_sorted=False)
    return x_smooth, y_smooth
That was better suited than other answers for my specific application case.
