I tried to generate a spectrogram for each axis in my dataset.
Here is what I tried:
import numpy as np
import matplotlib.pyplot as plt
dataset = np.loadtxt("trainingdataset.txt", delimiter=",", dtype=np.int32)
fake_size = 1415684
time = np.arange(fake_size)/1415684 # 1kHz
base_freq = 2 * np.pi * 100
x = dataset[:,2]
y = dataset[:,3]
z = dataset[:,4]
xyz_magnitude = x**2 + y**2 + z**2
to_plot = [('x', x), ('y', y), ('z', z), ('xyz', xyz_magnitude)]
for chl, data in to_plot:
    plt.figure(); plt.title(chl)
    d = plt.specgram(data, Fs=1000)
    plt.xlabel('Time [s]'); plt.ylabel('Frequency [Hz]')
plt.show()
But it gives the following warning:
Warning (from warnings module):
File "C:\Users\hadeer.elziaat\AppData\Local\Programs\Python\Python36\lib\site-packages\matplotlib\axes\_axes.py", line 7221
Z = 10. * np.log10(spec)
RuntimeWarning: divide by zero encountered in log10
The dataset headers are:
(patient number, time/millisecond, x-axis, y-axis, z-axis, label)
1,15,70,39,-970,0
1,31,70,39,-970,0
1,46,60,49,-960,0
1,62,60,49,-960,0
1,78,50,39,-960,0
1,93,50,39,-960,0
1,109,60,39,-990,0
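According to the Matplotlib documentation, plt.specgram uses a dB scale by default, i.e. it plots 10 * log10 of the computed power. If the spectrogram contains exact zeros (for example, in windows where the signal is all zeros), log10(0) evaluates to -inf, and that is what triggers this RuntimeWarning.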
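One possible workaround, sketched under the assumption that it is acceptable to show exact-zero power as a fixed floor (the -120 dB floor is an arbitrary choice): compute the spectrogram with matplotlib.mlab.specgram, clip the power to a small positive value, and only then take log10. The synthetic signal is a hypothetical stand-in for one axis of the dataset.

```python
import numpy as np
from matplotlib import mlab


def spectrogram_db(data, fs, nfft=256, noverlap=128, floor_db=-120.0):
    """Power spectrogram in dB, with zero-power bins clipped to floor_db.

    floor_db is an arbitrary lower limit chosen for this sketch.
    """
    spec, freqs, t = mlab.specgram(
        np.asarray(data, dtype=float), NFFT=nfft, Fs=fs, noverlap=noverlap
    )
    # Clip to a small positive power *before* the log, so log10 never sees 0
    floor_power = 10.0 ** (floor_db / 10.0)
    spec_db = 10.0 * np.log10(np.maximum(spec, floor_power))
    return spec_db, freqs, t


# Hypothetical stand-in for one axis: a silent (all-zero) stretch followed by noise.
# Windows that fall entirely inside the silent stretch have exactly zero power.
rng = np.random.default_rng(0)
data = np.concatenate([np.zeros(1024), rng.normal(size=1024)])
spec_db, freqs, t = spectrogram_db(data, fs=64.0)
```

To display the result, plt.pcolormesh(t, freqs, spec_db, shading="auto") can take the place of plt.specgram. If a linear color scale is acceptable, plt.specgram(data, Fs=fs, scale='linear') avoids the logarithm altogether.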
I have two sets of data points:
import random
import pandas as pd
A = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
B = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
For each of these data sets I can produce a joint plot like this:
import seaborn as sns
sns.jointplot(x=A["x"], y=A["y"], kind='kde')
sns.jointplot(x=B["x"], y=B["y"], kind='kde')
Is there a way to calculate the "common area" between these two joint plots?
By common area I mean: if you put one joint plot "inside" the other, what is the total area of the intersection? If you imagine the two joint plots as mountains and put one mountain inside the other, how much of one falls inside the other?
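For reference, this quantity is commonly called the overlapping coefficient of two densities. Writing f_A and f_B for the two KDEs (notation introduced here), it is the volume under their pointwise minimum, which a grid of spacing Δx, Δy approximates with a Riemann sum:

```latex
\mathrm{OVL} = \iint \min\bigl(f_A(x,y),\, f_B(x,y)\bigr)\,dx\,dy
\;\approx\; \Delta x\,\Delta y \sum_{i,j} \min\bigl(f_A(x_i,y_j),\, f_B(x_i,y_j)\bigr)
```

Because each density integrates to 1, the result lies between 0 (no overlap) and 1 (the two densities coincide).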
EDIT
To make my question more clear:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older Matplotlib)
def plot_2d_kde(df):
    # Extract x and y
    x = df['x']
    y = df['y']
    # Define the borders
    deltaX = (max(x) - min(x))/10
    deltaY = (max(y) - min(y))/10
    xmin = min(x) - deltaX
    xmax = max(x) + deltaX
    ymin = min(y) - deltaY
    ymax = max(y) + deltaY
    # Create meshgrid
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    # We will fit a Gaussian kernel using SciPy's gaussian_kde method
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    fig = plt.figure(figsize=(13, 7))
    ax = plt.axes(projection='3d')
    surf = ax.plot_surface(xx, yy, f, rstride=1, cstride=1, cmap='coolwarm', edgecolor='none')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('PDF')
    ax.set_title('Surface plot of Gaussian 2D KDE')
    fig.colorbar(surf, shrink=0.5, aspect=5) # add color bar indicating the PDF
    ax.view_init(60, 35)
I am interested in finding the intersection/common volume (just the number) of these two KDE plots:
plot_2d_kde(A)
plot_2d_kde(B)
Credits: The code for the kde plots is from here
I believe this is what you're looking for. I'm calculating the volume (the integral) of the intersection (the pointwise minimum) of the two KDE distributions.
import itertools
import random
import numpy as np
import pandas as pd
import scipy.stats
A = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
B = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
# KDE for both A and B
kde_a = scipy.stats.gaussian_kde([A.x, A.y])
kde_b = scipy.stats.gaussian_kde([B.x, B.y])
min_x = min(A.x.min(), B.x.min())
min_y = min(A.y.min(), B.y.min())
max_x = max(A.x.max(), B.x.max())
max_y = max(A.y.max(), B.y.max())
print(f"x is from {min_x} to {max_x}")
print(f"y is from {min_y} to {max_y}")
step = 0.01
grid = list(itertools.product(np.arange(min_x, max_x, step), np.arange(min_y, max_y, step)))
x = [a[0] for a in grid]
y = [a[1] for a in grid]
# sample across roughly 100x100 points (for data in [0, 1])
a_dist = kde_a([x, y])
b_dist = kde_b([x, y])
cell_area = step * step  # area represented by each grid point
print(a_dist.sum() * cell_area) # integral of A
print(b_dist.sum() * cell_area) # integral of B
print(np.minimum(a_dist, b_dist).sum() * cell_area) # integral of the intersection between A and B
The following code compares calculating the volume of the intersection either via scipy's dblquad or via taking the average value over a grid.
Remarks:
For the 2D case (and with only 100 sample points), the deltas apparently need to be considerably larger than 10%. The code below uses 25%. With a delta of 10%, the calculated values for f1 and f2 are about 0.90, while in theory they should be 1.0. With a delta of 25%, these values are around 0.994.
To approximate the volume the simple way, the average needs to be multiplied by the area (here (xmax - xmin)*(ymax - ymin)). Also, the more grid points are considered, the better the approximation. The code below uses 1000x1000 grid points.
SciPy has special functions to calculate integrals, such as scipy.integrate.dblquad. This is much slower than the 'simple' method, but a bit more precise. The default tolerances did not work well here, so the code below loosens them considerably (epsabs=1e-4, epsrel=1e-4). (dblquad returns two numbers: the approximate integral and an estimate of its absolute error. To get only the integral, the code uses dblquad(...)[0].)
The same approach works in more dimensions. For the 'simple' method, create a higher-dimensional grid (xx, yy, zz = np.mgrid[xmin:xmax:100j, ymin:ymax:100j, zmin:zmax:100j]). Note that 1000 subdivisions in each of three dimensions would create a grid of 10^9 points, which is too large to work with.
When using scipy.integrate, dblquad needs to be replaced by tplquad for three dimensions or nquad for N dimensions. This will probably be rather slow as well, so the accuracy needs to be reduced further.
import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.integrate import dblquad
df1 = pd.DataFrame({'x':np.random.uniform(0, 1, 100), 'y':np.random.uniform(0, 1, 100)})
df2 = pd.DataFrame({'x':np.random.uniform(0, 1, 100), 'y':np.random.uniform(0, 1, 100)})
# Extract x and y
x1 = df1['x']
y1 = df1['y']
x2 = df2['x']
y2 = df2['y']
# Define the borders
deltaX = (np.max([x1, x2]) - np.min([x1, x2])) / 4
deltaY = (np.max([y1, y2]) - np.min([y1, y2])) / 4
xmin = np.min([x1, x2]) - deltaX
xmax = np.max([x1, x2]) + deltaX
ymin = np.min([y1, y2]) - deltaY
ymax = np.max([y1, y2]) + deltaY
# fit a gaussian kernel using scipy’s gaussian_kde method
kernel1 = st.gaussian_kde(np.vstack([x1, y1]))
kernel2 = st.gaussian_kde(np.vstack([x2, y2]))
print('volumes via scipy`s dblquad (volume):')
# kernel((x, y)) returns a 1-element array; [0] turns it into the float dblquad expects
print(' volume_f1 =', dblquad(lambda y, x: kernel1((x, y))[0], xmin, xmax, ymin, ymax, epsabs=1e-4, epsrel=1e-4)[0])
print(' volume_f2 =', dblquad(lambda y, x: kernel2((x, y))[0], xmin, xmax, ymin, ymax, epsabs=1e-4, epsrel=1e-4)[0])
print(' volume_intersection =',
      dblquad(lambda y, x: min(kernel1((x, y))[0], kernel2((x, y))[0]), xmin, xmax, ymin, ymax, epsabs=1e-4, epsrel=1e-4)[0])
Alternatively, one can calculate the mean value over a grid of points, and multiply the result by the area of the grid. Note that np.mgrid is much faster than creating a list via itertools.
# Create meshgrid
xx, yy = np.mgrid[xmin:xmax:1000j, ymin:ymax:1000j]
positions = np.vstack([xx.ravel(), yy.ravel()])
f1 = np.reshape(kernel1(positions).T, xx.shape)
f2 = np.reshape(kernel2(positions).T, xx.shape)
intersection = np.minimum(f1, f2)
print('volumes via the mean value multiplied by the area:')
print(' volume_f1 =', np.sum(f1) / f1.size * ((xmax - xmin)*(ymax - ymin)))
print(' volume_f2 =', np.sum(f2) / f2.size * ((xmax - xmin)*(ymax - ymin)))
print(' volume_intersection =', np.sum(intersection) / intersection.size * ((xmax - xmin)*(ymax - ymin)))
Example output:
volumes via scipy`s dblquad (volume):
volume_f1 = 0.9946974276169385
volume_f2 = 0.9928998852123891
volume_intersection = 0.9046421634401607
volumes via the mean value multiplied by the area:
volume_f1 = 0.9927873844924111
volume_f2 = 0.9910132867915901
volume_intersection = 0.9028999384136771
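The remark about more dimensions can be sketched generically. This hypothetical helper (kde_overlap_volumes is not part of SciPy) builds an n_per_dim^d grid with np.mgrid over a padded common bounding box, evaluates both KDEs, and sums each result times the volume of one grid cell. With few points per dimension, sum times cell volume is slightly more accurate than mean times box volume, because np.mgrid with a complex step includes both endpoints. n_per_dim=40 and margin=0.25 (mirroring the 25% delta above) are illustrative choices.

```python
import numpy as np
import scipy.stats as st


def kde_overlap_volumes(samples_a, samples_b, n_per_dim=40, margin=0.25):
    """Grid estimate of the volumes under two KDEs and under their minimum.

    samples_a, samples_b: arrays of shape (d, n), the layout gaussian_kde expects.
    Returns (volume_a, volume_b, volume_intersection).
    """
    samples_a = np.asarray(samples_a, dtype=float)
    samples_b = np.asarray(samples_b, dtype=float)
    both = np.hstack([samples_a, samples_b])
    # Common bounding box, padded by `margin` of the range in every dimension
    lo, hi = both.min(axis=1), both.max(axis=1)
    pad = margin * (hi - lo)
    lo, hi = lo - pad, hi + pad
    # A complex step n*1j in np.mgrid means "n points, endpoints included"
    grid = np.mgrid[tuple(slice(l, h, n_per_dim * 1j) for l, h in zip(lo, hi))]
    positions = grid.reshape(len(lo), -1)  # shape (d, n_per_dim**d)
    f_a = st.gaussian_kde(samples_a)(positions)
    f_b = st.gaussian_kde(samples_b)(positions)
    # Volume of one grid cell: product of the grid spacings
    cell_volume = np.prod((hi - lo) / (n_per_dim - 1))
    return tuple(float(f.sum() * cell_volume) for f in (f_a, f_b, np.minimum(f_a, f_b)))


# Three-dimensional example with 100 uniform samples per data set
rng = np.random.default_rng(0)
a = rng.uniform(0, 1, size=(3, 100))
b = rng.uniform(0, 1, size=(3, 100))
vol_a, vol_b, vol_ab = kde_overlap_volumes(a, b, n_per_dim=25)
print(vol_a, vol_b, vol_ab)
```

The number of KDE evaluations grows as n_per_dim^d, so n_per_dim has to shrink as d grows.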
I adapted some examples to simulate the superposition of the electric potential (voltage) of two point charges and made a 3D surface plot. The code is the following:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older Matplotlib)
q1 = 2e-9
q2 = -2e-9
K = 9e9
# Charge 1 position
x1 = 2.0
y1 = 4.0
# Charge 2 position
x2 = 6.0
y2 = 4.0
x = np.linspace(0,8,50)
y = np.linspace(0,8,50)
x, y = np.meshgrid(x,y)
r1 = np.sqrt((x - x1)**2 + (y - y1)**2)
r2 = np.sqrt((x - x2)**2 + (y - y2)**2)
V = K*(q1/r1 + q2/r2)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is not supported in recent Matplotlib
surf = ax.plot_surface(x, y, V, rstride=1, cstride=1, cmap=cm.rainbow,
                       linewidth=0, antialiased=False)
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
[Figure: 3D surface plot of the potential V]
Now I want to make a contour plot with a vector (quiver) plot on top of it. I tried the following code, but I get a bunch of buggy vectors coming out of both charges, even the negative one:
fig2, ax2 = plt.subplots(1,1)
cp = ax2.contourf(x, y, V, cmap=cm.coolwarm)
fig2.colorbar(cp)
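dy = y[1, 0] - y[0, 0]  # actual grid spacing (8/49), not 0.2
dx = x[0, 1] - x[0, 0]
v, u = np.gradient(-V, dy, dx) # E = -∇V; axis 0 of the meshgrid is y, axis 1 is x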
ax2.quiver(x, y, u, v)
ax2.set_title("Point Charges")
plt.show()
[Figure: contour plot with the buggy quiver vectors]
I suspect that the long vectors are related to a division by zero. The vectors should come out of the positive charge and go into the negative one. How would I go about fixing them? Thanks in advance.
Welcome to SO, very nice MWE. One option would be to hide all vectors above a certain length by setting their components to NaN; quiver does not draw arrows whose components are NaN. Here I use the 95th percentile as the threshold.
r = np.sqrt(u**2 + v**2)
is_valid = r < np.percentile(r, 95)
# Only u and v are set to NaN; x and y stay intact so contourf still gets a complete grid
u[~is_valid] = np.nan
v[~is_valid] = np.nan
fig2, ax2 = plt.subplots(1,1)
cp = ax2.contourf(x, y, V, cmap=cm.coolwarm)
fig2.colorbar(cp)
ax2.quiver(x, y, u, v)
ax2.set_title("Point Charges")
ax2.set_xlim(0, 8)
ax2.set_ylim(0, 8)
plt.show()
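An alternative sketch, not taken from the answer above: compute the field analytically from Coulomb's law instead of differentiating V numerically, blank out a small disc around each charge (the r_min=0.3 radius is an arbitrary choice), and draw unit-length arrows so only the direction is shown. The helper name potential_and_field is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

K = 9e9
# (charge [C], x, y) for the two charges from the question
charges = [(2e-9, 2.0, 4.0), (-2e-9, 6.0, 4.0)]


def potential_and_field(x, y, charges, r_min=0.3):
    """Return V, Ex, Ey on the grid; points closer than r_min to any charge become NaN.

    r_min is an arbitrary cut-off radius chosen for this sketch.
    """
    V = np.zeros(np.shape(x))
    Ex = np.zeros(np.shape(x))
    Ey = np.zeros(np.shape(x))
    near = np.zeros(np.shape(x), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):
        for q, xq, yq in charges:
            dx, dy = x - xq, y - yq
            r = np.hypot(dx, dy)
            near |= r < r_min
            V += K * q / r
            # E = K q (r - r_q) / |r - r_q|^3, split into components
            Ex += K * q * dx / r**3
            Ey += K * q * dy / r**3
    for arr in (V, Ex, Ey):
        arr[near] = np.nan
    return V, Ex, Ey


if __name__ == "__main__":
    x, y = np.meshgrid(np.linspace(0, 8, 50), np.linspace(0, 8, 50))
    V, Ex, Ey = potential_and_field(x, y, charges)
    E = np.hypot(Ex, Ey)
    fig, ax = plt.subplots()
    cp = ax.contourf(x, y, V, cmap=cm.coolwarm)
    fig.colorbar(cp)
    # Unit-length arrows: direction only, so nothing explodes near the charges
    ax.quiver(x, y, Ex / E, Ey / E)
    ax.set_title("Point Charges")
    plt.show()
```

Normalizing the arrows trades magnitude information for readability; the contour plot of V still shows where the field is strong.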
I am new to Python and I want to compute the spectrogram of the acceleration magnitude from a text file.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
dataset = np.loadtxt("trainingdataset.txt", delimiter=",", dtype=np.int32)
fake_size = 1415684
time = np.arange(fake_size)/1415684 # 1kHz
base_freq = 2 * np.pi * 100
x = dataset[:,2]
y = dataset[:,3]
z = dataset[:,4]
xyz_magnitude = x**2 + y**2 + z**2
to_plot = [('x', x), ('y', y), ('z', z), ('xyz', xyz_magnitude)]
for chl, data in to_plot:
    plt.figure(); plt.title(chl)
    d = plt.specgram(data, Fs=1000)
    plt.xlabel('Time [s]'); plt.ylabel('Frequency [Hz]')
plt.show()
Here is a sample of my dataset; the columns are (patient number, time/millisecond, X-axis, Y-axis, Z-axis, label):
1,15,70,39,-970,0
1,31,70,39,-970,0
1,46,60,49,-960,0
1,62,60,49,-960,0
1,78,50,39,-960,0
1,93,50,39,-960,0
1,109,60,39,-990,0
Edit (from the comments):
There is a warning when running that code:
Warning (from warnings module):
File "C:\Users\******\AppData\Local\Programs\Python\Python36\lib\site-packages\matplotlib\axes\_axes.py", line 7221
Z = 10. * np.log10(spec)
RuntimeWarning: divide by zero encountered in log10
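The time column in the sample rows advances by about 15.7 ms per row (15, 31, 46, 62, ...), which would mean a sampling rate of roughly 64 Hz rather than the 1 kHz passed as Fs=1000. Assuming that column really is time in milliseconds, a minimal sketch for estimating Fs from the data (estimate_fs is a hypothetical helper; the sample rows are read from a string instead of trainingdataset.txt):

```python
import io

import numpy as np

# The sample rows from the question:
# (patient number, time/millisecond, x-axis, y-axis, z-axis, label)
sample = """1,15,70,39,-970,0
1,31,70,39,-970,0
1,46,60,49,-960,0
1,62,60,49,-960,0
1,78,50,39,-960,0
1,93,50,39,-960,0
1,109,60,39,-990,0"""

dataset = np.loadtxt(io.StringIO(sample), delimiter=",")


def estimate_fs(time_ms):
    """Average sampling rate in Hz, assuming timestamps in milliseconds."""
    time_ms = np.asarray(time_ms, dtype=float)
    mean_dt_ms = (time_ms[-1] - time_ms[0]) / (len(time_ms) - 1)
    return 1000.0 / mean_dt_ms


fs = estimate_fs(dataset[:, 1])
print(f"estimated sampling rate: {fs:.1f} Hz")
```

The estimate could then be passed as plt.specgram(data, Fs=fs); the frequency axis would then end at about fs/2, roughly 32 Hz. If the timestamps restart for each patient (an assumption based on the first column), estimate fs on one patient's rows, e.g. dataset[dataset[:, 0] == 1, 1].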
I am trying to convert my accelerometer signal (x-axis, y-axis and z-axis) into a spectrogram by calculating the magnitude.
The dataset is divided into three text files for x, y and z respectively; each file contains one column and 1000 rows.
This is my code:
import numpy as np
import matplotlib.pyplot as plt
fake_size = 1000
time = np.arange(fake_size)/1000 # 1kHz
x = np.loadtxt("trainingdatasetX.txt", delimiter=",")
y = np.loadtxt("trainingdatasetY.txt", delimiter=",")
z = np.loadtxt("trainingdatasetZ.txt", delimiter=",")
xyz_magnitude = x**2 + y**2 + z**2
to_plot = [('x', x), ('y', y), ('z', z), ('xyz', xyz_magnitude)]
for chl, data in to_plot:
    plt.figure(); plt.title(chl)
    plt.specgram(data, Fs=1000)
    plt.xlabel('Time [s]'); plt.ylabel('Frequency [Hz]')
plt.show()
But it gives me the following warning:
Warning (from warnings module):
File "C:\Users\hadeer.elziaat\AppData\Local\Programs\Python\Python36\lib\site-packages\matplotlib\axes\_axes.py", line 7221
Z = 10. * np.log10(spec)
RuntimeWarning: divide by zero encountered in log10
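Note that x**2 + y**2 + z**2 is the squared magnitude; the Euclidean magnitude also needs a square root. A small sketch with hypothetical helper names that loads the three single-column files and computes the magnitude; in-memory strings stand in for the real files here:

```python
import io

import numpy as np


def load_axes(sources):
    """Load one single-column text source per axis and stack them into shape (n, 3)."""
    return np.column_stack([np.loadtxt(src) for src in sources])


def acceleration_magnitude(xyz):
    """Euclidean magnitude per sample: sqrt(x**2 + y**2 + z**2)."""
    return np.linalg.norm(xyz, axis=1)


# In-memory stand-ins for trainingdatasetX.txt, trainingdatasetY.txt and trainingdatasetZ.txt
files = [io.StringIO("3\n0\n1\n"), io.StringIO("4\n0\n2\n"), io.StringIO("0\n5\n2\n")]
xyz = load_axes(files)
magnitude = acceleration_magnitude(xyz)
print(magnitude)
```

With the real data, pass the paths instead, e.g. load_axes(["trainingdatasetX.txt", "trainingdatasetY.txt", "trainingdatasetZ.txt"]); np.column_stack also raises an error if the files have different lengths.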
I have reviewed the answer to this question: How would I iterate over a list of files and plot them as subplots on a single figure?
But I am none the wiser on how to achieve my goal. I would like to plot multiple data sets, with differing x axes, onto a single figure in Python. I have included a snippet of my code below, which performs an FFT on a dataset and then calculates three Butterworth filter outputs. Ideally I would like to have everything plotted on a single figure, which I have attempted in the code below.
The for loop calculates the three Butterworth filter outputs; the code above it computes the FFT, and the code directly below it attempts to add the FFT curve and the sqrt(0.5) line to the previously generated plots.
Any direction or advice would be appreciated.
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack
from scipy.signal import freqz
# io2 (a DataFrame with a 'Time' column), butter_lowpass, cutoff and fs are defined elsewhere in the script
def FFT(col):
    """Performs a Fast Fourier Transform on the data specified at the base of the code"""
    x = io2.loc[1:,'Time']
    y = io2.loc[1:,col]
    # Number of sample points
    #N = 600
    N = x.count()
    N2 = int(N/2)
    # sample spacing
    #T = 1.0 / 800.0
    T = io2.loc[2,'Time'] - io2.loc[1,'Time']  # the spacing itself; 1/spacing would be the sampling rate
    #x = np.linspace(0.0, N*T, N)
    #y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
    yf = scipy.fftpack.fft(y)
    xf = np.linspace(0.0, 1.0/(2.0*T), N2)
    fig=plt.figure()
    plt.clf()
    i=1
    for order in [3, 6, 9]:
        ax=fig.add_subplot(111, label="order = %d" % order)
        b, a = butter_lowpass(cutoff, fs, order=order)
        w, h = freqz(b, a, worN=2000)
        ax.plot((fs * 0.5 / np.pi) * w, abs(h))
        i=i+1
    ax4=fig.add_subplot(111, label='sqrt(0.5)', frame_on=False)
    ax5=fig.add_subplot(111, label="FFT of "+col, frame_on=False)
    ax4.plot([0, 0.5 * fs], [np.sqrt(0.5), np.sqrt(0.5)], '--')
    ax5.plot(xf, 2.0/N * np.abs(yf[:N2]))
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Gain')
    plt.grid(True)
    plt.legend(loc='best')
    #fig, ax = plt.subplots()
    #ax.plot(xf, 2.0/N * np.abs(yf[:N2]), label="FFT of "+col)
    plt.axis([0,5000,0,0.1])
    #plt.xlabel('Frequency (Hz)')
    #plt.ylabel('Amplitude (mm)')
    #plt.legend(loc=0)
    plt.show()
    return
Kind Regards,
Here is a minimal example of how to plot multiple lines with different x and y data sets. Every call to add_subplot(111) with a new label creates another Axes stacked on top of the previous ones instead of adding to the existing plot. Instead, create the figure once and call plot multiple times. I have added an example of a single plot with multiple lines, as well as an example with one subplot per line.
import numpy as np
import matplotlib.pyplot as plt
x1 = np.arange(0, 10, 1)
x2 = np.arange(3, 12, 0.1)
x3 = np.arange(2, 8, 0.01)
y1 = np.sin(x1)
y2 = np.cos(x2**0.8)
y3 = np.sin(4.*x3)**3
data = []
data.append((x1, y1, 'label1'))
data.append((x2, y2, 'label2'))
data.append((x3, y3, 'label3'))
# All lines in one plot.
plt.figure()
for n in data:
    plt.plot(n[0], n[1], label=n[2])
plt.legend(loc=0, frameon=False)
# One subplot per data set.
cols = 2
rows = len(data)//2 + len(data)%2
plt.figure()
gs = plt.GridSpec(rows, cols)
for n in range(len(data)):
    i = n%2
    j = n//2
    plt.subplot(gs[j,i])
    plt.plot(data[n][0], data[n][1])
    plt.title(data[n][2])
plt.tight_layout()
plt.show()
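Applied to the question's case, a sketch under stated assumptions: io2, butter_lowpass, cutoff and fs are not shown in the question, so this uses a hypothetical synthetic signal, assumed values fs = 1000 Hz and cutoff = 50 Hz, and scipy.signal.butter directly in place of butter_lowpass (the fs keyword of butter and freqz needs SciPy 1.2 or later). All filter responses and the sqrt(0.5) line go on one Axes; because FFT amplitudes and filter gains live on different scales, the FFT gets a secondary y-axis via twinx.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, freqz

fs = 1000.0     # assumed sampling rate [Hz]
cutoff = 50.0   # assumed cutoff frequency [Hz]

# Hypothetical test signal standing in for io2[col]: 30 Hz + 120 Hz, 1 s long
n = 1000
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 30 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)


def fft_amplitude(y, fs):
    """One-sided amplitude spectrum: frequencies [Hz] and 2/N * |FFT|."""
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    amps = 2.0 / len(y) * np.abs(np.fft.rfft(y))
    return freqs, amps


freqs, amps = fft_amplitude(signal, fs)

fig, ax = plt.subplots()  # one Axes, reused for every filter curve
for order in (3, 6, 9):
    b, a = butter(order, cutoff, btype="low", fs=fs)
    w, h = freqz(b, a, worN=2000, fs=fs)  # with fs given, w is returned in Hz
    ax.plot(w, np.abs(h), label=f"order = {order}")
ax.axhline(np.sqrt(0.5), linestyle="--", color="gray", label="sqrt(0.5)")
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Gain")
ax.grid(True)

# The FFT amplitudes use a different scale, so they get a second y-axis
ax2 = ax.twinx()
ax2.plot(freqs, amps, color="black", alpha=0.6, label="FFT")
ax2.set_ylabel("Amplitude")

# Combine the legend entries of both axes into one legend
h1, l1 = ax.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax.legend(h1 + h2, l1 + l2, loc="best")

if __name__ == "__main__":
    plt.show()
```

If the FFT amplitudes happen to be on the same scale as the gains, ax2 can be dropped and the FFT plotted on ax directly.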