Annotating DBSCAN clustering labels onto the original Pandas DataFrame - python-3.x

I have working code that uses DBSCAN to find tight groups in sparse spatial data imported with pd.read_csv.
I am keeping the original spatial data locations and would like to attach the label returned by DBSCAN for each data point to the original dataframe, then write a CSV with the same information.
The code below does exactly what I would expect at this point; I would just like to extend it to record the label for each row in the original dataframe.
import argparse
import string
import os, subprocess
import pathlib
import glob
import gzip
import re
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from sklearn.cluster import DBSCAN
X = pd.read_csv(tmp_csv_name)
X = X.drop('Name', axis = 1)
X = X.drop('Type', axis = 1)
X = X.drop('SomeValue', axis = 1)
# only columns 'x' and 'y' now remain
db = DBSCAN(eps=EPS, min_samples=minSamples, metric='euclidean', algorithm='auto', leaf_size=30).fit(X)
labels = db.labels_
unique_labels = set(labels)
# sizeX, sizeY, maxX, maxY are manual inputs temporarily
while sizeX > 16 or sizeY > 16:
    sizeX = sizeX * 0.8
    sizeY = sizeY * 0.8
fig, ax = plt.subplots(figsize=(sizeX,sizeY))
plt.xlim(0,maxX)
plt.ylim(0,maxY)
plt.scatter(X['x'], X['y'], c=colors, marker="o", picker=True)  # colors is defined elsewhere, presumably mapped from the cluster labels
# hackX , hackY are manual inputs temporarily
# which represent the boundaries defined in the original dataset
poly = patches.Polygon(xy=list(zip(hackX,hackY)), fill=False)
ax.add_patch(poly)
plt.show()
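A minimal sketch of the extension being asked about, assuming the full CSV is kept in a separate dataframe so the dropped columns survive; the 'cluster' column name and output filename are illustrative. Because DBSCAN's labels_ are aligned with the rows passed to fit(), they can simply be attached as a new column and written back out:
df_full = pd.read_csv(tmp_csv_name)                   # full data, including Name/Type/SomeValue
X = df_full[['x', 'y']]                               # coordinates used for clustering
db = DBSCAN(eps=EPS, min_samples=minSamples).fit(X)
df_full['cluster'] = db.labels_                       # row-aligned with X (and df_full); -1 marks noise
df_full.to_csv('clustered_output.csv', index=False)   # hypothetical output filename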

Related

Clustering a single time series

I have a single NumPy array (x) and I want to cluster it in an unsupervised way using DBSCAN and hierarchical clustering with scikit-learn. Is clustering possible for single-array data? Additionally, I need to plot the clusters and their corresponding representation on the input data.
I tried
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from scipy import stats
import scipy.cluster.hierarchy as hac
#my data
x = np.linspace(0, 500, 10000)
x = 1.5 * np.sin(x)
#dbscan
clustering = DBSCAN(eps=3).fit(x)
# here i am facing a problem: DBSCAN.fit expects a 2-D array of shape (n_samples, n_features)
# hierarchical
Yes, DBSCAN can cluster "1-D" arrays. See the time-series example further below, although I don't know the significance of clustering just the waveform.
For example,
import numpy as np
rng = np.random.default_rng(42)
x = rng.normal(loc=[-10, 0, 0, 0, 10], size=(200, 5)).reshape(-1, 1)
rng.shuffle(x)
print(x[:10])
# [[-10.54349551]
# [ -0.32626201]
# [ 0.22359555]
# [ -0.05841124]
# [ -0.11761086]
# [ -1.0824272 ]
# [ 0.43476607]
# [ 11.40382139]
# [ 0.70166365]
# [ 9.79889535]]
from sklearn.cluster import DBSCAN
dbs=DBSCAN()
clusters = dbs.fit_predict(x)
import matplotlib.pyplot as plt
plt.scatter(x,np.zeros(len(x)), c=clusters)
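Applied to the sine wave from the question, the same reshape makes the original call run (a sketch; eps=3 is just the value from the question and may need tuning for amplitude data):
import numpy as np
from sklearn.cluster import DBSCAN

x = 1.5 * np.sin(np.linspace(0, 500, 10000))      # the question's 1-D signal
clustering = DBSCAN(eps=3).fit(x.reshape(-1, 1))  # reshape to (n_samples, 1)
print(set(clustering.labels_))                    # with such densely sampled amplitudes, a single cluster is likely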
You can use AgglomerativeClustering for hierarchical clustering.
Here's an example using the data from above.
from sklearn.cluster import AgglomerativeClustering
aggC = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0, linkage="single")
clusters = aggC.fit_predict(x)
plt.scatter(x,np.zeros(len(x)), c=clusters)
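Since the question also imports scipy.cluster.hierarchy, the same data can be clustered through SciPy directly (a sketch using the x defined above; the threshold value is illustrative):
import scipy.cluster.hierarchy as hac

Z = hac.linkage(x, method="single")                            # single-linkage hierarchy on the 1-D data
scipy_clusters = hac.fcluster(Z, t=1.0, criterion="distance")  # cut the tree at distance 1.0
plt.scatter(x, np.zeros(len(x)), c=scipy_clusters)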
Time Series / Waveform (no other features)
You can do it, but with no features other than time and signal amplitude, I don't know if this has any meaning.
import numpy as np
from scipy import signal
y = np.hstack((
    np.zeros(100),
    signal.square(2 * np.pi * np.linspace(0, 2, 200, endpoint=False)),
    np.zeros(100),
    signal.sawtooth(2 * np.pi * np.linspace(0, 2, 200, endpoint=False) + np.pi / 2, width=0.5),
    np.zeros(100),
    np.sin(2 * np.pi * np.linspace(0, 2, 200, endpoint=False)),
    np.zeros(100),
))
import datetime
start = datetime.datetime.fromisoformat("2022-12-01T12:00:00.000000")
times = np.array([(start+datetime.timedelta(microseconds=_)).timestamp() for _ in range(1000)])
my_sig = np.hstack((times.reshape(-1,1),y.reshape(-1,1)))
print(my_sig[:5,:])
# [[1.6698924e+09 0.0000000e+00]
# [1.6698924e+09 0.0000000e+00]
# [1.6698924e+09 0.0000000e+00]
# [1.6698924e+09 0.0000000e+00]
# [1.6698924e+09 0.0000000e+00]]
from sklearn.cluster import AgglomerativeClustering
aggC = AgglomerativeClustering(n_clusters=None, distance_threshold=4.0)
clusters = aggC.fit_predict(my_sig)
import matplotlib.pyplot as plt
plt.scatter(my_sig[:,0], my_sig[:,1], c=clusters)

Implementing ipywidget slider for time

I am trying to create a slider for time in Jupyter Notebook using ipywidgets. I would like to take the tabulated experimental data (see figure below) and control the value bounds with the help of a slider. The graph should be a force-displacement graph, evolving in time:
This is my python code:
from ipywidgets import IntSlider, interact, FloatSlider
u = fdat1['C_1_Weg_R4[mm]'].values
f = fdat1['C_1_Kraft_R4[kN]'].values
t = fdat1['S/No'].values
@interact(t=IntSlider(min=0, max=max(fdat0['S/No'].values)))
def aa_(t):
    plt.plot(f[t], u[t])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
fdat1 is the name of the tabulated data. I have also considered using the "C_1_Zeit[s]" column for my slider values, but those are not integers.
The problem is that nothing gets plotted, but the slider works and the graph changes scale.
I have been searching online for some time now and would really appreciate some help.
Thank you in advance!
Edit:
from ipywidgets import IntSlider, interact, FloatSlider
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame.from_records(
[np.linspace(0,30, num=30), np.linspace(0,20, num=30), ]).T
df.columns=['A', 'B']
@interact(t=IntSlider(min=0, max=21))
def aa_(t):
    plt.scatter(df['A'], df['B'])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
    plt.xlim(0, 30)
    plt.ylim(0, 30)
Inside your plotting function, slice your results dataframe based on the slider value.
from ipywidgets import IntSlider, interact, FloatSlider
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
results = pd.DataFrame.from_records(
[np.linspace(0,30, num=30), np.linspace(0,20, num=30), ]).T
results.columns=['A', 'B']
@interact(t=IntSlider(min=0, max=21))
def aa_(t):
    df = results.iloc[:t]  # make the slice here
    plt.scatter(df['A'], df['B'])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
    plt.xlim(0, 30)
    plt.ylim(0, 30)
So, basically, this should have been the correct code:
from ipywidgets import IntSlider, interact, FloatSlider
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
u = fdat1['C_1_Weg_R4[mm]'].values #loads displacement values from fdat1
f = fdat1['C_1_Kraft_R4[kN]'].values #loads force values from fdat1
df = pd.DataFrame.from_dict([u,f]).T #creates a dataframe
df.columns=['A', 'B']
@interact(t=IntSlider(min=0, max=df.shape[0]))  # interactive scatterplot with a slider for time
def scatterplot_(t):
    plt.scatter(df.loc[0:t, 'A'], df.loc[0:t, 'B'])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
    plt.xlim(-5, 5)
    plt.ylim(-25, 25)

How can I make a transparent background?

I have a .csv file which contains some data where x, y, x1, y1 are the coordinate points and p is the value. My code below works well for plotting, but when I plot the data I get a background color, like the purple shown. I don't want any color in the background; I want the background to be transparent. My ultimate goal is to overlay this result on an image. I am new to Python. Any help will be highly appreciated.
Download link of the .csv file here or link-2 or link-3
I am getting the result below.
My Code
import matplotlib.pyplot as plt
from scipy import ndimage
import numpy as np
import pandas as pd
from skimage import transform
from PIL import Image
import cv2
x_dim=1200
y_dim=1200
# Read CSV
df = pd.read_csv("flower_feature.csv")
# Create a numpy array of zeros of the same size
array = np.zeros((x_dim, y_dim), dtype=np.uint8)
for index, row in df.iterrows():
    x = int(row["x"])
    y = int(row["y"])
    x1 = int(row["x1"])
    y1 = int(row["y1"])
    p = row["p"]
    array[x:x1, y:y1] = p
map = ndimage.filters.gaussian_filter(array, sigma=16)
plt.imshow(map)
plt.show()
As per Ghassen's suggestion I tried alpha = 0, 0.5, and 1, but I am still not getting a transparent background.
Try this code:
import matplotlib.pyplot as plt
from scipy import ndimage
import numpy as np
import pandas as pd
x_dim=1200
y_dim=1200
# Read CSV
df = pd.read_csv("/home/rosafi/Downloads/flower_feature.csv")
# Create a numpy array of ones of the same size
array = np.ones((x_dim, y_dim), dtype=np.uint8)
for index, row in df.iterrows():
    x = int(row["x"])
    y = int(row["y"])
    x1 = int(row["x1"])
    y1 = int(row["y1"])
    p = row["p"]
    array[x:x1, y:y1] = p
map = ndimage.filters.gaussian_filter(array, sigma=16)
map = np.ma.masked_where(map == 0, map)
plt.imshow(map)
plt.show()
output:
I solved this issue by masking out the values where the value == 0. The code is:
from mpl_toolkits.axes_grid1 import make_axes_locatable
masked_data = np.ma.masked_where(map == 0, map)
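To actually render the masked result without a background (a sketch continuing from the snippet above; matplotlib draws masked cells fully transparent by default, and the output filename is illustrative):
plt.imshow(masked_data)                        # masked (== 0) cells are not painted
plt.axis('off')
plt.savefig('overlay.png', transparent=True)   # hypothetical filename; figure/axes patches also become transparent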

How to map netCDF data on a base map

The file contains echo values w.r.t. lat/long; I have to plot the complete range of echoes over a base map.
from netCDF4 import Dataset
import numpy as np
import pandas as pd
from google.colab import files
upload = files.upload()
my_example_nc_file = 'a.nc'
fh = Dataset(my_example_nc_file, mode='r')
lons = fh.variables['longitude'][:]
lats = fh.variables['latitude'][:]
ech= fh.variables['echos'][:]
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
%matplotlib inline
m = Basemap(width=5000000, height=3500000,
            resolution='l', projection='stere',
            lat_ts=40, lat_0=lat_0, lon_0=lon_0)
xi, yi = m(lons, lats)
#simple plot
#m.plot(xi, yi, 'co')
m.scatter(xi, yi, marker='o', color='r', zorder=5)
The current code produces the result shown below.
I want to plot all echoes, with the variation represented by colors, as presented in the screenshot below.
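A sketch of the coloring being asked for, assuming lons, lats, and ech from the snippet above are aligned 1-D arrays: pass the echo values to c= and add a colorbar so the variation maps to colors (the colormap and label are illustrative).
sc = m.scatter(xi, yi, c=ech, cmap='jet', marker='o', zorder=5)  # color each point by its echo value
plt.colorbar(sc, label='echo')                                   # legend for the color scale
plt.show()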

sklearn MinMaxScaler mis-scaling?

I'm having trouble understanding one of the scaled columns in a pandas dataframe returned by MinMaxScaler:
The code snippet is as follows:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
A = np.random.randint(5, size=(8, 4))
FrameA = pd.DataFrame(A)
scaled_array = MinMaxScaler().fit_transform(FrameA)
Scaled (LHS) and original (RHS)
Column 2 is suspect. The formula seems to be x[i] / max{x} - 1, which differs from the other columns.
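For reference, MinMaxScaler scales each column as (x - min) / (max - min); when a column's minimum happens to be 0 this reduces to x / max, which can look like a different rule than the other columns follow. A quick manual check (a sketch) compares the two:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

A = np.random.randint(5, size=(8, 4))
scaled = MinMaxScaler().fit_transform(A)

# manual per-column min-max scaling for comparison
manual = (A - A.min(axis=0)) / (A.max(axis=0) - A.min(axis=0))
print(np.allclose(scaled, manual))  # True, provided no column is constant (max == min)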
