Finding points in radius of each point in same GeoDataFrame - python-3.x

I have a GeoDataFrame:
df = gpd.GeoDataFrame([[0, 'A', Point(10, 12)],
                       [1, 'B', Point(14, 8)],
                       [2, 'C', Point(100, 2)],
                       [3, 'D', Point(20, 10)]],
                      columns=['ID', 'Value', 'geometry'])
Is it possible to find, for each point, all points within a given radius (for example, 10), and add their 'Value' and 'geometry' to the GeoDataFrame, so the output would look like:
['ID', 'Value', 'geometry', 'value_of_point_in_range_1', 'geometry_of_point_in_range_1', 'value_of_point_in_range_2', 'geometry_of_point_in_range_2', etc.]
Previously I was finding the nearest neighbor of each point and then checking whether it was within range, but now I need to find all of the points within the radius and don't know what tool I should use.

Although in your example the output will have a predictable number of columns in the resulting dataframe, this is not true in general. I would therefore instead create a column in the dataframe that consists of lists denoting the index/value/geometry of the nearby points.
In a small dataset like the one you provided, simple arithmetic in Python will suffice. But for large datasets you will want to use a spatial tree to query the nearby points. I suggest using scipy's KDTree, like this:
import geopandas as gpd
import numpy as np
from shapely.geometry import Point
from scipy.spatial import KDTree

df = gpd.GeoDataFrame([[0, 'A', Point(10, 12)],
                       [1, 'B', Point(14, 8)],
                       [2, 'C', Point(100, 2)],
                       [3, 'D', Point(20, 10)]],
                      columns=['ID', 'Value', 'geometry'])

# build a KD-tree on the coordinates and collect all pairs within radius 10
tree = KDTree(list(zip(df.geometry.x, df.geometry.y)))
pairs = tree.query_pairs(10)

# give every row its own (independent) empty list
df['ValueOfNearbyPoints'] = np.empty((len(df), 0)).tolist()

# record each pair's "Value" on both sides
n = df.columns.get_loc("ValueOfNearbyPoints")
m = df.columns.get_loc("Value")
for (i, j) in pairs:
    df.iloc[i, n].append(df.iloc[j, m])
    df.iloc[j, n].append(df.iloc[i, m])
This yields the following dataframe:
ID Value geometry ValueOfNearbyPoints
0 0 A POINT (10.00000 12.00000) [B]
1 1 B POINT (14.00000 8.00000) [A, D]
2 2 C POINT (100.00000 2.00000) []
3 3 D POINT (20.00000 10.00000) [B]
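The question also asks for the geometries of the nearby points; the same pattern extends naturally with a second list column (a sketch building on the code above, with the hypothetical column name GeometryOfNearbyPoints):
df['GeometryOfNearbyPoints'] = np.empty((len(df), 0)).tolist()
p = df.columns.get_loc("GeometryOfNearbyPoints")
q = df.columns.get_loc("geometry")
for (i, j) in pairs:
    # append the other point's geometry on both sides of the pair
    df.iloc[i, p].append(df.iloc[j, q])
    df.iloc[j, p].append(df.iloc[i, q])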
To verify the results, you may find plotting them useful:
import matplotlib.pyplot as plt

ax = plt.subplot()
df.plot(ax=ax)
for (i, j) in pairs:
    plt.plot([df.iloc[i].geometry.x, df.iloc[j].geometry.x],
             [df.iloc[i].geometry.y, df.iloc[j].geometry.y], "-r")
plt.show()

Related

Annotate each FacetGrid subplot using custom df (or list) using a func

Consider the following data and FacetGrid:
d = {'SITE':['A', 'B', 'C', 'C', 'A'], 'VF':[0.00, 0.78, 0.99, 1.00, 0.50],'TYPE':['typeA', 'typeA', 'typeB', 'typeC', 'typeD']}
new_df = pd.DataFrame(data=d)
with sns.axes_style("white"):
    g = sns.FacetGrid(data=new_df, col='SITE', col_wrap=3, height=7, aspect=0.25,
                      hue='TYPE', palette=['#1E88E5', '#FFC107', '#D81B60'])
    g.map(sns.scatterplot, 'VF', 'TYPE', s=100)
Using another dataframe:
d = {'SITE':['A', 'B', 'C'], 'N':[10, 5, 7]}
ann_df = pd.DataFrame(data=d)
Here the SITE values match the original new_df['SITE']; ann_df does not have the same dimensions as new_df, but its length corresponds to the number of facet columns in the FacetGrid.
How do you annotate each subplot with a custom function, using not the scatterplot dataframe new_df but ann_df (or a custom list), matching it against the original new_df['SITE'] and adding ann_df['N'] to each subplot as shown below:
So, something along these lines or better:
def annotate(data, **kws):
    n = data  # should be the int for each matching SITE
    ax = plt.gca()
    ax.text(.1, .2, f"N = {n}", transform=ax.transAxes)

g.map_dataframe(annotate(ann_df))
Since seaborn v0.11.0, it is recommended to use figure-level functions like seaborn.relplot instead of seaborn.FacetGrid.
The values used for col= will be plotted alphabetically by default, otherwise specify an order with col_order=, and then make sure ann_df['SITE'] is sorted in the same order.
Flatten the seaborn.axisgrid.FacetGrid returned by sns.relplot, iterate through the matplotlib.axes, and add .text to each plot by using i from enumerate with .iloc to index the correct value for 'N'.
Similar to this answer, but getting data from a secondary DataFrame instead of a dict.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
import seaborn as sns
import pandas as pd

# DataFrame 1
d1 = {'SITE': ['A', 'B', 'C', 'C', 'A'],
      'VF': [0.00, 0.78, 0.99, 1.00, 0.50],
      'TYPE': ['typeA', 'typeA', 'typeB', 'typeC', 'typeD']}
df = pd.DataFrame(data=d1)

# DataFrame 2
d2 = {'SITE': ['A', 'B', 'C'], 'N': [10, 5, 7]}
ann_df = pd.DataFrame(data=d2)

# plot
g = sns.relplot(kind='scatter', data=df, x='VF', y='TYPE', col='SITE',
                col_wrap=3, height=7, aspect=0.5, hue='TYPE', s=100)

# flatten axes into a 1-d array
axes = g.axes.flatten()

# iterate through the axes
for i, ax in enumerate(axes):
    ax.text(0, 3, f"N = {ann_df.iloc[i, 1]}")
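If the facet order cannot be guaranteed to match the row order of ann_df, an alternative (a sketch that relies on relplot's default facet titles of the form 'SITE = A') is to parse each subplot title and look up N by site:
for ax in g.axes.flat:
    site = ax.get_title().split(' = ')[-1]  # default title looks like 'SITE = A'
    n = ann_df.loc[ann_df['SITE'] == site, 'N'].iat[0]
    ax.text(0, 3, f"N = {n}")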

I want to find the distances between one point and many points with numpy

I have one point and many other points on a 2D surface:
import numpy as np
import math

pts_o = [[0, 0], [11, 111], [222, 22], [333, 333]]
pts = np.array(pts_o)
ct = np.average(pts, axis=0)

res = []
for pt in pts_o:
    res.append(math.sqrt((pt[0] - ct[0])**2 + (pt[1] - ct[1])**2))
res.sort()
I'd like to know how to compute and sort the distances from the central point to the other points using numpy.
I know linalg.norm can do something similar for a single pair of points a and b:
dist = numpy.linalg.norm(a - b)
but how do I use this function for many points?
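One vectorized possibility (a sketch, not from the original thread): np.linalg.norm accepts an axis argument, so the per-point loop collapses into a single call:
import numpy as np

pts = np.array([[0, 0], [11, 111], [222, 22], [333, 333]])
ct = np.average(pts, axis=0)

# pts - ct broadcasts ct over every row; norm along axis=1 yields one
# distance per point, and np.sort returns them in ascending order
res = np.sort(np.linalg.norm(pts - ct, axis=1))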

How can I count the number of polygons a shape intersects?

I have a very large dataset with polygons and points with buffers around them. I would like to create a new column in the points data that contains the number of polygons each point's buffer intersects.
Here's a simplified example:
import pandas as pd
import geopandas as gp
from shapely.geometry import Polygon
from shapely.geometry import Point
import matplotlib.pyplot as plt

## Create polygons and points ##
df = gp.GeoDataFrame([['a', Polygon([(1, 0), (1, 1), (2, 2), (1, 2)])],
                      ['b', Polygon([(1, 0.25), (2, 1.25), (3, 0.25)])]],
                     columns=['name', 'geometry'])
df = gp.GeoDataFrame(df, geometry='geometry')

points = gp.GeoDataFrame([['box', Point(1.5, 1.115), 4],
                          ['triangle', Point(2.5, 1.25), 8]],
                         columns=['name', 'geometry', 'value'],
                         geometry='geometry')

## Set a buffer around the points ##
buf = points.buffer(0.5)
points['buffer'] = buf
points = points.drop(['geometry'], axis=1)
points = points.rename(columns={'buffer': 'geometry'})
This data looks like this:
What I'd like to do is create another column in the points dataframe that holds the number of polygons each point's buffer intersects.
I've tried using a for loop, like this:
points['intersect'] = []
for geo1 in points['geometry']:
    for geo2 in df['geometry']:
        if geo1.intersects(geo2):
            points['intersect'].append('1')
which I would then sum to get the total number of intersects.
However, I get the error: 'Length of values does not match length of index'. I know this is because it is attempting to assign three rows of data to a frame with only two rows.
How can I aggregate the counts so that the first point is assigned a value of 2 and the second a value of 1?
If you have a large dataset, I would go for a solution using an rtree spatial index, something like this:
import pandas as pd
import geopandas as gp
from shapely.geometry import Polygon
from shapely.geometry import Point
import matplotlib.pyplot as plt

## Create polygons and points ##
df = gp.GeoDataFrame([['a', Polygon([(1, 0), (1, 1), (2, 2), (1, 2)])],
                      ['b', Polygon([(1, 0.25), (2, 1.25), (3, 0.25)])]],
                     columns=['name', 'geometry'])
df = gp.GeoDataFrame(df, geometry='geometry')

points = gp.GeoDataFrame([['box', Point(1.5, 1.115), 4],
                          ['triangle', Point(2.5, 1.25), 8]],
                         columns=['name', 'geometry', 'value'],
                         geometry='geometry')

# generate spatial index
sindex = df.sindex

# define empty list for results
results_list = []

# iterate over the points
for index, row in points.iterrows():
    buffer = row['geometry'].buffer(0.5)  # buffer
    # find approximate matches with r-tree, then precise matches from those approximate ones
    possible_matches_index = list(sindex.intersection(buffer.bounds))
    possible_matches = df.iloc[possible_matches_index]
    precise_matches = possible_matches[possible_matches.intersects(buffer)]
    results_list.append(len(precise_matches))

# add list of results as a new column
points['polygons'] = pd.Series(results_list)
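As an aside, recent geopandas can express the same count with a spatial join instead of a manual loop. A sketch, assuming geopandas >= 0.10 where sjoin accepts predicate= (older versions use op='intersects'):
# buffer the points, join them against the polygons, count matches per point
buffered = points.copy()
buffered['geometry'] = points.geometry.buffer(0.5)
joined = gp.sjoin(buffered, df, predicate='intersects')
points['polygons'] = (joined.groupby(joined.index).size()
                      .reindex(points.index, fill_value=0))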

python3: find the k nearest vectors from a list?

Say I have a vector v1 and a list of vectors l1. I want to find the k vectors from l1 that are closest (most similar) to v1, in descending order.
I have a function sim_score(v1,v2) that will return a similarity score between 0 and 1 for any two input vectors.
Indeed, a naive way is to write a for loop over l1, calculate the distances, store them in another list, and then sort that list. But is there a Pythonic way to do the task?
Thanks
import numpy as np
np.sort([np.sqrt(np.sum((l - v1) * (l - v1))) for l in l1])[:3]
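If you need the k vectors themselves rather than the sorted distances, np.argsort gives the indices to pull them out; a sketch assuming l1 converts to a 2-D array and k = 3:
arr = np.asarray(l1, dtype=float)
dists = np.linalg.norm(arr - np.asarray(v1), axis=1)
k_nearest = arr[np.argsort(dists)[:3]]  # rows of l1, nearest first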
Consider using the scipy.spatial.distance module for distance computations. It supports the most common metrics.
import numpy as np
from scipy.spatial import distance
v1 = [[1, 2, 3]]
l1 = [[11, 3, 5],
      [ 2, 1, 9],
      [.1, 3, 2]]
# compute distances
dists = distance.cdist(v1, l1, metric='euclidean')
# sorted distances
sd = np.sort(dists)
Note that each parameter to cdist must be two-dimensional. Hence, v1 must be a nested list, or a 2d numpy array.
You may also use your own homegrown metric, like:
def my_metric(a, b, **kwargs):
    # some logic; must return a scalar distance between a and b
    ...

dists = distance.cdist(v1, l1, metric=my_metric)
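Tied back to the question's sim_score (a similarity in [0, 1]), one hypothetical wrapper turns it into a distance that cdist can rank by:
def sim_distance(a, b, **kwargs):
    # higher similarity means smaller distance, so the nearest rows of
    # l1 come out as the most similar (sim_score is the asker's function)
    return 1.0 - sim_score(a, b)

dists = distance.cdist(v1, l1, metric=sim_distance)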

Print L and U matrices calculated by SuperLU using scipy

How can I print sparse L and U matrices calculated by splu, which uses SuperLU?
My MWE:
>>> import scipy
>>> import scipy.sparse
>>> import scipy.sparse.linalg
>>> from numpy import array
>>> M = array([[19, 0, 21, 21, 0], [12, 21, 0, 0, 0], [0, 12, 16, 0, 0], [0, 0, 0, 5, 21], [12, 12, 0, 0, 18]])
>>> cscM = scipy.sparse.csc_matrix(M)
>>> lu_obj = scipy.sparse.linalg.splu(cscM)
>>> b = array([1, 2, 3, 4, 5])
>>> lu_obj.solve(b)
array([ 0.01245301, 0.08812209, 0.12140843, -0.08505639, 0.21072771])
You can use
lu_obj = scipy.sparse.linalg.splu(A)
L, U = lu_obj.L, lu_obj.U
in the current scipy version, which returns the factors in csc format (scipy docs).
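To check that factorization, the scipy docs show how to rebuild the permutation matrices from perm_r and perm_c and verify that Pr @ A @ Pc equals L @ U; a sketch against the cscM from the question:
import numpy as np
from scipy.sparse import csc_matrix

n = cscM.shape[0]
Pr = csc_matrix((np.ones(n), (lu_obj.perm_r, np.arange(n))))
Pc = csc_matrix((np.ones(n), (np.arange(n), lu_obj.perm_c)))
# should print True if the factors round-trip
print(np.allclose((Pr @ cscM @ Pc).toarray(), (lu_obj.L @ lu_obj.U).toarray()))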
Glancing through the scipy docs and source, scipy.sparse.linalg.splu does indeed use SuperLU. In older scipy versions the SuperLU object did not explicitly expose L or U. L and U are apt to be denser than your original sparse matrix, so it makes sense to avoid storing them when they are not needed. If it is any consolation, your lu_obj does contain the permutation info for L and U: lu_obj.perm_c, lu_obj.perm_r.
To get L and U, the path of least work is to use scipy.linalg.lu to get the LU matrices. You'll have to convert your sparse matrices to dense ones, though, i.e.
P, L, U = scipy.linalg.lu(cscM.todense())
