geopandas doesn't find point in polygon even though it should? - python-3.x

I have some lat/long coordinates and need to confirm if they are with the city of Atlanta, GA. I'm testing it out but it doesn't seem to work.
I got a geojson from here which appears to be legit:
https://gis.atlantaga.gov/?page=OPEN-DATA-HUB
import pandas as pd
import geopandas
atl = geopandas.read_file('Official_City_Boundary.geojson')
atl['geometry'] # this shows the image of Atlanta which appears correct
I plug in a couple of coordinates I got from Google Maps:
x = [33.75865421788594, -84.43974601192079]
y = [33.729117878816, -84.4017757998275]
z = [33.827871937500255, -84.39646813516548]
df = pd.DataFrame({'latitude': [x[0], y[0], z[0]], 'longitude': [x[1], y[1], z[1]]})
geometry = geopandas.points_from_xy(df.longitude, df.latitude)
points = geopandas.GeoDataFrame(geometry=geometry)
points
geometry
0 POINT (-84.43975 33.75865)
1 POINT (-84.40178 33.72912)
2 POINT (-84.39647 33.82787)
But when I check if the points are in the boundary, only one is true:
atl['geometry'].contains(points)
0 True
1 False
2 False
Why are they not all true? Am I doing it wrong?

I found some geometry similar to what you refer to
an alternative approach is to use intersects() to find the contains relationship. NB use of unary_union as the Atlanta geometry I downloaded contains multiple polygons
import pandas as pd
import geopandas
from pathlib import Path
atl = geopandas.read_file(Path.home().joinpath("Downloads").joinpath('Official_City_Council_District_Boundaries.geojson'))
atl['geometry'] # this shows the image of Atlanta which appears correct
x = [33.75865421788594, -84.43974601192079]
y = [33.729117878816, -84.4017757998275]
z = [33.827871937500255, -84.39646813516548]
df = pd.DataFrame({'latitude': [x[0], y[0], z[0]], 'longitude': [x[1], y[1], z[1]]})
geometry = geopandas.points_from_xy(df.longitude, df.latitude)
points = geopandas.GeoDataFrame(geometry=geometry, crs="epsg:4326")
points.intersects(atl.unary_union)
0 True
1 True
2 True
dtype: bool

As it is said in documentation:
It does not check if an element of one GeoSeries contains any element
of the other one.
So you should use a loop to check all points.

Related

Using python to plot 'Gridded' map

I would like to know how I can create a gridded map of a country(i.e. Singapore) with resolution of 200m x 200m squares. (50m or 100m is ok too)
I would then use the 'nearest neighbour' technique to assign a rainfall data and colour code to each square based on the nearest rainfall station's data.
[I have the latitude,longitude & rainfall data for all the stations for each date.]
Then, I would like to store the data in an Array for each 'gridded map' (i.e. from 1-Jan-1980 to 31-Dec-2021)
Can this be done using python?
P.S Below is a 'simple' version I did as an example to how the 'gridded' map should look like for 1 particular day.
https://i.stack.imgur.com/9vIeQ.png
Thank you so much!
Can this be done using python? YES
I have previously provided a similar answer binning spatial dataframe. Reference that also for concepts
you have noted that you are working with Singapore geometry and rainfall data. To setup an answer I have sourced this data from government sources
for purpose on answer I have used 2kmx2km grid so when plotting to demonstrate answer resource utilisation is reduced
core concept: create a grid of box polygons that cover the total bounds of the geometry. Note it's important to use UTM CRS here so that bounds in meters make sense. Once boxes are created remove boxes that are within total bounds but do not intersect with actual geometry
next create a geopandas dataframe of rainfall data. Use longitude and latitude of weather station to create points
final step, join_nearest() grid geometry with rainfall data geometry and data
clearly this final data frame gdf_grid_rainfall is a data frame, which is effectively an array. You can use as an array as you please ...
have provided a folium and plotly interactive visualisations that demonstrate clearly solution is working
solution
Dependent on data sourcing
# number of meters
STEP = 2000
a, b, c, d = gdf_sg.to_crs(gdf_sg.estimate_utm_crs()).total_bounds
# create a grid for Singapore
gdf_grid = gpd.GeoDataFrame(
geometry=[
shapely.geometry.box(minx, miny, maxx, maxy)
for minx, maxx in zip(np.arange(a, c, STEP), np.arange(a, c, STEP)[1:])
for miny, maxy in zip(np.arange(b, d, STEP), np.arange(b, d, STEP)[1:])
],
crs=gdf_sg.estimate_utm_crs(),
).to_crs(gdf_sg.crs)
# restrict grid to only squares that intersect with Singapore geometry
gdf_grid = (
gdf_grid.sjoin(gdf_sg)
.pipe(lambda d: d.groupby(d.index).first())
.set_crs(gdf_grid.crs)
.drop(columns=["index_right"])
)
# geodataframe of weather station locations and rainfall by date
gdf_rainfall = gpd.GeoDataFrame(
df_stations.merge(df, on="id")
.assign(
geometry=lambda d: gpd.points_from_xy(
d["location.longitude"], d["location.latitude"]
)
)
.drop(columns=["location.latitude", "location.longitude"]),
crs=gdf_sg.crs,
)
# weather station to nearest grid
gdf_grid_rainfall = gpd.sjoin_nearest(gdf_grid, gdf_rainfall).drop(
columns=["Description", "index_right"]
)
# does it work? let's visualize with folium
gdf_grid_rainfall.loc[lambda d: d["Date"].eq("20220622")].explore("Rainfall (mm)", height=400, width=600)
data sourcing
import requests, itertools, io
from pathlib import Path
import urllib
from zipfile import ZipFile
import fiona.drvsupport
import geopandas as gpd
import numpy as np
import pandas as pd
import shapely.geometry
# get official Singapore planning area geometry
url = "https://geo.data.gov.sg/planning-area-census2010/2014/04/14/kml/planning-area-census2010.zip"
f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
if not f.exists():
r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
with open(f, "wb") as fd:
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
zfile = ZipFile(f)
zfile.extractall(f.stem)
fiona.drvsupport.supported_drivers['KML'] = 'rw'
gdf_sg = gpd.read_file(
[_ for _ in Path.cwd().joinpath(f.stem).glob("*.kml")][0], driver="KML"
)
# get data about Singapore weather stations
df_stations = pd.json_normalize(
requests.get("https://api.data.gov.sg/v1/environment/rainfall").json()["metadata"][
"stations"
]
)
# dates to get data from weather.gov.sg
dates = pd.date_range("20220601", "20220730", freq="MS").strftime("%Y%m")
df = pd.DataFrame()
# fmt: off
bad = ['S100', 'S201', 'S202', 'S203', 'S204', 'S205', 'S207', 'S208',
'S209', 'S211', 'S212', 'S213', 'S214', 'S215', 'S216', 'S217',
'S218', 'S219', 'S220', 'S221', 'S222', 'S223', 'S224', 'S226',
'S227', 'S228', 'S229', 'S230', 'S900']
# fmt: on
for stat, month in itertools.product(df_stations["id"], dates):
if not stat in bad:
try:
df_ = pd.read_csv(
io.StringIO(
requests.get(
f"http://www.weather.gov.sg/files/dailydata/DAILYDATA_{stat}_{month}.csv"
).text
)
).iloc[:, 0:5]
except pd.errors.ParserError as e:
bad.append(stat)
print(f"failed {stat} {month}")
df = pd.concat([df, df_.assign(id=stat)])
df["Rainfall (mm)"] = pd.to_numeric(
df["Daily Rainfall Total (mm)"], errors="coerce"
)
df["Date"] = pd.to_datetime(df[["Year","Month","Day"]]).dt.strftime("%Y%m%d")
df = df.loc[:,["id","Date","Rainfall (mm)", "Station"]]
visualisation using plotly animation
import plotly.express as px
# reduce dates so figure builds in sensible time
gdf_px = gdf_grid_rainfall.loc[
lambda d: d["Date"].isin(
gdf_grid_rainfall["Date"].value_counts().sort_index().index[0:15]
)
]
px.choropleth_mapbox(
gdf_px,
geojson=gdf_px.geometry,
locations=gdf_px.index,
color="Rainfall (mm)",
hover_data=gdf_px.columns[1:].tolist(),
animation_frame="Date",
mapbox_style="carto-positron",
center={"lat":gdf_px.unary_union.centroid.y, "lon":gdf_px.unary_union.centroid.x},
zoom=8.5
).update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0, "pad": 4})

Finding the distance between latlong

I am a bit stuck. I have a CSV which includes:
Site Name
Latitude
Longitude.
This CSV has 100,000 locations. I need to generate a comma separated list for each location, showing the other locations within 5KM
I have tried the attached, which transposes the table & gives me 100,000 columns with 100,000 rows and the distance populated as the result. But I am not sure how to just make a new pandas column which has a list of all the sites within 5KM.
Can you help?
from geopy.distance import geodesic
def distance(row, csr):
lat = row['latitude']
long = row['longitude']
lat_long = (lat, long)
try:
return round(geodesic(lat_long, lat_long_compare).kilometers,2)
except:
return 9999
for key, value in d.items():
lat_compare = value['latitude']
long_compare = value['longitude']
lat_long_compare = (lat_compare, long_compare)
csr = key
df[key] = df.apply([distance, csr], axis=1)
Some sample data can be:
destinations = { 'bigben' : {'latitude': 51.510357,
'longitude': -0.116773},
'heathrow' : {'latitude': 51.470020,
'longitude': -0.454295},
'alton_towers' : {'latitude': 52.987662716,
'longitude': -1.888829778}
}
bigben is 0.8KM from the London Eye
heathrow is 23.55KM from the London Eye
alton_towers is 204.63KM from the London Eye
So, in this case, the field should show only big ben.
So we get:
Site | Sites within 5KM
28, BigBen
Here is one way with NearestNeighbors.
from sklearn.neighbors import NearestNeighbors
# data from your input
df = pd.DataFrame.from_dict(destinations, orient='index').rename_axis('Site Name').reset_index()
radius = 50 #change to whatever, in km
# crate the algo with the raidus and the metric for geospatial distance
neigh = NearestNeighbors(radius=radius/6371, metric='haversine')
# fit the data in radians
neigh.fit(df[['latitude', 'longitude']].to_numpy()*np.pi/180)
# extract result and transform to get the expected output
df[f'Site_within_{radius}km'] = (
pd.Series(neigh.radius_neighbors()[1]) # get a list of index for each row
.explode()
.map(df['Site Name']) # get the site name from row index
.groupby(level=0) # transform back to row-row relation
.agg(list) # can use ', '.join instead of list
)
print(df)
Site Name latitude longitude Site_within_50km
0 bigben 51.510357 -0.116773 [heathrow]
1 heathrow 51.470020 -0.454295 [bigben]
2 alton_towers 52.987663 -1.888830 [nan]
Another way
from sklearn.neighbors import DistanceMetric
from math import radians
import pandas as pd
import numpy as np
#To Radians
df['latitude'] = np.radians(df['latitude'])
df['longitude'] = np.radians(df['longitude'])
#Pair the cities
df[['latitude','longitude']].to_numpy()
#Assume a sperical radius of 6373
dist = DistanceMetric.get_metric('haversine')#DistanceMetric class df=pd.DataFrame(dist.pairwise(df[['latitude','longitude']].to_numpy())*6373,columns=df.index.unique(), index=df.index.unique())
s=df.gt(0)&df.le(50)
df['Site_within_50km']=s.agg(lambda x: x.index[x].values, axis=1)#Filter
bigben heathrow alton_towers Site_within_50km
bigben 0.000000 23.802459 203.857533 [heathrow]
heathrow 23.802459 0.000000 195.048961 [bigben]
alton_towers 203.857533 195.048961 0.000000 []

How could I edit my code to plot 4D contour something similar to this example in python?

Similar to many other researchers on stackoverflow who are trying to plot a contour graph out of 4D data (i.e., X,Y,Z and their corresponding value C), I am attempting to plot a 4D contour map out of my data. I have tried many of the suggested solutions in stackover flow. From all of the plots suggested this, and this were the closest to what I want but sill not quite what I need in terms of data interpretation. Here is the ideal plot example: (source)
Here is a subset of the data. I put it on the dropbox. Once this data is downloaded to the directory of the python file, the following code will work. I have modified this script from this post.
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
#####Importing the data
df = pd.read_csv('Data_4D_plot.csv')
do_random_pt_example = False;
index_x = 0; index_y = 1; index_z = 2; index_c = 3;
list_name_variables = ['x', 'y', 'z', 'c'];
name_color_map = 'seismic';
if do_random_pt_example:
number_of_points = 200;
x = np.random.rand(number_of_points);
y = np.random.rand(number_of_points);
z = np.random.rand(number_of_points);
c = np.random.rand(number_of_points);
else:
x = df['X'].to_numpy();
y = df['Y'].to_numpy();
z = df['Z'].to_numpy();
c = df['C'].to_numpy();
#end
#-----
# We create triangles that join 3 pt at a time and where their colors will be
# determined by the values of their 4th dimension. Each triangle contains 3
# indexes corresponding to the line number of the points to be grouped.
# Therefore, different methods can be used to define the value that
# will represent the 3 grouped points and I put some examples.
triangles = mtri.Triangulation(x, y).triangles;
choice_calcuation_colors = 2;
if choice_calcuation_colors == 1: # Mean of the "c" values of the 3 pt of the triangle
colors = np.mean( [c[triangles[:,0]], c[triangles[:,1]], c[triangles[:,2]]], axis = 0);
elif choice_calcuation_colors == 2: # Mediane of the "c" values of the 3 pt of the triangle
colors = np.median( [c[triangles[:,0]], c[triangles[:,1]], c[triangles[:,2]]], axis = 0);
elif choice_calcuation_colors == 3: # Max of the "c" values of the 3 pt of the triangle
colors = np.max( [c[triangles[:,0]], c[triangles[:,1]], c[triangles[:,2]]], axis = 0);
#end
#----------
###=====adjust this part for the labeling of the graph
list_name_variables[index_x] = 'X (m)'
list_name_variables[index_y] = 'Y (m)'
list_name_variables[index_z] = 'Z (m)'
list_name_variables[index_c] = 'C values'
# Displays the 4D graphic.
fig = plt.figure(figsize = (15,15));
ax = fig.gca(projection='3d');
triang = mtri.Triangulation(x, y, triangles);
surf = ax.plot_trisurf(triang, z, cmap = name_color_map, shade=False, linewidth=0.2);
surf.set_array(colors); surf.autoscale();
#Add a color bar with a title to explain which variable is represented by the color.
cbar = fig.colorbar(surf, shrink=0.5, aspect=5);
cbar.ax.get_yaxis().labelpad = 15; cbar.ax.set_ylabel(list_name_variables[index_c], rotation = 270);
# Add titles to the axes and a title in the figure.
ax.set_xlabel(list_name_variables[index_x]); ax.set_ylabel(list_name_variables[index_y]);
ax.set_zlabel(list_name_variables[index_z]);
ax.view_init(elev=15., azim=45)
plt.show()
Here would be the output:
Although it looks brilliant, it is not quite what I am looking for (the above contour map example). I have modified the following script from this post in the hope to reach the required graph, however, the chart looks nothing similar to what I was expecting (something similar to the previous output graph). Warning: the following code may take some time to run.
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
df = pd.read_csv('Data_4D_plot.csv')
x = df['X'].to_numpy();
y = df['Y'].to_numpy();
z = df['Z'].to_numpy();
cc = df['C'].to_numpy();
# convert to 2d matrices
Z = np.outer(z.T, z)
X, Y = np.meshgrid(x, y)
C = np.outer(cc.T,cc)
# fourth dimention - colormap
# create colormap according to cc-value
color_dimension = C # change to desired fourth dimension
minn, maxx = color_dimension.min(), color_dimension.max()
norm = matplotlib.colors.Normalize(minn, maxx)
m = plt.cm.ScalarMappable(norm=norm, cmap='jet')
m.set_array([])
fcolors = m.to_rgba(color_dimension)
# plot
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(X,Y,Z, rstride=1, cstride=1, facecolors=fcolors, vmin=minn, vmax=maxx, shade=False)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Now I was wondering from our kind community and experts if you can help me to plot a contour figure similar to the example graph (image one in this post), where the contours are based on the values within the range of C?

Bokeh BoxPlot > KeyError: 'the label [SomeCategory] is not in the [index]'

I'm attempting to create a BoxPlot using Bokeh. When I get to the section where I need to identify outliers, it fails if a given category has no outliers.
If I remove the "problem" category, the BoxPlot executes properly. it's only when I attempt to create this BoxPlot with a category that has no outliers it fails.
Any instruction on how to remedy this?
The failure occurs at the commented section "Prepare outlier data for plotting [...]"
import numpy as np
import pandas as pd
import datetime
import math
from bokeh.plotting import figure, show, output_file
from bokeh.models import NumeralTickFormatter
# Create time stamps to allow for figure to display span in title
today = datetime.date.today()
delta1 = datetime.timedelta(days=7)
delta2 = datetime.timedelta(days=1)
start = str(today - delta1)
end = str(today - delta2)
#Identify location of prices
itemloc = 'Everywhere'
df = pd.read_excel(r'C:\Users\me\prices.xlsx')
# Create a list from the dataframe that identifies distinct categories for the separate box plots
cats = df['subcategory_desc'].unique().tolist()
# Find the quartiles and IQR for each category
groups = df.groupby('subcategory_desc', sort=False)
q1 = groups.quantile(q=0.25)
q2 = groups.quantile(q=0.5)
q3 = groups.quantile(q=0.75)
iqr = q3 - q1
upper = q3 + 1.5*iqr
lower = q1 - 1.5*iqr
# Find the outliers for each category
def outliers(group):
cat = group.name
return group[(group.price > upper.loc[cat][0]) | (group.price < lower.loc[cat][0])]['price']
out = groups.apply(outliers).dropna()
# Prepare outlier data for plotting, we need coordinates for every outlier.
outx = []
outy = []
for cat in cats:
# only add outliers if they exist
if not out.loc[cat].empty:
for value in out[cat]:
outx.append(cat)
outy.append(value)
I expect that the Box-and-whisker portion of categories with no outliers merely show up without the outlier dots.
Have you tried the code from official documentation, https://docs.bokeh.org/en/latest/docs/gallery/boxplot.html?
# prepare outlier data for plotting, we need coordinates for every outlier.
if not out.empty:
outx = []
outy = []
for keys in out.index:
outx.append(keys[0])
outy.append(out.loc[keys[0]].loc[keys[1]])

Visualization of satellite track retrieved with pyephem is off

using the code below, and using pyephem and fastkml, I'd like to extract a ground track of a satellite from a TLE. The code looks as follows:
import numpy as np
import ephem
import datetime as dt
from fastkml import kml
from shapely.geometry import Point, LineString, Polygon
name = "ISS (ZARYA)"
line1 = "1 25544U 98067A 16018.27038796 .00010095 00000-0 15715-3 0 9995"
line2 = "2 25544 51.6427 90.6544 0006335 30.9473 76.2262 15.54535921981506"
tle_rec = ephem.readtle(name, line1, line2)
start_dt = dt.datetime.today()
intervall = dt.timedelta(minutes=1)
timelist = []
for i in range(100):
timelist.append(start_dt + i*intervall)
positions = []
for t in timelist:
tle_rec.compute(t)
positions.append((tle_rec.sublong,tle_rec.sublat,tle_rec.elevation))
k = kml.KML()
ns = '{http://www.opengis.net/kml/2.2}'
p = kml.Placemark(ns, 'Sattrack', 'Test', '100 Minute Track')
p.geometry = LineString(positions)#, tesselate=1,altitudemode="absolute")
k.append(p)
with open("test.kml", 'w') as kmlfile:
kmlfile.write(k.to_string())
Sadly, when I load the kml into Google Earth, the track looks as follows:
Any ideas where this goes wrong?
Your ground track is a loop around the position 0°N (on the Equator) and 0°E (directly south of Greenwich, near the Gulf of Guinea). This suggests that you are using angles expressed in radians, which can reach at most a value of about 6.2, and passing them to mapping software which reads them as degrees.
You should try converting them to degrees first:
positions.append((tle_rec.sublong / ephem.degree,
tle_rec.sublat / ephem.degree,
tle_rec.elevation))

Resources