Using Python to plot a 'gridded' map

I would like to know how I can create a gridded map of a country (e.g. Singapore) with a resolution of 200 m x 200 m squares (50 m or 100 m is OK too).
I would then use the 'nearest neighbour' technique to assign a rainfall value and colour code to each square, based on the data from the nearest rainfall station.
[I have the latitude, longitude and rainfall data for all the stations for each date.]
Then I would like to store the data in an array for each 'gridded map' (i.e. one per day from 1-Jan-1980 to 31-Dec-2021).
Can this be done using Python?
P.S. Below is a 'simple' version I made as an example of how the 'gridded' map should look for one particular day.
https://i.stack.imgur.com/9vIeQ.png
Thank you so much!

Can this be done using Python? Yes.
I have previously provided a similar answer on binning a spatial dataframe; refer to that as well for the concepts.
You have noted that you are working with Singapore geometry and rainfall data. To set up an answer I have sourced this data from government sources.
For the purposes of this answer I have used a 2 km x 2 km grid, so that plotting the demonstration uses fewer resources.
Core concept: create a grid of box polygons that covers the total bounds of the geometry. Note that it is important to use a UTM CRS here so that bounds in metres make sense. Once the boxes are created, remove those that are within the total bounds but do not intersect the actual geometry.
Next, create a geopandas dataframe of the rainfall data, using the longitude and latitude of each weather station to create points.
Final step: use sjoin_nearest() to join the grid geometry with the rainfall data geometry and data.
The result, gdf_grid_rainfall, is a data frame, which is effectively an array. You can use it as an array as you please ...
I have provided folium and plotly interactive visualisations that demonstrate the solution is working.
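The nearest-station idea can also be sketched without any GIS library, which may help when the goal is one array per day. The following is a minimal sketch under stated assumptions: station coordinates are already projected to metres, and the names `stations_xy`, `rain`, `nearest` and `grids`, along with all the numbers, are hypothetical illustrations rather than part of the answer's code.

```python
import numpy as np

# Hypothetical inputs: station coordinates in metres (projected CRS assumed)
# and one rainfall value per station per day.
stations_xy = np.array([[500.0, 500.0], [3500.0, 1500.0]])  # (n_stations, 2)
rain = np.array([[1.0, 10.0],   # day 0
                 [0.0, 5.5]])   # day 1 -> shape (n_days, n_stations)

STEP = 1000.0  # cell size in metres (would be 200.0 for the question's grid)
minx, miny, maxx, maxy = 0.0, 0.0, 4000.0, 2000.0  # hypothetical bounds

# Centres of the cells of a regular grid covering the bounding box.
xs = np.arange(minx + STEP / 2, maxx, STEP)
ys = np.arange(miny + STEP / 2, maxy, STEP)
gx, gy = np.meshgrid(xs, ys)                          # both (ny, nx)
centres = np.column_stack([gx.ravel(), gy.ravel()])   # (ny * nx, 2)

# Index of the nearest station for every cell (brute-force squared distances).
d2 = ((centres[:, None, :] - stations_xy[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)                           # (ny * nx,)

# One 2-D grid per day, stacked into a (n_days, ny, nx) array.
grids = rain[:, nearest].reshape(rain.shape[0], len(ys), len(xs))
print(grids.shape)
```

Because the station positions do not change from day to day, `nearest` only has to be computed once; keeping `nearest` together with the station-by-day table is a compact alternative to materialising every daily grid.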
solution
Dependent on data sourcing
# number of meters
STEP = 2000
a, b, c, d = gdf_sg.to_crs(gdf_sg.estimate_utm_crs()).total_bounds
# create a grid for Singapore
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.arange(a, c, STEP), np.arange(a, c, STEP)[1:])
        for miny, maxy in zip(np.arange(b, d, STEP), np.arange(b, d, STEP)[1:])
    ],
    crs=gdf_sg.estimate_utm_crs(),
).to_crs(gdf_sg.crs)
# restrict grid to only squares that intersect with Singapore geometry
gdf_grid = (
    gdf_grid.sjoin(gdf_sg)
    .pipe(lambda d: d.groupby(d.index).first())
    .set_crs(gdf_grid.crs)
    .drop(columns=["index_right"])
)
# geodataframe of weather station locations and rainfall by date
gdf_rainfall = gpd.GeoDataFrame(
    df_stations.merge(df, on="id")
    .assign(
        geometry=lambda d: gpd.points_from_xy(
            d["location.longitude"], d["location.latitude"]
        )
    )
    .drop(columns=["location.latitude", "location.longitude"]),
    crs=gdf_sg.crs,
)
# weather station to nearest grid
gdf_grid_rainfall = gpd.sjoin_nearest(gdf_grid, gdf_rainfall).drop(
    columns=["Description", "index_right"]
)
# does it work? let's visualize with folium
gdf_grid_rainfall.loc[lambda d: d["Date"].eq("20220622")].explore("Rainfall (mm)", height=400, width=600)
data sourcing
import requests, itertools, io
from pathlib import Path
import urllib.parse
from zipfile import ZipFile
import fiona.drvsupport
import geopandas as gpd
import numpy as np
import pandas as pd
import shapely.geometry
# get official Singapore planning area geometry
url = "https://geo.data.gov.sg/planning-area-census2010/2014/04/14/kml/planning-area-census2010.zip"
f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
if not f.exists():
    r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
    with open(f, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
zfile = ZipFile(f)
zfile.extractall(f.stem)
fiona.drvsupport.supported_drivers['KML'] = 'rw'
gdf_sg = gpd.read_file(
    [_ for _ in Path.cwd().joinpath(f.stem).glob("*.kml")][0], driver="KML"
)
# get data about Singapore weather stations
df_stations = pd.json_normalize(
    requests.get("https://api.data.gov.sg/v1/environment/rainfall").json()["metadata"][
        "stations"
    ]
)
# dates to get data from weather.gov.sg
dates = pd.date_range("20220601", "20220730", freq="MS").strftime("%Y%m")
df = pd.DataFrame()
# stations known to have no usable daily data
# fmt: off
bad = ['S100', 'S201', 'S202', 'S203', 'S204', 'S205', 'S207', 'S208',
       'S209', 'S211', 'S212', 'S213', 'S214', 'S215', 'S216', 'S217',
       'S218', 'S219', 'S220', 'S221', 'S222', 'S223', 'S224', 'S226',
       'S227', 'S228', 'S229', 'S230', 'S900']
# fmt: on
for stat, month in itertools.product(df_stations["id"], dates):
    if stat not in bad:
        try:
            df_ = pd.read_csv(
                io.StringIO(
                    requests.get(
                        f"http://www.weather.gov.sg/files/dailydata/DAILYDATA_{stat}_{month}.csv"
                    ).text
                )
            ).iloc[:, 0:5]
        except pd.errors.ParserError as e:
            bad.append(stat)
            print(f"failed {stat} {month}")
            # skip this station/month so stale data from a previous
            # iteration is not appended under the wrong id
            continue
        df = pd.concat([df, df_.assign(id=stat)])
df["Rainfall (mm)"] = pd.to_numeric(
    df["Daily Rainfall Total (mm)"], errors="coerce"
)
df["Date"] = pd.to_datetime(df[["Year","Month","Day"]]).dt.strftime("%Y%m%d")
df = df.loc[:,["id","Date","Rainfall (mm)", "Station"]]
visualisation using plotly animation
import plotly.express as px
# reduce dates so figure builds in sensible time
gdf_px = gdf_grid_rainfall.loc[
    lambda d: d["Date"].isin(
        gdf_grid_rainfall["Date"].value_counts().sort_index().index[0:15]
    )
]
px.choropleth_mapbox(
    gdf_px,
    geojson=gdf_px.geometry,
    locations=gdf_px.index,
    color="Rainfall (mm)",
    hover_data=gdf_px.columns[1:].tolist(),
    animation_frame="Date",
    mapbox_style="carto-positron",
    center={"lat":gdf_px.unary_union.centroid.y, "lon":gdf_px.unary_union.centroid.x},
    zoom=8.5
).update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0, "pad": 4})

Related

geopandas doesn't find point in polygon even though it should?

I have some lat/long coordinates and need to confirm whether they are within the city of Atlanta, GA. I'm testing it out but it doesn't seem to work.
I got a geojson from here which appears to be legit:
https://gis.atlantaga.gov/?page=OPEN-DATA-HUB
import pandas as pd
import geopandas
atl = geopandas.read_file('Official_City_Boundary.geojson')
atl['geometry'] # this shows the image of Atlanta which appears correct
I plug in a couple of coordinates I got from Google Maps:
x = [33.75865421788594, -84.43974601192079]
y = [33.729117878816, -84.4017757998275]
z = [33.827871937500255, -84.39646813516548]
df = pd.DataFrame({'latitude': [x[0], y[0], z[0]], 'longitude': [x[1], y[1], z[1]]})
geometry = geopandas.points_from_xy(df.longitude, df.latitude)
points = geopandas.GeoDataFrame(geometry=geometry)
points
geometry
0 POINT (-84.43975 33.75865)
1 POINT (-84.40178 33.72912)
2 POINT (-84.39647 33.82787)
But when I check if the points are in the boundary, only one is true:
atl['geometry'].contains(points)
0 True
1 False
2 False
Why are they not all true? Am I doing it wrong?
I found some geometry similar to what you refer to.
An alternative approach is to use intersects() to test the contains relationship. Note the use of unary_union, as the Atlanta geometry I downloaded contains multiple polygons.
import pandas as pd
import geopandas
from pathlib import Path
atl = geopandas.read_file(Path.home().joinpath("Downloads").joinpath('Official_City_Council_District_Boundaries.geojson'))
atl['geometry'] # this shows the image of Atlanta which appears correct
x = [33.75865421788594, -84.43974601192079]
y = [33.729117878816, -84.4017757998275]
z = [33.827871937500255, -84.39646813516548]
df = pd.DataFrame({'latitude': [x[0], y[0], z[0]], 'longitude': [x[1], y[1], z[1]]})
geometry = geopandas.points_from_xy(df.longitude, df.latitude)
points = geopandas.GeoDataFrame(geometry=geometry, crs="epsg:4326")
points.intersects(atl.unary_union)
0 True
1 True
2 True
dtype: bool
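As the documentation says:
It does not check if an element of one GeoSeries contains any element
of the other one.
The comparison is element-wise: row 0 of atl is checked against row 0 of points, and rows without a matching partner come back False. So you should use a loop to check all points.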
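To make the "one boundary, many points" loop concrete, here is a minimal library-free sketch. It swaps in a plain ray-casting test instead of geopandas, and the rectangle `boundary`, the helper `point_in_polygon` and the fourth point are hypothetical stand-ins; the real city boundary is of course not a rectangle.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangle in (lon, lat) order standing in for the city boundary.
boundary = [(-84.55, 33.65), (-84.29, 33.65), (-84.29, 33.89), (-84.55, 33.89)]
# The three points from the question plus one hypothetical point far to the east.
points = [
    (-84.43975, 33.75865),
    (-84.40178, 33.72912),
    (-84.39647, 33.82787),
    (-83.0, 33.75),
]

# Test every point against the single boundary, one at a time.
results = [point_in_polygon(lon, lat, boundary) for lon, lat in points]
print(results)
```

The point is the shape of the loop: each point is compared with the same boundary, rather than point i being paired with boundary i.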

Changing the values of a dict to lowercase (values are colour codes) so they are accepted as a color parameter in plotly.graph_objects

So, I'm trying to get the colors from the dictionary 'disaster_type' to draw the markers in a Scattergeo plot depending on the type of disaster.
Basically, I want to represent the natural disasters in the graphic with their color codes, e.g. if it is a volcanic activity, paint it 'orange'. I want to change the size of the marker as well depending on the magnitude of the disaster, but that's for another day.
Here's the link to the dataset: https://www.kaggle.com/datasets/brsdincer/all-natural-disasters-19002021-eosdis
import plotly.graph_objects as go
import pandas as pd
import plotly as plt
df = pd.read_csv('1900_2021_DISASTERS - main.csv')
df.head()
df.tail()
disaster_set = {disaster for disaster in df['Disaster Type']}
disaster_type = {'Storm':'aliceblue',
'Volcanic activity':'orange',
'Flood':'royalblue',
'Mass movement (dry)':'darkorange',
'Landslide':'#C76114',
'Extreme temperature':'#FF0000',
'Animal accident':'gray55',
'Glacial lake outburst':'#7D9EC0',
'Earthquake':'#CD8C95',
'Insect infestation':'#EEE8AA',
'Wildfire':' #FFFF00',
'Fog':'#00E5EE',
'Drought':'#FFEFD5',
'Epidemic':'#00CD66 ',
'Impact':'#FF6347'}
# disaster_type_lower = {(k, v.lower()) for k, v in disaster_type.items()}
# print(disaster_type_lower)
# for values in disaster_type.values():
# disaster_type[values] = disaster_type.lowercase()
fig = go.Figure(data=go.Scattergeo(
lon = df['Longitude'],
lat = df['Latitude'],
text = df['Country'],
mode = 'markers',
marker_color = disaster_type_.values()
)
)
fig.show()
I can't figure out how; I've left in comments after the dict how I tried to do that.
It changes them to lowercase, but now I don't know how to get them... My brain is completely melted.
It's a simple case of pandas map().
I found data on Kaggle that appears to be the same as yours, so I have used that.
One type, Extreme temperature, is unmapped, so I used fillna("red") to avoid any errors.
gray55 gave me an error, so I replaced it with its RGB equivalent.
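The map-then-fillna pattern can be seen in isolation on a tiny example. This is only a sketch: the series `types`, the two-entry `palette` and the "Tsunami" value are hypothetical and are not taken from the Kaggle data.

```python
import pandas as pd

# Hypothetical miniature of the "Disaster Type" column.
types = pd.Series(["Flood", "Storm", "Tsunami", "Flood"])
palette = {"Flood": "royalblue", "Storm": "aliceblue"}

# map() looks each value up in the dict; values with no entry become NaN,
# which fillna() then replaces with a fallback colour.
colours = types.map(palette).fillna("red")
print(colours.tolist())
```

The result has one colour per row, which is the shape a per-marker colour argument needs, whereas the dict's `.values()` has one colour per disaster type.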
import kaggle.cli
import sys
import pandas as pd
from zipfile import ZipFile
import urllib
import plotly.graph_objects as go
# fmt: off
# download data set
url = "https://www.kaggle.com/brsdincer/all-natural-disasters-19002021-eosdis"
sys.argv = [sys.argv[0]] + f"datasets download {urllib.parse.urlparse(url).path[1:]}".split(" ")
kaggle.cli.main()
zfile = ZipFile(f'{urllib.parse.urlparse(url).path.split("/")[-1]}.zip')
dfs = {f.filename: pd.read_csv(zfile.open(f)) for f in zfile.infolist()}
# fmt: on
df = dfs["DISASTERS/1970-2021_DISASTERS.xlsx - emdat data.csv"]
disaster_type = {
    "Storm": "aliceblue",
    "Volcanic activity": "orange",
    "Flood": "royalblue",
    "Mass movement (dry)": "darkorange",
    "Landslide": "#C76114",
    "Extreme temperature": "#FF0000",
    "Animal accident": "#8c8c8c",  # gray55
    "Glacial lake outburst": "#7D9EC0",
    "Earthquake": "#CD8C95",
    "Insect infestation": "#EEE8AA",
    "Wildfire": "#FFFF00",
    "Fog": "#00E5EE",
    "Drought": "#FFEFD5",
    "Epidemic": "#00CD66",
    "Impact": "#FF6347",
}
fig = go.Figure(
    data=go.Scattergeo(
        lon=df["Longitude"],
        lat=df["Latitude"],
        text=df["Country"],
        mode="markers",
        # one colour per row; unmapped types fall back to red
        marker_color=df["Disaster Type"].map(disaster_type).fillna("red"),
    )
)
fig.show()

drop down menu with dash / plotly

How can I add a drop-down menu to this code to choose between "New Cases" and two other columns that I have in my CSV file?
# load in new csv to merge with geodata
import pandas as pd
df = pd.read_csv("ALLCOUNTRIES-PREDICTED.csv", header=0, encoding="utf-8")
import plotly.express as px
fig = px.choropleth(df,
                    locations="iso_alpha_3",
                    color="New Cases", # identify representing column
                    hover_name="Country", # identify country code column
                    animation_frame="Date", # identify date column
                    projection="equirectangular", # select projection
                    color_continuous_scale = 'Reds', # select prefer color scale
                    range_color=[0,10000] # select range of dataset
                    )
fig.show()
fig.write_html("example_map1.html")
The source is OWID COVID data, with columns renamed to be consistent with the column names in the question.
Core concept: build a figure for each column. Each figure contains traces (data), frames and a layout. The key is that each frame name must be unique, hence the addition of a suffix (a, b or c).
Integrate the three figures:
traces are simple: just the traces from the first figure
frames are relatively simple: all frames from all figures
layout: take the layout from the first figure, without the play/pause buttons
updatemenus is a drop-down of the required columns; its args are the sliders and coloraxis from the appropriate figure
I have used a different color scale for each column, and a different maximum for range_color for each column, calculated from the underlying data.
The play/pause buttons have been removed. They can be made to partially work using this concept: https://plotly.com/python/animations/#defining-button-arguments. However, you would then need to update updatemenus from within updatemenus, which does not really work given how static the updatemenus structure is.
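The per-column maximum for range_color is the mean, across dates, of each date's 75th percentile. A small worked example may make that arithmetic easier to follow; the toy frame and the names `per_day_q75` and `upper_bound` are hypothetical, not taken from the OWID data.

```python
import pandas as pd

# Hypothetical toy data: four countries on each of two dates.
df = pd.DataFrame(
    {
        "Date": ["d1"] * 4 + ["d2"] * 4,
        "New Cases": [0, 4, 8, 12, 0, 0, 0, 20],
    }
)

# 75th percentile of the column within each date (linear interpolation)...
per_day_q75 = df.groupby("Date")["New Cases"].quantile(0.75)
# ...then averaged over dates to give one upper bound for the colour range.
upper_bound = per_day_q75.mean()

print(per_day_q75.to_dict(), upper_bound)
```

Here the two daily values are 9.0 and 5.0, so the colour range would be [0, 7.0]. Using a quantile rather than the maximum keeps a single extreme value (the 20 above) from washing out the rest of the map.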
import pandas as pd
import io, requests
import plotly.express as px
import plotly.graph_objects as go
# get OWID COVID data
dfall = pd.read_csv(
    io.StringIO(
        requests.get(
            "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv"
        ).text
    )
)
# rename columns to match the question and filter to a few days
dfall["date"] = pd.to_datetime(dfall["date"])
df = dfall.rename(
    columns={
        "iso_code": "iso_alpha_3",
        "new_cases": "New Cases",
        "location": "Country",
        "date": "Date",
    }
).loc[lambda d: d["Date"].ge("1-nov-2021")]
df["Date"] = df["Date"].dt.strftime("%Y-%b-%d")
# three columns we're going to build choropleths from
cols = ["New Cases", "new_deaths", "new_vaccinations"]
# build figures for each of the required columns
# key technique is to append a suffix to the animation frame so each frame
# has its own name...
figs = [
    px.choropleth(
        df.assign(Date=lambda d: d["Date"] + f"~{suffix}"),
        locations="iso_alpha_3",
        color=c,  # identify representing column
        hover_name="Country",  # identify country code column
        animation_frame="Date",  # identify date column
        projection="equirectangular",  # select projection
        color_continuous_scale=color,  # select prefer color scale
        range_color=[
            0,
            df.groupby("Date")[c].quantile(0.75).mean(),
        ],  # select range of dataset
    )
    for c, color, suffix in zip(cols, ["Blues", "Reds", "Greens"], list("abc"))
]
# play / pause don't work as don't stop between columns..
layout = {
    k: v
    for k, v in figs[0].to_dict()["layout"].items()
    if k not in ["template", "updatemenus"]
}
# build figure from all frames, with layout excluding play/pause buttons
fig = go.Figure(
    data=figs[0].data, frames=[fr for f in figs for fr in f.frames], layout=layout
)
# finally build drop down menu...
fig = fig.update_layout(
    updatemenus=[
        {
            "buttons": [
                {
                    "label": c,
                    "method": "relayout",
                    "args": [
                        {
                            "coloraxis": col_fig.layout.coloraxis,
                            "sliders": col_fig.layout.sliders,
                        }
                    ],
                }
                for c, col_fig in zip(cols, figs)
            ]
        }
    ]
)
fig
dash / plotly solution
Using dash it becomes very simple: just build as many figures as there are columns.
A dropdown with a callback then just picks the appropriate figure.
import pandas as pd
import io, requests
import plotly.express as px
import plotly.graph_objects as go
import dash
from dash.dependencies import Input, Output, State
from jupyter_dash import JupyterDash
# get OWID COVID data
dfall = pd.read_csv(
    io.StringIO(
        requests.get(
            "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv"
        ).text
    )
)
# rename columns to match the question and filter to a few days
dfall["date"] = pd.to_datetime(dfall["date"])
df = dfall.rename(
    columns={
        "iso_code": "iso_alpha_3",
        "new_cases": "New Cases",
        "location": "Country",
        "date": "Date",
    }
).loc[lambda d: d["Date"].ge("1-nov-2021")]
df["Date"] = df["Date"].dt.strftime("%Y-%b-%d")
# three columns we're going to build choropleths from
cols = ["New Cases", "new_deaths", "new_vaccinations"]
# build figures for each of the required columns
figs = [
    px.choropleth(
        df,
        locations="iso_alpha_3",
        color=c,  # identify representing column
        hover_name="Country",  # identify country code column
        animation_frame="Date",  # identify date column
        projection="equirectangular",  # select projection
        color_continuous_scale=color,  # select prefer color scale
        range_color=[
            0,
            df.groupby("Date")[c].quantile(0.75).mean(),
        ],  # select range of dataset
    )
    for c, color in zip(cols, ["Blues", "Reds", "Greens"])
]
# Build App
app = JupyterDash(__name__)
app.layout = dash.html.Div(
    [
        dash.dcc.Dropdown(
            id="choropleth",
            options=[{"label": c, "value": i} for i, c in enumerate(cols)],
            value=0,
        ),
        dash.dcc.Graph(
            id="map",
        ),
    ]
)
# callback: return the pre-built figure for the selected column
@app.callback(Output("map", "figure"), Input("choropleth", "value"))
def updateGraph(id):
    if not id:
        return figs[0]
    return figs[int(id)]
# Run app and display result inline in the notebook
app.run_server(mode="inline")

Finding the distance between latlong

I am a bit stuck. I have a CSV which includes:
Site Name
Latitude
Longitude.
This CSV has 100,000 locations. I need to generate a comma separated list for each location, showing the other locations within 5KM
I have tried the attached, which transposes the table & gives me 100,000 columns with 100,000 rows and the distance populated as the result. But I am not sure how to just make a new pandas column which has a list of all the sites within 5KM.
Can you help?
from geopy.distance import geodesic
def distance(row, csr):
    lat = row['latitude']
    long = row['longitude']
    lat_long = (lat, long)
    try:
        return round(geodesic(lat_long, lat_long_compare).kilometers,2)
    except:
        return 9999
for key, value in d.items():
    lat_compare = value['latitude']
    long_compare = value['longitude']
    lat_long_compare = (lat_compare, long_compare)
    csr = key
    df[key] = df.apply([distance, csr], axis=1)
Some sample data can be:
destinations = { 'bigben' : {'latitude': 51.510357,
'longitude': -0.116773},
'heathrow' : {'latitude': 51.470020,
'longitude': -0.454295},
'alton_towers' : {'latitude': 52.987662716,
'longitude': -1.888829778}
}
bigben is 0.8KM from the London Eye
heathrow is 23.55KM from the London Eye
alton_towers is 204.63KM from the London Eye
So, in this case, the field should show only big ben.
So we get:
Site | Sites within 5KM
28, BigBen
Here is one way with NearestNeighbors.
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
# data from your input
df = pd.DataFrame.from_dict(destinations, orient='index').rename_axis('Site Name').reset_index()
radius = 50  # change to whatever, in km
# create the algorithm with the radius (in radians: km / Earth radius) and the haversine metric
neigh = NearestNeighbors(radius=radius/6371, metric='haversine')
# fit the data in radians
neigh.fit(df[['latitude', 'longitude']].to_numpy()*np.pi/180)
# extract result and transform to get the expected output
df[f'Site_within_{radius}km'] = (
    pd.Series(neigh.radius_neighbors()[1])  # get a list of index for each row
    .explode()
    .map(df['Site Name'])  # get the site name from row index
    .groupby(level=0)  # transform back to row-row relation
    .agg(list)  # can use ', '.join instead of list
)
print(df)
      Site Name   latitude  longitude Site_within_50km
0        bigben  51.510357  -0.116773       [heathrow]
1      heathrow  51.470020  -0.454295         [bigben]
2  alton_towers  52.987663  -1.888830            [nan]
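For checking results on a handful of sites, the haversine distance and the radius filter can also be written out by hand. This is a brute-force sketch, not the tree-based search used above: it compares every pair, so it would be far too slow for 100,000 locations. The helper names `haversine_km` and `sites_within` are hypothetical, and the 6371 km mean Earth radius is an assumption of the spherical model.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Sample data from the question.
destinations = {
    "bigben": {"latitude": 51.510357, "longitude": -0.116773},
    "heathrow": {"latitude": 51.470020, "longitude": -0.454295},
    "alton_towers": {"latitude": 52.987662716, "longitude": -1.888829778},
}

def sites_within(sites, radius_km):
    """Map each site name to the other sites no further than radius_km away."""
    return {
        name: [
            other
            for other, q in sites.items()
            if other != name
            and haversine_km(p["latitude"], p["longitude"],
                             q["latitude"], q["longitude"]) <= radius_km
        ]
        for name, p in sites.items()
    }

nearby = sites_within(destinations, 50)
print(nearby)
```

A site with no neighbours gets an empty list here, and `", ".join(names)` turns each list into the comma-separated string the question asks for.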
Another way
from sklearn.neighbors import DistanceMetric
from math import radians
import pandas as pd
import numpy as np
#To Radians
df['latitude'] = np.radians(df['latitude'])
df['longitude'] = np.radians(df['longitude'])
#Pair the cities
df[['latitude','longitude']].to_numpy()
#Assume a spherical radius of 6373
dist = DistanceMetric.get_metric('haversine')  # DistanceMetric class
df=pd.DataFrame(dist.pairwise(df[['latitude','longitude']].to_numpy())*6373,columns=df.index.unique(), index=df.index.unique())
s=df.gt(0)&df.le(50)
df['Site_within_50km']=s.agg(lambda x: x.index[x].values, axis=1)#Filter
                  bigben    heathrow  alton_towers Site_within_50km
bigben          0.000000   23.802459    203.857533       [heathrow]
heathrow       23.802459    0.000000    195.048961         [bigben]
alton_towers  203.857533  195.048961      0.000000               []

Bokeh BoxPlot > KeyError: 'the label [SomeCategory] is not in the [index]'

I'm attempting to create a BoxPlot using Bokeh. When I get to the section where I need to identify outliers, it fails if a given category has no outliers.
If I remove the "problem" category, the BoxPlot executes properly. It only fails when I attempt to create the BoxPlot with a category that has no outliers.
Any instruction on how to remedy this?
The failure occurs at the commented section "Prepare outlier data for plotting [...]"
import numpy as np
import pandas as pd
import datetime
import math
from bokeh.plotting import figure, show, output_file
from bokeh.models import NumeralTickFormatter
# Create time stamps to allow for figure to display span in title
today = datetime.date.today()
delta1 = datetime.timedelta(days=7)
delta2 = datetime.timedelta(days=1)
start = str(today - delta1)
end = str(today - delta2)
#Identify location of prices
itemloc = 'Everywhere'
df = pd.read_excel(r'C:\Users\me\prices.xlsx')
# Create a list from the dataframe that identifies distinct categories for the separate box plots
cats = df['subcategory_desc'].unique().tolist()
# Find the quartiles and IQR for each category
groups = df.groupby('subcategory_desc', sort=False)
q1 = groups.quantile(q=0.25)
q2 = groups.quantile(q=0.5)
q3 = groups.quantile(q=0.75)
iqr = q3 - q1
upper = q3 + 1.5*iqr
lower = q1 - 1.5*iqr
# Find the outliers for each category
def outliers(group):
    cat = group.name
    return group[(group.price > upper.loc[cat][0]) | (group.price < lower.loc[cat][0])]['price']
out = groups.apply(outliers).dropna()
# Prepare outlier data for plotting, we need coordinates for every outlier.
outx = []
outy = []
for cat in cats:
    # only add outliers if they exist
    if not out.loc[cat].empty:
        for value in out[cat]:
            outx.append(cat)
            outy.append(value)
I expect that the Box-and-whisker portion of categories with no outliers merely show up without the outlier dots.
Have you tried the code from official documentation, https://docs.bokeh.org/en/latest/docs/gallery/boxplot.html?
# prepare outlier data for plotting, we need coordinates for every outlier.
if not out.empty:
    outx = []
    outy = []
    for keys in out.index:
        outx.append(keys[0])
        outy.append(out.loc[keys[0]].loc[keys[1]])
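Another way to avoid looking up a category that has no outliers is to flag outliers row by row and read the coordinates straight from the filtered rows. This is an alternative sketch, not the code from the Bokeh gallery; the toy prices and the names `is_outlier` and `outlier_rows` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: category "A" has one outlier, category "B" has none.
df = pd.DataFrame(
    {
        "subcategory_desc": ["A"] * 5 + ["B"] * 5,
        "price": [1.0, 2.0, 3.0, 4.0, 100.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Per-row quartiles of the row's own category (aligned with df's index).
grouped = df.groupby("subcategory_desc", sort=False)["price"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# A row is an outlier if it falls outside the 1.5 * IQR fences.
is_outlier = (df["price"] > q3 + 1.5 * iqr) | (df["price"] < q1 - 1.5 * iqr)
outlier_rows = df[is_outlier]

# Coordinates for plotting; categories without outliers simply add nothing.
outx = outlier_rows["subcategory_desc"].tolist()
outy = outlier_rows["price"].tolist()
print(outx, outy)
```

Because nothing is indexed by category label, a category without outliers never triggers a KeyError; when no category has outliers, both lists are just empty.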
