GeoJSON file not being rendered using Folium - python-3.x

I am trying to create a choropleth map using Folium. I exported a GeoJSON file for London boroughs from an official GIS shapefile. After hours of researching possible causes, I noticed that the keys inside each feature come up in a different order in my file compared to another GeoJSON file that works, which I assume is the reason nothing appears on the map. Basically the order in mine is something like
"features": [
"geometry": {...},
"properties": {...}, etc
and the working GeoJSON has
"features": [
"properties": {...},
"geometry": {...},
My question is: how can I change the order of those keys, or otherwise make the file render with Folium?
The code for creating the map is as follows:
london = r'london_simple.json' # geojson file
# create a plain London map
london_map = folium.Map(location=[51.5074, 0.1278], zoom_start=10)
london_map.choropleth(
    geo_data=london,
    data=dfl1,
    columns=['Area_name', 'GLA_Population_Estimate_2017'],
    key_on='feature.properties.Counties_1',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Population size in London'
)
london_map
I'm working in a Jupyter notebook on IBM Watson, if that makes any difference. If I use my GeoJSON file, no choropleth regions appear. If I switch to the other file, it works (provided I change the map coordinates to that file's region, [37.7749, -122.4194]).
My code doesn't generate any error, just the plain map focused on London without the choropleth regions.
Link to working geojson
Link to my problematic geojson
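For reference, this is roughly how I compared the two files (a minimal check; the file name is the same as in the code above):
import json

# peek at the first feature: key order and geometry type
with open('london_simple.json') as f:
    feature = json.load(f)['features'][0]

print(list(feature.keys()))          # geometry comes before properties in my file
print(feature['geometry']['type'])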

Have you tried this instead?
key_on='feature.properties.Counties_a'
I think the code beginning with E should identify the relevant part of the shapefile.
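It can also help to print the property names the GeoJSON actually exposes, so key_on can be matched against them (a quick check, assuming the file name from the question):
import json

# list the property names available for key_on
with open('london_simple.json') as f:
    geo = json.load(f)

print(list(geo['features'][0]['properties'].keys()))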

There seem to be a few problems with the GeoJSON file for London:
The coordinate values are not longitude/latitude. They contain projected values such as [532946.0999999996, 181894.90000000037], which look like British National Grid eastings/northings, when Folium expects something like [ -0.042770, 51.531530 ].
There also seem to be too few coordinate values to draw the borough polygons.
Searching for an alternative GeoJSON for London, I found a working one here.
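If you would rather keep your own export than switch files, one option is to re-project it to WGS84, for example with geopandas (a rough sketch, not tested against your exact file; the output file name is a placeholder):
import geopandas as gpd

gdf = gpd.read_file('london_simple.json')
gdf = gdf.set_crs(epsg=27700, allow_override=True)  # declare the British National Grid CRS if it is missing
gdf = gdf.to_crs(epsg=4326)                         # convert to longitude/latitude (WGS84)
gdf.to_file('london_wgs84.json', driver='GeoJSON')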

Related

The best and simple way to convert labeled text classification data to spaCy v3 format

Let's suppose we have labeled data for text classification in a nice CSV file, with 2 columns: "text" and "label". I am struggling to understand the spaCy v3 documentation. If I understand correctly, the main source of examples for spaCy v3 is the projects repo (https://github.com/explosion/projects/tree/v3/tutorials).
However, there the training data are already prepared in the expected nested JSON structure.
If I want to perform custom text classification in spaCy v3, I need to convert the data to the example structure, e.g. like here (https://github.com/explosion/projects/blob/v3/tutorials/textcat_docs_issues/assets/docs_issues_eval.jsonl).
How do I get from a pandas data frame to that format? Does Prodigy support converting labeled data to the spaCy format? Here is a small example of the dataset:
pd.DataFrame({
    "TEXT": [
        "i really like this post",
        "thanks for that comment",
        "i enjoy this friendly forum",
        "this is a bad post",
        "i dislike this article",
        "this is not well written",
        "who came up with this stupid idea?",
        "This is just completely wrong!!",
        "Get out of here now!!!!"],
    "LABEL": [
        "POS", "POS", "POS", "NEG", "NEG", "NEG", "RUDE", "RUDE", "RUDE"
    ]
})
In spaCy v3, training data is basically what you want your output to look like. So for text classification you create a Doc from your text and then set doc.cats for your categories. After you've trained a model, it will do the same thing for new docs.
Whether your data is in a dataframe or not is irrelevant. You just need to iterate over the underlying values.
You can do something like this:
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
texts = ...   # list of texts
labels = ...  # aligned list of labels
doc_bin = DocBin()
for text, label in zip(texts, labels):
    doc = nlp(text)
    doc.cats[label] = True
    doc_bin.add(doc)
doc_bin.to_disk("train.spacy")  # save the docs as a DocBin (.spacy file)
The project you linked to has a similar script here.
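To connect this to the data frame in the question, the texts and labels are just the column values (assuming the frame is named df and has the TEXT/LABEL columns shown above):
# hypothetical glue: df is the example data frame from the question
texts = df["TEXT"].tolist()
labels = df["LABEL"].tolist()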

How to make a wind quiver plot on a geospatial map using plotly?

I have a series of wind datasets with u/v components and am trying to make a quiver (wind vector) plot on a geospatial map using plotly. However, I am still having trouble making it work. Could someone please help me figure this out?
Following is my code:
###- This is operating on Jupyter lab with dash extension
###- lon/lat/u/v are 2-D numpy array
fig = ff.create_quiver(lon, lat, u, v,
                       scale=.25,
                       arrow_scale=.4,
                       name='quiver',
                       line=dict(width=1))
fig['layout'].update(title='Quiver Plot')
fig['layout'].update(geo=dict(
    resolution=50,
    scope='usa',
    showframe=False,
    showcoastlines=True,
    showland=True,
    landcolor="lightgray",
    countrycolor="white",
    coastlinecolor="white",
    projection=dict(type='equirectangular'),
    domain=dict(x=[0, 1], y=[0, 1])
))
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div(children=[
    html.H1(children='Hello Dash'),
    dcc.Graph(
        id='example-graph-0',
        figure=fig),
])
viewer.show(app)
You will not be able to use annotations on a scattermapbox while retaining the geo position of the arrows.
See my solution on how to find the arrow points and add them to the map as traces.
If you have more than a handful of traces, even 40-50 traces or arrows, performance will plummet.
Unfortunately this is a known issue with plotly which, in my opinion, makes it basically unusable for this purpose.
You can look into other map APIs such as Google Maps or Apple Maps. I tested Google Maps in an Angular app and performance was great.
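As a minimal sketch of that idea: build the quiver segments with ff.create_quiver, then re-plot them on a geo axis so the arrows keep their lon/lat positions. The lon/lat/u/v arrays below are dummy stand-ins for the real wind data, and all values are assumed to already be in degrees:
import numpy as np
import plotly.figure_factory as ff
import plotly.graph_objects as go

# dummy grid and wind components (placeholders for the real dataset)
lon, lat = np.meshgrid(np.linspace(-120, -80, 8), np.linspace(30, 50, 6))
u = np.cos(np.radians(lat))
v = np.sin(np.radians(lon))

quiver = ff.create_quiver(lon.ravel(), lat.ravel(), u.ravel(), v.ravel(),
                          scale=0.5, arrow_scale=0.4)

# the single trace in the quiver figure holds all barb/arrow line segments
# (separated by None values), so its x/y can be reused as lon/lat
seg_x, seg_y = quiver.data[0].x, quiver.data[0].y

fig = go.Figure(go.Scattergeo(lon=seg_x, lat=seg_y, mode='lines',
                              line=dict(width=1), name='quiver'))
fig.update_geos(scope='usa', showland=True, landcolor='lightgray',
                showcoastlines=True, coastlinecolor='white')
fig.update_layout(title='Quiver Plot on a geo map')
fig.show()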

Cleaning up a column based on spelling? Pandas

I've got two very important, user-entered information columns in my data frame. They are mostly cleaned up except for one issue: the spelling and the way the names are written differ. For example, I have several entries for one name: "red rocks canyon", "redrcks", "redrock canyon", "red rocks canyons". This data set is too large for me to go through and clean manually (2 million entries). Are there any strategies to clean these features up with code?
I would look into doing phonetic string matching here. The basic idea behind this approach is to obtain a phonetic encoding for each entered string, and then group spelling variations by their encoding. Then, you could choose the most frequent variation in each group to be the "correct" spelling.
There are several different variations on phonetic encoding, and a great package in Python for trying some of them out is jellyfish. Here is an example of how to use it with the Soundex encoding:
import jellyfish
import pandas as pd
data = pd.DataFrame({
    "name": [
        "red rocks canyon",
        "redrcks",
        "redrock canyon",
        "red rocks canyons",
        "bosque",
        "bosque escoces",
        "bosque escocs",
        "borland",
        "borlange"
    ]
})
data["soundex"] = data.name.apply(lambda x: jellyfish.soundex(x))
print(data.groupby("soundex").agg({"name": lambda x: ", ".join(x)}))
This prints:
name
soundex
B200 bosque
B222 bosque escoces, bosque escocs
B645 borland, borlange
R362 red rocks canyon, redrcks, redrock canyon, red...
This definitely won't be perfect and you'll have to be careful as it might group things too aggressively, but I hope it gives you something to try!
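To finish the job, you could then map every variant to the most frequent spelling within its soundex group and use that as the cleaned value. A sketch continuing the example above (with every count equal to one here, ties are broken arbitrarily):
canonical = (
    data.groupby("soundex")["name"]
        .agg(lambda names: names.value_counts().idxmax())  # most frequent spelling per group
)
data["clean_name"] = data["soundex"].map(canonical)
print(data[["name", "clean_name"]])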

Read shapefile attributes using talend

I am using the spatial plug-ins for TOS to perform the following task:
I have a dataset with X and Y coordinates. I also have a shapefile with multipolygons and two metadata attributes, name and Id. The idea is to look up the names in the shapefile using the coordinates: a point-in-polygon test will determine which polygon a point belongs to.
I am using the shapefile input component which points to the .shp file.
I am facing two hurdles:
I cannot retrieve the name and Id from the file. I can only see an attribute called the_geom. How can I read the metadata?
The second thing is, the file contains multipolygons and I don't know how to iterate over them in order to perform a contains or intersects test with the points.
Any comment will be highly appreciated.
Thanks for your input, @chrki.
I managed to solve my tasks in this way:
1) Create a generic schema under metadata. As the .dbf file was in the same directory as the shapefile, Talend automatically recognized the metadata.
2) This is the job overview.
3) I read the shapefile using an sShapeFileInput component.
4) The shapefile contains multipolygons and I want polygons, so I used an sSimplify component with the default settings.
5) The projection of the shapefile was "MGI / Austria Lambert", which corresponds to EPSG 31287. I re-projected it to EPSG 4326 (GCS_WGS_1984), which is the one used by my input coordinates.
6) I read the x, y coordinates from a CSV file.
7) With an s2DPointReplacer I converted the x, y coordinates into Point(x,y) (WKT).
8) Finally, I created an expression in a tMap to keep only the polygons and points with an intersection. I guess a "contains" would also work; a rough Python equivalent of the whole lookup is sketched below for reference.
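For anyone without the spatial plug-ins, the same point-in-polygon lookup can be sketched in Python with geopandas (file names and the X/Y column names are placeholders, not taken from the original job):
import geopandas as gpd
import pandas as pd

polygons = gpd.read_file("polygons.shp")   # the .dbf attributes (name, Id) are loaded automatically
polygons = polygons.to_crs(epsg=4326)      # re-project from EPSG 31287 to EPSG 4326

points = pd.read_csv("points.csv")
points = gpd.GeoDataFrame(points,
                          geometry=gpd.points_from_xy(points["X"], points["Y"]),
                          crs="EPSG:4326")

# point-in-polygon: attach each containing polygon's attributes to the points
joined = gpd.sjoin(points, polygons, how="left", predicate="within")
print(joined[["X", "Y", "name", "Id"]].head())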
I hope this helps someone else.
Kind regards,
Paul

Basic importing coordinates into R and setting projection

Ok, I am trying to upload a .csv file, get it into a spatial points data frame, and set the projection system to WGS 84. I then want to determine the distance between each point. This is what I have come up with:
cluster<-read.csv(file = "cluster.csv", stringsAsFactors=FALSE)
coordinates(cluster)<- ~Latitude+Longitude
cluster<-CRS("+proj=longlat +datum=WGS84")
d<-dist2Line(cluster)
This returns an error that says
Error in .pointsToMatrix(p) :
points should be vectors of length 2, matrices with 2 columns, or inheriting from a SpatialPoints* object
But this isn't working and I will be honest that I don't fully comprehend importing and manipulating spatial data in R. Any help would be great. Thanks
I was able to determine the issue I was running into. With WGS 84, the longitude comes before the latitude. This is just backwards from how all the GPS data I download is formatted (e.g. lat-long). Hope this helps anyone else who runs into this issue!
Thus the code should have been:
library(sp)
cluster <- read.csv(file = "cluster.csv", stringsAsFactors = FALSE)
coordinates(cluster) <- ~Longitude+Latitude                 # x (longitude) first, then y (latitude)
proj4string(cluster) <- CRS("+proj=longlat +datum=WGS84")   # assign the CRS instead of overwriting cluster
