Convert array of polygons to MultiPolygon - geospatial

I have an array of Polygons. I need to convert the array into a MultiPolygon.
["POLYGON ((-93.8153401599999 31.6253224010001, -93.8154545089999 31.613245482, -93.8256952309999 31.6133096470001, -93.8239846819999 31.6142335050001, -93.822649241 31.614534889, -93.819589744 31.6141266810001, -93.8187199179999 31.6145615630001, -93.818796329 31.6166099970001, -93.8191396409999 31.616805696, -93.822160944 31.6185287610001, -93.8259606669999 31.6195415540001, -93.827173805 31.6202834370001, -93.826861 31.621054014, -93.826721397 31.6210996090001, -93.825838469 31.621387795, -93.823763302 31.620645804, -93.8224278609999 31.620880388, -93.8207344099999 31.6214468590001, -93.817712918 31.621645233, -93.8171636009999 31.6218779230001, -93.8170138 31.622175612, -93.816896795 31.622408104, -93.816843193 31.622514901, -93.8172703129999 31.623758464, -93.817027909 31.6250143240001, -93.816942408 31.624910524, -93.8153401599999 31.6253224010001))", "POLYGON ((-93.827875499 31.6135011530001, -93.8276549939999 31.6133218590001, -93.830593683 31.613340276, -93.827860513 31.616556659, -93.825911348 31.6159317660001, -93.825861447 31.615915767, -93.826296355 31.6149087000001, -93.8272805829999 31.614407122, -93.827341685 31.6143140250001, -93.827875499 31.6135011530001))"]
I am using the following code to create the MultiPolygons with Apache Sedona:
select FID,ST_Multi(ST_GeomFromText(collect_list(polygon))) polygon_list group by 1
I am getting an error like "org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to org.apache.spark.unsafe.types.UTF8String". How can I overcome this issue? Can the same thing be achieved using GeoPandas or Shapely?

The answer given by @Antoine B is a very good attempt, but it won't work with polygons that have holes in them. There is another approach that works with such polygons, and the code is easier to comprehend.
from shapely.geometry import MultiPolygon
from shapely.wkt import loads
# List of strings representing polygons
poly_string = ["POLYGON ((-93.8153401599999 31.6253224010001, -93.8154545089999 31.613245482, -93.8256952309999 31.6133096470001, -93.8239846819999 31.6142335050001, -93.822649241 31.614534889, -93.819589744 31.6141266810001, -93.8187199179999 31.6145615630001, -93.818796329 31.6166099970001, -93.8191396409999 31.616805696, -93.822160944 31.6185287610001, -93.8259606669999 31.6195415540001, -93.827173805 31.6202834370001, -93.826861 31.621054014, -93.826721397 31.6210996090001, -93.825838469 31.621387795, -93.823763302 31.620645804, -93.8224278609999 31.620880388, -93.8207344099999 31.6214468590001, -93.817712918 31.621645233, -93.8171636009999 31.6218779230001, -93.8170138 31.622175612, -93.816896795 31.622408104, -93.816843193 31.622514901, -93.8172703129999 31.623758464, -93.817027909 31.6250143240001, -93.816942408 31.624910524, -93.8153401599999 31.6253224010001))", "POLYGON ((-93.827875499 31.6135011530001, -93.8276549939999 31.6133218590001, -93.830593683 31.613340276, -93.827860513 31.616556659, -93.825911348 31.6159317660001, -93.825861447 31.615915767, -93.826296355 31.6149087000001, -93.8272805829999 31.614407122, -93.827341685 31.6143140250001, -93.827875499 31.6135011530001))"]
# Create a list of polygons from the list of strings
all_pgons = [loads(pgon) for pgon in poly_string]
# Create the required multipolygon
multi_pgon = MultiPolygon(all_pgons)
This is a list of strings of polygons with holes.
# List of polygons with hole
poly_string = ['POLYGON ((1 2, 1 5, 4 4, 1 2), (1.2 3, 3 4, 1.3 4, 1.2 3))',
               'POLYGON ((11 12, 11 15, 14 14, 11 12), (11.2 13, 13 14, 11.3 14, 11.2 13))']
The code above also works well in this case.
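The question also asks whether GeoPandas can do this. It can; here is a minimal sketch, assuming GeoPandas >= 0.9 (for GeoSeries.from_wkt) and reusing the small polygons-with-holes example from above:
import geopandas as gpd
from shapely.geometry import MultiPolygon
# List of WKT strings (same polygons-with-holes example as above)
poly_string = ['POLYGON ((1 2, 1 5, 4 4, 1 2), (1.2 3, 3 4, 1.3 4, 1.2 3))',
               'POLYGON ((11 12, 11 15, 14 14, 11 12), (11.2 13, 13 14, 11.3 14, 11.2 13))']
# Parse the WKT strings into a GeoSeries, then combine its members into one MultiPolygon
gs = gpd.GeoSeries.from_wkt(poly_string)
multi_pgon = MultiPolygon(gs.tolist())
print(multi_pgon.wkt)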

A MultiPolygon is just a list of Polygons, so you need to reconstruct every Polygon in a list and then pass that list to MultiPolygon.
With the format of the strings you gave, I got it to work like this:
from shapely.geometry import Polygon, MultiPolygon
poly_string = ["POLYGON ((-93.8153401599999 31.6253224010001, -93.8154545089999 31.613245482, -93.8256952309999 31.6133096470001, -93.8239846819999 31.6142335050001, -93.822649241 31.614534889, -93.819589744 31.6141266810001, -93.8187199179999 31.6145615630001, -93.818796329 31.6166099970001, -93.8191396409999 31.616805696, -93.822160944 31.6185287610001, -93.8259606669999 31.6195415540001, -93.827173805 31.6202834370001, -93.826861 31.621054014, -93.826721397 31.6210996090001, -93.825838469 31.621387795, -93.823763302 31.620645804, -93.8224278609999 31.620880388, -93.8207344099999 31.6214468590001, -93.817712918 31.621645233, -93.8171636009999 31.6218779230001, -93.8170138 31.622175612, -93.816896795 31.622408104, -93.816843193 31.622514901, -93.8172703129999 31.623758464, -93.817027909 31.6250143240001, -93.816942408 31.624910524, -93.8153401599999 31.6253224010001))", "POLYGON ((-93.827875499 31.6135011530001, -93.8276549939999 31.6133218590001, -93.830593683 31.613340276, -93.827860513 31.616556659, -93.825911348 31.6159317660001, -93.825861447 31.615915767, -93.826296355 31.6149087000001, -93.8272805829999 31.614407122, -93.827341685 31.6143140250001, -93.827875499 31.6135011530001))"]
polygons = []
for poly in poly_string:
    coordinates = []
    for s in poly.split('('):
        if len(s.split(')')) > 1:
            for c in s.split(')')[0].split(','):
                coordinates.append((float(c.lstrip().split(' ')[0]),
                                    float(c.lstrip().split(' ')[1])))
    polygons.append(Polygon(coordinates))
multipoly = MultiPolygon(polygons)
The resulting MultiPolygon looks like this:
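To sanity-check the result (a small sketch continuing from the code above), you can look at the number of member polygons and the start of the WKT:
print(len(multipoly.geoms))   # 2 member polygons
print(multipoly.wkt[:40])     # begins with MULTIPOLYGON (((...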

I would try
select
FID,
ST_Multi(ST_Collect(ST_GeomFromText(polygon))) polygon_list
group by 1
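The original error most likely comes from the order of the calls in the question: collect_list(polygon) produces an array, which is then passed to ST_GeomFromText, a function that expects a single WKT string. Aggregating after ST_GeomFromText, as above, avoids that. For completeness, here is a rough, self-contained PySpark sketch of that query; the table name polygons_table, the sample rows, and the SedonaRegistrator call are assumptions (newer Sedona releases use SedonaContext instead, and ST_Collect requires a reasonably recent Sedona version with its jars on the classpath):
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator

# Start a local session and register Sedona's ST_* SQL functions
spark = SparkSession.builder.appName("multipolygon-demo").getOrCreate()
SedonaRegistrator.registerAll(spark)

# Hypothetical input: two polygons sharing the same FID
rows = [(1, "POLYGON ((1 2, 1 5, 4 4, 1 2))"),
        (1, "POLYGON ((11 12, 11 15, 14 14, 11 12))")]
spark.createDataFrame(rows, ["FID", "polygon"]).createOrReplaceTempView("polygons_table")

# Parse each WKT string first, then aggregate per FID and wrap as a MultiPolygon
result = spark.sql("""
    SELECT FID,
           ST_Multi(ST_Collect(ST_GeomFromText(polygon))) AS polygon_list
    FROM polygons_table
    GROUP BY FID
""")
result.show(truncate=False)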

Related

Surface triangulation and interpolation in python 3

I have 3 lists of equal length of x, y and z coordinates.
With them, I need to triangulate a surface, and retrieve values that lie in a line over that surface. In other words, I need the values that lie on that surface that intersect a given plane.
Problem is, I have no idea where to start.
I have tried scipy's interp2d, but it seems I need more z values than what I actually have (as shown in this answer: Python interpolation and extracting value of z for x and y?).
# this is the data I have
x = [0.0, 17.67599999997765, 49.08499999996275, 90.57299999985844, 136.60500000044703]
y = [0.0, 45.22349889159747, 66.50303846438841, 114.04427618243405, 187.7707039612985]
z = [0.0, 1.8700000000000045, 1.9539999999999509, 1.3929999999999154, 1.6299999999999955]
I need a final grid with x, y, z values that looks something like this (I don't really need too much resolution):
My desired final result is to be able to retrieve specific values on top of that surface, like the point line in this image:
I have also tried looking at geospatial libraries, but I couldn't find a solution either.
Maybe it's possible to interpolate the z values that I need? But I'm not really sure how to do this. I have never used scipy library before, and I'm still struggling to understand it.
I'm using python 3.9
You barely have any data, so if you don't choose your intersecting plane carefully, you'll get no results back (or nonsense back). This includes the case of x=y; you can't do that at all - so the graph you've shown is entirely inapplicable to your data.
import numpy as np
import scipy.interpolate
x = [0.0, 17.6759999999776500, 49.0849999999627500, 90.5729999998584400, 136.6050000004470300]
y = [0.0, 45.2234988915974700, 66.5030384643884100, 114.0442761824340500, 187.7707039612985000]
z = [0.0, 1.8700000000000045, 1.9539999999999509, 1.3929999999999154, 1.6299999999999955]
# Sample 200 points along the line y = 1.374*x, roughly following the data
xyi = np.empty((200, 2))
xyi[:, 0] = np.arange(200)
xyi[:, 1] = xyi[:, 0] * 1.374
# Interpolate z at those positions from the scattered (x, y, z) samples
zi = scipy.interpolate.griddata(
    points=(x, y), values=z,
    xi=xyi,
    method='cubic',
)
# griddata returns NaN outside the convex hull of the input points; drop those
good_vals = ~np.isnan(zi)
xyz = np.empty((np.count_nonzero(good_vals), 3))
xyz[:, :2] = xyi[good_vals, :]
xyz[:, 2] = zi[good_vals]
print(xyz)
[[0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.00000000e+00 1.37400000e+00 4.68988354e-02]
 [2.00000000e+00 2.74800000e+00 9.44855957e-02]
 [3.00000000e+00 4.12200000e+00 1.42698116e-01]
 [4.00000000e+00 5.49600000e+00 1.91474231e-01]
 ...
 [1.35000000e+02 1.85490000e+02 1.63560853e+00]
 [1.36000000e+02 1.86864000e+02 1.62956819e+00]]
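If the goal is the profile along that line (the "point line" the question mentions), here is a short follow-up sketch, continuing directly from the xyz array computed above:
import matplotlib.pyplot as plt
# Plot the interpolated z values along the sampled line, against x
plt.plot(xyz[:, 0], xyz[:, 2])
plt.xlabel("x")
plt.ylabel("interpolated z")
plt.show()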

Create dict from array or list OR just parse the list for X and Y coordinates

I have 3000 lines of data like this:
['OFFD.271818,271818,"LINESTRING (16.303895355263016 48.18772778239529, 16.304571765172827 48.18758202488568, 16.30482300975865 48.18755484403183, 16.305031079294384 48.187546649202545, 16.30536730486924 48.187533407177206, 16.307523452290432 48.18753396398144, 16.309072536732444 48.18748514596115, 16.312777938045286 48.18734458451529, 16.313426882251083 48.18727411748434, 16.315405366265555 48.186920966444205, 16.316609208646593 48.18670268519608, 16.317260447683868 48.18652861710351, 16.31853471535412 48.186166775088815)",U4,4,U-Bahn,']
I want using matplotlib to create a plot, but I need X and Y coordinates.
The target is: U4 (from the line)
Coordinates are:
16.303895355263016 48.18772778239529, 16.304571765172827 48.18758202488568, 16.30482300975865 48.18755484403183, 16.305031079294384 48.187546649202545, 16.30536730486924 48.187533407177206, 16.307523452290432 48.18753396398144, 16.309072536732444 48.18748514596115, 16.312777938045286 48.18734458451529, 16.313426882251083 48.18727411748434, 16.315405366265555 48.186920966444205, 16.316609208646593 48.18670268519608, 16.317260447683868 48.18652861710351, 16.31853471535412 48.186166775088815
I don't understand how to parse this string with numpy and create the dataset:
U4: coordinates for X: ... and for Y:....
Any hints?
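One possible approach, sketched below: pull the quoted WKT out of each row and let Shapely parse it (a plain regular expression would work just as well); the row layout is assumed to match the example above, and the string is shortened to its first three points here:
import numpy as np
import matplotlib.pyplot as plt
from shapely import wkt

# One raw row as in the question, shortened to three points for brevity
line = 'OFFD.271818,271818,"LINESTRING (16.303895355263016 48.18772778239529, 16.304571765172827 48.18758202488568, 16.30482300975865 48.18755484403183)",U4,4,U-Bahn,'

# The WKT sits between the double quotes; the target (U4) is the first field after it
wkt_part = line.split('"')[1]
target = line.split('"')[2].split(',')[1]

coords = np.array(list(wkt.loads(wkt_part).coords))   # shape (n_points, 2)
x, y = coords[:, 0], coords[:, 1]

plt.plot(x, y, label=target)
plt.legend()
plt.show()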

How to find the shortest distance between two line segments capturing the sign values with python

I have a pandas dataframe of the form:
benchmark_x benchmark_y ref_point_x ref_point_y
0 525039.140 175445.518 525039.145 175445.539
1 525039.022 175445.542 525039.032 175445.568
2 525038.944 175445.558 525038.954 175445.588
3 525038.855 175445.576 525038.859 175445.576
4 525038.797 175445.587 525038.794 175445.559
5 525038.689 175445.609 525038.679 175445.551
6 525038.551 175445.637 525038.544 175445.577
7 525038.473 175445.653 525038.459 175445.594
8 525038.385 175445.670 525038.374 175445.610
9 525038.306 175445.686 525038.289 175445.626
I am trying to find the shortest distance from the line to the benchmark such that if the line is above the benchmark the distance is positive and if it is below the benchmark the distance is negative. See image below:
I used the KDTree from scipy like so:
from scipy.spatial import KDTree
tree=KDTree(df[["benchmark_x", "benchmark_y"]])
test = df.apply(lambda row: tree.query(row[["ref_point_x", "ref_point_y"]]), axis=1)
test=test.apply(pd.Series, index=["distance", "index"])
This seems to work, except that it fails to capture the negative values when the line is below the benchmark.
import numpy as np
import pandas as pd

# recreating your example
columns = "benchmark_x benchmark_y ref_point_x ref_point_y".split(" ")
data = """525039.140 175445.518 525039.145 175445.539
525039.022 175445.542 525039.032 175445.568
525038.944 175445.558 525038.954 175445.588
525038.855 175445.576 525038.859 175445.576
525038.797 175445.587 525038.794 175445.559
525038.689 175445.609 525038.679 175445.551
525038.551 175445.637 525038.544 175445.577
525038.473 175445.653 525038.459 175445.594
525038.385 175445.670 525038.374 175445.610
525038.306 175445.686 525038.289 175445.626"""
data = [float(x) for x in data.replace("\n"," ").split(" ") if len(x)>0]
arr = np.array(data).reshape(-1,4)
df = pd.DataFrame(arr, columns=columns)
# adding your two new columns to the df
from scipy.spatial import KDTree
tree=KDTree(df[["benchmark_x", "benchmark_y"]])
df["distance"], df["index"] = tree.query(df[["ref_point_x", "ref_point_y"]])
Now, to compare whether one line is above the other, we have to evaluate y at the same x position. Therefore we need to interpolate the y points for the x positions of the other line.
df = df.sort_values("ref_point_x") # sorting is required for interpolation
xy_refpoint = df[["ref_point_x", "ref_point_y"]].values
df["ref_point_y_at_benchmark_x"] = np.interp(df["benchmark_x"], xy_refpoint[:,0], xy_refpoint[:,1])
And finally your criterion can be evaluated and applied:
df["distance"] = np.where(df["ref_point_y_at_benchmark_x"] < df["benchmark_y"], -df["distance"], df["distance"])
# or swap the < for >, <=, or >= as needed

Overwrite GPS coordinates in Image Exif using Python 3.6

I am trying to transform image geotags so that images and ground control points lie in the same coordinate system inside my software (Pix4D mapper).
The answer here says:
Exif data is standardized, and GPS data must be encoded using
geographical coordinates (minutes, seconds, etc) described above
instead of a fraction. Unless it's encoded in that format in the exif
tag, it won't stick.
Here is my code:
import os, piexif, pyproj
from PIL import Image
img = Image.open(os.path.join(dirPath,fn))
exif_dict = piexif.load(img.info['exif'])
breite = exif_dict['GPS'][piexif.GPSIFD.GPSLatitude]
lange = exif_dict['GPS'][piexif.GPSIFD.GPSLongitude]
breite = breite[0][0] / breite[0][1] + breite[1][0] / (breite[1][1] * 60) + breite[2][0] / (breite[2][1] * 3600)
lange = lange[0][0] / lange[0][1] + lange[1][0] / (lange[1][1] * 60) + lange[2][0] / (lange[2][1] * 3600)
print(breite) #48.81368778730952
print(lange) #9.954511162420633
x, y = pyproj.transform(wgs84, gk3, lange, breite) #from WGS84 to GaussKrüger zone 3
print(x) #3570178.732528623
print(y) #5408908.20172699
exif_dict['GPS'][piexif.GPSIFD.GPSLatitude] = [ ( (int)(round(y,6) * 1000000), 1000000 ), (0, 1), (0, 1) ]
exif_bytes = piexif.dump(exif_dict) #error here
img.save(os.path.join(outPath,fn), "jpeg", exif=exif_bytes)
I am getting struct.error: argument out of range in the dump method. The original GPSInfo tag looks like: {0: b'\x02\x03\x00\x00', 1: 'N', 2: ((48, 1), (48, 1), (3449322402, 70000000)), 3: 'E', 4: ((9, 1), (57, 1), (1136812930, 70000000)), 5: b'\x00', 6: (3659, 10)}
I am guessing I have to offset the values and encode them properly before writing, but have no idea what is to be done.
It looks like you are already using PIL and Python 3.x. I'm not sure if you want to continue using piexif, but either way, you may find it easier to convert the degrees, minutes, and seconds into a decimal value first. It looks like you are trying to do that already, but putting it in a separate function may be clearer and lets you account for the direction reference.
Here's an example:
def get_decimal_from_dms(dms, ref):
    degrees = dms[0][0] / dms[0][1]
    minutes = dms[1][0] / dms[1][1] / 60.0
    seconds = dms[2][0] / dms[2][1] / 3600.0
    if ref in ['S', 'W']:
        degrees = -degrees
        minutes = -minutes
        seconds = -seconds
    return round(degrees + minutes + seconds, 5)

def get_coordinates(geotags):
    lat = get_decimal_from_dms(geotags['GPSLatitude'], geotags['GPSLatitudeRef'])
    lon = get_decimal_from_dms(geotags['GPSLongitude'], geotags['GPSLongitudeRef'])
    return (lat, lon)
The geotags in this example is a dictionary with the GPSTAGS as keys instead of the numeric codes for readability. You can find more detail and the complete example from this blog post: Getting Started with Geocoding Exif Image Metadata in Python 3
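Going the other way (writing a decimal value back into the EXIF tag) is where the struct.error in the question most likely comes from: each DMS component must fit into an unsigned 32-bit rational, and round(y, 6) * 1000000 for a Gauss-Krüger coordinate of roughly 5,408,908 metres is far too large (and GPS tags expect geographical degrees anyway, per the quote in the question). Here is a rough sketch of the reverse conversion for a decimal degree value; the helper name is mine, not part of piexif:
def decimal_to_dms(value, precision=1000000):
    """Convert a decimal coordinate into piexif-style ((deg, 1), (min, 1), (sec*p, p))."""
    value = abs(value)
    degrees = int(value)
    minutes = int((value - degrees) * 60)
    seconds = (value - degrees - minutes / 60.0) * 3600.0
    return ((degrees, 1), (minutes, 1), (int(round(seconds * precision)), precision))

# e.g. decimal_to_dms(48.81368778730952) -> ((48, 1), (48, 1), (49276034, 1000000))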
After much hemming and hawing I reached the pages of the py3exiv2 image metadata manipulation library. You will find exhaustive lists of the metadata tags as you read through, but here is the list of EXIF tags, just to save a few clicks.
It runs smoothly on Linux and offers many ways to edit image headers. The documentation is also quite clear. I recommend this as a solution and am interested to know whether it solves everyone else's problems as well.

scikit-learn roc_curve: why does it return a threshold value = 2 some time?

Correct me if I'm wrong: the "thresholds" returned by scikit-learn's roc_curve should be an array of numbers in [0, 1]. However, it sometimes gives me an array with the first number close to "2". Is it a bug, or did I do something wrong? Thanks.
In [1]: import numpy as np
In [2]: from sklearn.metrics import roc_curve
In [3]: np.random.seed(11)
In [4]: aa = np.random.choice([True, False],100)
In [5]: bb = np.random.uniform(0,1,100)
In [6]: fpr,tpr,thresholds = roc_curve(aa,bb)
In [7]: thresholds
Out[7]:
array([ 1.97396826, 0.97396826, 0.9711752 , 0.95996265, 0.95744405,
0.94983331, 0.93290463, 0.93241372, 0.93214862, 0.93076592,
0.92960511, 0.92245024, 0.91179548, 0.91112166, 0.87529458,
0.84493853, 0.84068543, 0.83303741, 0.82565223, 0.81096657,
0.80656679, 0.79387241, 0.77054807, 0.76763223, 0.7644911 ,
0.75964947, 0.73995152, 0.73825262, 0.73466772, 0.73421299,
0.73282534, 0.72391126, 0.71296292, 0.70930102, 0.70116428,
0.69606617, 0.65869235, 0.65670881, 0.65261474, 0.6487222 ,
0.64805644, 0.64221486, 0.62699782, 0.62522484, 0.62283401,
0.61601839, 0.611632 , 0.59548669, 0.57555854, 0.56828967,
0.55652111, 0.55063947, 0.53885029, 0.53369398, 0.52157349,
0.51900774, 0.50547317, 0.49749635, 0.493913 , 0.46154029,
0.45275916, 0.44777116, 0.43822067, 0.43795921, 0.43624093,
0.42039077, 0.41866343, 0.41550367, 0.40032843, 0.36761763,
0.36642721, 0.36567017, 0.36148354, 0.35843793, 0.34371331,
0.33436415, 0.33408289, 0.33387442, 0.31887024, 0.31818719,
0.31367915, 0.30216469, 0.30097917, 0.29995201, 0.28604467,
0.26930354, 0.2383461 , 0.22803687, 0.21800338, 0.19301808,
0.16902881, 0.1688173 , 0.14491946, 0.13648451, 0.12704826,
0.09141459, 0.08569481, 0.07500199, 0.06288762, 0.02073298,
0.01934336])
Most of the time these thresholds are not used, for example in calculating the area under the curve, or plotting the False Positive Rate against the True Positive Rate.
Yet to plot what looks like a reasonable curve, one needs a threshold at which no data points are predicted positive. Since scikit-learn's ROC curve function does not require normalised probabilities as thresholds (any score is fine), setting this point's threshold to 1 isn't sufficient; setting it to inf would be sensible, but coders often expect finite data (and the implementation should also work for integer thresholds). Instead the implementation uses max(score) + epsilon, where epsilon = 1. This may be cosmetically awkward, but you haven't given any reason why it's actually a problem.
From the documentation:
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute
fpr and tpr. thresholds[0] represents no instances being predicted
and is arbitrarily set to max(y_score) + 1.
So the first element of thresholds is close to 2 because it is max(y_score) + 1, in your case thresholds[1] + 1.
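A quick way to confirm this on the example from the question (a small sketch; note that newer scikit-learn releases set the first threshold to np.inf rather than max(y_score) + 1):
import numpy as np
from sklearn.metrics import roc_curve

np.random.seed(11)
aa = np.random.choice([True, False], 100)
bb = np.random.uniform(0, 1, 100)
fpr, tpr, thresholds = roc_curve(aa, bb)

# At the first threshold no sample is predicted positive, so fpr and tpr start at 0
print(thresholds[0], bb.max() + 1)   # ~1.97396826 for both on older releases (np.inf on newer ones)
print(fpr[0], tpr[0])                # 0.0 0.0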
This seems like a bug to me: in roc_curve(aa, bb), 1 is added to the first threshold. You should create an issue here: https://github.com/scikit-learn/scikit-learn/issues
