Apache Superset - Problems with the display of shapefile data - geometry

I have loaded a shapefile into my PostgreSQL database, and this is how it looks visually.
https://i.stack.imgur.com/sRSWz.png
I added the same shapefile to Apache Superset from the PostgreSQL database; when I choose the deck.gl Path visualization and use the geometry column (with "polyline" as the Lines Encoding), it looks like this.
https://i.stack.imgur.com/JlKC2.png

Related

Read temperature, humidity, etc. from grib2 files with ecCodes in python3

I am trying to use ecCodes in Python to get various weather information, such as temperature and humidity, out of grib2 files. I am using the GFS files. I would like to be able to extract the data as (lat,lon,alt,$data_point), and as a 2D array for each altitude.
I have tried the example programs located here: https://confluence.ecmwf.int/display/ECC/grib_iterator_bitmap
I can't figure out what I am looking at in the output of that program. When I load the messages using their keys, it is not obvious how to build a grid. When I load the grid, the data doesn't have labels I understand.
@craeft have a look at https://github.com/ecmwf/cfgrib. cfgrib is the new standard for handling GRIB files in Python. It is easy to install and makes the files easy to access. Please install the latest version, because it supports GFS files.
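For example, a minimal cfgrib sketch (the file name is an assumption; filter_by_keys narrows a GFS file, which mixes many level types, down to a single hypercube):
import xarray as xr  # cfgrib plugs into xarray as an engine

# Open the 2 m above-ground fields from a GFS GRIB2 file (file name is hypothetical)
ds = xr.open_dataset(
    "gfs.t00z.pgrb2.0p25.f000",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "heightAboveGround", "level": 2}},
)
print(ds["t2m"])  # 2 m temperature as a labelled lat/lon grid
df = ds.to_dataframe().reset_index()  # rows of (lat, lon, value) for point-wise use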

THREDDS & METAR Plotting with Python

I'm having trouble trying to plot METAR data from the THREDDS data server, which comes in .nc format, onto a map. I'm using Siphon to grab the data and xarray to open the dataset with remote_access, but the problem comes when attempting to convert the data to .csv or .txt. When using netCDF4's Dataset and attempting to read the file, I get a "file does not exist" error. I'm not really sure where to start, as there doesn't seem to be much on the internet about this. Any help would be appreciated, as I'm also really new to this.
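A minimal Siphon sketch of that workflow (the catalog URL is an assumption, and this presumes the dataset opens cleanly through remote_access; the point is that no .nc file needs to be downloaded before converting to CSV):
from siphon.catalog import TDSCatalog

# The catalog URL is an assumption; point it at the METAR catalog you are using.
catalog = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/nws/metar/ncdecoded/catalog.xml")
dataset = list(catalog.datasets.values())[0]

ds = dataset.remote_access(use_xarray=True)  # open remotely as an xarray Dataset
df = ds.to_dataframe().reset_index()  # flatten to a table
df.to_csv("metar.csv", index=False)  # or a .txt via sep="\t"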

Read shapefile attributes using talend

I am using the spatial plug-ins for TOS to perform the following task:
I have a dataset with X and Y coordinates. I also have a shapefile with multi-polygons and two metadata attributes, name and Id. The idea is to look up the names in the shapefile using the coordinates; a point-in-polygon test will determine which polygon a point belongs to.
I am using the shapefile input component, which points to the .shp file.
I am facing two hurdles:
I cannot retrieve the name and Id from the file. I can only see an attribute called the_geom. How can I read the metadata?
The second thing is that the file contains multi-polygons and I don't know how to iterate over them in order to perform a Contains or Intersects test with the points.
Any comment will be highly appreciated.
Thanks for your input @chrki
I managed to solve my task in this way:
1) Create a generic schema under metadata. As the .dbf file was in the same directory as the shapefile, Talend automatically recognized the metadata.
2) This is the job overview.
3) I read the shapefile using a sShapeFileInput component.
4) The shapefile contains multi-polygons and I want polygons. My solution was to use a sSimplify component with the default settings.
5) The projection of the shapefile was "MGI / Austria Lambert", which corresponds to EPSG 31287. I re-projected it to EPSG 4326 (GCS_WGS_1984), which is the one used by my input coordinates.
6) I read the x, y coordinates from a CSV file.
7) With a s2DPointReplacer I converted the x, y coordinates to Point(x,y) (WKT).
8) Finally, I created an expression in a tMap to keep only the polygons and points with an intersection. I guess a "contains" would also work.
I hope this helps someone else.
Kind regards,
Paul
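For readers outside Talend, the same pipeline (re-projection plus point-in-polygon) can be sketched in Python with GeoPandas; the file names and CSV column names below are assumptions:
import pandas as pd
import geopandas as gpd

# GeoPandas reads the .dbf attributes (name, Id) together with the geometry.
# Re-project MGI / Austria Lambert (EPSG 31287) to WGS 84 (EPSG 4326).
polygons = gpd.read_file("regions.shp").to_crs(epsg=4326)

# Build point geometries from the x, y columns of the CSV (names are assumptions).
xy = pd.read_csv("points.csv")
points = gpd.GeoDataFrame(xy, geometry=gpd.points_from_xy(xy.x, xy.y), crs="EPSG:4326")

# Spatial join: keep only points falling inside a polygon, carrying name and Id along.
joined = gpd.sjoin(points, polygons[["name", "Id", "geometry"]], predicate="within")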

Get HDFS file path in PySpark for files in sequence file format

My data on HDFS is in SequenceFile format. I am using PySpark (Spark 1.6) and trying to achieve two things:
The data path contains a timestamp in yyyy/mm/dd/hh format that I would like to bring into the data itself. I tried SparkContext.wholeTextFiles, but I think that might not support the SequenceFile format.
How do I deal with the point above if I want to crunch data for a day and bring the date into the data? In that case I would be loading data using a yyyy/mm/dd/* pattern.
Appreciate any pointers.
If the stored types are compatible with SQL types and you use Spark 2.0, it is quite simple. Import input_file_name:
from pyspark.sql.functions import input_file_name
Read file and convert to a DataFrame:
df = sc.sequenceFile("/tmp/foo/").toDF()
Add file name:
df = df.withColumn("input", input_file_name())
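To pull the yyyy/mm/dd/hh timestamp out of that path into the data itself, one option is regexp_extract (the pattern is an assumption based on the layout described in the question):
from pyspark.sql.functions import regexp_extract

# Extract "yyyy/mm/dd/hh" from the file path added above; the regex is an assumption.
df = df.withColumn("ts", regexp_extract("input", r"(\d{4}/\d{2}/\d{2}/\d{2})", 1))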
If this solution is not applicable in your case, then the universal one is to list the files directly (for HDFS you can use the hdfs3 library; see the sketch after this snippet):
files = ...
read them one by one, adding the file name:
def read(f):
    """Just to avoid problems with late binding"""
    return sc.sequenceFile(f).map(lambda x: (f, x))
rdds = [read(f) for f in files]
and union:
sc.union(rdds)
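For the files = ... step, a minimal hdfs3 listing sketch could look like this (the NameNode host, port, and path pattern are all assumptions):
from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host="namenode", port=8020)  # connection details are assumptions
files = hdfs.glob("/data/2016/06/14/*")  # one day's worth of paths, matching yyyy/mm/dd/*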

Azure Machine Learning - Strip top X rows from dataset

I have a plain-text CSV file which I am trying to read in Azure ML Studio - the file format is pretty much like this:
Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
39.984702,116.318417,0,492,39744.1201851852,2008-10-23,02:53:04
39.984683,116.31845,0,492,39744.1202546296,2008-10-23,02:53:10
39.984686,116.318417,0,492,39744.1203125,2008-10-23,02:53:15
39.984688,116.318385,0,492,39744.1203703704,2008-10-23,02:53:20
39.984655,116.318263,0,492,39744.1204282407,2008-10-23,02:53:25
39.984611,116.318026,0,493,39744.1204861111,2008-10-23,02:53:30
The real data starts at line 7; how can I strip the header off? These files need to be downloaded on the fly, so I would rather not strip it off with a separate preprocessing script.
What is your source location - SQL, Blob, or HTTP?
If SQL, then you can use a query to skip the first 6 lines.
If Blob/HTTP, I would suggest reading the file as a single-column TSV, then using a simple R/Python script to drop the first 6 rows and convert to CSV, as sketched below.
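For the Blob/HTTP route, a minimal Execute Python Script sketch (the parsed column names are assumptions based on the Geolife sample above):
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # dataframe1 is the file read as a single-column TSV, one raw line per row.
    # Drop the 6 header lines, then split the remaining lines on commas.
    rows = dataframe1.iloc[6:, 0].str.split(",", expand=True)
    rows.columns = ["lat", "lon", "flag", "alt_ft", "days", "date", "time"]  # assumed names
    return rows,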
