Using gvNIX to create Map Based Application

I am interested in creating a gvNIX/Roo application which shows the location of health facilities in Tanzania on a map. I am trying the tutorial available here. However, my data is in the format shown below, where the location data is in two columns (southings and eastings). The tutorial shows how to create three data types:
field geo --fieldName location --type POINT --class ~.domain.Owner
field geo --fieldName distance --type LINESTRING --class ~.domain.Owner
field geo --fieldName area --type POLYGON --class ~.domain.Owner
I am assuming I need the POINT data type to hold a health facility's location, but I am not sure how to get the two columns below (southings and eastings) into a single POINT variable. I am pretty new to GIS as well. The data is as below (CSV format):
outlet_name,Status ,southings,eastings,streetward,name_of_outlet
REHEMA MEDICS,02,2.49993,32.89512,K/POLISI,REVINA
KIRUMBA MEDICS,02,2.50023,32.89503,K/POLISI,GEDION
KIRUMBA PHARMACY,02,2.50152,32.89742,K/POLISI,MAURETH
TULI MEDICS,02,2.48737,32.89686,KITANGIRI,TULI
JULLY MEDICS,02,2.53275,32.93855,BUZURUGA,JULLY
MAGOMA MEDICS,02,2.53181,32.94211,BUZURUGA,MAGOMA
MECO PHARMACY,02,2.52923,32.94730,MECCO,DORCAS
UPENDO MEDICS,02,2.52923,32.94786,MECCO,UPENDO
DORIS MEDICS,02,2.49961,32.89191,KABUHORO,DORIS
SOPHIA MEDICS,02,2.49975,32.89120,KABUHORO,ESTER
MWALONI PHAMCY,02,2.56351,32.89416,MWALONI,ESTER
SILVER PHAMACY,02,2.51728,32.90614,K/KILOMERO,WANDWATA
KIBO PHARMACY,02,2.51688,32.90710,MISSION,MARIAM
Thanks

You need to transform your coordinates to WKT (Well Known Text) format in order to insert them into a column in your database (a PostgreSQL database with PostGIS support). To achieve this, follow these steps:
1. Find the SRID of your coordinate reference system (CRS), that is, the identifier which defines your coordinate system. Otherwise, your points won't match the real coordinates. You'll need the SRID in the last step.
2. Transform your data to WKT. The data needed for building the points is in the southings and eastings columns (I suppose they correspond to latitude and longitude, which are the most commonly used), so you'll need to combine these two columns into a single column in WKT format, e.g. for your first row of data: Point(32.89512 2.49993). Note the space between the values and that their order is switched (eastings/longitude first, then southings/latitude).
3. Proceed with the inserts using SQL syntax, but with PostGIS functions, as shown in the sketch below. An example for your first row would be: INSERT INTO health_facilities (outlet_name, Status, streetward, location) VALUES ('REHEMA MEDICS', 02, 'K/POLISI', ST_GeomFromText('Point(32.89512 2.49993)', 4326)); where 4326 is the SRID you have to find (supposing it is the most common one, EPSG:4326).
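A fuller sketch of that load, assuming a PostGIS-enabled PostgreSQL database; the table, column, and file names (facility_staging, health_facilities, facilities.csv) are illustrative, and ST_SetSRID/ST_MakePoint is used as an equivalent alternative to building the WKT string by hand:

-- Staging table matching the CSV layout
CREATE TABLE facility_staging (
    outlet_name text,
    status text,
    southings double precision,
    eastings double precision,
    streetward text,
    name_of_outlet text
);

-- Load the CSV (server-side path; use \copy in psql for a client-side file)
COPY facility_staging FROM '/tmp/facilities.csv' WITH (FORMAT csv, HEADER true);

-- Target table with a geometry(Point) column in EPSG:4326
CREATE TABLE health_facilities (
    id serial PRIMARY KEY,
    outlet_name text,
    status text,
    streetward text,
    location geometry(Point, 4326)
);

-- Build the points: eastings (longitude) first, southings (latitude) second
INSERT INTO health_facilities (outlet_name, status, streetward, location)
SELECT outlet_name, status, streetward,
       ST_SetSRID(ST_MakePoint(eastings, southings), 4326)
FROM facility_staging;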
You can find more info here and here. There are also several pages where you can check coordinates and transform them between different CRSs, like this and this.

Related

Apache Sedona - Query a Non-Rectangular Raster Mask Using Spark on Databricks

I'm referring to the manual on how to query spatial data, but in my scenario, there is a pixel-wise raster dataset of climate data over 8000+ days.
In simple words, the query could look something like this:
select temperature data for 5000 days for the defined amorphous region and aggregate the values using spark
My questions are:
Should all the pixels be converted into vector-like "points" to be applicable for masking operations?
Can a third dimension (time) somehow be added to the spatial index, or should the date just be a regular column?
Grateful for any input!

Can I know z-order column?

Is there any way to know which column was used for Z-ordering on a given table? I've tried multiple commands like describe and describe extended, along with viewing the Delta log, but I found no information about which column was used to perform the optimization.
You can find this information in the history of the table. Filter by operation = 'OPTIMIZE'; the operationParameters column will then be a struct with fields predicate and zOrderBy (a string encoding a JSON array of the Z-Order columns).
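A minimal sketch of the lookup, assuming Databricks/Delta Lake SQL and an illustrative table name my_table:

-- Inspect the table history; look at rows where operation = 'OPTIMIZE'
-- and read zOrderBy out of the operationParameters column.
DESCRIBE HISTORY my_table;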
For example, when I optimized a table with Z-Order by an rnd column, the history row for that OPTIMIZE showed rnd in the zOrderBy parameter.

Azure Data Flow - Can we have Dynamic columns or change in projections for Unpivot functionality

The Excel file consists of 62 columns; 7 columns are fixed and the rest contain weeks of the year (week1 to week52).
I have used a data flow task to unpivot the 53 columns into rows, with 2 extra columns, year and value.
The problem is that the 52 week column names keep changing with every weekly data load; how do I handle this change in column names in the data flow? For a single run it gives the exact output.
What you'll want to do here is to implement late-binding of your schema, or what ADF refers to as "schema drift". Instead of setting a hardened "early binding" schema in your Source projection, leave the dataset schema and projection empty.
Next, add a Derived Column after your source and call it "Projection". This is where you'll build your projection using rules to account for your evolving schema.
Build out your canonical model with the column names for your entire year using byName('columnname'). That will tell ADF to look for the existence of the column in single quotes from your source data while also providing a schema that you can use to build out your pivot table.
If you need to cast the values, wrap byName() inside a casting function, e.g. toString(), toDate(), etc.

Geospatial data of Adventureworks database

I am new to Qlik Sense and I am practicing app (dashboard) development concepts on MS SQL Server's AdventureWorks database. In one specific table, the Address table, there is a column which has spatial location data. The data is in the following format: Dallas - 0xE6100000010C10A810D1886240403A0F0653663158C0. The data is of the geography datatype and is said to represent latitude and longitude information for a given address. I am trying to create a map and a GeoKey as a dimension, but the GeoMakePoint() function takes latitude and longitude as a tuple, not in this format. Please help.
I figured out the solution myself. Just use [Columnname].Lat and [Columnname].Long on the geography datatype to extract the latitude and longitude values from the column values. Store these values in separate columns during the data load and use them as the GeoKey.
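For example, a sketch of the load query in T-SQL, assuming the AdventureWorks Person.Address table and its SpatialLocation geography column:

-- Extract plain numeric latitude/longitude from the geography column;
-- Qlik can then build the point from these two fields (e.g. with GeoMakePoint).
SELECT AddressID,
       City,
       SpatialLocation.Lat  AS Latitude,
       SpatialLocation.Long AS Longitude
FROM Person.Address;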

Handling the following use case in Cassandra?

I've been given the task of modelling a simple feed system in Cassandra. Coming from an almost solely SQL background, though, I'm having a bit of trouble figuring it out.
Basically, we have a list of feeds that we're listening to that update periodically. This can be in RSS, JSON, ATOM, XML, etc (depending on the feed).
What we want to do is periodically check for new items in each feed, convert the data into a few formats (i.e. JSON and RSS) and store that in a Cassandra store.
So, in an RDBMS, the structure would be something akin to:
Feed:
feedId
name
URL
FeedItem:
feedItemId
feedId
title
json
rss
created_time
I'm confused as to how to model that data in Cassandra to facilitate simple things such as getting x items for a specific feed in descending created order (which is probably the most common query).
I've heard of one strategy that involves a composite key storing, in this example, the created_time as a time-based UUID together with the feed item ID, but I'm still a little confused.
For example, let's say I have a series of rows whose key is basically the feedId. Inside each row, I store a range of columns as mentioned above. The question is, where does the actual data go (i.e. JSON, RSS, title)? Would I have to store all the data for that 'record' as the column value?
I think I'm confusing wide rows and narrow (short?) rows as I like the idea of the composite key but I also want to store other data with each record and I'm not sure how to meld the two together...
You can store everything in one column family. However, if the data for each FeedItem is very large, you can split the data for each FeedItem into another column family.
For example, you can have one column family for Feed, where the columns under each key are FeedItem ids, something like:
Feeds                              # column family
  FeedId1                          # key
    time-stamp-1-feed-item-id1     # columns have no value, or values are enough info
    time-stamp-2-feed-item-id2     # to show summary info in a results list
The Feeds column family lets you quickly get the last N items from a feed, and querying for them doesn't require fetching all the data for each FeedItem: either nothing is fetched, or just a summary.
Then you can use another column family to store the actual FeedItem data:
FeedItems            # column family
  feed-item-id1      # key
    rss              # 1 column for each field of a FeedItem
    title            #
    ...
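A rough CQL translation of this two-column-family layout (a sketch; table and column names are illustrative, and the timeuuid clustering column stands in for the time-stamp-prefixed column names above):

-- Wide-row "feeds": one partition per feed, items clustered newest-first
CREATE TABLE feeds (
    feed_id ascii,
    item_time timeuuid,
    feed_item_id ascii,          -- or a short summary, as described above
    PRIMARY KEY (feed_id, item_time)
) WITH CLUSTERING ORDER BY (item_time DESC);

-- Full data for each item, keyed by its id
CREATE TABLE feed_items (
    feed_item_id ascii PRIMARY KEY,
    title ascii,
    json ascii,
    rss ascii,
    created_time timestamp
);

-- Last 10 items of a feed, newest first
SELECT item_time, feed_item_id FROM feeds WHERE feed_id = 'feedId1' LIMIT 10;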
Using CQL should be easier for you to understand, given your SQL background.
Cassandra (and NoSQL in general) is very fast, and you don't get real benefits from using a separate, related table for feeds; in any case, you will not be able to do JOINs. Obviously you can still create two tables if that's more comfortable for you, but you will have to manage linking the data inside your application code.
You can use something like:
CREATE TABLE FeedItem (
    feedItemId ascii PRIMARY KEY,
    feedId ascii,
    feedName ascii,
    feedURL ascii,
    title ascii,
    json ascii,
    rss ascii,
    created_time ascii
);
Here I used ascii fields for everything. You can choose different data types for feedItemId or created_time; the available data types can be found here, and depending on which language and client you are using, it can be transparent or require some more work to make them work.
You may want to add some secondary indexes. For example, if you want to search for feed items from a specific feedId, something like:
SELECT * FROM FeedItem where feedId = '123';
To create the index:
CREATE INDEX FeedItem_feedId ON FeedItem (feedId);
Sorting/ordering, alas, is not easy in Cassandra. Reading here and here may give you some clues about where to start looking, and it also really depends on the Cassandra version you're going to use.
