How to store Cassandra maps in an array? - cassandra

I want to store data in following structure :-
"id" : 100, -- primary key
"data" : [
{
"imei" : 862304021502870,
"details" : [
{
"start" : "2018-07-24 12:34:50",
"end" : "2018-07-24 12:44:34"
},
{
"start" : "2018-07-24 12:54:50",
"end" : "2018-07-24 12:56:34"
}
]
}
]
So how do I create table schema in Cassandra for the same ?
Thanks in advance.

There are several approaches to this, depending on the requirements regarding data access/modification - for example, do you need to modify individual fields, or you update at once:
Declare the map of imei/details as user-defined type (UDT), and then declare table like this:
create table tbl (
id int primary key,
data set<frozen<details_udt>>);
But this is relatively hard to support in the long term, especially if you add more nested objects with different types. Plus, you can't really update fields of the frozen records that you must to use in case of nested collections/UDTs - for this table structure you need to replace complete record inside set.
Another approach - just do explicit serialization/deserialization of data into/from JSON or other format, and have table structure like this:
create table tbl(
id int primary key,
data text);
the type of data field depends on what format you'll use - you can use blob as well to store binary data. But in this case you'll need to update/fetch complete field. You can simplify things if you use Java driver's custom codecs that will take care for conversion between your data structure in Java & desired format. See example in the documentation for conversion to/from JSON.

Related

Is it possible for CQL to parse a JSON object to insert data?

From what I looked so far, it seems impossible with Cassandra. But I thought I'd give it a shot:
How can I select a value of a json property, parsed from a json object string, and use it as part of an update / insert statement in Cassandra?
For example, I'm given the json object:
{
id:123,
some_string:"hello there",
mytimestamp: "2019-09-02T22:02:24.355Z"
}
And this is the table definition:
CREATE TABLE IF NOT EXISTS myspace.mytable (
id text,
data blob,
PRIMARY KEY (id)
);
Now the thing to know at this point is that for a given reason the data field will be set to the json string. In other words, there is no 1:1 mapping between the given json and the table columns, but the data field contains the json object as kind of a blob value.
... Is it possible to parse the timestamp value of the given json object as part of an insert statement?
Pseudo code example of what I mean, which obviously doesn't work ($myJson is a placeholder for the json object string above):
INSERT INTO myspace.mytable (id, data)
VALUES (123, $myJson)
USING timestamp toTimeStamp($myJson.mytimestamp)
The quick answer is no, it's not possible to do that with CQL.
The norm is to parse the elements of the JSON object within your application to extract the corresponding values to construct the CQL statement.
As a side note, I would discourage using the CQL blob type due to possible performance issues should the blob size exceeed 1MB. If it's JSON, consider storing it as CQL text type instead. Cheers!
Worth mentioning, but CQL can do a limited amount of JSON parsing on its own. Albeit, not as detailed as you're asking here (ex: USING timestamp).
But something like this works:
> CREATE TABLE myjsontable (
... id TEXT,
... some_string TEXT,
... PRIMARY KEY (id));
> INSERT INTO myjsontable JSON '{"id":"123","some_string":"hello there"}';
> SELECT * FROM myjsontable WHERE id='123';
id | some_string
-----+-------------
123 | hello there
(1 rows)
In your case you'd either have to redesign the table or the JSON payload so that they match. But as Erick and Cédrick have mentioned, the USING timestamp part would have to happen client-side.
What you detailed is doable with Cassandra.
Timestamp
To insert timestamp in a query it should be formatted as an ISO 8601 String. Sample examples could be found here. In your code, you might have to convert your incoming value to expected type and format.
Blob:
Blob expects to store binary data, as such it cannot be put Ad hoc as a String in a CQL query. (you can use TEXT type to do it if you want to encode base64)
When you need to insert binary data you need to provide proper type as well. For instance if you are working with Javascript to need to provide a Buffer as describe in the documentation Then when you execute your query you externalized your parameters
const sampleId = 123;
const sampleData = Buffer.from('hello world', 'utf8');
const sampleTimeStamp = new Date();
client.execute('INSERT INTO myspace.mytable (id, data) VALUES (?, ?) USING timestamp toTimeStamp(?)', [ sampleId, sampleData, sampleTimeStamp ]);

Read and Write sdo_geometry field in spark/GeoSpark(Sedona) from Oracle Table

i'm using geospark(sedona) with pyspark:
is possible read from Oracle a sdo_geometry type and write in a table in Oracle with sdo_Geometry field?
in my app:
i'm able to read :
db_table = "(SELECT sdo_util.to_wktgeometry(geom_32632) geom FROM geodss_dev.CATASTO_GALLERIE cg WHERE rownum <10)" <---Query on Oracle Db
df_oracle = spark.read.jdbc(db_url, db_table, properties=db_properties)
df_oracle.show()
df_oracle.printSchema()
but when i write:
df_oracle.createOrReplaceTempView("gallerie")
df_write=spark.sql("select ST_AsBinary(st_geomfromwkt(geom)) geom_32632 from gallerie") <--query with Sedona Library on tempView Gallerie
print(df_write.dtypes)
df_write.write.jdbc(db_url, "geodss_dev.gallerie_test", properties=db_properties,mode="append")
i have this error:
ORA-00932: inconsistent data types: expected MDSYS.SDO_GEOMETRY, got BINARY
there is a solution for write sdo_geometry type?
thanks
Regards
You are reading the geometries in serialized formats: WKT (text) in your first example, WKB (binary) in the second.
If you want to write those back as SDO_GEOMETRY objects, you will need to deserialize them back. This can be done in two ways:
Using the SDO_GEOMETRY constructor:
insert into my_table(my_geom) values (sdo_geometry(:wkb))
or
insert into my_table(my_geom) values (sdo_geometry(:wkt))
Using the explicit conversion functions:
insert into my_table(my_geom) values (sdo_util.from_wkbgeometry(:wkb))
or
insert into my_table(my_geom) values (sdo_util.from_wktgeometry(:wkt))
I have no idea how you can express this using geospark. I assume it does allow you to specify things like a list of columns to write to, and a list of input values ?
What definitely does not happen is an automatic transformation from the serialized format (binary or text) to a geometry object. There are actually a number of serialized format in addition to the oldish WKT and WKB: GML and GeoJSON are the main alternatives. But those two need explicit calls to the transformation functions.
EDIT: About your second example: instead of stacking two function calls, you can just do:
SELECT sdo_util.to_wkbgeometry(geom_32632) geom ...
Also, in both examples, you can use the object methods instead of the function calls. The result will be the same (the methods just call those same functions anyway), but the syntax is a bit more compact. IMPORTANT: this requires using aliases!.
SELECT cg.geom_32632.get_wkt() geom
FROM geodss_dev.CATASTO_GALLERIE cg
WHERE rownum <10
SELECT cg.geom_32632.get_wkb() geom
FROM geodss_dev.CATASTO_GALLERIE cg
WHERE rownum <10

Presto subqueries: Key not present in map

I have been banging my head a while to Superset -> Presto (PrestoSQL) -> Prometheus combination (as Superset does not yet support Prometheus) and got stymied with an issue when trying to extract columns from Presto's map type column containing Prometheus labels.
In order to get necessary labels mapped as columns from Superset's point of view, I create extra table (or I guess a view in this case) in Superset on top of existing table which had following SQL for creating the necessary columns:
SELECT labels['system_name'] AS "system",labels['instance'] AS "instance","timestamp" AS "timestamp","value" AS "value" FROM "up"
This table is then used as a data source in Superset's chart which treats it as a subquery. The resulting SQL query created by Superset and then sent to Presto looks e.g. like this:
SELECT "system" AS "system",
"instance" AS "instance",
"timestamp" AS "timestamp",
"value" AS "value"
FROM
(SELECT labels['system_name'] AS "system",
labels['instance'] AS "instance",
"timestamp" AS "timestamp",
"value" AS "value"
FROM "up") AS "expr_qry"
WHERE "timestamp" >= from_iso8601_timestamp('2020-10-19T12:00:00.000000')
AND "timestamp" < from_iso8601_timestamp('2020-10-19T13:00:00.000000')
ORDER BY "timestamp" ASC
LIMIT 250;
However, what I get out from above is an error:
io.prestosql.spi.PrestoException: Key not present in map: system_name
at io.prestosql.operator.scalar.MapSubscriptOperator$MissingKeyExceptionFactory.create(MapSubscriptOperator.java:173)
at io.prestosql.operator.scalar.MapSubscriptOperator.subscript(MapSubscriptOperator.java:143)
at io.prestosql.$gen.CursorProcessor_20201019_165636_32.filter(Unknown Source)
After reading a bit about queries from Presto's user guide, I tried a modified query from command line by using WITH:
WITH x AS (SELECT labels['system_name'] AS "system",labels['instance'] AS "instance","timestamp" AS "timestamp","value" AS "value" FROM "up")
SELECT system, timestamp, value FROM x
WHERE "timestamp" >= from_iso8601_timestamp('2020-10-19T12:00:00.000000')
AND "timestamp" < from_iso8601_timestamp('2020-10-19T13:00:00.000000')
LIMIT 250;
And that went throught without any issues. But it seems that I have no way to define how Superset executes its queries, so I'm stuck with the first option. The question is, is there anything wrong with it which could be fixed?
I guess that one option (if everything else fails) would be defining extra tables in Presto side which would do the same trick for mapping the columns, thus hopefully avoiding above issue.
The map subscript operator in Presto requires that the key be present in the map. Otherwise, you get the failure you described.
If some keys can be missing, you can use the element_at function instead, which will return a NULL result:
Returns value for given key, or NULL if the key is not contained in the map.

Save array of objects in cassandra

How can I save array of objects in cassandra?
I'm using a nodeJS application and using cassandra-driver to connect to Cassandra DB. I wanted to save records like below in my db:
{
"id" : "5f1811029c82a61da4a44c05",
"logs" : [
{
"conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667",
"source" : "source1",
"destination" : "destination1",
"url" : "https://asdasdas.com",
"data" : "data1"
},
{
"conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667",
"source" : "source2",
"destination" : "destination2",
"url" : "https://afdvfbwadvsffd.com",
"data" : "data2"
}
],
"conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667"
}
In the above record, I can use type "text" to save values of the columns "id" and "conversationId". But not sure how can I define the schema and save data for the field "logs".
With Cassandra, you'll want to store the data in the same way that you want to query it. As you mentioned querying by conversatonid, that's going to influence how the PRIMARY KEY definition should look. Given this, conversationid, should make a good partition key. As for the clustering columns, I had to make some guesses as to cardinality. So, sourceid looked like it could be used to uniquely identify a log entry within a conversation, so I went with that next.
I thought about using id as the final clustering column, but it looks like all entries with the same conversationid would also have the same id. It might be a good idea to give each entry its own unique identifier, to help ensure uniqueness:
{
"uniqueid": "e53723ca-2ab5-441f-b360-c60eacc2c854",
"conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667",
"source" : "source1",
"destination" : "destination1",
"url" : "https://asdasdas.com",
"data" : "data1"
},
This makes the final table definition look like this:
CREATE TABLE conversationlogs (
id TEXT,
conversationid TEXT,
uniqueid UUID,
source TEXT,
destination TEXT,
url TEXT,
data TEXT,
PRIMARY KEY (conversationid,sourceid,uniqueid));
You have a few options depending on how you want to query this data.
The first is to stringify the json in logs field and save that to the database and then convert it back to JSON after querying the data.
The second option is similar to the first, but instead of stringifying the array, you store the data as a list in the database.
The third option is to define a new table for the logs with a primary key of the conversation and clustering keys for each element of the logs. This will allow you to lookup either by the full key or query by just the primary key and retrieve all the rows that match those criteria.
CREATE TABLE conversationlogs (
conversationid uuid,
logid timeuuid,
...
PRIMARY KEY ((conversationid), logid));

UPDATE prepared statement with Object

I have an Object that maps column names to values. The columns to be updated are not known beforehand and are decided at run-time.
e.g. map = {col1: "value1", col2: "value2"}.
I want to execute an UPDATE query, updating a table with those columns to the corresponding values. Can I do the following? If not, is there an elegant way of doing it without building the query manually?
db.none('UPDATE mytable SET $1 WHERE id = 99', map)
is there an elegant way of doing it without building the query manually?
Yes, there is, by using the helpers for SQL generation.
You can pre-declare a static object like this:
const cs = new pgp.helpers.ColumnSet(['col1', 'col2'], {table: 'mytable'});
And then use it like this, via helpers.update:
const sql = pgp.helpers.update(data, cs) + /* WHERE clause with the condition */;
// and then execute it:
db.none(sql).then(data => {}).catch(error => {})
This approach will work with both a single object and an array of objects, and you will just append the update condition accordingly.
See also: PostgreSQL multi-row updates in Node.js
What if the column names are not known beforehand?
For that see: Dynamic named parameters in pg-promise, and note that a proper answer would depend on how you intend to cast types of such columns.
Something like this :
map = {col1: "value1", col2: "value2",id:"existingId"}.
db.none("UPDATE mytable SET col1=${col1}, col2=${col2} where id=${id}", map)

Resources