Is it possible for CQL to parse a JSON object to insert data? - cassandra

From what I've looked at so far, it seems impossible with Cassandra, but I thought I'd give it a shot:
How can I select a value of a json property, parsed from a json object string, and use it as part of an update / insert statement in Cassandra?
For example, I'm given the json object:
{
  "id": 123,
  "some_string": "hello there",
  "mytimestamp": "2019-09-02T22:02:24.355Z"
}
And this is the table definition:
CREATE TABLE IF NOT EXISTS myspace.mytable (
    id text,
    data blob,
    PRIMARY KEY (id)
);
The thing to know at this point is that, for a given reason, the data field will be set to the JSON string itself. In other words, there is no 1:1 mapping between the given JSON and the table columns; the data field contains the whole JSON object as a kind of blob value.
... Is it possible to parse the timestamp value of the given JSON object as part of an INSERT statement?
Pseudo code example of what I mean, which obviously doesn't work ($myJson is a placeholder for the json object string above):
INSERT INTO myspace.mytable (id, data)
VALUES (123, $myJson)
USING timestamp toTimeStamp($myJson.mytimestamp)

The quick answer is no, it's not possible to do that with CQL.
The norm is to parse the elements of the JSON object within your application to extract the corresponding values to construct the CQL statement.
As a side note, I would discourage using the CQL blob type due to possible performance issues should the blob size exceed 1 MB. If it's JSON, consider storing it as the CQL text type instead. Cheers!
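To illustrate what that client-side parsing could look like, here is a minimal sketch only (not part of the original answer), assuming the Node.js cassandra-driver, the table from the question, and that the write timestamp is supplied in microseconds through a bind marker in the USING TIMESTAMP clause; the helper name is made up:
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'myspace'
});

// Parse the JSON in the application, then bind the extracted timestamp
// (converted to microseconds) as the write timestamp.
async function insertWithJsonTimestamp(myJson) {
  const obj = JSON.parse(myJson);
  const writeTsMicros = Date.parse(obj.mytimestamp) * 1000; // USING TIMESTAMP expects microseconds

  await client.execute(
    'INSERT INTO mytable (id, data) VALUES (?, ?) USING TIMESTAMP ?',
    [String(obj.id), Buffer.from(myJson, 'utf8'), writeTsMicros],
    { prepare: true }
  );
}

insertWithJsonTimestamp('{"id":123,"some_string":"hello there","mytimestamp":"2019-09-02T22:02:24.355Z"}')
  .catch(console.error);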

Worth mentioning that CQL can do a limited amount of JSON parsing on its own, albeit not as detailed as what you're asking for here (e.g., USING TIMESTAMP).
But something like this works:
> CREATE TABLE myjsontable (
... id TEXT,
... some_string TEXT,
... PRIMARY KEY (id));
> INSERT INTO myjsontable JSON '{"id":"123","some_string":"hello there"}';
> SELECT * FROM myjsontable WHERE id='123';
id | some_string
-----+-------------
123 | hello there
(1 rows)
In your case you'd either have to redesign the table or the JSON payload so that they match. But as Erick and Cédrick have mentioned, the USING timestamp part would have to happen client-side.

What you detailed is doable with Cassandra.
Timestamp:
To insert a timestamp in a query, it should be formatted as an ISO 8601 string (sample examples can be found here). In your code, you might have to convert the incoming value to the expected type and format.
Blob:
A blob is meant to store binary data, so it cannot be put ad hoc as a string into a CQL query (you can use the TEXT type instead if you are willing to base64-encode the content).
When you need to insert binary data, you also need to provide the proper type. For instance, if you are working with JavaScript, you need to provide a Buffer, as described in the documentation. Then, when you execute your query, you externalize your parameters:
const sampleId = '123'; // id is text in the table, so bind it as a string
const sampleData = Buffer.from('hello world', 'utf8'); // blob values are bound as Buffers
const sampleTimeStamp = new Date();
// The parameters are externalized and bound by the driver (consider passing
// { prepare: true } so it binds the exact CQL types):
client.execute(
  'INSERT INTO myspace.mytable (id, data) VALUES (?, ?) USING timestamp toTimeStamp(?)',
  [sampleId, sampleData, sampleTimeStamp]);

Related

Read and Write sdo_geometry field in spark/GeoSpark(Sedona) from Oracle Table

I'm using GeoSpark (Sedona) with PySpark.
Is it possible to read an SDO_GEOMETRY type from Oracle and write to a table in Oracle with an SDO_GEOMETRY field?
In my app I'm able to read:
db_table = "(SELECT sdo_util.to_wktgeometry(geom_32632) geom FROM geodss_dev.CATASTO_GALLERIE cg WHERE rownum <10)"  # query on the Oracle DB
df_oracle = spark.read.jdbc(db_url, db_table, properties=db_properties)
df_oracle.show()
df_oracle.printSchema()
But when I write:
df_oracle.createOrReplaceTempView("gallerie")
df_write = spark.sql("select ST_AsBinary(st_geomfromwkt(geom)) geom_32632 from gallerie")  # query with the Sedona library on the temp view "gallerie"
print(df_write.dtypes)
df_write.write.jdbc(db_url, "geodss_dev.gallerie_test", properties=db_properties,mode="append")
I get this error:
ORA-00932: inconsistent data types: expected MDSYS.SDO_GEOMETRY, got BINARY
Is there a solution for writing the SDO_GEOMETRY type?
Thanks and regards.
You are reading the geometries in serialized formats: WKT (text) in your first example, WKB (binary) in the second.
If you want to write those back as SDO_GEOMETRY objects, you will need to deserialize them back. This can be done in two ways:
Using the SDO_GEOMETRY constructor:
insert into my_table(my_geom) values (sdo_geometry(:wkb))
or
insert into my_table(my_geom) values (sdo_geometry(:wkt))
Using the explicit conversion functions:
insert into my_table(my_geom) values (sdo_util.from_wkbgeometry(:wkb))
or
insert into my_table(my_geom) values (sdo_util.from_wktgeometry(:wkt))
I have no idea how you can express this using GeoSpark. I assume it does allow you to specify things like a list of columns to write to and a list of input values?
What definitely does not happen is an automatic transformation from the serialized format (binary or text) to a geometry object. There are actually a number of serialized formats in addition to the oldish WKT and WKB: GML and GeoJSON are the main alternatives. But those two also need explicit calls to the transformation functions.
EDIT: About your second example: instead of stacking two function calls, you can just do:
SELECT sdo_util.to_wkbgeometry(geom_32632) geom ...
Also, in both examples, you can use the object methods instead of the function calls. The result will be the same (the methods just call those same functions anyway), but the syntax is a bit more compact. IMPORTANT: this requires using table aliases!
SELECT cg.geom_32632.get_wkt() geom
FROM geodss_dev.CATASTO_GALLERIE cg
WHERE rownum <10;

SELECT cg.geom_32632.get_wkb() geom
FROM geodss_dev.CATASTO_GALLERIE cg
WHERE rownum <10;

Inserting Timestamp Into Snowflake Using Python 3.8

I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
    ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
    PREDICTED_PROBABILITY FLOAT,
    TIME_PREDICTED TIMESTAMP
);
It creates the correct table, which has been checked using the DESC command in SQL. Then, using the Snowflake Python connector, we are trying to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
Just before this query the variables are defined; the main challenge is getting the current timestamp written into Snowflake. The value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on how to format the datetime value here? Help is appreciated.
In addition to the answer Lukasz provided, you could also think about defining current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
    ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
    PREDICTED_PROBABILITY FLOAT,
    TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
It will automatically assign the insert time to TIME_PREDICTED.
Educated guess. When performing insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
It is string interpolation: ct is provided as the string representation of a datetime, which does not match a timestamp data type, hence the error.
I would suggest using proper variable binding instead:
ctx.cursor().execute("INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
"(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
"VALUES(:1, :2, :3)",
(accountId,
risk_score,
("TIMESTAMP_LTZ", ct)
)
);
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
"INSERT INTO testtable(col1, col2) "
"VALUES({col1}, '{col2}')".format(
col1=789,
col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
You forgot to place quotes before and after {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)

What are the returned data types from a Knex select() statement?

Hi everyone,
I am currently using Knex.js for a project, and a question arose when I make a knex('table').select() call.
What are the returned types from the query? In particular, if I have a datetime column in my table, what is the return value for this field?
I believe the query will return a value of type string for this column. But is that the case for any database (I use SQLite3)? Is it possible that the query returns a Date value?
EXAMPLE:
The user table has this schema:
knex.schema.createTable('user', function (table) {
  table.increments('id');
  table.string('username', 256).notNullable().unique();
  table.timestamps(true, true);
})
Since I use SQLite3, table.timestamps(true, true); produces 2 datetime columns: created_at & modified_at.
When I make the query knex('user').select(), it returns an array of objects with the attributes id, username, created_at, and modified_at.
id is of type number
username is of type string
What will be the types of created_at & modified_at?
Will they always be of string type? If I use another database like PostgreSQL, these columns will have the timestamptz SQL type. Will Knex then also return a string?
This is not in fact something that Knex is responsible for, but rather the underlying database library. So if you're using SQLite, it would be sqlite3. If you're using Postgres, pg is responsible and you could find more documentation here. Broadly, most libraries take the approach that types which have a direct JavaScript equivalent (booleans, strings, null, integers, etc.) are returned as those types; anything else is converted to a string.
Knex's job is to construct the SQL that those libraries use to talk to the database and to relay the responses they return.
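As a rough illustration of that point, here is a sketch only, assuming an SQLite3-backed connection and the user table from the question; the driver hands back plain values and any conversion to Date is up to you:
const knex = require('knex')({
  client: 'sqlite3',
  connection: { filename: './dev.sqlite3' },
  useNullAsDefault: true
});

knex('user')
  .select()
  .then(rows => {
    for (const row of rows) {
      // With sqlite3 the timestamp columns typically come back as strings
      // (or numbers), not Date objects, so convert them explicitly if needed.
      console.log(typeof row.created_at);
      const createdAt = new Date(row.created_at);
      console.log(createdAt instanceof Date); // true
    }
  })
  .finally(() => knex.destroy());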
As far as I believe, it will be an object of strings or numbers.

How to insert an array of strings in javascript into PostgreSQL

I am building an API server which accepts file uploads using multer.
I need to store an array of all the paths to all files uploaded for each request to a column in the PostgreSQL database which I have connected to the server.
Say I have a table created with the following query
CREATE TABLE IF NOT EXISTS records
(
    id SERIAL PRIMARY KEY,
    created_on TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by INTEGER,
    title VARCHAR NOT NULL,
    type VARCHAR NOT NULL
)
How do I define a new column filepaths on the above table where I can insert a JavaScript string array (e.g. ['path-to-file-1', 'path-to-file-2', 'path-to-file-3'])?
Also, how do I retrieve and update/edit the list in JavaScript using node-postgres?
You have 2 options:
Use the json or jsonb type. In that case the string to insert will look like:
'["path-to-file-1", "path-to-file-2", "path-to-file-3"]'
I would prefer jsonb; it allows good indexing. json is really just text with some additional built-in functions.
Use an array of text, something like filepaths text[]. To insert you can use:
ARRAY ['path-to-file-1', 'path-to-file-2', 'path-to-file-3']
or
'{path-to-file-1,path-to-file-2,path-to-file-3,"path to file 4"}'
You need the double quotes here only for elements that contain spaces and the like, but feel free to use them for all elements too.
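For the second option, here is a minimal node-postgres sketch (assuming a filepaths text[] column has been added to records; the helper names are illustrative): the pg driver serializes a JavaScript array parameter into a PostgreSQL array and parses text[] back into a JavaScript array on read.
const { Pool } = require('pg');
const pool = new Pool(); // connection settings taken from the usual PG* environment variables

async function saveFilepaths(recordId, paths) {
  // paths is a plain JS array, e.g. ['path-to-file-1', 'path-to-file-2']
  await pool.query('UPDATE records SET filepaths = $1 WHERE id = $2', [paths, recordId]);
}

async function loadFilepaths(recordId) {
  const { rows } = await pool.query('SELECT filepaths FROM records WHERE id = $1', [recordId]);
  return rows[0].filepaths; // already a JavaScript array of strings
}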
You can create a file table that has a path column and a foreign key reference to the record it belongs to. This way you can store the path as just a text column instead of storing an array in a column, which is better practice for relational databases. You'll also be able to store additional information on a file if you need to later. And it'll be simpler to interact with the file path records, since you add a new file path by just inserting a new row into the file table (with the appropriate foreign key) and remove one by deleting a row from the file table.
For example:
CREATE TABLE IF NOT EXISTS file (
    record_id integer NOT NULL REFERENCES records(id) ON DELETE CASCADE,
    path text NOT NULL
);
Then to get all the files for a record you can join the two tables together and convert to an array if you want.
For example:
SELECT
records.*,
ARRAY (
SELECT
file.path
FROM
file
WHERE
records.id = file.record_id
) AS file_paths
FROM
records;
Sample input (using only the title field of records):
INSERT INTO records (title) VALUES ('A'), ('B'), ('C');
INSERT INTO file (record_id, path) VALUES (1, 'patha1'), (1, 'patha2'), (1, 'patha3'), (2, 'pathb1');
Sample output:
id | title | file_paths
----+-------+------------------------
1 | A | {patha1,patha2,patha3}
2 | B | {pathb1}
3 | C | {}
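On the Node side, interacting with the file table is then just plain parameterized inserts and deletes; a minimal node-postgres sketch (helper names are illustrative):
const { Pool } = require('pg');
const pool = new Pool();

// Add one path for an existing record (e.g. for each file multer saved).
async function addFilePath(recordId, path) {
  await pool.query('INSERT INTO file (record_id, path) VALUES ($1, $2)', [recordId, path]);
}

// Remove a single path without touching the rest of the record.
async function removeFilePath(recordId, path) {
  await pool.query('DELETE FROM file WHERE record_id = $1 AND path = $2', [recordId, path]);
}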

How to store Cassandra maps in an array?

I want to store data in the following structure:
"id" : 100, -- primary key
"data" : [
{
"imei" : 862304021502870,
"details" : [
{
"start" : "2018-07-24 12:34:50",
"end" : "2018-07-24 12:44:34"
},
{
"start" : "2018-07-24 12:54:50",
"end" : "2018-07-24 12:56:34"
}
]
}
]
So how do I create the table schema in Cassandra for this? Thanks in advance.
There are several approaches to this, depending on the requirements regarding data access/modification - for example, whether you need to modify individual fields or always update everything at once:
Declare the imei/details structure as a user-defined type (UDT), and then declare the table like this:
create table tbl (
id int primary key,
data set<frozen<details_udt>>);
But this is relatively hard to support in the long term, especially if you add more nested objects with different types. Plus, you can't really update individual fields of the frozen records that you must use for nested collections/UDTs; with this table structure you need to replace the complete record inside the set.
Another approach is to do explicit serialization/deserialization of the data into/from JSON or another format, and use a table structure like this:
create table tbl(
id int primary key,
data text);
The type of the data field depends on what format you'll use; you can use blob as well to store binary data. But in that case you'll need to update/fetch the complete field. You can simplify things if you use the Java driver's custom codecs, which take care of the conversion between your data structure in Java and the desired format. See the example in the documentation for conversion to/from JSON.
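As an illustration of this second approach in JavaScript (the answer above refers to the Java driver's codec mechanism; this is only a sketch with the Node.js cassandra-driver, and the keyspace and function names are made up):
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'mykeyspace'
});

const record = {
  imei: 862304021502870,
  details: [
    { start: '2018-07-24 12:34:50', end: '2018-07-24 12:44:34' },
    { start: '2018-07-24 12:54:50', end: '2018-07-24 12:56:34' }
  ]
};

// Serialize the whole nested structure to JSON and store it in the text column.
async function save(id, data) {
  await client.execute('INSERT INTO tbl (id, data) VALUES (?, ?)',
    [id, JSON.stringify([data])], { prepare: true });
}

// Read the text column back and deserialize it into the nested structure.
async function load(id) {
  const rs = await client.execute('SELECT data FROM tbl WHERE id = ?', [id], { prepare: true });
  return JSON.parse(rs.first().data);
}

// save(100, record).then(() => load(100)).then(console.log);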
