How to accept list columns as a Cassandra UDF parameter

I created a table:
CREATE TABLE human (chromosome text, position bigint,
hg01583 frozen<set<text>>,
hg03006 frozen<set<text>>,
PRIMARY KEY (chromosome, position)
)
and I created a function:
CREATE FUNCTION process(sample list<frozen<set<text>>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java
AS
$$
return sample == null ? null : sample.getClass().toString() + " " + sample.toString();
$$;
When I issue this CQL query:
SELECT chromosome,position,hg01583, hg03006, process([hg01583,hg03006]) from human;
I get this error:
SyntaxException: line 1:80 no viable alternative at input ',' ([[hg01583],..
How can I pass hg01583 and hg03006 as a list to the process function?

Pass each column as its own argument, like this:
SELECT chromosome, position, hg01583, hg03006, process(hg01583, hg03006) FROM human;
with the function declared to take one parameter per column:
CREATE FUNCTION process(hg01583 frozen<set<text>>, hg03006 frozen<set<text>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java AS
$$
return hg01583==null? null : ...
$$;
If you want them to be dynamic, then instead of creating a fixed column for each sample, make it a wide row and use a UDA to aggregate the values with an accumulator function. For example:
CREATE TABLE human (chromosome text, position bigint,
sample text,
value frozen<set<text>>,
PRIMARY KEY (chromosome, position, sample)
);

Related

Inserting Timestamp Into Snowflake Using Python 3.8

I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP
);
This creates the correct table, which I verified with the DESC command. We then try to execute the following query using the Snowflake Python connector:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
The variables are defined just before this query. The main challenge is getting the current timestamp written into Snowflake. Here the value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query, we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I get some help on how to format the datetime value here? Help is appreciated.

In addition to the answer @Lukasz provided, you could also define current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
The insert time is then assigned to TIME_PREDICTED automatically.

An educated guess. The insert is built with string interpolation:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
ct is interpolated as the unquoted string representation of the datetime (2021-04-30 21:54:41.676406), which is not a valid timestamp literal, hence the syntax error.
I would suggest using proper variable binding instead:
# Assumes snowflake.connector.paramstyle = 'numeric' was set before connecting.
ctx.cursor().execute(
    "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
    "(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
    "VALUES (:1, :2, :3)",
    (accountId, risk_score, ("TIMESTAMP_LTZ", ct)),
)
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
"INSERT INTO testtable(col1, col2) "
"VALUES({col1}, '{col2}')".format(
col1=789,
col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
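The difference between interpolation and binding can be reproduced without a Snowflake account. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the Snowflake connector; the table name predictions and the helper names are made up for illustration, and SQLite's error text differs from Snowflake's. It only shows that an unquoted datetime breaks the statement while a bound parameter does not.

```python
import datetime
import sqlite3


def build_interpolated_insert(account_id, risk_score, ct):
    # Mirrors the question: the datetime lands in the SQL text unquoted.
    return (
        "INSERT INTO predictions (account_id, predicted_probability, time_predicted) "
        f"VALUES ({account_id}, {risk_score}, {ct})"
    )


def insert_with_binding(conn, account_id, risk_score, ct):
    # Values travel separately from the SQL text, so no quoting is needed.
    conn.execute(
        "INSERT INTO predictions (account_id, predicted_probability, time_predicted) "
        "VALUES (?, ?, ?)",
        (account_id, risk_score, ct.isoformat(sep=" ")),
    )


# sqlite3 is only a stand-in here (assumption), not the Snowflake connector.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    "account_id INTEGER PRIMARY KEY, "
    "predicted_probability REAL, "
    "time_predicted TEXT)"
)
ct = datetime.datetime(2021, 4, 30, 21, 54, 41, 676406)

try:
    conn.execute(build_interpolated_insert(1, 0.42, ct))
    interpolation_failed = False
except sqlite3.OperationalError:
    # The parser trips over the bare time portion of the value.
    interpolation_failed = True

insert_with_binding(conn, 1, 0.42, ct)
rows = conn.execute("SELECT account_id, time_predicted FROM predictions").fetchall()
```

With the real connector, the same idea applies through its qmark or numeric binding styles.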

You forgot to place quotes around {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)

Checking if key exists in Presto value map

I am new to Presto, and can't quite figure out how to check if a key is present in a map. When I run a SELECT query, this error message is returned:
Key not present in map: element
SELECT value_map['element'] FROM
mytable
WHERE name = 'foobar'
Adding AND contains(value_map, 'element') does not work
The data type is a string array
SELECT typeof('value_map') FROM mytable
returns varchar(9)
How would I only select records where 'element' is present in the value_map?

You can look up a value in a map, if the key is present, with element_at, like this:
SELECT element_at(value_map, 'element')
FROM ...
WHERE element_at(value_map, 'element') IS NOT NULL

element_at is ambiguous in that case -- it'll return NULL when either there's no such key or the key does exist and has NULL associated with it. A guaranteed approach is contains(map_keys(my_map), 'mykey'), which admittedly should be a bit slower than the original variant.
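The ambiguity is easier to see with a small analogy. This is a sketch in Python, not Presto: a dict stands in for the map column, and the two helper functions are hypothetical stand-ins that only mimic the behaviour described above.

```python
# Python analogy only: a dict stands in for a Presto map column.
value_map = {"element": None, "other": "x"}


def element_at(mapping, key):
    """Mimic element_at: None for a missing key and for a key mapped to None."""
    return mapping.get(key)


def has_key(mapping, key):
    """Mimic contains(map_keys(m), key): true whenever the key exists."""
    return key in mapping


present_but_null = element_at(value_map, "element")  # key exists, value is None
missing = element_at(value_map, "absent")            # key does not exist
```

Both lookups give the same result, so only the key-membership check can tell the two cases apart.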

How to insert an array of strings in javascript into PostgreSQL

I am building an API server which accepts file uploads using multer.
I need to store an array of all the paths to all files uploaded for each request to a column in the PostgreSQL database which I have connected to the server.
Say I have a table created with the following query
CREATE TABLE IF NOT EXISTS records
(
id SERIAL PRIMARY KEY,
created_on TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by INTEGER,
title VARCHAR NOT NULL,
type VARCHAR NOT NULL
)
How do I define a new column filepaths on the above table into which I can insert a JavaScript string array (e.g. ['path-to-file-1', 'path-to-file-2', 'path-to-file-3'])?
Also, how do I retrieve and update/edit the list in JavaScript using node-postgres?

You have two options:
1. Use the json or jsonb type. In that case the string to insert looks like this:
'["path-to-file-1", "path-to-file-2", "path-to-file-3"]'
I would prefer jsonb because it supports good indexes. json is essentially just text with some additional built-in functions.
2. Use an array of text, something like filepaths text[]. To insert you can use:
ARRAY['path-to-file-1', 'path-to-file-2', 'path-to-file-3']
or
'{path-to-file-1,path-to-file-2,path-to-file-3,"path to file 4"}'
Double quotes are required here only for elements that contain spaces or other special characters, but feel free to use them for all elements.
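The curly-brace form of the array option can be produced from a list of strings. Below is a minimal sketch in Python (the question uses JavaScript, but the logic carries over); the helper name to_pg_array_literal is hypothetical, and it assumes every element is a non-null string. It quotes every element and backslash-escapes embedded double quotes and backslashes.

```python
def to_pg_array_literal(items):
    """Build a PostgreSQL text[] literal such as {"a","b c"} from strings.

    Assumes all items are non-null strings (NULL elements are not handled).
    """
    quoted = []
    for item in items:
        # Inside a double-quoted element, backslash and double quote are escaped
        # with a backslash. Escape backslashes first so they are not doubled twice.
        escaped = item.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return "{" + ",".join(quoted) + "}"


literal = to_pg_array_literal(["path-to-file-1", "path to file 4"])
```

Where the database driver accepts bound parameters, passing the value as a parameter is generally safer than splicing a hand-built literal into the SQL text.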

You can create a file table that has a path column and a foreign key reference to the record it belongs to. This way you store each path in a plain text column instead of storing an array in a column, which is better practice for relational databases. You will also be able to store additional information about a file if you need to later. Working with the file paths is simpler too: you add a path by inserting a new row into the file table (with the appropriate foreign key) and remove one by deleting its row.
For example:
CREATE TABLE IF NOT EXISTS file (
record_id integer NOT NULL REFERENCES records(id) ON DELETE CASCADE,
path text NOT NULL
);
Then to get all the files for a record you can join the two tables together and convert to an array if you want.
For example:
SELECT
records.*,
ARRAY (
SELECT
file.path
FROM
file
WHERE
records.id = file.record_id
) AS file_paths
FROM
records;
Sample input (using only the title field of records):
INSERT INTO records (title) VALUES ('A'), ('B'), ('C');
INSERT INTO file (record_id, path) VALUES (1, 'patha1'), (1, 'patha2'), (1, 'patha3'), (2, 'pathb1');
Sample output:
 id | title |       file_paths
----+-------+------------------------
  1 | A     | {patha1,patha2,patha3}
  2 | B     | {pathb1}
  3 | C     | {}

Cassandra UDF and user input

I am using Cassandra 2.2 and I'm having a problem with User Defined Functions.
I want to create a function that takes an integer column of a table and another integer supplied by the user, and multiplies the two values as follows:
CREATE OR REPLACE FUNCTION testFunc (val int, input int)
CALLED ON NULL INPUT RETURNS int
LANGUAGE java AS 'return val * input;';
I can execute the function on two integer columns like this:
select testFunc(int_column, another_int_column) from my_table;
and it works, but when I try to execute it with a user-supplied value like:
select testFunc(int_column, 3) from my_table;
I receive the following exception:
SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:22 no viable alternative at input '3' (select testFunc(year, [3]...)"
Is it possible to achieve what I'm trying to do, or should I find another way?

Calling the UDF as testFunc(int_column, 3) is the same as passing an int to a function parameter that accepts only a column name, hence the syntax error no viable alternative at input '3'. I am not sure whether this fits your scenario, but you could try something like this:
CREATE OR REPLACE FUNCTION testFunc (val int)
CALLED ON NULL INPUT RETURNS int
LANGUAGE java AS 'return val * 3;';
Or add a multiplier column to your table.

DSE/Cassandra CQL now() does not work for timestamp type

I am having trouble using the now() function with the timestamp type.
Please take a look at the following code:
Table creation:
CREATE TABLE "Test" (
video_id UUID,
upload_timestamp TIMESTAMP,
title VARCHAR,
views INT,
PRIMARY KEY (video_id, upload_timestamp)
) WITH CLUSTERING ORDER BY (upload_timestamp DESC);
The problematic INSERT query:
INSERT INTO "Test" (video_id, upload_timestamp, title, views)
VALUES (uuid(), now(), 'Test', 0);
The INSERT query looks fine to me. However, when I execute it, I see the following error:
Unable to execute CQL script on 'XXX': cannot assign result of function now (type timeuuid) to upload_timestamp (type timestamp)
What am I doing wrong here?
I use DataStax Enterprise 4.5.2.

now() returns a timeuuid, not a timestamp. You could try dateOf(now()). Have a read of this from the docs:
dateOf and unixTimestampOf
The dateOf and unixTimestampOf functions take a timeuuid argument and
extract the embedded timestamp. However, while the dateof function
return it with the timestamp type (that most client, including cqlsh,
interpret as a date), the unixTimestampOf function returns it as a
bigint raw value.
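To illustrate what "extract the embedded timestamp" means, here is a minimal sketch in Python using the standard uuid module rather than Cassandra itself. A timeuuid is a version 1 UUID, which stores a count of 100-nanosecond intervals since 1582-10-15. The function name timeuuid_to_datetime is made up for this example and is only a rough analogue of dateOf.

```python
import datetime
import uuid

# Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def timeuuid_to_datetime(u: uuid.UUID) -> datetime.datetime:
    """Return the UTC datetime embedded in a version 1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_100ns = u.time - GREGORIAN_TO_UNIX_100NS
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    # Integer division drops the sub-microsecond remainder.
    return epoch + datetime.timedelta(microseconds=unix_100ns // 10)


generated = uuid.uuid1()                      # comparable to what now() produces
extracted = timeuuid_to_datetime(generated)   # comparable to dateOf(now())
```

This is why the value returned by now() cannot be assigned to a timestamp column directly: the timestamp has to be pulled out of the UUID first.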
