Cassandra UDF and user input

I am using Cassandra 2.2 and I'm having a problem with User Defined Functions.
I want to create a function that takes an integer column of a table and another integer supplied as user input, and multiplies the two values, as follows:
CREATE OR REPLACE FUNCTION testFunc (val int, input int)
CALLED ON NULL INPUT RETURNS int
LANGUAGE java AS 'return val * input;';
I can execute the function on two integer columns, like
select testFunc(int_column, another_int_column) from my_table;
and it's working, but when I try to execute it with a user input like:
select testFunc(int_column, 3) from my_table;
I receive the following exception:
SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:22 no viable alternative at input '3' (select testFunc(year, [3]...)"
Is it possible to achieve what I'm trying to do, or should I find another way to do it?

Calling the UDF as testFunc(int_column, 3) passes an int literal where the function call in the SELECT clause accepts only column names, hence the syntax error no viable alternative at input '3'. Not sure if this fits your scenario, but you can try something like this:
CREATE OR REPLACE FUNCTION testFunc (val int)
CALLED ON NULL INPUT RETURNS int
LANGUAGE java AS 'return val * 3;';
Or add a multiplier column to your table.
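If the multiplier changes from time to time, the first workaround can be scripted. The following is only a minimal sketch, assuming the statement is generated client-side and then executed by whatever driver is in use; the helper name build_multiply_udf is hypothetical and not part of Cassandra or any driver.

```python
def build_multiply_udf(multiplier: int, name: str = "testFunc") -> str:
    """Return a CQL CREATE FUNCTION statement with the multiplier baked in.

    Hypothetical helper: it only builds the statement text; running it
    against the cluster is left to the caller.
    """
    # Reject non-integers (bool is a subclass of int, so exclude it explicitly)
    # so that nothing unexpected ends up inside the Java body.
    if isinstance(multiplier, bool) or not isinstance(multiplier, int):
        raise TypeError("multiplier must be an int")
    return (
        f"CREATE OR REPLACE FUNCTION {name} (val int)\n"
        "CALLED ON NULL INPUT RETURNS int\n"
        f"LANGUAGE java AS 'return val * {multiplier};';"
    )


if __name__ == "__main__":
    print(build_multiply_udf(3))
```

The generated function is then called with a column only, for example select testFunc(int_column) from my_table;.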

Related

Max aggregate function syntax (to be called by the supabase client)

I need to write a simple custom aggregate function that returns the max of a column (as MAX is not supported by the supabase client). I'm just not sure about the syntax, so any help is most welcome. I have tried permutations of:
select max(my_column) from my_table as $$
return $$ + 1
getting an error:
Failed to validate sql query: syntax error at or near "select"
Try with:
create or replace function max_value() returns int as $$
select max(my_column) from my_table;
$$ language sql;
Then call it through rpc.

Cannot use custom SQL function with arguments inside transform scope [Spark SQL] (Error in SQL statement: AnalysisException: Resolved attribute(s)...)

I am using a Spark SQL context in Azure Databricks.
My query uses the transform function for handling an array like so:
SELECT
  colA,
  colB,
  transform(colC,
    x -> named_struct(
      "innerColA", functionA(x.innerColA), -- does not work
      "innerColB", [...x.innerColB...], -- works (same logic as functionA)
      "test1", test1(), -- works
      "test2", test2(x.innerColA) -- does not work
    )
  )
FROM
  tableA
I get the following error regarding the use of functionA:
Error in SQL statement: AnalysisException: Resolved attribute(s) x#2723416 missing from in operator !Project [cast(lambda x#2723416 as string) AS arg1#2723417].
functionA is simple enough that, if I rewrite it directly into the query, it works (as shown with "innerColB" in my code example).
I have tested with simple functions that don't take any arguments and they can be used without any issues:
CREATE OR REPLACE FUNCTION test1() RETURNS STRING RETURN "test"
But if you have any arguments, it throws that error:
CREATE OR REPLACE FUNCTION test2(arg1 STRING) RETURNS STRING RETURN "test"
Is that a limitation of SparkSQL? Are there any workarounds?

Mapping data flow SQL query and Parameters failing

In my mapping data flow I have simplified this down to DimDate just for the test.
My parameters are
The source even tells you exactly how to enter the select query if you are using parameters, which is what I'm trying to achieve.
Then I import but get two different errors.
For parameterizing a table:
SELECT * FROM {$df_TableName}
I get
This error occurs with either select * or individual columns.
I've tried just the WHERE clause (what I actually need) as a parameter, but I keep getting datatype mismatch errors.
I then started testing multiple ways, and it only allows the schema to be parameterised in my queries below.
All of these other options seem to fail no matter what I do:
SELECT * FROM [{$df_Schema}].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = 2106
I know there's an issue with the Integer datatype, but I don't know how to pass it to the query within the parameter without changing its type, as the SQL engine cannot compare [Period] with a string.
Use the concat function in the expression builder to build the query in the data flow.
concat(<this> : string, <that> : string, ...) => string
Note: Concatenates a variable number of strings together. All the arguments must be strings.
Example 1:
concat(toString("select * from "), toString($df_tablename))
Example 2:
concat(toString("select * from "), toString($df_tablename), ' ', toString(" where incomingperiod = "), toString($df_incomingPeriod))
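The reason the integer parameter has to go through toString can be seen in a small Python sketch of the same concatenation. This is only an analogy written outside Data Factory: build_query is a hypothetical helper, and the table and column names are taken from the examples above.

```python
def build_query(table_name: str, incoming_period: int) -> str:
    """Assemble the query text the same way the concat() expression does."""
    parts = [
        "select * from ",
        table_name,
        " where incomingperiod = ",
        # The integer must be converted to text before concatenation,
        # which is the role toString() plays in the data flow expression.
        str(incoming_period),
    ]
    return "".join(parts)


if __name__ == "__main__":
    print(build_query("DimDate", 2106))
```

Because the period is converted only while building the text, the resulting query still compares the column with an unquoted number.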
Awesome, it worked like magic for me. I was struggling with parameterizing table names, which I was passing through an array list.
Created a data flow parameter and gave this value:
#item().TABLE_NAME

How to accept list columns as a Cassandra UDF parameter

I created one table:
CREATE TABLE human (chromosome text, position bigint,
hg01583 frozen<set<text>>,
hg03006 frozen<set<text>>,
PRIMARY KEY (chromosome, position)
)
and I created a function:
CREATE FUNCTION process(sample list<frozen<set<text>>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java
AS
$$
return sample==null?null:sample.getClass().toString()+" "+sample.toString();
$$;
When I issue the CQL query
SELECT chromosome,position,hg01583, hg03006, process([hg01583,hg03006]) from human;
I get this error:
SyntaxException: line 1:80 no viable alternative at input ',' ([[hg01583],..
How can I pass hg01583 and hg03006 as a list to the process function?
Pass each column as its own argument, like: SELECT chromosome, position, hg01583, hg03006, process(hg01583, hg03006) from human;
CREATE FUNCTION process(hg01583 frozen<set<text>>, hg03006 frozen<set<text>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java AS
$$
return hg01583==null? null : ...
$$;
If you want them to be dynamic, then instead of creating a fixed column for each sample, make it a wide row and use a UDA to aggregate them with an accumulator function, like:
CREATE TABLE human (chromosome text, position bigint,
sample text,
value frozen<set<text>>,
PRIMARY KEY (chromosome, position, sample)
)
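To make the accumulator idea concrete: a Cassandra UDA calls a state function once per row, threading an accumulated state through the calls, starting from an initial value. The Python sketch below only models that fold on the client to show the shape of the logic; accumulate and aggregate are hypothetical names, and the real state function would be a UDF written for the wide-row table above.

```python
from functools import reduce


def accumulate(state: dict, sample: str, value: frozenset) -> dict:
    """Model of a UDA state function: called once per row, returns the new state."""
    new_state = dict(state)  # copy so the previous state is left untouched
    new_state[sample] = value
    return new_state


def aggregate(rows):
    """Fold the state function over (sample, value) rows, starting from an empty state."""
    return reduce(lambda state, row: accumulate(state, row[0], row[1]), rows, {})


if __name__ == "__main__":
    rows = [
        ("hg01583", frozenset({"A", "T"})),
        ("hg03006", frozenset({"G"})),
    ]
    print(aggregate(rows))
```

With this layout, adding a new sample means adding rows rather than altering the table or the function signature.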

Does Cassandra allow user defined function in where clause?

I created a user-defined function fStringToDouble which takes a string as an argument and returns a double. This user-defined function works fine in a select statement.
SELECT applieddatetime, fStringToDouble(variablevalue) from my_table WHERE locationid='xyz' and applieddatetime >= '2016-08-22' AND applieddatetime < '2016-08-23' ;
When I put this user-defined function in the WHERE clause, I get a syntax error: "no viable alternative at input"
SELECT applieddatetime from my_table WHERE locationid='xyz' and applieddatetime >= '2016-08-22' AND applieddatetime < '2016-08-23' and fStringToDouble(variablevalue)<6.0;
What is wrong with the above query? Is there any built-in function to cast String to Double in Cassandra?
You cannot use a user-defined function in a WHERE clause; only a limited set of operators, such as range operators, is allowed there.
If you want to know more about what you can do in WHERE clauses, you can have a look at this post: http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
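One possible workaround, which is an assumption on my part and not something the answer above proposes, is to run the query that does work (without the function in WHERE) and apply the numeric filter in the client. A minimal sketch, with a hypothetical helper filter_below and rows represented as plain (applieddatetime, variablevalue) tuples rather than driver result objects:

```python
def filter_below(rows, threshold):
    """Keep rows whose string value parses as a number smaller than threshold."""
    kept = []
    for applied_datetime, raw_value in rows:
        try:
            value = float(raw_value)
        except (TypeError, ValueError):
            # Skip nulls and strings that are not valid numbers.
            continue
        if value < threshold:
            kept.append((applied_datetime, value))
    return kept


if __name__ == "__main__":
    sample_rows = [
        ("2016-08-22 01:00", "5.5"),
        ("2016-08-22 02:00", "7.1"),
        ("2016-08-22 03:00", "n/a"),
    ]
    print(filter_below(sample_rows, 6.0))
```

This moves the filtering cost to the client, so it is only reasonable when the partition and time range already narrow the result set.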
