Does Cassandra allow user-defined functions in the WHERE clause? - cassandra

I created a user-defined function fStringToDouble which takes a string as an argument and returns a double. This user-defined function works fine in a SELECT statement.
SELECT applieddatetime, fStringToDouble(variablevalue) from my_table WHERE locationid='xyz' and applieddatetime >= '2016-08-22' AND applieddatetime < '2016-08-23' ;
When I put this user-defined function in the WHERE clause, I get a syntax error: "no viable alternative at input".
SELECT applieddatetime from my_table WHERE locationid='xyz' and applieddatetime >= '2016-08-22' AND applieddatetime < '2016-08-23' and fStringToDouble(variablevalue)<6.0;
What is wrong with the above query? Is there any built-in function to cast a string to a double in Cassandra?

You cannot use user-defined functions in WHERE clauses; only plain column comparisons with a limited set of operators are allowed there.
If you want to know more about what you can do in WHERE clauses, you can have a look at this post: http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
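As a workaround you can keep the UDF in the selection clause, as in your working query, and apply the numeric filter client-side. A minimal sketch reusing the names from the question (the 6.0 threshold comes from your attempted query):
SELECT applieddatetime, fStringToDouble(variablevalue)
FROM my_table
WHERE locationid='xyz' AND applieddatetime >= '2016-08-22' AND applieddatetime < '2016-08-23';
-- then drop rows whose computed value is not < 6.0 in application code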

Related

Max aggregate function syntax (to be called by the supabase client)

I need to write a simple custom aggregate function that returns the max of a column (as MAX is not supported by the Supabase client). I'm just not sure about the syntax, so any help is most welcome. I have tried permutations of:
select max(my_column) from my_table as $$
return $$ + 1
getting an error:
Failed to validate sql query: syntax error at or near "select"
Try with:
create or replace function max_value() returns int as $$
select max(my_column) from my_table;
$$ language sql;
Then call it through rpc.
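As a quick sanity check, the function can also be run directly in SQL before calling it from the client (this assumes it was created exactly as above):
select max_value();
From the Supabase client it is then invoked by name through rpc, i.e. as the function max_value.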

Mapping data flow SQL query and Parameters failing

In my mapping data flow I have simplified this down to DimDate just for the test.
My parameters are:
The source even tells you exactly how to enter the select query if you are using parameters, which is what I'm trying to achieve.
Then I import, but I get two different errors.
For parameterizing a table with
SELECT * FROM {$df_TableName}
I get this error with a select * or with individual columns.
I've tried just the WHERE clause (what I actually need) as a parameter but keep getting datatype mismatch errors.
I then started testing multiple ways, and it only allows the schema to be parameterised in the queries below;
all of the other options seem to fail no matter what I do:
SELECT * FROM [{$df_Schema}].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = 2106
I know there's an issue with the integer datatype, but I don't know how to pass this to the query within the parameter without changing its type, as the SQL engine cannot run [Period] as a string.
Use the concat function in the expression builder to build the query in the data flow:
concat(<this> : string, <that> : string, ...) => string
Note: concatenates a variable number of strings together. All the variables should be in the form of strings.
Example 1:
concat(toString("select * from "), toString($df_tablename))
Example 2:
concat(toString("select * from "), toString($df_tablename), ' ', toString(" where incomingperiod = "), toString($df_incomingPeriod))
Awesome, it worked like magic for me. I was struggling with parameterizing table names, which I was passing through an array list.
I created a data flow parameter and gave it this value:
#item().TABLE_NAME

Spark SQL query with IN operator in CASE WHEN cannot be cast to SparkPlan

I'm trying to execute a test query like this:
SELECT COUNT(CASE WHEN name IN (SELECT name FROM requiredProducts) THEN name END)
FROM myProducts
which throws the following exception:
java.lang.ClassCastException:
org.apache.spark.sql.execution.datasources.LogicalRelation cannot be cast to
org.apache.spark.sql.execution.SparkPlan
My guess is that the IN operator cannot be used in CASE WHEN. Is it really so? The Spark documentation is silent about this.
The IN operator with a subquery does not work in a projection, regardless of whether it is contained in a CASE WHEN; it only works in filters. It works fine if you specify values in the IN clause directly rather than using a subquery.
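To make that concrete, here is a sketch of the two forms that do work (the literal values 'a' and 'b' are placeholders, and the filter form only covers the simple standalone count):
-- literal IN list inside a projection is fine
SELECT COUNT(CASE WHEN name IN ('a', 'b') THEN name END) FROM myProducts
-- subquery IN is fine when it appears in a filter
SELECT COUNT(name) FROM myProducts WHERE name IN (SELECT name FROM requiredProducts)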
I am not sure how to generate the exact exception you got above, but when I attempt to run a similar query in Spark Scala, it returns a more descriptive error:
org.apache.spark.sql.AnalysisException: IN/EXISTS predicate sub-queries can only be used in a Filter: Project [CASE WHEN agi_label#5 IN (list#96 []) THEN 1 ELSE 0 END AS CASE WHEN (agi_label IN (listquery())) THEN 1 ELSE 0 END#97]
I have run into this issue in the past. Your best bet is probably to restructure it to use a left join to requiredProducts and then check for a null in the case statement. For example, something like this might work:
SELECT COUNT(CASE WHEN rp.name is not null THEN mp.name END)
FROM myProducts mp
LEFT JOIN requiredProducts rp ON mp.name = rp.name
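One caveat with the join rewrite: if requiredProducts can contain the same name more than once, the LEFT JOIN will produce multiple matches per product and inflate the count compared to the original IN semantics; joining to (SELECT DISTINCT name FROM requiredProducts) instead avoids that.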

Get all rows if keyword is null else return matching

I have a plpgsql function in Postgres. It works fine when the keyword is not null and returns the matching results, but when the keyword is null I want to ignore it and return arbitrary rows.
CREATE OR REPLACE FUNCTION get_all_companies(_keyword varchar(255))
RETURNS TABLE(
id INTEGER,
name VARCHAR,
isactive boolean
) AS $$
BEGIN
RETURN QUERY
SELECT c.id, c.name, c.isactive FROM companydetail AS c
WHERE c.name ~* _keyword LIMIT 50 ;
END;$$
LANGUAGE plpgsql;
Check whether it's NULL or empty:
RETURN QUERY
SELECT c.id, c.name, c.isactive
FROM companydetail AS c
WHERE _keyword IS NULL
OR _keyword = ''::varchar(255)
OR c.name ~* _keyword
LIMIT 50 ;
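For reference, calling the function then looks like this (the keyword 'acme' is just a placeholder):
SELECT * FROM get_all_companies(NULL);    -- up to 50 arbitrary rows
SELECT * FROM get_all_companies('acme');  -- up to 50 rows whose name matches the keyword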
#jahuuar provided a simple and elegant solution to do this with a single SELECT (also skipping empty strings if you need that). You don't need plpgsql or even a function for this.
If you stick with plpgsql, you can optimize performance:
CREATE OR REPLACE FUNCTION get_all_companies(_keyword varchar(255))
RETURNS TABLE(id INTEGER, name VARCHAR, isactive boolean) AS
$func$
BEGIN
IF _keyword <> '' THEN -- exclude null and empty string
RETURN QUERY
SELECT c.id, c.name, c.isactive
FROM companydetail AS c
WHERE c.name ~* _keyword
LIMIT 50;
ELSE
RETURN QUERY
SELECT c.id, c.name, c.isactive
FROM companydetail AS c
LIMIT 50;
END IF;
END
$func$ LANGUAGE plpgsql;
Postgres can use separate, optimized plans for the two distinct queries this way. A trigram GIN index scan for the first query (you need the matching index, of course - see links below), a sequential scan for the second. And PL/pgSQL saves query plans when executed repeatedly in the same session.
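For the trigram index mentioned above, a sketch would be (assuming the pg_trgm extension is available and the table/column names from the question):
CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- provides the gin_trgm_ops operator class
CREATE INDEX companydetail_name_trgm_idx ON companydetail USING gin (name gin_trgm_ops);
That index can support the case-insensitive regex match c.name ~* _keyword.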
Related:
Best way to check for "empty or null value"
Difference between LIKE and ~ in Postgres
PostgreSQL LIKE query performance variations
Difference between language sql and language plpgsql in PostgreSQL functions

Cassandra UDF and user input

I am using Cassandra 2.2 and I'm having a problem with user-defined functions.
I want to create a function that takes as parameters an integer column of a table and another integer given as user input, and multiplies the two values, as follows:
CREATE OR REPLACE FUNCTION testFunc (val int, input int)
CALLED ON NULL INPUT RETURNS int
LANGUAGE java AS 'return val * input;';
I can execute the function on two integer columns like
select testFunc(int_column, another_int_column) from my_table;
and it works, but when I try to execute it with user input like:
select testFunc(int_column, 3) from my_table;
I receive the following exception:
SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:22 no viable alternative at input '3' (select testFunc(year, [3]...)"
Is it possible to achieve what I'm trying to do, or should I find another way to do it?
Calling the UDF as testFunc(int_column, 3) passes a literal where the selection clause only accepts column names (or functions of them), hence the syntax error no viable alternative at input '3'. Not sure if this fits into your scenario, but you can try something like this:
CREATE OR REPLACE FUNCTION testFunc (val int)
CALLED ON NULL INPUT RETURNS int
LANGUAGE java AS 'return val * 3;';
Or add a multiplier column to your table.
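A sketch of the multiplier-column alternative, reusing the original two-argument testFunc from the question (the column name multiplier is hypothetical):
ALTER TABLE my_table ADD multiplier int;
-- populate multiplier with the desired factor, then:
SELECT testFunc(int_column, multiplier) FROM my_table;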
