Can NiFi string manipulation functions be used in QueryRecord?

Can one use substring or concat functions in a QueryRecord SQL statement? For example, I have this flowfile:
{ "field1": "1, Tom Johnson", "field2":"3", "field3":"xyz" }
In the QueryRecord processor the SQL query is:
select substringAfter(/field1, ',') as NAME, substringBefore(/field2, ',') as ID, field3 from flowfile
I got an error about the query when I ran the processor and don't know what the problem is. How can this be done?
With the flowfile above as input, I tried the SQL query both ways:
select substringAfter(/field1, ',') as NAME, substringBefore(/field2, ',') as ID, field3 from flowfile
and
select substringAfter(field1, ',') as NAME, substringBefore(field2, ',') as ID, field3 from flowfile
The query with the path /field1 is not accepted by the processor, and the second one triggers a runtime error while the SQL statement is being prepared. So can these NiFi functions be used in QueryRecord at all?

QueryRecord evaluates its statement as standard SQL (it is backed by Apache Calcite), so RecordPath functions such as substringAfter and substringBefore are not available there. You can use the UpdateRecord processor, which does evaluate RecordPath, to do this.
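A minimal UpdateRecord configuration for this flowfile might look like the sketch below; /NAME and /ID are user-defined (dynamic) properties, and it assumes the processor's Replacement Value Strategy is set to "Record Path Value":
Replacement Value Strategy : Record Path Value
/NAME : substringAfter(/field1, ',')
/ID   : substringBefore(/field2, ',')
Each dynamic property writes the result of its RecordPath expression into the named field, which is why the string functions work here but not inside QueryRecord's SQL.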
You'll need to clean up the leftover fields afterwards, but that can be done with QueryRecord with:
SELECT NAME, ID, field3 from FLOWFILE

Related

How to insert a subquery in a join with QueryDSL

I'm using QueryDSL, but I've run into a problem. My database server is MariaDB, and this is the SQL I want to reproduce:
select s.* , a.user_id
from schedule s
left outer join(
select sch_idx, group_concat(user_id SEPARATOR ',') as user_id
from attendee a
group by sch_idx
) a
on s.idx = a.sch_idx;
and this is my code (which is, of course, wrong):
public void selectJoin() {
    JPAQueryFactory query = new JPAQueryFactory(em);
    QSchedule qSchedule = QSchedule.schedule;
    QAttendee qAttendee = QAttendee.attendee;
    query.selectFrom(qSchedule)
         .leftJoin(
             query.select(
                 qAttendee.schIdx,
                 qAttendee.userId
             ).from(qAttendee)
              .groupBy(qAttendee.schIdx)
         ).on(qSchedule.idx.eq(qAttendee.schIdx))
         .fetch();
}
How can I write a subquery in the join like in my SQL sample? I also need to reproduce GROUP_CONCAT in QueryDSL. If my problem can't be solved with QueryDSL, let me know what the best solution is. Please, I need your help!
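One point worth noting: standard JPQL, which QueryDSL's JPAQueryFactory translates to, does not allow subqueries in the FROM or JOIN clause, so this query cannot be expressed directly with JPAQueryFactory. A common fallback, sketched here under the assumption that the same EntityManager em is available, is to run the original MariaDB SQL as a native query:
public List<Object[]> selectJoinNative() {
    // JPQL supports neither subqueries in a join nor GROUP_CONCAT,
    // so run the original MariaDB SQL unchanged as a native query.
    String sql =
        "select s.*, a.user_id " +
        "from schedule s " +
        "left outer join (" +
        "  select sch_idx, group_concat(user_id SEPARATOR ',') as user_id " +
        "  from attendee " +
        "  group by sch_idx" +
        ") a on s.idx = a.sch_idx";
    return em.createNativeQuery(sql).getResultList();
}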

Inserting Timestamp Into Snowflake Using Python 3.8

I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP
);
This creates the correct table, which has been checked using the DESC command in SQL. Then, using the Snowflake Python connector, we try to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
The variables are defined just before this query; the main challenge is getting the current timestamp written into Snowflake. The value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query, we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on how to format the datetime value here? Help is appreciated.
In addition to the answer @Lukasz provided, you could also think about defining current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
It will automatically assign the insert time to TIME_PREDICTED.
Educated guess: when performing the insert with
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
the f-string interpolates ct as the unquoted string representation of a datetime, which does not parse as a TIMESTAMP literal, hence the error.
I would suggest using proper variable binding instead (note that the :1-style placeholders require the numeric paramstyle, which has to be selected before connecting):
# enable server-side numeric binding before the connection is created:
# snowflake.connector.paramstyle = 'numeric'
ctx.cursor().execute(
    "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
    "(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
    "VALUES (:1, :2, :3)",
    (accountId,
     risk_score,
     ("TIMESTAMP_LTZ", ct))  # bind the datetime with an explicit timestamp type
)
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
"INSERT INTO testtable(col1, col2) "
"VALUES({col1}, '{col2}')".format(
col1=789,
col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
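For reference, a safe version of the unsafe example above, rewritten with qmark binding, might look like this sketch (it assumes con is an open Snowflake connection and that the paramstyle is selected before connecting):
import snowflake.connector

# select qmark binding at module level, before the connection is created
snowflake.connector.paramstyle = 'qmark'

con.cursor().execute(
    "INSERT INTO testtable(col1, col2) VALUES (?, ?)",
    (789, 'test string3')
)
The driver sends the values separately from the statement text, so quoting and escaping are handled for you.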
You forgot to place quotes around {ct}. The code should be:
insert_query = (
    "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES"
    "(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
    "VALUES ({accountId}, {risk_score}, '{ct}');"
).format(accountId=accountId, risk_score=risk_score, ct=ct)
ctx.cursor().execute(insert_query)

How do I escape the ampersand character (&) in CQL?

I am inserting a statement into a table that looks something like this:
insert into db.table (field1, field2) values (1, 'eggs&cheese')
but when I later query this record on our servers, the query returns:
eggs\u0026cheese instead.
I'm not sure whether to escape with \ or '.
If anyone can help, that would be great. Thank you!
This doesn't appear to be a problem with CQL but the way your app displays the value.
For example, if the CQL column type is text, the unicode character is encoded as a UTF-8 string.
Using this example schema:
CREATE TABLE unicodechars (
id int PRIMARY KEY,
randomtext text
)
cqlsh displays the ampersand as expected:
cqlsh> SELECT * FROM unicodechars ;
id | randomtext
----+-------------
1 | eggs&cheese
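As a quick sanity check that nothing was lost, \u0026 is simply the Unicode escape sequence for the ampersand, so any layer that decodes the escape yields the original text. For example, in Python:
# '\u0026' is the Unicode escape for '&'; both literals denote the same string
print("eggs\u0026cheese")                   # prints: eggs&cheese
print("eggs\u0026cheese" == "eggs&cheese")  # prints: True
If you see the escaped form in application output, the fix belongs in the app's serializer (some JSON encoders escape & for HTML safety), not in the CQL insert.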

Getting ValidationFailureSemanticException on 'INSERT OVERWRITE'

I am creating a DataFrame and registering it as a temp view using df.createOrReplaceTempView('mytable'). After that I try to write the content of 'mytable' into a partitioned Hive table using the following query:
insert overwrite table
myhivedb.myhivetable
partition(testdate) -- (1) Note: I have a partition named 'testdate'
select
Field1,
Field2,
...
TestDate -- (2) Note: I have a field named 'TestDate'; both (1) and (2) have the same name
from
mytable
When I execute this query, I get the following error:
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.Table$ValidationFailureSemanticException: Partition spec
{testdate=, TestDate=2013-01-01}
It looks like I am getting this error because of the identical field names, i.e. testdate (the partition in Hive) and TestDate (the field in the temp view 'mytable'). If my partition name is different from the field name (i.e. not 'testdate'), the query executes successfully. Example:
insert overwrite table
myhivedb.myhivetable
partition(my_partition) -- Note: here the partition name is not 'testdate'
select
Field1,
Field2,
...
TestDate
from
mytable
My guess is that this is a bug in Spark, but I would like a second opinion. Am I missing something here?
@DuduMarkovitz, @dhee: apologies for the late response. I was finally able to resolve the issue. Earlier I was creating the table with camelCase column names (in the CREATE statement), which seems to have been the reason for the exception. I have now created the table with DDL in which the field names are all lower case, and this has resolved my issue. A sketch of the corrected DDL is below.
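A minimal sketch of that fix, with the table and column names assumed from the question (Hive stores identifiers in lower case, so keeping the DDL and the partition reference lower case avoids the mismatched partition spec):
-- all-lowercase DDL; the column list is hypothetical
CREATE TABLE myhivedb.myhivetable (
  field1 STRING,
  field2 STRING
)
PARTITIONED BY (testdate STRING);

-- the insert now matches the partition column exactly
INSERT OVERWRITE TABLE myhivedb.myhivetable
PARTITION (testdate)
SELECT field1, field2, testdate
FROM mytable;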

Turning a Comma Separated string into individual rows in Teradata

I read the post:
Turning a Comma Separated string into individual rows
And really like the solution:
SELECT A.OtherID,
Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
( SELECT OtherID,
CAST ('<M>' + REPLACE(Data, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM Table1
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
But it did not work when I tried to apply the method in Teradata for a similar problem. Here is the summarized error:
SELECT failed 3707: expected something between '.' and the 'value' keyword.
Is this code only valid in SQL Server? Could anyone help me make it work in Teradata or SAS SQL? Your help will be really appreciated!
This is SQL Server syntax.
In Teradata there's a table UDF named STRTOK_SPLIT_TO_TABLE,
e.g.
SELECT * FROM dbc.DatabasesV AS db
JOIN
(
SELECT token AS DatabaseName, tokennum
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1, 'dbc,systemfe', ',')
RETURNS (outkey INTEGER,
tokennum INTEGER,
token VARCHAR(128) CHARACTER SET UNICODE)
) AS d
) AS dt
ON db.DatabaseName = dt.DatabaseName
ORDER BY tokennum;
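Applied to the Table1(OtherID, Data) layout from the quoted SQL Server example, a sketch might look like this (it assumes OtherID is an integer key; adjust the RETURNS types to your columns):
SELECT d.outkey AS OtherID,
       d.token  AS Data
FROM TABLE (STRTOK_SPLIT_TO_TABLE(Table1.OtherID, Table1.Data, ',')
     RETURNS (outkey INTEGER,
              tokennum INTEGER,
              token VARCHAR(100) CHARACTER SET UNICODE)) AS d
ORDER BY outkey, tokennum;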
Or see my answer to this similar question
