Mapping data flow SQL query and Parameters failing

In my mapping data flow I have simplified this down to DimDate just for the test.
My parameters are
The source even tells you exactly how to enter the SELECT query if you are using parameters, which is what I'm trying to achieve.
Then I import but get two different errors.
For parameterizing a table:
SELECT * FROM {$df_TableName}
I get
I get this error from either SELECT * or individual columns.
I've tried passing just the WHERE clause (which is what I actually need) as a parameter, but I keep getting datatype mismatch errors.
I then tested multiple variations, and only the schema can be parameterised in the queries below.
All of the other options seem to fail no matter what I do.
SELECT * FROM [{$df_Schema}].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = 2106
I know there's an issue with the Integer datatype, but I don't know how to pass the value into the query through the parameter without changing its type, because the SQL engine cannot evaluate [Period] as a string.

Use the concat function in the expression builder to build the query in the data flow.
concat(<this> : string, <that> : string, ...) => string
Note: concat joins a variable number of strings, so every argument must be a string. Convert non-string parameters, such as the integer $df_incomingPeriod, with toString().
Example 1:
concat(toString("select * from "), toString($df_TableName))
Example 2:
concat(toString("select * from "), toString($df_TableName), ' ', toString(" where [Period] = "), toString($df_incomingPeriod))
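The same idea can be sketched in Python rather than in the data flow expression language: the integer parameter has to be converted to a string before it can be joined into the query text. The function name build_query and the sample values are made up for illustration, and since this builds SQL by concatenation it is only appropriate for values you control.

```python
def build_query(table_name: str, incoming_period: int) -> str:
    """Join query fragments the way concat() does; every piece must be a string."""
    # str() plays the role of toString(): joining an int directly raises TypeError.
    return "".join(
        ["select * from ", table_name, " where [Period] = ", str(incoming_period)]
    )


query = build_query("[dbo].[DimDate]", 2106)
print(query)
```

The resulting text contains the period as a bare number, so the database still compares [Period] with an integer.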

Awesome, it worked like magic for me. I was struggling with parameterizing table names, which I was passing through an array list.
I created a data flow parameter and gave it this value:
@item().TABLE_NAME

Related

psycopg2 SELECT query with inbuilt functions

I have the following SQL statement, which reads the database to get the records for one day. Here is what I tried in the pgAdmin console:
SELECT * FROM public.orders WHERE createdat >= now()::date AND type='t_order'
I want to convert this to psycopg2 syntax, but it throws this error:
Database connection failed due to invalid input syntax for type timestamp: "now()::date"
Here is what I am doing:
query = f"SELECT * FROM {table} WHERE (createdat>=%s AND type=%s)"
cur.execute(query, ("now()::date", "t_order"))
records = cur.fetchall()
Any help is deeply appreciated.

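Do not use f-strings to build queries. Use proper parameter passing.
now()::date is better expressed as current_date; see the Current Date/Time section of the PostgreSQL documentation. It also has to stay in the query text: passed as a parameter, it is sent as the string 'now()::date', which is what produces the invalid timestamp error.
You want: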
query = "SELECT * FROM public.orders WHERE (createdat>=current_date AND type=%s)"
cur.execute(query, ["t_order"])
If you want dynamic identifiers (table or column names), then:
from psycopg2 import sql
query = sql.SQL("SELECT * FROM {} WHERE (createdat>=current_date AND type=%s)").format(sql.Identifier(table))
cur.execute(query, ["t_order"])
For more information, see the psycopg2 sql module documentation.
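Here is a small sketch of why the original call failed. It uses Python's built-in sqlite3 as a stand-in for psycopg2, so the placeholder is ? instead of %s and date('now') stands in for current_date. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, createdat TEXT, type TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '9999-12-31 00:00:00', 't_order')")
conn.execute("INSERT INTO orders VALUES (2, '2000-01-01 00:00:00', 't_order')")

# A bound parameter is always a value: the function call inside it is never evaluated.
bound = conn.execute("SELECT ?", ("date('now')",)).fetchone()[0]
print(repr(bound))

# Keep the SQL function in the query text and bind only the plain value.
rows = conn.execute(
    "SELECT id FROM orders WHERE createdat >= date('now') AND type = ?",
    ("t_order",),
).fetchall()
print(rows)
```

The first query returns the text date('now') unchanged, which mirrors how PostgreSQL received the literal string "now()::date" and could not read it as a timestamp.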

Inserting Timestamp Into Snowflake Using Python 3.8

I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP
);
This creates the correct table, which I checked using the DESC command in SQL. Then, using the Snowflake Python connector, we try to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
The variables are defined just before this query. The main challenge is getting the current timestamp written into Snowflake. The value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I get some help on how to format the datetime value here? Help is appreciated.

In addition to the answer @Lukasz provided, you could also define current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
The insert time is then assigned to TIME_PREDICTED automatically.
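The default-column approach can be tried locally with a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for Snowflake, so the column types and the table name predictions are assumptions made for illustration, not the Snowflake definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE predictions (
        account_id INTEGER NOT NULL PRIMARY KEY,
        predicted_probability REAL,
        time_predicted TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# Only two columns are supplied; the database fills in the third.
conn.execute(
    "INSERT INTO predictions (account_id, predicted_probability) VALUES (?, ?)",
    (42, 0.87),
)
row = conn.execute(
    "SELECT account_id, predicted_probability, time_predicted FROM predictions"
).fetchone()
print(row)
```

Because the timestamp never passes through Python, there is nothing to format or quote on the client side.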

Educated guess. When performing the insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
this is string interpolation. ct is inserted as the unquoted string representation of the datetime, which is not a valid timestamp literal, hence the syntax error.
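To see what the server actually receives, you can print the interpolated string. This is a minimal sketch in which the table name t and the values of account_id and risk_score are placeholders; ct is fixed to the value printed in the question.

```python
import datetime

account_id = 1
risk_score = 0.5
ct = datetime.datetime(2021, 4, 30, 21, 54, 41, 676406)

# The f-string calls str(ct), which produces text without any quotes around it.
rendered = (
    "INSERT INTO t(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
    f"VALUES ({account_id}, {risk_score},{ct});"
)
print(rendered)
```

The timestamp lands in the statement as bare tokens, which appears to be why the parser reports an unexpected '21' after the date part.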
I would suggest using proper variable binding instead:
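# Numeric binding (:1, :2, ...) requires snowflake.connector.paramstyle = 'numeric'
# to be set before the connection is created.
ctx.cursor().execute(
    "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
    "(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
    "VALUES(:1, :2, :3)",
    (
        accountId,
        risk_score,
        ("TIMESTAMP_LTZ", ct),  # tell the connector which timestamp type to bind
    ),
)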
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
    "INSERT INTO testtable(col1, col2) "
    "VALUES({col1}, '{col2}')".format(
        col1=789,
        col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
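The difference between formatting and binding can be shown with a short sketch. It uses Python's built-in sqlite3 as a stand-in for the Snowflake connector, so the ? placeholder is the qmark style; the hostile value is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (col1 INTEGER, col2 TEXT)")

hostile = "x'), (999, 'injected"

# UNSAFE: the value is pasted into the SQL text and changes the statement itself.
unsafe_sql = "INSERT INTO testtable(col1, col2) VALUES({col1}, '{col2}')".format(
    col1=789, col2=hostile
)
conn.execute(unsafe_sql)
unsafe_count = conn.execute("SELECT COUNT(*) FROM testtable").fetchone()[0]

# SAFE: the value is bound, so it is stored as data and never parsed as SQL.
conn.execute("DELETE FROM testtable")
conn.execute("INSERT INTO testtable(col1, col2) VALUES(?, ?)", (789, hostile))
safe_rows = conn.execute("SELECT col1, col2 FROM testtable").fetchall()
print(unsafe_count, safe_rows)
```

With formatting, one call inserts two rows; with binding, the same input ends up as a single harmless string.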

You forgot to place quotes before and after {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)

Spark SQL - How do i set a variable within the query, to re-use throughout?

I'm trying to convert a query from T-SQL to Spark's SQL.
I've got 99% of the way, but we've made strong use of the DECLARE statement in T-SQL.
I can't seem to find an alternative in Spark SQL that behaves the same - that is, allows me to declare variables in the query itself, to be re-used in that query.
Example in T-SQL:
DECLARE @varA int
SET @varA = '4'
SELECT * FROM tblName where id = @varA;
How do I declare such a variable in Spark SQL? (I don't want to use string interpolation unless necessary.)

You can try this:
sqlContext.sql("set id_value = 3")
sqlContext.sql("select * from country where id = ${id_value}").show()
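As far as I understand, the ${id_value} reference is resolved by substituting text into the query before it is parsed, so this is still a form of string interpolation. The effect can be sketched in plain Python with string.Template, which happens to use the same ${...} syntax. This is only an illustration of the substitution step, not Spark code.

```python
from string import Template

# Plays the role of: set id_value = 3
variables = {"id_value": "3"}

sql_text = "select * from country where id = ${id_value}"

# The placeholder is replaced in the text; the parser only ever sees the result.
rendered = Template(sql_text).substitute(variables)
print(rendered)
```

Since the value becomes part of the SQL text, it should only be set from trusted input.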

"non-integer constant in ORDER BY" when using pg-promise with named parameters

I am trying to write a simple query using the pg-promise library. My original implementation looks like:
var bar = function(orderBy){
    var qs = 'select * from mytable order by ${orderBy};';
    return db.many(qs,{orderBy:orderBy});
}
...
bar('id').then(...)
But this gives the error: non-integer constant in ORDER BY
I have also tried adding quotes around ${orderBy} and adding double quotes to the orderBy parameter, to no avail. I have a working solution using var qs = 'select * from mytable order by "' + orderBy + '";' though it should be obvious why I don't want code like that in the project.
My question: Is there a way to get pg-promise to build a query with an order by clause that isn't vulnerable to sql injection?

Is there a way to get pg-promise to build a query with an order by clause that isn't vulnerable to sql injection?
The value for the ORDER BY clause is an SQL name, so it should be formatted using the SQL Names filter:
const bar = function(orderBy) {
    const qs = 'select * from mytable order by ${orderBy:name}';
    return db.many(qs, {orderBy});
}
By contrast, :raw / ^ injects raw text, which is vulnerable to SQL injection when the text comes from outside, and should be used only for strings that were created and pre-formatted inside the server.
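Roughly what name formatting does can be sketched in Python. Without it, the value is formatted as the string literal 'id', which is a constant, hence the error. The helper names quote_name and build_order_by are hypothetical and this is not pg-promise's implementation, only the standard rule for double-quoted identifiers.

```python
def quote_name(name: str) -> str:
    """Format a string as a double-quoted SQL identifier, doubling inner quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_order_by(column: str) -> str:
    """Build the query text with the column formatted as a name, not a value."""
    return f"select * from mytable order by {quote_name(column)}"


safe = build_order_by("id")
hostile = build_order_by('id"; drop table mytable; --')
print(safe)
print(hostile)
```

In the second call the whole input stays inside one quoted identifier, so it can only name a (non-existent) column and cannot end the statement.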

Postgres SQL Joins for Many To Many Relationship

Right now I am learning Postgres SQL. I have 3 tables:
1) User: userId
2) Stack: stackId
3) User_Stack: userId, stackId
Now I want to fetch all stacks belonging to one user, given the userId. I understand I need to use joins, but that's where I get stuck. I tried this:
SELECT * FROM "Stack" LEFT OUTER JOIN "User_Stack" ON ('User_Stack.stackId' = 'Stack.stackId') WHERE "userId" = '590855';
Error: The returned data is empty.
PS: Is there any GUI query builder out there? Or do you have any other tips on how to create queries systematically?
EDIT: If I change the query to this:
SELECT * FROM "Stack" INNER JOIN "User_Stack" ON (User_Stack.stackId = Stack.stackId) WHERE "userId" = '590855';
I get the following error:
Kernel error: ERROR: missing FROM-clause entry for table "user_stack"
LINE 1: SELECT * FROM "Stack" INNER JOIN "User_Stack" ON (User_Stack...

Your main error is in the join. 'something' = 'other' compares two string literals rather than reading anything from the database, so it is always false. You want to compare columns: table1.field1 = table2.field2.
Another thing is the LEFT OUTER JOIN. I'm pretty sure you want an INNER JOIN, since you only want rows that have a match in the other table.
Also, avoid double quotes around field and table names: quoted names are case sensitive, and case-sensitive names are usually not a good idea. This is also why your second query fails: the unquoted User_Stack is folded to lowercase user_stack, which does not match the table created as "User_Stack". If you do need quotes, create the names in lowercase and always use them in lowercase.
Numbers don't need to be quoted either; quoting them just makes the system convert them from text to numbers.
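Putting those points together, a corrected query could look like the sketch below. It runs against Python's built-in sqlite3 rather than Postgres, and the lowercase table names, aliases and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE stack (stackid INTEGER PRIMARY KEY);
    CREATE TABLE user_stack (userid INTEGER, stackid INTEGER);
    INSERT INTO users VALUES (590855), (1);
    INSERT INTO stack VALUES (10), (20), (30);
    INSERT INTO user_stack VALUES (590855, 10), (590855, 30), (1, 20);
    """
)

# Comparing two quoted strings never touches the tables: the result is false (0).
literal_compare = conn.execute(
    "SELECT 'User_Stack.stackId' = 'Stack.stackId'"
).fetchone()[0]

# Compare columns instead, use an inner join, and pass the id as a number.
rows = conn.execute(
    """
    SELECT s.stackid
    FROM stack AS s
    INNER JOIN user_stack AS us ON us.stackid = s.stackid
    WHERE us.userid = ?
    ORDER BY s.stackid
    """,
    (590855,),
).fetchall()
print(literal_compare, rows)
```

Only the stacks linked to user 590855 come back; the stack that belongs to the other user is filtered out by the join condition and the WHERE clause.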
