How to execute a SQL script with multiple statements from Airflow OracleOperator - python-3.x

I am trying to run a SQL file containing multiple statements separated by ; through the OracleOperator in Airflow, but it fails with the error below whenever the file holds more than one statement.
Example file contents:
CALL DROP_OBJECTS('TABLE_XYZ');
CREATE TABLE TABLE_XYZ AS SELECT 1 Dummy from DUAL;
[2019-06-18 18:19:12,582] {init.py:1580} ERROR - ORA-00933: SQL command not properly ended
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/init.py", line 1441, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/oracle_operator.py", line 63, in execute
parameters=self.parameters)
File "/usr/local/lib/python3.6/site-packages/airflow/hooks/dbapi_hook.py", line 172, in run
cur.execute(s)
cx_Oracle.DatabaseError: ORA-00933: SQL command not properly ended
Even a single statement ending with ; gives the error below:
Example file contents:
CREATE TABLE TABLE_XYZ AS SELECT 1 Dummy from DUAL;
[2019-06-18 17:47:53,137] {init.py:1580} ERROR - ORA-00922: missing or invalid option
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/init.py", line 1441, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/oracle_operator.py", line 63, in execute
parameters=self.parameters)
File "/usr/local/lib/python3.6/site-packages/airflow/hooks/dbapi_hook.py", line 172, in run
cur.execute(s)
with DAG('my_simple_dag',
         default_args=default_args,
         template_searchpath=['/root/rahul/'],
         schedule_interval='*/10 * * * *',
         ) as dag:
    opr_oracle = OracleOperator(task_id='oracleTest', oracle_conn_id='STG',
                                sql='test.sql')
Do I need to pass an additional parameter so that the DB hook knows the file must be split into separate statements?
According to the documentation, the operator expects:
param sql: the sql code to be executed. Can receive a str representing a sql statement,
a list of str (sql statements), or reference to a template file.
Template reference are recognized by str ending in '.sql'
(templated)
But the .sql template does not work with multiple statements. Any help will be greatly appreciated. Thanks!

The OracleOperator accepts a list of SQL strings, each of which is templated.
What I have done is read the SQL file as plain text and then split it on ';' to create a list of strings.
with open('/home/airflow/airflow/dags/sql/test_multi.sql') as sql_file:
    sql_list = list(filter(None, sql_file.read().split(';')))
t_run_sql = OracleOperator(task_id='run_sql',
                           sql=sql_list,
                           oracle_conn_id='user_id',
                           autocommit=True,
                           depends_on_past=True,
                           dag=dag)
I tested this with templating (and yes this will fail in Oracle without creating the table first):
drop table test_multi;
create table test_multi as
select
{{ macros.datetime.strftime(execution_date.in_tz('Australia/Sydney') + macros.timedelta(days=1),'%Y%m%d') }} as day1,
{{ macros.datetime.strftime(execution_date.in_tz('Australia/Sydney') + macros.timedelta(days=2),'%Y%m%d') }} as day2,
{{ macros.datetime.strftime(execution_date.in_tz('Australia/Sydney') + macros.timedelta(days=3),'%Y%m%d') }} as day3
from dual;
insert into test_multi
select
{{ macros.datetime.strftime(execution_date.in_tz('Australia/Sydney') + macros.timedelta(days=4),'%Y%m%d') }} as day1,
{{ macros.datetime.strftime(execution_date.in_tz('Australia/Sydney') + macros.timedelta(days=5),'%Y%m%d') }} as day2,
{{ macros.datetime.strftime(execution_date.in_tz('Australia/Sydney') + macros.timedelta(days=6),'%Y%m%d') }} as day3
from dual;
The drawback of this solution is that your SQL must not contain a semicolon anywhere else, for example inside a string literal. Splitting on ';\n' might be better, but it requires that every ';' is always followed by a newline, so it is still not ideal.
I also found that I needed to remove the empty string left after the last semicolon with filter(None, ...); otherwise the operator submits an empty command to the database, which then raises an error.
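To reduce the risk of splitting on a semicolon inside a string literal, a slightly more careful splitter can be sketched as below. This is only an illustration, not part of Airflow: the helper name split_sql_statements is hypothetical, and it assumes plain SQL statements with standard single-quoted literals. It does not understand comments, Oracle q-quoted literals, or PL/SQL blocks (BEGIN ... END;), which contain semicolons of their own.

```python
def split_sql_statements(script):
    """Split a SQL script on semicolons that lie outside single-quoted literals.

    Hypothetical helper: handles '...' literals (including doubled '' escapes),
    but not comments, q-quotes or PL/SQL blocks.
    """
    statements = []
    current = []
    in_string = False
    for ch in script:
        if ch == "'":
            # A doubled quote ('') toggles twice, so the state stays correct.
            in_string = not in_string
            current.append(ch)
        elif ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:  # skip empty or whitespace-only fragments
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:  # statement without a trailing semicolon
        statements.append(tail)
    return statements


script = (
    "CALL DROP_OBJECTS('TABLE_XYZ');\n"
    "CREATE TABLE TABLE_XYZ AS SELECT 'a;b' Dummy from DUAL;\n"
)
sql_list = split_sql_statements(script)
for statement in sql_list:
    print(statement)
```

The resulting list can be passed as the sql argument in the same way as sql_list above. Because whitespace-only fragments are dropped, a newline after the final semicolon does not produce an empty command.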

Instead of a SQL file, you can pass the SQL statement as a string.
Below is the original API documentation for the Oracle operator in Airflow. The sql argument can be a str or a list of str. If you prefer to use a file template, the template file is rendered with the parameters you provide.
Note: Airflow uses Jinja2 for template rendering.
oracle_operator API
sql (str or list[str]) – the sql code to be executed. Can receive a str representing a sql statement, a list of str (sql statements), or
reference to a template file. Template reference are recognized by str
ending in ‘.sql’ (templated)
oracle_conn_id (str) – reference to a specific Oracle database
parameters (mapping or iterable) – (optional) the parameters to render
the SQL query with.
autocommit (bool) – if True, each command is automatically committed.
(default value: False)

Related

GNU Parallel with Python Script - command line variables not working

This is the first time I am trying to do python execution in GNU parallel.
I have the below python script. I am trying to run it in parallel with a text.txt document loading the variables. The text document has the variables one on each line.
I execute the below script with this code:
parallel --bar -a PairNames.txt python3 CreateDataTablePythonScriptv2.py
Here is the python script being executed:
import sqlite3
import sys
PairName = sys.argv[1]
print(PairName)
DTBLocation = '//mnt//c//Users//Jonathan//OneDrive - Mazars in Oman//Trading//Systems//FibMatrix//Testing Trade Analysis//SQLite//Trade Analysis.db'
connection = sqlite3.connect(DTBLocation)
cursor = connection.cursor()
TableName = PairName+'_DATA'
print(TableName)
cursor.execute("""CREATE TABLE IF NOT EXISTS {}
(
Date_Time INTEGER,
Open REAL,
Max_60m_Box REAL
)""".format(TableName))
connection.commit()
connection.close()
It executes the first variable correctly. For the remaining variables, print(PairName) prints the right value, but print(TableName) displays the following:
GBPUSD
_DATAD
USDCHF
_DATAF
NZDJPY
_DATAY
It's weird to me that it prints the PairName correctly, but then the PairName does not show up when it is concatenated into the TableName.
It's also weird that an extra letter gets added to the end of _DATA each time. The extra letter appears to be the last letter of the input variable. I don't know why it is chopping off the first 5 letters or how the last one ends up after _DATA.
I printed the table name.
I watched this video at https://www.youtube.com/watch?v=OpaiGYxkSuQ&ab_channel=OleTange
I tried moving the TableName concatenation to right under the PairName assignment.
I printed the type of PairName, and it is a string.
I tried separating the variables in the txt document by tabs and commas instead of newlines.
I tried assigning "_DATA" to a variable and then concatenating the two objects, but it had the same result:
TableEnd = '_DATA'
TableName = PairName + TableEnd
If I remove the concat of PairName+'_DATA' and just use PairName only as the TableName, then it works correctly.
Sorry if this is a simple answer, but I cannot figure it out and especially since there is not too much documentation / tutorials for a newbie on GNU Parallel in this situation. Thanks for the help!
Is the input file in DOS format (i.e. does each line end in CRLF rather than just LF)? I checked this using the file command:
$ file test.txt
test.txt: ASCII text, with CRLF line terminators
$
Since it was CRLF (DOS format), I converted it using tr:
tr -d '\r' < input.file > output.file
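The odd output is consistent with a trailing carriage return: if PairName is 'GBPUSD\r', the terminal prints GBPUSD, returns the cursor to the start of the line, and then overwrites the first five characters with _DATA, leaving the final D visible. As an alternative to converting the file, the script itself could strip the whitespace. The sketch below assumes that is the cause; the function name table_name_for is hypothetical.

```python
def table_name_for(pair_name):
    """Build the table name after removing stray whitespace such as '\r'."""
    return pair_name.strip() + "_DATA"


raw = "GBPUSD\r"  # what sys.argv[1] may contain when the input file uses CRLF
print(repr(raw + "_DATA"))        # repr() makes the hidden '\r' visible
print(repr(table_name_for(raw)))
```

In the original script this would amount to reading the argument as PairName = sys.argv[1].strip().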

Field delimiter ',' found while expecting record delimiter '\n'

I am encountering an error while loading a CSV file into Snowflake. How should I handle this issue each time it occurs? A few records trigger the same error, and I don't want to delete any records from the CSV file.
Alternatively, how can I implement a try-except here? I want to either correct or skip the records in the CSV that have errors.
import snowflake.connector
tableName='F58155'
ctx = snowflake.connector.connect(
user='-',
password='-',
account='-')
cs = ctx.cursor()
ctx.cursor().execute("USE DATABASE STORE_PROFILE_LANDING")
ctx.cursor().execute("USE SCHEMA PUBLIC")
ctx.cursor().execute("PUT file:///temp/data/{tableName}/* @%{tableName}".format(tableName=tableName))
ctx.cursor().execute("truncate table {tableName}".format(tableName=tableName))
ctx.cursor().execute("COPY INTO {tableName} ".format(tableName=tableName,
FIELD_OPTIONALLY_ENCLOSED_BY
= '"', sometimes=','))
ctx.close()
Here is the image of line 178 where I am getting the error.
This is explained in the Snowflake community article linked below. The most common cause of this error is that a field value contains the delimiter.
https://community.snowflake.com/s/article/Copy-Error-Message-Field-delimiter-found-while-expecting-record-delimiter-n
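One possible way to apply this, sketched under assumptions: if the fields that contain commas are wrapped in double quotes in the CSV, the COPY statement can declare FIELD_OPTIONALLY_ENCLOSED_BY, and ON_ERROR = CONTINUE tells Snowflake to skip rows that still fail instead of aborting the load. The helper name build_copy_statement is hypothetical, and whether these options suit your data depends on how the file was produced.

```python
def build_copy_statement(table_name, on_error="CONTINUE"):
    """Build a COPY INTO statement that loads from the table's own stage.

    Hypothetical helper: assumes comma-delimited files whose fields may be
    enclosed in double quotes.
    """
    return (
        f"COPY INTO {table_name} FROM @%{table_name} "
        "FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"') "
        f"ON_ERROR = {on_error}"
    )


copy_sql = build_copy_statement("F58155")
print(copy_sql)
```

The resulting string would be passed to cursor.execute() in place of the COPY INTO call in the question. With ON_ERROR = CONTINUE the bad rows are skipped rather than corrected, so they would still need to be fixed in the source file and reloaded.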

Is there a way to save a txt file under a new name after reading/writing the file?

I am trying to run a python program to open a template multiple times and while running through a loop, save multiple copies of the txt template under distinct file names.
An example problem is included below: The example template takes the following form:
Null Null
Null
This is the test
But there is still more text.
The code I've made to do a quick edit is as follows:
longStr = (r"C:\Users\jrwaller\Documents\Automated Eve\NewTest.txt")
import fileinput
for line in fileinput.FileInput(longStr,inplace=1):
    if "This" in line:
        line=line.replace(line,line+"added\n")
    print(line, end='')
The output of the code correctly adds the new line "added" to the text file:
Null Null
Null
This is the test
added
But there is still more text.
However, I want to save this new text as a new file name, say "New Test Edited" while keeping a copy of the old txt file available for further edits.
Here is a working example for you:
longStr = (r"C:\Users\jrwaller\Documents\Automated Eve\NewTest.txt")
with open(longStr) as old_file:
    with open(r"C:\Users\jrwaller\Documents\Automated Eve\NewTestEdited.txt", "w") as new_file:
        for line in old_file:
            if "This" in line:
                line=line.replace(line,line+"added\n")
            new_file.write(line)
This is a simple file read and write operation that uses context managers to close both files when you're finished.
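Since the question mentions producing several copies in a loop, the same idea can be wrapped in a function and called once per output name. This is a minimal sketch; the function name write_edited_copy, its parameters, and the numbered output names are assumptions for illustration.

```python
from pathlib import Path


def write_edited_copy(template_path, output_path, marker="This", extra_line="added\n"):
    """Copy the template to output_path, adding extra_line after lines containing marker.

    The template file itself is only read, so it stays available for further edits.
    """
    with open(template_path) as src, open(output_path, "w") as dst:
        for line in src:
            dst.write(line)
            if marker in line:
                dst.write(extra_line)


def write_numbered_copies(template_path, out_dir, count):
    """Write count edited copies named NewTestEdited_0.txt, NewTestEdited_1.txt, ..."""
    out_dir = Path(out_dir)
    paths = []
    for i in range(count):
        target = out_dir / f"NewTestEdited_{i}.txt"
        write_edited_copy(template_path, target)
        paths.append(target)
    return paths
```

For example, write_numbered_copies(longStr, r"C:\Users\jrwaller\Documents\Automated Eve", 3) would write three edited files next to the template. Note that if the marker appears on a final line with no trailing newline, the added text is appended to that same line.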

psycopg2 programming error while creating table

In a Python2.7 script, the following gave me an error, I can't figure out why:
import psycopg2
conn = psycopg2.connect("dbname=mydb user=username password=password")
curs = conn.cursor()
curs.execute("CREATE TABLE newtable;")
The error looks like:
Traceback (most recent call last):
File "<ipython-input-17-f4ba0186c40c>", line 1, in <module>
curs.execute("CREATE TABLE newtable;")
ProgrammingError: syntax error at or near ";"
LINE 1: CREATE TABLE newtable;
Any SELECT statement works perfectly well on the other hand. For example:
curs.execute("SELECT * FROM table1 LIMIT 0;")
works like a charm.
CREATE TABLE newtable; is not the correct syntax to create a new table. You need to define some columns.
CREATE TABLE newtable (
foo INTEGER,
bar TEXT
);
See the CREATE TABLE docs for more info.

strange Oracle error: "invalid format text"

I'm trying to fetch some data from a column whose DATA_TYPE=NUMBER(1,0) with this piece of code:
import cx_Oracle
conn = cx_Oracle.connect(usr, pwd, url)
cursor = conn.cursor()
cursor.execute("SELECT DELETED FROM SERVICEORDER WHERE ORDERID='TEST'")
print(cursor.fetchone()[0])
which complains thus:
Traceback (most recent call last):
File "main.py", line 247, in <module>
check = completed()
File "main.py", line 57, in completed
deleted = cursor.fetchone()[0]
cx_Oracle.DatabaseError: OCI-22061: invalid format text [T
Replacing 'DELETED' column with one whose DATA_TYPE=VARCHAR2 does not throw such a complaint.
I am running into this problem now using cx_Oracle 5.0.4 with Unicode support. The above accepted solution did not work for me. The DELETED column in the question is a numeric column, which is what triggers this bug.
According to the mailing list ( http://comments.gmane.org/gmane.comp.python.db.cx-oracle/2390 ) it may be a bug in Oracle that shows only in cx_Oracle with Unicode support.
from the link:
"When I build cx_Oracle without Unicode support, it all works as expected.
When I build cx_Oracle with Unicode support, attempting to use a query
that returns a numeric value (such as):
con = Connection( ... )
cur = con.cursor()
cur.execute( 'SELECT 1 FROM DUAL' )
rows = cur.fetchall()
results in this exception:
cx_Oracle.DatabaseError: OCI-22061: invalid format text [T
"
What I did to work around it, is on the select statement, do:
cur.execute( 'SELECT to_char(1) FROM DUAL' )
rows = cur.fetchall()
for row in rows:
    val = int(row[0])
It's pretty ugly, but it works.
These types of errors went away when I upgraded to cx_Oracle 5.1. If the RPM doesn't install (like it happened for me on Red Hat 5.5) then you can usually rpm2cpio the file, take the cx_Oracle.so and put it into your python site-packages directory.
A work-around is putting time.sleep(1) before cursor.fetchone():
...
cursor.execute("SELECT DELETED FROM SERVICEORDER WHERE ORDERID='TEST'")
time.sleep(1)
print(cursor.fetchone()[0])
I had the same error.
Commit helped me:
conn = cx_Oracle.connect(...)
...
cursor.execute()
conn.commit()
