Python, dataframe sql join - python-3.x

I have two Python functions that query a database directly.
Is there a way to join the results of these two functions within Python?
I want to do a couple of joins, but I'm not really sure how to do that in Python.
Query 1:
def query1(businessDate):
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    #businessDate = r"'2019-03-13'"
    #remember business date should be entered like "'2019-03-13'"
    sql = f"""
    SELECT
        iddate,
        businessdate,
        stack, identifier
    FROM stackoverflow
    where stack is not null
    and businessdate = {businessDate}
    """
    df_stack = pd.read_sql(sql, con)
    con.close()
    return df_stack
Query 2:
def superuser(businessDate):
    con = pyodbc.connect(r'DSN='+'super', autocommit=True)
    print('working')
    #remember business date should be entered like "'2019-03-13'"
    sql = f"""
    SELECT
        iddate,
        businessdate,
        stack, identifier
    FROM superuser
    WHERE stack is not null
    and businessdate = {businessDate}
    """
    df_super = pd.read_sql(sql, con)
    con.close()
    return df_super
I'd want to do a left outer join of table 1 with table 2 on identifier, stack, iddate and businessdate.
Trying:
def testjoin():
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql, con)
    con.close()
    return df_test
Trying 2:
def testjoin():
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql, con)
    con.close()
    return df_test
I'm getting the error: name 'sql' is not defined.

Left Outer Join
SELECT *
FROM df_stack
LEFT OUTER JOIN df_super
    ON df_stack.stack = df_super.stack
    AND df_stack.identifier = df_super.identifier
    AND df_stack.iddate = df_super.iddate
    AND df_stack.businessdate = df_super.businessdate;
pd.merge(df_stack, df_super,
         on=['iddate', 'businessdate', 'stack', 'identifier'],
         how='left')
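(For reference, a minimal sketch of the merge-only approach, reusing the two query functions above; no extra connection or SQL string is needed for the join itself:)
def testjoin():
    # run both queries, then do the left outer join in pandas
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    df_test = pd.merge(df_stack, df_super,
                       on=['iddate', 'businessdate', 'stack', 'identifier'],
                       how='left')
    return df_test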

OK, I'm going to post this as an answer instead of a comment, as there are several ways to do what you are asking. sql is not defined, as I noted in the comment, because it is outside the scope of the function testjoin().
One way is to treat the SQL string as a global variable and then access it inside the function.
sql = '''
SELECT
    iddate,
    businessdate,
    stack, identifier
FROM stackoverflow
where stack is not null
and businessdate = {businessDate}
'''

def testjoin():
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    # assign the merge result if you intend to use it; an unassigned merge is discarded
    df_merged = pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql.format(businessDate="'2019-03-14'"), con)
    con.close()
    return df_test
The reason I used .format() instead of an f-string is that an f-string requires the variable to exist at the time the string is created; if businessDate were not yet defined, it would be an error. .format() lets you put the placeholder in the string and fill in its value whenever you want. I would do it this way if the main part of your query isn't going to change much and you just need to filter by date.
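A tiny illustration of the difference (a sketch; template is just a throwaway name):
template = "and businessdate = {businessDate}"        # no businessDate needed yet
print(template.format(businessDate="'2019-03-14'"))   # value supplied later
# an f-string here would raise a NameError if businessDate did not exist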
The other way would be to build the string outside the function and then pass it in as a parameter:
businessDate = "'2019-03-13'"
sql = f'''
SELECT
    iddate,
    businessdate,
    stack, identifier
FROM stackoverflow
where stack is not null
and businessdate = {businessDate}
'''

def testjoin(sql_string):
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    df_merged = pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql_string, con)
    con.close()
    return df_test

test_df = testjoin(sql)
You could also continue building the string inside each function as well, but given the Don't Repeat Yourself (DRY) principle, and the fact that you are already building it in two other functions, it would be best to avoid that; see the sketch below.
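For illustration, a single parameterized helper could replace both query functions (a sketch; the DSNs and table names come from the question):
def fetch_table(dsn, table, businessDate):
    # one helper instead of query1/superuser; DSN and table vary per call
    con = pyodbc.connect(r'DSN=' + dsn, autocommit=True)
    sql = f"""
    SELECT iddate, businessdate, stack, identifier
    FROM {table}
    WHERE stack is not null
    and businessdate = {businessDate}
    """
    df = pd.read_sql(sql, con)
    con.close()
    return df

df_stack = fetch_table('Stack', 'stackoverflow', "'2019-03-13'")
df_super = fetch_table('super', 'superuser', "'2019-03-13'")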

Related

Multiple WHERE conditions in Pandas read_sql

I've got my data in an SQLite3 database, and now I'm trying to write a little script to access the data I want for given dates. I got the SELECT statement to work with date ranges, but I can't seem to add another condition to fine-tune the search.
DB columns: id, date, driverid, drivername, pickupStop, pickupPkg, delStop, delPkg
What I've got so far:
import pandas as pd
import sqlite3

sql_data = 'driverperformance.sqlite'
conn = sqlite3.connect(sql_data)
cur = conn.cursor()
date_start = "2021-12-04"
date_end = "2021-12-10"
df = pd.read_sql_query("SELECT DISTINCT drivername FROM DriverPerf WHERE date BETWEEN :dstart AND :dend", params={"dstart": date_start, "dend": date_end}, con=conn)
drivers = df.values.tolist()
for d in drivers:
    driverDF = pd.read_sql_query("SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart AND :dend", params={"driver": d, "dstart": date_start, "dend": date_end}, con=conn)
I've tried a few different versions of the "WHERE drivername" part but it always seems to fail.
Thanks!
If I'm not mistaken, drivers will be a list of lists. Have you tried
params={"driver": d[0], ...}
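In other words, the loop would become (a sketch of the same query with the one-element fix applied):
for d in drivers:
    driverDF = pd.read_sql_query(
        "SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart AND :dend",
        params={"driver": d[0],  # d is a one-element list, so bind its first item
                "dstart": date_start, "dend": date_end},
        con=conn)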

How to send a Python string parameter into an SQL query on DB2

I want to write a function which receives a string parameter that should be used inside an SQL statement for a DB2 database. Then I need to fetch the result row by row and do something with each row in the loop:
import ibm_db

conn_str = 'database=xxx;hostname=x.x.x.x;port=xxxx;protocol=TCPIP;UID=xxxxx;pwd=secret;'
ibm_db_conn = ibm_db.connect(conn_str, '', '')

def question_to_db(tel: string):
    sql = "SELECT * from mper where mper.tel = ?"
    sql2 = ibm_db.prepare(ibm_db_conn, sql)
    ibm_db.bind_param(sql2, 1, tel, ibm_db.SQL_PARAM_INPUT, ibm_db.SQLCHAR)
    stmt = ibm_db.exec_immediate(ibm_db_conn, sql2)
    row = ibm_db.fetch_both(stmt)
    while row != False:
        # do_smth_with_row ...
        row = ibm_db.fetch_both(stmt)
    return True
When I run the program I receive an error:
stmt = ibm_db.exec_immediate(ibm_db_conn, sql2)
Exception: statement must be a string or unicode
I'm looking for any solution to my problem. I can't find any examples with strings and fetching rows :(
Can anyone help me? Thanks in advance.
Well, the Db2 Python API documentation has an example. The problem is that you are mixing different functions. You either use
exec_immediate: a string is passed in and executed once as an SQL statement, or
prepare and execute: you first prepare a string as an SQL statement; once prepared, you can execute that statement once or many times.
Something like this should work:
sql_stmt = "SELECT * from mper where mper.tel = ?"
stmt = ibm_db.prepare(ibm_db_conn, sql_stmt)
ibm_db.bind_param(stmt, 1, tel)
try:
    ibm_db.execute(stmt)
except:
    print(ibm_db.stmt_errormsg())
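After a successful execute, rows can be fetched from the same prepared statement handle, following the pattern already used in the question (a sketch):
row = ibm_db.fetch_both(stmt)
while row != False:
    # do something with the row here
    row = ibm_db.fetch_both(stmt)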

Python: replace dynamic named parameters in a string (for SQL)

I have an SQL query as below:
sql= '''
select name from table1 where asof between '$varA' and '$varB'
union
select name from table2 where asof between '$varC' and '$varD'
'''
This SQL contains dynamic variables.
Template.substitute can replace the variables with their values, but in my situation the variable names themselves are dynamic; that is to say, I don't know in advance whether they are $varA, $varB, ...
Is there a way I can do dynamic substitution?
Thanks
I got it!
from string import Template

def parseSQL(sql, dict):  # note: 'dict' shadows the built-in name
    template = Template(sql)
    # print(dict)
    try:
        sql = template.substitute(**dict)
    except KeyError:
        print('Incomplete substitution resulted in KeyError!')
    finally:
        return sql
Usage:
dict = {'startDate': '2021-01-01', 'endDate': '2021-01-31'}
sql = parseSQL(sql, dict)
Just use a different dict to parse the SQL.
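As a side note (not part of the original answer): if some placeholders may legitimately be missing, string.Template.safe_substitute leaves unknown $names in place instead of raising KeyError:
from string import Template

sql = "select name from table1 where asof between '$varA' and '$varB'"
print(Template(sql).safe_substitute({'varA': '2021-01-01'}))
# -> select name from table1 where asof between '2021-01-01' and '$varB'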

Error while getting user input and using Pandas DataFrame to extract data from LEFT JOIN

I am trying to create an Sqlite3 statement in Python 3 to collect data from two tables called FreightCargo & Train, where a train ID is the input value. I want to use Pandas since it's easy to read the tables.
I have created the code below, which works perfectly fine, but it is static and only looks up the one ID hard-coded in the statement.
import pandas as pd

SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = 2;'''
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
I want to be able to drive this with user input: the user is asked for a Train_ID (a primary key in the table) and the rows related to that train are listed.
I expanded the code, but I am getting an error: ValueError: operation parameter must be str
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;''', (Train_ID)
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
The problem lies in your definition of the SQL variable.
You are creating a tuple/collection of two elements. If you print type(SQL) you will see something like ('''SELECT...?;''', 'your user input'). Note also that (Train_ID) is just a parenthesized string, not a tuple; a one-element tuple needs a trailing comma, (Train_ID,).
When you pass this to cursor.execute(sql[, parameters]), it expects a string as the first argument, with "optional" parameters. Your parameters are not really optional here, since the query contains a ? placeholder; parameters must be passed as a collection, for example a tuple.
You can unpack your SQL tuple with cursor.execute(*SQL), which passes each element as a separate argument, or you can move the parameters into the execute call:
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;'''
cursor = conn.cursor()
cursor.execute( SQL, (Train_ID,) )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
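As a side note, pandas can replace the cursor/DataFrame boilerplate with one call (a sketch, assuming the same conn):
Temp = pd.read_sql_query(SQL, conn, params=(Train_ID,))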

How to read and insert bytea columns using psycopg2?

I am working on a Python script to replicate some PostgreSQL tables from one environment to another (it does a little more than pg_dump). It works except when I am copying a table that has a bytea data type.
I read the source table data into memory, then I dump the memory into the target database with concatenated inserts.
Here is my method that produces an insert statement:
def generateInsert(self, argCachedRow):
    colOrd = 0
    valClauseList = []
    hasBinary = False
    for colData in argCachedRow:
        colOrd += 1
        colName = self.colOrdLookup.get(colOrd)
        col = self.colLookup.get(colName)
        dataType = col.dataType
        insVal = None
        if colData is not None:
            strVal = str(colData)
            if dataType.useQuote:
                if "'" in strVal:
                    strVal = strVal.replace("'", "''")
                insVal = "'%s'" % strVal
            else:
                if dataType.binary:
                    hasBinary = True
                    #insVal = psycopg2.Binary(colData)
                #else:
                insVal = strVal
        else:
            insVal = "NULL"
        valClauseList.append(insVal)
    valClause = ", ".join(valClauseList)
    if hasBinary:
        valClause = psycopg2.Binary(valClause)
    result = "INSERT INTO %s VALUES (%s)" % (self.name, valClause)
    return result
This works with every table that doesn't have binary data.
I also tried (intuitively) wrapping just the binary column data in psycopg2.Binary (the commented-out line) and then not applying it to the whole row value list, but that didn't work either.
Here is my simple DataType wrapper, which is loaded by reading Postgres' information_schema tables:
class DataType(object):
    def __init__(self, argDispName, argSqlName, argUseQuote, argBin):
        self.dispName = argDispName
        self.sqlName = argSqlName
        self.useQuote = argUseQuote
        self.binary = argBin
How do I read and insert bytea columns using psycopg2?
If you have this database structure:
CREATE TABLE test (a bytea,
b int,
c text)
then inserting binary data can be done like so, without any wrappers:
bin_data = b'bytes object'
db = psycopg2.connect(*args) # DB-API 2.0
c = db.cursor()
c.execute('''INSERT INTO test VALUES (%s, %s, %s)''', (bin_data, 1337, 'foo'))
c.execute('''UPDATE test SET a = %s''', (bin_data + b'1',))
Then, when you query it:
c.execute('''SELECT a FROM test''')
You'll receive a memoryview, which is easily converted back to bytes:
mview = c.fetchone()
new_bin_data = bytes(mview)
print(new_bin_data)
Output: b'bytes object1'
Also, I'd suggest that you not assemble queries by string formatting. psycopg2's built-in parameter substitution is much more convenient, and you don't have to worry about escaping data to protect against SQL injection.
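Applied to the replication script in the question, that means generating placeholders instead of hand-escaped literals and passing the raw row to execute (a sketch; table_name, row and target_cur are hypothetical names, not from the original code):
# one %s placeholder per column; psycopg2 adapts bytes/memoryview values to bytea itself
placeholders = ", ".join(["%s"] * len(row))
insert_sql = "INSERT INTO %s VALUES (%s)" % (table_name, placeholders)
target_cur.execute(insert_sql, row)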
