How to read and insert bytea columns using psycopg2? - python-3.x

I am working on a Python script to replicate some Postgresql tables from one environment to another (which does a little more than pg_dump). It works except when I am copying a table that has bytea data type.
I read the source table data in memory, then I dump the memory in the target database with concatenated inserts.
Here is my method that produces an insert statement:
def generateInsert(self, argCachedRow):
colOrd = 0;
valClauseList = []
hasBinary = False
for colData in argCachedRow:
colOrd += 1
colName = self.colOrdLookup.get(colOrd)
col = self.colLookup.get(colName)
dataType = col.dataType
insVal = None
if colData is not None:
strVal = str(colData)
if dataType.useQuote:
if "'" in strVal:
strVal = strVal.replace("'", "''")
insVal = "'%s'" % strVal
else:
if dataType.binary:
hasBinary = True
#insVal = psycopg2.Binary(colData)
#else:
insVal = strVal
else:
insVal = "NULL"
valClauseList.append(insVal)
valClause = ", ".join(valClauseList)
if hasBinary:
valClause = psycopg2.Binary(valClause)
result = "INSERT INTO %s VALUES (%s)" % (self.name, valClause)
return result
which works with every table that doesn't have binary data.
I also tried (intuitively) to wrap just the binary column data in psycopg2.Binary, which is the commented out line and then not do it to the whole row value list but that didn't work either.
Here is my simple DataType wrapper, which is loaded by reading Postgres' information_schema tables:
class DataType(object):
def __init__(self, argDispName, argSqlName, argUseQuote, argBin):
self.dispName = argDispName
self.sqlName = argSqlName
self.useQuote = argUseQuote
self.binary = argBin
How do I read and insert bytea columns using psycopg2?

If you have this database structure:
CREATE TABLE test (a bytea,
b int,
c text)
then inserting binary data into the request can be done like so, without any wrappers:
bin_data = b'bytes object'
db = psycopg2.connect(*args) # DB-API 2.0
c = db.cursor()
c.execute('''INSERT INTO test VALUES (%s, %s, %s)''', (bin_data, 1337, 'foo'))
c.execute('''UPDATE test SET a = %s''', (bin_data + b'1',))
Then, when you query it:
c.execute('''SELECT a FROM test''')
You'll receive a memoryview, which is easily converted back to bytes:
mview = c.fetchone()
new_bin_data = bytes(mview)
print(new_bin_data)
Output: b'bytes object1'
Also, I'd suggest you not to assemble queries by string formatting. psycopg2's built-in parameter substitution is much more convenient and you don't have to worry about validating data to protect from SQL injections.

Related

How to insert/update postgresql table using WHERE clause?

I'm using pscopg2 module to write into my database (postgresql). I'm trying to insert or update a column/row value based on an equivalence statement such as:
INSERT x INTO TABLE A WHERE variable_x MATCHES/EQUAL TO value y IN SAME TABLE;
Code:
def update_existing_record(dev_eui, device_serial_num):
cur = con.cursor()
con.autocommit = True
sql_command = " IF EXISTS (dev_eui == %s) SET severn_db.device_serial_num VALUES (%s)"
#sql_command = "INSERT INTO severn_db (device_serial_num) WHERE EXISTS (dev_eui == dev_eui) VALUES (%s) "
sql_values = (dev_eui, device_serial_num)
cur.execute(sql_command, sql_values,)
cur.close()
All I am trying to do is insert or update table column that matches the condition x == y (where x would be a parameter passed in as a local variable to the python function).
Any help?

Argument must be str not tuple

i need to load only the data from database by todays date.
date column in database is in TEXT...
''code to load all the data from database''
def load_database(self):
today_date = current_date[0:11]
while self.workingpatient_table.rowCount() > 0:
self.workingpatient_table.removeRow(0)
conn = sqlite3.connect(r'mylab.db')
content = ("SELECT * FROM daily_patients where date=?",(today_date))
result = conn.execute(content)
for row_index,row_data in enumerate(result):
self.workingpatient_table.insertRow(row_index)
for column_index,column_data in enumerate(row_data):
self.workingpatient_table.setItem(row_index,column_index,QTableWidgetItem(str(column_data)))
conn.close()
''when i run the program i get following error ''
result = conn.execute(content)
TypeError: argument 1 must be str, not tuple
any possible solution?
Change your line from
content = ("SELECT * FROM daily_patients where date=?",(today_date))
result = conn.execute(content)
to
content = ("SELECT * FROM daily_patients where date=?",(today_date, ))
result = conn.execute(*content)

Error while getting user input and using Pandas DataFrame to extract data from LEFT JOIN

I am trying to create Sqlite3 statement in Python 3 to collect data from two tables called FreightCargo & Train where a train ID is the input value. I want to use Pandas since its easy to read the tables.
I have created the code below which is working perfectly fine, but its static and looks for only one given line in the statement.
import pandas as pd
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = 2;'''
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp'''
I want to be able to create a variable with an input. The outcome of this action will then be determined with what has been given from the user. For example the user is asked for a train_id which is a primary key in a table and the relations with the train will be listed.
I expanded the code, but I am getting an error: ValueError: operation parameter must be str
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;''', (Train_ID)
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
The problem lays in your definition of the SQL variable.
You are creating a tuple/collection of two elements. If you print type(SQL) you will see something like this: ('''SELECT...?;''', ('your_user's_input')).
When you pass this to cursor.execute(sql[, parameters]), it is expecting a string as the first argument, with the "optional" parameters. Your parameters are not really optional, since they are defined by your SQL-query's [Train]. Parameters must be a collection, for example a tuple.
You can unwrap your SQL statement with cursor.execute(*SQL), which will pass each element of your SQL list as a different argument, or you can move the parameters to the execute function.
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;'''
cursor = conn.cursor()
cursor.execute( SQL, (Train_ID,) )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp

update statement using loop over tuple of query and data fails in psycopg2

I have created a mini functional pipeline which creates an update statement with regex and then passes the statement and the data to pycopg2 to execute.
If I copy paste the statement outside of the loop it works, if I try to loop over all statements I get an error.
# Function to create statement
def psycopg2_regex_replace_chars(table, col, regex_chars_old, char_new):
query = "UPDATE {} SET {} = regexp_replace({}, %s , %s, 'g');".format(table, col, col)
data = (regex_chars_old, char_new)
return (query, data)
# Create functions with intelligible names
replace_separators_with_space = partial(psycopg2_regex_replace_chars,regex_chars_old='[.,/[-]]',char_new=' ')
replace_amper_with_and = partial(psycopg2_regex_replace_chars, regex_chars_old='&', char_new='and')
# create funcs_list
funcs_edit = [replace_separators_with_space,
replace_amper_with_and]
So far, so good.
This works
stmt = "UPDATE persons SET name = regexp_replace(name, %s , %s, 'g');"
data = ('[^a-zA-z0-9]', ' ')
cur.execute(stmt, data)
conn.commit()
This fails
tables = ["persons"]
cols = ["name", "dob"]
for table in tables:
for col in cols:
for func in funcs_edit:
query, data = func(table=table, col=col)
cur.execute(query, data)
conn.commit()
error
<ipython-input-92-c8ba5d469f88> in <module>
6 for func in funcs_edit:
7 query, data = func(table=table, col=col)
----> 8 cur.execute(query, data)
9 conn.commit()
ProgrammingError: function regexp_replace(date, unknown, unknown, unknown) does not exist
LINE 1: UPDATE persons SET dob = regexp_replace(dob, '[.,/[-]]' , ' ...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.```

Python, dataframe sql join

I have two python functions that query a database directly.
Is there a way to join these 2 functions within python?
I want to do a couple of joins not really sure how to do that in python.
Query 1:
def query1(businessDate):
con = pyodbc.connect(r'DSN='+'Stack',autocommit=True)
print('working')
#businessDate = r"'2019-03-13'"
#remember business date should be entered like "'2019-03-13'"
sql = f"""
SELECT
iddate,
businessdate,
stack, identifier
FROM stackoverflow
where stack is not null
and businessdate = {businessDate}
"""
df_stack = pd.read_sql(sql,con)
con.close()
return(df_stack)
query 2:
def superuser(businessDate):
con = pyodbc.connect(r'DSN='+'super',autocommit=True)
print('working')
#remember business date should be entered like "'2019-03-13'"
sql = f"""
SELECT
iddate,
businessdate,
stack, identifier
FROM superuser
WHERE stack is not null
and businessdate = {businessDate}
"""
df_super = pd.read_sql(sql,con)
con.close()
return(df_super)
I'd want to do a left outer join table 1 with table 2 on identifier, stack, iddate and businessdate
Trying:
def testjoin():
con = pyodbc.connect(r'DSN='+'Stack',autocommit=True)
print('working')
pd.merge(df_stack,df_super, on = ['identifier','stack','iddate'])
df_test = pd.read_sql(sql,con)
con.close()
return(df_test)
trying 2:
def testjoin():
con = pyodbc.connect(r'DSN='+'Stack',autocommit=True)
print('working')
df_stack= query1("'2019-03-13'")
df_super= superuser("'2019-03-13'")
pd.merge(df_stack,df_super, on = ['identifier','stack','iddate'])
df_test = pd.read_sql(sql,con)
con.close()
return(df_test)
getting error name 'sql' is not defined'
Left Outer Join
SELECT *
FROM df_stack
LEFT OUTER JOIN df_super
ON df_stack.stack = df_super.stack
ON df_stack.identifier= df_super.identifier
ON df_stack.iddate = df_super.iddate
ON df_stack.businessdate = df_super.businessdate;
pd.merge(df_stack,df_super,
on=['iddate','businessdate', 'stack', 'identifier'],
how='left')
Ok I'm going to post this as an answer instead of the comments as there are several ways which to do what you are asking. sql is not defined as I noted in the comment because it is outside the scope of the function testjoin().
One way to is to treat the SQL string as a global variable and then access it inside the function.
sql = '''
SELECT
iddate,
businessdate,
stack, identifier
FROM stackoverflow
where stack is not null
and businessdate = {businessDate}
'''
def testjoin():
con = pyodbc.connect(r'DSN='+'Stack',autocommit=True)
print('working')
df_stack= query1("'2019-03-13'")
df_super= superuser("'2019-03-13'")
pd.merge(df_stack,df_super, on = ['identifier','stack','iddate'])
df_test = pd.read_sql(sql.format(businessDate="'2019-03-14'"),con)
con.close()
return(df_test)
The reason I used .format() instead of an f string is that an f string requires the variable to be declared at the time the f string is created. Which if you did not have businessdate as a variable it would be an error. .format() allows you to place the variable in the string and then change it's value whenever you want. I would do it this way if the main part of your query isn't going to change that much and you just need to filter by date.
The other way would to build the string outside the function and then pass it in as a parameter
businessdate = "'2019-03-13'"
sql = f'''
SELECT
iddate,
businessdate,
stack, identifier
FROM stackoverflow
where stack is not null
and businessdate = {businessDate}
'''
def testjoin(sql_string):
con = pyodbc.connect(r'DSN='+'Stack',autocommit=True)
print('working')
df_stack= query1("'2019-03-13'")
df_super= superuser("'2019-03-13'")
pd.merge(df_stack,df_super, on = ['identifier','stack','iddate'])
df_test = pd.read_sql(sql_string,con)
con.close()
return(df_test)
test_df = testjoin(sql)
You could also continue building the string inside each function as well but given the Don't Repeat Yourself paradigm in coding, and the fact you are already building it in two other functions it would be best to avoid that.

Resources