Insert list of integers into postgres table with psycopg2 - python-3.x

Given a list of integers, I would like to insert every integer into a new row in a Postgres table, ideally in a very efficient way (i.e. not looping through and inserting one by one): arr = [1,2,3,4,5]. What I've tried is converting this to a list of tuples, arr2 = [(i,) for i in arr], and then feeding this to Postgres with cur.execute("INSERT INTO my_table (my_value) VALUES (%s)", arr2), but I am receiving an error: Not all arguments converted during string formatting. What exactly am I doing wrong here?
Full code
import psycopg2
conn = psycopg2.connect(host="myhost", database="mydb", user="postgres", password="password", port="5432")
cur = conn.cursor()
arr = [1,2,3,4,5]
arr2 = [(i,) for i in arr]
cur.execute("INSERT INTO my_table (my_value) VALUES (%s)", arr2

I am not yet familiar with psycopg2 (working on it, but a ways to go), so I'll give the pure SQL version. Postgres has a pretty good set of built-in array functions, one being UNNEST(). That function takes an array as a parameter and returns the individual entries, so you just need to provide an array to the query (see demo).
insert into my_table(my_column)
select unnest( array [1,2,3,4,5] );
Borrowing (i.e. copying) your code, perhaps:
import psycopg2
conn = psycopg2.connect(host="myhost", database="mydb", user="postgres", password="password", port="5432")
cur = conn.cursor()
arr = [1,2,3,4,5]
cur.execute("insert into my_table (my_column) select unnest (array [%s])", arr
But I am not sure if that gets the Postgres Array structure; it neede the [].
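One way to check that the array structure comes through (a sketch, reusing the cursor and list from above) is cur.mogrify(), which returns the exact query psycopg2 would send without executing it:
# mogrify renders the query with the parameters bound, so the adaptation is visible
print(cur.mogrify("insert into my_table (my_column) select unnest(%s)", (arr,)))
# b'insert into my_table (my_column) select unnest(ARRAY[1,2,3,4,5])'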

What exactly am I doing wrong here?
You are trying to insert a list of integers into a single row.
Instead, use execute_values() to insert many rows in a single query. Do not forget to commit the insert:
from psycopg2.extras import execute_values

#...
cur = conn.cursor()
arr = [1,2,3,4,5]
arr2 = [(i,) for i in arr]
execute_values(cur, "INSERT INTO my_table (my_value) VALUES %s", arr2)
conn.commit() # important!
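For longer lists, execute_values batches the rows for you: the page_size argument (default 100) controls how many rows are grouped into each generated statement. A rough sketch, reusing the cursor and table from above:
# rows are grouped into pages of page_size values per INSERT statement
big = [(i,) for i in range(100000)]
execute_values(cur, "INSERT INTO my_table (my_value) VALUES %s", big, page_size=1000)
conn.commit()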

Related

Multiple WHERE conditions in Pandas read_sql

I've got my data put into an SQLite3 database, and now I'm trying to work on a little script to access the data I want for given dates. I got the SELECT statement to work with the date ranges, but I can't seem to add another condition to fine-tune the search.
db columns id, date, driverid, drivername, pickupStop, pickupPkg, delStop, delPkg
What I've got so far:
import pandas as pd
import sqlite3
sql_data = 'driverperformance.sqlite'
conn = sqlite3.connect(sql_data)
cur = conn.cursor()
date_start = "2021-12-04"
date_end = "2021-12-10"
df = pd.read_sql_query("SELECT DISTINCT drivername FROM DriverPerf WHERE date BETWEEN :dstart and :dend", params={"dstart": date_start, "dend": date_end}, con=conn)
drivers = df.values.tolist()
for d in drivers:
    driverDF = pd.read_sql_query("SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart and :dend", params={"driver": d, "dstart": date_start, "dend": date_end}, con=conn)
I've tried a few different versions of the "WHERE drivername" part but it always seems to fail.
Thanks!
If I'm not mistaken, drivers will be a list of lists. Have you tried
.... params={"driver": d[0] ....

Error while getting user input and using Pandas DataFrame to extract data from LEFT JOIN

I am trying to create an SQLite3 statement in Python 3 to collect data from two tables called FreightCargo & Train, where a train ID is the input value. I want to use Pandas since it's easy to read the tables.
I have created the code below, which works perfectly fine, but it's static and only looks for the one hard-coded ID in the statement.
import pandas as pd
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = 2;'''
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
I want to be able to create a variable from user input. The outcome will then be determined by what the user has entered. For example, the user is asked for a Train_ID, which is a primary key in a table, and the relations with that train will be listed.
I expanded the code, but I am getting an error: ValueError: operation parameter must be str
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;''', (Train_ID)
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
The problem lies in your definition of the SQL variable.
You are creating a tuple of two elements: the query string and your user's input. Note that (Train_ID) is not a tuple, just a parenthesized value, so SQL ends up as ('''SELECT...?;''', "your user's input"), and type(SQL) is tuple.
When you pass this to cursor.execute(sql[, parameters]), it is expecting a string as the first argument, with the "optional" parameters. Your parameters are not really optional here, since your query contains a ? placeholder. Parameters must be a collection, for example a tuple.
You could unwrap your SQL tuple with cursor.execute(*SQL), which passes each element as a separate argument, but it is cleaner to keep SQL as a plain string and move the parameters to the execute call:
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;'''
cursor = conn.cursor()
cursor.execute( SQL, (Train_ID,) )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
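One more detail: input() returns a string, and the query only matches because SQLite's type affinity converts the text when the column is INTEGER. Casting explicitly is safer (a sketch, assuming Train_ID is numeric):
Train_ID = int(input('Train ID'))  # cast so the parameter matches the INTEGER column
cursor.execute(SQL, (Train_ID,))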

An issue with inserting blob data into SQL tables

I'm trying to create a piece of code that inserts an object I've created (to store data in a very specific way) into an SQL table as a blob type, and it keeps giving me an error: sqlite3.InterfaceError: Error binding parameter 1 - probably unsupported type.
Has any of you encountered something similar before? Do you have any ideas how to deal with it?
conn = sqlite3.connect('my_database.db')
c = conn.cursor()
params = (self.question_id, i) #i is the object in question
c.execute('''
INSERT INTO '''+self.current_test_name+''' VALUES (?, ?)
''',params)
conn.commit()
conn.close()
For starters, this would be a more appropriate execute statement as it is way cleaner:
c.execute("INSERT INTO "+self.current_test_name+" VALUES (?, ?)", (self.question_id, i))
You are also missing the table you are inserting into (or the columns, if self.current_test_name is the table name).
Also, is the column in the database set up to handle the data type of the provided input for self.question_id and i? (Not expecting TEXT when you provided INT?)
Example of a working script to insert into a table that has 2 columns named test and test2:
import sqlite3
conn = sqlite3.connect('my_database.db')
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS test(test INT, test2 INT)")
conn.commit()
for i in range(10):
    params = (i, i) # i is the object in question
    c.execute("INSERT INTO test (test, test2) VALUES (?, ?)", params)
    conn.commit()
conn.close()
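That said, the InterfaceError in the question comes from binding a custom object directly: sqlite3 only accepts types it can map (int, float, str, bytes, None). A common fix, sketched here with a stand-in object rather than the asker's actual class, is to serialize the object to bytes first, e.g. with pickle:
import pickle
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS test_blobs (question_id INT, payload BLOB)")

obj = {"answer": 42}  # stand-in for the custom object in the question
# pickle.dumps turns the object into bytes, which sqlite3 can bind as a BLOB
c.execute("INSERT INTO test_blobs (question_id, payload) VALUES (?, ?)",
          (1, pickle.dumps(obj)))
conn.commit()

# reading it back: pickle.loads restores the original object
payload = c.execute("SELECT payload FROM test_blobs").fetchone()[0]
print(pickle.loads(payload))  # {'answer': 42}
conn.close()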

Python, dataframe sql join

I have two python functions that query a database directly.
Is there a way to join these 2 functions within python?
I want to do a couple of joins; I'm not really sure how to do that in Python.
Query 1:
def query1(businessDate):
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    #businessDate = r"'2019-03-13'"
    #remember business date should be entered like "'2019-03-13'"
    sql = f"""
    SELECT
        iddate,
        businessdate,
        stack, identifier
    FROM stackoverflow
    where stack is not null
    and businessdate = {businessDate}
    """
    df_stack = pd.read_sql(sql, con)
    con.close()
    return df_stack
Query 2:
def superuser(businessDate):
    con = pyodbc.connect(r'DSN='+'super', autocommit=True)
    print('working')
    #remember business date should be entered like "'2019-03-13'"
    sql = f"""
    SELECT
        iddate,
        businessdate,
        stack, identifier
    FROM superuser
    WHERE stack is not null
    and businessdate = {businessDate}
    """
    df_super = pd.read_sql(sql, con)
    con.close()
    return df_super
I want to do a left outer join of table 1 with table 2 on identifier, stack, iddate, and businessdate.
Trying:
def testjoin():
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql, con)
    con.close()
    return df_test
Trying 2:
def testjoin():
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql, con)
    con.close()
    return df_test
I'm getting the error: name 'sql' is not defined.
Left Outer Join
SELECT *
FROM df_stack
LEFT OUTER JOIN df_super
    ON df_stack.stack = df_super.stack
    AND df_stack.identifier = df_super.identifier
    AND df_stack.iddate = df_super.iddate
    AND df_stack.businessdate = df_super.businessdate;
pd.merge(df_stack, df_super,
         on=['iddate', 'businessdate', 'stack', 'identifier'],
         how='left')
OK, I'm going to post this as an answer instead of in the comments, as there are several ways to do what you are asking. sql is not defined, as I noted in the comment, because it is outside the scope of the function testjoin().
One way is to treat the SQL string as a global variable and then access it inside the function.
sql = '''
SELECT
iddate,
businessdate,
stack, identifier
FROM stackoverflow
where stack is not null
and businessdate = {businessDate}
'''
def testjoin():
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql.format(businessDate="'2019-03-14'"), con)
    con.close()
    return df_test
The reason I used .format() instead of an f-string is that an f-string requires the variable to be defined at the time the f-string is created; if you did not yet have businessDate as a variable, it would be an error. .format() allows you to place the placeholder in the string and then supply its value whenever you want. I would do it this way if the main part of your query isn't going to change much and you just need to filter by date.
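A tiny illustration of the difference (hypothetical table name t):
template = "select * from t where businessdate = {businessDate}"  # plain string, no value needed yet
print(template.format(businessDate="'2019-03-14'"))  # value supplied later

businessDate = "'2019-03-14'"  # an f-string needs the variable to exist right here
print(f"select * from t where businessdate = {businessDate}")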
The other way would be to build the string outside the function and then pass it in as a parameter:
businessdate = "'2019-03-13'"
sql = f'''
SELECT
iddate,
businessdate,
stack, identifier
FROM stackoverflow
where stack is not null
and businessdate = {businessDate}
'''
def testjoin(sql_string):
    con = pyodbc.connect(r'DSN='+'Stack', autocommit=True)
    print('working')
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    pd.merge(df_stack, df_super, on=['identifier', 'stack', 'iddate'])
    df_test = pd.read_sql(sql_string, con)
    con.close()
    return df_test
test_df = testjoin(sql)
You could also continue building the string inside each function, but given the Don't Repeat Yourself principle, and the fact that you are already building it in two other functions, it would be best to avoid that.
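Pulling this together, a minimal sketch of the join itself (assuming query1 and superuser return the frames shown above, and with a hypothetical name to avoid clashing with testjoin); note that the merge result has to be assigned to be used:
def join_frames():
    # fetch both frames, then do the left outer join in pandas rather than in SQL
    df_stack = query1("'2019-03-13'")
    df_super = superuser("'2019-03-13'")
    df_joined = pd.merge(df_stack, df_super,
                         on=['iddate', 'businessdate', 'stack', 'identifier'],
                         how='left')
    return df_joined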

Python SQLite3: Select from table by multiple keys and list of key values, using parametrization if possible

If I query an SQLite table using a single key, I can use the following code for parametrization:
contact_phones_list = ['+123456789', '+987654321']
q = "select * from {} WHERE user_phone in ({})".format(
my_table_name,
', '.join('?' for _ in contact_phones_list)
)
res = self.cursor.execute(q, contact_phones_list).fetchall()
Now I want to query for key pairs for which I have values:
keys = ['user_phone', 'contact_phone']
values = [('+1234567', '+1000000'), ('+987654321', '+1200000')]
q = "select contact_phone, is_main, aliases from {} WHERE ({}) in ({})".format(
my_table_name,
', '.join(keys),
', '.join('(?, ?)' for _ in values)
)
res = self.cursor.execute(q, values).fetchall()
I'm getting error "row value misused". I tried many combinations with sublist instead of tuple, single "?", etc.
How can I create parametrization in this case?
EDIT: adding "VALUES" keyword and flattening list works:
keys = ['user_phone', 'contact_phone']
values = [('+1234567', '+1000000'), ('+987654321', '+1200000')]
values_q = []
for v in values:
    values_q += [v[0], v[1]]
q = "select * from my_table_name WHERE ({}) IN (VALUES {})".format(
', '.join(keys),
', '.join('(?, ?)' for _ in values)
)
res = cursor.execute(q, values_q).fetchall()
Is this a workaround, or the only acceptable solution?
From the documentation:
For a row-value IN operator, the left-hand side (hereafter "LHS") can be either a parenthesized list of values or a subquery with multiple columns. But the right-hand side (hereafter "RHS") must be a subquery expression.
You're building up something that looks like (?,?) IN ((?,?), (?,?)), which doesn't meet that requirement. The syntax (?,?) IN (VALUES (?,?), (?,?)) works, though.
Also, I think you might have to flatten out that list of tuples you pass to the prepared statement, but somebody more knowledgeable about Python would have to say for sure.
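For reference, a compact, self-contained version of the approach from the question's edit, flattening the tuples with itertools.chain (the table here is a stand-in built in memory):
import itertools
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway database for the sketch
cursor = conn.cursor()
cursor.execute("CREATE TABLE my_table_name (user_phone TEXT, contact_phone TEXT)")
cursor.execute("INSERT INTO my_table_name VALUES ('+1234567', '+1000000')")

keys = ['user_phone', 'contact_phone']
values = [('+1234567', '+1000000'), ('+987654321', '+1200000')]
q = "SELECT * FROM my_table_name WHERE ({}) IN (VALUES {})".format(
    ', '.join(keys),
    ', '.join('(?, ?)' for _ in values),
)
# the placeholders are positional, so the tuples are flattened in order
flat_params = list(itertools.chain.from_iterable(values))
print(cursor.execute(q, flat_params).fetchall())  # [('+1234567', '+1000000')]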
