Psycopg2 upsert multiple rows with one query - python-3.x

I have a table in PostgreSQL with fields id (unique) and val.
I want to execute something like this:
INSERT INTO my_table (id, val)
VALUES (%(id)s, %(val)s)
ON CONFLICT(id) DO UPDATE
SET val = val + %(val)s
The data to insert looks like [{"id": 123, "val": 5}, {"id": 456, "val": 8}, ...]. Is there any way to upsert all of these with one query?
cursor.executemany won't do; it is the same as running the query for each of these dicts in a loop, one after another.
Without ON CONFLICT DO UPDATE I could just do something like "insert into mytable (id, val) values " + ', '.join(['(%s, %s)'] * len(data)) and transform the data into the list [id1, val1, id2, val2, ...]. But I have no idea how to combine multiple VALUES rows with an insert-and-update statement.

I had the same problem. After searching for a while, I found two posts:
PostgreSQL - upsert: https://dba.stackexchange.com/questions/167591/postgresql-psycopg2-upsert-syntax-to-update-columns
Psycopg2 - insert many rows: psycopg2: insert multiple rows with one query
So the answer for you is:
# my table: cas(address, id, description) - address is the primary key
data = [('0x18f9f00a432F50c6E2429d31776724d3cB873BEF', '1000', 'mot ngan'),
        ('0x06471C53CE649Eb4dA88b792D500544A7E5C9635', '2000', 'hai ngan')]

args = [cur.mogrify('(%s, %s, %s)', x).decode('utf-8') for x in data]
args_str = ', '.join(args)

cur.execute('''INSERT INTO cas (address, id, description) VALUES '''
            + args_str +
            ''' ON CONFLICT (address) DO UPDATE SET
                (address, id, description) = (EXCLUDED.address, EXCLUDED.id, EXCLUDED.description)''')
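For the accumulate-on-conflict case in the original question, psycopg2's execute_values helper can also build the multi-row VALUES list for you. A minimal sketch, assuming psycopg2 >= 2.7 and the my_table(id, val) schema from the question:

from psycopg2.extras import execute_values

data = [{"id": 123, "val": 5}, {"id": 456, "val": 8}]

# execute_values expands the single %s into one "(id, val)" group per dict,
# using the named placeholders given in `template`
execute_values(cur,
    """INSERT INTO my_table (id, val)
       VALUES %s
       ON CONFLICT (id) DO UPDATE
       SET val = my_table.val + EXCLUDED.val""",
    data,
    template="(%(id)s, %(val)s)")
conn.commit()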

Related

Inserting row with many columns of different datatypes into postgresql with psycopg2

This question is about constructing an insertion SQL statement of a single record into a table that has many columns (135 in my case).
Before anyone goes into analyzing why so many columns, let me simplify: I'm attempting to ingest raw data with the least modification possible, and the raw data has 135 columns.
Now, following this guide, a simple way to insert a record is this:
import psycopg2
con = psycopg2.connect(<your db credentials>)
cur = con.cursor()
cur.execute("INSERT INTO STUDENT (ADMISSION,NAME,AGE,COURSE,DEPARTMENT) VALUES (3420, 'John', 18, 'Computer Science', 'ICT')");
Also note that if we're inserting a record without omitting any columns, then we don't need to specify the column names (more details here):
cur.execute("INSERT INTO STUDENT VALUES (3420, 'John', 18, 'Computer Science', 'ICT')");
Should our data be kept in python variables, psycopg2 allows us to do this:
admission = 3420
name = 'John'
age = 18
course = 'Computer Science'
department = 'ICT'
cur.execute("INSERT INTO STUDENT VALUES (%s, %s, %s, %s, %s)",(admission, name, age, course, department))
But what is the recommended way of inserting a record with 135 attributes?
While my immediate intuition was to construct the SQL query myself, the docs do point out:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
So, to sum it up: how do I ingest raw data with an arbitrary number of columns into a table?
It looks like using psycopg2.sql.Placeholder does the trick.
From the example:
>>> names = ['foo', 'bar', 'baz']
>>> q1 = sql.SQL("insert into table ({}) values ({})").format(
... sql.SQL(', ').join(map(sql.Identifier, names)),
... sql.SQL(', ').join(sql.Placeholder() * len(names)))
>>> print(q1.as_string(conn))
insert into table ("foo", "bar", "baz") values (%s, %s, %s)
>>> q2 = sql.SQL("insert into table ({}) values ({})").format(
... sql.SQL(', ').join(map(sql.Identifier, names)),
... sql.SQL(', ').join(map(sql.Placeholder, names)))
>>> print(q2.as_string(conn))
insert into table ("foo", "bar", "baz") values (%(foo)s, %(bar)s, %(baz)s)
Therefore I guess I can do something like:
cols = ['ADMISSION', 'NAME', 'AGE', 'COURSE', 'DEPARTMENT']
row = [admission, name, age, course, department]
insertion_query = sql.SQL("INSERT INTO STUDENT VALUES ({})").format(
    sql.SQL(', ').join(sql.Placeholder() * len(cols)))
cur.execute(insertion_query, row)
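If the raw record is already a dict mapping column name to value (handy with 135 columns), the q2 pattern above extends naturally. A sketch, assuming a record dict and an open cursor cur; note that sql.Identifier quotes the names, so they must match the columns' case exactly:

from psycopg2 import sql

# hypothetical: record maps every column name to its value (135 keys in practice)
record = {'ADMISSION': 3420, 'NAME': 'John', 'AGE': 18,
          'COURSE': 'Computer Science', 'DEPARTMENT': 'ICT'}

insertion_query = sql.SQL("INSERT INTO STUDENT ({}) VALUES ({})").format(
    sql.SQL(', ').join(map(sql.Identifier, record)),   # quoted column names
    sql.SQL(', ').join(map(sql.Placeholder, record)))  # %(name)s placeholders
cur.execute(insertion_query, record)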

SELECT rows with primary key of multiple columns

How do I select all relevant records according to the provided list of pairs?
table:
CREATE TABLE "users_groups" (
"user_id" INTEGER NOT NULL,
"group_id" BIGINT NOT NULL,
PRIMARY KEY (user_id, group_id),
"permissions" VARCHAR(255)
);
For example, if I have the following JavaScript array of pairs that I need to fetch from the DB:
[
    {user_id: 1, group_id: 19},
    {user_id: 1, group_id: 11},
    {user_id: 5, group_id: 19}
]
Here we see that the same user_id can be in multiple groups.
I can loop over every array element and build the following query:
SELECT * FROM users_groups
WHERE (user_id = 1 AND group_id = 19)
OR (user_id = 1 AND group_id = 11)
OR (user_id = 5 AND group_id = 19);
But is this the best solution? Let's say the array is very long; as far as I know, the query length may be limited to ~1 GB.
What is the best and quickest way to do this?
Bill Karwin's answer will work for Postgres just as well.
However, in my experience, joining against a VALUES clause is very often faster than a large IN list (with hundreds if not thousands of elements):
select ug.*
from users_groups ug
  join (
    values (1,19), (1,11), (5,19), ...
  ) as l(uid, guid) on l.uid = ug.user_id and l.guid = ug.group_id;
This assumes that there are no duplicates in the values provided, otherwise the JOIN would result in duplicated rows, which the IN solution would not do.
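If you are building that VALUES list from Python with psycopg2, the execute_values helper can generate it from the pairs so the query stays parametrized. A sketch, assuming psycopg2 >= 2.8 (for fetch=True) and an open cursor cur:

from psycopg2.extras import execute_values

pairs = [(1, 19), (1, 11), (5, 19)]

# execute_values expands the single %s into (1, 19), (1, 11), (5, 19)
rows = execute_values(cur,
    """select ug.*
       from users_groups ug
         join (values %s) as l(uid, guid)
           on l.uid = ug.user_id and l.guid = ug.group_id""",
    pairs,
    fetch=True)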
You tagged both mysql and postgresql, so I don't know which SQL database you're really using.
MySQL at least supports tuple comparisons:
SELECT * FROM users_groups WHERE (user_id, group_id) IN ((1,19), (1,11), (5,19), ...)
This kind of predicate can be optimized in MySQL 5.7 and later. See https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#row-constructor-range-optimization
I don't know whether PostgreSQL supports this type of predicate, or if it optimizes it.
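For what it's worth, PostgreSQL does accept the same row-constructor IN predicate, and psycopg2 can keep it parametrized because it adapts a tuple of tuples into the ((1,19), (1,11), ...) syntax. A minimal sketch, assuming a psycopg2 connection conn:

pairs = [(1, 19), (1, 11), (5, 19)]

cur = conn.cursor()
# the single query parameter must itself be a tuple of (user_id, group_id) tuples
cur.execute(
    "SELECT * FROM users_groups WHERE (user_id, group_id) IN %s",
    (tuple(pairs),))
rows = cur.fetchall()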

Is there a way to bind parameters to db2 dialect(ibm_db_sa) compiled query after compiling?

I am trying to compile a query using the DB2 dialect (ibm_db_sa). After compiling, it shows '?' instead of the named parameter.
I have tried the same for the MSSQL and Oracle dialects, and they give the expected results.
import ibm_db_sa
from sqlalchemy import bindparam, literal_column, select
from sqlalchemy import Table, MetaData, Column, Integer
from sqlalchemy.dialects import mssql, oracle

tab = Table('customers', MetaData(),
            Column('cust_id', Integer, primary_key=True))
stmt = select([tab]).where(literal_column('cust_id') == bindparam('cust_id'))

ms_sql = stmt.compile(dialect=mssql.dialect())
oracle_q = stmt.compile(dialect=oracle.dialect())
db2 = stmt.compile(dialect=ibm_db_sa.dialect())
If I print all 3 queries, the output is:
MSSQL => SELECT customers.cust_id FROM customers WHERE cust_id = :cust_id
Oracle => SELECT customers.cust_id FROM customers WHERE cust_id = :cust_id
DB2 => SELECT customers.cust_id FROM customers WHERE cust_id = ?
Is there any way to get DB2 query same as others ?
The docs that you reference have that solution:
In the case that a plain SQL string is passed, and the underlying DBAPI accepts positional bind parameters, a collection of tuples or individual values in *multiparams may be passed:
conn.execute(
    "INSERT INTO table (id, value) VALUES (?, ?)",
    (1, "v1"), (2, "v2")
)

conn.execute(
    "INSERT INTO table (id, value) VALUES (?, ?)",
    1, "v1"
)
For Db2, you just pass a comma-separated list of values as documented in the 2nd example:
conn.execute(stmt, 1, "2nd value", storeID, whatever)
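Alternatively, if the goal is just to run the statement with a value bound to cust_id, you can let SQLAlchemy translate the paramstyle at execution time instead of working with the compiled string. A sketch, assuming a hypothetical ibm_db_sa connection URL:

from sqlalchemy import create_engine

# hypothetical connection URL - adjust credentials/host/database
engine = create_engine("db2+ibm_db://user:pass@host:50000/testdb")

with engine.connect() as conn:
    # SQLAlchemy converts the named bindparam to the dialect's '?' style for you
    result = conn.execute(stmt, {"cust_id": 42})
    for row in result:
        print(row)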

sqlite3 update/adding data to new column

I added a new column called 'id', filled with NULL values, to an existing table. Now I want to add data to it from a list that holds about 130k elements.
I tried with INSERT, and it returned an error:
conn = create_connection(xml_db)
cursor = conn.cursor()
with conn:
    cursor.execute("ALTER TABLE xml_table ADD COLUMN id integer")
    for data in ssetId:
        cursor.execute("INSERT INTO xml_table(id) VALUES (?)", (data,))
    conn.commit()
I also tried with update:
conn = create_connection(xml_db)
cursor = conn.cursor()
with conn:
    cursor.execute("ALTER TABLE xml_table ADD COLUMN id INTEGER")
    for data in ssetId:
        cursor.execute("UPDATE xml_table SET ('id' = ?)", (data,))
    conn.commit()
What is incorrect here ?
EDIT for clarification.
The table already exists and is filled with data. I want to add a column 'id' with custom values to it.
Here's an example similar to yours which may be useful.
import sqlite3

conn = sqlite3.connect("xml.db")
cursor = conn.cursor()
with conn:
    # for testing purposes, remove this or else the table gets dropped whenever the file is loaded
    cursor.execute("drop table if exists xml_table")
    # create table with some other field
    cursor.execute("create table if not exists xml_table (other_field integer not null)")
    for other_data in range(5):
        cursor.execute("INSERT INTO xml_table (other_field) VALUES (?)", (other_data,))
    # add id field
    cursor.execute("ALTER TABLE xml_table ADD COLUMN id integer")
    # make sure the table exists
    res = cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    print("Table Name: {}".format(res.fetchone()[0]))
    # add data to the table
    for data in range(5):
        cursor.execute("UPDATE xml_table SET id = ? WHERE other_field = ?", (data, data))
    # if you must insert an id, you must specify an other_field value as well, since other_field must be not null
    cursor.execute("insert into xml_table (id, other_field) VALUES (?, ?)", (100, 105))
    # make sure data exists
    res = cursor.execute("SELECT id, other_field FROM xml_table")
    for id_result in res:
        print(id_result)
conn.commit()
conn.close()
As I stated in the comment below, since one of your columns has a NOT NULL constraint on it, no row can exist in the table with that column left NULL. In the example above, other_field is declared NOT NULL, therefore there can be no rows with a NULL value in the other_field column. Any deviation from this would be an IntegrityError.
Output:
Table Name: xml_table
(0, 0)
(1, 1)
(2, 2)
(3, 3)
(4, 4)
(100, 105)
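Applied to the original loop: the UPDATE needs valid SET syntax plus a WHERE clause that says which row each value belongs to. A hypothetical sketch, assuming the id column was already added and that ssetId is ordered to match the table's rows in rowid order:

conn = create_connection(xml_db)
cursor = conn.cursor()
with conn:
    # assumption: the id column already exists (ALTER TABLE from the question)
    rowids = [r[0] for r in cursor.execute("SELECT rowid FROM xml_table ORDER BY rowid")]
    # pair each value from the list with the rowid it should be written to
    cursor.executemany(
        "UPDATE xml_table SET id = ? WHERE rowid = ?",
        zip(ssetId, rowids))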

How to make sqlite3 module not converting column data to integer type

I'm trying to read data from a sqlite3 database using Python 3, and it looks as if it tries to be smart and converts columns that look like integers to an integer type. I don't want that (if I got it right, sqlite3 stores data as text no matter what anyway).
I've created the database as:
sqlite> create table t (id integer primary key, foo text, bar datetime);
sqlite> insert into t values (NULL, 1, 2);
sqlite> insert into t values (NULL, 1, 'fubar');
sqlite> select * from t;
1|1|2
2|1|fubar
and tried to read it using:
import sqlite3

db = sqlite3.connect(dbfile)
cur = db.cursor()
cur.execute("SELECT * FROM t")
for l in cur:
    print(l)
db.close()
And getting output like:
(1, '1', 2)
(2, '1', 'fubar')
but I expected/wanted something like
('1', '1', '2')
('2', '1', 'fubar')
(definitely for the last column)
Try
for l in cur:
    print(tuple(str(x) for x in l))
SQLite stores values according to the type affinity of the column.
If you do not want numbers back, don't declare the column as datetime; declare it as text.
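If changing the column declaration is not an option, you can also cast in the query itself so every column comes back as text. A minimal sketch, assuming the table t and dbfile from the question:

import sqlite3

db = sqlite3.connect(dbfile)
cur = db.cursor()
# CAST forces text even for columns with integer affinity
cur.execute("SELECT CAST(id AS TEXT), CAST(foo AS TEXT), CAST(bar AS TEXT) FROM t")
for row in cur:
    print(row)   # ('1', '1', '2') and ('2', '1', 'fubar')
db.close()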
