Cannot update existing row on conflict in PostgreSQL with Psycopg2 - python-3.x

I have defined the following function to insert several rows in batches, using Psycopg2 and PostgreSQL 11.
When I receive the same obj (with the same id), I want to update its date.
def insert_execute_values_iterator(
    connection,
    objs: Iterator[Dict[str, Any]],
    page_size: int = 1000,
) -> None:
    with connection.cursor() as cursor:
        try:
            psycopg2.extras.execute_values(cursor, """
                INSERT INTO objs(
                    id,
                    date
                ) VALUES %s
                ON CONFLICT (id)
                DO UPDATE SET (date) = (EXCLUDED.date)
            """, ((
                obj['id'],
                obj['date'],
            ) for obj in objs), page_size=page_size)
        except (Exception, psycopg2.Error) as error:
            print("Error while inserting as in database", error)
When a conflict happens on the unique primary key of the table while inserting an element, I get the error:
Error while inserting as in database ON CONFLICT DO UPDATE command
cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
FYI, the clause works when run directly against PostgreSQL, but not from the Python code.

Use unique VALUE-combinations in your INSERT statement:
create table foo(id int primary key, date date);
This should work:
INSERT INTO foo(id, date)
VALUES(1,'2021-02-17')
ON CONFLICT(id)
DO UPDATE SET date = excluded.date;
This one won't:
INSERT INTO foo(id, date)
VALUES(1,'2021-02-17') , (1, '2021-02-16') -- 2 conflicting rows
ON CONFLICT(id)
DO UPDATE SET date = excluded.date;
You can fix this by using DISTINCT ON() in a SELECT statement:
INSERT INTO foo(id, date)
SELECT DISTINCT ON(id) id, date
FROM (VALUES(1,CAST('2021-02-17' AS date)) , (1, '2021-02-16')) s(id, date)
ORDER BY id, date ASC
ON CONFLICT(id)
DO UPDATE SET date = excluded.date;
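On the Python side, an equivalent fix is to de-duplicate the batch before handing it to execute_values, so that ON CONFLICT never sees the same id twice in one statement. A hypothetical helper (dedupe_latest is not part of psycopg2; it assumes dates compare correctly, e.g. ISO strings or date objects) that keeps the most recent date per id:

```python
from typing import Any, Dict, Iterable, List

def dedupe_latest(objs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse duplicate ids, keeping the row with the greatest date."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for obj in objs:
        seen = latest.get(obj['id'])
        if seen is None or obj['date'] > seen['date']:
            latest[obj['id']] = obj
    return list(latest.values())
```

Passing dedupe_latest(objs) instead of objs to execute_values keeps each batch free of conflicting duplicates.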


Auto update timestamp on update

I have created a table with the following definition:
CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    url TEXT UNIQUE,
    visible BOOLEAN,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id, store_id)
);
Using Peewee, I noticed that the timestamp does not automatically update whenever I update a row, e.g. by doing:
class Products(Model):
    id = IntegerField(column_name='id')
    store_id = TextField(column_name='store_id')
    url = TextField(column_name='url')
    visible = BooleanField(column_name='visible')
    added_date = TimestampField(column_name='added_date')

    class Meta:
        database = postgres_pool
        db_table = "products"

    @classmethod
    def on_exists(cls, pageData):
        try:
            if exists := cls.select().where(
                (cls.store_id == Stores.store_id) & (cls.url == pageData.url)
            ).exists():
                # On update execution it should refresh added_date to the current time
                cls.update(visible=False).where(
                    (cls.store_id == Stores.store_id) & (cls.url == pageData.url)
                ).execute()
            return exists
        except Products.DoesNotExist:
            return None
However, my problem is that every time I run this command and the update succeeds, the added_date timestamp does not change. How can I make it update its timestamp automatically on update?
This is a job for a trigger:
CREATE FUNCTION upd_ts() RETURNS trigger
    LANGUAGE plpgsql AS
$$BEGIN
    NEW.added_date = current_timestamp;
    RETURN NEW;
END;$$;

CREATE TRIGGER upd_ts BEFORE UPDATE ON products
    FOR EACH ROW EXECUTE PROCEDURE upd_ts();
The trigger fires before a row is modified and changes the row about to be updated.
See the documentation for all the details.
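The question targets PostgreSQL, but the same idea can be sketched with Python's stdlib sqlite3 (simplified schema; SQLite triggers cannot assign to NEW, so an AFTER UPDATE trigger re-updates the row instead):

```python
import sqlite3

# Illustrative SQLite analogue of the PostgreSQL trigger above:
# refresh added_date whenever a row's visible flag is updated.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        visible BOOLEAN,
        added_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- SQLite cannot modify NEW in a BEFORE trigger, so re-update the row
    -- afterwards (recursive triggers are off by default, so this is safe).
    CREATE TRIGGER upd_ts AFTER UPDATE OF visible ON products
    FOR EACH ROW
    BEGIN
        UPDATE products SET added_date = CURRENT_TIMESTAMP WHERE id = NEW.id;
    END;
""")
conn.execute("INSERT INTO products (id, visible, added_date) "
             "VALUES (1, 1, '2000-01-01 00:00:00')")
conn.execute("UPDATE products SET visible = 0 WHERE id = 1")
(ts,) = conn.execute("SELECT added_date FROM products WHERE id = 1").fetchone()
conn.close()
```

After the UPDATE, ts holds the current timestamp rather than the original '2000-01-01 00:00:00'.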

Merge in Spark SQL - WHEN NOT MATCHED BY SOURCE THEN

I am coding in Python and Spark SQL in Databricks, and I am using Spark 2.4.5.
I have two tables.
Create table IF NOT EXISTS db_xsi_ed_faits_shahgholi_ardalan.Destination
(
id Int,
Name string,
Deleted int
) USING Delta;
Create table IF NOT EXISTS db_xsi_ed_faits_shahgholi_ardalan.Source
(
id Int,
Name string,
Deleted int
) USING Delta;
I need to run a MERGE command between my source and destination tables. I wrote the command below:
%sql
MERGE INTO db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
USING db_xsi_ed_faits_shahgholi_ardalan.Source AS S
ON (S.id = D.id)
-- UPDATE
WHEN MATCHED AND S.Name <> D.Name THEN
UPDATE SET
D.Name = S.Name
-- INSERT
WHEN NOT MATCHED THEN
INSERT (id, Name, Deleted)
VALUES (S.id, S.Name, S.Deleted)
-- DELETE
WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET
D.Deleted = 1
When I ran this command, I got an error.
It seems that we do not have WHEN NOT MATCHED BY SOURCE in Spark! I need a solution for that.
I wrote the following workaround, but I am still looking for a better approach:
%sql
MERGE INTO db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
USING db_xsi_ed_faits_shahgholi_ardalan.Source AS S
ON (S.id = D.id)
-- UPDATE
WHEN MATCHED AND S.Name <> D.Name THEN
UPDATE SET
D.Name = S.Name
-- INSERT
WHEN NOT MATCHED THEN
INSERT (id, Name, Deleted)
VALUES (S.id, S.Name, S.Deleted)
;
%sql
-- Logical delete
UPDATE db_xsi_ed_faits_shahgholi_ardalan.Destination
SET Deleted = 1
WHERE db_xsi_ed_faits_shahgholi_ardalan.Destination.id in
(
SELECT
D.id
FROM db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
LEFT JOIN db_xsi_ed_faits_shahgholi_ardalan.Source AS S ON (S.id = D.id)
WHERE S.id is null
)
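For reference, newer Delta Lake releases add WHEN NOT MATCHED BY SOURCE to MERGE; on Spark 2.4.5 the two-statement workaround above is the practical route. Its intended semantics can be sketched in plain Python (dicts standing in for rows; column names mirror the tables above):

```python
# Pure-Python sketch of the intended MERGE semantics, for checking logic.
def merge(destination, source):
    src = {row['id']: row for row in source}
    for row in destination:
        s = src.pop(row['id'], None)
        if s is None:
            row['Deleted'] = 1          # WHEN NOT MATCHED BY SOURCE -> logical delete
        elif s['Name'] != row['Name']:
            row['Name'] = s['Name']     # WHEN MATCHED AND names differ -> update
    destination.extend(src.values())    # WHEN NOT MATCHED -> insert
    return destination
```

The leftover entries in src after the loop are exactly the source rows with no destination match, which is what the INSERT branch handles.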

How to sort python Sqlite database in Ascending order based on a particular column?

I have Python 3.7 and the sqlite3 module.
How do I sort the rows in ascending order of a particular column? All the columns contain strings; I want to sort by the 'name' column in ascending order.
def sort():
    try:
        sqliteConnection = sqlite3.connect('SQLite_Python.db')
        cursor = sqliteConnection.cursor()
        print("Connected to SQLite")
        sqlite_select_query = """SELECT * FROM 'SqliteDb_Addresser'
                                 ORDER BY ('name') ASC;"""
        cursor.execute(sqlite_select_query)
        print("SQLite DB is sorted in ascending order successfully.")
        cursor.close()
    except sqlite3.Error as error:
        print("Failed to insert Python variable into sqlite table", error)
    finally:
        if sqliteConnection:
            sqliteConnection.close()
            print("The SQLite connection is closed")
SELECT * FROM SqliteDb_Addresser ORDER BY name COLLATE NOCASE ASC
In your query, ('name') in quotes is a string literal, so every row sorts equally; use the bare column name instead. COLLATE NOCASE additionally makes the sort ignore letter case.
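A runnable sketch of the corrected query (the in-memory database and sample rows are assumptions for illustration):

```python
import sqlite3

# ORDER BY the column name (unquoted), case-insensitively via COLLATE NOCASE.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE SqliteDb_Addresser (name TEXT)")
conn.executemany("INSERT INTO SqliteDb_Addresser VALUES (?)",
                 [('bob',), ('Alice',), ('carol',)])
rows = conn.execute(
    "SELECT name FROM SqliteDb_Addresser "
    "ORDER BY name COLLATE NOCASE ASC").fetchall()
conn.close()
```

Without COLLATE NOCASE, 'Alice' would sort before 'bob' anyway here, but any lowercase name starting with 'a' would sort after all uppercase names in SQLite's default binary collation.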

How to apply DISTINCT on only date part of datetime field in sqlalchemy python?

I need to query my database and return the result by applying DISTINCT to only the date part of a datetime field.
My code is:
@blueprint.route('/<field_id>/timeline', methods=['GET'])
@blueprint.response(field_timeline_paged_schema)
def get_field_timeline(
    field_id,
    page=1,
    size=10,
    order_by=['capture_datetime desc'],
    **kwargs
):
    session = flask.g.session
    field = fetch_field(session, parse_uuid(field_id))
    if field:
        query = session.query(
            func.distinct(cast(Capture.capture_datetime, Date)),
            Capture.capture_datetime.label('event_date'),
            Capture.tags['visibility'].label('visibility')
        ).filter(Capture.field_id == parse_uuid(field_id))
        return paginate(
            query=query,
            order_by=order_by,
            page=page,
            size=size
        )
However this returns the following error:
(psycopg2.errors.InvalidColumnReference) for SELECT DISTINCT, ORDER BY expressions must appear in select list
The resulting query is:
SELECT distinct(CAST(tenant_resson.capture.capture_datetime AS DATE)) AS distinct_1, CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date, tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s
Error is:
Query error - {'error': ProgrammingError('(psycopg2.errors.InvalidColumnReference) SELECT DISTINCT ON expressions must match initial ORDER BY expressions\nLINE 2: FROM (SELECT DISTINCT ON (CAST(tenant_resson.capture.capture...\n ^\n',)
How can I resolve this issue? The cast is not working for order_by.
I am not familiar with SQLAlchemy, but the resulting query below works as you expect. Please note the DISTINCT ON.
Maybe there is a way in SQLAlchemy to execute non-trivial parameterized queries? That would give you the extra benefit of being able to test and optimize the query upfront.
SELECT DISTINCT ON (CAST(tenant_resson.capture.capture_datetime AS DATE))
CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date,
tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s;
You can order by event_date if your business logic needs.
The query posted by @Stefanov.sm is correct. In SQLAlchemy terms it would be:
query = (
    session.query(
        Capture.capture_datetime.label('event_date'),
        Capture.tags['visibility'].label('visibility')
    )
    .distinct(cast(Capture.capture_datetime, Date))
    .filter(Capture.field_id == parse_uuid(field_id))
)
See the docs for more information
I needed to add order_by to my query. Now it works fine.
query = session.query(
cast(Capture.capture_datetime, Date).label('event_date'),
Capture.tags['visibility'].label('visibility')
).filter(Capture.field_id == parse_uuid(field_id)) \
.distinct(cast(Capture.capture_datetime, Date)) \
.order_by(cast(Capture.capture_datetime, Date).desc())
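DISTINCT ON is PostgreSQL-specific. As a self-contained illustration of "one row per calendar date", here is a stdlib sqlite3 sketch using GROUP BY on the date part (table and column names mirror the question; the sample data is made up):

```python
import sqlite3

# Keep one row per calendar date: group by date(capture_datetime) and
# pick the latest timestamp within each date.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE capture (capture_datetime TEXT, visibility TEXT)")
conn.executemany("INSERT INTO capture VALUES (?, ?)", [
    ('2021-02-17 08:00:00', 'low'),
    ('2021-02-17 18:00:00', 'high'),
    ('2021-02-18 09:00:00', 'low'),
])
rows = conn.execute(
    "SELECT date(capture_datetime) AS event_date, MAX(capture_datetime) "
    "FROM capture GROUP BY date(capture_datetime) "
    "ORDER BY event_date DESC").fetchall()
conn.close()
```

In PostgreSQL, DISTINCT ON plus a matching ORDER BY achieves the same effect in one clause, which is what the accepted answer's order_by addition supplies.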

Inserting/Updating sqlite table from python program

I have a sqlite3 table as shown below
Record(WordID INTEGER PRIMARY KEY, Word TEXT, Wordcount INTEGER, Docfrequency REAL).
I want to create the table and insert data into it if it does not exist; otherwise I want to update it so that only the 'Wordcount' column changes, based on the value in the 'Word' column. I am trying to do this from a Python program like:
import sqlite3

conn = sqlite3.connect("mydatabase")
c = conn.cursor()

# Create table
c.execute("CREATE TABLE IF NOT EXISTS Record(WordID INTEGER PRIMARY KEY, Words TEXT, Wordcount INTEGER, Docfrequency REAL)")

# Update table
c.execute("UPDATE TABLE IF EXISTS Record")

# Insert a row of data
c.execute("INSERT INTO Record values (1,'wait', 9, 10.0)")
c.execute("INSERT INTO Record values (2,'Hai', 5, 6.0)")

# Updating data
c.execute("UPDATE Record SET Wordcount='%d' WHERE Words='%s'" % (11, 'wait'))
But I can't update the table. When I run the program, I get the error message:
c.execute("UPDATE TABLE IF EXISTS Record")
sqlite3.OperationalError: near "TABLE": syntax error
How should I write the code to update the table ?
Your SQL query for UPDATE is invalid - see the documentation.
Also, I don't understand why you'd want to check for the table's existence when updating, given that just before that you're creating it if it doesn't exist.
If your goal is to update an entry if it exists or insert it if it doesn't, you might do it either by:
First doing an UPDATE and checking the number of rows updated. If 0, you know the record didn't exist and you should INSERT instead.
First doing an INSERT - if there's an error related to constraint violation, you know the entry already existed and you should do an UPDATE instead.
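A minimal sketch of the first strategy against an in-memory database (the question uses "mydatabase" on disk), using the question's schema: UPDATE first, then INSERT only when no row was changed.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # in-memory here; use "mydatabase" on disk
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS Record("
          "WordID INTEGER PRIMARY KEY, Words TEXT, "
          "Wordcount INTEGER, Docfrequency REAL)")

def upsert(word_id, word, count, freq):
    # Try the UPDATE first; cursor.rowcount tells us whether any row matched.
    c.execute("UPDATE Record SET Wordcount = ? WHERE Words = ?", (count, word))
    if c.rowcount == 0:  # no existing row for this word -> INSERT it
        c.execute("INSERT INTO Record VALUES (?, ?, ?, ?)",
                  (word_id, word, count, freq))
    conn.commit()

upsert(1, 'wait', 9, 10.0)   # first call inserts the row
upsert(1, 'wait', 11, 10.0)  # second call only updates Wordcount
```

The ? placeholders also avoid the unsafe string interpolation of the original UPDATE statement.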
