How to automatically drop temporary table upon commit of a transaction in YugabyteDB? - yugabytedb

[Question posted by a user on YugabyteDB Community Slack]
I am running YugabyteDB 2.12 single node and would like to know if it is possible to create a temporary table such that it is automatically dropped upon committing the transaction in which it was created.
In “vanilla” PostgreSQL it is possible to specify the ON COMMIT DROP option when creating a temporary table. The YugabyteDB documentation for CREATE TABLE does not mention such an option; however, when I tried it from ysqlsh, it did not complain about the syntax. Here is what I tried from within ysqlsh:
yugabyte=# begin;
BEGIN
yugabyte=# create temp table foo (x int) on commit drop;
CREATE TABLE
yugabyte=# insert into foo (x) values (1);
INSERT 0 1
yugabyte=# select * from foo;
x
---
1
(1 row)
yugabyte=# commit;
ERROR: Illegal state: Transaction for catalog table write operation 'pg_type' not found
The CREATE TABLE documentation for YugabyteDB mentions the following for temporary tables:
Temporary tables are only visible in the current client session or transaction in which they are created and are automatically dropped at the end of the session or transaction.
When I create a temporary table (without the ON COMMIT DROP option), indeed the table is automatically dropped at the end of the session, but it is not automatically dropped upon commit of the transaction. Is there any way that this can be accomplished (apart from manually dropping the table just before the transaction is committed)?
Your input is greatly appreciated.
Thank you

See these two GitHub issues:
#12221: The create table doc section doesn’t mention the ON COMMIT clause for a temp table
and
#7926: CREATE TEMP … ON COMMIT DROP writes data into catalog table outside the DDL transaction
You cannot (yet, through YB-2.13.0.1) use the ON COMMIT DROP feature. But why not use ON COMMIT DELETE ROWS and simply let the temp table remain in place until the session ends?
Saying this raises a question: how do you create the temp table in the first place? Your stated goal implies that you’d need to create it before every use. But why? You could, instead, have dedicated initialization code to create the ON COMMIT DELETE ROWS temp table that you call from the client for this purpose at (but only at) the start of a session.
If you don’t want to have this, then (back to a variant of your present thinking) you could just do this before every intended use of the table:
drop table if exists t;
create temp table t(k int) on commit delete rows;
After all, how else (without dedicated initialization code) would you know whether or not the temp table exists yet?
If you prefer, you could use this logic instead:
do $body$
begin
  if not
  (
    select exists
    (
      select 1 from information_schema.tables
      where
        table_type = 'LOCAL TEMPORARY' and
        table_name = 't'
    )
  )
  then
    create temp table t(k int) on commit delete rows;
  end if;
end;
$body$;
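For illustration, here is what the on commit delete rows behaviour looks like from ysqlsh (a minimal sketch; the table name t is just the example used above):

create temp table t(k int) on commit delete rows;

begin;
insert into t(k) values (1);
select count(*) from t;   -- 1: the row is visible inside the transaction
commit;

select count(*) from t;   -- 0: the rows were deleted at commit, but the table itself remains

In other words, the table persists for the rest of the session but is effectively empty at the start of every new transaction.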

Related

Atomically replace content of SQLite database, preferably in SQL or python3

I have a python script that supports an --edit-the-database option to invoke the user's preferred editor on a dump of the script's SQLite database. This option is intended to facilitate quick access to parts of the database that the script's other options don't provide access to, particularly during the development of this script.
Once the script has dumped the database's content, launched the editor and verified that the modified content is still valid then it needs to replace the existing database content.
First it removes all existing content by executing this SQL (using python's sqlite module):
PRAGMA writable_schema = 1;
DELETE FROM sqlite_master WHERE type IN ('table', 'index', 'trigger');
PRAGMA writable_schema = 0;
VACUUM;
and then it loads the new content using the sqlite module's executescript() method:
cursor.executescript(sql_slurped_from_user_modified_dump)
The problem is that these two operations (deleting existing content, loading new content) are not executed atomically: press CTRL-C at the wrong moment and the database content has been lost.
If I try to execute those two blocks of code inside a transaction then I get the error:
Error: cannot VACUUM from within a transaction
And if I keep the transaction but remove the VACUUM then I get the error:
Error: table first_table already exists
I have an ugly workaround in place: prior to calling the editor, the script copies the dump file to a safe location, writes a warning message to the user:
WARNING: if anything goes wrong then a backup of the database
can be found in /some/path
and, if the script continues and completes loading the new content, then it deletes the copy of the dump. But this is pretty ugly!
I could use DROP TABLE instead of the DELETE FROM sqlite_master ..., but if I am trying to allow the database to be modified in this way then I am allowing that the list of tables itself may change. I.e. if the user adds this to the dump:
CREATE TABLE t3 (n INT);
then a hard-coded list of DROPs like this:
BEGIN TRANSACTION
DROP TABLE t1;
DROP TABLE t2;
DROP INDEX ...
...
cursor.executescript(sql_slurped_from_user_modified_dump)
...
END TRANSACTION;
isn't going to work second time round (because it doesn't delete table t3).
I could use filesystem-atomic operations (i.e. something like: load the modified dump into a new database file; hardlink new file to old file), but that would require the script to close its database connection and reopen it afterwards, which, for reasons beyond the scope of this question, I would prefer not to do.
Does anybody have any better ideas for atomically replacing the entire content of a database whose list of tables is not predictable?
In case Google leads you here ...
I managed to do the first half of the task (delete existing content inside a single transaction) with something like this pseudocode:
-- Make the order in which tables are dropped irrelevant. Unfortunately, this
-- cannot be done just around the table dropping because it doesn't work inside
-- transactions.
PRAGMA foreign_keys = 0;
BEGIN TRANSACTION;
indexes  = (SELECT name
            FROM sqlite_master
            WHERE type = 'index' AND
                  name NOT LIKE 'sqlite_autoindex_%')
triggers = (SELECT name
            FROM sqlite_master
            WHERE type = 'trigger')
tables   = (SELECT name
            FROM sqlite_master
            WHERE type = 'table')
for thing in indexes + triggers + tables:
    DROP thing;
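As a concrete sketch of that enumeration step, a single query can generate the DROP statements for the script to then execute inside the transaction (assuming the object names only need double-quoting):

-- Generate one DROP statement per user-created object, skipping SQLite's
-- internal objects (sqlite_autoindex_*, sqlite_sequence, ...).
SELECT 'DROP ' || UPPER(type) || ' IF EXISTS "' || name || '";'
FROM sqlite_master
WHERE type IN ('table', 'index', 'trigger')
  AND name NOT LIKE 'sqlite_%';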
At which point I thought the second half (loading new content in the same transaction) would just be this:
cursor.executescript(sql_slurped_from_user_modified_dump)
END TRANSACTION;
-- Reinstate foreign key constraints.
PRAGMA foreign_keys = 1;
Unfortunately, pressing CTRL-C in the middle of the two blocks resulted in an empty database. The cause? cursor.executescript() does an immediate COMMIT before running the provided SQL. That turns the above code into two transactions!
This isn't the first time I've been caught out by this module's hidden transaction
management, but this time I was motivated to try the apsw module instead. This switch was remarkably easy. The latter half of the code now looks like this:
cursor.execute(sql_slurped_from_user_modified_dump)
END TRANSACTION;
-- Reinstate foreign key constraints.
PRAGMA foreign_keys = 1;
and it works perfectly!

ADF copy data activity - check for duplicate records before inserting into SQL db

I have a very simple ADF pipeline to copy data from local mongoDB (self-hosted integration environment) to Azure SQL database.
My pipeline is able to copy the data from MongoDB and insert it into the SQL db.
Currently, if I run the pipeline multiple times, it inserts duplicate data.
I have made the _id column unique in the SQL database, and now running the pipeline throws an error because the SQL constraint won't let it insert the duplicate records.
How do I check for duplicate _id before inserting into SQL db?
Should I use a pre-copy script / stored procedure?
Some guidance / directions on where to add the extra steps would be helpful. Thanks
Azure Data Factory Data Flow can help you achieve that:
You can follow these steps:
Add two sources: the Cosmos DB table (source1) and the SQL database table (source2).
Use a Join activity to get all the data from the two tables (left join/full join/right join) on Cosmos table.id = SQL table.id.
Use an Alter Row expression to filter out duplicate _id values; if a row is not a duplicate, insert it.
Then map the non-duplicate columns to the sink SQL database table.
Hope this helps.
You should implement your SQL logic to eliminate duplicates in the pre-copy script.
Currently I have a solution using a stored procedure, which looks like a lot less work as far as this requirement is concerned.
I have followed this article:
https://www.cathrinewilhelmsen.net/2019/12/16/copy-sql-server-data-azure-data-factory/
I created a table type and used it in the stored procedure to check for duplicates.
My sproc is very simple, as shown below:
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[spInsertIntoDb]
    (@sresults dbo.targetSensingResults READONLY)
AS
BEGIN
    -- Insert only those rows from the table-valued parameter that are not already present
    MERGE dbo.sensingresults AS target
    USING @sresults AS source
    ON (target._id = source._id)
    WHEN NOT MATCHED THEN
        INSERT (_id, sensorNumber, applicationType, place, spaceType, floorCode, zoneCountNumber, presenceStatus, sensingTime, createdAt, updatedAt, _v)
        VALUES (source._id, source.sensorNumber, source.applicationType, source.place, source.spaceType, source.floorCode,
                source.zoneCountNumber, source.presenceStatus, source.sensingTime, source.createdAt, source.updatedAt, source._v);
END
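For completeness, the table type referenced by the @sresults parameter would have been created along these lines (a hypothetical sketch only; the column types are guesses based on the column names used above):

-- Hypothetical definition of the table type used by the @sresults parameter.
CREATE TYPE dbo.targetSensingResults AS TABLE
(
    _id             VARCHAR(50),
    sensorNumber    INT,
    applicationType VARCHAR(50),
    place           VARCHAR(100),
    spaceType       VARCHAR(50),
    floorCode       VARCHAR(20),
    zoneCountNumber INT,
    presenceStatus  VARCHAR(20),
    sensingTime     DATETIME2,
    createdAt       DATETIME2,
    updatedAt       DATETIME2,
    _v              INT
);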
I think using a stored proc should do for now and will also help in the future if I need to do more transformations.
Please let me know if using a sproc in this case has any potential risk in the future?
To remove the duplicates you can use the pre-copy script. Alternatively, you can store the incremental or new data in a staging table using the copy activity, then use a stored procedure to delete from the main table only those Ids that are present in the staging table, insert the staging table's data into the main table after the deletion, and finally clear or drop the staging table, as sketched below.
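A minimal sketch of such a procedure, assuming the copy activity loads each batch into a staging table called dbo.stg_sensingresults and the main table is dbo.sensingresults (both names, and the _id key, are assumptions for illustration):

CREATE PROCEDURE dbo.spUpsertFromStaging
AS
BEGIN
    -- Delete from the main table only those Ids that are present in the staging table
    DELETE t
    FROM dbo.sensingresults AS t
    WHERE EXISTS (SELECT 1 FROM dbo.stg_sensingresults AS s WHERE s._id = t._id);

    -- Insert the staged rows into the main table
    INSERT INTO dbo.sensingresults
    SELECT * FROM dbo.stg_sensingresults;

    -- Clear the staging table ready for the next run
    TRUNCATE TABLE dbo.stg_sensingresults;
END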

Is there any alternative to CREATE TYPE in SQL, as CREATE TYPE is not supported in Azure SQL Data Warehouse?

I am trying to execute this query, but user-defined types (CREATE TYPE) are not supported in Azure SQL Data Warehouse, and I want to use them in a stored procedure.
CREATE TYPE DataTypeforCustomerTable AS TABLE(
    PersonID int,
    Name varchar(255),
    LastModifytime datetime
);
GO

CREATE PROCEDURE usp_upsert_customer_table @customer_table DataTypeforCustomerTable READONLY
AS
BEGIN
    MERGE customer_table AS target
    USING @customer_table AS source
    ON (target.PersonID = source.PersonID)
    WHEN MATCHED THEN
        UPDATE SET Name = source.Name, LastModifytime = source.LastModifytime
    WHEN NOT MATCHED THEN
        INSERT (PersonID, Name, LastModifytime)
        VALUES (source.PersonID, source.Name, source.LastModifytime);
END
GO

CREATE TYPE DataTypeforProjectTable AS TABLE(
    Project varchar(255),
    Creationtime datetime
);
GO

CREATE PROCEDURE usp_upsert_project_table @project_table DataTypeforProjectTable READONLY
AS
BEGIN
    MERGE project_table AS target
    USING @project_table AS source
    ON (target.Project = source.Project)
    WHEN MATCHED THEN
        UPDATE SET Creationtime = source.Creationtime
    WHEN NOT MATCHED THEN
        INSERT (Project, Creationtime)
        VALUES (source.Project, source.Creationtime);
END
Is there any alternative way to do this?
You've got a few challenges there, because most of what you're trying to convert is not the way to do things on ASDW.
First, as you point out, CREATE TYPE is not supported, and there is no equivalent alternative.
Next, the code appears to be doing single inserts to a table. That's really bad on ASDW; performance will be dreadful.
Next, there's no MERGE statement (yet) for ASDW. That's because UPDATE is not the best way to handle changing data.
And last, stored procedures work a little differently on ASDW: they're not compiled, but interpreted each time the procedure is called. Stored procedures are great for big chunks of table-level logic, but not recommended for high volume calls with single-row operations.
I'd need to know more about the use case to make specific recommendations, but in general you need to think in tables rather than rows. In particular, focus on the CREATE TABLE AS (CTAS) way of handling your ELT.
Here's a good link, it shows how the equivalent of a Merge/Upsert can be handled using a CTAS:
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-ctas#replace-merge-statements
As you'll see, it processes two tables at a time, rather than one row. This means you'll need to review the logic that called your stored procedure example.
If you get your head around doing everything in CTAS, and separately around Distribution, you're well on your way to having a high performance data warehouse.
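As a rough illustration of the CTAS pattern described in that link (the table names here are placeholders, not anything taken from the original question), the upsert can be expressed as building a fresh table and swapping names:

-- Combine the latest staged rows with the untouched rows from the current table.
CREATE TABLE dbo.customer_table_upsert
WITH
(
    DISTRIBUTION = HASH(PersonID),
    HEAP
)
AS
SELECT s.PersonID, s.Name, s.LastModifytime
FROM dbo.customer_table_stage AS s
UNION ALL
SELECT c.PersonID, c.Name, c.LastModifytime
FROM dbo.customer_table AS c
WHERE NOT EXISTS (SELECT 1 FROM dbo.customer_table_stage AS s WHERE s.PersonID = c.PersonID);

-- Swap the new table in and discard the old one.
RENAME OBJECT dbo.customer_table TO customer_table_old;
RENAME OBJECT dbo.customer_table_upsert TO customer_table;
DROP TABLE dbo.customer_table_old;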
Temp tables in Azure SQL Data Warehouse have a slightly different behaviour to box product SQL Server or Azure SQL Database - they exist at the session level. So all you have to do is convert your CREATE TYPE statements to temp tables and split the MERGE out into separate INSERT / UPDATE / DELETE statements as required.
Example:
CREATE TABLE #DataTypeforCustomerTable (
    PersonID INT,
    Name VARCHAR(255),
    LastModifytime DATETIME
)
WITH
(
    DISTRIBUTION = HASH( PersonID ),
    HEAP
)
GO

CREATE PROCEDURE usp_upsert_customer_table
AS
BEGIN
    -- Add records which do not already exist
    INSERT INTO customer_table ( PersonID, Name, LastModifytime )
    SELECT PersonID, Name, LastModifytime
    FROM #DataTypeforCustomerTable AS source
    WHERE NOT EXISTS
    (
        SELECT *
        FROM customer_table target
        WHERE source.PersonID = target.PersonID
    )
    ...
Simply load the temp table and execute the stored proc. See here for more details on temp table scope.
If you are altering a large portion of the table then you should consider the CTAS approach to create a new table, then rename it as suggested by Ron.
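For reference, the update half that the ellipsis in the code above stands in for could look roughly like this (a sketch only; older versions of the service restricted ANSI joins in UPDATE statements, which is why the old-style FROM/WHERE form is shown):

    -- Update rows that already exist, taking the latest values from the temp table
    UPDATE customer_table
    SET Name = source.Name,
        LastModifytime = source.LastModifytime
    FROM #DataTypeforCustomerTable AS source
    WHERE customer_table.PersonID = source.PersonID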

Insert bulk data into big-query without keeping it in streaming buffer

My motive here is as follow:
Insert bulk records into big-query every half an hour
Delete the record if it exists
Those records are transactions which change their statuses from: pending, success, fail and expire.
BigQuery does not allow me to delete rows that were inserted just half an hour ago, as they are still in the streaming buffer.
Can anyone suggest a workaround, as I am getting some duplicate rows in my table?
A better course of action would be to:
Perform periodic loads into a staging table (loading is a free operation)
After the load completes, execute a MERGE statement.
You would want something like this:
MERGE dataset.TransactionTable dt
USING dataset.StagingTransactionTable st
ON dt.tx_id = st.tx_id
WHEN MATCHED THEN
    UPDATE SET status = st.status
WHEN NOT MATCHED THEN
    INSERT (tx_id, status) VALUES (st.tx_id, st.status)
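If you also wanted to remove rows once they reach a terminal status (say, 'expire'), BigQuery's MERGE can do that in the same statement. This variant is purely illustrative and goes beyond the original answer:

MERGE dataset.TransactionTable dt
USING dataset.StagingTransactionTable st
ON dt.tx_id = st.tx_id
WHEN MATCHED AND st.status = 'expire' THEN
    DELETE                          -- drop transactions that have expired
WHEN MATCHED THEN
    UPDATE SET status = st.status   -- refresh the status of existing transactions
WHEN NOT MATCHED THEN
    INSERT (tx_id, status) VALUES (st.tx_id, st.status)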

How do you write a stored procedure that will either update or insert into a table using another table as the source

I'm using Azure SQL database via Microsoft SQL Server Management Studio. I'm trying to write a stored procedure that will take records from a staging table and either use those records to update or insert into a target table (T).
This is the stored procedure that I have written:
/*** Create (or Re-create) the procedure ***/
ALTER PROCEDURE [dbo].[sp_PostCreateJockeyMerge]
AS
SET NOCOUNT ON;
BEGIN
    MERGE dbo.JOCKEY AS T
    USING dbo.jockey_tmp AS S
    ON T.JOCKEY_TF LIKE S.JOCKEY_TF
    WHEN MATCHED THEN
        UPDATE SET T.FIRST_NAME = IsNULL(S.FIRST_NAME, T.FIRST_NAME),
                   T.MIDDLE_NAME = IsNULL(S.MIDDLE_NAME, T.MIDDLE_NAME),
                   T.LAST_NAME = IsNULL(S.LAST_NAME, T.LAST_NAME),
                   T.JOCKEY_NAME = IsNULL(S.JOCKEY_NAME, T.JOCKEY_NAME),
                   T.JOCKEY_INITIALS = IsNULL(S.JOCKEY_INITIALS, T.JOCKEY_INITIALS),
                   T.JOCKEY_KEY = IsNULL(S.JOCKEY_KEY, T.JOCKEY_KEY),
                   T.JOCKEY_RAGS = IsNULL(S.JOCKEY_RAGS, T.JOCKEY_RAGS)
    WHEN NOT MATCHED THEN
        INSERT (JOCKEY_TF, LAST_NAME) VALUES (JOCKEY_TF, LAST_NAME);
END
When I run the procedure I receive "Command(s) completed successfully"; however, upon further inspection no records have been inserted into the target table. For example, before I run the stored procedure the source table S (jockey_tmp) has 550 unique records and the target table has 300 unique records. After I run the stored procedure the target table still has the same unique records - I've written a stored procedure that does nothing.
I researched this online and believe this should be working. Does anyone have any insights to where I'm going wrong?
