Execute stored procedure in Azure Data Platform - Post SQL Scripts

Based on the documentation below,
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database
there is a feature to run a post SQL script. Is it possible to run a stored procedure from there?
I have tried it, but it does not seem to work, and I am currently investigating.
Thanks in advance for your information.

I created a test to prove that a stored procedure can be called in the Post SQL scripts field.
I created two tables:
CREATE TABLE [dbo].[emp](
id int IDENTITY(1,1),
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
CREATE TABLE [dbo].[emp_stage](
id int,
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
I created a stored procedure:
CREATE PROCEDURE [dbo].[spMergeEmpData]
AS
BEGIN
    -- allow explicit values for the identity column during the merge
    SET IDENTITY_INSERT dbo.emp ON;

    MERGE [dbo].[emp] AS target
    USING [dbo].[emp_stage] AS source
        ON (target.[id] = source.[id])
    WHEN MATCHED THEN
        UPDATE SET name = source.name,
                   age  = source.age
    WHEN NOT MATCHED THEN
        INSERT (id, name, age)
        VALUES (source.id, source.name, source.age);

    SET IDENTITY_INSERT dbo.emp OFF;

    -- clear the staging table for the next load
    TRUNCATE TABLE [dbo].[emp_stage];
END
I will copy the csv file into my Azure SQL staging table [dbo].[emp_stage], then use the stored procedure [dbo].[spMergeEmpData] to transfer the data from [dbo].[emp_stage] to [dbo].[emp].
Enter exec [dbo].[spMergeEmpData] in the Post SQL scripts field.
The debug run succeeded, and I can see that all the data is now in the [dbo].[emp] table.
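For a quick sanity check after the debug run, a couple of queries against the tables above (not part of the original test, just a convenient verification) confirm that both the merge and the truncate happened:
SELECT COUNT(*) AS emp_rows FROM [dbo].[emp];         -- should match the number of rows in the csv file
SELECT COUNT(*) AS stage_rows FROM [dbo].[emp_stage]; -- should be 0, since the procedure truncates the staging table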

Related

Incremental load Azure Data Factory

I'm trying to do an incremental load using ADF, but I get this error message. How do I solve it and do the incremental load the right way?
Note: the table name variable is defined through the stored procedure.
(screenshot: ADF error message)
Stored Procedure:
ALTER PROCEDURE [INT].[usp_write_watermark]
    @LastModifiedtime datetime, @TableName varchar(50)
AS
BEGIN
    UPDATE [log].watermarktable
    SET [WatermarkValue] = @LastModifiedtime
    WHERE [TableName] = @TableName
END
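Without the full error text it is hard to say more, but one useful first step is to run the procedure by hand with test values (the values below are made up) to confirm the procedure itself works and that the problem lies in how ADF passes the parameters:
EXEC [INT].[usp_write_watermark]
    @LastModifiedtime = '2020-01-01T00:00:00',
    @TableName = 'dbo.SourceTable';

-- verify the watermark row was updated
SELECT * FROM [log].watermarktable;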

Azure SQL Data Warehouse Polybase Query to Azure Data Lake Gen 2 returns zero rows

Why does an Azure SQL Data Warehouse Polybase Query to Azure Data Lake Gen 2 return many rows for a single file source, but zero rows for the parent folder source?
I created:
Master Key (CREATE MASTER KEY;)
Credential (CREATE DATABASE SCOPED CREDENTIAL) - uses the ADLS Gen 2 account key
External data source (CREATE EXTERNAL DATA SOURCE)
File format (CREATE EXTERNAL FILE FORMAT)
External table (CREATE EXTERNAL TABLE)
Everything works fine when my external table points to a specific file, i.e.
CREATE EXTERNAL TABLE [ext].[Time]
(
[TimeID] int NOT NULL,
[HourNumber] tinyint NOT NULL,
[MinuteNumber] tinyint NOT NULL,
[SecondNumber] tinyint NOT NULL,
[TimeInSecond] int NOT NULL,
[HourlyBucket] varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
WITH
(
LOCATION = '/Time/time001.txt',
DATA_SOURCE = ADLSDataSource,
FILE_FORMAT = uncompressedcsv,
REJECT_TYPE = value,
REJECT_VALUE = 2147483647
);
SELECT * FROM [ext].[Time];
Many rows returned, therefore I am confident all items mentioned above are configured correctly.
The Time folder in Azure Data Lake Gen 2 contains many files, not just time001.txt. When I change my external table to point at a folder, and not an individual file, the query returns zero rows, i.e.
CREATE EXTERNAL TABLE [ext].[Time]
(
[TimeID] int NOT NULL,
[HourNumber] tinyint NOT NULL,
[MinuteNumber] tinyint NOT NULL,
[SecondNumber] tinyint NOT NULL,
[TimeInSecond] int NOT NULL,
[HourlyBucket] varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
WITH
(
LOCATION = '/Time/',
DATA_SOURCE = ADLSDataSource,
FILE_FORMAT = uncompressedcsv,
REJECT_TYPE = value,
REJECT_VALUE = 2147483647
);
SELECT * FROM [ext].[Time];
Zero rows returned
I tried:
LOCATION = '/Time/',
LOCATION = '/Time',
LOCATION = 'Time/',
LOCATION = 'Time',
But always zero rows. I also followed the instructions at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
I tested all the files within the folder and individually each returns many rows of data.
I queried all the files from Blob storage and not ADLS Gen2 and the "Folder" query returns all rows as expected.
How do I query all files in a folder "as one" from Azure Data Lake Gen2 storage using Azure SQL Data Warehouse and Polybase?
I was facing exactly the same issue: the problem was the endpoint in the Data Source LOCATION.
Script with error:
CREATE EXTERNAL DATA SOURCE datasourcename
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://container@storage.blob.core.windows.net',
    CREDENTIAL = credential_name
);
Script that solves the issue:
CREATE EXTERNAL DATA SOURCE datasourcename
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://container@storage.dfs.core.windows.net',
    CREDENTIAL = credential_name
);
The only change needed was the LOCATION.
Thanks to the Microsoft Support Team for helping me on this.
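If the external objects already exist, one way to apply the fix (a sketch using the object names from the question and answer above) is to drop the dependent external table, recreate the data source with the dfs endpoint, and then recreate the table:
DROP EXTERNAL TABLE [ext].[Time];
DROP EXTERNAL DATA SOURCE ADLSDataSource;

CREATE EXTERNAL DATA SOURCE ADLSDataSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://container@storage.dfs.core.windows.net',  -- dfs endpoint, not blob
    CREDENTIAL = credential_name
);
-- then re-run the CREATE EXTERNAL TABLE ... LOCATION = '/Time/' statement from the question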

update and insert into Azure data warehouse using Azure data factory pipelines

I'm trying to run an ADF copy pipeline with update and insert statements that are supposed to replace a MERGE statement, basically statements like:
UPDATE TARGET
SET ProductName = SOURCE.ProductName,
TARGET.Rate = SOURCE.Rate
FROM Products AS TARGET
INNER JOIN UpdatedProducts AS SOURCE
ON TARGET.ProductID = SOURCE.ProductID
WHERE TARGET.ProductName <> SOURCE.ProductName
OR TARGET.Rate <> SOURCE.Rate
INSERT Products (ProductID, ProductName, Rate)
SELECT SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate
FROM UpdatedProducts AS SOURCE
WHERE NOT EXISTS
(
SELECT 1
FROM Products
WHERE ProductID = SOURCE.ProductID
)
If the target is an Azure SQL DB, I would use this approach: https://www.taygan.co/blog/2018/04/20/upsert-to-azure-sql-db-with-azure-data-factory
But if the target is an ADW, the stored procedure option doesn't exist! Any suggestions? Do I have to have a staging table first and then run the update and insert statements from stg_table to target_table? Or is there any possibility of doing it directly from ADF?
If you can't use a stored procedure, my suggestion would be to create a second Copy Data activity. Run the pre-copy script on the second activity and drop the table there, since it's a temp table that you created in the first.
BEGIN
MERGE Target AS target_sqldb
USING TempTable AS source_tblstg
ON (target_sqldb.Id= source_tblstg.Id)
WHEN MATCHED THEN
UPDATE SET
[Name] = source_tblstg.Name,
[State] = source_tblstg.State
WHEN NOT MATCHED THEN
INSERT([Name], [State])
VALUES (source_tblstg.Name, source_tblstg.State);
DROP TABLE TempTable;
END
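For the first Copy Data activity, a pre-copy script along these lines can create the staging table that the second activity's MERGE and DROP then consume (column names and types are assumed from the MERGE above; adjust to your real table):
IF OBJECT_ID('dbo.TempTable') IS NOT NULL
    DROP TABLE dbo.TempTable;

CREATE TABLE dbo.TempTable
(
    Id      int           NOT NULL,
    [Name]  nvarchar(200) NULL,
    [State] nvarchar(100) NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);  -- staging-friendly options for ADW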

COPY FROM CSV with static fields on Postgres

I'd like to switch an existing system that imports CSV files into a PostgreSQL 9.5 database over to something more efficient.
I'd like to use the COPY statement because of its good performance. The problem is that I need to have one field populated that is not in the CSV file.
Is there a way to have the COPY statement add a static field to all the rows inserted ?
The perfect solution would look like this:
COPY data(field1, field2, field3='Account-005')
FROM '/tmp/Account-005.csv'
WITH DELIMITER ',' CSV HEADER;
Do you know a way to have that field populated in every row?
My server is running Node.js, so I'm also open to any cost-efficient solution that completes the files with Node before COPYing them.
Use a temp table to import into. Before inserting the new records into the actual table, this allows you to:
add/remove/update columns
add extra literal data
delete or ignore records (such as duplicates)
-- target table
CREATE TABLE data
( id SERIAL PRIMARY KEY
, batch_name varchar NOT NULL
, remote_key varchar NOT NULL
, payload varchar
, UNIQUE (batch_name, remote_key)
-- or::
-- , UNIQUE (remote_key)
);
-- temp table
CREATE TEMP TABLE temp_data
( remote_key varchar -- PRIMARY KEY
, payload varchar
);
COPY temp_data(remote_key, payload)
FROM '/tmp/Account-005.csv'
WITH DELIMITER ',' CSV HEADER;
-- The actual insert
-- (you could also filter out or handle duplicates here)
INSERT INTO data(batch_name, remote_key, payload)
SELECT 'Account-005', t.remote_key, t.payload
FROM temp_data t
;
BTW It is possible to automate the above: put it into a function (or maybe a prepared statement), using the filename/literal as argument.
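A rough sketch of such a function (PL/pgSQL; the names are illustrative, and COPY from a server-side file needs appropriate privileges):
CREATE OR REPLACE FUNCTION import_batch(p_batch_name varchar, p_file_path varchar)
RETURNS void AS $$
BEGIN
    -- session-scoped staging table, dropped when the transaction commits
    CREATE TEMP TABLE tmp_import (remote_key varchar, payload varchar) ON COMMIT DROP;

    -- COPY cannot take parameters, so build the statement dynamically
    EXECUTE format('COPY tmp_import(remote_key, payload) FROM %L WITH DELIMITER '','' CSV HEADER',
                   p_file_path);

    INSERT INTO data(batch_name, remote_key, payload)
    SELECT p_batch_name, t.remote_key, t.payload
    FROM tmp_import t;
END;
$$ LANGUAGE plpgsql;

-- usage:
-- SELECT import_batch('Account-005', '/tmp/Account-005.csv');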
Set a default for the column:
alter table data
alter column field3 set default 'Account-005'
Do not mention it in the COPY command:
COPY data(field1, field2) FROM...
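Putting the two steps together (illustrative; the path and CSV options come from the question, and the default has to be changed before each new batch):
ALTER TABLE data ALTER COLUMN field3 SET DEFAULT 'Account-005';

COPY data(field1, field2)
FROM '/tmp/Account-005.csv'
WITH DELIMITER ',' CSV HEADER;

-- before the next batch, point the default at the new batch value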

SQLite foreign key, on delete cascade, and SQLITE_CONSTRAINT

Overview:
I have a parent / child table relationship where the child may contain 2:n records with FK's back to the parent. When attempting to delete from the parent, I get a SQLITE_CONSTRAINT error. This is unexpected as I have FK's enabled, have the child registered with ON DELETE CASCADE, and a new enough SQLite version.
However: my child table originally did not have ON DELETE CASCADE. I added it (and enabled FKs) after data had already been added to parent and child. To do so, I renamed the original child table, created a new table with the constraint, and finally moved the data into the new table.
Table layout as follows:
CREATE TABLE IF NOT EXISTS message (
message_id INTEGER PRIMARY KEY,
area_tag VARCHAR NOT NULL,
message_uuid VARCHAR(36) NOT NULL,
reply_to_message_id INTEGER,
to_user_name VARCHAR NOT NULL,
from_user_name VARCHAR NOT NULL,
subject, /* FTS @ message_fts */
message, /* FTS @ message_fts */
modified_timestamp DATETIME NOT NULL,
view_count INTEGER NOT NULL DEFAULT 0,
UNIQUE(message_uuid)
);
CREATE INDEX IF NOT EXISTS message_by_area_tag_index
ON message (area_tag);
CREATE VIRTUAL TABLE IF NOT EXISTS message_fts USING fts4 (
content="message",
subject,
message
);
CREATE TRIGGER IF NOT EXISTS message_before_update BEFORE UPDATE ON message BEGIN
DELETE FROM message_fts WHERE docid=old.rowid;
END;
CREATE TRIGGER IF NOT EXISTS message_before_delete BEFORE DELETE ON message BEGIN
DELETE FROM message_fts WHERE docid=old.rowid;
END;
CREATE TRIGGER IF NOT EXISTS message_after_update AFTER UPDATE ON message BEGIN
INSERT INTO message_fts(docid, subject, message) VALUES(new.rowid, new.subject, new.message);
END;
CREATE TRIGGER IF NOT EXISTS message_after_insert AFTER INSERT ON message BEGIN
INSERT INTO message_fts(docid, subject, message) VALUES(new.rowid, new.subject, new.message);
END;
CREATE TABLE IF NOT EXISTS message_meta (
message_id INTEGER NOT NULL,
meta_category INTEGER NOT NULL,
meta_name VARCHAR NOT NULL,
meta_value VARCHAR NOT NULL,
UNIQUE(message_id, meta_category, meta_name, meta_value),
FOREIGN KEY(message_id) REFERENCES message(message_id) ON DELETE CASCADE
);
At startup, directly after attaching to the DBs, I ensure FKs are enabled:
PRAGMA foreign_keys = ON;
Other details:
SQLite version: 3.7.17
Access: node-sqlite3
Exact error: Error: SQLITE_CONSTRAINT: FOREIGN KEY constraint failed
Is this caused by the fact that I later added the constraint? (See Update 1)
How do I fix this without losing data?
Update 1:
I can confirm that only certain messages (I believe, messages that were in message before ON DELETE CASCADE was added to message_meta) cause the constraint error. Others delete just fine and properly take out the associated message_meta records.
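Not from the original post, but two checks that can narrow this down on SQLite 3.7.17: ask SQLite for rows that already violate a foreign key, and list every table whose schema still references message (a leftover child table without ON DELETE CASCADE will block the delete):
PRAGMA foreign_key_check;  -- lists rows violating any declared foreign key (available since 3.7.16)

SELECT name, sql
FROM sqlite_master
WHERE type = 'table'
  AND sql LIKE '%REFERENCES message%';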
Answering my own question -- after some hours of trying various things I was able to find the issue(s):
1. When I originally added the ON DELETE CASCADE clause, I did so by renaming the original message_meta table to message_meta_backup, creating a new table with the clause, then moving the data into it: INSERT INTO message_meta SELECT * FROM message_meta_backup;. What I did not do was drop the backup table.
2. Due to #1 or something related, something internal to my database became corrupted or confused.
What I tried (that did not work):
REINDEX;
Simply dropping the backup table: DROP TABLE message_meta_backup;
...and various other things I forget :)
What DID work:
What finally ended up working was a combination of dropping the backup table and completely rebuilding the database using the sqlite3 shell's .dump command:
> sqlite3 db/message.sqlite3
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> drop table message_meta_backup;
sqlite> .quit
> sqlite3 db/message.sqlite3 ".dump" >> message_dump.sql
> rm db/message.sqlite3
> cat message_dump.sql | sqlite3 db/message.sqlite3
I'm now able to DELETE FROM message ... and have it properly cascade the delete to message_meta without the nasty error:
sqlite> DELETE FROM message WHERE message_id IN(SELECT message_id FROM message WHERE area_tag='some_area' ORDER BY message_id desc limit -1 offset 200);
sqlite>
(no error given!)
