Incremental load Azure Data Factory - azure

I'm trying to do an incremental load using ADF, but I got an error message. How can I solve it and do the incremental load the right way?
Note: the table name variable is defined through the stored procedure.
(screenshot of the ADF error message omitted)
Stored Procedure:
ALTER PROCEDURE [INT].[usp_write_watermark]
    @LastModifiedtime datetime,
    @TableName varchar(50)
AS
BEGIN
    UPDATE [log].watermarktable
    SET [WatermarkValue] = @LastModifiedtime
    WHERE [TableName] = @TableName
END
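The watermark pattern this procedure supports can be sketched outside ADF. This pure-Python sketch (table name, column names, and dates are all illustrative, not from the question) shows the control flow the pipeline automates: read the old watermark, copy only rows modified after it, then write the new watermark back, which is what usp_write_watermark does.

```python
from datetime import datetime

# In-memory stand-ins for [log].watermarktable and a source table.
# All names and values here are illustrative.
watermarks = {"dbo.Orders": datetime(2023, 1, 1)}
source_rows = [
    {"id": 1, "modified": datetime(2022, 12, 31)},
    {"id": 2, "modified": datetime(2023, 3, 5)},
    {"id": 3, "modified": datetime(2023, 6, 1)},
]

def incremental_load(table_name):
    """Copy rows changed since the stored watermark, then advance it."""
    last_mark = watermarks[table_name]          # Lookup activity
    delta = [r for r in source_rows
             if r["modified"] > last_mark]      # Copy activity (delta only)
    if delta:
        # Stored-procedure activity: the usp_write_watermark equivalent
        watermarks[table_name] = max(r["modified"] for r in delta)
    return delta

changed = incremental_load("dbo.Orders")
```

Only rows with a modification time after the stored watermark are copied, and the watermark only advances when something was actually loaded.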

Related

Azure Synapse TSQL

I am new to Azure Synapse and have a question about how files are set up on Azure while creating an external table from a select. Would the files be overwritten, or would one need to truncate the files every time a CREATE EXTERNAL TABLE script is run? For example, if I run the following script:
CREATE EXTERNAL TABLE [dbo].[PopulationCETAS] WITH (
    LOCATION = 'populationParquet/',
    DATA_SOURCE = [MyDataSource],
    FILE_FORMAT = [ParquetFF]
) AS
SELECT *
FROM OPENROWSET(
    BULK 'csv/population-unix/population.csv',
    DATA_SOURCE = 'sqlondemanddemo',
    FORMAT = 'CSV', PARSER_VERSION = '2.0'
) WITH (
    CountryCode varchar(4),
    CountryName varchar(64),
    Year int,
    PopulationCount int
) AS r;
Would the files created at
    LOCATION = 'populationParquet/',
    DATA_SOURCE = [MyDataSource],
    FILE_FORMAT = [ParquetFF]
be overwritten every time the script is run? Can this be specified at setup time or within the query options?
I would love to be able to drop the files in storage with a DELETE or TRUNCATE operation, but this feature doesn't currently exist within T-SQL. Please vote for the feature.
In the meantime, you will need to use outside automation such as an Azure Data Factory pipeline.
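Whatever automation you use, the step it must perform is the same: empty the CETAS LOCATION folder before the external table is recreated, since CETAS fails if the target folder already contains files. As a minimal sketch of that delete-then-recreate step, here is a local-filesystem analogue (a real pipeline would do this against blob storage instead; the folder name just mirrors the question):

```python
import tempfile
from pathlib import Path

def reset_location(folder: Path):
    """Emulate the 'truncate' the pipeline must do: remove every file
    under the CETAS LOCATION before the external table is recreated."""
    if folder.exists():
        for f in folder.glob("*"):
            f.unlink()
    folder.mkdir(parents=True, exist_ok=True)

base = Path(tempfile.mkdtemp())
target = base / "populationParquet"

reset_location(target)                         # first run: folder created
(target / "old_run.parquet").write_bytes(b"stale")

reset_location(target)                         # second run: old output cleared
leftover = list(target.iterdir())
```

In an actual pipeline the same shape applies: a delete step over the storage path, then the CETAS script.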

Execute stored procedure in Azure Data Platform - Post SQL Scripts

Based on the documentation below,
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database
there is a feature to run a post SQL script. Would it be possible to run a stored procedure from there?
I have tried it, but it does not seem to be working; I am currently investigating.
Thanks in advance for your information.
I created a test to prove that the stored procedure can be called in the Post SQL scripts.
I created two tables:
CREATE TABLE [dbo].[emp](
id int IDENTITY(1,1),
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
CREATE TABLE [dbo].[emp_stage](
id int,
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
I created a stored procedure.
CREATE PROCEDURE [dbo].[spMergeEmpData]
AS
BEGIN
    SET IDENTITY_INSERT dbo.emp ON;
    MERGE [dbo].[emp] AS target
    USING [dbo].[emp_stage] AS source
    ON (target.[id] = source.[id])
    WHEN MATCHED THEN
        UPDATE SET name = source.name,
                   age = source.age
    WHEN NOT MATCHED THEN
        INSERT (id, name, age)
        VALUES (source.id, source.name, source.age);
    SET IDENTITY_INSERT dbo.emp OFF;
    TRUNCATE TABLE [dbo].[emp_stage];
END
I will copy the CSV file into my Azure SQL staging table [dbo].[emp_stage], then use the stored procedure [dbo].[spMergeEmpData] to transfer data from [dbo].[emp_stage] to [dbo].[emp].
Enter the stored procedure call exec [dbo].[spMergeEmpData] in the Post SQL scripts field.
The debug run succeeded, and I can see that all the data landed in table [dbo].[emp].
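The stage-then-merge flow above can be reproduced end to end with Python's built-in sqlite3 module, standing in for Azure SQL (SQLite's INSERT ... ON CONFLICT upsert substitutes for the T-SQL MERGE; the table and column names mirror the answer, but this is a sketch, not ADF code):

```python
import sqlite3

# sqlite3 stand-in for the pattern: land rows in emp_stage,
# then merge into emp and clear the stage, as spMergeEmpData does.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, age TEXT);
    CREATE TABLE emp_stage (id INTEGER, name TEXT, age TEXT);
    INSERT INTO emp VALUES (1, 'Ann', '30');
    INSERT INTO emp_stage VALUES (1, 'Ann', '31'), (2, 'Bob', '25');
""")

def merge_emp(con):
    """SQLite upsert standing in for the T-SQL MERGE, then empty the stage.
    The 'WHERE true' is SQLite's documented workaround for the parser
    ambiguity between a join and an ON CONFLICT clause."""
    con.execute("""
        INSERT INTO emp (id, name, age)
        SELECT id, name, age FROM emp_stage WHERE true
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            age  = excluded.age
    """)
    con.execute("DELETE FROM emp_stage")
    con.commit()

merge_emp(con)
rows = con.execute("SELECT id, name, age FROM emp ORDER BY id").fetchall()
```

After the merge, the matched row is updated, the new row is inserted, and the staging table is empty, which is exactly the state the Post SQL script leaves behind.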

data factory copy data without explicitly creating a target table

I am working on copying data from a source Oracle database to a Target SQL data warehouse using the Data factory.
When using the copy function in Data Factory, we are asked to specify the destination location and a table to copy the data to. There are multiple tables that need to be copied, so creating a table for each in the destination is time-consuming.
How can I set up Data Factory to copy data from the source to a destination such that it automatically creates each table at the destination, without my having to create them manually?
TIA
I came across the same issue last year. I used pipeline.parameters() for dynamic naming, plus a Data Factory stored procedure activity before the copy activity, to first create the empty table from a template before copying: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure
CREATE PROCEDURE create_sql_table_proc
    @WindowStartYear NVARCHAR(30),
    @WindowStartMonth NVARCHAR(30),
    @WindowStartDay NVARCHAR(30)
AS
BEGIN
    DECLARE @strsqlcreatetable AS NVARCHAR(255)
    DECLARE @strsqldroptable AS NVARCHAR(255)
    DECLARE @tablename AS NVARCHAR(255)
    DECLARE @strsqlsetpk AS NVARCHAR(255)

    SELECT @tablename = 'TABLE_NAME_' + @WindowStartYear + @WindowStartMonth + @WindowStartDay
    SELECT @strsqldroptable = 'DROP TABLE IF EXISTS ' + @tablename
    SELECT @strsqlcreatetable = 'SELECT * INTO ' + @tablename + ' FROM OUTPUT_TEMPLATE'
    SELECT @strsqlsetpk = 'ALTER TABLE ' + @tablename + ' ADD PRIMARY KEY (CustID)'

    EXEC (@strsqldroptable)
    EXEC (@strsqlcreatetable)
    EXEC (@strsqlsetpk)
END
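The string-building half of that procedure can be mirrored in Python, which also makes it easy to add the input validation that dynamic SQL built from pipeline parameters should have (the table and template names follow the procedure above; the digit check is an addition of this sketch, not part of the original):

```python
import re

def build_create_sql(year: str, month: str, day: str,
                     template: str = "OUTPUT_TEMPLATE"):
    """Mirror of the dynamic-SQL proc: derive a dated table name and the
    DROP / SELECT INTO / ALTER statements. Rejects non-numeric date parts,
    since concatenating raw pipeline parameters into SQL invites injection."""
    for part in (year, month, day):
        if not re.fullmatch(r"\d+", part):
            raise ValueError(f"unsafe date part: {part!r}")
    table = f"TABLE_NAME_{year}{month}{day}"
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"SELECT * INTO {table} FROM {template}",
        f"ALTER TABLE {table} ADD PRIMARY KEY (CustID)",
    ]

stmts = build_create_sql("2024", "01", "15")
```

Each run of the pipeline produces a fresh dated table name, and the three statements are executed in order just as the EXEC calls are in the procedure.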
Since then, we have started pushing the table to SQL from our PySpark scripts running on a cluster, where it is not necessary to create the empty table first: https://medium.com/@radek.strnad/tips-for-using-jdbc-in-apache-spark-sql-396ea7b2e3d3

update and insert into Azure data warehouse using Azure data factory pipelines

I'm trying to run an ADF copy pipeline with update and insert statements that are supposed to replace a MERGE statement, basically a statement like:
UPDATE TARGET
SET TARGET.ProductName = SOURCE.ProductName,
    TARGET.Rate = SOURCE.Rate
FROM Products AS TARGET
INNER JOIN UpdatedProducts AS SOURCE
    ON TARGET.ProductID = SOURCE.ProductID
WHERE TARGET.ProductName <> SOURCE.ProductName
   OR TARGET.Rate <> SOURCE.Rate;

INSERT Products (ProductID, ProductName, Rate)
SELECT SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate
FROM UpdatedProducts AS SOURCE
WHERE NOT EXISTS
(
    SELECT 1
    FROM Products
    WHERE ProductID = SOURCE.ProductID
);
If the target were an Azure SQL DB, I would use this approach: https://www.taygan.co/blog/2018/04/20/upsert-to-azure-sql-db-with-azure-data-factory
But if the target is an Azure SQL Data Warehouse, the stored procedure option doesn't exist! Any suggestions? Do I have to load a staging table first and then run the update and insert statements from stg_table to target_table? Or is there a way to do it directly from ADF?
If you can't use a stored procedure, my suggestion would be to create a second Copy Data transform. Run the pre-script on the second transform and drop the table there, since it's a temp table that you created in the first.
BEGIN
    MERGE Target AS target_sqldb
    USING TempTable AS source_tblstg
    ON (target_sqldb.Id = source_tblstg.Id)
    WHEN MATCHED THEN
        UPDATE SET
            [Name] = source_tblstg.Name,
            [State] = source_tblstg.State
    WHEN NOT MATCHED THEN
        INSERT ([Name], [State])
        VALUES (source_tblstg.Name, source_tblstg.State);
    DROP TABLE TempTable;
END
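The update-then-insert pair in the question has the same net effect as the MERGE in the answer. A pure-Python sketch of that upsert semantics (column names mirror the question; the data is illustrative) makes the two-step equivalence concrete: matched rows take the source values, and unmatched source rows are appended.

```python
def upsert(target, source, key="ProductID"):
    """Update-then-insert semantics: rows present in both lists get the
    source values; source-only rows are appended. Illustration only."""
    by_key = {row[key]: row for row in target}
    for row in source:
        if row[key] in by_key:
            by_key[row[key]].update(row)   # UPDATE ... INNER JOIN branch
        else:
            target.append(dict(row))       # INSERT ... WHERE NOT EXISTS branch
    return target

products = [{"ProductID": 1, "ProductName": "Tea", "Rate": 10}]
updated  = [{"ProductID": 1, "ProductName": "Tea", "Rate": 12},
            {"ProductID": 2, "ProductName": "Coffee", "Rate": 20}]
result = upsert(products, updated)
```

Running the UPDATE before the INSERT matters for neither statement here, but in SQL the NOT EXISTS filter is what keeps the two statements from double-inserting matched rows.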

Return the table data using stored procedure in Acumatica

I am unable to get the complete table data from a stored procedure. For easier understanding, I have provided a simple stored procedure; internally, we have different logic and a different stored procedure that returns a table.
ALTER PROCEDURE [dbo].[SP_GetResultWeeklyUnitSold]
    @fromDate VARCHAR(10),
    @toDate VARCHAR(10)
AS
BEGIN
    SELECT *
    FROM SOOrder
    WHERE OrderDate BETWEEN @fromDate AND @toDate
END
var pars = new List<PXSPParameter>();
PXSPParameter fromDate = new PXSPInParameter("@fromDate",
    PXDbType.VarChar, Filters.Current.StartDate);
PXSPParameter toDate = new PXSPInParameter("@toDate", PXDbType.VarChar,
    Filters.Current.EndDate);
pars.Add(fromDate);
pars.Add(toDate);
var results = PXDatabase.Execute("SP_WeeklyUnitSold", pars.ToArray());
How can I get the result table from the stored procedure?
I didn't find any way to get a whole table directly from a stored procedure.
The only working way I found was serializing the output into XML at the stored procedure level and then deserializing it in Acumatica.
But for the case you've described, I can't see why not to use an ordinary PXSelect. It would be much simpler and, I'm sure, much more efficient.