I'm trying to do an incremental load using ADF, but I get the error message below. How do I solve it, and what is the right way to do the incremental load?
Note: the table name variable is defined through the stored procedure.
[Error message screenshot from ADF]
Stored Procedure:
ALTER PROCEDURE [INT].[usp_write_watermark]
    @LastModifiedtime datetime, @TableName varchar(50)
AS
BEGIN
    UPDATE [log].watermarktable
    SET [WatermarkValue] = @LastModifiedtime
    WHERE [TableName] = @TableName
END
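For context, this procedure is the last step of the usual watermark pattern: look up the old watermark, look up the new high-water mark on the source, copy only the rows in between, then call the procedure to advance the mark. A minimal sketch of those queries, assuming an illustrative source table dbo.MySourceTable with a LastModifiedtime column (both names are assumptions, not from the question):
-- Illustrative sketch of the watermark pattern
DECLARE @OldWatermark datetime, @NewWatermark datetime;

-- 1. Old watermark, as stored by [INT].[usp_write_watermark]
SELECT @OldWatermark = WatermarkValue
FROM [log].watermarktable
WHERE TableName = 'MySourceTable';

-- 2. New high-water mark on the source
SELECT @NewWatermark = MAX(LastModifiedtime)
FROM dbo.MySourceTable;

-- 3. Delta query used as the Copy activity source
SELECT *
FROM dbo.MySourceTable
WHERE LastModifiedtime > @OldWatermark
  AND LastModifiedtime <= @NewWatermark;

-- 4. Advance the watermark only after the copy succeeds
EXEC [INT].[usp_write_watermark]
     @LastModifiedtime = @NewWatermark,
     @TableName = 'MySourceTable';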
Related
I am new to Azure Synapse and had a question about how the files are set up on Azure while creating an external table from a select. Would the files be overwritten, or would one need to truncate the files every time a create external table script is run? For example, if I run the following script
CREATE EXTERNAL TABLE [dbo].[PopulationCETAS] WITH (
    LOCATION = 'populationParquet/',
    DATA_SOURCE = [MyDataSource],
    FILE_FORMAT = [ParquetFF]
) AS
SELECT *
FROM OPENROWSET(
    BULK 'csv/population-unix/population.csv',
    DATA_SOURCE = 'sqlondemanddemo',
    FORMAT = 'CSV', PARSER_VERSION = '2.0'
) WITH (
    CountryCode varchar(4),
    CountryName varchar(64),
    Year int,
    PopulationCount int
) AS r;
would the files created under
LOCATION = 'populationParquet/',
DATA_SOURCE = [MyDataSource],
FILE_FORMAT = [ParquetFF]
be overwritten every time the script is run? Can this be specified at the time of setup or within the query options?
I would love to be able to drop the files in storage with a DELETE or TRUNCATE operation but this feature doesn’t currently exist within T-SQL. Please vote for this feature.
In the meantime you will need to use outside automation like an Azure Data Factory pipeline.
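For reference, a sketch of the common workaround: because CETAS fails when the target location already holds data, either point each run at a fresh folder or have an ADF Delete activity empty the folder first, then drop and re-create the external table. Assuming the objects from the question (the run-specific folder name is illustrative):
-- Dropping the external table removes only the metadata; the Parquet files stay,
-- which is why the folder must be emptied externally or a new folder used per run.
IF EXISTS (SELECT 1 FROM sys.external_tables WHERE [name] = 'PopulationCETAS')
    DROP EXTERNAL TABLE [dbo].[PopulationCETAS];

CREATE EXTERNAL TABLE [dbo].[PopulationCETAS] WITH (
    LOCATION = 'populationParquet/run001/',  -- fresh folder per run (illustrative)
    DATA_SOURCE = [MyDataSource],
    FILE_FORMAT = [ParquetFF]
) AS
SELECT *
FROM OPENROWSET(
    BULK 'csv/population-unix/population.csv',
    DATA_SOURCE = 'sqlondemanddemo',
    FORMAT = 'CSV', PARSER_VERSION = '2.0'
) WITH (
    CountryCode varchar(4),
    CountryName varchar(64),
    Year int,
    PopulationCount int
) AS r;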
Based on the documentation below,
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database
there is a feature to run a post SQL script. Would it be possible to run a stored procedure from there?
I have tried; it does not seem to be working, and I am currently investigating.
Thanks in advance for your information.
I created a test to prove that the stored procedure can be called in the Post SQL scripts.
I created two tables:
CREATE TABLE [dbo].[emp](
id int IDENTITY(1,1),
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
CREATE TABLE [dbo].[emp_stage](
id int,
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
I created a stored procedure.
CREATE PROCEDURE [dbo].[spMergeEmpData]
AS
BEGIN
    -- Allow explicit ids to be written to the identity column
    SET IDENTITY_INSERT dbo.emp ON;

    MERGE [dbo].[emp] AS target
    USING [dbo].[emp_stage] AS source
    ON (target.[id] = source.[id])
    WHEN MATCHED THEN
        UPDATE SET name = source.name,
                   age = source.age
    WHEN NOT MATCHED THEN
        INSERT (id, name, age)
        VALUES (source.id, source.name, source.age);

    SET IDENTITY_INSERT dbo.emp OFF;

    -- Clear the staging table for the next load
    TRUNCATE TABLE [dbo].[emp_stage];
END
I will copy the csv file into my Azure SQL staging table [dbo].[emp_stage], then use the stored procedure [dbo].[spMergeEmpData] to transfer data from [dbo].[emp_stage] to [dbo].[emp].
Enter the stored procedure call exec [dbo].[spMergeEmpData] in the Post SQL scripts field.
I debugged it successfully.
I can see the data is all in the table [dbo].[emp].
I am working on copying data from a source Oracle database to a target SQL data warehouse using Data Factory.
When using the copy function in Data Factory, we are asked to specify the destination location and a table to copy the data to. There are multiple tables that need to be copied, so creating a table for each in the destination is time-consuming.
How can I setup data factory to copy data from the source to a destination, where it will automatically create a table at the destination, without having to explicitly create them manually?
TIA
I came across the same issue last year. I used pipeline.parameters() for dynamic naming, plus a Data Factory stored procedure activity before the copy activity to first create the empty table from a template before copying: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure.
CREATE PROCEDURE create_sql_table_proc
    @WindowStartYear NVARCHAR(30), @WindowStartMonth NVARCHAR(30), @WindowStartDay NVARCHAR(30)
AS
BEGIN
    DECLARE @strsqlcreatetable AS [NVARCHAR](255)
    DECLARE @strsqldroptable AS [NVARCHAR](255)
    DECLARE @tablename AS [NVARCHAR](255)
    DECLARE @strsqlsetpk AS [NVARCHAR](255)

    SELECT @tablename = 'TABLE_NAME_' + @WindowStartYear + @WindowStartMonth + @WindowStartDay
    SELECT @strsqldroptable = 'DROP TABLE IF EXISTS ' + @tablename
    SELECT @strsqlcreatetable = 'SELECT * INTO ' + @tablename + ' FROM OUTPUT_TEMPLATE'
    SELECT @strsqlsetpk = 'ALTER TABLE ' + @tablename + ' ADD PRIMARY KEY (CustID)'

    EXEC (@strsqldroptable)
    EXEC (@strsqlcreatetable)
    EXEC (@strsqlsetpk)
END
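A hypothetical call, with the date parts the pipeline's stored procedure activity would pass from its window parameters (values illustrative):
EXEC create_sql_table_proc
     @WindowStartYear = '2021',
     @WindowStartMonth = '06',
     @WindowStartDay = '15';
-- creates TABLE_NAME_20210615 as a copy of OUTPUT_TEMPLATE, rows included;
-- append WHERE 1 = 0 to the SELECT * INTO if only the schema should be copied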
Since then we have started pushing the table to SQL from our Pyspark scripts running on a cluster, where it is not necessary to first create the empty table: https://medium.com/@radek.strnad/tips-for-using-jdbc-in-apache-spark-sql-396ea7b2e3d3.
I'm trying to run an ADF copy pipeline with update and insert statements that are supposed to replace a merge statement; basically a statement like:
UPDATE TARGET
SET ProductName = SOURCE.ProductName,
    Rate = SOURCE.Rate
FROM Products AS TARGET
INNER JOIN UpdatedProducts AS SOURCE
    ON TARGET.ProductID = SOURCE.ProductID
WHERE TARGET.ProductName <> SOURCE.ProductName
   OR TARGET.Rate <> SOURCE.Rate;

INSERT INTO Products (ProductID, ProductName, Rate)
SELECT SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate
FROM UpdatedProducts AS SOURCE
WHERE NOT EXISTS
(
    SELECT 1
    FROM Products
    WHERE ProductID = SOURCE.ProductID
);
If the target were an Azure SQL DB, I would use this approach: https://www.taygan.co/blog/2018/04/20/upsert-to-azure-sql-db-with-azure-data-factory
But if the target is an ADW, the stored procedure option doesn't exist! Any suggestions? Do I have to have a staging table first and then run the update and insert statements from stg_table to target_table? Or is there a way to do it directly from ADF?
If you can't use a stored procedure, my suggestion would be to create a second copy data activity. Run the pre-copy script on the second activity and drop the table there, since it's a temp table that you created in the first.
BEGIN
    MERGE Target AS target_sqldb
    USING TempTable AS source_tblstg
    ON (target_sqldb.Id = source_tblstg.Id)
    WHEN MATCHED THEN
        UPDATE SET
            [Name] = source_tblstg.Name,
            [State] = source_tblstg.State
    WHEN NOT MATCHED THEN
        INSERT ([Name], [State])
        VALUES (source_tblstg.Name, source_tblstg.State);

    DROP TABLE TempTable;
END
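If the target pool doesn't support MERGE (the ADW case in the question), the same two-activity pattern still works: the pre-copy script of the second activity can run the question's UPDATE/INSERT pair against the staging table instead. A sketch, assuming the staging table is the TempTable above and the target is the question's Products table:
BEGIN
    -- Update rows that already exist in the target
    UPDATE TARGET
    SET ProductName = SOURCE.ProductName,
        Rate = SOURCE.Rate
    FROM Products AS TARGET
    INNER JOIN TempTable AS SOURCE
        ON TARGET.ProductID = SOURCE.ProductID
    WHERE TARGET.ProductName <> SOURCE.ProductName
       OR TARGET.Rate <> SOURCE.Rate;

    -- Insert rows that are not in the target yet
    INSERT INTO Products (ProductID, ProductName, Rate)
    SELECT SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate
    FROM TempTable AS SOURCE
    WHERE NOT EXISTS (SELECT 1 FROM Products WHERE ProductID = SOURCE.ProductID);

    -- The staging table was created by the first copy activity, so drop it here
    DROP TABLE TempTable;
END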
I am unable to get the complete table data from a stored procedure. For easy understanding, I have provided a simple stored procedure; internally we have different logic and a different stored procedure that returns a table.
ALTER PROCEDURE [dbo].[SP_GetResultWeeklyUnitSold]
    @fromDate VARCHAR(10),
    @toDate VARCHAR(10)
AS
BEGIN
    SELECT *
    FROM SOOrder
    WHERE OrderDate BETWEEN @fromDate AND @toDate
END
var pars = new List<PXSPParameter>();
PXSPParameter fromDate = new PXSPInParameter("@fromDate",
    PXDbType.VarChar, Filters.Current.StartDate);
PXSPParameter toDate = new PXSPInParameter("@toDate", PXDbType.VarChar,
    Filters.Current.EndDate);
pars.Add(fromDate);
pars.Add(toDate);
// The name passed here must match the procedure defined above
var results = PXDatabase.Execute("SP_GetResultWeeklyUnitSold", pars.ToArray());
This is what I use to get the result table from the stored procedure.
I didn't find any direct way of getting the whole table from a stored procedure.
The only working way I found was serializing the output into XML at the stored procedure level, and then deserializing it in Acumatica.
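For illustration, a sketch of that XML approach applied to the procedure from the question (the element names and the ResultXml alias are arbitrary):
ALTER PROCEDURE [dbo].[SP_GetResultWeeklyUnitSold]
    @fromDate VARCHAR(10),
    @toDate VARCHAR(10)
AS
BEGIN
    -- Return the whole result set as one XML value the caller can deserialize
    SELECT (
        SELECT *
        FROM SOOrder
        WHERE OrderDate BETWEEN @fromDate AND @toDate
        FOR XML PATH('Order'), ROOT('Orders')
    ) AS ResultXml
END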
But for the case that you've described, I can't understand why not use an ordinary PXSelect? It would be much simpler and, I'm sure, much more efficient.