Azure Data Factory copy activity failed mapping strings (from CSV) to Azure SQL table sink uniqueidentifier field

I have an Azure Data Factory (ADF) pipeline that consists of a Copy activity. The Copy activity uses an HTTP connector as the source to invoke a REST endpoint, which returns a CSV stream that is sunk into an Azure SQL Database table.
The copy fails when the CSV contains strings (such as 40f52caf-e616-4321-8ea3-12ea3cbc54e9) that are mapped to a uniqueidentifier field in the target table, with the error message: The given value of type String from the data source cannot be converted to type uniqueidentifier of the specified target column.
I have tried wrapping the source string in braces, e.g. {40f52caf-e616-4321-8ea3-12ea3cbc54e9}, with no success.
The Copy activity works if I change the target table field from uniqueidentifier to nvarchar(100).

I reproduced your issue on my side.
The reason is that the data types of the source and sink do not match. You can check the data type mapping for SQL Server.
Your source data type is string, which is mapped to nvarchar or varchar, while a uniqueidentifier column in SQL Database requires the GUID type in Azure Data Factory.
So, as a workaround, please configure a SQL Server stored procedure in your SQL Server sink.
Please follow the steps from this doc:
Step 1: Configure your Sink dataset:
Step 2: Configure Sink section in copy activity as follows:
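Since the original screenshots are not reproduced here, the sink section of the Copy activity JSON would look roughly like the sketch below. This is a sketch only: the exact property names (such as storedProcedureTableTypeParameterName) depend on your ADF schema version, and the values correspond to the table type and stored procedure created in steps 3 and 4 below.
"sink": {
    "type": "AzureSqlSink",
    "sqlWriterStoredProcedureName": "convertCsv",
    "sqlWriterTableType": "CsvType",
    "storedProcedureTableTypeParameterName": "ctest"
}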
Step 3: In your database, define the table type with the same name as sqlWriterTableType. Notice that the schema of the table type should be the same as the schema returned by your input data.
CREATE TYPE [dbo].[CsvType] AS TABLE(
[ID] [varchar](256) NOT NULL
)
Step 4: In your database, define the stored procedure with the same name as SqlWriterStoredProcedureName. It handles input data from your specified source and merges it into the output table. Notice that the parameter name of the stored procedure should be the same as the "tableName" defined in the dataset.
CREATE PROCEDURE convertCsv @ctest [dbo].[CsvType] READONLY
AS
BEGIN
MERGE [dbo].[adf] AS target
USING @ctest AS source
ON (1=1)
WHEN NOT MATCHED THEN
INSERT (id)
VALUES (convert(uniqueidentifier, source.ID));
END
Output:
Hope it helps you. If you have any concerns, please feel free to let me know.

There is a way to fix GUID conversion into the uniqueidentifier SQL column type properly via JSON configuration.
Edit the Copy activity via the Code {} button in the top-right toolbar.
Put:
"translator": {
"type": "TabularTranslator",
"typeConversion": true
}
into the typeProperties block of the Copy activity. This also works if the mapping schema is unspecified/dynamic.
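For context, here is a trimmed sketch of where the translator block sits inside the Copy activity's typeProperties. The source and sink types are placeholders for whichever connectors you actually use, and the typeConversionSettings block is optional.
"typeProperties": {
    "source": {
        "type": "DelimitedTextSource"
    },
    "sink": {
        "type": "AzureSqlSink"
    },
    "translator": {
        "type": "TabularTranslator",
        "typeConversion": true,
        "typeConversionSettings": {
            "allowDataTruncation": true,
            "treatBooleanAsNumber": false
        }
    }
}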

Related

Azure Data Factory - copy Lookup activity output to json in blob

In ADF, below is the output of my Lookup activity (this is the header of a flat file from SFTP):
{
"firstRow": {
"Prop_0": "000",
"Prop_1": "IN",
"Prop_2": "12123",
"Prop_3": "XYZ_ABC",
"Prop_4": "20211011",
"Prop_5": "034255",
"Prop_6": "272023"
}
}
Can someone help me with an approach to transform this into a JSON file with custom field names instead of Prop_x and save it to Blob storage?
You can simply leverage additional columns in a Copy activity.
Follow your Lookup activity with a Copy activity:
In the source settings of the Copy activity, add the new column names (i.e. the ones you expect in the JSON). Here I used p0, p1, ...
Taking p0 as an example, you can simply put @activity('Lookup1').output.firstRow.Prop_0 in the dynamic content.
Then in the Mapping tab, you just map to the target columns of your JSON file. (Assume you have already imported the schema of your target JSON.)
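In the underlying JSON, the source settings would then carry additionalColumns entries along these lines (a sketch; the source type and the p0/p1 column names are assumptions to adjust to your dataset):
"source": {
    "type": "DelimitedTextSource",
    "additionalColumns": [
        {
            "name": "p0",
            "value": {
                "value": "@activity('Lookup1').output.firstRow.Prop_0",
                "type": "Expression"
            }
        },
        {
            "name": "p1",
            "value": {
                "value": "@activity('Lookup1').output.firstRow.Prop_1",
                "type": "Expression"
            }
        }
    ]
}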

Azure Data Factory Error: "incorrect syntax near"

I'm trying to do a simple incremental update from an on-prem database as the source to an Azure SQL database, based on a varchar column called "RP" in the on-prem database that contains "date + static description", for example: "20210314MetroFactory".
1- I've created a Lookup activity called Lookup1 that uses a table created in Azure SQL Database and this query:
"Select RP from SubsetwatermarkTable"
2- I've created a Copy Data activity where the source settings have this query:
"Select * from SourceDevSubsetTable WHERE RP NOT IN '@{activity('Lookup1').output.value}'"
When debugging, I'm getting the error:
Failure type: User configuration issue
Details: Failure happened on 'Source' side.
'Type=System.Data.SqlClient.SqlException,Message=Incorrect syntax near
'[{"RP":"20210307_1Plant
1KAO"},{"RP":"20210314MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"2'.,Source=.Net
SqlClient Data
Provider,SqlErrorNumber=102,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=102,State=1,Message=Incorrect
syntax near
'[{"RP":"20210311MetroFactory"},{"RP":"20210311MetroFactory"},{"RP":"202103140MetroFactory"},{"RP":"20210308MetroFactory"},{"RP":"2'.,},],'
Can anyone tell me what I am doing wrong and how to fix it, even if it requires creating more activities?
Note: There is no LastModifiedDate column in the table. Also, I haven't yet created the stored procedure that will update the lookup table when the incremental copy is done.
Steve is right about why it is failing and the query you need in the Copy Data activity.
As he says, you want a comma-separated list of quoted values to use in your IN clause.
You can get this more easily, though, directly from your Lookup using this query:
select stuff(
(
select ','''+rp+''''
from subsetwatermarktable
for xml path('')
)
, 1, 1, ''
) as in_clause
The sub-query gets the comma-separated list with quotes around each rp value, but has a spurious comma at the start; the outer query with stuff removes this.
Now tick the First Row Only box on the Lookup and change your Copy Data source query to:
select *
from SourceDevSubsetTable
where rp not in (@{activity('Lookup1').output.firstRow.in_clause})
The result of @activity('Lookup1').output.value is an array, as your error shows:
[{"RP":"20210307_1Plant
1KAO"},{"RP":"20210314MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"2'.,Source=.Net
SqlClient Data
Provider,SqlErrorNumber=102,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=102,State=1,Message=Incorrect
syntax near
'[{"RP":"20210311MetroFactory"},{"RP":"20210311MetroFactory"},{"RP":"202103140MetroFactory"},{"RP":"20210308MetroFactory"},{"RP":"2'.,},]
However, your SQL should be like this: Select * from SourceDevSubsetTable WHERE RP NOT IN ('20210307_1Plant 1KAO','20210314MetroFactory',...).
To achieve this in ADF, you need to do something like this:
1. Create three variables like the following screenshot:
2. Loop over the result of @activity('Lookup1').output.value and append each item().RP to arrayvalues:
expression: @activity('Lookup1').output.value
expression: @concat(variables('apostrophe'), item().RP, variables('apostrophe'))
3. Cast arrayvalues to a string and add parentheses with a Set Variable activity:
expression: @concat('(', join(variables('arrayvalues'), ','), ')')
4. Copy to your Azure SQL database:
expression: Select * from SourceDevSubsetTable WHERE RP NOT IN @{variables('stringvalues')}
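For reference, a rough JSON sketch of how steps 2 and 3 could look in the pipeline definition. The activity names are illustrative, activity dependencies are omitted, and it assumes the apostrophe variable holds a single quote character.
{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('Lookup1').output.value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "Append RP",
                "type": "AppendVariable",
                "typeProperties": {
                    "variableName": "arrayvalues",
                    "value": {
                        "value": "@concat(variables('apostrophe'), item().RP, variables('apostrophe'))",
                        "type": "Expression"
                    }
                }
            }
        ]
    }
},
{
    "name": "Set stringvalues",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "stringvalues",
        "value": {
            "value": "@concat('(', join(variables('arrayvalues'), ','), ')')",
            "type": "Expression"
        }
    }
}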

Load FileName in SQL Table using Copy Data Activity

I am a newbie to Azure Data Factory. I'm trying to load multiple files for various states from an FTP location into a single Azure SQL Server table. My requirement is to get the state name from the file name and dump it into the table along with the actual data.
Currently, my source is FTP and the sink is an Azure SQL Server table. I have used a stored procedure to load the data. However, I'm unable to send the file name as a parameter to the stored procedure, as shown below, so that I can dump it into the table. Below is the Copy Data component:
I have defined a SourceFileName parameter in the stored procedure; however, I am unable to send it via the Copy Data activity.
Any help is appreciated.
We can conclude that the additional column option cannot be used here, because ADF will return a column (containing the file path), not a string. So we need to use a Get Metadata activity to get the file list, then loop over the file list and copy the files inside a ForEach activity.
I've created a simple test and it works well.
On my local FTP server, there are two text files. I need to copy them into an Azure SQL table.
In the Get Metadata activity, I use Child Items to get the file list.
In the ForEach activity, I use @activity('Get Metadata1').output.childItems to iterate over the file list.
Inside the ForEach activity, I use the dynamic content @item().name to get the file path.
source setting:
sink setting:
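Since the screenshots are not included, the sink settings inside the ForEach might look roughly like this sketch (property names assume an Azure SQL sink and may differ slightly by ADF version; it reuses the table type and stored procedure created below, and passes the current file name into the filePath parameter):
"sink": {
    "type": "AzureSqlSink",
    "sqlWriterStoredProcedureName": "spUpsertEmployees",
    "sqlWriterTableType": "ct_employees_type",
    "storedProcedureTableTypeParameterName": "employees",
    "storedProcedureParameters": {
        "filePath": {
            "type": "String",
            "value": {
                "value": "@item().name",
                "type": "Expression"
            }
        }
    }
}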
So we can get the file name. Below are some operations I did on Azure SQL.
-- create a table
CREATE TABLE [dbo].[employee](
[firstName] [varchar](50) NULL,
[lastName] [varchar](50) NULL,
[filePath] [varchar](50) NULL
) ON [PRIMARY]
GO
-- create a table type
CREATE TYPE [dbo].[ct_employees_type] AS TABLE(
[firstName] [varchar](50) NULL,
[lastName] [varchar](50) NULL
)
GO
-- create a Stored procedure
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[spUpsertEmployees]
@employees ct_employees_type READONLY,
@filePath varchar(50)
AS
BEGIN
set @filePath = SUBSTRING(@filePath, 1, len(@filePath) - 4)
MERGE [dbo].[employee] AS target_sqldb
USING @employees AS source_tblstg
ON (target_sqldb.firstName = source_tblstg.firstName)
WHEN MATCHED THEN
UPDATE SET
firstName = source_tblstg.firstName,
lastName = source_tblstg.lastName
WHEN NOT MATCHED THEN
INSERT (
firstName,
lastName,
filePath
)
VALUES (
source_tblstg.firstName,
source_tblstg.lastName,
@filePath
);
END
GO
After I run debug, the result is as follows:

How to drop duplicates in the source dataset (JSON) and load data into Azure SQL DB in Azure Data Factory

I have a table in SQL DB with primary key fields. Now I am using a Copy activity in Azure Data Factory with a source dataset (JSON).
We are writing this data into a sink dataset (SQL DB), but the pipeline is failing with the below error:
"message": "'Type=System.Data.SqlClient.SqlException,Message=Violation of
PRIMARY KEY constraint 'PK__field__399771B9251AD6D4'. Cannot
insert duplicate key in object 'dbo.crop_original_new'. The
duplicate key value is (9161, en).\r\nThe statement has been
terminated.,Source=.Net SqlClient Data Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=
[{Class=14,Number=2627,State=1,Message=Violation of PRIMARY KEY
constraint 'PK__field__399771B9251AD6D4'. Cannot insert
duplicate key in object 'Table'. The duplicate key value is
(9161, en).,},{Class=0,Number=3621,State=0,Message=The statement has
been terminated.,},],'",
You can use the fault tolerance setting provided in the Copy activity to skip incompatible rows.
Setting image
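In the Copy activity JSON, the fault tolerance setting corresponds roughly to the following typeProperties (a sketch; the redirect settings are optional, and the linked service and path names here are placeholders):
"typeProperties": {
    "enableSkipIncompatibleRow": true,
    "redirectIncompatibleRowSettings": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "path": "copyactivity-skipped-rows"
    }
}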
Well, the cleanest solution would be:
Create a staging table stg_table in your SQL environment (this table should have a different key policy)
Load data from the JSON source into stg_table
Write a stored procedure to clean the data of duplicates and load it into your destination table
Or, if you are familiar with Mapping Data Flows in ADF, you can check this article by Mark Kromer.

How to get the source data when an ingestion failure happens in Kusto (ADX)

I have a base table in ADX Kusto DB.
.create table base (info:dynamic)
I have written a function which parses the base table (a dynamic column), extracts a few columns, and stores them in another table whenever the base table receives data (from Event Hub). Below are the function and its update policy:
.create function extractBase()
{
base
| evaluate bag_unpack(info)
| project tostring(column1), toreal(column2), toint(column3), todynamic(column4)
}
.alter table target_table policy update
#'[{"IsEnabled": true, "Source": "base", "Query": "extractBase()", "IsTransactional": false, "PropagateIngestionProperties": true}]'
Suppose the base table does not contain the expected column and an ingestion error happens. How do I get the source (row) for the failure?
When using .show ingestion failures, it displays the failure message. There is a column called IngestionSourcePath, but when I browse that URL, I get a Resource Not Found exception.
If an ingestion failure happens, I need to store the particular row of the base table into an IngestionFailure table for further investigation.
In this case, your source data cannot "not have" a column defined by its schema.
If no value was ingested for some column in some row, a null value will be present there and the update policy will not fail.
Here the update policy will break if the original table row does not contain enough columns. Currently the source data for such errors is not emitted as part of the failure message.
In general, the source URI is only useful when you are ingesting data from blobs. In other cases, the URI shown in the failed ingestion info is a URI on an internal blob that was created on the fly and that no one has access to.
However, there is a command that is missing from the documentation (we will make sure to update it) that allows you to duplicate (dump to a storage container you provide) the source data for the next failed ingestion into a specific table.
The syntax is:
.dup-next-failed-ingest into TableName to h@'Path to Azure blob container'
Here the path to Azure Blob container must include a writeable SAS.
The required permission to run this command is DB admin.
