Azure Data Factory Copy Data using XML Source - azure

Let's assume I have a simple XML source file that I've mapped to a corresponding sink in my SQL Server database.
<Date Date="2020-03-13Z">
<Identification>
<Identifier>Maverick</Identifier>
</Identification>
<Pilot HomeAirport="New York">
<AirportICAOCode>USA</AirportICAOCode>
</Pilot>
</Date>
And this is the table schema:
CREATE TABLE pilots (
identifier VARCHAR(20),
ICAO_code VARCHAR(3)
)
I created a stored procedure in my SQL Server database that takes the user-defined table type pilots_type, which corresponds to the above schema, as input so that my data is merged correctly.
But the pipeline fails when run with the error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorInvalidPluginType,'Type=Microsoft.DataTransfer.Common.Shared.PluginNotRegisteredException,Message=Invalid type 'XmlFormat' is provided in 'format'. Please correct the type in payload and retry.,Source=Microsoft.DataTransfer.ClientLibrary,'",
"failureType": "UserError",
"target": "Sink XML",
"details": []
}
Here the source is a blob that contains the XML.
Is XML not supported as a source after all?

XML is supported as a source.
I ran the same test with your sample XML file and SQL table, and it succeeded.
I created a Table Type named ct_pilot_type:
CREATE TYPE ct_pilot_type AS TABLE(
identifier nvarchar(MAX),
ICAO_code nvarchar(MAX)
)
I created the stored procedure named spUpsertPolit:
CREATE PROCEDURE spUpsertPolit
@polit ct_pilot_type READONLY
AS
BEGIN
MERGE [dbo].[pilot_airports] AS target_sqldb
USING @polit AS source_tblstg
ON (target_sqldb.identifier = source_tblstg.identifier)
WHEN MATCHED THEN
UPDATE SET
identifier = source_tblstg.identifier,
ICAO_code = source_tblstg.ICAO_code
WHEN NOT MATCHED THEN
INSERT (
identifier,
ICAO_code
)
VALUES (
source_tblstg.identifier,
source_tblstg.ICAO_code
);
END
I set the sink in the Copy activity:
I set the mapping:
It copied successfully:
The result shows:
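The mapping screenshots are not reproduced here. As a rough illustration of what the mapping has to resolve, the Python sketch below pulls the two values out of the sample XML from the question and shapes them into one row for the (identifier, ICAO_code) columns. It only mimics the mapping outside Data Factory; the function name extract_pilot_row is made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE_XML = """<Date Date="2020-03-13Z">
  <Identification>
    <Identifier>Maverick</Identifier>
  </Identification>
  <Pilot HomeAirport="New York">
    <AirportICAOCode>USA</AirportICAOCode>
  </Pilot>
</Date>"""


def extract_pilot_row(xml_text):
    """Return (identifier, ICAO_code) from one <Date> document."""
    root = ET.fromstring(xml_text)
    # Paths are relative to the <Date> root element.
    identifier = root.findtext("Identification/Identifier")
    icao_code = root.findtext("Pilot/AirportICAOCode")
    return identifier, icao_code


row = extract_pilot_row(SAMPLE_XML)
print(row)
```

In the Copy activity the same two source paths would be mapped to the identifier and ICAO_code columns of the table type.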

Related

Execute stored procedure in Azure Data Platform - Post SQL Scripts

Based on the documentation below,
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database
There is a feature to run post SQL script. Would it be possible to run stored procedure from there?
I have tried it, but it does not seem to work, and I am still investigating.
Thanks for your information in advance.
I created a test to prove that the stored procedure can be called in the Post SQL scripts.
I created two tables:
CREATE TABLE [dbo].[emp](
id int IDENTITY(1,1),
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
CREATE TABLE [dbo].[emp_stage](
id int,
[name] [nvarchar](max) NULL,
[age] [nvarchar](max) NULL
)
I created a stored procedure:
create PROCEDURE [dbo].[spMergeEmpData]
AS
BEGIN
SET IDENTITY_INSERT dbo.emp ON
MERGE [dbo].[emp] AS target
USING [dbo].[emp_stage] AS source
ON (target.[id] = source.[id])
WHEN MATCHED THEN
UPDATE SET name = source.name,
age = source.age
WHEN NOT matched THEN
INSERT (id, name, age)
VALUES (source.id, source.name, source.age);
TRUNCATE TABLE [dbo].[emp_stage]
END
I copy the CSV file into my Azure SQL staging table [dbo].[emp_stage], then use the stored procedure [dbo].[spMergeEmpData] to transfer the data from [dbo].[emp_stage] to [dbo].[emp].
Enter the stored procedure name exec [dbo].[spMergeEmpData] in the Post SQL scripts field.
The debug run succeeded.
I can see the data are all in TABLE [dbo].[emp].
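To make the stage-then-merge flow easier to follow, here is a small Python sketch that replays it against an in-memory SQLite database. SQLite has no MERGE statement, so an UPDATE followed by an INSERT stands in for it, and DELETE stands in for TRUNCATE; the sample rows and the function name merge_stage_into_target are made up for this illustration.

```python
import sqlite3


def merge_stage_into_target(conn):
    """Update matching ids, insert new ids, then empty the staging table."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE emp SET "
            "name = (SELECT s.name FROM emp_stage AS s WHERE s.id = emp.id), "
            "age = (SELECT s.age FROM emp_stage AS s WHERE s.id = emp.id) "
            "WHERE id IN (SELECT id FROM emp_stage)"
        )
        conn.execute(
            "INSERT INTO emp (id, name, age) "
            "SELECT id, name, age FROM emp_stage "
            "WHERE id NOT IN (SELECT id FROM emp)"
        )
        conn.execute("DELETE FROM emp_stage")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, age TEXT)")
conn.execute("CREATE TABLE emp_stage (id INTEGER, name TEXT, age TEXT)")

# Hypothetical data: one existing row, one changed row and one new row staged.
conn.execute("INSERT INTO emp VALUES (1, 'Ann', '30')")
conn.executemany(
    "INSERT INTO emp_stage VALUES (?, ?, ?)",
    [(1, "Ann", "31"), (2, "Bob", "40")],
)

merge_stage_into_target(conn)
emp_rows = conn.execute("SELECT id, name, age FROM emp ORDER BY id").fetchall()
stage_count = conn.execute("SELECT COUNT(*) FROM emp_stage").fetchone()[0]
print(emp_rows, stage_count)
```

The copy step only ever writes to the staging table, so the post SQL script is the single place where the target table changes.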

Kusto/Azure Data Explorer - How can I partition an external table using a timespan field?

Hoping someone can help..
I am new to Kusto and need to get an external table working that reads data from an Azure Blob storage account. The one table I have is unusual in that the timestamp is split into 2 separate columns, i.e. LogDate and LogTime (see script below).
My data is stored in the following structure in the Azure Storage account container (container is named "employeedata", for example):
{employeename}/{year}/{month}/{day}/{hour}/{minute}.csv, in a simple CSV format.
I know the CSV is good because if I import it into a normal Kusto table, it works perfectly.
My KQL script for the external table creation looks as follows:
.create-or-alter external table EmpLogs (Employee: string, LogDate: datetime, LogTime:timestamp)
kind=blob
partition by (EmployeeName:string = Employee, yyyy:datetime = startofday(LogDate), MM:datetime = startofday(LogDate), dd:datetime = startofday(LogDate), HH:datetime = todatetime(LogTime), mm:datetime = todatetime(LogTime))
pathformat = (EmployeeName "/" datetime_pattern("yyyy", yyyy) "/" datetime_pattern("MM", MM) "/" datetime_pattern("dd", dd) "/" substring(HH, 0, 2) "/" substring(mm, 3, 2) ".csv")
dataformat=csv
(
h@'************************'
)
with (folder="EmployeeInfo", includeHeaders="All")
I am getting the error below constantly, which is not very helpful (redacted from full error, basically comes down to the fact there is a syntax error somewhere):
Syntax error: Query could not be parsed: {
"error": {
"code": "BadRequest_SyntaxError",
"message": "Request is invalid and cannot be executed.",
"@type": "Kusto.Data.Exceptions.SyntaxException",
"@message": "Syntax error: Query could not be parsed: . Query: '.create-or-alter external table ........
I know the todatetime() function works on timespans; I tested it with another table and it created a date similar to the following: 0001-01-01 20:18:00.0000000.
I have tried using the bin() function on the timestamp/LogTime columns, but the same error as above, and even tried importing the time value as a string and doing some string manipulation on it, no luck. Getting the same syntax error.
Any help/guidance would be greatly appreciated.
Thank you!!
Currently, there's no way to define an external table partition based on more than one column. If your dataset's timestamp is split between two columns, LogDate:datetime and LogTime:timespan, then the best you can do is use a virtual column to partition by time:
.create-or-alter external table EmpLogs(Employee: string, LogDate:datetime, LogTime:timespan)
kind=blob
partition by (EmployeeName:string = Employee, PartitionDate:datetime)
pathformat = (EmployeeName "/" datetime_pattern("yyyy/MM/dd/HH/mm", PartitionDate))
dataformat=csv
(
//h@'************************'
)
with (folder="EmployeeInfo", includeHeaders="All")
Now, you can filter by the virtual column and fine tune using LogTime:
external_table("EmpLogs")
| where Employee in ("John Doe", ...)
| where PartitionDate between(datetime(2020-01-01 10:00:00) .. datetime(2020-01-01 11:00:00))
| where LogTime ...
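As a rough check of which blob path prefix a given PartitionDate value points at, the Python sketch below renders the same layout as the pathformat above. It assumes that datetime_pattern("yyyy/MM/dd/HH/mm", ...) produces zero-padded year, month, day, hour and minute; the function name partition_prefix is made up for this example.

```python
from datetime import datetime


def partition_prefix(employee_name, partition_date):
    """Render EmployeeName/yyyy/MM/dd/HH/mm for one partition value."""
    # %Y/%m/%d/%H/%M is the strftime equivalent of yyyy/MM/dd/HH/mm.
    return f"{employee_name}/{partition_date:%Y/%m/%d/%H/%M}"


prefix = partition_prefix("John Doe", datetime(2020, 1, 1, 10, 5))
print(prefix)
```

Comparing the printed prefix with the real blob names in the container is a quick way to spot a mismatch between the path format and the folder layout.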

To use the output of a lookup activity to query the db and write to a csv file in storage account using ADF

My requirement is to use ADF to read data (columnA) from an Excel or CSV file in the storage account, use that value (columnA) to query my database, and write the output of the query, which includes columnA, to a file in the storage account.
I was able to read the data from the storage account, but I get it back as a table. I need to use each value as an individual entry, like select * from table where id=columnA.
The next task, once I am able to read each value, is how to write the result to a file.
I used a Lookup activity to read the data from Excel. Below is the sample output. I need to use only the SKU number in my next query, and I am not able to proceed with this. Kindly suggest a solution.
I set a variable to the output of the lookup as suggested here https://www.mssqltips.com/sqlservertip/6185/azure-data-factory-lookup-activity-example/ and tried to use that variable in my query, but I get a "bad template" exception when I trigger it.
Please try this:
I created a sample like yours, and there is no need to use Set Variable.
Details:
Below is lookup output:
{
"count": 3,
"value": [
{
"SKU": "aaaa"
},
{
"SKU": "bbbb"
},
{
"SKU": "ccc"
}
]
}
Setting of copy data activity:
Query sql:
select * from data_source_table where Name = '@{activity('Lookup1').output.value[0].SKU}'
You can also use this SQL if you need to:
select * from data_source_table where Name in('@{activity('Lookup1').output.value[0].SKU}','@{activity('Lookup1').output.value[1].SKU}','@{activity('Lookup1').output.value[2].SKU}')
This is my test data in my SQL DataBase:
Here is the result:
1,"aaaa",0,2017-09-01 00:56:00.0000000
2,"bbbb",0,2017-09-02 05:23:00.0000000
Hope this can help you.
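The two queries above index the lookup rows by hand, which only works when the row count is known in advance. As an illustration of the string those expressions resolve to, here is a Python sketch that builds the same IN clause from the sample lookup output for any number of rows. It runs outside Data Factory, and the function name build_in_query is made up for this example.

```python
# Lookup output copied from the sample above.
lookup_output = {
    "count": 3,
    "value": [{"SKU": "aaaa"}, {"SKU": "bbbb"}, {"SKU": "ccc"}],
}


def build_in_query(output, table="data_source_table", column="Name"):
    """Build the IN query from every SKU in a lookup output."""
    if not output["value"]:
        raise ValueError("lookup returned no rows, so there is nothing to query")
    # Double any single quote so the value stays inside its SQL string literal.
    quoted = ["'" + row["SKU"].replace("'", "''") + "'" for row in output["value"]]
    return f"select * from {table} where {column} in ({','.join(quoted)})"


query = build_in_query(lookup_output)
print(query)
```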
Update:
You can try to use a Data Flow.
source1 is your CSV file, source2 is the SQL database.
This is the setting of the lookup:
Filter condition: !isNull(PersonID) (PersonID is one column in your SQL database.)
Then use a Select transformation to remove the SKU column.
Finally, output to a single file.

What does that error mean in Azure Data factory?

What I want to do is, I have 6 tables in Azure Warehouse and I have 2 tables in Azure database and I want to execute a pipeline which will copy data from one table SalesLT.ProductCategory which is at Warehouse to dbo.DimProductCategory which is at Database.
And I am getting the error like this.
{ "errorCode": "2200", "message": "ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed. Please search error to get more details.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=Violation of PRIMARY KEY constraint 'PK__DimProdu__3224ECEE2FD4E7AD'. Cannot insert duplicate key in object 'dbo.DimProductCategory'. The duplicate key value is (1).\r\nThe statement has been terminated.,Source=.Net SqlClient Data Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=[{Class=14,Number=2627,State=1,Message=Violation of PRIMARY KEY constraint 'PK__DimProdu__3224ECEE2FD4E7AD'. Cannot insert duplicate key in object 'dbo.DimProductCategory'. The duplicate key value is (1).,},{Class=0,Number=3621,State=0,Message=The statement has been terminated.,},],'", "failureType": "UserError", "target": "productcategory", "details": [] }
Here is the structure of the DimProductCategory table:
create table DimProductCategory
(
ProductCategoryID int not null primary key,
name nvarchar(max)
)
I have tried dropping the primary key from the DimProductCategory table but still got the same error.
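The error text itself says that a row with key value 1 already exists in dbo.DimProductCategory when the copy tries to insert it. One common way to end up there, assumed here rather than confirmed by the post, is running the same copy a second time against a table that still holds the rows from the first run. The Python sketch below reproduces that class of failure with an in-memory SQLite table standing in for the Azure SQL table; the category names are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimProductCategory ("
    "ProductCategoryID INTEGER NOT NULL PRIMARY KEY, name TEXT)"
)

# Hypothetical source rows; only the key values matter here.
source_rows = [(1, "Bikes"), (2, "Components")]


def copy_rows(connection, rows):
    """Plain insert of every source row, similar to a default copy."""
    with connection:  # commit on success, roll back on error
        connection.executemany(
            "INSERT INTO DimProductCategory (ProductCategoryID, name) VALUES (?, ?)",
            rows,
        )


copy_rows(conn, source_rows)  # first run succeeds

try:
    copy_rows(conn, source_rows)  # second run hits the existing keys
    second_run_error = None
except sqlite3.IntegrityError as exc:
    second_run_error = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM DimProductCategory").fetchone()[0]
print(second_run_error, row_count)
```

If that is the cause, emptying the target before the copy or merging instead of inserting would avoid the duplicate key.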

Data Factory V2 Query Azure Table Storage but use a lookup Value

I have a SQL watermark table which contains the last date in my destination table.
My source data comes from an Azure Storage Table, where the date time is a string.
I set up the date time in the watermark table to match the format in Azure Table storage.
I created a Lookup and a Copy task.
If I hard-code the date into the source query, it runs fine: CreatedAt ge '2019-03-06T14:03:11.000Z'
But obviously I don't want to hard-code this value. I want to use the date from the lookup.
But when I replace the hard-coded date with the lookup value
CreatedAt ge 'activity('LookupWatermarkOld').output'
I get an error:
{
"errorCode": "2200",
"message":"ErrorCode=FailedStorageOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
storage operation failed with the following error 'The remote server returned an error: (400) Bad Request.'.,Source=,
''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=Syntax
error at position 42 in 'CreatedAt ge 'activity('LookupWatermarkOld').output''.\nRequestId:8c65ced9-b002-0051-79d9-d41d49000000\nTime:2019-03-07T11:35:39.0640233Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,'",
"failureType": "UserError",
"target": "CopyMentions"
}
Can anyone help me with this? How do you use the Lookup value in an Azure Table query?
Check this out:
1) Lookup activity. Query field:
SELECT MAX(WatermarkColumnName) as LastId FROM TableName;
Also, make sure that the "First row only" option is checked.
2) In Copy Data activity use query. Query field:
@concat('SELECT * FROM TableName as s WHERE s.WatermarkColumnName > ''', activity('LookupActivity').output.firstRow.LastId, '''')
Finally I got some help on this and it works with
CreatedAt gt '@{activity('LookupWatermarkOld').output.firstRow.WaterMarkValue}'
WaterMarkValue is the column name from the SQL lookup table.
The Lookup output is an object, so you have to pick the value out of its firstRow property.
And wrap it in '' so it is used as a string value.
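To show what the working expression resolves to before it reaches Table storage, here is a Python sketch of the same interpolation. The lookup output shape with firstRow is taken from the expression above, the date is the sample value from the question, and the function name build_filter is made up for this example.

```python
# Shape of a Lookup output with "First row only" enabled;
# the date is the sample value from the question.
lookup_output = {"firstRow": {"WaterMarkValue": "2019-03-06T14:03:11.000Z"}}


def build_filter(output):
    """Substitute the watermark into the filter and keep it quoted as a string."""
    watermark = output["firstRow"]["WaterMarkValue"]
    return f"CreatedAt gt '{watermark}'"


resolved_filter = build_filter(lookup_output)
print(resolved_filter)
```

Without the @{...} wrapper nothing is substituted: the text activity('LookupWatermarkOld').output is sent to the service as-is, which matches the syntax error in the question.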
For recent versions of ADF v2:
Use the watermark/lookup output value in a parameter.
Example: ParamUserCount = @{activity('LookupActivity').output.count}
or for output function
and you can use it in a query as
Example: "select * from userDetails where usercount = {$ParamUserCount}"
Make sure you enclose the query in " " so it is set as a string, and enclose the parameter in the query in { }.
