Using the output of a Lookup activity to query the DB and write to a CSV file in a storage account using ADF - Azure

My requirement is to use ADF to read data (columnA) from an xlsx/csv file in the storage account, use that columnA to query my DB, and write the output of the query (which includes columnA) to a file in the storage account.
I was able to read the data from the storage account, but I get it as a table. I need to use each value as an individual entry, like select * from table where id = columnA.
The next task, once I can read each value, is how to write the query output to a file.
I used a Lookup activity to read the data from Excel; below is the sample output. I need to use only the SKU number in my next query, but I am not able to proceed with this. Kindly suggest a solution.
I set a variable to the output of the lookup as suggested here (https://www.mssqltips.com/sqlservertip/6185/azure-data-factory-lookup-activity-example/) and tried to use that variable in my query, but I get an exception (a bad template error) when I trigger the pipeline.

Please try this:
I created a sample like yours; there is no need to use a Set Variable activity.
Details:
Below is the Lookup output:
{
    "count": 3,
    "value": [
        {
            "SKU": "aaaa"
        },
        {
            "SKU": "bbbb"
        },
        {
            "SKU": "ccc"
        }
    ]
}
Setting of the Copy Data activity:
Query (SQL):
select * from data_source_table where Name = '@{activity('Lookup1').output.value[0].SKU}'
You can also use this SQL, if you need all the values at once:
select * from data_source_table where Name in ('@{activity('Lookup1').output.value[0].SKU}','@{activity('Lookup1').output.value[1].SKU}','@{activity('Lookup1').output.value[2].SKU}')
This is my test data in my SQL database:
Here is the result:
1,"aaaa",0,2017-09-01 00:56:00.0000000
2,"bbbb",0,2017-09-02 05:23:00.0000000
Hope this can help you.
Update:
You can try to use a Data Flow.
source1 is your CSV file, source2 is your SQL database.
This is the setting of the Lookup transformation.
Filter condition: !isNull(PersonID) (a column in your SQL database).
Then use a Select transformation to delete the SKU column.
Finally, output to a single file.

Related

How can I ingest data from Apache Avro into Azure Data Explorer?

For several days I have been trying to ingest Apache Avro formatted data from blob storage into Azure Data Explorer.
I'm able to reference the top-level JSON keys like $.Body (see the red underlined example in the screenshot below), but when it comes to the nested JSON keys, Azure fails to parse them properly and displays nothing (as seen in the green column: I would expect $.Body.entityId to reference the key "entityId" inside the Body JSON).
Many thanks in advance for any help!
Here is a screenshot of the Azure Data Explorer web interface.
Edit 1
I already tried to increase the "Nested levels" option to 2, but all I got was this error message with no further details. The error message won't even disappear when I decrease the level back to 1; I have to cancel and start the process all over again.
I also noticed that the auto-generated columns have some strange types. It seems like they all end up as type string... This seems a little odd to me as well.
Edit 2
Here is some KQL code.
This is the schema of my input .avro file, which I get from my Event Hub capture:
{
    SequenceNumber: ...,
    Offset: ...,
    EnqueuedTimeUTC: ...,
    SystemProperties: ...,
    Properties: ...,
    Body: {
        entityId: ...,
        eventTime: ...,
        messageId: ...,
        data: ...
    }
}, ...
And with these ingestion commands I can't reference the inner JSON keys. The top-level keys work perfectly fine.
// Create table command
////////////////////////////////////////////////////////////
.create table ['test_table'] (['Body']:dynamic, ['entityId']:string)
// Create mapping command
////////////////////////////////////////////////////////////
.create table ['test_table'] ingestion apacheavro mapping 'test_table_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}},{"column":"entityId", "Properties":{"Path":"$.Body.entityId"}}]'
// Ingest data into table command
///////////////////////////////////////////////////////////
.ingest async into table ['test_table'] (h'[SAS URL]') with (format='apacheavro',ingestionMappingReference='test_table_mapping',ingestionMappingType='apacheavro',tags="['503a2cfb-5b81-4c07-8658-639009870862']")
I would love to ingest the inner data fields into separate columns, instead of building a workaround with update policies.
For those having the same issue, here is the workaround we currently use:
First, assume that we want to ingest the contents of the Body field from the avro file to the table avro_destination.
Step 1: Create an ingestion table
.create table avro_ingest(
Body: dynamic
// optional other columns, if you want...
)
Step 2: Create an update policy
.create-or-alter function
with (docstring = 'Convert avro_ingest to avro_destination', folder='ingest')
convert_avro_ingest() {
avro_ingest
| extend entityId = tostring(Body.entityId)
| extend messageId = tostring(Body.messageId)
| extend eventTime = todatetime(Body.eventTime)
| extend data = Body.data
| project entityId, messageId, eventTime, data
}
.alter table avro_destination policy update
#'[{ "IsEnabled": true, "Source": "avro_ingest", "Query": "convert_avro_ingest()", "IsTransactional": false, "PropagateIngestionProperties": true}]'
Step 3: Ingest the .avro files into the avro_ingest table
...as seen in the question, with one column containing the whole Body JSON per entry.
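For reference, a minimal sketch of what that could look like, adapted from the mapping and ingest commands shown in the question (the mapping name and the SAS URL placeholder are illustrative, not taken from the original post):
// Map only the whole Body field into the single dynamic column of avro_ingest
.create table avro_ingest ingestion apacheavro mapping 'avro_ingest_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}}]'
// Ingest the captured .avro blob into the staging table; the update policy above then populates avro_destination
.ingest async into table avro_ingest (h'[SAS URL]') with (format='apacheavro', ingestionMappingReference='avro_ingest_mapping')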
Following the OP updates
Here is the Avro schema of an Event Hubs capture.
As you can see, Body is of type bytes, so there is practically nothing you can do with it in this form, other than ingesting it as is (as dynamic).
{
    "type": "record",
    "name": "EventData",
    "namespace": "Microsoft.ServiceBus.Messaging",
    "fields": [
        {"name": "SequenceNumber", "type": "long"},
        {"name": "Offset", "type": "string"},
        {"name": "EnqueuedTimeUtc", "type": "string"},
        {"name": "SystemProperties", "type": {"type": "map", "values": ["long", "double", "string", "bytes"]}},
        {"name": "Properties", "type": {"type": "map", "values": ["long", "double", "string", "bytes"]}},
        {"name": "Body", "type": ["null", "bytes"]}
    ]
}
If you take a look at the ingested data, you'll see that the content of Body is an array of integers.
Those integers are the decimal values of the characters that make up Body.
capture
| project Body
| take 3
Body
[123,34,105,100,34,58,32,34,56,49,55,98,50,99,100,57,45,97,98,48,49,45,52,100,51,53,45,57,48,51,54,45,100,57,55,50,51,55,55,98,54,56,50,57,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,48,44,32,34,109,121,105,110,116,34,58,32,50,48,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,51,57,56,53,52,52,56,55,52,53,57,56,57,48,55,57,55,125]
[123,34,105,100,34,58,32,34,57,53,100,52,100,55,56,48,45,97,99,100,55,45,52,52,57,50,45,98,97,54,100,45,52,56,49,54,97,51,56,100,52,56,56,51,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,49,44,32,34,109,121,105,110,116,34,58,32,56,56,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,54,53,53,51,55,51,51,56,49,57,54,53,50,52,52,49,125]
[123,34,105,100,34,58,32,34,53,50,100,49,102,54,54,53,45,102,57,102,54,45,52,49,50,49,45,97,50,57,99,45,55,55,56,48,102,101,57,53,53,55,48,56,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,50,44,32,34,109,121,105,110,116,34,58,32,49,57,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,52,53,57,54,49,56,54,51,49,51,49,50,50,52,50,50,51,125]
Body can be converted to text using make_string() and then parsed to JSON using todynamic()
capture
| project BodyJSON = todynamic(make_string(Body))
| take 3
BodyJSON
{"id":"817b2cd9-ab01-4d35-9036-d972377b6829","dt":"2021-08-12T16:48:32.5962540Z","i":0,"myint":20,"myfloat":"0.398544874598908"}
{"id":"95d4d780-acd7-4492-ba6d-4816a38d4883","dt":"2021-08-12T16:48:32.5962540Z","i":1,"myint":88,"myfloat":"0.65537338196524408"}
{"id":"52d1f665-f9f6-4121-a29c-7780fe955708","dt":"2021-08-12T16:48:32.5962540Z","i":2,"myint":19,"myfloat":"0.45961863131224223"}
Simply increase "Nested levels" to 2.

Kusto/Azure Data Explorer - How can I partition an external table using a timespan field?

Hoping someone can help.
I am new to Kusto and have to get an external table working that reads data from an Azure Blob storage account, but the one table I have is unique in that the data for the timestamp column is split into two separate columns, i.e. LogDate and LogTime (see the script below).
My data is stored in the following structure in the Azure Storage account container (container is named "employeedata", for example):
{employeename}/{year}/{month}/{day}/{hour}/{minute}.csv, in a simple CSV format.
I know the CSV is good because if I import it into a normal Kusto table, it works perfectly.
My KQL script for the external table creation looks as follows:
.create-or-alter external table EmpLogs (Employee: string, LogDate: datetime, LogTime:timestamp)
kind=blob
partition by (EmployeeName:string = Employee, yyyy:datetime = startofday(LogDate), MM:datetime = startofday(LogDate), dd:datetime = startofday(LogDate), HH:datetime = todatetime(LogTime), mm:datetime = todatetime(LogTime))
pathformat = (EmployeeName "/" datetime_pattern("yyyy", yyyy) "/" datetime_pattern("MM", MM) "/" datetime_pattern("dd", dd) "/" substring(HH, 0, 2) "/" substring(mm, 3, 2) ".csv")
dataformat=csv
(
h@'************************'
)
with (folder="EmployeeInfo", includeHeaders="All")
I am constantly getting the error below, which is not very helpful (redacted from the full error; it basically comes down to a syntax error somewhere):
Syntax error: Query could not be parsed: {
    "error": {
        "code": "BadRequest_SyntaxError",
        "message": "Request is invalid and cannot be executed.",
        "@type": "Kusto.Data.Exceptions.SyntaxException",
        "@message": "Syntax error: Query could not be parsed: . Query: '.create-or-alter external table ........
I know the todatetime() function works on timespans; I tested it with another table and it created a date similar to the following: 0001-01-01 20:18:00.0000000.
I have tried using the bin() function on the timestamp/LogTime columns, but I get the same error as above. I even tried importing the time value as a string and doing some string manipulation on it; no luck, the same syntax error.
Any help/guidance would be greatly appreciated.
Thank you!!
Currently, there's no way to define an external table partition based on more than one column. If your dataset timestamp is split between two columns, LogDate:datetime and LogTime:timespan, then the best you can do is use a virtual column for the partition by time:
.create-or-alter external table EmpLogs(Employee: string, LogDate:datetime, LogTime:timespan)
kind=blob
partition by (EmployeeName:string = Employee, PartitionDate:datetime)
pathformat = (EmployeeName "/" datetime_pattern("yyyy/MM/dd/HH/mm", PartitionDate))
dataformat=csv
(
//h@'************************'
)
with (folder="EmployeeInfo", includeHeaders="All")
Now, you can filter by the virtual column and fine tune using LogTime:
external_table("EmpLogs")
| where Employee in ("John Doe", ...)
| where PartitionDate between(datetime(2020-01-01 10:00:00) .. datetime(2020-01-01 11:00:00))
| where LogTime ...
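If you need a precise time window, you can reconstruct the full timestamp from the two columns, since adding a timespan to a datetime yields a datetime in KQL. A sketch (the filter values are illustrative):
external_table("EmpLogs")
| where PartitionDate between (datetime(2020-01-01) .. datetime(2020-01-02))
| extend LogDateTime = LogDate + LogTime
| where LogDateTime between (datetime(2020-01-01 10:00:00) .. datetime(2020-01-01 11:00:00))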

Azure Data Factory v2 copy activity: source value is null, sink value does not allow null

My question is about how/where to perform the type conversion during copy activity.
I have an Azure data factory pipeline which defines data imports from a TSV file in Data Lake Gen1 to SQL server database table.
The schema of the TSV file is: {QueryDate,Count1,Count2}
count1 and count2 may have no value.
example data in TSV file:
20180717 10
20180717 5 5
20180717 7 1
20180717 7
The schema of the SQL Server table is
{QueryDate (datetime2(7)), UserNumber (int), ActiveNumber (int)}
Both UserNumber and ActiveNumber have a NOT NULL constraint.
When I use a copy activity in my pipeline to copy the TSV data to the table, I get an error like this:
"errorCode": "2200", "message": "'Type=System.InvalidOperationException, does not allow DBNull.Value...
When Count1 or Count2 has no value, I want to use 0 to replace the null value, and I think that would avoid the error.
But I don't know where and how to perform this conversion.
Should the conversion be in the source dataset, the sink dataset, or the copy activity?
I also can't figure out the correct syntax for the conversion.
I tried setting the nullValue of the source dataset's format settings to values like null, 0, and NULL, but none of them works; I still get the error.
"typeProperties": {
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"nullValue": "NULL",
"treatEmptyAsNull": true,
"skipLineCount": 0,
"firstRowAsHeader": false},
...
}
I also saw the question "Azure Data Factory - can't convert from "null" to datetime field",
but it still does not solve my problem.

Where does Row Id come from when working with the SQL connector of Logic Apps?

I'm executing a stored procedure using the On-Premise Data Gateway. The data returned is the following (obfuscated):
{
    "OutputParameters": {},
    "ResultSets": {
        "Table1": [
            {
                "SOPNUMBE": "string",
                "SYNCSTATUS3": "M",
                "Tracking_Number": "string",
                "OBJECTKEY": "2|string",
                "ScribeModifiedBy": "UPS",
                "ScribeModifiedDate": "2018-03-19T15:59:30.007"
            },
            {
                ...
            }
        ]
    }
}
Nothing in there says to me "this is the Row Id".
Is this a limitation of working with on-prem SQL?
To provide some additional information, my stored procedure takes some information from dbo.SCRIBESHADOW and another table. I intend to update dbo.SCRIBESHADOW. Here is a screenshot from SSMS.
Here are some sample rows. What would I put for the Row ID to update one of these rows?
Stored Procedure results do not have any Row ID because SP results are not table rows, even if the SP is composed of just one SELECT. This is because the SP result is its own table.
So, no, this isn't any limitation, it's just how Stored Procedures work. If you need the RowID of any source table, you need to add it to the results.
To continue, if you need to Update a record based on the Stored Procedure results, then you will need to return from the SP the value of the Primary Key of the table you want to update. That value is the Row id in the Update row Action.
However, if you are using a Stored Procedure to retrieve the data, you really should also be using a Stored Procedure to update the data.

Azure stream analytics - Joining on a csv file returns 0 rows

I have the following query:
SELECT
[VanList].deviceId
,[VanList].[VanName]
events.[timestamp]
,events.externaltemp
,events.internaltemp
,events.humidity
,events.latitude
,events.longitude
INTO
[iot-powerBI]
FROM
[iot-EventHub] as events timestamp by [timestamp]
join [VanList] on events.DeviceId = [VanList].deviceId
where iot-EventHub is my Event Hub and VanList is a reference list (a CSV file) that has been uploaded to Azure Storage.
I have tried uploading sample data to test the query, but it always returns 0 rows.
Below is a sample of the JSON captured by my Event Hub Input
[
    {
        "DeviceId": 1,
        "Timestamp": "2015-06-29T12:15:18.0000000",
        "ExternalTemp": 9,
        "InternalTemp": 8,
        "Humidity": 43,
        "Latitude": 51.3854942,
        "Longitude": -1.12774682,
        "EventProcessedUtcTime": "2015-06-29T12:25:46.0932317Z",
        "PartitionId": 1,
        "EventEnqueuedUtcTime": "2015-06-29T12:15:18.5990000Z"
    }
]
Below is a sample of my CSV reference data.
deviceId,VanName
1,VAN 1
2,VAN 2
3,Standby Van
Both lists contain a device id of 1, so I am expecting my query to be able to join the two together.
I have tried using both "inner join" and "join" in my query syntax, but neither results in a successful join.
What is wrong with my Stream Analytics query?
Try adding a CAST function in the join. I'm not sure why that works while adding a CREATE TABLE clause for the VanList reference data input doesn't accomplish the same thing, but I think this works.
SELECT
[VanList].deviceId
,[VanList].[VanName]
,events.[timestamp]
,events.externaltemp
,events.internaltemp
,events.humidity
,events.latitude
,events.longitude
INTO
[iot-powerBI]
FROM
[iot-EventHub] as events timestamp by [Timestamp]
join [VanList] on events.DeviceId = cast([VanList].deviceId as bigint)
The only thing I can see is that you are missing a comma in your original query (before events.[timestamp]); otherwise it looks correct. I would try recreating the Stream Analytics job. Here is another example that worked for me.
SELECT
countryref.CountryName as Geography,
input.GeographyId as GeographyId
into [country-out]
FROM input timestamp by [TransactionDateTime]
Join countryref
on countryref.GeographyID = input.GeographyId
Input data example
{"pageid":801,"firstname":"Gertrude","geographyid":2,"itemid":2,"itemprice":79.0,"transactiondatetime":"2015-06-30T14:25:51.0000000","creditcardnumber":"2ggnC"}
{"pageid":801,"firstname":"Venice","geographyid":1,"itemid":10,"itemprice":169.0,"transactiondatetime":"2015-06-30T14:25:51.0000000","creditcardnumber":"xLyOp"}
{"pageid":801,"firstname":"Christinia","geographyid":2,"itemid":2,"itemprice":79.0,"transactiondatetime":"2015-06-30T14:25:51.0000000","creditcardnumber":"VuycQ"}
{"pageid":801,"firstname":"Dorethea","geographyid":4,"itemid":2,"itemprice":79.0,"transactiondatetime":"2015-06-30T14:25:51.0000000","creditcardnumber":"tgvQP"}
{"pageid":801,"firstname":"Dwain","geographyid":4,"itemid":4,"itemprice":129.0,"transactiondatetime":"2015-06-30T14:25:51.0000000","creditcardnumber":"O5TwV"}
Country ref data
[
    {
        "GeographyID": 1,
        "CountryName": "USA"
    },
    {
        "GeographyID": 2,
        "CountryName": "China"
    },
    {
        "GeographyID": 3,
        "CountryName": "Brazil"
    },
    {
        "GeographyID": 4,
        "CountryName": "Andrews country"
    },
    {
        "GeographyID": 5,
        "CountryName": "Chile"
    }
]
