Parsing JSON in Azure Data Factory

I want to use Azure Data Factory to call an API, parse some JSON data, and save it into an Azure SQL database.
There is no API that returns all the customer data in one go, so this can only be done one customer at a time; that cannot be changed. I have a customer API which returns a basic list of customer numbers, for example:
[
  { "customerId": 100001 },
  { "customerId": 100002 },
  { "customerId": 100003 },
  { "customerId": 100004 },
  { "customerId": 100005 },
  { "customerId": 100006 },
  { "customerId": 100007 }
]
I am using an HTTP API connection in Data Factory to retrieve this list, then using a ForEach loop to go through the IDs one by one, triggering another pipeline for each. That other pipeline goes and gets the customer data, which looks like this:
{
  "customerId": 125488,
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ],
  "history": [
    {
      "description": "registered",
      "date": "2021-01-01T01:45:53.5736067Z"
    },
    {
      "description": "verificationRequired",
      "date": "2021-01-01T01:49:53.5736067Z"
    },
    {
      "description": "verified",
      "date": "2021-01-01T01:56:53.5736067Z"
    }
  ]
}
My goal is to put customerId, firstName, lastName, and age into a customer table, like the one shown below.
create table customer (
    customerId int,
    firstName varchar(255),
    lastName varchar(255),
    age int
)
This part I have already done using the copy data activity (API to SQL). My next goal is to put the phone numbers into a phonenumber table, like the one shown below.
create table phonenumber (
    customerId int,
    phoneNumber varchar(255),
    phoneType varchar(255)
)
I want to repeat this for the customer history too.
I am using the copy data task in a pipeline to move the customer data into the customer table, but I cannot have multiple outputs writing to different tables. The only way I can think of to do this is to have three different pipelines for the three different tables, but that means calling the API three times for the same data. There must be a better way?
I'd value your help.
Peace, Amjid

I think you can create a stored procedure to move the customer data into several tables.
My example is as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[uspCustomer] @json NVARCHAR(MAX)
AS
BEGIN
    INSERT INTO dbo.customer(customerId, firstName, lastName, age)
    SELECT customerId, firstName, lastName, age
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        firstName VARCHAR(255) '$.firstName',
        lastName VARCHAR(255) '$.lastName',
        age INT '$.age'
    );

    -- First element of the phoneNumber array
    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[0].number',
        phoneType VARCHAR(255) '$.phoneNumber[0].type'
    );

    -- Second element of the phoneNumber array
    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[1].number',
        phoneType VARCHAR(255) '$.phoneNumber[1].type'
    );
END
The following is a test of the stored procedure.
DECLARE @json NVARCHAR(MAX);
SET @json = '{"customerId": 125488,"firstName": "John","lastName": "Smith","age": 25,"address": {"streetAddress": "21 2nd Street","city": "New York","state": "NY","postalCode": "10021"},"phoneNumber":[{"type": "home","number": "212 555-1234"},{"type": "fax","number": "646 555-4567"}]}';
EXEC [dbo].[uspCustomer] @json;
Running this populates both the customer and phonenumber tables from the single JSON document.
In the pipeline, I think you can use a Stored Procedure activity to pass this JSON object to the procedure.
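Note that the hardcoded $.phoneNumber[0] and $.phoneNumber[1] paths only cover exactly two phone numbers. As a more general sketch (same tables, plus a hypothetical dbo.history table for the history array from the question), you could CROSS APPLY OPENJSON over each array so any number of elements is handled:
CREATE OR ALTER PROCEDURE [dbo].[uspCustomerArrays] @json NVARCHAR(MAX)
AS
BEGIN
    -- One row per element of the phoneNumber array, however many there are
    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT c.customerId, p.number, p.type
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumbers NVARCHAR(MAX) '$.phoneNumber' AS JSON
    ) AS c
    CROSS APPLY OPENJSON(c.phoneNumbers)
    WITH (
        number VARCHAR(255) '$.number',
        type VARCHAR(255) '$.type'
    ) AS p;

    -- Same pattern for the history array (dbo.history is an assumed table)
    INSERT INTO dbo.history(customerId, description, historyDate)
    SELECT c.customerId, h.description, h.historyDate
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        history NVARCHAR(MAX) '$.history' AS JSON
    ) AS c
    CROSS APPLY OPENJSON(c.history)
    WITH (
        description VARCHAR(255) '$.description',
        historyDate DATETIMEOFFSET '$.date'
    ) AS h;
END
This way a customer with one or five phone numbers still loads correctly, without needing an INSERT per array index.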

Related

SQL Query CONTAINS property value in array of objects

I am trying to create a SQL query to get a list of companies that a user belongs to. The database is Cosmos DB serverless, and the container is called "Companies", with multiple company items inside.
The structure of the company items is as follows:
{
  "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "name": "Company Name",
  "users": [
    {
      "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "name": "Susan Washington",
      "email": "susan.washington@gmail.com",
      "createdBy": "xxxx@gmail.com",
      "createdServerDateUTC": "2022-01-12T19:21:10.0644424Z",
      "createdLocalTime": "2022-01-12T19:21:09Z"
    },
    {
      "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "name": "Kerwin Evans",
      "title": "Test Dev",
      "email": "kerwin.e@yahoo.com",
      "createdBy": "xxxx@gmail.com",
      "createdServerDateUTC": "2022-01-12T19:21:10.0644424Z",
      "createdLocalTime": "2022-01-12T19:21:09Z"
    },
    ETC.
  ]
}
And this is the SQL query I was trying to use, where user is an email that I pass in:
SELECT *
FROM c
WHERE IS_NULL(c.deletedServerDateUTC) = true
AND CONTAINS(c.users, user)
ORDER BY c.name DESC
OFFSET 0 LIMIT 10
This doesn't work, because the users property is an array. So I believe I need to check each object in the users array to see if the email property matches the user I pass in.
You can query the array via ARRAY_CONTAINS(). Something like this to return company names for a given username that you specify:
SELECT c.name
FROM c
WHERE ARRAY_CONTAINS(c.users,{'name': username}, true)
The 3rd parameter set to true means the array elements are documents, not scalar values.
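Since the question passes in an email rather than a name, the same partial-document match should work against the email property. A sketch, assuming the email is supplied as a query parameter named @email:
SELECT c.name
FROM c
WHERE ARRAY_CONTAINS(c.users, {'email': @email}, true)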

Cosmos DB Query Array value using SQL

I have several JSON files with the below structure in my Cosmos DB.
[
  {
    "USA": {
      "Applicable": "Yes",
      "Location": {
        "City": [
          "San Jose",
          "San Diego"
        ]
      }
    }
  }
]
I want to query all the results/files that have the array value city = "San Diego".
I've tried the below SQL queries:
SELECT DISTINCT *
FROM c["USA"]["Location"]
WHERE ["City"] IN ('San Diego')
SELECT DISTINCT *
FROM c["USA"]["Location"]
WHERE ["City"] = 'San Diego'
SELECT c
FROM c JOIN d IN c["USA"]["Location"]
WHERE d["City"] = 'San Diego'
All of them return 0 results.
You need to query data from your entire document, where your USA.Location.City array contains an item. For example:
SELECT *
FROM c
WHERE ARRAY_CONTAINS (c.USA.Location.City, "San Jose")
This will give you what you're trying to achieve.
Note: You have a slight anti-pattern in your schema, using "USA" as a key, which means you can't easily query across country names. You should replace this with something like:
{
  "Country": "USA",
  "CountryMetadata": {
    "Applicable": "Yes",
    "Location": {
      "City": [
        "San Jose",
        "San Diego"
      ]
    }
  }
}
This lets you query all the different countries. And the query above would then need only a slight change:
SELECT *
FROM c
WHERE c.Country = "USA"
AND ARRAY_CONTAINS(c.CountryMetadata.Location.City, "San Jose")
Note that the query now works for any country, and you can pass in country value as a parameter (vs needing to hardcode the country name into the query because it's an actual key name).
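For example, a parameterized form (the parameter names @country and @city are arbitrary) could look like:
SELECT *
FROM c
WHERE c.Country = @country
AND ARRAY_CONTAINS(c.CountryMetadata.Location.City, @city)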
Tl;dr don't put values as your keys.

Can we parse and copy a JSON array into multiple SQL tables via ADF?

I want to use Azure Data Factory to parse some JSON data and copy it into an Azure SQL database.
The customer data looks like this:
{
  "customerId": 125488,
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}
My goal is to put customerId, firstName, lastName, and age into a customer table, like the one shown below.
create table dbo.customer (
    customerId int,
    firstName varchar(255),
    lastName varchar(255),
    age int
)
This part I have already done using the copy data activity (API to SQL). My next goal is to put the phone numbers into a phonenumber table, like the one shown below.
create table dbo.phonenumber (
    customerId int,
    phoneNumber varchar(255),
    phoneType varchar(255)
)
I am using the copy activity in a pipeline to move the customer data into the customer table, but it cannot write multiple outputs to different tables. Can we do that in one pipeline?
I think you can use a stored procedure in the copy activity to copy the data into several tables.
I created a simple test as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[uspCustomer] @json NVARCHAR(MAX)
AS
BEGIN
    INSERT INTO dbo.customer(customerId, firstName, lastName, age)
    SELECT customerId, firstName, lastName, age
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        firstName VARCHAR(255) '$.firstName',
        lastName VARCHAR(255) '$.lastName',
        age INT '$.age'
    );

    -- First element of the phoneNumber array
    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[0].number',
        phoneType VARCHAR(255) '$.phoneNumber[0].type'
    );

    -- Second element of the phoneNumber array
    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[1].number',
        phoneType VARCHAR(255) '$.phoneNumber[1].type'
    );
END
The following is a test of the stored procedure.
DECLARE @json NVARCHAR(MAX);
SET @json = '{"customerId": 125488,"firstName": "John","lastName": "Smith","age": 25,"address": {"streetAddress": "21 2nd Street","city": "New York","state": "NY","postalCode": "10021"},"phoneNumber":[{"type": "home","number": "212 555-1234"},{"type": "fax","number": "646 555-4567"}]}';
EXEC [dbo].[uspCustomer] @json;
Running this populates both the customer and phonenumber tables from the single JSON document.
That's all.

Stream Analytics JSON Input query parsing

I am doing a POC on ingesting JSON through Event Hub, processing it with a Stream Analytics job, and pushing it into an Azure SQL DW.
I have worked with JSON ingestion before, but the difficulty I face now is the naming structure used in the JSON.
Here is the sample:
{
  "1-1": [{
    "Details": [{
      "FirstName": "Super",
      "LastName": "Man"
    }]
  }]
}
The root element has a hyphen (-) and I am having a tough time parsing through this element to access the relevant items.
I have tried the following queries, and I get NULLs in the SQL tables they output to:
--#1
SELECT
    ["1-1"].Details.FirstName AS First_Name,
    ["1-1"].Details.LastName AS Last_Name
INTO [SA-OUTPUT]
FROM [SA-INPUT]
--#2
SELECT
    [1-1].Details.FirstName AS First_Name,
    [1-1].Details.LastName AS Last_Name
INTO [SA-OUTPUT]
FROM [SA-INPUT]
--#3
SELECT
    1-1.Details.FirstName AS First_Name,
    1-1.Details.LastName AS Last_Name
INTO [SA-OUTPUT]
FROM [SA-INPUT]
--#4
SELECT
    [SA-INPUT].["1-1"].Details.FirstName AS First_Name,
    [SA-INPUT].["1-1"].Details.LastName AS Last_Name
INTO [SA-OUTPUT]
FROM [SA-INPUT]
I would appreciate the correct way to do this.
Thanks in advance.
Your JSON schema is nested but also has some arrays. In order to read the data you will need to use the GetArrayElement function.
Here's a query that will read your sample data:
WITH Step1 AS (
    SELECT GetArrayElement([1-1], 0) AS FirstLevel
    FROM iothub
),
Step2 AS (
    SELECT GetArrayElement(FirstLevel.Details, 0) AS SecondLevel
    FROM Step1
)
SELECT SecondLevel.FirstName, SecondLevel.LastName FROM Step2
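GetArrayElement(..., 0) reads only the first element of each array. If the arrays can hold more than one item, a sketch using CROSS APPLY with GetArrayElements over the same input (the aliases are arbitrary) emits one row per element instead:
SELECT
    Detail.ArrayValue.FirstName AS First_Name,
    Detail.ArrayValue.LastName AS Last_Name
FROM iothub AS e
CROSS APPLY GetArrayElements(e.[1-1]) AS Level1
CROSS APPLY GetArrayElements(Level1.ArrayValue.Details) AS Detail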
For more info, you can have a look at our page Work with complex Data Types in JSON and AVRO.
Let me know if you have any questions.
Thanks,
JS (ASA team)
I tried it and it worked beautifully. If, let's say, I have to generate data from two separate array elements, I would have to create two separate CTEs.
{
  "1-1": [{
    "Details": [{
      "FirstName": "Super",
      "LastName": "Man"
    }]
  }]
},
{
  "2-1": [{
    "Address": [{
      "Street": "Main",
      "Lane": "Second"
    }]
  }]
}
How do I merge elements from two CTEs into one output query? I can only reference a CTE in the statement that immediately follows it.
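One possible sketch (untested; it assumes both top-level keys arrive on the same input, and reuses the Step1/Step2 pattern from the answer): Stream Analytics lets a later step reference several earlier steps, but a join between two streaming steps needs a time bound, hence the DATEDIFF condition.
WITH Step1 AS (
    SELECT GetArrayElement([1-1], 0) AS FirstLevel FROM [SA-INPUT]
),
Details AS (
    SELECT GetArrayElement(FirstLevel.Details, 0) AS D FROM Step1
),
Step2 AS (
    SELECT GetArrayElement([2-1], 0) AS FirstLevel FROM [SA-INPUT]
),
Addresses AS (
    SELECT GetArrayElement(FirstLevel.Address, 0) AS A FROM Step2
)
SELECT Details.D.FirstName, Details.D.LastName, Addresses.A.Street, Addresses.A.Lane
INTO [SA-OUTPUT]
FROM Details
JOIN Addresses ON DATEDIFF(second, Details, Addresses) BETWEEN 0 AND 0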

Efficient way to model this in Cassandra?

I'm a Cassandra newbie and I'm trying to model this JSON as Cassandra tables:
the module has N criteria
the criteria has N procedures
the procedure has N activities
the activity has N tasks
{
  "module": [{
    "id": "xxxxxx",
    "type": "module",
    "name": "xxxxxxx",
    "criteria": [{
      "id": "xxxxxx",
      "type": "criteria",
      "name": "xxxxxxx",
      "procedure": [{
        "id": "xxxxxx",
        "type": "procedure",
        "name": "xxxxxxx",
        "activity": [{
          "id": "xxxxxx",
          "type": "activity",
          "name": "xxxxxxx",
          "task": [{
            "id": "xxxxxx",
            "type": "task",
            "name": "xxxxxxx",
            "user_assigned": "xxxxxx",
            "date": "xxxxxxx"
          },
          ....
          ]
        },
        ....
        ]
      },
      ....
      ]
    },
    ....
    ]
  },
  ....
  ]
}
I was trying UDTs, but non-frozen UDTs are not allowed inside collections (map), and I want to let the user update specific parts.
CREATE TYPE IF NOT EXISTS task (
    id text,
    enumerate text,
    name text,
    user_assigned text,
    description text
);
CREATE TYPE IF NOT EXISTS activity (
    id text,
    enumerate text,
    name text,
    description text,
    task map<int, frozen<task>>
);
CREATE TYPE IF NOT EXISTS procedure (
    id text,
    enumerate text,
    name text,
    description text,
    activity map<int, frozen<activity>>
);
CREATE TYPE IF NOT EXISTS criteria (
    id text,
    enumerate text,
    name text,
    description text,
    procedure map<int, frozen<procedure>>
);
CREATE TYPE IF NOT EXISTS module (
    id text,
    enumerate text,
    name text,
    description text,
    criteria map<int, frozen<criteria>>
);
CREATE TABLE IF NOT EXISTS certification (
    id timeuuid,
    owner text,
    description text,
    name text,
    template map<int, frozen<module>>,
    images map<text, text>,
    PRIMARY KEY (id, owner)
);
I have limitations if I use collections.
The recommendation is to take a query-first approach when designing, but I want to start by showing all the data and then navigate by links down to the tasks.
What is the best way to model it?
UPDATE
Queries I need
Q1. Lookup modules by ID.
Q2. Lookup modules, criteria, procedures, and activities at the same time
Q3. Lookup tasks by user_assigned
Q4. Lookup tasks by activity
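For Q3, a query-first sketch (hypothetical table and column names) would denormalize each task into a table partitioned by the user it is assigned to:
CREATE TABLE IF NOT EXISTS tasks_by_user (
    user_assigned text,
    task_id text,
    activity_id text,
    module_id text,
    name text,
    date text,
    PRIMARY KEY ((user_assigned), task_id)
);
-- Q3 then becomes a single-partition read:
-- SELECT * FROM tasks_by_user WHERE user_assigned = 'xxxxxx';
A similar tasks_by_activity table would cover Q4, with the application writing each task to both tables on insert.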
