Efficient way to model this JSON in Cassandra?

I'm a Cassandra newbie and I'm trying to model this JSON as Cassandra tables.
The module has N criteria
The criteria has N procedures
The procedure has N activities
The activity has N tasks
{
  "module": [{
    "id": "xxxxxx",
    "type": "module",
    "name": "xxxxxxx",
    "criteria": [{
      "id": "xxxxxx",
      "type": "criteria",
      "name": "xxxxxxx",
      "procedure": [{
        "id": "xxxxxx",
        "type": "procedure",
        "name": "xxxxxxx",
        "activity": [{
          "id": "xxxxxx",
          "type": "activity",
          "name": "xxxxxxx",
          "task": [{
            "id": "xxxxxx",
            "type": "task",
            "name": "xxxxxxx",
            "user_assigned": "xxxxxx",
            "date": "xxxxxxx"
          },
          ....
          ]
        },
        ....
        ]
      },
      ....
      ]
    },
    ....
    ]
  },
  ....
  ]
}
I was trying to use UDTs, but non-frozen UDTs are not allowed inside collections (map), and I want to let the user update specific parts.
CREATE TYPE IF NOT EXISTS task (
    id text,
    enumerate text,
    name text,
    user_assigned text,
    description text
);
CREATE TYPE IF NOT EXISTS activity (
    id text,
    enumerate text,
    name text,
    description text,
    task map<int, frozen<task>>
);
CREATE TYPE IF NOT EXISTS procedure (
    id text,
    enumerate text,
    name text,
    description text,
    activity map<int, frozen<activity>>
);
CREATE TYPE IF NOT EXISTS criteria (
    id text,
    enumerate text,
    name text,
    description text,
    procedure map<int, frozen<procedure>>
);
CREATE TYPE IF NOT EXISTS module (
    id text,
    enumerate text,
    name text,
    description text,
    criteria map<int, frozen<criteria>>
);
CREATE TABLE IF NOT EXISTS certification (
    id timeuuid,
    owner text,
    description text,
    name text,
    template map<int, frozen<module>>,
    images map<text, text>,
    PRIMARY KEY (id, owner)
);
I run into limitations if I use collections.
The recommendation is to take a query-first approach to the design, but I want to start by showing all the data and then drill down via links to the tasks.
What is the best way to model it?
UPDATE
Queries I need
Q1. Look up modules by ID.
Q2. Look up modules, criteria, procedures, and activities at the same time.
Q3. Look up tasks by user_assigned.
Q4. Look up tasks by activity.
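A query-first sketch for Q3 and Q4 might look like the following: one denormalized table per query, both written at task-creation time. The table and column names here are hypothetical, purely for illustration:

-- Q3: look up tasks by assigned user
CREATE TABLE IF NOT EXISTS tasks_by_user (
    user_assigned text,
    task_id text,
    activity_id text,
    name text,
    date text,
    PRIMARY KEY ((user_assigned), task_id)
);

-- Q4: look up tasks by activity
CREATE TABLE IF NOT EXISTS tasks_by_activity (
    activity_id text,
    task_id text,
    user_assigned text,
    name text,
    date text,
    PRIMARY KEY ((activity_id), task_id)
);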

Related

SQL Query CONTAINS property value in array of objects

I am trying to create a SQL query to get a list of companies that a user belongs to. The database is Cosmos DB Serverless, and the container is called "Companies", with multiple company items inside.
The structure of the company items is as follows:
{
  "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "name": "Company Name",
  "users": [
    {
      "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "name": "Susan Washington",
      "email": "susan.washington@gmail.com",
      "createdBy": "xxxx@gmail.com",
      "createdServerDateUTC": "2022-01-12T19:21:10.0644424Z",
      "createdLocalTime": "2022-01-12T19:21:09Z"
    },
    {
      "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "name": "Kerwin Evans",
      "title": "Test Dev",
      "email": "kerwin.e@yahoo.com",
      "createdBy": "xxxx@gmail.com",
      "createdServerDateUTC": "2022-01-12T19:21:10.0644424Z",
      "createdLocalTime": "2022-01-12T19:21:09Z"
    },
    ETC.
  ]
}
And this is the SQL query I was trying to use, where user is an email that I pass in:
SELECT *
FROM c
WHERE IS_NULL(c.deletedServerDateUTC) = true
AND CONTAINS(c.users, user)
ORDER BY c.name DESC
OFFSET 0 LIMIT 10
This doesn't work, because the users property is an array. So I believe I need to check each object in the users array to see if the email property matches the user I enter in.
You can query the array via ARRAY_CONTAINS(). Something like this to return company names for a given username that you specify:
SELECT c.name
FROM c
WHERE ARRAY_CONTAINS(c.users,{'name': username}, true)
The 3rd parameter set to true means the array elements are documents, not scalar values.
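Since the question filters on the email property rather than name, presumably the same pattern applies. This variant is an extrapolation, not part of the original answer:

SELECT c.name
FROM c
WHERE IS_NULL(c.deletedServerDateUTC) = true
AND ARRAY_CONTAINS(c.users, {'email': 'susan.washington@gmail.com'}, true)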

Cosmos DB Query Array value using SQL

I have several JSON files with the structure below in my Cosmos DB.
[
  {
    "USA": {
      "Applicable": "Yes",
      "Location": {
        "City": [
          "San Jose",
          "San Diego"
        ]
      }
    }
  }
]
I want to query all the results/files where the City array contains "San Diego".
I've tried the SQL queries below:
SELECT DISTINCT *
FROM c["USA"]["Location"]
WHERE ["City"] IN ('San Diego')

SELECT DISTINCT *
FROM c["USA"]["Location"]
WHERE ["City"] = 'San Diego'

SELECT c
FROM c JOIN d IN c["USA"]["Location"]
WHERE d["City"] = 'San Diego'
All of them return no results (0 - 0).
You need to query data from your entire document, where your USA.Location.City array contains an item. For example:
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.USA.Location.City, "San Diego")
This will give you what you're trying to achieve.
Note: You have a slight anti-pattern in your schema, using "USA" as the key, which means you can't easily query all the location names. You should replace this with something like:
{
  "Country": "USA",
  "CountryMetadata": {
    "Applicable": "Yes",
    "Location": {
      "City": [
        "San Jose",
        "San Diego"
      ]
    }
  }
}
This lets you query all the different countries. And the query above would then need only a slight change:
SELECT *
FROM c
WHERE c.Country = "USA"
AND ARRAY_CONTAINS(c.CountryMetadata.Location.City, "San Diego")
Note that the query now works for any country, and you can pass in country value as a parameter (vs needing to hardcode the country name into the query because it's an actual key name).
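As a sketch of that parameterization (the names @country and @city are my choice; Cosmos DB SQL accepts @-prefixed parameters supplied with the query request):

SELECT *
FROM c
WHERE c.Country = @country
AND ARRAY_CONTAINS(c.CountryMetadata.Location.City, @city)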
TL;DR: don't use data values as your keys.

Can we parse and copy json array into multiple sql tables via ADF

I want to use Azure Data Factory to parse some JSON data and copy it into an Azure SQL database.
The customer data looks like this:
{
  "customerId": 125488,
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}
My goal is to put customerId, firstName, lastName, and age into a customer table, like the one shown below.
create table dbo.customer (
    customerId int,
    firstName varchar(255),
    lastName varchar(255),
    age int
)
This part I have already done using the copy data (api to sql). My next goal is to put the phone numbers into a phone number table, like the one shown below.
create table dbo.phonenumber (
    customerId int,
    phoneNumber varchar(255),
    phoneType varchar(255)
)
I am using the copy activity in a pipeline to move the customer data into the customer table, but I cannot have multiple outputs writing to different tables. Can we do that in one pipeline?
I think you can use a stored procedure in the copy activity to copy the data into several tables.
I created a simple test as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[uspCustomer] @json NVARCHAR(MAX)
AS
BEGIN
    INSERT INTO dbo.customer(customerId, firstName, lastName, age)
    SELECT customerId, firstName, lastName, age
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        firstName VARCHAR(255) '$.firstName',
        lastName VARCHAR(255) '$.lastName',
        age INT '$.age'
    );

    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[0].number',
        phoneType VARCHAR(255) '$.phoneNumber[0].type'
    );

    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[1].number',
        phoneType VARCHAR(255) '$.phoneNumber[1].type'
    );
END
The following is a test of the stored procedure.
DECLARE @json NVARCHAR(MAX);
SET @json = '{"customerId": 125488,"firstName": "John","lastName": "Smith","age": 25,"address": {"streetAddress": "21 2nd Street","city": "New York","state": "NY","postalCode": "10021"},"phoneNumber":[{"type": "home","number": "212 555-1234"},{"type": "fax","number": "646 555-4567"}]}';
exec [dbo].[uspCustomer] @json
The result: one row in dbo.customer and two rows in dbo.phonenumber.
That's all.
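As an aside (this variant is not part of the original answer), the hard-coded $.phoneNumber[0] and $.phoneNumber[1] paths drop any third number and insert NULLs when a customer has fewer than two. A sketch that expands the array with CROSS APPLY OPENJSON handles any length:

-- Expand the phoneNumber array row-by-row instead of by fixed index.
INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
SELECT j.customerId, p.number, p.[type]
FROM OPENJSON(@json)
WITH (
    customerId INT '$.customerId',
    phoneNumbers NVARCHAR(MAX) '$.phoneNumber' AS JSON
) AS j
CROSS APPLY OPENJSON(j.phoneNumbers)
WITH (
    number VARCHAR(255) '$.number',
    [type] VARCHAR(255) '$.type'
) AS p;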

Parsing JSON in Azure DataFactory

I want to use Azure Data Factory to call an API, parse some JSON data, and save it into an Azure SQL database.
There is no API that returns all the customer data in one go, so this can only be done one customer at a time; this cannot be changed. I have a customer API that returns a basic list of customer numbers, for example:
{
  "customerId": 100001,
  "customerId": 100002,
  "customerId": 100003,
  "customerId": 100004,
  "customerId": 100005,
  "customerId": 100006,
  "customerId": 100007,
}
I am using an HTTP API connection in Data Factory to retrieve this list, then using a ForEach loop to go through them one by one, triggering another pipeline. This other pipeline goes and gets the customer data, which looks like this:
{
  "customerId": 125488,
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ],
  "history": [
    {
      "description": "registered",
      "date": "2021-01-011T01:45:53.5736067Z"
    },
    {
      "description": "verificationRequired",
      "date": "2021-01-011T01:49:53.5736067Z"
    },
    {
      "description": "verified",
      "date": "2021-01-011T01:56:53.5736067Z"
    }
  ]
}
My goal is to put customerId, firstName, lastName, and age into a customer table, like the one shown below.
create table customer (
    customerId int,
    firstName varchar(255),
    lastName varchar(255),
    age int
)
This part I have already done using the copy data (api to sql). My next goal is to put the phone numbers into a phone number table, like the one shown below.
create table phonenumber (
    customerId int,
    phoneNumber varchar(255),
    phoneType varchar(255)
)
I also want to repeat this for the customer history.
I am using the copy data task in a pipeline to move the customer data into the customer table, but I cannot have multiple outputs writing to different tables. The only way I can think to do this is to have three different pipelines for the three different tables, but that means calling the API three times for the same data. There must be a better way?
I'd value your help.
Peace, Amjid
I think you can create a stored procedure to move the customer data into several tables.
My example is as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[uspCustomer] @json NVARCHAR(MAX)
AS
BEGIN
    INSERT INTO dbo.customer(customerId, firstName, lastName, age)
    SELECT customerId, firstName, lastName, age
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        firstName VARCHAR(255) '$.firstName',
        lastName VARCHAR(255) '$.lastName',
        age INT '$.age'
    );

    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[0].number',
        phoneType VARCHAR(255) '$.phoneNumber[0].type'
    );

    INSERT INTO dbo.phonenumber(customerId, phoneNumber, phoneType)
    SELECT customerId, phoneNumber, phoneType
    FROM OPENJSON(@json)
    WITH (
        customerId INT '$.customerId',
        phoneNumber VARCHAR(255) '$.phoneNumber[1].number',
        phoneType VARCHAR(255) '$.phoneNumber[1].type'
    );
END
The following is a test of the stored procedure.
DECLARE @json NVARCHAR(MAX);
SET @json = '{"customerId": 125488,"firstName": "John","lastName": "Smith","age": 25,"address": {"streetAddress": "21 2nd Street","city": "New York","state": "NY","postalCode": "10021"},"phoneNumber":[{"type": "home","number": "212 555-1234"},{"type": "fax","number": "646 555-4567"}]}';
exec [dbo].[uspCustomer] @json
The result: one row in dbo.customer and two rows in dbo.phonenumber.
I think you can use a Stored Procedure activity to accept this JSON object.
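The posted procedure doesn't touch the history array the asker mentions; presumably the same OPENJSON pattern extends to it. A sketch, where the dbo.customerhistory table and its column names are my assumptions rather than part of the original answer:

-- Assumed target table for the history array
create table dbo.customerhistory (
    customerId int,
    description varchar(255),
    historyDate varchar(255)
)

-- Inside the stored procedure: expand $.history with CROSS APPLY
-- so any number of history entries is handled.
INSERT INTO dbo.customerhistory(customerId, description, historyDate)
SELECT j.customerId, h.description, h.[date]
FROM OPENJSON(@json)
WITH (
    customerId INT '$.customerId',
    history NVARCHAR(MAX) '$.history' AS JSON
) AS j
CROSS APPLY OPENJSON(j.history)
WITH (
    description VARCHAR(255) '$.description',
    [date] VARCHAR(255) '$.date'
) AS h;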

How to define PRIMARY KEY in azure cosmosdb arm template

Is it possible to define a PRIMARY KEY in an Azure Cosmos DB Cassandra ARM template?
Let's say I have the following table:
CREATE TABLE foo
(
    id text,
    name text,
    PRIMARY KEY (id)
)
And my ARM template:
"schema":{
"columns":[
{
"name":"id",
"type":"text"
}
],
"partitionKeys":[
{"name":"id"} // how to define primary key ?
}
The primary key in Cassandra consists of one or more partition columns and zero or more clustering columns. In ARM templates these are defined as the partitionKeys and clusterKeys arrays of objects. Here is the example from the documentation:
"partitionKeys": [
{ "name": "machine" },
{ "name": "cpu" },
{ "name": "mtime" }
],
"clusterKeys": [
{
"name": "loadid",
"orderBy": "asc"
}
]
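Applied to the table in the question, where PRIMARY KEY (id) has a single partition column and no clustering columns, the schema would presumably reduce to the following (my reading of the answer above, not a verified template):

"schema": {
    "columns": [
        { "name": "id", "type": "text" },
        { "name": "name", "type": "text" }
    ],
    "partitionKeys": [
        { "name": "id" }
    ]
}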
