I have several JSON documents with the structure below in my Cosmos DB.
[
{
"USA": {
"Applicable": "Yes",
"Location": {
"City": [
"San Jose",
"San Diego"
]
}
}
}]
I want to query all the documents whose City array contains the value "San Diego".
I've tried the SQL queries below:
SELECT DISTINCT *
FROM c["USA"]["Location"]
WHERE ["City"] IN ('San Diego')
SELECT DISTINCT *
FROM c["USA"]["Location"]
WHERE ["City"] = 'San Diego'
SELECT c
FROM c JOIN d IN c["USA"]["Location"]
WHERE d["City"] = 'San Diego'
All of them return no results (the result count shows 0 - 0).
Your queries compare the City array itself to a string, so they never match. Instead, query the whole document and check whether the USA.Location.City array contains the value. For example:
SELECT *
FROM c
WHERE ARRAY_CONTAINS (c.USA.Location.City, "San Jose")
This will give you what you're trying to achieve.
Note: your schema has a slight anti-pattern. It uses "USA" as a key, which means you can't easily query across all the country names. Consider replacing it with something like:
{
"Country": "USA",
"CountryMetadata": {
"Applicable": "Yes",
"Location": {
"City": [
"San Jose",
"San Diego"
]
}
}
}
This lets you query all the different countries. And the query above would then need only a slight change:
SELECT *
FROM c
WHERE c.Country = "USA"
AND ARRAY_CONTAINS (c.CountryMetadata.Location.City, "San Jose")
Note that the query now works for any country, and you can pass the country value in as a parameter (instead of hardcoding the country name into the query because it is an actual key name).
TL;DR: don't use values as your keys.
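If you already have documents in the old shape, the reshaping itself is mechanical. Here's a minimal Python sketch, assuming each document has exactly one top-level country key as in the sample. The helper names normalize and has_city are made up for illustration, and has_city only mimics the WHERE clause in memory; it does not call Cosmos DB.

```python
def normalize(doc):
    """Turn {"USA": {...}} into {"Country": "USA", "CountryMetadata": {...}}."""
    # Assumption: exactly one top-level key (the country), as in the sample.
    # System fields such as "id" are not handled in this sketch.
    (country, metadata), = doc.items()
    return {"Country": country, "CountryMetadata": metadata}


def has_city(doc, country, city):
    """Rough in-memory equivalent of the WHERE clause in the reshaped query."""
    cities = doc["CountryMetadata"]["Location"]["City"]
    return doc["Country"] == country and city in cities


old_doc = {
    "USA": {
        "Applicable": "Yes",
        "Location": {"City": ["San Jose", "San Diego"]},
    }
}

new_doc = normalize(old_doc)
print(new_doc["Country"])                     # USA
print(has_city(new_doc, "USA", "San Diego"))  # True
```

The country is now an ordinary value, so the same check works for any country you pass in.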
I have many records in a Cosmos DB container with the following structure (sample):
{
"id": "aaaa",
"itemCode": "1234",
"itemDesc": "TEST",
"otherfileds": ""
}
{
"id": "bbbb",
"itemCode": "1234",
"itemDesc": "TEST2",
"otherfileds": ""
}
{
"id": "cccc",
"itemCode": "5678",
"itemDesc": "HELLO",
"otherfileds": ""
}
{
"id": "dddd",
"itemCode": "5678",
"itemDesc": "HELLO",
"otherfileds": ""
}
{
"id": "eeee",
"itemCode": "9012",
"itemDesc": "WORLD",
"otherfileds": ""
}
{
"id": "ffff",
"itemCode": "9012",
"itemDesc": "WORLD",
"otherfileds": ""
}
Now I want to select the records where an item code has more than one distinct item description. Based on the sample records above, I would like to return item code 1234, since its records have different item descriptions:
{
"id": "aaaa",
"itemCode": "1234",
"itemDesc": "TEST",
"otherfileds": ""
}
{
"id": "bbbb",
"itemCode": "1234",
"itemDesc": "TEST2",
"otherfileds": ""
}
I have tried the query below, but realised that it only returns duplicate entries that share both the item code and the description.
select count(1) from (select distinct value d.itemCode FROM (SELECT
c.itemCode, c.itemDesc, COUNT(1) as dupcount
FROM
c where c.itemCode<>null
GROUP BY
c.itemCode, c.itemDesc) d where d.dupcount>1 )
But I need to find records where the same item code has different item descriptions (the query above only returns records with more than one occurrence of the same item code/description pair, i.e. item codes 9012 and 5678).
EDIT
I think I managed to form the query to filter these results using 2 sub-queries (I think this could be improved though).
select e.itemCode from (select d.itemCode, count(1) as dupcount FROM
(SELECT
c.itemCode, c.itemDesc
FROM
c where c.itemCode<>null
GROUP BY
c.itemCode, c.itemDesc) d group by d.itemCode )e where e.dupcount>1
I think I managed to form the query to filter these results by 2 sub-queries (I think this could be improved though).
select distinct e.itemCode from (select d.itemCode, count(1) as dupcount FROM
(SELECT
c.itemCode, c.itemDesc
FROM
c where c.itemCode<>null
GROUP BY
c.itemCode, c.itemDesc) d group by d.itemCode )e where e.dupcount>1
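To make the logic of the two nested GROUP BYs easier to follow, here is the same idea as a small in-memory Python sketch over the sample records: collect the distinct descriptions per item code, then keep the codes that have more than one. The function name is made up for illustration and nothing here talks to Cosmos DB.

```python
from collections import defaultdict

records = [
    {"id": "aaaa", "itemCode": "1234", "itemDesc": "TEST"},
    {"id": "bbbb", "itemCode": "1234", "itemDesc": "TEST2"},
    {"id": "cccc", "itemCode": "5678", "itemDesc": "HELLO"},
    {"id": "dddd", "itemCode": "5678", "itemDesc": "HELLO"},
    {"id": "eeee", "itemCode": "9012", "itemDesc": "WORLD"},
    {"id": "ffff", "itemCode": "9012", "itemDesc": "WORLD"},
]


def codes_with_conflicting_descriptions(items):
    """Return item codes that appear with more than one distinct description."""
    descs_by_code = defaultdict(set)
    for item in items:
        # Mirrors "where c.itemCode <> null": skip records without a code.
        if item.get("itemCode") is not None:
            # Inner GROUP BY itemCode, itemDesc: a set keeps each pair once.
            descs_by_code[item["itemCode"]].add(item["itemDesc"])
    # Outer GROUP BY itemCode with dupcount > 1.
    return sorted(code for code, descs in descs_by_code.items() if len(descs) > 1)


print(codes_with_conflicting_descriptions(records))  # ['1234']
```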
Using this structure as an example, saved in a Cosmos database (course-database) in a collection (course-collection):
{
"courseId": "courseId",
"sessions": [
{
"sessionId": "sessionId1",
"venues": [
{
"id": "venueId1"
},
{
"id": "venueId2"
}
]
},
{
"sessionId": "sessionId2",
"venues": [
{
"id": "venueId3"
},
{
"id": "venueId4"
}
]
}
]
}
How do you create the SQL:
Count the total number of courses where a course has at least one session, which has at least one venue, which has an ID equal to, e.g., venueId3.
I've got this so far, but it restricts to the first item of the list, as opposed to just any:
SELECT COUNT(c.id) FROM c WHERE c.sessions[0].venues[0].id = "id"
The answer was to use JOIN:
SELECT COUNT(c.id)
FROM c
JOIN s in c.sessions
JOIN v in s.venues
WHERE CONTAINS(v.id,"venueId3")
You would then add another JOIN for each deeper level of the JSON you want to reach, e.g. if venues had an array of contacts:
SELECT COUNT(c.id)
FROM c
JOIN s in c.sessions
JOIN v in s.venues
JOIN co in v.contacts
WHERE CONTAINS(co.id,"contactId")
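One thing worth checking with the JOIN approach: each JOIN produces one row per array element, so a course whose sessions contain the matching venue more than once contributes more than one row to the count. Also note that CONTAINS is a substring test, so the sketch below uses plain equality. This is an in-memory Python illustration only; the second course is hypothetical data added to show the difference, and the function names are made up.

```python
# Sample course from the question plus a hypothetical second course
# in which venueId3 appears in two different sessions.
courses = [
    {
        "courseId": "courseId",
        "sessions": [
            {"sessionId": "sessionId1", "venues": [{"id": "venueId1"}, {"id": "venueId2"}]},
            {"sessionId": "sessionId2", "venues": [{"id": "venueId3"}, {"id": "venueId4"}]},
        ],
    },
    {
        "courseId": "courseB",
        "sessions": [
            {"sessionId": "sessionId3", "venues": [{"id": "venueId3"}]},
            {"sessionId": "sessionId4", "venues": [{"id": "venueId3"}]},
        ],
    },
]


def venues(course):
    """Yield every venue of every session, like JOIN s ... JOIN v."""
    for session in course.get("sessions", []):
        yield from session.get("venues", [])


def count_joined_rows(docs, venue_id):
    """One count per matching (course, session, venue) row."""
    return sum(1 for doc in docs for v in venues(doc) if v["id"] == venue_id)


def count_courses(docs, venue_id):
    """One count per course that has at least one matching venue."""
    return sum(1 for doc in docs if any(v["id"] == venue_id for v in venues(doc)))


print(count_joined_rows(courses, "venueId3"))  # 3
print(count_courses(courses, "venueId3"))      # 2
```

If the per-row count is not what you want, filtering with an EXISTS subquery instead of a plain JOIN is one way to get a single result per course.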
I have the following JSON in my Cosmos DB:
[
{
"FirstName": "FirstName",
"LastName": "LastName",
"TechnologyRatings": [
{
"Technology": {
"Name": "C#",
"id": "d76d59a7-c9a3-404d-91dd-cf2596ee7501"
},
"Rating": 1
},
{
"Technology": {
"Name": "SQL",
"id": "5686189b-ccfc-41c6-bcdb-b56f80130b45"
},
"Rating": 2
}
],
"id": "7c34718f-ef01-4b40-9a03-f0880f424fd4",
"ModifiedAt": "2021-05-28T09:55:37.6260562Z",
"_rid": "GyRkALN-kZcCAAAAAAAAAA==",
"_self": "dbs/GyRkAA==/colls/GyRkALN-kZc=/docs/GyRkALN-kZcCAAAAAAAAAA==/",
"_etag": "\"00000000-0000-0000-53a7-9c3d693501d7\"",
"_attachments": "attachments/",
"_ts": 1622195737
}
]
Now I am trying to apply a filter on Technology.id and Rating. That is, I want to select all entries that have, for example, C# with Rating = 1 and SQL with Rating = 2.
Something like
(Technology.id = "d76d59a7-c9a3-404d-91dd-cf2596ee7501" and Rating = 1) OR (Technology.id = "5686189b-ccfc-41c6-bcdb-b56f80130b45" and Rating = 2)
As TechnologyRatings is an array, that doesn't work.
I also played around with ARRAY_CONTAINS but I didn't get it to work.
SELECT VALUE c FROM c JOIN t IN c.TechnologyRatings WHERE ARRAY_CONTAINS([{"id": "d76d59a7-c9a3-404d-91dd-cf2596ee7501", "Rating": 1}, {"id": "5686189b-ccfc-41c6-bcdb-b56f80130b45", "Rating": 2}], {"id": t.Technology.id, "Rating": t.Rating}, true)
How can I write such a query?
You can try this SQL:
SELECT
Distinct VALUE c
FROM c
JOIN t IN c.TechnologyRatings
WHERE (t.Technology.id = "d76d59a7-c9a3-404d-91dd-cf2596ee7501" and t.Rating = 1) OR (t.Technology.id = "5686189b-ccfc-41c6-bcdb-b56f80130b45" and t.Rating = 2)
or
SELECT
VALUE c
FROM c
WHERE
(ARRAY_CONTAINS(c.TechnologyRatings, {"Technology": {"id": "d76d59a7-c9a3-404d-91dd-cf2596ee7501"}, "Rating": 1}, true))
OR
(ARRAY_CONTAINS(c.TechnologyRatings, {"Technology": {"id": "5686189b-ccfc-41c6-bcdb-b56f80130b45"}, "Rating": 2}, true))
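One thing to watch with array filters like this: the id and the rating have to be tested on the same array element. If they are tested independently, a document can match because one element has the right id and a different element has the right rating. A small in-memory Python sketch of the difference (function names are made up for illustration, and this only models the matching logic, not Cosmos DB itself):

```python
CSHARP = "d76d59a7-c9a3-404d-91dd-cf2596ee7501"
SQL = "5686189b-ccfc-41c6-bcdb-b56f80130b45"

doc = {
    "FirstName": "FirstName",
    "TechnologyRatings": [
        {"Technology": {"Name": "C#", "id": CSHARP}, "Rating": 1},
        {"Technology": {"Name": "SQL", "id": SQL}, "Rating": 2},
    ],
}


def same_element_match(document, wanted):
    """True if a single element carries one of the wanted (id, rating) pairs."""
    return any(
        (t["Technology"]["id"], t["Rating"]) in wanted
        for t in document["TechnologyRatings"]
    )


def independent_match(document, wanted):
    """True if the id and the rating each occur somewhere, not necessarily together."""
    ratings = document["TechnologyRatings"]
    return any(
        any(t["Technology"]["id"] == tech_id for t in ratings)
        and any(t["Rating"] == rating for t in ratings)
        for tech_id, rating in wanted
    )


# C# with Rating 2 does not exist in the document, but Rating 2 exists on SQL.
print(same_element_match(doc, {(CSHARP, 2)}))  # False
print(independent_match(doc, {(CSHARP, 2)}))   # True (a false positive)
```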
Here's the query:
SELECT VALUE root FROM root JOIN (SELECT VALUE EXISTS(SELECT VALUE tRatings FROM root JOIN tRatings IN root["TechnologyRatings"]
WHERE ((tRatings["Technology"]["id"] = "5686189b-ccfc-41c6-bcdb-b56f80130b45") OR (tRatings["Technology"]["id"] = "d76d59a7-c9a3-404d-91dd-cf2596ee7501")))) AS found WHERE found
Note that you should include the partition key in the query; otherwise it runs as a cross-partition query, which adds latency and cost.
If the partition key was the 'id' field, the query would look like this:
SELECT VALUE root FROM root JOIN (SELECT VALUE EXISTS(SELECT VALUE tRatings FROM root JOIN tRatings IN root["TechnologyRatings"]
WHERE ((tRatings["Technology"]["id"] = "5686189b-ccfc-41c6-bcdb-b56f80130b45") OR (tRatings["Technology"]["id"] = "d76d59a7-c9a3-404d-91dd-cf2596ee7501")))) AS found
WHERE ((root["id"] = "5686189b-ccfc-41c6-bcdb-b56f80130b45") AND found)
The query with the partition key has the following stats
I want to use Azure Data Factory to call an API, parse some JSON data, and save it into an Azure SQL database.
There is no API that returns all the customer data in one go, so this can only be done one customer at a time. This cannot be changed. I have a customer API which returns a basic list of customer numbers, for example:
{
"customerId": 100001,
"customerId": 100002,
"customerId": 100003,
"customerId": 100004,
"customerId": 100005,
"customerId": 100006,
"customerId": 100007,
}
I am using an HTTP API connection in Data Factory to retrieve this list, then using a ForEach loop to go through the customers one by one and trigger another pipeline. This other pipeline gets the customer data, which looks like this:
{
"customerId": 125488,
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021"
},
"phoneNumber": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "fax",
"number": "646 555-4567"
}
],
"history": [
{
"description": "registered",
"date": "2021-01-01T01:45:53.5736067Z"
},
{
"description": "verificationRequired",
"date": "2021-01-01T01:49:53.5736067Z"
},
{
"description": "verified",
"date": "2021-01-01T01:56:53.5736067Z"
}
]
}
My goal is to put customerid, firstname, lastname and age into a customers table, like the one shown below.
create table customer (
customerId int,
firstName varchar(255),
lastName varchar(255),
age int
)
This part I have already done using the copy data (api to sql). My next goal is to put the phone numbers into a phone number table, like the one shown below.
create table phonenumber (
customerId int,
phoneNumber varchar(255),
phoneType varchar(255)
)
I also want to repeat this for customer history too.
I am using the copy data task in a pipeline to move the customer data into the customer table, but I cannot do multiple outputs writing to different tables. The only way I can think to do this is having three different pipelines for the three different tables, but this means calling the api three different times for the same data. There must be a better way?
Value your help
Peace, Amjid
I think you can create a stored procedure to move the customer data into several tables.
My example is as follows:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
alter PROCEDURE [dbo].[uspCustomer] @json NVARCHAR(MAX)
AS
BEGIN
-- Scalar customer fields go into dbo.customer
INSERT INTO dbo.customer(customerId,firstName,lastName,age)
SELECT customerId,firstName,lastName,age
FROM OPENJSON(@json)
WITH (
customerId INT '$.customerId',
firstName VARCHAR(255) '$.firstName',
lastName VARCHAR(255) '$.lastName',
age INT '$.age'
);
-- First phone number (array index 0)
INSERT INTO dbo.phonenumber(customerId,phoneNumber,phoneType)
SELECT customerId,phoneNumber,phoneType
FROM OPENJSON(@json)
WITH (
customerId INT '$.customerId',
phoneNumber VARCHAR(255) '$.phoneNumber[0].number',
phoneType VARCHAR(255) '$.phoneNumber[0].type'
);
-- Second phone number (array index 1); assumes the customer has two entries
INSERT INTO dbo.phonenumber(customerId,phoneNumber,phoneType)
SELECT customerId,phoneNumber,phoneType
FROM OPENJSON(@json)
WITH (
customerId INT '$.customerId',
phoneNumber VARCHAR(255) '$.phoneNumber[1].number',
phoneType VARCHAR(255) '$.phoneNumber[1].type'
);
END
The following is a test of the stored procedure.
DECLARE @json NVARCHAR(MAX);
SET @json = '{"customerId": 125488,"firstName": "John","lastName": "Smith","age": 25,"address": {"streetAddress": "21 2nd Street","city": "New York","state": "NY","postalCode": "10021"},"phoneNumber":[{"type": "home","number": "212 555-1234"},{"type": "fax","number": "646 555-4567"}]}';
exec [dbo].[uspCustomer] @json
The result is as follows:
In Data Factory, I think you can use a Stored Procedure activity to pass this JSON object to the procedure.
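The underlying idea is to fetch the JSON once and fan it out to several tables in one step. Here is a rough Python sketch of that idea, using an in-memory SQLite database purely as a stand-in for Azure SQL. The customerhistory table and its columns are assumptions (the question doesn't define them), and shred is a hypothetical helper name.

```python
import json
import sqlite3

# Trimmed copy of the sample customer payload from the question.
payload = json.loads("""
{
  "customerId": 125488, "firstName": "John", "lastName": "Smith", "age": 25,
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ],
  "history": [
    {"description": "registered", "date": "2021-01-01T01:45:53.5736067Z"},
    {"description": "verificationRequired", "date": "2021-01-01T01:49:53.5736067Z"},
    {"description": "verified", "date": "2021-01-01T01:56:53.5736067Z"}
  ]
}
""")


def shred(customer):
    """Split one customer document into rows for the three tables."""
    cid = customer["customerId"]
    customer_row = (cid, customer["firstName"], customer["lastName"], customer["age"])
    # One row per array element, however many phone numbers there are.
    phone_rows = [(cid, p["number"], p["type"]) for p in customer.get("phoneNumber", [])]
    history_rows = [(cid, h["description"], h["date"]) for h in customer.get("history", [])]
    return customer_row, phone_rows, history_rows


conn = sqlite3.connect(":memory:")  # stand-in for the Azure SQL database
conn.executescript("""
CREATE TABLE customer (customerId INT, firstName TEXT, lastName TEXT, age INT);
CREATE TABLE phonenumber (customerId INT, phoneNumber TEXT, phoneType TEXT);
-- Assumed shape; the question does not define the history table.
CREATE TABLE customerhistory (customerId INT, description TEXT, eventDate TEXT);
""")

customer_row, phone_rows, history_rows = shred(payload)
conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", customer_row)
conn.executemany("INSERT INTO phonenumber VALUES (?, ?, ?)", phone_rows)
conn.executemany("INSERT INTO customerhistory VALUES (?, ?, ?)", history_rows)
conn.commit()
```

The API response is parsed once and every table is filled from that single parse, which is the same thing the stored procedure does on the database side.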
I want to get entire document in a bucket using N1ql
My document name is CR_5031_114156723_2016-08-02, where everything other than the CR prefix differs from document to document.
I tried the 2 queries below, but I only get the OrderDetails part:
select d.* from Delivery.OrderDetails d where d.orderId in ['114156723']
select d.*,Delivery.OrderLines from Delivery.OrderDetails d where d.orderId in ['114156723']
Delivery is my bucket name.
Below is the document
Please help me write a query to get the entire document.
{
"OrderDetails": {
"orderId": "114156737",
"vanNumber": "5J",
"voucherPromotionName": "Computers for Schools",
"customerNumber": "85516242",
"shortOrderNumber": "4692",
"VoucherName": "Clubcard Voucher"
},
"OrderLines": {
"Product": [
{
"isApplicableForVat": "N",
"productQuantity": "6",
"productId": "52599951",
"productDescription": "Ni Pstrd S/Skimmed Milk 3ltr "
},
{
"isApplicableForVat": "Y",
"productQuantity": "1",
"productId": "55771771",
"productDescription": "Dale Farm Vanilla Ice Cream 1ltr *"
}
]
},
"DeliveryDetails": {
"deliverySlotStartTime": "20:00",
"deliverySlotEndTime": "21:00"
},
"ECoupons": {
"coupon": "0.0000"
},
"_class": "com.model.CustomerReceipt",
"OutOfStockProducts": {}
}
Thanks
Do this to get the full document:
select d from Delivery d where d.OrderDetails.orderId in ['114156723']