I have simple JSON documents that contain an array. When I query the array with ARRAY_CONTAINS or a JOIN, the Request Unit (RU) charge is too high.
select value count(0)
From c
where ARRAY_CONTAINS(c.signerDetails, {'displayStatus':0}, true)
and c.orgId = "e27dd002-bad3-4444-aa2b-855e5d4e79c8"
The ARRAY_CONTAINS query above costs 1882 RU for a count of 40518.
select count(0)
From c WHERE c.orgId = "e27dd002-bad3-4444-aa2b-855e5d4e79c8"
and EXISTS(SELECT VALUE t FROM t IN c.signerDetails WHERE t.displayStatus = 0)
The EXISTS query above costs 1960 RU for the same count of 40518.
{
"id": "1b675bb2-f783-48a0-93bb-967a990b5204f901795b-d18d-4c2d-a68c-61adf7e8e8da",
"orgId": "e27dd002-bad3-4444-aa2b-855e5d4e79c8", //Partition key
"userDetails": [
{
"userName": "",
"userEmail": "",
"displayStatus": 4
},
{
"userName": "",
"userEmail": "",
"displayStatus": 1
}
],
"status": "Completed",
"_ts": 1619535494
}
Please share your thoughts on reducing the RU charge. Do I need to change the JSON data format, or should I consider a different database for this kind of operation?
PS: My real queries on the array are more complex; I removed those parts from the sample above for brevity.
I'm not sure how to write queries for Cosmos DB, as I'm used to SQL. My question is how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far, but apparently I don't understand how they work very well.
In a structure such as the one below, how do I query for the city with the largest population across all states using the Data Explorer in Azure?
{
"id": 1,
"states": [
{
"name": "New York",
"cities": [
{
"name": "New York",
"population": 8500000
},
{
"name": "Hempstead",
"population": 750000
},
{
"name": "Brookhaven",
"population": 500000
}
]
},
{
"name": "California",
"cities":[
{
"name": "Los Angeles",
"population": 4000000
},
{
"name": "San Diego",
"population": 1400000
},
{
"name": "San Jose",
"population": 1000000
}
]
}
]
}
As far as I know, this is currently not possible.
The query would look something like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user-defined function (UDF) that lets you write the query you probably expect, similar to the one in this question: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
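function OnlyMaxPop(states){
    // Order states by the population of their largest city, descending.
    function compareStates(stateA, stateB){
        return stateB.cities[0].population - stateA.cities[0].population;
    }
    // For each state, keep only the city (or cities) with the highest population.
    var onlyWithOneCity = states.map(s => {
        var maxpop = Math.max.apply(Math, s.cities.map(o => o.population));
        return {
            name: s.name,
            cities: s.cities.filter(x => x.population === maxpop)
        }
    });
    // The first state after sorting holds the most populous city overall.
    return onlyWithOneCity.sort(compareStates)[0];
}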
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.
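If the goal is just the single most populous city across all states, the same idea can be sketched by flattening the nested arrays and keeping the maximum. This is a minimal sketch in plain JavaScript rather than a tested Cosmos DB UDF; the name `maxPopulationCity` and the shape of the returned object are assumptions made for illustration.

```javascript
// Hypothetical helper (not from the original answer): returns
// { stateName, cityName, population } for the most populous city across
// all states, or null when there are no cities at all.
function maxPopulationCity(states) {
  let best = null;
  for (const state of states || []) {
    for (const city of state.cities || []) {
      if (best === null || city.population > best.population) {
        best = { stateName: state.name, cityName: city.name, population: city.population };
      }
    }
  }
  return best;
}

// Sample data taken from the document in the question.
const sampleStates = [
  { name: "New York", cities: [
    { name: "New York", population: 8500000 },
    { name: "Hempstead", population: 750000 },
    { name: "Brookhaven", population: 500000 }
  ] },
  { name: "California", cities: [
    { name: "Los Angeles", population: 4000000 },
    { name: "San Diego", population: 1400000 },
    { name: "San Jose", population: 1000000 }
  ] }
];

console.log(maxPopulationCity(sampleStates));
// { stateName: 'New York', cityName: 'New York', population: 8500000 }
```

Assuming the function is registered as a UDF under that name, the query would then be along the lines of SELECT udf.maxPopulationCity(c.states) FROM c.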
I have the following JSON in my Cosmos DB:
[
{
"FirstName": "FirstName",
"LastName": "LastName",
"TechnologyRatings": [
{
"Technology": {
"Name": "C#",
"id": "d76d59a7-c9a3-404d-91dd-cf2596ee7501"
},
"Rating": 1
},
{
"Technology": {
"Name": "SQL",
"id": "5686189b-ccfc-41c6-bcdb-b56f80130b45"
},
"Rating": 2
}
],
"id": "7c34718f-ef01-4b40-9a03-f0880f424fd4",
"ModifiedAt": "2021-05-28T09:55:37.6260562Z",
"_rid": "GyRkALN-kZcCAAAAAAAAAA==",
"_self": "dbs/GyRkAA==/colls/GyRkALN-kZc=/docs/GyRkALN-kZcCAAAAAAAAAA==/",
"_etag": "\"00000000-0000-0000-53a7-9c3d693501d7\"",
"_attachments": "attachments/",
"_ts": 1622195737
}
]
Now I am trying to filter on Technology.id and Rating. For example, I want to select all entries that have C# with Rating = 1 or SQL with Rating = 2.
Something like
(Technology.id = "d76d59a7-c9a3-404d-91dd-cf2596ee7501" and Rating = 1) OR (Technology.id = "5686189b-ccfc-41c6-bcdb-b56f80130b45" and Rating = 2)
Because TechnologyRatings is an array, that doesn't work.
I also experimented with ARRAY_CONTAINS, but I couldn't get it to work:
SELECT VALUE c FROM c JOIN t IN c.TechnologyRatings WHERE ARRAY_CONTAINS([{"id": "d76d59a7-c9a3-404d-91dd-cf2596ee7501", "Rating": 1}, {"id": "5686189b-ccfc-41c6-bcdb-b56f80130b45", "Rating": 2}], {"id": t.Technology.id, "Rating": t.Rating}, true)
How can I write such a query?
You can try this SQL:
SELECT
Distinct VALUE c
FROM c
JOIN t IN c.TechnologyRatings
WHERE (t.Technology.id = "d76d59a7-c9a3-404d-91dd-cf2596ee7501" and t.Rating = 1) OR (t.Technology.id = "5686189b-ccfc-41c6-bcdb-b56f80130b45" and t.Rating = 2)
or
SELECT
VALUE c
FROM c
WHERE
(ARRAY_CONTAINS(c.TechnologyRatings,{"Technology": {"id":"d76d59a7-c9a3-404d-91dd-cf2596ee7501"}},true) and ARRAY_CONTAINS(c.TechnologyRatings,{"Rating":1},true))
OR
(ARRAY_CONTAINS(c.TechnologyRatings,{"Technology": {"id":"5686189b-ccfc-41c6-bcdb-b56f80130b45"}},true) and ARRAY_CONTAINS(c.TechnologyRatings,{"Rating":2},true))
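One thing worth checking with the second form: the two ARRAY_CONTAINS calls in each pair are evaluated independently, so the id and the Rating can be matched by different array elements. The sketch below mimics both conditions on an in-memory document in plain JavaScript to show the difference. It does not call Cosmos DB, and the helper names and the sample document are made up for illustration.

```javascript
// Illustrative document: C# has Rating 2 and SQL has Rating 1,
// i.e. the reverse of what the filter is looking for.
const doc = {
  TechnologyRatings: [
    { Technology: { Name: "C#", id: "d76d59a7-c9a3-404d-91dd-cf2596ee7501" }, Rating: 2 },
    { Technology: { Name: "SQL", id: "5686189b-ccfc-41c6-bcdb-b56f80130b45" }, Rating: 1 }
  ]
};

// Mimics the JOIN form: one and the same element must satisfy both conditions.
function sameElementMatch(d, techId, rating) {
  return d.TechnologyRatings.some(t => t.Technology.id === techId && t.Rating === rating);
}

// Mimics two independent containment checks: any element may satisfy each one.
function separateMatch(d, techId, rating) {
  return d.TechnologyRatings.some(t => t.Technology.id === techId) &&
         d.TechnologyRatings.some(t => t.Rating === rating);
}

const csharpId = "d76d59a7-c9a3-404d-91dd-cf2596ee7501";
console.log(sameElementMatch(doc, csharpId, 1)); // false
console.log(separateMatch(doc, csharpId, 1));    // true
```

For a document like this one, the JOIN form would not return it, while the form with separate checks would, so the two queries are only interchangeable when that distinction does not matter for your data.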
Here's the query:
SELECT VALUE root FROM root JOIN (SELECT VALUE EXISTS(SELECT VALUE tRatings FROM root JOIN tRatings IN root["TechnologyRatings"]
WHERE ((tRatings["Technology"]["id"] = "5686189b-ccfc-41c6-bcdb-b56f80130b45") OR (tRatings["Technology"]["id"] = "d76d59a7-c9a3-404d-91dd-cf2596ee7501")))) AS found WHERE found
Note that you should include the partition key in that query to avoid extra latency and cost.
If the partition key were the 'id' field, the query would look like this:
SELECT VALUE root FROM root JOIN (SELECT VALUE EXISTS(SELECT VALUE tRatings FROM root JOIN tRatings IN root["TechnologyRatings"]
WHERE ((tRatings["Technology"]["id"] = "5686189b-ccfc-41c6-bcdb-b56f80130b45") OR (tRatings["Technology"]["id"] = "d76d59a7-c9a3-404d-91dd-cf2596ee7501")))) AS found
WHERE ((root["id"] = "5686189b-ccfc-41c6-bcdb-b56f80130b45") AND found)
The query with the partition key has the following stats
We have the JSON structure below, which contains nested arrays of objects. Some of the arrays may be empty.
[
{
"adjustments": [
{
"id": "1_0000001",
"clientID": 1,
"adjustmentID": "0000001",
"chargeID": "0000001",
"dateOfEntry": "2019-01-29T00:00:00",
"adjustmentAmount": 200
}
],
"payments": [
{
"id": "1_0000001",
"clientID": 1,
"paymentID": "0000001",
"chargeID": "0000001",
"dateOfDeposit": "2019-01-28T00:00:00",
"dateOfEntry": "2019-01-29T00:00:00",
"paymentAmount": 250
},
{
"id": "1_0000002",
"clientID": 1,
"paymentID": "0000002",
"chargeID": "0000001",
"dateOfDeposit": "2019-01-28T00:00:00",
"dateOfEntry": "2019-01-29T00:00:00",
"paymentAmount": 50
}
],
"id": "1_0000001",
"clientID": 1,
"chargeID": "0000001",
"encounterID": "0000001",
"patientID": "1234567",
"dateOfServiceBegin": "2019-01-20T00:00:00",
"dateOfServiceEnd": "2019-01-20T00:00:00",
"dateOfEntry": "2019-01-21T00:00:00",
"location": "Main Campus",
"chargeTotal": 500
},
{
"adjustments": [],
"payments": [],
"id": "1_0000001",
"clientID": 1,
"chargeID": "0000001",
"encounterID": "0000001",
"patientID": "1234567",
"dateOfServiceBegin": "2019-02-20T00:00:00",
"dateOfServiceEnd": "2019-02-20T00:00:00",
"dateOfEntry": "2019-02-21T00:00:00",
"location": "Main Campus",
"chargeTotal": 500
}
]
I am trying to execute the query below:
SELECT udf.getMonthAndYearPart(c.dateOfEntry) as date, sum(p.paymentAmount) as paymentAmount , sum(c.chargeTotal) as chargeAmount , sum(a.adjustmentAmount) as adjustmentAmount FROM c
JOIN p IN c.payments
JOIN a IN c.adjustments
where c.dateOfEntry >= '2019-01-11T18:30:00.000Z' and c.dateOfEntry <= '2020-12-30T18:30:00.000Z'
GROUP BY udf.getMonthAndYearPart(c.dateOfEntry)
I am expecting the below result
[
{
"date": "January-2019",
"paymentAmount": 300,
"chargeAmount": 1000,
"adjustmentAmount": 400
},
{
"date": "February-2019",
"chargeAmount": 500
}
]
But I get only the first object:
[
{
"date": "January-2019",
"paymentAmount": 300,
"chargeAmount": 1000,
"adjustmentAmount": 400
}
]
Is there anything I can do without a JOIN? I want to calculate the sum of the child objects' amounts, grouped by month. Please help.
I found a solution myself, using subqueries and GROUP BY. Here is the query in case anyone needs it:
Select sum(k.totalPaymentAmount) as totalPaymentAmount,sum(k.totalAdjustmentAmount) as totalAdjustmentAmount,sum(k.totalCharge) as totalCharge,k.date as date From (SELECT
(SELECT value sum(c.paymentAmount) FROM c IN RevenueAnalytics.payments) as totalPaymentAmount,
(SELECT value sum(c.adjustmentAmount) FROM c IN RevenueAnalytics.adjustments) as totalAdjustmentAmount,
RevenueAnalytics.chargeTotal as totalCharge,
udf.getMonthAndYearPart(RevenueAnalytics.dateOfServiceBegin) as date
FROM RevenueAnalytics) k
Group BY k.date
In your case you would need a LEFT JOIN to include documents whose adjustments or payments arrays are empty. LEFT JOIN is not supported at the moment; you can vote on this thread to have the feature added. In the meantime, you can create a procedure that runs two separate queries: one using joins, as you are doing now, and one without joins that filters (in the WHERE clause) for entries where ARRAY_LENGTH is 0 for adjustments and payments. Then aggregate all the results and return them.
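The aggregation itself amounts to a per-document sum followed by a group-by on the month. A rough client-side sketch in JavaScript, working on documents that have already been fetched, might look like this. `monthAndYear` is only a stand-in for the question's udf.getMonthAndYearPart, whose real implementation is not shown, and `summarizeByMonth` and `sumField` are hypothetical names. Summing each array per document also sidesteps the row multiplication that occurs when payments and adjustments are joined in the same query, since that JOIN yields one row per payment and adjustment pair.

```javascript
const MONTHS = ["January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December"];

// Stand-in for udf.getMonthAndYearPart; assumes ISO-like dates and the
// "January-2019" output format seen in the question.
function monthAndYear(isoDate) {
  const year = isoDate.slice(0, 4);
  const monthIndex = Number(isoDate.slice(5, 7)) - 1;
  return MONTHS[monthIndex] + "-" + year;
}

// Sums one numeric field over an array; an empty or missing array gives 0.
function sumField(items, field) {
  return (items || []).reduce((total, item) => total + (item[field] || 0), 0);
}

// Groups documents by month of dateOfEntry and totals the three amounts.
function summarizeByMonth(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = monthAndYear(doc.dateOfEntry);
    const group = groups.get(key) ||
      { date: key, paymentAmount: 0, adjustmentAmount: 0, chargeAmount: 0 };
    group.paymentAmount += sumField(doc.payments, "paymentAmount");
    group.adjustmentAmount += sumField(doc.adjustments, "adjustmentAmount");
    group.chargeAmount += doc.chargeTotal || 0;
    groups.set(key, group);
  }
  return Array.from(groups.values());
}

// Abbreviated versions of the two documents from the question.
const chargeDocs = [
  { dateOfEntry: "2019-01-21T00:00:00", chargeTotal: 500,
    payments: [{ paymentAmount: 250 }, { paymentAmount: 50 }],
    adjustments: [{ adjustmentAmount: 200 }] },
  { dateOfEntry: "2019-02-21T00:00:00", chargeTotal: 500, payments: [], adjustments: [] }
];

console.log(summarizeByMonth(chargeDocs));
```

Documents with empty arrays still contribute their chargeTotal here, which is the behaviour a LEFT JOIN would have provided.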
How can you make a paginated request (limit, offset, and sort_by) using DynamoDB?
In MySQL you can do:
SELECT ... ORDER BY created_date ASC LIMIT 10 OFFSET 1
I'm trying this using Node.js. In this case created_date isn't the primary key; can I query using created_date as the sort key?
This is my users table
{
"user_id": "asa2311",
"created_date": "2019/01/18 15:05:59",
"status": "A",
"rab_item_id": "0",
"order_id": "1241241",
"description": "testajabroo",
"id": "e3f46600-1af7-11e9-ac22-8d3a3e79a693",
"title": "test"
},
{
"user_id": "asa2311",
"status_id": "D",
"created_date": "2019/01/18 14:17:46",
"order_id": "1241241",
"rab_item_id": "0",
"description": "testajabroo",
"id": "27b5b0d0-1af1-11e9-b843-77bf0166a09f",
"title": "test"
},
{
"user_id": "asa2311",
"created_date": "2019/01/18 15:05:35",
"status": "A",
"rab_item_id": "0",
"order_id": "1241241",
"description": "testajabroo",
"id": "d5879e70-1af7-11e9-8abb-0fa165e7ac53",
"title": "test"
}
Pagination in DynamoDB is handled by setting the ExclusiveStartKey parameter to the LastEvaluatedKey returned from the previous result. There is no way to start after a specific number of items like you can with OFFSET in MySQL.
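The loop this describes can be sketched as follows. It is a minimal outline in JavaScript and is not tied to any particular SDK version: `queryPage` is a hypothetical function standing in for the real Query call, assumed to accept Limit and ExclusiveStartKey and to return Items and LastEvaluatedKey. The in-memory `fakeQueryPage` exists only so the sketch runs without AWS.

```javascript
// Fetches every page by feeding each LastEvaluatedKey back in as the
// ExclusiveStartKey of the next request.
async function fetchAllPages(queryPage, pageSize) {
  const pages = [];
  let startKey; // undefined on the first request
  do {
    const result = await queryPage({ Limit: pageSize, ExclusiveStartKey: startKey });
    pages.push(result.Items);
    startKey = result.LastEvaluatedKey; // absent once there is nothing left
  } while (startKey !== undefined);
  return pages;
}

// Illustration only: rows already ordered by created_date, ids shortened
// from the sample items in the question.
const rows = [
  { id: "27b5b0d0", created_date: "2019/01/18 14:17:46" },
  { id: "d5879e70", created_date: "2019/01/18 15:05:35" },
  { id: "e3f46600", created_date: "2019/01/18 15:05:59" }
];

// In-memory stand-in that imitates the paging contract described above.
function fakeQueryPage({ Limit, ExclusiveStartKey }) {
  const start = ExclusiveStartKey === undefined
    ? 0
    : rows.findIndex(r => r.id === ExclusiveStartKey.id) + 1;
  const Items = rows.slice(start, start + Limit);
  const end = start + Items.length;
  const LastEvaluatedKey = end < rows.length ? { id: rows[end - 1].id } : undefined;
  return { Items, LastEvaluatedKey };
}

fetchAllPages(fakeQueryPage, 2).then(pages => console.log(pages.map(p => p.length)));
// [ 2, 1 ]
```

To show "page N" to a user, the client has to keep the key returned for page N-1, because there is no way to jump ahead by a numeric offset.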
I've some CosmosDB documents like the following
{
"ProductId": 1,
"Status": true,
"Code": "123456",
"IsRecall": false,
"ScanLog": [
{
"Location": {
"type": "Point",
"coordinates": [
13.5957758,
42.7111538
]
},
"TimeStamp": 201602160957190600,
"ScanType": 0,
"UserId": "1004"
},
{
"Location": {
"type": "Point",
"coordinates": [
13.5957907,
42.7111359
]
},
"TimeStamp": 201602161246336640,
"ScanType": 0,
"UserId": "1004"
}
]
}
How can I order the query results by the TimeStamp property? I've tried using this query
SELECT c.Code, b.TimeStamp FROM c JOIN b IN c.ScanLog ORDER BY b.TimeStamp
but I receive this error
Order-by over correlated collections is not supported.
What is the correct way to do this?
JOINs with ORDER BY are currently not supported.
However, here is a user defined function (UDF) that will do the trick:
function sortScanLog (scanLog) {
function compareTimeStamps(a, b) {
return a.TimeStamp - b.TimeStamp;
}
return scanLog.sort(compareTimeStamps);
}
You use it with a query like this:
SELECT c.ProductId, udf.sortScanLog(c.ScanLog) as ScanLog FROM c
If you want the opposite sort order, simply swap the a and b. So, the signature of the compareTimeStamps inner function would be:
function compareTimeStamps(b, a)
Alternatively, you can sort client-side after the results are returned.
Right now, ORDER BY clauses mixed with JOINs are not supported. The engine can use indexed properties for JOIN operations, but it cannot re-order results based on the JOIN result.
You would have to go with something like what Larry offered, or do the JOIN in the query and sort in your own code once the results arrive. If you use C#, for example, you can sort them with LINQ.
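For the client-side route, one way to sketch it (in JavaScript here rather than C# with LINQ) is to flatten each document's ScanLog into Code/TimeStamp pairs, which is what the JOIN would return, and then sort the flattened rows. `flattenAndSort` is a hypothetical helper name, the input is assumed to be documents already returned by a query, and the second sample document is invented for illustration.

```javascript
// Flattens documents into one row per ScanLog entry, then sorts the rows
// by TimeStamp in ascending order.
function flattenAndSort(docs) {
  const rows = [];
  for (const doc of docs) {
    for (const entry of doc.ScanLog || []) {
      rows.push({ Code: doc.Code, TimeStamp: entry.TimeStamp });
    }
  }
  // Swap a and b in the comparison for descending order.
  return rows.sort((a, b) => a.TimeStamp - b.TimeStamp);
}

// Note: these TimeStamp values are larger than Number.MAX_SAFE_INTEGER, so
// JavaScript stores them only approximately; they are still far enough
// apart here to compare correctly.
const scannedDocs = [
  { Code: "123456", ScanLog: [
    { TimeStamp: 201602161246336640 },
    { TimeStamp: 201602160957190600 }
  ] },
  { Code: "654321", ScanLog: [{ TimeStamp: 201602161100000000 }] } // invented document
];

console.log(flattenAndSort(scannedDocs).map(r => r.Code));
// [ '123456', '654321', '123456' ]
```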