Why does an Azure Cosmos query sorting by timestamp (string) cost so much more than by _ts (built in)? - azure

This query costs 265 RUs:
SELECT top 1 * FROM c
WHERE c.CollectPackageId = 'd0613cbb-492b-4464-b66b-3634b5571826'
ORDER BY c.StartFetchDateTimeUtc DESC
StartFetchDateTimeUtc is a string property, serialized by using the Cosmos API
This query costs 5 RUs:
SELECT top 1 * FROM c
WHERE c.CollectPackageId = 'd0613cbb-492b-4464-b66b-3634b5571826'
ORDER BY c._ts DESC
_ts is a built-in field, a Unix-based numeric timestamp.
Example result (only including this field and _ts):
"StartFetchDateTimeUtc": "2017-08-08T03:35:04.1654152Z",
"_ts": 1502163306
The index is in place and follows the suggestions and tutorials on how to configure a sortable string/timestamp. It looks like:
{
  "path": "/StartFetchDateTimeUtc/?",
  "indexes": [
    {
      "kind": "Range",
      "dataType": "String",
      "precision": -1
    }
  ]
}
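A Range index on a string path can only help ORDER BY if the strings themselves sort in chronological order. A minimal Python sketch of that property for fixed-width ISO 8601 UTC strings like the one above; only the first timestamp comes from the question, the other two are made up, and `parse_utc` is a hypothetical helper:

```python
from datetime import datetime, timezone

def parse_utc(ts: str) -> datetime:
    # Trim the 7 fractional digits to 6 (microseconds) and drop the trailing 'Z'.
    head, frac = ts.rstrip("Z").split(".")
    parsed = datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)

stamps = [
    "2017-08-08T03:35:04.1654152Z",  # from the question
    "2017-08-07T23:59:59.0000000Z",  # made-up value
    "2017-12-01T00:00:00.5000000Z",  # made-up value
]

by_string = sorted(stamps)               # plain lexicographic order
by_time = sorted(stamps, key=parse_utc)  # true chronological order
print(by_string == by_time)
```

This only holds while every value uses the same fixed-width UTC format; mixed offsets or varying precision would break the equivalence.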

According to this article, the following variables affect RU consumption: item size, item property count, data consistency, indexed properties, document indexing, query patterns, and script usage.
So it is very strange that ordering by a different property costs a different number of RUs.
I also created a test demo on my side (with your index and the same document properties) and inserted 1000 records into the DocumentDB collection. The two queries cost the same RU. I suggest you create a new collection and test again.
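To repeat this comparison yourself, you can read the request charge after each query. A rough Python sketch, assuming `container` is an `azure.cosmos` `ContainerProxy` you have already created; `run_and_measure` is a hypothetical helper name, and the header lookup only reflects the most recent response:

```python
QUERY_BY_STRING = (
    "SELECT TOP 1 * FROM c "
    "WHERE c.CollectPackageId = 'd0613cbb-492b-4464-b66b-3634b5571826' "
    "ORDER BY c.StartFetchDateTimeUtc DESC"
)
QUERY_BY_TS = (
    "SELECT TOP 1 * FROM c "
    "WHERE c.CollectPackageId = 'd0613cbb-492b-4464-b66b-3634b5571826' "
    "ORDER BY c._ts DESC"
)

def run_and_measure(container, query):
    """Run a query and return (items, request charge of the last response)."""
    # Assumption: `container` behaves like an azure.cosmos ContainerProxy.
    items = list(container.query_items(query=query, enable_cross_partition_query=True))
    # The SDK keeps the headers of the most recent response only; a query that
    # spans several pages would need the charge summed page by page instead.
    headers = container.client_connection.last_response_headers
    return items, float(headers["x-ms-request-charge"])

# Usage (needs a real container):
# for q in (QUERY_BY_STRING, QUERY_BY_TS):
#     _, charge = run_and_measure(container, q)
#     print(charge)
```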
The result is like this:
[Image: Order by StartFetchDateTimeUtc]
[Image: Order by _ts]

Related

Cosmos db null value

I have two kinds of records, shown below, in my table staudentdetail in Cosmos DB. In the example below, previousSchooldetail is a nullable field: it may or may not be present for a student.
Sample records below:
{
"empid": "1234",
"empname": "ram",
"schoolname": "high school ,bankur",
"class": "10",
"previousSchooldetail": {
"prevSchoolName": "1763440",
"YearLeft": "2001"
} --(Nullable)
}
{
"empid": "12345",
"empname": "shyam",
"schoolname": "high school",
"class": "10"
}
I am trying to access the above records from Azure Databricks using PySpark or Scala code. But when we build the DataFrame by reading from Cosmos DB, it does not bring previousSchooldetail into the DataFrame. When we change the query to include the id of a record that has previousSchooldetail, the field does show up in the DataFrame.
Case 1:-
val Query = "SELECT * FROM c "
Result when query fired directly
empid
empname
schoolname
class
Case 2:-
val Query = "SELECT * FROM c where c.empid=1234"
Result when query fired with where clause.
empid
empname
schoolname
class
previousSchooldetail
prevSchoolName
YearLeft
Could you please tell me why I am not able to get previousSchooldetail in case 1, and how I should proceed?
As @Jayendran mentioned in the comments, the first query will give you the previousSchooldetail property wherever it is available; otherwise the column will not be present.
You can have this column present in all scenarios by using the IS_DEFINED function. Try tweaking your query as below:
SELECT c.empid,
c.empname,
IS_DEFINED(c.previousSchooldetail) ? c.previousSchooldetail : null
as previousSchooldetail,
c.schoolname,
c.class
FROM c
If you are looking to get the result as a flat structure, it can be tricky, and you would need to use two separate queries such as:
Query 1
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
p.prevSchoolName,
p.YearLeft
FROM c JOIN c.previousSchooldetail p
Query 2
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
null as prevSchoolName,
null as YearLeft
FROM c
WHERE not IS_DEFINED (c.previousSchooldetail) or
c.previousSchooldetail = null
Unfortunately, Cosmos DB does not support LEFT JOIN or UNION. Hence, I'm not sure if you can achieve this in a single query.
Alternatively, you can create a stored procedure to return the desired result.
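Another option, if the documents are already being pulled into client code, is to flatten them there instead of in the query. A small Python sketch using the two sample records from the question; `flatten_student` is a hypothetical helper, not part of any Cosmos DB SDK:

```python
def flatten_student(doc):
    """Return a flat row, with None where previousSchooldetail is absent or null."""
    prev = doc.get("previousSchooldetail") or {}
    return {
        "empid": doc.get("empid"),
        "empname": doc.get("empname"),
        "schoolname": doc.get("schoolname"),
        "class": doc.get("class"),
        "prevSchoolName": prev.get("prevSchoolName"),
        "YearLeft": prev.get("YearLeft"),
    }

docs = [
    {
        "empid": "1234",
        "empname": "ram",
        "schoolname": "high school ,bankur",
        "class": "10",
        "previousSchooldetail": {"prevSchoolName": "1763440", "YearLeft": "2001"},
    },
    {"empid": "12345", "empname": "shyam", "schoolname": "high school", "class": "10"},
]

rows = [flatten_student(d) for d in docs]
for row in rows:
    print(row)
```

Every row ends up with the same six keys, which is the shape the two separate queries above would produce once their results are combined.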

Count and data in single query in Azure Cosmos DB

I want to return the count and the data with a single Cosmos SQL query.
Something like
Select *, count() from c
Or, if possible, I want to get the count in a JSON document:
[
{
"Count" : 1111
},
{
"Name": "Jon",
"Age" : 30
}
]
You're going to have to issue two separate queries - one to get the total number of documents matching your query, and a second to get a page of documents.
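A rough Python sketch of the two-query approach that assembles the JSON shape from the question on the client. It assumes `container` is an `azure.cosmos` `ContainerProxy` and that your SDK version supports cross-partition `VALUE COUNT` queries; `count_and_page` and `page_size` are hypothetical names:

```python
def count_and_page(container, page_size=10):
    """Issue a count query and a page query, then combine the results."""
    # Query 1: total number of documents (add your own WHERE clause to both queries).
    count_query = "SELECT VALUE COUNT(1) FROM c"
    total = list(
        container.query_items(query=count_query, enable_cross_partition_query=True)
    )[0]

    # Query 2: one page of documents.
    page_query = "SELECT * FROM c OFFSET 0 LIMIT @limit"
    page = list(
        container.query_items(
            query=page_query,
            parameters=[{"name": "@limit", "value": page_size}],
            enable_cross_partition_query=True,
        )
    )

    # Shape requested in the question: the count document first, then the data.
    return [{"Count": total}] + page
```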

Query to get all Cosmos DB documents referenced by another

Assume I have the following Cosmos DB container with the possible doc type partitions:
{
"id": <string>,
"partitionKey": <string>, // Always "item"
"name": <string>
}
{
"id": <string>,
"partitionKey": <string>, // Always "group"
"items": <array[string]> // Always an array of ids for items in the "item" partition
}
I have the id of a "group" document, but I do not have the document itself. What I would like to do is perform a query which gives me all "item" documents referenced by the "group" document.
I know I can perform two queries: 1) Retrieve the "group" document, 2) Perform a query with IN clause on the "item" partition.
As I don't care about the "group" document other than getting the list of ids, is it possible to construct a single query to get me all the "item" documents I want with just the "group" document id?
You'll need to perform two queries, as there are no joins between separate documents. Even though there is support for subqueries, only correlated subqueries are currently supported (meaning, the inner subquery is referencing values from the outer query). Non-correlated subqueries are what you'd need.
Note that, since you only need the list of ids, you don't have to retrieve the entire group document. You can project just the items property, which can then be used in your second query with something like array_contains(). Something like:
SELECT VALUE g.items
FROM g
WHERE g.id="1"
AND g.partitionKey="group"
SELECT VALUE i.name
FROM i
WHERE array_contains(<items-from-prior-query>,i.id)
AND i.partitionKey="item"
This documentation page clarifies the two subquery types and support for only correlated subqueries.
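Chaining the two queries from client code could look roughly like this in Python. It is only a sketch: it assumes `container` is an `azure.cosmos` `ContainerProxy`, that the container's partition key path is /partitionKey, and `items_for_group` is a hypothetical helper name:

```python
def items_for_group(container, group_id):
    """Return the names of all items referenced by the given group document."""
    # Query 1: project only the list of item ids from the group document.
    id_lists = list(
        container.query_items(
            query="SELECT VALUE g.items FROM g WHERE g.id = @id",
            parameters=[{"name": "@id", "value": group_id}],
            partition_key="group",
        )
    )
    if not id_lists:
        return []  # no such group document

    # Query 2: fetch the referenced items, passing the id list as a parameter.
    return list(
        container.query_items(
            query="SELECT VALUE i.name FROM i WHERE ARRAY_CONTAINS(@ids, i.id)",
            parameters=[{"name": "@ids", "value": id_lists[0]}],
            partition_key="item",
        )
    )
```

Both queries are scoped to a single partition, so neither needs a cross-partition fan-out.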

MongoDB sort by custom calculation in Node.JS mongodb driver

I'm using the Node.js MongoDB driver. I have a collection of job listings with a salary and a number of vacancies. I want to sort them by one rule: if either the salary or the number of vacancies is greater, the job gets higher priority in the sort. I came up with this simple formula:
( salary / 100 ) + num_of_vacancies
eg:
Top priority ones
{ salary: 5000 , num_of_vacancies: 500 } // value is 550
{ salary: 50000 , num_of_vacancies: 2 } // value is 502
And lower priority for
{ salary: 5000 , num_of_vacancies: 2 } // value is 52
But my problem is that, as far as I know, MongoDB sort only takes a property to sort on and an ascending or descending direction. How do I sort with a custom expression?
The data in MongoDB looks like this (not the full version):
{
title:"job title",
description:"job description",
salary:5000,
num_of_vacancy:50
}
This is just one option; adjust it for your Mongo driver.
With $addFields we create the field to sort on, named toSortLater just for semantic purposes.
Then add a $sort stage and sort high values first. Change -1 to 1 for the opposite behaviour.
db.collection.aggregate([
  {
    $addFields: {
      toSortLater: {
        $add: [
          { $divide: ["$salary", 100] },
          "$num_of_vacancies"
        ]
      }
    }
  },
  { $sort: { toSortLater: -1 } }
])
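To sanity-check the formula against the examples in the question, here is the same pipeline written as a plain Python structure, plus a local re-computation of the score. The `priority` function is a hypothetical stand-in for what the $addFields stage computes; with PyMongo the `pipeline` list could be passed to `collection.aggregate(pipeline)`:

```python
# Same two stages as the shell pipeline above, as Python dicts.
pipeline = [
    {
        "$addFields": {
            "toSortLater": {
                "$add": [{"$divide": ["$salary", 100]}, "$num_of_vacancies"]
            }
        }
    },
    {"$sort": {"toSortLater": -1}},
]

def priority(job):
    # Local equivalent of the $addFields expression: (salary / 100) + num_of_vacancies
    return job["salary"] / 100 + job["num_of_vacancies"]

# The three example documents from the question, deliberately out of order.
jobs = [
    {"salary": 5000, "num_of_vacancies": 2},
    {"salary": 5000, "num_of_vacancies": 500},
    {"salary": 50000, "num_of_vacancies": 2},
]

ranked = sorted(jobs, key=priority, reverse=True)
for job in ranked:
    print(priority(job), job)
```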

Cosmos DB SQL query to get highest value per day

I'm relatively new to Cosmos DB. I have created a database of temperature measurements where a new value is added every 5 minutes.
Here is an example of an item in the collection:
{
"id": "3445609a-c4ae-44b3-b8fa-a2e55082558b",
"temp": 14.31,
"timestamp": "2020-09-24T18:56:48.7828653+00:00",
"probeid": "01",
"lightvalue": "0",
"RelativeHumidity": "50.10"
}
I initially added the timestamp value before learning that the _ts value could be used for this.
I'm still in the logic design phase, trying to figure out the best pattern to make this stable and low in resource cost.
The use case would be: max, min, and average measurements per day within a given period, presented in a web UI.
Does this SQL achieve your requirement?
SELECT MAX(c.temp) AS max_temp,
MIN(c.temp) AS min_temp,
AVG(c.temp) AS average_temp,
LEFT(toString(TimestampToDateTime(c._ts*1000)),10) AS day
FROM c
WHERE c._ts*1000 < GetCurrentTimestamp()
GROUP BY LEFT(toString(TimestampToDateTime(c._ts*1000)),10)
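If you want to check what the query should return, the same per-day grouping can be reproduced on the client. A small Python sketch, assuming each reading carries the numeric `_ts` (seconds since the Unix epoch) and `temp`; `daily_stats` is a hypothetical helper and the sample readings are made up:

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_stats(readings):
    """Group readings by UTC day (from _ts) and compute max, min and average temp."""
    by_day = defaultdict(list)
    for r in readings:
        # _ts is in seconds; take the UTC calendar day as the group key.
        day = datetime.fromtimestamp(r["_ts"], tz=timezone.utc).date().isoformat()
        by_day[day].append(r["temp"])
    return {
        day: {
            "max_temp": max(temps),
            "min_temp": min(temps),
            "average_temp": sum(temps) / len(temps),
        }
        for day, temps in sorted(by_day.items())
    }

# Made-up sample readings: two on 2020-09-24 (UTC), one on 2020-09-25.
readings = [
    {"_ts": 1600973808, "temp": 14.31},
    {"_ts": 1600909200, "temp": 10.0},
    {"_ts": 1600992100, "temp": 12.5},
]

stats = daily_stats(readings)
for day, values in stats.items():
    print(day, values)
```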
