Why does SELECT count(1) FROM c change values each time I query it in CosmosDB Document Explorer?

I have a database with about 600-700 thousand documents. When I am in the Document Explorer and I execute "SELECT value count(1) FROM c", it returns values ranging from 64,000 to 72,000, seemingly at random. When I execute this using the Python SDK, it returns the actual count I mentioned above. Why is this?

The count query is limited by the RUs allocated to your collection. The response you received will include a continuation token; you have to keep requesting the next set of results and add the partial counts together to get the final total (a Python sketch of this follows the example below). For example, I tried a count query on my Cosmos DB, and these were the results:
First execution:
[
  {
    "$1": 184554
  }
]
Next continuation (clicking the Next button in the Azure portal Data Explorer):
[
  {
    "$1": 181909
  }
]
Next continuation (clicking the Next button in the Azure portal Data Explorer):
[
  {
    "$1": 25589
  }
]
So the final count is
184554 + 181909 + 25589 = 392,052
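This is also why the Python SDK returns the full number: its query iterator follows the continuation token for you. Below is a minimal sketch with the azure-cosmos package that does the same thing explicitly; the account URL, key, database, and container names are placeholders, and it simply sums whatever counts the iterator yields.

from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

# The iterator transparently follows continuation tokens, so summing every
# yielded value gives the true total, whether the SDK hands back one
# aggregated count or several partial ones.
partial_counts = container.query_items(
    query="SELECT VALUE COUNT(1) FROM c",
    enable_cross_partition_query=True,
)
print(sum(partial_counts))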

Related

CosmosDB Find average time for message to complete

I need some help with a SQL query in Cosmos. I have to find the average time it takes a message to complete during a load test from the events stored in our DB.
So far I can get the start and end times like this:
SELECT
(SELECT c.TimestampUTC WHERE c.Event = 'message-accepted') as StartTime,
(SELECT c.TimestampUTC WHERE c.Event = 'message-completed') as EndTime
FROM c
WHERE c.TrackingId = 'LoadTest' AND (c.Event = 'message-accepted' OR c.Event = 'message-completed')
But I get an error when I try to get the DateTimeDiff like this:
SELECT
(SELECT c.TimestampUTC WHERE c.Event = 'message-accepted') as StartTime,
(SELECT c.TimestampUTC WHERE c.Event = 'message-completed') as EndTime,
DateTimeDiff("second", StartTime, EndTime) as TotalTime
FROM c
WHERE c.TrackingId = 'LoadTest' AND (c.Event = 'message-accepted' OR c.Event = 'message-completed')
I am stuck here because I need the difference to use the AVG function. Any help would be appreciated.
EDIT
Here is a sample of the data stored in Cosmos
{
  "PartitionKey": "LoadTest",
  "RowKey": "4ee9709f-c826-4a88-9d6f-240ba439eb1d",
  "TrackingId": "LoadTest",
  "Event": "message-accepted",
  "TimestampUTC": "2022-09-14T19:12:18.8358914Z"
}
And this is the error I am getting when trying DateTimeDiff:
"Failed to query item for container enginelog:
One of the input values is invalid."
It is not giving much info, which is why I am looking for help. I am following the format for the function in the documentation here.
As per the sample data shared in the question, the assumption is that there are two different documents, one with the Event property value "message-accepted" and one with "message-completed". As David mentioned, functions can't be applied to properties that live in separate documents.
To achieve what is required, you may need to write client-side code to fetch the values from the separate documents and compute the difference there. Please refer to the link.
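For reference, here is a rough client-side sketch in Python using the azure-cosmos SDK. It assumes each accepted/completed pair shares a correlating field, called MessageId here, which is hypothetical and not present in the sample document; substitute whatever ties the two events of one message together. The container name is taken from the error message, and the account URL and key are placeholders.

from datetime import datetime
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("enginelog")

items = container.query_items(
    query=("SELECT c.MessageId, c.Event, c.TimestampUTC FROM c "
           "WHERE c.TrackingId = 'LoadTest' "
           "AND c.Event IN ('message-accepted', 'message-completed')"),
    enable_cross_partition_query=True,
)

starts, ends = {}, {}
for doc in items:
    # Trim to six fractional digits so datetime.fromisoformat can parse it.
    ts = datetime.fromisoformat(doc["TimestampUTC"][:26])
    target = starts if doc["Event"] == "message-accepted" else ends
    target[doc["MessageId"]] = ts

# Average the per-message durations for pairs that have both events.
durations = [(ends[k] - starts[k]).total_seconds() for k in starts.keys() & ends.keys()]
print(sum(durations) / len(durations) if durations else 0.0)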

How to send the output values of a Lookup activity in an email in Data Factory?

I'm trying to send a Lookup activity's output values as part of a body parameter in a POST request using a Logic App, which uses three parameters: "to", "email_body", "subject".
The LookUp activity depends on a query, and it may return from 2 rows up to 10 rows.
According to Azure, the output of the activity should look like this:
{
  "count": 2,
  "value": [
    {
      "column1": value1,
      "column2": value2,
      "column3": value3
    },
    {
      "column1": value4,
      "column2": value5,
      "column3": value6
    }
  ]
}
In this case, the query returned 2 rows, but how can I attach every output value to the POST body without having to use @activity('lookup_act').output.value[0].column1 and so on for every value?
The POST body is the following:
{
  "email_body": "Hi, the following tables have been updated:
  @{activity('lookup_act').output.value[0].column1}
  @{activity('lookup_act').output.value[1].column1}",
  "subject": "Update on tables",
  "to": "email@domain.com"
}
I've tried using @activity('lookup_act').output.value to bring every value but it won't work.
Is there a way to call every single output value? If so, how can it be done and paste into a table?
Thanks beforehand.
There are two ways to get all the values in the mail:
1. Get the whole lookup output array in the mail.
First get the results from the Lookup activity, then pass the output of that activity converted to a string; otherwise you will get an error regarding deserialization.
{"message":"@string(activity('Lookup1').output.value)",
"dataFactoryName":"@{pipeline().DataFactory}",
"pipelineName":"@{pipeline().Pipeline}",
"receiver":"@{pipeline().parameters.receiver}"}
2. Get all the respective values column-wise.
First get the results from the Lookup activity, then use a ForEach loop and create an Append variable activity for every column, so that each column's values are stored in a single array.
ForEach activity settings:
Take an Append variable activity and create an Idarray variable, giving item().id as the value to store all id values in a single array.
Then, in the Web activity, pass the body below to get all the arrays.
{"message":"@{string(variables('Idarray'))} as Id, @{string(variables('Namearray'))} as Name, @{string(variables('ProfessionArray'))} as Profession",
"dataFactoryName":"@{pipeline().DataFactory}",
"pipelineName":"@{pipeline().Pipeline}",
"receiver":"@{pipeline().parameters.receiver}"}

How to compare 2 JSON files in Azure Data Factory

I'm new to Azure Data Factory. I want to compare 2 JSON files through Azure Data Factory. We need to get the list of new IDs in the current JSON file which are not in the previous JSON file. Below are the 2 sample JSON files.
Previous JSON file :
{
  "count": 2,
  "values": [
    {
      "id": "4e10aa02d0b945ae9dcf5cb9ded9a083"
    },
    {
      "id": "cbc414db-4d08-48f2-8fb7-748c5da45ca9"
    }
  ]
}
Current JSON file:
{
  "count": 3,
  "values": [
    {
      "id": "4e10aa02d0b945ae9dcf5cb9ded9a083"
    },
    {
      "id": "cbc414db-4d08-48f2-8fb7-748c5da45ca9"
    },
    {
      "id": "5ea951e3-88d7-40b4-9e3f-d787b94a43c8"
    }
  ]
}
New IDs have to go through one activity and old IDs have to go through another activity.
We are running out of time, so please help me out.
Thanks in advance!
You can simply use an If Condition activity.
If expression:
@equals(activity('Lookup1').output.value,activity('Lookup2').output.value)
Further, I have used a Fail activity for the False condition, for better visibility.
--
Lookup1 Activity --> Json1.json
Lookup2 Activity --> Json2.json
This can be done using a single Filter activity.
I have assigned two parameters, "Old_json" and "New_json", for your previous JSON and current JSON files respectively.
In the settings of the Filter activity:
Items: @pipeline().parameters.New_json.values
Condition: @not(contains(pipeline().parameters.Old_Json.values,item()))
So, this Filter activity goes through each item in the new JSON and checks whether it is present in the old JSON. If it is not present, it is included in the output.
Output of the filter activity
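To see what that condition computes, here is the equivalent check sketched in plain Python (the file names are placeholders): keep every entry of the current file's values array that does not appear, as a whole object, in the previous one.

import json

with open("previous.json") as f:
    old_values = json.load(f)["values"]
with open("current.json") as f:
    new_values = json.load(f)["values"]

# Same membership test as not(contains(Old_json.values, item())).
new_ids = [item for item in new_values if item not in old_values]
print(new_ids)  # [{'id': '5ea951e3-88d7-40b4-9e3f-d787b94a43c8'}]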
Thanks @KarthikBhyresh-MT for a helpful answer.
Just to add, if (like me) you want to compare two files (or in my case, a file with the output of a SQL query), but don't care about the order of the records, you can do this using a ForEach activity. This also has the benefit of allowing a more specific error message in the case of a difference between the files.
My first If Condition checks the two files have the same row count, with the expression:
@equals(activity('Select from SQL').output.count, activity('Lookup from CSV').output.count)
The False branch leads to a Fail activity with message:
@concat(pipeline().parameters.TestName, ': CSV has ', string(activity('Lookup from CSV').output.count), ' records but SQL query returned ', string(activity('Select from SQL').output.count))
If this succeeds, flow passes to a ForEach, iterating through items:
@activity('Lookup from CSV').output.value
... which contains an If Condition with expression:
@contains(string(activity('Select from SQL').output.value), string(item()))
The False branch for that If Condition contains an Append variable activity, which appends to a variable I've added to the pipeline called MismatchedRecords. The Value appended is:
@item()
Following the ForEach, a final If Condition then checks whether MismatchedRecords contains any items:
@equals(length(variables('MismatchedRecords')), 0)
... and the False branch contains another Fail activity, with message:
@concat(string(length(variables('MismatchedRecords'))), ' records from CSV not found in SQL. Missing records: ', string(variables('MismatchedRecords')), ' SQL output: ', string(activity('Select from SQL').output.value))
The message contains specific information about the records which could not be matched, to allow further investigation.

How do I keep existing data in Couchbase and only update the new data without overwriting

So, say I have created some records/documents under a bucket and the user updates only one column out of 10 in the RDBMS, so I am trying to send only that one column's data and update it in Couchbase. But the problem is that Couchbase is overwriting the entire record and putting NULLs in the rest of the columns.
One approach is to copy all the data from the existing record after fetching it from Couchbase, and then overwrite the new column while copying the data from the old one. But that doesn't look like an optimal approach.
Any suggestions?
You can use N1QL UPDATE statements; Google for Couchbase N1QL.
UPDATE replaces a document that already exists with updated values.
update:
UPDATE keyspace-ref [use-keys-clause] [set-clause] [unset-clause] [where-clause] [limit-clause] [returning-clause]
set-clause:
SET path = expression [update-for] [ , path = expression [update-for] ]*
update-for:
FOR variable (IN | WITHIN) path (, variable (IN | WITHIN) path)* [WHEN condition ] END
unset-clause:
UNSET path [update-for] (, path [ update-for ])*
keyspace-ref: Specifies the keyspace for which to update the document.
You can add an optional namespace-name to the keyspace-name in this way:
namespace-name:keyspace-name.
use-keys-clause: Specifies the keys of the data items to be updated. Optional. Keys can be any expression.
set-clause: Specifies the value for an attribute to be changed.
unset-clause: Removes the specified attribute from the document.
update-for: The update-for clause uses the FOR statement to iterate over a nested array and SET or UNSET the given attribute for every matching element in the array.
where-clause: Specifies the condition that needs to be met for data to be updated. Optional.
limit-clause: Specifies the greatest number of objects that can be updated. This clause must have a non-negative integer as its upper bound. Optional.
returning-clause: Returns the data you updated as specified in the result_expression.
RBAC Privileges
The user executing the UPDATE statement must have the Query Update privilege on the target keyspace. If the statement has any clauses that need data to be read, such as a SELECT clause or a RETURNING clause, then the Query Select privilege is also required on the keyspaces referred to in the respective clauses. For more details about user roles, see Authorization.
For example,
To execute the following statement, user must have the Query Update privilege on travel-sample.
UPDATE `travel-sample` SET foo = 5
To execute the following statement, user must have the Query Update privilege on the travel-sample and Query Select privilege on beer-sample.
UPDATE `travel-sample`
SET foo = 9
WHERE city = (SELECT raw city FROM `beer-sample` WHERE type = "brewery")
To execute the following statement, user must have the Query Update privilege on `travel-sample` and Query Select privilege on `travel-sample`.
UPDATE `travel-sample`
SET city = "San Francisco"
WHERE lower(city) = "sanfrancisco"
RETURNING *
Example
The following statement changes the "type" of the product, "odwalla-juice1" to "product-juice".
UPDATE product USE KEYS "odwalla-juice1" SET type = "product-juice" RETURNING product.type
"results": [
  {
    "type": "product-juice"
  }
]
This statement removes the "type" attribute from the "product" keyspace for the document with the "odwalla-juice1" key.
UPDATE product USE KEYS "odwalla-juice1" UNSET type RETURNING product.*
"results": [
  {
    "productId": "odwalla-juice1",
    "unitPrice": 5.4
  }
]
This statement unsets the "gender" attribute in the "children" array for the document with the key, "dave" in the tutorial keyspace.
UPDATE tutorial t USE KEYS "dave" UNSET c.gender FOR c IN children END RETURNING t
"results": [
  {
    "t": {
      "age": 46,
      "children": [
        {
          "age": 17,
          "fname": "Aiden"
        },
        {
          "age": 2,
          "fname": "Bill"
        }
      ],
      "email": "dave@gmail.com",
      "fname": "Dave",
      "hobbies": [
        "golf",
        "surfing"
      ],
      "lname": "Smith",
      "relation": "friend",
      "title": "Mr.",
      "type": "contact"
    }
  }
]
Starting with version 4.5.1, the UPDATE statement has been improved to SET nested array elements. The FOR clause is enhanced to evaluate functions and expressions, and the new syntax supports multiple nested FOR expressions to access and update fields in nested arrays. Additional array levels are supported by chaining the FOR clauses.
Example
UPDATE default
SET i.subitems = ( ARRAY OBJECT_ADD(s, 'new', 'new_value' )
FOR s IN i.subitems END )
FOR s IN ARRAY_FLATTEN(ARRAY i.subitems
FOR i IN items END, 1) END;
If you're using structured (JSON) data, you need to read the existing record, update the field you want in your program's data structure, and then send the record up again. You can't update individual fields in the JSON structure without sending it all up again. There isn't a way around this that I'm aware of.
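A rough read-modify-write sketch with the Couchbase Python SDK (4.x-style API, assumed; the connection string, credentials, bucket name, and changed field are placeholders, not from the question):

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster("couchbase://localhost",
                  ClusterOptions(PasswordAuthenticator("user", "password")))
collection = cluster.bucket("my-bucket").default_collection()

key = "odwalla-juice1"
doc = collection.get(key).content_as[dict]   # fetch the full existing document
doc["unitPrice"] = 6.0                       # change only the column that changed in the RDBMS
collection.replace(key, doc)                 # write the whole document back

Even if only one column changed in the RDBMS, this still sends the whole document back, which is exactly the limitation described above.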
It is indeed true that, to update individual items in a JSON doc, you need to fetch the entire document and overwrite it.
We are working on adding individual item updates in the near future.

Batch job: number of records receivable by a subquery

I'm relatively new to Apex, but I have some questions about a batch job that I am creating. I want to make a query with a subquery (please see the code). Every Portal__c can have more than 200 Exporte__r records.
global Database.QueryLocator start(Database.BatchableContext BC) {
    String query = 'SELECT Id, Name, (SELECT Id FROM Exporte__r) FROM Portal__c';
    return Database.getQueryLocator(query);
}
global void execute(Database.BatchableContext BC, List<Portal__c> scope) {
    for (Portal__c portal : scope) {
        // doesn't work -> First error: Aggregate query has too many rows for direct assignment, use FOR loop
        // when using FOR loop -> System.QueryException: invalid query locator
        //List<Export__c> relatedExports = portal.Exporte__r;
        // grab all the related Export__c records using 'getSObjects' to avoid errors described above
        Export__c[] relatedExports = portal.getSObjects('Exporte__r');
        if (relatedExports != null) {
            for (Export__c exp : relatedExports) {
                // do something
            }
        }
    }
}
I have the following questions:
If I use List<Export__c> relatedExports = portal.Exporte__r (which I commented out) to get the subquery records, I receive the error message "Aggregate query has too many rows for direct assignment, use FOR loop". The error message makes no sense to me, as the SOQL has already been executed. Is there any explanation?
With the solution above, the maximum number of Exporte__r records received per Portal__c from the subquery is 199, even though some Portal__c records have more than 200. Why is it limited to that number? It seems all records above 199 are ignored in this case.
Is there any way to receive more than 199 records from a subquery? I have tried changing the batch size, but it seems to be independent of the number of records returned by the subquery. Any ideas?
Many thanks!
As per the Salesforce doc http://www.salesforce.com/us/developer/docs/apexcode/Content/langCon_apex_loops_for_SOQL.htm:
You might get a QueryException in a SOQL for loop with the message "Aggregate query has too many rows for direct assignment, use FOR loop". This exception is sometimes thrown when accessing a large set of child records of a retrieved sObject inside the loop, or when getting the size of such a record set. To avoid getting this exception, use a for loop to iterate over the child records, as follows:
Integer count = 0;
for (Contact c : returnedAccount.Contacts) {
    count++;
    // Do some other processing
}
