jq | how to add two decimals correctly

For the data below, when I try to add prebalance and amount to match balance, why am I getting a wrong sum value? How can I get the correct values?
[
{
"prebalance": -2865.59,
"amount": 3000.00,
"balance": 134.41
},
{
"prebalance": -1865.59,
"amount": 2000.00,
"balance": 134.41
}
]
jqplay link for the sample data:
https://jqplay.org/s/T4gL4kOkj7X

The sum is correct, just not precise; that is a general limitation of floating-point arithmetic. You should always consider rounding when dealing with floating-point numbers (especially when comparing them), or with operations that can produce them.
Can you help me round it off to two decimal places, i.e. 134.40999999999985 to 134.41? My numbers can be of any length, not just 3 digits.
Round the numbers for display:
map(
.sum = .prebalance + .amount
| .sum |= (. * 100 | round / 100)
)
[
{
"prebalance": -2865.59,
"amount": 3000,
"balance": 134.41,
"sum": 134.41
},
{
"prebalance": -1865.59,
"amount": 2000,
"balance": 134.41,
"sum": 134.41
}
]
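Since the numbers can be of any magnitude, the rounding generalizes to any number of places with a small helper; a sketch, assuming jq 1.6+ for pow (the helper name roundto is made up):
def roundto($places): pow(10; $places) as $f | . * $f | round / $f;

map(.sum = ((.prebalance + .amount) | roundto(2)))
For example, 134.40999999999985 | roundto(2) yields 134.41.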
Compare the absolute difference to a given epsilon for equality tests:
1e-9 as $epsilon | map(
.sum = .prebalance + .amount
| .equal = (.sum - .balance | fabs < $epsilon)
)
[
{
"prebalance": -2865.59,
"amount": 3000,
"balance": 134.41,
"sum": 134.40999999999985,
"equal": true
},
{
"prebalance": -1865.59,
"amount": 2000,
"balance": 134.41,
"sum": 134.41000000000008,
"equal": true
}
]
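Alternatively, round both sides to cents before comparing; a sketch reusing the round-to-two-places idea:
map(
  .sum = .prebalance + .amount
  | .equal = ((.sum * 100 | round) == (.balance * 100 | round))
)
This avoids picking an epsilon, at the cost of fixing the precision up front.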

Related

MongoDB compare users based on nested array

I have a collection of users with a nested array Answers. I need to compare their responses; every time two users gave the same answer, I add the coeff value of that answer to a running total, and then, if the total is above 10 for example, I send back all the matching users with their own total coeff (above 10, that is).
So the question is how to compare users by going into the nested array (answers), checking whether the same field (answerChoice) has the same value for the same answer (answerNumber), taking another value from the nested array (answerCoeff) to add to a running total, and returning the total coeff and the users meeting a certain coeff amount. A sketch follows the expected output below.
{
"id": "string",
"birthdate": "2021-06-18T13:53:30.443Z",
"userName": "string",
"pictures": [
"string"
],
"answers": [
{
"answerNumber": 0,
"answerChoice": 3,
"answerCoeff": 2
},
{
"answerNumber": 1,
"answerChoice": 2,
"answerCoeff": 5
}
...
],
}
Expected output:
{
"matchs": [
{
"ids": [
"string"
],
"userName": "string",
"pictures": [
"string"
],
"coeff": 0
}
]
}
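One way to approach this is a self-$lookup: unwind each user's answers, join the collection against itself on matching answerNumber and answerChoice, then sum the coefficients per pair of users. A minimal, untested sketch, assuming MongoDB 3.6+ and a collection named users (both assumptions):
db.users.aggregate([
  { "$unwind": "$answers" },
  // Self-join: other users whose answer to the same answerNumber has the same answerChoice.
  { "$lookup": {
      "from": "users",
      "let": { "num": "$answers.answerNumber", "choice": "$answers.answerChoice", "me": "$id" },
      "pipeline": [
        { "$unwind": "$answers" },
        { "$match": { "$expr": { "$and": [
            { "$eq": ["$answers.answerNumber", "$$num"] },
            { "$eq": ["$answers.answerChoice", "$$choice"] },
            { "$ne": ["$id", "$$me"] }
        ] } } }
      ],
      "as": "sameAnswer"
  } },
  { "$unwind": "$sameAnswer" },
  // One document per (user, other user, matching answer): sum the coefficients.
  { "$group": {
      "_id": { "user": "$id", "other": "$sameAnswer.id" },
      "coeff": { "$sum": "$answers.answerCoeff" }
  } },
  // Keep only the pairs meeting the threshold.
  { "$match": { "coeff": { "$gte": 10 } } }
])
From there, a final $group per user could reshape the pairs into the matchs array of the expected output.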

Azure Gremlin edge traversal suspiciously high (Out() step) RU cost

I have a weird issue where doing an out() operation on a few edges causes my RU cost to triple. I hope someone can help me shed light on why, and on what I can do to mitigate it.
I have a Graph in CosmosDB, where there are two types of vertex labels: "Profile" and "Score". Each profile has 0 or 1 score-vertices via a "ProfileHasAggregatedScore" edge. The partitionKey is the ID of the Profile.
If I make the following query, the RU cost currently is:
g.V().hasLabel('Profile').out('ProfileHasAggregatedScore')
>78 RU (8 scores found)
And for reference, the cost of getting all vertices of a type is:
g.V().hasLabel('Profile')
>28 RU (110 profiles found)
g.E().hasLabel('ProfileHasAggregatedScore')
>11 RU (8 edges found)
g.V().hasLabel('AggregatedRating')
>11 RU (8 scores found)
And the cost of a single of the vertices or edges are:
g.V('aProfileId').hasLabel('Profile')
>4 RU (1 found)
g.E('anEdgeId')
> 7RU
g.V('aRatingId')
> 3.5 RU
Can someone please help me understand why making a traversal with only a few vertices along the way (see the traversal at the bottom) is more expensive than searching for everything? And is there something I can do to prevent it? Adding a has-filter with the partitionKey does not seem to help. It seems odd that traversing/finding 16 more elements (8 edges and 8 vertices) after finding 110 vertices triples the cost of the operation.
(NB: with 1000 profiles, the cost of doing one traversal along an edge to the score node is 2200 RU. This seems high, considering the emphasis the Azure team puts on scalability.)
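For reference, the partition-key filter mentioned above would look something like this ('partitionKey' here is a placeholder for the container's actual partition key property):
// Hypothetical partition-scoped variant of the same traversal.
g.V().has('Profile', 'partitionKey', 'aProfileId').out('ProfileHasAggregatedScore')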
The execution profile, in case it helps (it seems most of the time is spent finding the edges in the out() step):
[
{
"gremlin": "g.V().hasLabel('Profile').out('ProfileHasAggregatedScore').executionProfile()",
"totalTime": 46,
"metrics": [
{
"name": "GetVertices",
"time": 13,
"annotations": {
"percentTime": 28.26
},
"counts": {
"resultCount": 110
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 110,
"size": 124649,
"time": 2.47
}
]
},
{
"name": "GetEdges",
"time": 26,
"annotations": {
"percentTime": 56.52
},
"counts": {
"resultCount": 8
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 8,
"size": 5200,
"time": 6.22
},
{
"fanoutFactor": 1,
"count": 0,
"size": 49,
"time": 0.88
}
]
},
{
"name": "GetNeighborVertices",
"time": 7,
"annotations": {
"percentTime": 15.22
},
"counts": {
"resultCount": 8
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 8,
"size": 6303,
"time": 1.18
}
]
},
{
"name": "ProjectOperator",
"time": 0,
"annotations": {
"percentTime": 0
},
"counts": {
"resultCount": 8
}
}
]
}
]

MongoDB - Math calculations

I need to calculate ((86400 + (7200 - 28800)) % 86400) in MongoDB's aggregate. Is it possible, or do I need to do this in JavaScript?
Does Mongo support the remainder of a division?
Yes, quite possible. The arithmetic operators provide mathematical operations on numbers, and the remainder is supported by the $mod operator. The desired calculation can be done with an expression like the following:
pipeline = [
{
"$project": {
"result": {
"$mod": [
{
"$add": [
86400,
{ "$subtract": [7200, 28800] }
]
},
86400
]
}
}
}
]
Executing this aggregation pipeline on a collection:
db.collection.aggregate(pipeline)
Sample Output
{
"_id" : ObjectId("58aacd498caf670a837e7093"),
"result" : 64800
}
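For comparison, since the question also mentions doing this in JavaScript, the same calculation in the mongo shell (or any JavaScript) is:
// Plain JavaScript equivalent; % is the remainder operator.
var result = (86400 + (7200 - 28800)) % 86400;
print(result); // 64800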

couchdb/cloudant time series querying

I have a cloudant database containing a collection of measures coming in from multiple devices.
Each device sends many measures but three are of interest : temperature, latitude and longitude.
Due to the design of the system, each value is a separate measure (I cannot easily join temperature, latitude, and longitude at insertion time), so in the database I have a set of measures like so:
{
"device": "foo",
"name": "temperature",
"date": <timestamp>,
"value": <value>
},
{
"device": "foo",
"name": "latitude",
"date": <timestamp>,
"value": <value>
},
{
"device": "foo",
"name": "longitude",
"date": <timestamp>,
"value": <value>
},
So conceptually, I have 3 time series.
I would like to extract the latest measure of each of these time series, and ideally have them grouped by device.
Something like:
{
device1: {
"temperature": {
date: <date>,
value: <value>
},
"latitude": {
date: <date>,
value: <value>
},
"longitude": {
date: <date>,
value: <value>
}
},
"device2": {
...
}
}
I do not expect this exact syntax, that's just an idea of the set of data I'm expecting.
I could join the positional measures together, but the question would remain the same: how do I get the latest (by timestamp) entries of each device, grouped together?
To start with, I would use a data structure like this:
{
"type":"temperature",
"date":"1472528116698",
"value":"35",
"device":"device1"
}
The type property can be temperature, latitude, or longitude.
Then you need some views. Personally, I prefer to have one _design document per type; it also makes the queries easier.
For example, you would have a _design document like this for the temperature :
{
"_id": "_design/temperature",
"_rev": "8-91e594df623063ed3ad7111cde09eecb",
"language": "javascript",
"views": {
"byDevice": {
"map": "function(doc) {\n if ((doc.type + '').toLowerCase() === 'temperature' && doc.device)\n emit(doc.device);\n}\n"
},
"lastestByDevice": {
"map": "function(doc) {\n if ((doc.type + '').toLowerCase() === 'temperature' && doc.device && doc.value)\n emit(doc.device,doc.value);\n}\n",
"reduce": "function(keys, values, rereduce) {\n var max = Number.MIN_VALUE;\n for (var i = 0; i < values.length; i++) {\n var val = parseFloat(values[i]);\n if (val > max)\n max = val;\n }\n return max;\n}\n"
}
}
}
Request example:
http://localhost:5984/db/_design/temperature/_view/latestByDevice?group_level=1&reduce=true
If you use latestByDevice with the reduce function, it returns each device with its maximum value. With this example, you should be able to get a good start. I don't know how you receive and build your data, but if you prefer to group everything by device, that is also possible; see the sketch below.
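For instance, here is an untested sketch of a view that returns the most recent measure by date rather than the maximum value (the name mostRecentByDevice is made up, and doc.date is assumed to be a numeric timestamp):
"mostRecentByDevice": {
  "map": "function(doc) {\n  if ((doc.type + '').toLowerCase() === 'temperature' && doc.device)\n    emit(doc.device, {date: doc.date, value: doc.value});\n}\n",
  "reduce": "function(keys, values, rereduce) {\n  var latest = values[0];\n  for (var i = 1; i < values.length; i++) {\n    if (values[i].date > latest.date)\n      latest = values[i];\n  }\n  return latest;\n}\n"
}
Queried with group_level=1, each row would then carry the device as key and its latest temperature measure as value.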

Elasticsearch query_string combined with match_phrase

I think it's best if I describe my intent and try to break it down to code.
I want users to have the ability to write the complex queries that query_string offers, should they choose to; for example 'AND', 'OR', '~', etc.
I want to have fuzziness in effect, which has made me do things I feel dirty about, like sending "#{query}~" to ES; in other words, I am specifying a fuzzy query on the user's behalf, because we offer transliteration, which makes it difficult to get the exact spelling.
At times, users search for a number of words that are supposed to form a phrase, but query_string searches for them individually rather than as a phrase. For example, 'he who will' should rank documents where those three words appear in that order first, and only then whatever else matches.
Current query:
{
"indices_boost": {},
"aggregations": {
"by_ayah_key": {
"terms": {
"field": "ayah.ayah_key",
"size": 6236,
"order": {
"average_score": "desc"
}
},
"aggregations": {
"match": {
"top_hits": {
"highlight": {
"fields": {
"text": {
"type": "fvh",
"matched_fields": [
"text.root",
"text.stem_clean",
"text.lemma_clean",
"text.stemmed",
"text"
],
"number_of_fragments": 0
}
},
"tags_schema": "styled"
},
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"_source": {
"include": [
"text",
"resource.*",
"language.*"
]
},
"size": 5
}
},
"average_score": {
"avg": {
"script": "_score"
}
}
}
}
},
"from": 0,
"size": 0,
"_source": [
"text",
"resource.*",
"language.*"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "inna alatheena",
"fuzziness": 1,
"fields": [
"text^1.6",
"text.stemmed"
],
"minimum_should_match": "85%"
}
}
],
"should": [
{
"match": {
"text": {
"query": "inna alatheena",
"type": "phrase"
}
}
}
]
}
}
}
Note: alatheena searched without the ~ will not return anything, although I have allatheena in the index. So I must do a fuzzy search.
Any thoughts?
I see that you're doing ES indexing of Qur'anic verses, +1 ...
Much of your problem domain, if I understood it correctly, can be solved simply by storing lots of transliteration variants (and permutations of their combinations) in a separate field on your Aayah documents.
First off, you should make a char filter that replaces all double letters with single letters, [aa] => [a], [ll] => [l] (see the sketch at the end of this answer).
Maybe also make a separate field in which all of [a, e, i] (because of their vocalic/transcriptional ambiguity) are replaced with € or something similar, and do the same while querying, in order to get as many matches as possible.
Also, the TH in "allatheena" (which, as a footnote, may really be Dhaal, Thaa, Zhaa, Taa+Haa, Taa+Hhaa, or Ttaa+Hhaa transcribed...) should be replaced by something, or both the Dhaal AND the Thaa should be transcribed multiple times.
Then, because it's Qur'anic script, all Alefs without diacritics, Hamza, Madda, etc. should be treated as Alef (or Hamzat) ul-Wasl, and that should also be considered when indexing/searching, because of Waqf/Wasl in reading Arabic (consider all the Wasls in the first Aayah of Surat Al-Alaq, for example).
Dunno if this answers your question in any way, but I hope it's of some assistance in implementing your application nonetheless.
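As a concrete starting point for the double-letter collapsing suggested above, a pattern_replace char filter can be wired into a custom analyzer; a sketch, where the index, char filter, and analyzer names are all made up:
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "collapse_doubles": {
          "type": "pattern_replace",
          "pattern": "(.)\\1",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "transliteration": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [ "collapse_doubles" ]
        }
      }
    }
  }
}
With this in place, both allatheena and alatheena reduce to the same normalized form (alathena) at index and query time.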
You should use the Dis Max Query to achieve that.
A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
that document as produced by any subquery, plus a tie breaking
increment for any additional matching subqueries.
This is useful when searching for a word in multiple fields with
different boost factors (so that the fields cannot be combined
equivalently into a single search field). We want the primary score to
be the one associated with the highest boost.
A quick example of how to use it:
POST /_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"match": {
"text": {
"query": "inna alatheena",
"type": "phrase",
"boost": 5
}
}
},
{
"match": {
"text": {
"query": "inna alatheena",
"type": "phrase",
"fuzziness": "AUTO",
"boost": 3
}
}
},
{
"query_string": {
"default_field": "text",
"query": "inna alatheena"
}
}
]
}
}
}
It will run all of your queries, and the one that scores highest compared to the others will be taken. So just define your rules using it, and you should achieve what you wanted.
