ArangoDB sum by groups

I have a small problem with an ArangoDB query. In Postgres I did this quickly and without problems:
SELECT SUM(x) from table GROUP BY
case
when age < 18 then 'Under 18'
when age between 18 and 24 then '18-24'
when age between 25 and 67 then '25-67'
when age between 68 and 100 then '68-100'
END
I want to compare the execution time of this query in Postgres and ArangoDB, but I have no idea what it should look like in AQL.
Is it possible to create several groups (FILTER + COLLECT AGGREGATE s = SUM(set returned from FILTER)) in one loop?
Any ideas?

The following should work in AQL:
FOR doc IN collection
COLLECT group = (doc.age < 18 ? "Under 18" :
(doc.age >= 18 && doc.age <= 24 ? "18-24" :
(doc.age >= 25 && doc.age <= 67 ? "25-67" :
(doc.age >= 68 && doc.age <= 100 ? "68-100" : "other" ))))
AGGREGATE s = SUM(doc.x)
RETURN { group, s }
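For cross-checking the AQL result on a small sample, the same bucketing can be mirrored outside the database. This is a minimal Python sketch, assuming each document is a dict with numeric `age` and `x` values; the helper names `age_group` and `sum_by_group` are illustrative and not part of ArangoDB.

```python
from collections import defaultdict

def age_group(age):
    # Same ranges as the nested ternary in the AQL query above.
    if age < 18:
        return "Under 18"
    if 18 <= age <= 24:
        return "18-24"
    if 25 <= age <= 67:
        return "25-67"
    if 68 <= age <= 100:
        return "68-100"
    return "other"

def sum_by_group(docs):
    # Accumulate x per group, like COLLECT ... AGGREGATE s = SUM(doc.x).
    totals = defaultdict(int)
    for doc in docs:
        totals[age_group(doc["age"])] += doc["x"]
    return dict(totals)

sample = [
    {"age": 10, "x": 1},
    {"age": 20, "x": 2},
    {"age": 22, "x": 3},
    {"age": 70, "x": 4},
]
print(sum_by_group(sample))
```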

Related

Groovy- Difference between 2 month

I want to write a simple Groovy script that gives the difference between two months (note: these are month numbers, not dates).
For example:
int startMonth=1 //for Jan
int endMonth=3 //for March
The result should be 2, which is straightforward (Jan > Feb > March).
But in this case:
int startMonth=11 //for Nov
int endMonth=1 //for Jan
the result should also be 2, i.e. the difference in months (Nov > Dec > Jan).
Can you please let me know if there is a function or an easy workaround to implement this? I am using it in an Oracle VBCS Groovy script.

This is less of a Groovy question and more of a logic question. One way to do this would be:
int distanceInMonths(int a, int b) {
    def min = Math.min(a,b)
    def max = Math.max(a,b)
    Math.min(max - min, 12 + min - max)
}
assert distanceInMonths(1, 3) == 2
assert distanceInMonths(11, 1) == 2
assert distanceInMonths(12, 1) == 1
assert distanceInMonths(12, 12) == 0
assert distanceInMonths(1, 1) == 0
assert distanceInMonths(1, 12) == 1
assert distanceInMonths(12, 6) == 6
where all the assertions pass.
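The same logic can also be written with modular arithmetic. This is a minimal Python sketch, assuming months are numbered 1 to 12; the names `months_forward` and `month_distance` are illustrative and not from the answer above. `months_forward` counts steps from the start month forward through the year, which is what the question's examples describe, while `month_distance` takes the shorter way around and agrees with the Groovy assertions.

```python
def months_forward(start, end):
    # Steps from start to end, wrapping past December.
    # Python's % always returns a non-negative result for a positive modulus.
    return (end - start) % 12

def month_distance(a, b):
    # Shortest distance in either direction around the year.
    forward = (b - a) % 12
    return min(forward, 12 - forward)

print(months_forward(11, 1))   # Nov -> Dec -> Jan
print(month_distance(1, 12))   # Jan and Dec are adjacent
```

Which of the two fits depends on whether the order of the months matters: `months_forward(1, 11)` and `months_forward(11, 1)` differ, while `month_distance` is symmetric.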

How to loop values of a variable through a nested while loop calling on a function in Python

I have a while loop that calls on a function (def update(i)) and performs the calculation for the required number of times (until the while loop condition is no longer met) in Python3. What I now want to do is put different values of 'i' through the while loop and therefore through the equation 'dv' (shown below). So when the while loop ends I need the whole process to repeat with the next 'i' value. All the i values are in an np.arange array called 'i_es'. I have tried to implement this with the while loop nested inside a for loop as shown below...
import numpy as np
def update(i) :
    dv = (e_l - v_s[-1] + i*r_m)/tau_m
    v_s.append(v_s[-1] + (dv*dt))
    return is_spiked()
def is_spiked() :
    if v_s[-1] > v_th:
        v_s[-1] = v_r
        return True
    return False
r_m = 10
tau_m = 10
v_th = -40
e_l = -70
v_r = -70
v_s = [v_r]
spike_count = 0
t = 0
t_total = 1000
dt = 1
i_e_start = 4
i_e_step = 0.1
i_e_final = 5
i_es = np.arange(i_e_start,i_e_final+i_e_step,i_e_step)
for i in i_es :
    while t < t_total :
        if update(i) :
            spike_count += 1
        t += dt
    print ("Current = ",+ i, " Spike count = ", + spike_count)
However, when I run this I get the following output:
Current = 4.0 Spike count = 71
Current = 4.1 Spike count = 71
Current = 4.2 Spike count = 71
Current = 4.3 Spike count = 71
Current = 4.4 Spike count = 71
Current = 4.5 Spike count = 71
Current = 4.6 Spike count = 71
Current = 4.7 Spike count = 71
Current = 4.8 Spike count = 71
Current = 4.9 Spike count = 71
Current = 5.0 Spike count = 71
I can see that the current ('i') values increase as they should each time, but the spike count does not change. The result is always the one from the first value ('i' = 4) run through the loop.
Can anyone help with this?
Thanks in advance.
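The output above is consistent with `t`, `spike_count` and `v_s` never being reset: after the first current, `t` already equals `t_total`, so the `while` body does not run again and the old count is printed for every later `i`. Below is a minimal sketch of one way to restructure the code, assuming the intent is an independent simulation per current. The helper `count_spikes` is hypothetical, and the list of currents is built with plain Python instead of `np.arange` only to keep the sketch free of dependencies.

```python
def count_spikes(i, t_total=1000, dt=1):
    # Model constants, taken from the question.
    r_m = 10
    tau_m = 10
    v_th = -40
    e_l = -70
    v_r = -70

    # State is local, so every call starts from a fresh membrane voltage,
    # a fresh clock and a spike count of zero.
    v = v_r
    t = 0
    spikes = 0
    while t < t_total:
        dv = (e_l - v + i * r_m) / tau_m
        v += dv * dt
        if v > v_th:
            v = v_r      # reset after a spike
            spikes += 1
        t += dt
    return spikes

# Currents from 4.0 to 5.0 in steps of 0.1.
currents = [round(4 + 0.1 * k, 1) for k in range(11)]
for i in currents:
    print("Current =", i, "Spike count =", count_spikes(i))
```

Keeping the state inside the function avoids having to remember which globals need resetting at the top of each `for` iteration. Resetting `t`, `spike_count` and `v_s` inside the original `for` loop would be the smaller change.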

Awk: given list of users with session data, output list of users with specific data

I'm not sure how to phrase this question, so I don't know how to search for it on Google or SO. Let me just show you the given data. By the way, this is just an Awk exercise, not homework. I've been trying to solve it off and on for two days now. Below is an example:
Mon Sep 15 12:17:46 1997
User-Name = "wynng"
NAS-Identifier = 207.238.228.11
NAS-Port = 20104
Acct-Status-Type = Start
Acct-Delay-Time = 0
Acct-Session-Id = "239736724"
Acct-Authentic = RADIUS
Client-Port-DNIS = "3571800"
Framed-Protocol = PPP
Framed-Address = 207.238.228.57
Mon Sep 15 12:19:40 1997
User-Name = "wynng"
NAS-Identifier = 207.238.228.11
NAS-Port = 20104
Acct-Status-Type = Stop
Acct-Delay-Time = 0
Acct-Session-Id = "239736724"
Acct-Authentic = RADIUS
Acct-Session-Time = 115
Acct-Input-Octets = 3915
Acct-Output-Octets = 3315
Acct-Input-Packets = 83
Acct-Output-Packets = 66
Ascend-Disconnect-Cause = 45
Ascend-Connect-Progress = 60
Ascend-Data-Rate = 28800
Ascend-PreSession-Time = 40
Ascend-Pre-Input-Octets = 395
Ascend-Pre-Output-Octets = 347
Ascend-Pre-Input-Packets = 10
Ascend-Pre-Output-Packets = 11
Ascend-First-Dest = 207.238.228.255
Client-Port-DNIS = "3571800"
Framed-Protocol = PPP
Framed-Address = 207.238.228.57
So the log file contains the above data for various users. I specifically pasted this to show that this user had a login, Acct-Status-Type = Start, and a logoff, Acct-Status-Type = Stop. This counts as one session. Thus I need to generate the following output.
User: "wynng"
Number of Sessions: 1
Total Connect Time: 115
Input Bandwidth Usage: 83
Output Bandwidth Usage: 66
The problem I have is keeping the info attached to the user. Every Stop entry in the log file has the same fields, so I can't just match with regexes like these:
/Acct-Input-Packets/{inPackets =$3}
/Acct-Output-Packets/{outPackets = $3}
Each iteration through the data will overwrite the past values. What I want to do is if I find a User-Name entry and this entry has a Stop, then I want to record for that user, the input/output packet values. This is where I get stumped.
For the session values I was thinking of saving the User-Names in an array and then in the END{} count the duplicates and divide by 2 those that are greater than 2 if even. If odd then divide by two then floor it.
I don't necessarily want the answer but maybe some hints/guidance or perhaps a simple example on which I could expand on.

You can check each line for :
a date pattern : /\w+\s\w+\s[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\s[0-9]{4}/
user name value : /User-Name\s+=\s+\"\w+\"/
status value : /Acct-Status-Type\s+=\s+\w+/
input packet value : /Acct-Input-Packets\s+=\s[0-9]+/
output packet value : /Acct-Output-Packets\s+=\s[0-9]+/
an empty line : /^$/
Once you have defined what you are looking for (above pattern), it's just a matter of conditions and storing all those data in some array.
In the following example, I store each value type above in a dedicated array for each type with a count index that is incremented when an empty line /^$/ is detected :
awk 'BEGIN{
    count = 1;
    i = 1;
}{
    if ($0 ~ /\w+\s\w+\s[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\s[0-9]{4}/){
        match($0, /\w+\s(\w+)\s([0-9]{2})\s([0-9]{2}):([0-9]{2}):([0-9]{2})\s([0-9]{4})/, n);
        match("JanFebMarAprMayJunJulAugSepOctNovDec",n[1])
        n[1] = sprintf("%02d",(RSTART+2)/3);
        arr[count]=mktime(n[6] " " n[1] " " n[2] " " n[3] " " n[4] " " n[5]);
        order[i]=count;
        i++;
    }
    else if ($0 ~ /User-Name\s+=\s+\"\w+\"/){
        match($0, /User-Name\s+=\s+\"(\w+)\"/, n);
        name[count]=n[1];
    }
    else if ($0 ~ /Acct-Status-Type\s+=\s+\w+/){
        match($0, /Acct-Status-Type\s+=\s+(\w+)/, n);
        status[count]=n[1];
    }
    else if ($0 ~ /^$/){
        count++;
    }
    else if ($0 ~ /Acct-Input-Packets\s+=\s[0-9]+/){
        match($0, /Acct-Input-Packets\s+=\s([0-9]+)/, n);
        input[count]=n[1];
    }
    else if ($0 ~ /Acct-Output-Packets\s+=\s[0-9]+/){
        match($0, /Acct-Output-Packets\s+=\s([0-9]+)/, n);
        output[count]=n[1];
    }
}
END{
    for (i = 1; i <= length(order); i++) {
        val = name[order[i]];
        if (length(user[val]) == 0) {
            valueStart = "0";
            if (status[order[i]] == "Start"){
                valueStart = arr[order[i]];
            }
            user[val]= valueStart "|0|0|0|0";
        }
        else {
            split(user[val], nameArr, "|");
            if (status[order[i]]=="Stop"){
                nameArr[2]++;
                nameArr[3]+=arr[order[i]]-nameArr[1]
            }
            else if (status[order[i]] == "Start"){
                # store date start
                nameArr[1] = arr[order[i]];
            }
            nameArr[4]+=input[order[i]];
            nameArr[5]+=output[order[i]];
            user[val]= nameArr[1] "|" nameArr[2] "|" nameArr[3] "|" nameArr[4] "|" nameArr[5];
        }
    }
    for (usr in user) {
        split(user[usr], usrArr, "|");
        print "User: " usr;
        print "Number of Sessions: " usrArr[2];
        print "Total Connect Time: " usrArr[3];
        print "Input Bandwidth Usage: " usrArr[4];
        print "Output Bandwidth Usage: " usrArr[5];
        print "------------------------";
    }
}' test.txt
The values are extracted with match function like :
match($0, /User-Name\s+=\s+\"(\w+)\"/, n);
For the date, we have to parse the month string part, I've used the solution in this post to extract like :
match($0, /\w+\s(\w+)\s([0-9]{2})\s([0-9]{2}):([0-9]{2}):([0-9]{2})\s([0-9]{4})/, n);
match("JanFebMarAprMayJunJulAugSepOctNovDec",n[1])
n[1] = sprintf("%02d",(RSTART+2)/3);
All processing of the collected values happens in the END clause, where the values are grouped. I create a user array with the username as key and, as value, a concatenation of the different fields delimited by |:
[startDate] "|" [sessionNum] "|" [connectionTime] "|" [inputUsage] "|" [outputUsage]
With this data input (your data extended), it gives :
User: TOTO
Number of Sessions: 1
Total Connect Time: 114
Input Bandwidth Usage: 83
Output Bandwidth Usage: 66
------------------------
User: wynng
Number of Sessions: 2
Total Connect Time: 228
Input Bandwidth Usage: 166
Output Bandwidth Usage: 132
------------------------
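The core idea, keeping every total keyed by the user name, can be sketched compactly in Python. This is an illustration under stated assumptions, not a translation of the awk script: it assumes records are separated by blank lines (as the awk script also does), counts one session per Stop record, and takes the connect time from `Acct-Session-Time` rather than from the timestamp difference. The function name `summarize` is hypothetical.

```python
import re
from collections import defaultdict

def summarize(log_text):
    # One totals dict per user, created on first use.
    totals = defaultdict(lambda: {"sessions": 0, "time": 0, "in": 0, "out": 0})
    # Assumption: a blank line separates two records.
    for record in re.split(r"\n\s*\n", log_text.strip()):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.partition("=")
            if sep:  # the date line has no "=" and is skipped
                fields[key.strip()] = value.strip().strip('"')
        # Only Stop records carry the session totals.
        if fields.get("Acct-Status-Type") != "Stop":
            continue
        user = totals[fields["User-Name"]]
        user["sessions"] += 1
        user["time"] += int(fields.get("Acct-Session-Time", 0))
        user["in"] += int(fields.get("Acct-Input-Packets", 0))
        user["out"] += int(fields.get("Acct-Output-Packets", 0))
    return dict(totals)

sample = """Mon Sep 15 12:17:46 1997
User-Name = "wynng"
Acct-Status-Type = Start

Mon Sep 15 12:19:40 1997
User-Name = "wynng"
Acct-Status-Type = Stop
Acct-Session-Time = 115
Acct-Input-Packets = 83
Acct-Output-Packets = 66
"""
print(summarize(sample))
```

Because each record is parsed into its own `fields` dict before anything is added to the totals, values from one record cannot overwrite those of another, which is the problem described in the question.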

How to improve Update query in arangodb

I have a collection which holds more than 15 million documents. Out of those 15 million documents I update 20k records every hour. But update query takes a long time to finish (30 min around).
Document:
{ "inst" : "instance1", "dt": "2015-12-12T00:00:000Z", "count": 10}
I have an array which holds 20k instances to be updated.
My Query looks like this:
For h in hourly filter h.dt == DATE_ISO8601(1450116000000)
For i in instArr
filter i.inst == h.inst
update h with {"inst":i.inst, "dt":i.dt, "count":i.count} in hourly
Is there a more optimized way of doing this? I have a hash index on inst and a skiplist index on dt.
Update
I could not use 20k inst in the query manually so following is the execution plan for just 2 inst:
FOR r in hourly FILTER r.dt == DATE_ISO8601(1450116000000) FOR i IN
[{"inst":"0e649fa22bcc5200d7c40f3505da153b", "dt":"2015-12-14T18:00:00.000Z"}, {}] FILTER i.inst ==
r.inst UPDATE r with {"inst":i.inst, "dt": i.dt, "max":i.max, "min":i.min, "sum":i.sum, "avg":i.avg,
"samples":i.samples} in hourly OPTIONS { ignoreErrors: true } RETURN NEW.inst
Execution plan:
 Id   NodeType             Est.   Comment
  1   SingletonNode           1   * ROOT
  5   CalculationNode         1   - LET #6 = [ { "inst" : "0e649fa22bcc5200d7c40f3505da153b", "dt" : "2015-12-14T18:00:00.000Z" }, { } ]   /* json expression */   /* const assignment */
 13   IndexRangeNode     103067   - FOR r IN hourly   /* skiplist index scan */
  6   EnumerateListNode  206134   - FOR i IN #6   /* list iteration */
  7   CalculationNode    206134   - LET #8 = i.`inst` == r.`inst`   /* simple expression */   /* collections used: r : hourly */
  8   FilterNode         206134   - FILTER #8
  9   CalculationNode    206134   - LET #10 = { "inst" : i.`inst`, "dt" : i.`dt`, "max" : i.`max`, "min" : i.`min`, "sum" : i.`sum`, "avg" : i.`avg`, "samples" : i.`samples` }   /* simple expression */
 10   UpdateNode         206134   - UPDATE r WITH #10 IN hourly
 11   CalculationNode    206134   - LET #12 = $NEW.`inst`   /* attribute expression */
 12   ReturnNode         206134   - RETURN #12
Indexes used:
 Id   Type       Collection   Unique   Sparse   Selectivity Est.   Fields   Ranges
 13   skiplist   hourly       false    false    n/a                `dt`     [ `dt` == "2015-12-14T18:00:00.000Z" ]
Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   remove-data-modification-out-variables
  6   use-index-range
  7   remove-filter-covered-by-index
Write query options:
 Option                   Value
 ignoreErrors             true
 waitForSync              false
 nullMeansRemove          false
 mergeObjects             true
 ignoreDocumentNotFound   false
 readCompleteInput        true

I assume the selection part (not the update part) will be the bottleneck in this query.
The query seems problematic because for each document matching the first filter (h.dt == DATE_ISO8601(...)), there will be an iteration over the 20,000 values in the instArr array. If instArr values are unique, then only one value from it will match. Additionally, no index will be used for the inner loop, as the index selection has happened in the outer loop already.
Instead of looping over all values in instArr, it will be better to turn the accompanying == comparison into an IN comparison. That would already work if instArr would be an array of instance names, but it seems to be an array of instance objects (consisting of at least attributes inst and count). In order to use the instance names in an IN comparison, it would be better to have a dedicated array of instance names, and a translation table for the count and dt values.
Following is an example for generating these with JavaScript:
var instArr = [ ], trans = { };
for (var i = 0; i < 20000; ++i) {
  var instance = "instance" + i;
  var count = Math.floor(Math.random() * 10);
  var dt = (new Date(Date.now() - Math.floor(Math.random() * 10000))).toISOString();
  instArr.push(instance);
  trans[instance] = [ count, dt ];
}
instArr would then look like this:
[ "instance0", "instance1", "instance2", ... ]
and trans:
{
"instance0" : [ 4, "2015-12-16T21:24:45.106Z" ],
"instance1" : [ 0, "2015-12-16T21:24:39.881Z" ],
"instance2" : [ 2, "2015-12-16T21:25:47.915Z" ],
...
}
These data can then be injected into the query using bind variables (named like the variables above):
FOR h IN hourly
  FILTER h.dt == DATE_ISO8601(1450116000000)
  FILTER h.inst IN @instArr
  RETURN @trans[h.inst]
Note that ArangoDB 2.5 does not yet support the @trans[h.inst] syntax. In that version, you will need to write:
LET trans = @trans
FOR h IN hourly
  FILTER h.dt == DATE_ISO8601(1450116000000)
  FILTER h.inst IN @instArr
  RETURN trans[h.inst]
Additionally, 2.5 has a problem with longer IN lists. IN-list performance decreases quadratically with the length of the IN list. So in this version, it will make sense to limit the length of instArr to at most 2,000 values. That may require issuing multiple queries with smaller IN lists instead of just one with a big IN list.
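Splitting the instance names into smaller IN lists can be done on the client before the queries are issued. This is a minimal Python sketch, assuming the names are held in a plain list; the helper `chunked` is hypothetical. Running each batch as a query is left out, because the driver used to talk to the database is not shown here.

```python
def chunked(values, size=2000):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(values), size):
        yield values[start:start + size]

# Hypothetical instance names, shaped like instArr above.
inst_names = ["instance" + str(n) for n in range(20000)]

batches = list(chunked(inst_names))
print(len(batches), "batches, largest has", max(len(b) for b in batches), "names")
```

Each batch would then be passed as the `instArr` bind variable of one query, together with the matching entries of the translation table.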
The better alternative would be to use ArangoDB 2.6, 2.7 or 2.8, which do not have that problem, and thus do not require the workaround. Apart from that, you can get away with the slightly shorter version of the query in the newer ArangoDB versions.
Also note that in all of the above examples I used a RETURN ... instead of the UPDATE statement from the original query. This is because all my tests revealed that the selection part of the query is the major problem, at least with the data I had generated.
A final note on the original version of the UPDATE: updating each document's inst value with i.inst seems redundant, because i.inst == h.inst, so the value won't change.

Finding the number of days in a month

I am writing a program that displays the number of days in a month provided by the user. I am writing it at the dataflow level. As I am new to Verilog, I don't know whether if/else conditions or case statements can be used at the dataflow level; using if/else would make this program a piece of cake. If they cannot be used, how can I implement the following idea at the dataflow level?
if(month==4 || month==6 || month==9|| month==11)
days=30;
else
if(month==2 && leapyear==1)
days=29;
Here is my incomplete Verilog code:
module LeapYear(year,month,leapOrNot,Days);
input year,month;
output leapOrNot,Days;
//if (year % 400 == 0) || ( ( year % 100 != 0) && (year % 4 == 0 ))
leapOrNot=((year&400)===0) && ((year % 100)!==0 || (year & 4)===0);
Days=((month & 4)===4 ||(month & 6)===6 ||(month & 9)===9 ||(month & 11)===11 )

You cannot use if/else in a continuous assignment, but you can use the conditional operator, which is functionally equivalent.
Try this:
assign Days = (month == 4 || month == 6 || month == 9 || month == 11) ? 30 :
(month == 2 && leapyear == 1) ? 29;
That reproduces the logic in your question, but it is not a correct answer: the second conditional has no else branch, so it is not valid as written, and the cases where Days equals 28 or 31 are missing.
EDIT:
Here's how to combine all the conditions into a single assign statement using the conditional operator:
assign Days = (month == 4 || month == 6 || month == 9 || month == 11) ? 30 :
(month == 2 && leapyear == 1) ? 29 :
(month == 2 && leapyear == 0) ? 28 :
31;
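When simulating the module, a small software reference model can help to check the conditional chain against known values. This is a minimal Python sketch that mirrors the assign statement above and the leap-year rule from the comment in the question; the function names are illustrative and it is not Verilog.

```python
def is_leap_year(year):
    # Same rule as the comment in the question's module.
    return year % 400 == 0 or (year % 100 != 0 and year % 4 == 0)

def days_in_month(month, leap_year):
    # Mirrors the chained conditional operator, top to bottom.
    if month in (4, 6, 9, 11):
        return 30
    if month == 2 and leap_year:
        return 29
    if month == 2:
        return 28
    return 31

for year in (1900, 2000, 2023, 2024):
    print(year, [days_in_month(m, is_leap_year(year)) for m in range(1, 13)])
```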
