How to group results in ArangoDB into a single record?

I have a list of events of a certain type, structured as in the following example:
{
  createdAt: 123123132,
  type: "STARTED",
  metadata: {
    emailAddress: "foo#bar.com"
  }
}
The number of types is predefined (START, STOP, REMOVE, ...). Users produce one or more events over time.
I want to get the following aggregation:
For each user, calculate the number of events for each type.
My AQL query looks like this:
FOR event IN events
  COLLECT
    email = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  LIMIT 10
  RETURN {
    email,
    t: {type, count}
  }
This produces the following output:
{ email: '_84#example.com', t: { type: 'CREATE', count: 203 } }
{ email: '_84#example.com', t: { type: 'DEPLOY', count: 214 } }
{ email: '_84#example.com', t: { type: 'REMOVE', count: 172 } }
{ email: '_84#example.com', t: { type: 'START', count: 204 } }
{ email: '_84#example.com', t: { type: 'STOP', count: 187 } }
{ email: '_95#example.com', t: { type: 'CREATE', count: 189 } }
{ email: '_95#example.com', t: { type: 'DEPLOY', count: 173 } }
{ email: '_95#example.com', t: { type: 'REMOVE', count: 194 } }
{ email: '_95#example.com', t: { type: 'START', count: 213 } }
{ email: '_95#example.com', t: { type: 'STOP', count: 208 } }
...
i.e. I got a row for each type. But I want results like this:
{ email: foo#bar.com, count1: 203, count2: 214, count3: 172 ...}
{ email: aaa#fff.com, count1: 189, count2: 173, count3: 194 ...}
...
OR
{ email: foo#bar.com, CREATE: 203, DEPLOY: 214, ... }
...
i.e. to group the results again.
I also need to sort the results (not the events) by the counts, e.g. to return the top 10 users with the maximum number of CREATE events.
How can I do that?
ONE SOLUTION
One solution is given below; check the accepted answer for more.
FOR a IN (
  FOR event IN events
    COLLECT
      emailAddress = event.metadata.emailAddress,
      type = event.type WITH COUNT INTO count
    COLLECT email = emailAddress INTO perUser KEEP type, count
    RETURN MERGE(PUSH(perUser[* RETURN {[LOWER(CURRENT.type)]: CURRENT.count}], {email}))
)
  SORT a.create DESC
  LIMIT 10
  RETURN a

You could group by user and event type, then group again by user, keeping only the type and the already calculated event type counts. In the second aggregation it is important to know which group each event falls into in order to construct the result. An array inline projection can be used for that to keep the query short:
FOR event IN events
  COLLECT
    emailAddress = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  COLLECT email = emailAddress INTO perUser KEEP type, count
  RETURN MERGE(PUSH(perUser[* RETURN {[CURRENT.type]: CURRENT.count}], {email}))
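To see what the final MERGE(PUSH(...)) step produces, here is a minimal JavaScript sketch of its semantics, using a hypothetical per-user group (not actual ArangoDB output):

```javascript
// Hypothetical result of the first COLLECT for one user:
const perUser = [
  { type: "CREATE", count: 203 },
  { type: "DEPLOY", count: 214 },
];
const email = "_84#example.com";

// perUser[* RETURN {[CURRENT.type]: CURRENT.count}] yields one object per group
const projected = perUser.map((g) => ({ [g.type]: g.count }));

// PUSH(..., {email}) appends the email object; MERGE flattens everything into one
const merged = Object.assign({}, ...projected, { email });

console.log(merged);
// → { CREATE: 203, DEPLOY: 214, email: '_84#example.com' }
```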
Another way would be to group by user and keep the event types, then group the types in a subquery. But this is significantly slower in my tests (at least without any indexes defined):
FOR event IN events
  LET type = event.type
  COLLECT email = event.metadata.emailAddress INTO groups KEEP type
  LET byType = (
    FOR t IN groups[*].type
      COLLECT t2 = t WITH COUNT INTO count
      RETURN {[t2]: count}
  )
  RETURN MERGE(PUSH(byType, {email}))
Returning the top 10 users with the most CREATE events is much simpler. Filter for CREATE event type, then group by user and count the number of events, sort by this number in descending order and return the first 10 results:
FOR event IN events
  FILTER event.type == "CREATE"
  COLLECT email = event.metadata.emailAddress WITH COUNT INTO count
  SORT count DESC
  LIMIT 10
  RETURN {email, count}
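The same pipeline (filter, group and count, sort, take the top N) can be sketched in plain JavaScript over an in-memory array, with hypothetical event data:

```javascript
// Hypothetical in-memory events; mirrors FILTER → COLLECT/COUNT → SORT → LIMIT.
const events = [
  { type: 'CREATE', metadata: { emailAddress: 'a#example.com' } },
  { type: 'CREATE', metadata: { emailAddress: 'a#example.com' } },
  { type: 'STOP',   metadata: { emailAddress: 'a#example.com' } },
  { type: 'CREATE', metadata: { emailAddress: 'b#example.com' } },
];

const counts = {};
for (const e of events) {
  if (e.type !== 'CREATE') continue;           // FILTER event.type == "CREATE"
  const email = e.metadata.emailAddress;       // COLLECT key
  counts[email] = (counts[email] || 0) + 1;    // WITH COUNT INTO count
}

const top10 = Object.entries(counts)
  .map(([email, count]) => ({ email, count })) // RETURN {email, count}
  .sort((a, b) => b.count - a.count)           // SORT count DESC
  .slice(0, 10);                               // LIMIT 10

console.log(top10);
// → [ { email: 'a#example.com', count: 2 }, { email: 'b#example.com', count: 1 } ]
```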
EDIT1: Return one document per user with the event types grouped and counted (as in the first query), but capture the MERGE result, sort by the count of one particular event type (here: CREATE) and return the top 10 users for this type. The result is the same as with the solution given in the question, but it spares the subquery à la FOR a IN (FOR event IN events ...) ... RETURN a:
FOR event IN events
  COLLECT
    emailAddress = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  COLLECT email = emailAddress INTO perUser KEEP type, count
  LET ret = MERGE(PUSH(perUser[* RETURN {[CURRENT.type]: CURRENT.count}], {email}))
  SORT ret.CREATE DESC
  LIMIT 10
  RETURN ret
EDIT2: Query to generate example data (requires a collection events to exist):
FOR i IN 1..100
  LET email = CONCAT(RANDOM_TOKEN(RAND() * 4 + 4), "#example.com")
  FOR j IN SPLIT("CREATE,DEPLOY,REMOVE,START,STOP", ",")
    FOR k IN 1..RAND() * 150 + 50
      INSERT {metadata: {emailAddress: email}, type: j} INTO events RETURN NEW

Related

DynamoDB client doesn't fulfil the limit

Client lib: "@aws-sdk/client-dynamodb": "3.188.0"
I have a DynamoDB pagination implementation.
My user count is 98 and my page size is 20, so I'm expecting 5 pages with 20, 20, 20, 20 and 18 users respectively.
But I'm actually getting more than 5 pages, each with a variable number of users, e.g. 10, 12, 11, etc.
How can I get pages with the proper limit, i.e. 20, 20, 20, 20 and 18 users?
public async pagedList(usersPerPage: number, lastEvaluatedKey?: string): Promise<PagedUser> {
  const params = {
    TableName: tableName,
    Limit: usersPerPage,
    FilterExpression: '#type = :type',
    ExpressionAttributeValues: {
      ':type': { S: type },
    },
    ExpressionAttributeNames: {
      '#type': 'type',
    },
  } as ScanCommandInput;
  if (lastEvaluatedKey) {
    params.ExclusiveStartKey = { 'oid': { S: lastEvaluatedKey } };
  }
  const command = new ScanCommand(params);
  const data = await client.send(command);
  const users: User[] = [];
  if (data.Items !== undefined) {
    data.Items.forEach((item) => {
      if (item !== undefined) {
        users.push(this.makeUser(item));
      }
    });
  }
  let lastKey;
  if (data.LastEvaluatedKey !== undefined) {
    lastKey = data.LastEvaluatedKey.oid.S?.valueOf();
  }
  return {
    users: users,
    lastEvaluatedKey: lastKey
  };
}
The Scan documentation (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.Pagination) gives a few reasons why your result may contain fewer items than the limit:
The result must fit in 1 MB.
If a filter is applied, the data is filtered after the scan, and you have a filter in your query.
From the docs
A filter expression is applied after a Scan finishes but before the
results are returned. Therefore, a Scan consumes the same amount of
read capacity, regardless of whether a filter expression is present.
...
Now suppose that you add a filter expression to the Scan. In this case, DynamoDB applies the filter expression to the six items that were returned, discarding those that do not match. The final Scan result contains six items or fewer, depending on the number of items that were filtered.
The next section explains how you can verify that this is what is happening in your case:
Counting the items in the results
In addition to the items that match your criteria, the Scan response contains the following
elements:
ScannedCount — The number of items evaluated, before any ScanFilter is
applied. A high ScannedCount value with few, or no, Count results
indicates an inefficient Scan operation. If you did not use a filter
in the request, ScannedCount is the same as Count.
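Since Limit caps the items scanned before the filter runs, the usual fix is to keep calling Scan, feeding LastEvaluatedKey back in, until a full page has been accumulated. Here is a minimal sketch of that loop; fetchFullPage and scanPage are hypothetical names, with scanPage standing in for one client.send(ScanCommand) round-trip:

```javascript
// Sketch: accumulate filtered Scan results until a full page is collected.
// `scanPage` is a hypothetical stand-in for one ScanCommand round-trip:
// it takes an optional start key and resolves to { items, lastEvaluatedKey },
// where `items` is the already-filtered batch.
async function fetchFullPage(scanPage, pageSize, startKey) {
  const page = [];
  let key = startKey;
  do {
    const { items, lastEvaluatedKey } = await scanPage(key);
    page.push(...items);
    key = lastEvaluatedKey;
  } while (page.length < pageSize && key !== undefined);
  // Note: a production version would carry the overflow items over to the
  // next page (or re-derive the start key) instead of discarding them.
  return { users: page.slice(0, pageSize), lastEvaluatedKey: key };
}
```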

How to unwind data held in edges with a "common neighbors" style query?

I have a simple model with a single A document collection
[{ _key: 'doc1', id: 'a/doc1', name: 'Doc 1' }, { _key: 'doc2', id: 'a/doc2', name: 'Doc 2' }]
and a single B edge collection joining A documents, with a weight integer held on each edge.
[{ _key: 'xxx', id: 'b/xxx', _from: 'a/doc1', _to: 'a/doc2', weight: 256 }]
I'm trying to write a "common neighbors" style query that takes 2 documents as input and yields their common neighbors, along with the respective weights (of each side).
For example, with doc1 and doc26 as input, here is the goal:
[
  { _key: 'doc6', weightWithDoc1: 43, weightWithDoc26: 57 },
  { _key: 'doc12', weightWithDoc1: 98, weightWithDoc26: 173 },
  { _key: 'doc21', weightWithDoc1: 3, weightWithDoc26: 98 },
]
I successfully started by targeting a single side:
FOR associated, association
  IN 1..1
  ANY ${d1}
  ${EdgeCollection}
  SORT association.weight DESC
  LIMIT 20
  RETURN { _key: associated._key, weight: association.weight }
Then I successfully went on with the INTERSECTION logic from the documentation:
FOR proj IN INTERSECTION(
  (FOR associated, association
    IN 1..1
    ANY ${d1}
    ${EdgeCollection}
    RETURN { _key: associated._key }),
  (FOR associated, association
    IN 1..1
    ANY ${d2}
    ${EdgeCollection}
    RETURN { _key: associated._key })
)
  LIMIT 20
  RETURN proj
But I'm now struggling to extract the weight of each side, as unwinding it in the inner RETURN clauses makes the entries exclusive in the intersection, thus returning nothing.
Questions :
Is there any way to make some kind of "selective INTERSECTION", grouping some fields in the process?
Is there an alternative to INTERSECTION that achieves my goal?
Bonus question :
Ideally, after successfully extracting weightWithDoc1 and weightWithDoc26, I'd like to SORT DESC by weightWithDoc1 + weightWithDoc26.
I managed to find an acceptable answer myself:
FOR associated IN INTERSECTION(
  (FOR associated
    IN 1..1
    ANY ${doc1}
    ${EdgeCollection}
    RETURN { _key: associated._key }),
  (FOR associated
    IN 1..1
    ANY ${doc2}
    ${EdgeCollection}
    RETURN { _key: associated._key })
)
  LET association1 = FIRST(
    FOR association IN ${EdgeCollection}
      FILTER association._from == CONCAT(${DocCollection.name}, '/', MIN([${doc1._key}, associated._key]))
        AND association._to == CONCAT(${DocCollection.name}, '/', MAX([${doc1._key}, associated._key]))
      RETURN association
  )
  LET association2 = FIRST(
    FOR association IN ${EdgeCollection}
      FILTER association._from == CONCAT(${DocCollection.name}, '/', MIN([${doc2._key}, associated._key]))
        AND association._to == CONCAT(${DocCollection.name}, '/', MAX([${doc2._key}, associated._key]))
      RETURN association
  )
  SORT (association1.weight + association2.weight) DESC
  LIMIT 20
  RETURN { _key: associated._key, weight1: association1.weight, weight2: association2.weight }
I believe re-selecting after intersecting is neither ideal nor the most performant solution, so I'm leaving this open for now in the hope of a better answer.
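For intuition, the intersect-then-look-up-weights logic can be sketched in plain JavaScript over in-memory adjacency maps (hypothetical data, not the AQL execution model):

```javascript
// Hypothetical adjacency maps: neighbor key → edge weight, one per input doc.
const neighborsOfDoc1 = { doc6: 43, doc12: 98, doc21: 3, doc99: 7 };
const neighborsOfDoc26 = { doc6: 57, doc12: 173, doc21: 98 };

// Keep only common neighbors, carry both weights, sort by their sum DESC.
const common = Object.keys(neighborsOfDoc1)
  .filter((k) => k in neighborsOfDoc26)
  .map((k) => ({
    _key: k,
    weightWithDoc1: neighborsOfDoc1[k],
    weightWithDoc26: neighborsOfDoc26[k],
  }))
  .sort((a, b) =>
    (b.weightWithDoc1 + b.weightWithDoc26) - (a.weightWithDoc1 + a.weightWithDoc26));

console.log(common.map((c) => c._key)); // → [ 'doc12', 'doc21', 'doc6' ]
```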

array manipulation in node js and lodash

I have two arrays:
typeArr = [1010111, 23342344]
infoArr = {'name': 'jon', 'age': 25}
I am expecting the following:
[{'name': 'jon', 'age': 25, 'type': 1010111, 'default': 'ok'}, {'name': 'jon', 'age': 25, 'type': 23342344, 'default': 'nok'}]
Code :
function updaterecord(infoArr, type) {
  infoArr.type = type;
  response = calculate(age);
  if (response)
    infoArr.default = 'ok';
  else
    infoArr.default = 'nok';
  return infoArr;
}

function createRecord(infoArr, typeArr) {
  var data = _.map(typeArr, type => {
    return updaterecord(infoArr, type);
  });
  return (data);
}

var myData = createRecord(infoArr, typeArr);
var myData = createRecord(infoArr,typeArr);
I am getting
[{'name': 'jon', 'age': 25, 'type': 23342344, 'default': 'nok'}, {'name': 'jon', 'age': 25, 'type': 23342344, 'default': 'nok'}]
For some reason the last record overwrites the previous one. I have tried generating the array using an index variable, but I'm not sure what's wrong: it keeps overriding the previous item.
How can I resolve this?
You are passing the entire infoArr array to your updaterecord() function, but updaterecord() looks like it's expecting a single object. As a result, it is adding those properties to the array itself rather than to individual members of the array.
It's not really clear what is supposed to happen, because typeArr has two elements and infoArr has one. Do you want to add another element to infoArr, or should infoArr have the same number of elements as typeArr?
Assuming it should have the same number, you would need to use the index that _.map gives you to send each item from infoArr:
function createRecord(infoArr, typeArr) {
  var data = _.map(typeArr, (type, i) => {
    // use infoArr[i] to send one element
    return updaterecord(infoArr[i], type);
  });
  return (data);
}
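The shared-reference effect behind the original output can be reproduced in isolation: mutating one object inside the loop and returning it each time leaves every array slot pointing at the same, last-written state. A minimal sketch:

```javascript
// One shared object mutated in a loop: every entry aliases the same object.
const info = { name: 'jon', age: 25 };
const types = [1010111, 23342344];

const records = types.map((type) => {
  info.type = type;   // mutates the single shared object
  return info;        // returns the same reference each time
});

// Both entries now show the LAST type written:
console.log(records[0].type); // → 23342344
console.log(records[1].type); // → 23342344
console.log(records[0] === records[1]); // → true
```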
Edit:
I'm not sure how you are calculating default, since it differs in your expected output but seems to be based on one number. To get an array of objects based on infoArr, you need to copy the object and then add the additional properties you want. Object.assign() is good for this:
let typeArr = [1010111, 23342344]
let infoArr = {'name': 'jon', 'age': 25}

function updaterecord(infoArr, type) {
  var obj = Object.assign({}, infoArr)
  return Object.assign(obj, {
    type: type,
    default: infoArr.age > 25 ? 'ok' : 'nok' // or however you're figuring this out
  })
}

function createRecord(infoArr, typeArr) {
  return _.map(typeArr, type => updaterecord(infoArr, type));
}
Result:
[ { name: 'jon', age: 25, type: 1010111, default: 'nok' },
{ name: 'jon', age: 25, type: 23342344, default: 'nok' } ]

Reading value of a key in an object using typescript

I have an array of objects, and each object has two properties:
{key: count}
I am going to configure my chart and I should set the data source of the chart like below:
{meta: "unknown", value: [the count of unknown]},
{meta: "male", value: [the count of male]},
{meta: "female", value: [the count of female]}
Let's say my current array of objects is
[{"0":"10"}, {"1":"7"}, {"2":"9"}], in which 0 stands for unknown gender, 1 for male and 2 for female.
How can I set the value in the chart data in one line, in such a way that it finds the count of each gender automatically from the array of objects, based on its key?
Edit:
I have already written a method that does the job:
public getKeyValue(data, key) {
  for (var i = 0; i < data.length; i++) {
    if (data[i].key == key)
      return data[i].count;
  }
  return 0;
}
However, I was wondering if there's a one-line solution, like LINQ.
You can, but it's not pretty.
This will do the job:
data.map(item => item.key === key ? item.count : 0).reduce((previous, current) => previous + current);
(Check an example in playground)
But I wouldn't recommend using this instead of your code. Your code won't iterate over all the array elements once one is found to match the criteria; with my solution, regardless of whether the criteria was met, there will be two iterations over all the array elements.
For example:
var key = "key3",
    data: { key: string, count: number }[] = [
      { key: "key1", count: 1 },
      { key: "key2", count: 2 },
      { key: "key3", count: 3 },
      { key: "key4", count: 4 },
      { key: "key5", count: 5 },
      //...
      { key: "key1554", count: 1554 }
    ];
The array (data) has a length of 1554, but you're looking for the 3rd element.
Your way will run 3 iterations and then return the value; my way will run 3108 iterations: one full cycle (1554) for the map function and another for the reduce function.
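If short-circuiting matters, Array.prototype.find stops at the first match, giving a one-liner with the same early-exit behaviour as the loop. A sketch, assuming the same { key, count } shape:

```javascript
const data = [
  { key: 'key1', count: 1 },
  { key: 'key2', count: 2 },
  { key: 'key3', count: 3 },
];

// find() stops iterating at the first element matching the predicate.
const getKeyValue = (data, key) =>
  (data.find((item) => item.key === key) || { count: 0 }).count;

console.log(getKeyValue(data, 'key3')); // → 3
console.log(getKeyValue(data, 'nope')); // → 0
```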

Match range for value with MongoDB

I have a campaign collection, which is for advertisers. It's schema is:
var campaignSchema = new mongoose.Schema({
  name: String,
  sponsor: String,
  [...]
  target: {
    age: {
      lower: Number,
      upper: Number
    },
    gender: String,
    locations: Array, // [ { radius: '', lon: '', lat: '' } ]
    activities: Array
  }
});
I need to run a query with a specific age and return all campaigns where that age is between age.lower and age.upper.
I have read the docs for $gt and $lt; however, they appear to only work one way (I could specify a range and match a value, but I need to specify a value and match a range).
Any suggestions?
Sorry, I misunderstood the problem; I get it now.
db.collection.find( { "target.age.lower": { $lt: value1 }, "target.age.upper": { $gt: value1 } } );
So, for example, if your range is 25 to 40 and value1 is 30: 25 < 30 and 40 > 30 -- match!
If you use the same range with 20: 25 is not less than 20 -- no match.
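The same "value inside range" test, expressed as a plain JavaScript predicate mirroring the $lt/$gt conditions (using the target.age shape from the schema above):

```javascript
// A campaign matches when lower < value AND upper > value,
// i.e. the value lies strictly inside the stored range.
const matchesAge = (campaign, value) =>
  campaign.target.age.lower < value && campaign.target.age.upper > value;

const campaign = { target: { age: { lower: 25, upper: 40 } } };

console.log(matchesAge(campaign, 30)); // → true  (25 < 30 and 40 > 30)
console.log(matchesAge(campaign, 20)); // → false (25 is not < 20)
```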
You could first query on the lower bound of the value, sort the results in descending order and take the top document from the result. Compare that document with the upper bound; if it matches, that document has a range containing the age. For example:
var age = 23;
var ads = db.campaign.find({ "target.age.lower": { "$lte": age } })
                     .sort({ "target.age.lower": -1 })
                     .limit(1);
if (ads.hasNext()) {
  var result = ads.next();
  if (result.target.age.upper > age) {
    return result;
  } else {
    return null;
  }
}
