ArangoDB Collect Sort and Limit

ArangoDB Collect Sort and Limit - arangodb

I am looking for a solution to what should be a simple issue. I come from a MySQL development and still have problems here and a with AQL. I want to display the comments of a user with page pagnation without duplicate article keys. I have a comment collection with
_key, user_key, content_key, comment, stamp_create
Example:
1, 4483, 200, "reply 1", "2021-04-24T14:55:55+00:00"
2, 4483, 200, "reply 2", "2021-04-24T14:56:23+00:00"
3, 4483, 201, "reply 1", "2021-04-23T17:10:15+00:00"
4, 4483, 202, "reply 1", "2021-04-23T12:30:35+00:00"
5, 4483 , 202, "reply 2", "2021-04-22T23:50:51+00:00"
Now I need the output all comments Sorted by stamp_create DESC but with summarized content_key, With Mysql a "Group By" was enough for this.
So I try it with Collect.
FOR c IN comments FILTER c.user_key =="4483" SORT c stamp_create DESC COLLECT contentKeys = c.content_key RETURN contentKeys LIMIT 0,3
Generally the result is correct, but the sorting doesn't work because COLLECT doesn't remember it. Well then with the good old DISTINCT:
FOR c IN comments FILTER c.user_key =="4483" SORT c stamp_create DESC RETURN DISTINCT contentKeys LIMIT 0,3
Result looks good and the sorting is correct, but I get only 2 results and not 3 as specified in Limit. For a page pagnation unfortunately useless, because I get more or less results (MySQL does it differently with Group By).
Next try now with a little more effort:
FOR c IN comments
FILTER c.user_key =="4483"
COLLECT contentKeys = c.content_key INTO groups
LET keys = first(FOR value IN groups[*].c SORT value.stamp_create DESC RETURN {key: value._key, stamp: value.stamp_create})
SORT keys.stamp DESC
LIMIT 0,3
RETURN keys
I load here all comments grouped in "groups", then sort them within the group by and then again in the whole.
The result is correct and I also get 3 results. Big disadvantage that is not really fast and I have many users with more than 60K comments.
The entire (filtered) data set (can also be 60K long) must be read in and then re-sorted. This must be easier to do.
So my question is, is there a performant way to make this faster and easier?
thanks

With the help of mpoeter, we found a faster and more elegant method.
Here is the solution:
FOR c IN comments
FILTER c.user_key =="4483"
COLLECT contentKeys = c.content_key AGGREGATE maxStamp = MAX(c.stamp_create) OPTIONS { method: "sorted" }
SORT maxStamp DESC
LIMIT 0,3
RETURN contentKeys
Thanks to the ArangoDB team :-)

Related

using collect in arangodb insert to create new documents

I have a collection called prodSampleNew with documents that have hierarchy levels as fields in arangodb:
{
prodId: 1,
LevelOne: "clothes",
LevelTwo: "pants",
LevelThree: "jeans",
... etc....
}
I want take the hierarchy levels and convert them into their own documents, so I can eventually build a proper graph with the hierarchy.
I was able to get this to extract the first level fo the hierarchy and put it in a new collection using the following:
for i IN [1]
let HierarchyList = (
For prod in prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel in HierarchyList
INSERT {"name": hierarchyLevel}
IN tmp
However, having to put a for I IN [1] at the top seems wrong and that there should be a better way.(yes I am fairly new to AQL)
Any pointers on a better way to do this would be appreciated

Not sure what you are trying to achieve exactly.
The FOR i IN [1] seems unnecessary however, so you could start your AQL query directly with the subquery to compute the distinct values from hierarchy level 1:
LET HierarchyList = (
FOR prod IN prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel IN HierarchyList
INSERT {"name": hierarchyLevel} IN tmp
The result should be the same.
If the question is more like "how can I get all distinct names of levels from all hierarchies", then you could use something like
LET HierarchyList = UNIQUE(FLATTEN(
FOR prod IN prodSampleNew
RETURN [ prod.LevelOne, prod.LevelTwo, prod.LevelThree ]
))
...
to produce an array with the unique names of the hierarchy levels for level 1-3.
Shouldn't this answer your question, please describe the desired result the query should produce.

Compare values inside same subdocument for findOne() [MongoDB]

I have a database full of objects that look ~exactly like this (simplified for clarity):
{
"_id": "GIFT100",
"price": 100,
"priceHistory": [
100, 110
],
"update": 1444183299242
}
What I'm trying to do is create a query document for MongoJS (or MongoDB and I can figure out the rest) that looks for the fact that priceHistory[0] < priceHistory[1].
I would want my query document to return the above record as a result. Alternatively, I could change my document code to compare price < priceHistory[0] but I believe this still leads to the same problem (comparing values inside the same document).
Any help would be appreciated, I've exhausted my Google-foo.
Edit:
I want to return a set of records that indicate a price drop since our last scan (performed daily). Basically a set of "sale" items from a data source I don't control.

You can use the $where clause, but be careful--it's slow, it cannot use your indexes, and it will perform a full table scan. Pass on whatever Javascript you want to use for comparison:
db.collection.findOne({$where: "priceHistory[0] < priceHistory[1]"})
Additionally, you can skip the $where statement if that's the only thing you're querying by:
db.collection.findOne("priceHistory[0] < priceHistory[1]")

ColdFusion: Object with duplicate values (removing duplicates)

I have a query object (SQL) with some records, the problem is that some of the records contain duplicate values. :( (I can't use DISTINCT in my SQL Query, so how to remove in my object?)
categories[1].id = 1
categories[2].id = 1
categories[3].id = 2
categories[4].id = 3
categories[5].id = 2
Now I want to get a list with 1, 2, 3
Is that possible?

I'm not quite sure why you say you can't use DISTINCT, even given the qualification you offered. It doesn't matter were a query came from (<cfquery>, <cfldap>, <cfdirectory>, built by hand) by the time it's exposed to your CFML code, it's just "a query", so you can definitely use DISTINCT on it:
<cfquery name="distinctCategories" dbtype="query">
SELECT DISTINCT id
FROM categories
</cfquery>

couchDB sorting complex key

I have a couchDB database which has several different document "types" which all relate to a main "type".
In the common blog / post example, the main type is the blog post, and the others are comments (though there are 3 different types of comments.
All of the types have a date on them, however, I wish to sort blog posts by date, but return all of the data from the comments as well. I can write an emit which produces keys like so:
[date, postID, docTypeNumber]
where docTypeNumber is 1 for post and > 1 for the different comment document types.
e.g:
["2013-03-01", 101, 1]
[null, 101, 2]
[null, 101, 2]
[null, 101, 3]
["2013-03-02", 101, 1]
[null, 102, 2]
[null, 102, 3]
Of course, If I emit this, all the nulls get sorted together. Is there a way to ignore the nulls, and group them by the seccond item in the array, but sort them by the first if it is not null?
Or, do I have to get all the documents to record the post date in order for sort to work?
I do not want to use lists, they are way too slow and I'm dealing with a potentially large data set.

You can do this by using conditionals in your map function.
if(date != null) {
emit([date, postID, docTypeNumber]);
}
else {
emit([postID, docTypeNumber]);
}
I don't know if you want your array length to be variable or not. If not, you could add the sort variable first. The following snippet could work since date and postID presumably never have the same values.
if(date != null) {
sortValue = date;
}
else {
sortValue = postID;
}
emit(sortValue, date, postID, docTypeNumber);
Update: I thought about this a little more. In general, I make my views based on queries I want to perform. So I ask myself, what do I need to query? It seems that in your case, you might have two distinct queries here. If so, I suggest having two different views. There is a performance penalty to pay since you would run two views instead of one, but I doubt it is perceivable to the user. And it might take up more disk space. The benefit for you would be clearer and more explicit code.

It seems you want to sort all the data (both the post and the comments) with post's date. Since in your design comment document does not contain post date (just comment date) it is difficult with the view collation pattern. I suggest changing the database design to have blog post ID meaningful and contain the date, eg. concatenated date with author id. This way if you emit [doc._id, doc.type] from the post and [doc.post, doc.type] from the comment document you will have post and comments grouped and sorted by date.

Querying documents containing two tags with CouchDB?

Consider the following documents in a CouchDB:
{
"name":"Foo1",
"tags":["tag1", "tag2", "tag3"],
"otherTags":["otherTag1", "otherTag2"]
}
{
"name":"Foo2",
"tags":["tag2", "tag3", "tag4"],
"otherTags":["otherTag2", "otherTag3"]
}
{
"name":"Foo3",
"tags":["tag3", "tag4", "tag5"],
"otherTags":["otherTag3", "otherTag4"]
}
I'd like to query all documents that contain ALL (not any!) tags given as the key.
For example, if I request using '["tag2", "tag3"]' I'd like to retrieve Foo1 and Foo2.
I'm currently doing this by querying by tag, first for "tag2", then for "tag3", creating the union manually afterwards.
This seems to be awfully inefficient and I assume that there must be a better way.
My second question - but they are quite related, I think - would be:
How would I query for all documents that contain "tag2" AND "tag3" AND "otherTag3"?
I hope a question like this hasn't been asked/answered before. I searched for it and didn't find one.

Do you have a maximum number of?
Tags per document, and
Tags allowed in the query
If so, you have an upper-bound on the maximum number of tags to be indexed. For example, with a maximum of 5 tags per document, and 5 tags allowed in the AND query, you could simply output every 1, 2, 3, 4, and 5-tag combination into your index, for a maximum of 1 (five-tag combos + 5 (four-tag combos) + 10 (three-tag combos) + 10 (two-tag combos) + 5 (one-tag combos) = 31 rows in the view for that document.
That may be acceptable to you, considering that it's quite a powerful query. The disk usage may be acceptable (especially if you simply emit(tags, {_id: doc._id}) to minimize data in the view, and you can use ?include_docs=true to get the full document later. The final thing to remember is to always emit the key array sorted, and always query it the same way, because you are emitting only tag combinations, not permutations.
That can get you so far, however it does not scale up indefinitely. For full-blown arbitrary AND queries, you will indeed be required to split into multiple queries, or else look into CouchDB-Lucene.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string