Is there any way to create a partial index in ArangoDB?

I want to create a partial index for a collection, so that the index applies only to documents that meet a condition. For example, I want to enforce uniqueness only for documents that have a certain field value. In other words, I'm looking for an index-creation construct like this:
db.person.createIndex(
{ age: 1 },
{ partialFilterExpression: { age: { $gte: 18 } } }
);
This example is from MongoDB; it applies the index only to documents whose 'age' field is greater than or equal to 18.

There is no way to create a "filtered index" (like you can in SQL). According to the docs, you can include attributes, but not conditionally.
You could try a sparse index, but I think your best bet is adding the age attribute to a "skiplist" index, which supports sorting and gt/lt evaluation.
Make sure you use the explain feature to validate index usage.

Related

Efficiently count Documents with different values for a given field

I am trying to count the number of documents that are in each possible state in a particular Arango collection.
This should be possible in 1 pass over all of the documents using a bucket-sort like strategy where you iterate over all documents, if the value for the state hasn't been seen before, you add a counter with a value of 1 to a list. If you have seen that state before, you increment the counter. Once you've reached the end, you'll have a counter for each possible state in the DB that indicates how many documents are currently stored with that state.
I can't seem to figure out how to write this type of logic in AQL to submit as a query. Current strategy is like this:
Loop over all documents, filtering only docs of a particular state.
Loop over all documents, filtering only docs of a different particular state.
...
All states have been filtered.
Return size of each set
This works, but I'm sure it's much slower than it should be. This also means that if we add a new state, we have to update the query to loop over all docs an additional time, filtering based on the new state. A bucket-sort like query would be quick, and would need no updating as new states are created as well.
If these were the documents:
{A}
{B}
{B}
{C}
{A}
Then I'd like the result to be
{ A:2, B:2, C:1 }
where A, B, and C are values of a particular field. The current strategy filters like so:
LET docsA = (
FOR doc IN collection
FILTER doc.state == "A"
RETURN doc
)
Then I manually construct the return object, calling LENGTH on each list of docs.
Any help or additional info would be greatly appreciated.
What about using the COLLECT operation?
FOR doc IN collection
COLLECT s = doc.state WITH COUNT INTO c
RETURN { state: s, count: c }
This would return something like:
[
{ state: 'A', count: 23 },
{ state: 'B', count: 2 },
{ state: 'C', count: 45 }
]
Would that accomplish what you are after?
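For intuition, the single-pass counting described in the question can be sketched in plain JavaScript. This only illustrates the bucket-style logic that the COLLECT query expresses; it is not how ArangoDB implements COLLECT internally. The helper `countByState` and the sample documents are hypothetical.

```javascript
// Hypothetical helper: counts documents per value of `state` in one pass.
function countByState(docs) {
  const counts = {};
  for (const doc of docs) {
    // First time a state is seen it starts at 0, then is incremented.
    counts[doc.state] = (counts[doc.state] || 0) + 1;
  }
  return counts;
}

// Sample data mirroring the {A} {B} {B} {C} {A} example from the question.
const sampleDocs = [
  { state: "A" }, { state: "B" }, { state: "B" }, { state: "C" }, { state: "A" },
];
console.log(countByState(sampleDocs)); // { A: 2, B: 2, C: 1 }
```

As with the COLLECT query, a new state needs no change to the code: it simply shows up as a new key.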

Increment nested field value in mongodb if exists or create nested fields

I need to increment the multi level nested field value if it exists or create the complete nested field Object structure.
Structure of my document
Doc1 {
_id:ObjectId(),
myField: {
nested:{
x: 5,
y: 10,
z: 20
}
}
}
Goal: I need a way to write a single query:
If myField exists: increment the value of my nested field myField.nested.x by 10.
If myField does not exist: create myField with the same initial values as given in Doc1.
Attempt and explanation:
db.collection('collectionName').findOneAndUpdate(
{_id:"userId","myField" : { $exists : true }},
{$inc:{'myField.nested.x':10}
})
This way, I can increment the nested field if it exists, but if it does not exist I cannot set myField to the same value as in Doc1.
I could run another query from my Node.js callback after the response to get the required behaviour, but I would like an elegant solution in a single query.
I am using MongoDB version 4.0.4. Thanks in advance.
Try this query
If the field does not exist, $inc creates the field and sets the field to the specified value.
db.collection('collectionName').findOneAndUpdate({_id:"userId"},
{$inc:{'myField.nested.x':10}
})
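To see what this update does to a document that lacks myField, here is a minimal plain-JavaScript sketch that mimics $inc on a dotted path. `incPath` is a hypothetical helper, not a MongoDB API, and it assumes every intermediate value is either missing or a plain object. Note that only myField.nested.x is created for the missing case; y and z are not added.

```javascript
// Hypothetical helper that imitates $inc on a dotted path, for illustration only.
function incPath(doc, path, amount) {
  const keys = path.split(".");
  let node = doc;
  // Walk to the parent of the target field, creating missing objects on the way.
  for (const key of keys.slice(0, -1)) {
    if (node[key] === undefined) node[key] = {};
    node = node[key];
  }
  const last = keys[keys.length - 1];
  // A missing field starts from 0, so it ends up equal to `amount`.
  node[last] = (node[last] || 0) + amount;
  return doc;
}

const existing = { myField: { nested: { x: 5, y: 10, z: 20 } } };
const missing = {};
console.log(JSON.stringify(incPath(existing, "myField.nested.x", 10)));
// {"myField":{"nested":{"x":15,"y":10,"z":20}}}
console.log(JSON.stringify(incPath(missing, "myField.nested.x", 10)));
// {"myField":{"nested":{"x":10}}}
```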

Getting index of the resultset

Is there a way to get the index of the results within an aql query?
Something like
FOR user IN Users sort user.age DESC RETURN {id:user._id, order:{index?}}
If you want to enumerate the result set and store these numbers in an attribute order, then this is possible with the following AQL query:
LET sorted_ids = (
FOR user IN Users
SORT user.age DESC
RETURN user._key
)
FOR i IN 0..LENGTH(sorted_ids)-1
UPDATE sorted_ids[i] WITH { order: i+1 } IN Users
RETURN NEW
A subquery is used to sort users by age and return an array of document keys. Then a loop over a numeric range from the first to the last index of that array is used to iterate over its elements, which gives you the desired order value (minus 1) as variable i. The current array element is a document key, which is used to update the user document with an order attribute.
The above query can be useful for a one-off computation of an order attribute. If your data changes a lot, however, the attribute will quickly become stale, and you may want to move this to the client side.
For a related discussion see AQL: Counter / enumerator
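If the numbering is moved to the client side, it could look like the following plain-JavaScript sketch. It assumes the user documents have already been fetched into an array; `withOrder` and the sample users are hypothetical.

```javascript
// Hypothetical client-side helper: sort by age (descending) and number the results.
function withOrder(users) {
  return [...users] // copy so the caller's array is not reordered
    .sort((a, b) => b.age - a.age)
    .map((user, i) => ({ id: user._id, order: i + 1 }));
}

const sampleUsers = [
  { _id: "Users/1", age: 30 },
  { _id: "Users/2", age: 45 },
  { _id: "Users/3", age: 22 },
];
console.log(withOrder(sampleUsers));
```

Because nothing is written back to the collection, the order cannot become stale; it is recomputed for each result set.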
If I understand your question correctly - and feel free to correct me, this is what you're looking for:
FOR user IN Users
SORT user.age DESC
RETURN {
id: user._id,
order: user._key
}
The _key is the primary key in ArangoDB.
If, however, you're looking for the order in which the data was entered (chronological order), then you will have to set the key on your inserts and/or store a date/time value and filter using that.
Edit:
Upon doing some research, I believe this link might be of use to you for auto-incrementing the keys: https://www.arangodb.com/2013/03/auto-increment-values-in-arangodb/

rethinkdb: How to orderby two attributes and use between on one of those

We have a RethinkDB database with tickets in it. Each ticket has a createdAt timestamp in milliseconds and a priority attribute.
e.g.
{
createdAt: 12345,
priority: 4,
owner: "Bob",
description: "test",
status: "new"
}
rethinkdb.db('dev').table(tableId)
.orderBy({index: 'createdAt'})
.between(timeFrom,timeTo)
.filter(filter)
.skip(paginator).limit(20).run(this.connection);
We now have the following problem. We want a query that does two orderBy ... the first would be orderBy "priority" and also by "createdAt". So given the filter and the timespan it should return the tickets with the highest priority and inside the priority the oldest should be on top.
We tried to build a compound index with priority and createdAt. That did work, but the .between didn't work as intended on this index.
rethinkdb.db('dev').table('tickets').indexCreate('prioAndCreatedAt', [rethinkdb.row('priority'), rethinkdb.row('createdAt')]).run(this.connection)
with the query:
rethinkdb.db('dev').table(tableId)
.orderBy({index: 'prioAndCreatedAt'})
.between([rethinkdb.minval, timeFrom],[rethinkdb.maxval , timeTo])
.filter(filter)
.skip(paginator).limit(20).run(this.connection);
In our minds, that should order by priority first and then by createdAt, and with the .between we would ignore the priority (because of .minval and .maxval) and just get all the tickets between timeFrom and timeTo.
But tickets whose createdAt was smaller than timeFrom were also returned. So this doesn't work like we planned.
It's like this "problem": RethinkDB Compound Index Weirdness Using Between
But we can't figure out another way to do this.
Since you wrote that "it should return the tickets with the highest priority and inside the priority the oldest should be on top", is there a reason not to simply use two orderBy calls?
r.db('dev').table('tickets')
.between(timeFrom, timeTo, {index: 'createdAt'})
.orderBy('createdAt')
.orderBy(r.desc('priority'))
Then you can pipe your filter/paginator on this selection. It will provide tickets within the correct range, ordered by descending priority and then by ascending creation date (the way SQL would with ORDER BY priority DESC, createdAt). It also avoids the (documented) behavior of between with compound indexes.
I think your query is only supposed to work when createdAt is also the primary key. Is it? Otherwise you can create an additional index on the createdAt field and use it in your between statement:
r.db('dev').table('tickets').indexCreate('createdAt', r.row('createdAt'))
r.db...
.between(timeFrom, timeTo, {index: "createdAt"})
You can also sort on multiple attributes as described by @Stock Overflaw, but it only works correctly if you put both conditions into one orderBy statement, with priority first:
r.db('dev').table('tickets')
.between(timeFrom, timeTo, {index: 'createdAt'})
.orderBy(r.desc('priority'), r.asc('createdAt'))
Keep in mind that this is less performant, because the sort does not use an index.
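The intended result (a time range, then priority, then age within each priority) can be sketched in plain JavaScript to check the expected ordering against sample data. This is an illustration only, not RethinkDB code: `selectTickets` and the sample tickets are hypothetical, it assumes a higher number means a higher priority, and it mirrors between's default bounds (lower bound inclusive, upper bound exclusive).

```javascript
// Hypothetical helper: keep tickets in [timeFrom, timeTo), then sort by
// priority (descending) and, within equal priority, by createdAt (ascending).
function selectTickets(tickets, timeFrom, timeTo) {
  return tickets
    .filter((t) => t.createdAt >= timeFrom && t.createdAt < timeTo)
    .sort((a, b) => b.priority - a.priority || a.createdAt - b.createdAt);
}

const sampleTickets = [
  { createdAt: 100, priority: 1 },
  { createdAt: 300, priority: 4 },
  { createdAt: 200, priority: 4 },
  { createdAt: 50, priority: 9 }, // outside the range, must be dropped
];
console.log(selectTickets(sampleTickets, 100, 400));
```

The ticket created at 50 is excluded even though it has the highest priority, which is the behaviour the compound-index query failed to deliver.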

Sort by a array element (document) field - MongoDB/Mongoose

This is the concerned part from the schema
var CandidateSchema = new Schema({
  calculateScore: [{
    jobname: { type: Schema.ObjectId, ref: 'Job' },
    Score: { type: Number, default: 0 }
  }]
});
A candidate can apply to multiple jobs and get scored differently for different jobs. I want to sort the candidates depending on the specific job's Score. Any Idea?
Assuming the variable objectId holds the ObjectId of the referred Job, you can aggregate the records to get the records sorted by the score of that particular Job.
Since the $project stage does not support the $elemMatch operator, we cannot use it to directly get the Job sub document that we want and sort based on it. Instead:
$project a separate field named temp_score to hold a copy of the original calculateScore array.
$redact the sub documents of calculateScore other than the one whose jobname contains the id we are looking for. Now calculateScore will contain only one element in the array, i.e. the element whose jobname is the id that we have specified.
Sort the records in descending order based on this sub document's Score.
Once the sorting is done, project our original calculateScore field, which is in temp_score.
The code:
var objectId = ObjectId("ObjectId of the referred Job"); // Needs to be fetched
// from the Job collection.
model.aggregate(
{$project:{"temp_score":{"level":{$literal:1},
"calculateScore":"$calculateScore"},
"calculateScore":1}},
{$redact:{$cond:[
{$and:[
{$eq:[{$ifNull:["$jobname",objectId]},objectId]},
{$ne:["$level",1]}
]
},
"$$DESCEND",
{$cond:[{$eq:["$level",1]},
"$$KEEP","$$PRUNE"]}]}},
{$sort:{"calculateScore.Score":-1}},
{$project:{"_id":1,
"calculateScore":"$temp_score.calculateScore"}},
function(err,res)
{
console.log(res);
}
);
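To check what ordering the pipeline is expected to produce, the same idea can be sketched in plain JavaScript on an in-memory array. This is not Mongoose code: `scoreFor` and `sortByJobScore` are hypothetical helpers, job ids are plain strings here instead of ObjectIds, and placing candidates without a score for the job last is an assumption of this sketch.

```javascript
// Hypothetical helper: the candidate's Score for one job, or -Infinity if absent.
function scoreFor(candidate, jobId) {
  const entry = candidate.calculateScore.find((s) => s.jobname === jobId);
  return entry ? entry.Score : -Infinity;
}

// Hypothetical helper: candidates sorted by their Score for `jobId`, highest first.
function sortByJobScore(candidates, jobId) {
  return [...candidates].sort((a, b) => {
    const sa = scoreFor(a, jobId);
    const sb = scoreFor(b, jobId);
    return sb > sa ? 1 : sb < sa ? -1 : 0;
  });
}

const sampleCandidates = [
  { name: "ann", calculateScore: [{ jobname: "jobA", Score: 40 }, { jobname: "jobB", Score: 90 }] },
  { name: "bob", calculateScore: [{ jobname: "jobA", Score: 75 }] },
  { name: "cid", calculateScore: [{ jobname: "jobB", Score: 10 }] },
];
console.log(sortByJobScore(sampleCandidates, "jobA").map((c) => c.name));
// [ 'bob', 'ann', 'cid' ]
```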
