ArangoDB slow query even with index

I'm using ArangoDB version 3.9, and I have a document collection named myCollection2. Each document in this collection has a likes attribute holding a floating-point value. The documents are dummy data for now, created with the following query:
FOR i IN 1..998700000
  INSERT {
    title: RANDOM_TOKEN(32),
    description: RANDOM_TOKEN(32),
    by: RANDOM_TOKEN(32),
    url: CONCAT(CONCAT("http://www.", RANDOM_TOKEN(32)), ".com"),
    tags: [RANDOM_TOKEN(10), RANDOM_TOKEN(10), RANDOM_TOKEN(10)],
    likes: FLOOR(RAND() * 51),
    comments: [
      {
        user: RANDOM_TOKEN(24),
        message: RANDOM_TOKEN(100),
        dateCreated: DATE_ISO8601(946681801000 + FLOOR(RAND() * 1000000000000)),
        likes: FLOOR(RAND() * 51)
      }
    ]
  } IN myCollection2
Then I added a persistent index on the likes attribute and used the query below to find documents with a given value:
FOR s IN myCollection2
  FILTER s.likes == 29.130405590990936
  RETURN s
Knowing that the value 29.130405590990936 actually exists in some documents, the above query takes about 8 ms, which is great. However, with some other value that doesn't actually exist, say 10, the query takes almost an hour, which is crazy. Am I missing something here?
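One way to diagnose this (my suggestion, not from the question) is to check whether the optimizer actually picks the index for each filter value; in arangosh, db._explain(query) prints the execution plan. A minimal sketch of a helper that checks a plan object for an index lookup, using mocked plans whose shape mirrors ArangoDB's explain output (nodes with a type field):

```javascript
// Sketch: decide whether an AQL execution plan uses an index.
// The plan shape (an array of nodes, each with a `type`) follows the
// structure returned by db._explain / the explain API; the mock plans
// below are hand-written examples, not real explain output.
function usesIndex(plan) {
  return plan.nodes.some((node) => node.type === "IndexNode");
}

// A full collection scan vs. an index lookup:
const fullScan = {
  nodes: [{ type: "SingletonNode" }, { type: "EnumerateCollectionNode" }],
};
const indexed = {
  nodes: [{ type: "SingletonNode" }, { type: "IndexNode" }],
};

console.log(usesIndex(fullScan)); // false
console.log(usesIndex(indexed)); // true
```

If the plan for the slow value shows an EnumerateCollectionNode instead of an IndexNode, the filter is falling back to a full scan.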

Related

How to create a multi-threaded insert query in ArangoDB AQL

I installed ArangoDB Enterprise Edition for evaluation in single-server mode, and I'm using the following AQL script to insert a billion random documents into my collection for testing purposes.
FOR i IN 1..1000000000
  INSERT {
    title: RANDOM_TOKEN(32),
    description: RANDOM_TOKEN(32),
    by: RANDOM_TOKEN(32),
    url: CONCAT(CONCAT("http://www.", RANDOM_TOKEN(32)), ".com"),
    tags: [RANDOM_TOKEN(10), RANDOM_TOKEN(10), RANDOM_TOKEN(10)],
    likes: FLOOR(RAND() * 51),
    comments: [
      {
        user: RANDOM_TOKEN(24),
        message: RANDOM_TOKEN(100),
        dateCreated: DATE_ISO8601(946681801000 + FLOOR(RAND() * 1000000000000)),
        likes: FLOOR(RAND() * 51)
      }
    ]
  } IN myCollection2
A rough calculation suggests this query would take about 12 hours on my machine. However, I have no more than 3 hours from the moment I installed the server before my evaluation license expires. So I'm wondering whether there is any way to accelerate this query, or even run it across multiple threads.
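A single AQL query runs as one operation, but one common workaround (an assumption on my part, not from the question) is to split the range into chunks and run one INSERT query per chunk from concurrent client connections. A sketch of the range-splitting helper:

```javascript
// Sketch: split the range [1, total] into contiguous chunks so each
// chunk can be inserted by a separate AQL query (e.g. `FOR i IN start..end`)
// issued from its own client connection running concurrently.
function splitRange(total, parts) {
  const size = Math.ceil(total / parts);
  const ranges = [];
  for (let start = 1; start <= total; start += size) {
    ranges.push([start, Math.min(start + size - 1, total)]);
  }
  return ranges;
}

console.log(JSON.stringify(splitRange(10, 3))); // [[1,4],[5,8],[9,10]]
```

Each [start, end] pair would then be interpolated into its own copy of the INSERT query; whether this helps depends on how much the server is I/O-bound.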

MongoDB update query in subarray

I need to update a field in an array of objects nested inside another array of objects. The MongoDB document I'm working on:
otherFields: values,
tasks: [
  {
    _id: mongodb.objectID(),
    title: string,
    items: [{
      _id: mongodb.objectID(),
      title: string,
      completed: boolean // field that needs to be updated
    }]
  },
  {}...
],
otherFields: value
(sample MongoDB document)
I need to find the document using the task _id and the item _id, and update the completed field of an item inside a task, using Mongoose's findOneAndUpdate method:
const path = "tasks.$.items." + item_id + "completed";
collectionName.findOneAndUpdate(
  { _id: req.user._id, "tasks._id": taskID },
  { $set: { [path]: true } }
);
The above query doesn't work!!!
There is no need for multiple query conditions, since you'd like to update a specific item that has a unique ID. Therefore you could use something along the lines of:
collectionName.findOneAndUpdate(
  { 'tasks.items._id': itemID },
  ...
);
Keep in mind this query is far from optimized, as it would basically scan through the entire database.
Also, now that I think of it, you'd have an issue with the update itself, as there are two nested arrays within the document. Read more here: How to Update Multiple Array Elements in mongodb
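For the two-level nesting, MongoDB's filtered positional operator ($[<identifier>]) with arrayFilters is the usual tool. A sketch that builds the three arguments for findOneAndUpdate (taskID and itemID are the placeholders from the question):

```javascript
// Sketch: update `completed` inside a doubly nested array using the
// filtered positional operators $[t] and $[i] plus arrayFilters.
// taskID and itemID are placeholder values taken from the question.
function buildNestedUpdate(taskID, itemID) {
  return {
    filter: { "tasks._id": taskID },
    update: { $set: { "tasks.$[t].items.$[i].completed": true } },
    options: { arrayFilters: [{ "t._id": taskID }, { "i._id": itemID }] },
  };
}

const { filter, update, options } = buildNestedUpdate("task1", "item7");
// e.g. collectionName.findOneAndUpdate(filter, update, options);
console.log(update.$set["tasks.$[t].items.$[i].completed"]); // true
```

Each identifier in the path must have a matching entry in arrayFilters, which is what lets the update target one specific task and one specific item.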

Index a new document and get the indexed document in the same query

Is it possible to index a new document and return it once it has been successfully indexed?
I tried to take the _id from the response, but that means using two queries, and since the index action takes some time, the second query doesn't always find the _id.
This is the query that indexes the document:
const query = await elasticClient.index({
  routing: "dasdsad34_d",
  index: "milan",
  body: {
    text: "san siro",
    user: {
      user_id: "3",
      username: "maldini",
    },
    tags: ["Forza Milan", "grande milan"],
    publish_date: new Date(),
    likes: [],
    users_tags: [1, 5],
    type: {
      name: "comment",
      parent: "dasdsad34_d",
    },
  },
});
No, it's not possible with the default behavior. By default, Elasticsearch offers only near-real-time search: its default refresh interval is 1 second, as an index refresh is deemed a costly operation.
To overcome this, you can add refresh=true to your indexing operation. You can find further details in the links below:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
Please note that this is NOT a recommended option, as it comes with a huge overhead. Only use it if the number of inserts into the index in question is very low.
The recommended way is to use refresh=wait_for on your indexing operation. The downside is that the call waits for the natural refresh to complete, up to a second with the default settings. If you have the default refresh interval of 1 second and accept that as a trade-off, this is the way to go.
However, if you have set a higher refresh interval, the indexing operation will wait as long as that interval, so choose your option carefully.
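As a sketch, this is the request from the question with the refresh parameter added (index, routing, and body values are taken from the question; whether wait_for is acceptable depends on your refresh interval):

```javascript
// Sketch: the same index request with `refresh` set.
// "wait_for" blocks the call until the next scheduled refresh makes
// the document searchable; `true` forces an immediate, costly refresh.
const params = {
  routing: "dasdsad34_d",
  index: "milan",
  refresh: "wait_for",
  body: { text: "san siro" },
};
// await elasticClient.index(params);
console.log(params.refresh); // "wait_for"
```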

Cloudant Query using $or operator gives warning "no matching index found, create an index to optimize query time" even though an index is present

A Cloudant Query using the $or operator gives the warning:
"no matching index found, create an index to optimize query time"
even though an index is present. The sample information is shown below:
Index USED:
db.index({
  ddoc: "document_id",
  type: "json",
  index: {
    fields: ["emailid", "mobileno"]
  }
});
Query USED:
selector: {
  $or: [
    { emailid: email_id },
    { mobileno: mobile }
  ]
}
You can find an issue in the CouchDB project discussing something similar: "$or operator slow".
In that issue they conclude that the same field has to be present on both sides of the $or in order for an index to be selected.
Your case doesn't meet this condition, so the query falls back to the _all_docs index (a full scan of the database contents).
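One workaround (my suggestion, not from the answer) is to run one query per field, each of which can use the index on its own, and merge the results client-side, de-duplicating by _id:

```javascript
// Sketch: merge result lists from separate single-field queries
// (one per $or branch), keeping one document per _id.
function mergeById(...resultLists) {
  const byId = new Map();
  for (const docs of resultLists) {
    for (const doc of docs) byId.set(doc._id, doc);
  }
  return [...byId.values()];
}

// Hypothetical results of a query on emailid and a query on mobileno:
const byEmail = [{ _id: "u1", emailid: "a@b.c" }];
const byMobile = [
  { _id: "u1", emailid: "a@b.c" },
  { _id: "u2", mobileno: "555" },
];
console.log(mergeById(byEmail, byMobile).length); // 2
```

This trades one slow full-scan query for two indexed ones plus a cheap in-memory merge.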

Getting element in array that matched text index query

I have the following schema set up in Mongoose:
{
  _id:
  store:
  offers: [{
    name:
    price:
  }]
}
I decided to index offers.name as follows:
uniSchema.index({'offers.name':'text'});
and now I'm trying to run searches against that index:
Stores.find({$text:{$search:VALUE}}, callback);
But this way, whenever there's a hit, the whole store document is returned, and I'm unable to figure out which offer the match came from.
Is there a way to do this with indexes in Mongoose, i.e. figure out which array element matched the query?
I'm not sure that's possible with a $text index.
With a straight (non-text) query you can use a positional projection to get the matching element:
> db.array_find.find({ "offers.name": "firstname"},{"offers.$" : 1} )
But a $text query doesn't reference the array directly, so offers.name cannot be used in the projection:
> db.array_find.find({ $text: { $search: "firstname"} },{"offers.$" : 1} )
error: {
  "$err" : "Can't canonicalize query: BadValue Positional projection 'offer.$' does not match the query document.",
  "code" : 17287
}
The problem with attempting any kind of post-processing of the array in a result document is that you're no longer using Mongo's text indexing but some approximation of it.
You may need a different data structure:
_id:
store: 2
offer:
  name: this is a single offer
  price: 4

_id:
store: 2
offer:
  name: this is the next offer
  price: 5
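The restructuring suggested above can be sketched as a one-off migration step that flattens each store into one document per offer (field names taken from the question):

```javascript
// Sketch: flatten store documents so each offer becomes its own
// document, matching the suggested one-offer-per-document structure.
// The resulting documents would then be inserted into a new collection
// with the text index on `offer.name`.
function flattenStores(stores) {
  return stores.flatMap((s) =>
    s.offers.map((offer) => ({ store: s.store, offer }))
  );
}

const stores = [
  {
    store: 2,
    offers: [
      { name: "this is a single offer", price: 4 },
      { name: "this is the next offer", price: 5 },
    ],
  },
];
console.log(flattenStores(stores).length); // 2
```

With one offer per document, a $text hit identifies the matching offer directly, with no positional projection needed.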
