AQL: collection not found. Non-blocking query - ArangoDB

If I run this query:
FOR i IN [
    my_collection[*].my_prop,
    my_other_collection[*].my_prop
]
RETURN i
I get this error:
Query: AQL: collection not found: my_other_collection (while parsing)
It's true that 'my_other_collection' may not exist, but I still want the result from 'my_collection'.
How can I make this error non-blocking?

A missing collection causes an error that cannot be ignored or suppressed. There is also no concept of late-bound collections, which would allow you to evaluate a string as a collection reference at runtime. In short: this is not supported.
The question is why you would want to use such a pattern in the first place. Assuming that both collections exist, the query will materialize the full array before returning anything, which is presumably memory-intensive.
It would be much better either to keep the documents of both collections in a single collection (you may add an extra attribute type to distinguish between them) or to use an ArangoSearch view, so that you can search indexed attributes across collections.
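If the ArangoSearch route fits your use case, the view definition could look roughly like this. This is a minimal arangosh sketch; the view name my_view is an assumption, while the collection and attribute names come from the question:

// Hedged sketch: create an ArangoSearch view over both collections so
// my_prop can be queried across them. The view name is an assumption.
db._createView("my_view", "arangosearch", {
  links: {
    my_collection:       { fields: { my_prop: {} } },
    my_other_collection: { fields: { my_prop: {} } }
  }
});

// Query the view instead of the individual collections.
db._query("FOR doc IN my_view RETURN doc.my_prop").toArray();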

Beyond the two methods already mentioned in the previous answer (ArangoSearch and a single collection), you could do this in JavaScript, probably inside Foxx:
Check whether the collections exist with db._collection(collectionName).
Then build the query string from AQL fragments and use UNION() to merge the fragments, pulling results from the different collections (see the sketch below).
Note that, if the collections are large, you will probably want to filter the results instead of just pulling all the documents.
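As a rough illustration, here is a minimal sketch for arangosh or a Foxx service. The collection names and the attribute my_prop are taken from the question above; the variable names and the UNION-based merge are assumptions:

const db = require('@arangodb').db;

const names = ['my_collection', 'my_other_collection'];

// db._collection() returns null for unknown collections, so missing
// ones can simply be skipped instead of aborting the whole query.
const existing = names.filter((name) => db._collection(name) !== null);

// One AQL fragment per existing collection, merged with UNION().
// The names were validated above, so interpolating them is safe here.
const fragments = existing.map(
  (name) => `(FOR doc IN ${name} RETURN doc.my_prop)`
);

// UNION() needs at least two arrays, so fall back to a plain FOR
// when zero or one collection exists.
const queryString = fragments.length > 1
  ? `FOR i IN UNION(${fragments.join(', ')}) RETURN i`
  : `FOR i IN ${fragments[0] || '[]'} RETURN i`;

const results = db._query(queryString).toArray();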

Related

Get an object in collection by object index

I have a collection of objects
[{name: "Peter"}, {name: "Evan"}, {name: "Michael"}];
And I want to get an object, for example {name: "Evan"}, by its index (1).
How can I pull this out?
I tried getting all objects with find() and then picking the object by its index, but that's not a good idea in terms of speed.
There are a few notable aspects of this question. In the comments you clarify:
yes they are different documents. By the index I mean const users = await User.find(); users[1] // {name: "Evan"}
Probably what you are looking to do here is something along the lines of:
const users = await User.find().skip(1).limit(1);
This will return a cursor that will contain just the single document that you are looking for.
Keep in mind, however, that without providing a sort to the operation, the database is free to return the results in any order. So the "index" (position) is not guaranteed to be consistent without a sort clause.
I tried getting all objects with find() and then picking the object by its index, but that's not a good idea in terms of speed.
In general, your current approach requires that the database iterate through all of the items being skipped, which can be slow. Limiting the results at least reduces the amount of network activity that is required. Depending on what you are trying to achieve, you could consider setting a smaller batch size (and iterating the cursor) or using range queries as outlined on that page.
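For illustration, here is a minimal sketch using Mongoose, assuming the User model from the question with a name field; the helper name and the choice of _id as sort key are assumptions:

const mongoose = require('mongoose');

const User = mongoose.model('User', new mongoose.Schema({ name: String }));

async function userAtIndex(index) {
  // Sort first so positions are deterministic, then skip to the
  // requested position and fetch exactly one document.
  const docs = await User.find()
    .sort({ _id: 1 }) // any stable sort key works; _id always exists
    .skip(index)
    .limit(1);
  return docs[0] ?? null;
}

// Usage: const evan = await userAtIndex(1); // { name: 'Evan' }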

Mongo DB like search with count is very slow on 50 million collection data

In my application, I have a collection of 50 million documents. I am doing a LIKE-style search and then counting the results on a particular field (i.e. Patientfirstname). I also created an index on the Patientfirstname field; it improved performance, but the query is still taking a lot of time.
db.patients.find({"Patientfirstname":{"$regex":"Testuser"}}).count() without index 40 sec
db.patients.find({"Patientfirstname":{"$regex":"Testuser"}}).count() after adding index on the Patientfirstname field 31 sec
I tried a different approach (aggregate), but the response is still very slow:
db.patients.aggregate([
    { $match: { "Patientfirstname": { "$regex": "Testuser" } } },
    { $project: { "Patientfirstname": 1, "_id": 1 } },
    { $group: { _id: "$Patientfirstname", count: { $sum: 1 } } },
    { $sort: { "count": -1 } }
])
This query also takes about the same time to fetch the results, 31 sec.
Another approach was tried, but the results are not correct: select only the field from the entire collection and then apply the LIKE search and count the result.
db.patients.find({},{Patientfirstname:1,_id:1}).count({"Patientfirstname":{"$regex":"Testuser"}})
Applying a filter in count() does not work; the count of the entire collection is returned.
Please help me make this query fetch results faster. Thanks in advance.
So here is the deal:
As rightly pointed out in the comments, $regex is an operator that does not perform well with or without indexes. Here is the reason why:
Queries without indexes are slow because they are executed using a COLLSCAN, which is essentially an iteration over all 50 million documents on disk, one by one, filtering the data and returning only the documents that match. Disks being an inherently slow piece of hardware does not help the situation either.
Now, when indexed, MongoDB keeps a B-tree for the field in RAM. The $regex operator, being not very selective in nature, forces a complete tree scan (as opposed to the reduced / partial tree scan possible for equality or range predicates), which is almost as bad as a collection scan itself. The only reason you gain about 9 seconds is that this tree scan occurs in RAM and not on disk.
Having said that, there are a few alternatives to it:
Optimize your $regex (see the sketch after this list). From the MongoDB documentation itself:
For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a "prefix expression", which means that all potential matches start with the same string. This allows MongoDB to construct a "range" from that prefix and only match against those values from the index that fall within that range.
A regular expression is a "prefix expression" if it starts with a caret (^) or a left anchor (\A), followed by a string of simple symbols. For example, the regex /^abc.*/ will be optimized by matching only against the values from the index that start with abc.
Additionally, while /^a/, /^a./, and /^a.$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a./ and /^a.$/ are slower. /^a/ can stop scanning after matching the prefix.
Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.
Create a Text Index - this would tokenize your text string and enable faster text-based searches.
If you are deployed on MongoDB Atlas, then you can use Atlas Search, which is a Lucene-based text search engine (works almost like Elasticsearch on steroids). It offers significantly greater performance and functionality like fuzzy text search, text autocomplete, etc.
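To make the first alternative concrete, here is a hedged mongosh sketch. The collection and field names come from the question; everything else is an assumption:

// Anchored prefix regex: MongoDB can turn this into an index range
// scan instead of scanning the whole index tree.
db.patients.countDocuments({ Patientfirstname: { $regex: "^Testuser" } })

// Text-index alternative: tokenizes the field for word-level search.
db.patients.createIndex({ Patientfirstname: "text" })
db.patients.countDocuments({ $text: { $search: "Testuser" } })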

Core Data: storing ordered values in a one-to-many relationship

I'm building a workout app that has an entity called Workout and another one called Exercise.
A workout can contain multiple exercises (thus a one-to-many relationship). I want to show the users of my app the exercises contained in a workout but in an ordered way (it's not the same to start with strength exercises as with the cardio ones).
Apparently, when establishing this kind of relationship in Core Data, I need to use an NSSet, because if I try to use, for example, an Array, whose elements are ordered, I get the following error:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: 'Unacceptable type of value for to-many relationship: property = "consistsOf"; desired type = NSSet; given type = __NSArray0; value = (
).'
I have tried checking the "ordered" checkbox in my model, but then I get an error saying "Workout.consistsOf must not be ordered".
I have also tried to use an NSDictionary whose keys would be the position and the values would be the exercises themselves, but I'm getting the same error as above.
How can I show the users the exercises that a workout consists of in an ordered way?
Thanks a lot in advance!
P.S.: Here's a screenshot of the properties of my model.
Ordered relationships use NSOrderedSet, but CloudKit doesn't support ordered sets, so you can't use an ordered relationship and CloudKit in the same data model.
To keep an order, you need to have some property on Exercise that would indicate the order. This could be as simple as an integer property called something like index. You'd sort the result based on the index value. If there's something else that also indicates order-- like a date, maybe?-- use that instead of adding a new property.
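A minimal sketch of that approach in Swift (the Int16 attribute index is the assumption suggested above; consistsOf comes from the error message in the question):

import CoreData

// Sort the unordered to-many relationship in memory by the stored
// position attribute.
func orderedExercises(in workout: Workout) -> [Exercise] {
    let exercises = workout.consistsOf as? Set<Exercise> ?? []
    return exercises.sorted { $0.index < $1.index }
}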

Why is this ArangoDB query too slow?

I am a new ArangoDB user and I am using the following query
FOR i IN meteo
    FILTER i.`POM` == "Maxial"
        && TO_NUMBER(i.`TMP`) < 4.2
        && DATE_TIMESTAMP(i.`DTM`) > DATE_TIMESTAMP("2014-12-10")
        && DATE_TIMESTAMP(i.`DTM`) < DATE_TIMESTAMP("2014-12-15")
    RETURN i.`TMP`
on a 2 million document collection. It has an index on the three fields that are filtered. It takes approx. 9 secs in the web interface.
Is it possible to run it faster?
Thank you
Hugo
I have no access to the underlying data, its distribution, or the exact index definitions, so I can only give rather general advice:
1. Use the explain() command to see whether the query makes use of indexes, and if so, which ones.
2. If explain() shows that no index is used, check whether the attributes contained in the query's FILTER conditions are actually indexed. There is the db.<collection>.getIndexes() command to check which attributes are indexed.
3. If indexes are present but not used by the query, the indexes may have the wrong type. For example, a hash index will only be used for equality comparisons (i.e. ==), but not for other comparison types (<, <=, >, >= etc.), and only if all the indexed attributes are used in the query's FILTER conditions. A skiplist index will only be used if at least its first attribute is used in a FILTER condition. If further skiplist index attributes are specified in the query (from left to right), they may also be used and allow more documents to be filtered out.
4. Only a single index will be picked when scanning a collection. Having multiple, separate indexes on "POM", "TMP", and "DTM" won't help this query, because it will only use one of them per collection that it iterates over. Instead, I suggest putting multiple attributes into one index if the query can benefit from this.
5. The more selective an index is, the better. For example, an index on a single attribute may filter out a lot of documents, but a combined index on multiple attributes may filter out even more. For this particular query, a skiplist index on [ "POM", "DTM" ] may be the right choice (in combination with 6.).
6. The only attribute for which the optimizer may consider an index lookup in the given original query is the "POM" attribute. The reason is that the other attributes are used inside function calls (i.e. TO_NUMBER(), DATE_TIMESTAMP()). In general, indexes will not be used for attributes that appear inside function calls, e.g. for TO_NUMBER(i.TMP) < 4.2 no index will be used; the same goes for DATE_TIMESTAMP(i.DTM) > DATE_TIMESTAMP("2014-12-10"). Rewriting the conditions so that the indexed attributes are compared directly to a constant or a once-calculated value enables more candidate indexes. For this particular query, it would be better to use i.DTM > "2014-12-10" instead of DATE_TIMESTAMP(i.DTM) > DATE_TIMESTAMP("2014-12-10"), as shown in the sketch after this list.
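Putting points 5 and 6 together, a hedged arangosh sketch could look like this. The collection, attribute names, and date literals come from the question; the index definition assumes DTM is stored as a sortable date string:

// Combined skiplist index on POM and DTM (on newer ArangoDB versions
// a persistent index serves the same purpose).
db.meteo.ensureIndex({ type: "skiplist", fields: ["POM", "DTM"] });

// Rewritten query: the indexed attributes are compared directly,
// without wrapping function calls, so the index can be used.
db._query(`
  FOR i IN meteo
    FILTER i.POM == "Maxial"
      && i.DTM > "2014-12-10"
      && i.DTM < "2014-12-15"
      && TO_NUMBER(i.TMP) < 4.2
    RETURN i.TMP
`).toArray();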

Optimized way of negation of values in solr?

I am trying to search for results excluding a particular id in Solr. I have found that this can be done in two ways:
(1) fq=userid:(-750376)
(2) fq=-userid:750376
Both are working fine and both give correct results. But can anyone tell me which of the two is the better way? Which one should I prefer?
You can find out what query the fq parameter's value is parsed into by turning on debug output (add the parameter debugQuery=true). Then, in the Solr response, there should be an entry "parsed_filter_queries" under "debug", and the entry should show the string representation of the parsed filter query (or queries) being used.
In your case, both forms of fq should be parsed into the same query, i.e. a boolean query with a single clause stating that the term userid:750376 must not occur. Therefore, which form you use does not matter, at least in terms of correctness or performance.
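For example, assuming a locally running Solr with a core named techproducts, a request along these lines will include the parsed form in its debug output:

http://localhost:8983/solr/techproducts/select?q=*:*&fq=-userid:750376&debugQuery=true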
To us the two queries look a little different, but to Solr both are the same.
Solr first parses the query you provide and then searches for the result. In your case, the "parsed_filter_queries" entry is -userid:750376 for both of the queries:
fq=userid:(-750376)
fq=-userid:750376
You can check this by enabling debugQuery in the Admin UI, or by passing debugQuery=true with the query. Hope this helps.
