Get an object in collection by object index - node.js

I have a collection of objects
[{name: "Peter"}, {name: "Evan"}, {name: "Michael"}];
and I want to get an object, for example {name: "Evan"}, by its index (1).
How can I pull this out?
I tried getting all the objects with find() and then picking one by index, but it's not a good idea in terms of speed.

There are a few notable aspects of this question. In the comments you clarify:
yes, they are different documents. By the index I mean: const users = await User.find(); users[1] // {name: "Evan"}
Probably what you are looking to do here is something along the lines of:
const users = await User.find().skip(1).limit(1);
This will return an array containing just the single document that you are looking for.
Keep in mind, however, that without providing a sort, the database is free to return the results in any order. So the "index" (position) is not guaranteed to be consistent without a sort clause.
I tried getting all the objects with find() and then picking one by index, but it's not a good idea in terms of speed.
In general, your current approach requires that the database iterate through all of the items being skipped, which can be slow. Limiting the results at least reduces the amount of network activity that is required. Depending on what you are trying to achieve, you could consider setting a smaller batch size (and iterating the cursor) or using range queries as outlined on that page.
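For a stable notion of "index", a sketch along these lines should work (sorting on _id is my assumption; any stable field will do):

// Sort by a stable field so that position 1 always refers to the same
// document, then skip to it and fetch only that one.
const users = await User.find()
  .sort({ _id: 1 }) // without a sort, the order (and thus the index) is undefined
  .skip(1)
  .limit(1);
console.log(users[0]); // e.g. { name: "Evan" }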

Related

AQL: collection not found. non-blocking query

If I run this query:
FOR i IN [
    my_collection[*].my_prop,
    my_other_collection[*].my_prop
]
RETURN i
I get this error:
Query: AQL: collection not found: my_other_collection (while parsing)
It's true that 'my_other_collection' may not exist, but I still want the result from 'my_collection'.
How can I make this error non-blocking?
A missing collection causes an error that cannot be ignored or suppressed. There is also no concept of late-bound collections, which would allow you to evaluate a string as a collection reference at runtime. In short: this is not supported.
The question is why you would want to use such a pattern in the first place. Assuming both collections exist, the query will materialize the full array before returning anything, which is presumably memory-intensive.
It would be much better to either keep the documents of both collections in a single collection (you may add an extra type attribute to distinguish between them) or to use an ArangoSearch View, so that you can search indexed attributes across collections.
Beyond the two methods already mentioned in the previous answer (ArangoSearch and a single collection), you could do this in JavaScript, probably inside Foxx:
Check if the collections exist with db._collection(collection-name).
Then build the query using AQL fragments for the collections that do exist, and merge the results (either with UNION in AQL or in JavaScript); a sketch follows below.
Note that, if the collections are large, you will probably want to filter the results instead of just pulling all the documents.
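A rough sketch of that approach (server-side JavaScript, e.g. in arangosh or a Foxx service; the collection and attribute names are taken from the question):

const { db, aql } = require("@arangodb");

const names = ["my_collection", "my_other_collection"];
// db._collection() returns null for collections that do not exist.
const existing = names.filter((name) => db._collection(name) !== null);

// Query each existing collection and merge the results in JavaScript.
let results = [];
for (const name of existing) {
  const cursor = db._query(aql`
    FOR d IN ${db._collection(name)}
      RETURN d.my_prop
  `);
  results = results.concat(cursor.toArray());
}

This way a missing collection is simply skipped instead of aborting the whole query.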

How to create an iterable object from a WikipediaPage object?

I'm trying to change a document ranker in a project from one that needs a model taking a huge amount of memory to train to a simpler one based on the wikipedia library.
I start from queries, a list of queries that contains only one query at the moment:
queries: ['What is the population of Toulon']
I would like to change the ranking so that it returns the closest doc using the wikipedia.page() function. Yet for this ranker to work, I know that I need an iterable object at the end. Indeed, I tried:
# Rank documents for queries.
if len(queries) == 1:
    # ranked = [self.ranker.closest_docs(queries[0], k=n_docs)]
    ranked = [wikipedia.page(queries), wikipedia.page(queries)]  # which is stupid, I know, but I don't know how to do it differently yet
    all_docids, all_doc_scores = zip(*ranked)
and got a TypeError: zip argument #1 must support iteration error on the line all_docids, all_doc_scores = zip(*ranked).
Until now I have two wikipedia pages:
<WikipediaPage 'Toulon'> <WikipediaPage 'Toulon'>

Redis sort by likes inside list of hashes?

Sorry if my terminology is wrong, but I have a list of feed hashes.
So, e.g., feed:1, feed:2, feed:3; inside those hashes I have some keys and values, e.g. inside feed:1 I have likes: 300.
I have a list called feeds:fid which lists all the feed ids. So if I want to grab all the feeds I can just do a method like this in my node.js:
module.getObjects = function(keys, callback) {
  helpers.multiKeys(redisClient, 'hgetall', keys, callback);
};
I am not sure how I can sort them so that I get all feed items sorted by most liked. Ideally I just want to get the "hottest feed" items.
I am curious how I can go about this in redis?
This would be difficult to do with your current data structures.
You can however use a single sorted set to store likes along with feed ids.
So, whenever a like happens, you store the like in your hash, but also do a ZINCRBY operation on the same feed key in the sorted set.
-- At any point in time the sorted set will contain the feed ids as members, with the number of likes on each as its score.
-- To get the top or "hottest" feeds, you just do a ZREVRANGE operation, which will give you the top N items with the most likes.
-- To keep both of your operations atomic, you should use Redis transactions so the data always stays in sync between the hash and the sorted set. A sketch follows below.
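A minimal sketch of that scheme (node-redis v4 API; the sorted set name feeds:likes is my invention):

const { createClient } = require('redis');

(async () => {
  const client = createClient();
  await client.connect();

  // A like on feed 1: bump the hash field and the sorted set score
  // in one MULTI/EXEC transaction so the two stay in sync.
  await client
    .multi()
    .hIncrBy('feed:1', 'likes', 1)
    .zIncrBy('feeds:likes', 1, '1')
    .exec();

  // Hottest 10 feeds: members with the highest scores first
  // (the { REV: true } option is the equivalent of ZREVRANGE).
  const hottest = await client.zRange('feeds:likes', 0, 9, { REV: true });
  console.log(hottest); // e.g. [ '1', '3', '2' ]

  await client.quit();
})();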

How can I query MongoDB in a blocking way? (not sure if this is the right way though)

Let me introduce my problem: I'm currently developing a web app using Node.js, Express and MongoDB (mongoose driver), and I would like, when the user requests /save, to generate a unique ID (made of random letters and digits) in order to redirect the request to /save/id.
Therefore I want my /save route to query MongoDB for a list of existing IDs, and generate a random ID which is not present in the list.
Any idea on how to do that ?
I think both the terms and their relation to each other are a bit unclear here. We first need a more precise definition of the _id field and its default field type, ObjectId.
The _id field
For each document, there has to be a mandatory, unique _id. The only constraint on that field is that its values have to be unique. In case this field is not explicitly contained in the incoming data, MongoDB will create one with an ObjectId. However, _id can hold any value, even subdocuments, as long as it is unique. The following is a perfectly valid document and might even make sense in some use cases:
{
  _id: {
    name: "Roger",
    surname: "Rabbit"
  }
}
To enforce uniqueness, a unique index is created over _id.
ObjectId
ObjectId is the default field type of _id. It is an alphanumerical, truncated representation of several values, namely
the seconds since the epoch
a machine identifier, which itself is constructed from various values, iirc
the ID of the process generating the ObjectId
a counter with random initialisation
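As a quick illustration (Node.js MongoDB driver), the embedded timestamp can be read back out of any ObjectId:

const { ObjectId } = require('mongodb');

const id = new ObjectId();
console.log(id.toHexString());  // e.g. '65f1c2a3b4c5d6e7f8a9b0c1'
console.log(id.getTimestamp()); // the creation time encoded in the leading bytes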
The answer to your question
Contrary to what has been told many times, and here again, there is absolutely no drawback in using a custom value for the _id field instead of the default ObjectId. Quite the contrary: if you have a field which is (semantically) unique and you query by it, it is an awesome candidate, since most likely you will need an index on it anyway. Let's take an easy example: the social security number (SSN) in an insurance company application. Compare the next two documents:
{
  _id: <SomeObjectId>,
  SSN: 12345678,
  name: {
    first: "Roger",
    last: "Rabbit"
  }
}
In order to query for the values you need, you need at least 3 indices here: the default one on _id, a unique one on SSN, and one on the name fields. For any given query, only one index can be used. This might become a problem in aggregations (which the math guys in your insurance company will do a lot). Additionally, you have the pretty useless ObjectId as extra data, which can become costly if you have several million documents.
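In mongosh, those indexes would look roughly like this (the collection name people is hypothetical):

// _id gets its mandatory unique index automatically.
db.people.createIndex({ SSN: 1 }, { unique: true });
db.people.createIndex({ 'name.first': 1, 'name.last': 1 });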
Now compare to this document:
{
  _id: {
    SSN: 12345678,
    firstname: "Roger",
    lastname: "Rabbit"
  }
}
First of all: we hold the same information without useless data, and we have only a single index, which can easily translate into gigabytes of free RAM. Second of all, we added some implicit semantics, because the uniqueness of an entry is enforced over the SSN and the complete name, which might come in handy if a name is changed due to a marriage. Whether or not this makes sense is a design decision. I decided to use the full name as part of the _id mainly for the sake of this example.
So it is perfectly valid to create an _id yourself, as long as you ensure it is unique. And here is the problem: creating something unique isn't that easy. For the above example, it might be a hash of the _id's values (never use clear-text SSNs!). This could be done like this (kudos to @maerics):
var crypto = require('crypto'),
    shasum = crypto.createHash('sha1');
shasum.update("Roger");
shasum.update("Rabbit");
shasum.update("12345678");
console.log(shasum.digest('hex'));
// Something like "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
You could add this hash to the _id field, though it might make sense to have it as a separate field, increasing our index count to two.
Please note that this hashing procedure is only necessary if you would otherwise disclose sensitive data in the URLs!
However, just using ObjectId for the _id because it is a convenient way to create a unique value is lazy system design. As shown, smart system design may save you gigabytes of RAM, which easily translates into multiple servers for large databases.
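Back to the concrete question (a unique random ID for /save/id): instead of loading the list of all existing IDs, you can generate a random ID, rely on the unique index on _id, and retry on a duplicate-key error. A minimal sketch, assuming a Mongoose model named Doc whose schema declares _id as a String:

const crypto = require('crypto');

async function saveWithRandomId(data) {
  for (;;) {
    const id = crypto.randomBytes(6).toString('hex'); // e.g. '9f2c1a4be703'
    try {
      return await Doc.create({ _id: id, ...data });
    } catch (err) {
      if (err.code !== 11000) throw err; // 11000 = duplicate key, so retry
    }
  }
}

With 6 random bytes, collisions are so rare that the retry loop almost never runs more than once.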

Grouping using Map and Reduce

I have some documents with a "status" field of "Green", "Red", or "Amber".
I'm sure it's possible to use MapReduce to produce a grouped response containing three keys (one for each status), each with a value containing an array of all the documents with that key. However, I'm struggling with how to write the reduce function.
Map function:
function(doc) {
  emit(doc.status, doc);
}
Reduce function: ???
This is not a problem that reduce is intended to solve; reduce in CouchDB is for aggregation.
If I understand you correctly, you want this:
Map:
function(doc) {
  emit(doc.status, null);
}
You can then find all docs with status Green with:
/_design/foo/_view/bar?key="Green"&include_docs=true
This will return a list of all docs with that status. If you wish to find docs of more than one status in a single query, then use an HTTP POST with a body of this form:
{"keys":["Green", "Red"]}
HTH,
B.
Generally speaking, you will not use a reduce function to obtain your list of documents. A reduce is meant to take a list and reduce it to a single value. In fact, there is an upper limit to the size of a reduce value anyway, and using entire documents will trigger a reduce_overflow error. Examples of reduces are counts, sums, averages, etc. Stick with the map query, and you will have your values collated and sorted by the status value.
On another, possibly unrelated note, I would not emit the document with your view. You can just use the include_docs view query parameter and achieve the same effect, while saving disk space in the process. The trade-off is that internally each doc will have to be retrieved one by one (but since they're already indexed by _id anyway, it's usually a negligible difference).
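For completeness, a rough sketch of querying such a view from Node.js (the database and design document names mydb, foo and bar are hypothetical; assumes Node 18+ for the built-in fetch):

(async () => {
  const url =
    'http://localhost:5984/mydb/_design/foo/_view/bar' +
    '?key="Green"&include_docs=true';
  const res = await fetch(url);
  const { rows } = await res.json();
  // With include_docs=true, each row carries the full document in row.doc.
  const greenDocs = rows.map((row) => row.doc);
  console.log(greenDocs.length);
})();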
