I am looking for a low-overhead way to sort my CouchDB view by value (the data set is large). In my Node.js application I use the "nano" package to connect to the database and query the view.
I created a CouchDB view whose map function emits a key/value pair and whose reduce is set to _count:
function (doc) {
emit(doc.msg, 1);
}
To query my view I am using:
alice.view('VIEWNAME', 'INDEXNAME', {'group': true}).then((body) => {
  body.rows.forEach((doc) => {
    console.log(doc.key + " " + doc.value);
  });
});
The view returns, for example, "Hello" as doc.key and "155" as doc.value, so I have 155 documents with the key Hello.
Now I want to sort my view by value in descending order:
Hello 155
Foo 140
Bar 100
But the only ordering I can get from the view is by key, ascending.
I tried several solutions on the Node.js side, but I don't want to lose much performance.
You cannot do this. Views, by nature, are sorted by key only. The solution is to create a second view that sorts by a different key.
I built a workaround on the Node.js side:
body.rows.sort(function (a, b) {
return parseInt(b.value,10) - parseInt(a.value,10);
});
I'm not 100 percent satisfied, but this solution helped me. I still cannot estimate how much performance the sorting costs on the Node.js side.
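As a rough illustration of the workaround above, here is a minimal sketch that sorts rows shaped like a grouped view response ({key, value}) by value, highest first. The sample rows come from the question; the function name sortRowsByValueDesc is made up for this example. Because _count already returns numbers, parseInt should not be needed, and the sort works on one row per distinct key, not one per document, so its cost grows with the number of keys.

```javascript
// Sort grouped view rows by their reduced value, highest first.
// `rows` is assumed to look like body.rows from a group=true query.
function sortRowsByValueDesc(rows) {
  // Copy first so the original response array is left untouched.
  return [...rows].sort((a, b) => b.value - a.value);
}

// Sample data taken from the question (already sorted by key, as CouchDB returns it).
const rows = [
  { key: "Bar", value: 100 },
  { key: "Foo", value: 140 },
  { key: "Hello", value: 155 },
];

for (const row of sortRowsByValueDesc(rows)) {
  console.log(row.key + " " + row.value);
}
```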
I have a small database that is not going to grow much more. I'm trying to fetch one document from it through an API that I implemented in Python, so that a given document id retrieves the matching document. However, it is awkward to make the user type one of the random-looking ids from the database. All I need is a function that modifies each document by setting an id field that auto-increments. As I said, the database is not going to grow much and performance isn't really an issue here.
So far what I've been able to do is this:
var i = 0;
db.MyCollection.update({},
  {$set : {"new_field": 1}},
  {upsert: false,
   multi: true});
i++;
I managed to set an id field, but it gives every document the same number (the total document count). So if the database has 10 docs, it sets the id to 10 on each of them.
A find-and-modify operation returns the updated document (as it was before or after the update, depending on the returnDocument setting). You can combine this with $inc to implement a counter. Ruby example where c is a collection:
irb(main):005:0> c['foo'].insert_one(counter:true,count:1)
=> #<Mongo::Operation::Insert::Result:0x8040 documents=[{"n"=>1, "opTime"=>{"ts"=>#<BSON::Timestamp:0x00005609f260b7e0 #seconds=1594961771, #increment=2>, "t"=>1}, "electionId"=>BSON::ObjectId('7fffffff0000000000000001'), "ok"=>1.0, "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00005609f260b538 #seconds=1594961771, #increment=2>, "signature"=>{"hash"=><BSON::Binary:0x8060 type=generic data=0x0000000000000000...>, "keyId"=>0}}, "operationTime"=>#<BSON::Timestamp:0x00005609f260b290 #seconds=1594961771, #increment=2>}]>
irb(main):011:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>1}
irb(main):012:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>2}
irb(main):013:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>3}
irb(main):014:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>4}
Why not just use this logic? Instead of updating everything in one query, launch one query per document. Mongo will do this pretty fast, even if you have >1M docs in the database (and you said yours is small), because the _id field is indexed by default.
This is JavaScript code, but I guess you'll understand the logic:
// find() returns a cursor, so materialize it before indexing into it
let all_documents = db.MyCollection.find({}).toArray();
for (let i = 0; i < all_documents.length; i++) {
  db.MyCollection.update({_id: all_documents[i]._id}, {$set : {"new_field": i}}, {upsert: false});
}
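If one round trip per document turns out to be too slow, the same numbering can be expressed as a list of bulk operations. The sketch below only builds that list in plain JavaScript; the helper name buildSequentialIdUpdates and its fieldName and start parameters are assumptions made for illustration, not part of the original answer.

```javascript
// Build one updateOne operation per document, numbering them from `start`.
// `fieldName` and `start` are illustrative parameters, not from the question.
function buildSequentialIdUpdates(docs, fieldName = "new_field", start = 1) {
  return docs.map((doc, index) => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { [fieldName]: start + index } },
    },
  }));
}

// Stand-in for the documents returned by a find() on the collection.
const docs = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];
const ops = buildSequentialIdUpdates(docs);
console.log(JSON.stringify(ops, null, 2));
```

In the mongo shell, a list like this could presumably be passed to db.MyCollection.bulkWrite(ops) after loading the documents with find({}, {_id: 1}).toArray().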
I am brand new to NoSQL, CouchDB, and MapReduce and need some help.
I have the same question as the one discussed here {How to use reduce in Fauxton} but do not understand the answer :(.
I have a working map function:
function (foo) {
if(foo.type == "blog post");
emit(foo)
}
which returns 11 individual documents. I want to modify this to return foo.type along with a count of 1.
I have tried:
function (doc) {
if(doc.type == "blog post");
return count(doc)
}
and "_count" from the Reduce panel, but clearly am doing something wrong as the View does not return anything.
Thanks in advance for any assistance or guidance!
In Fauxton, the Reduce step is kind of awkward and unintuitive to find.
Select _count in the "Reduce (optional)" popup below where you type in your Map.
Select "Save Document and then Build Index". That will display your map results.
Find the "Options" button at the top next to a gears icon. If you see a green band instead, close the green band with the X.
Select Options, then the "Reduce" check-circle. Select Run Query.
Map
When you build a map function, you are literally creating a dictionary (or map), which is a key:value data structure.
Your map function should emit the keys that you will query by. You can also emit a value, but if you simply want the associated document you don't have to emit any value. Why? Because there is a query parameter that returns the associated document (?include_docs=true).
Reduce
Then you can add a reduce function, which is called on the results that share the same key. All rows with the same key are passed through your reduce function to reduce their values to a single value.
Corrected example
In your case, I suppose you want to map the documents by type.
You could create a function that emits the documents that have a type property.
function(doc){
if(doc.type)
emit(doc.type);
}
If you query this view, you will see that the key of each row is the type of the document. If you choose the _count reduce function, you should get the number of documents per type.
When querying the view, you have to specify: group=true&reduce=true
You can also get all the documents of type blog post by querying with these parameters: ?key="blog post"&reduce=false
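To make the map and reduce steps above more concrete, here is a toy model in plain JavaScript. It is not CouchDB code: in a real view emit is a global, whereas here it is passed in as a second argument, and countByKey is a made-up helper that imitates what _count does with group=true.

```javascript
// Toy model of a CouchDB view: run `map` over every document, collect the
// emitted rows, then count the rows per key (what _count does with group=true).
function countByKey(docs, map) {
  const rows = [];
  const emit = (key, value = null) => rows.push({ key, value });
  docs.forEach((doc) => map(doc, emit));

  const counts = new Map();
  for (const row of rows) {
    counts.set(row.key, (counts.get(row.key) || 0) + 1);
  }
  // CouchDB returns rows ordered by key; plain string comparison is a
  // simplification of its collation rules.
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => ({ key, value }));
}

const docs = [
  { type: "blog post", title: "First" },
  { type: "comment", text: "Nice" },
  { type: "blog post", title: "Second" },
  { title: "document without a type" },
];

// Same logic as the corrected map function: emit only documents that have a type.
const result = countByKey(docs, (doc, emit) => {
  if (doc.type) emit(doc.type);
});
console.log(result);
```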
I have the following data(example) -
1 - "Value1A"
1 - "Value1B"
1 - "Value1C"
2 - "Value2A"
2 - "Value2B"
I'm using Multimaps for the above data, such that the key 1, has 3 values(Value1A, Value1B, Value1C) and key 2 has 2 values(Value2A, Value2B).
Retrieving all the values for a given key with the get function works. But I want to get the key for a given value, i.e. if I have "Value1C", I want to use it to look up its key, 1, in the Multimap. Is this possible? If so, how? If not, what can I use instead of Multimap to achieve this?
Thanks for the help
https://www.npmjs.com/package/multimap
It is not possible to do this with a single operation. You will need to choose between using some extra memory and consuming more CPU.
Use more memory
In this case you store the data in a reverse mapping as well, so you have another map that stores "Value1C" -> 1. This solution can cause consistency issues, because every operation has to be applied to both maps: the original one and the reverse one.
A basic example of this approach:
//insert
map.set(1, "Value1C");
reverseMap.set("Value1C", 1);
//search
console.log(map.get(reverseMap.get("Value1C")));
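One way to limit the consistency risk mentioned above is to hide both maps behind a single object so callers cannot update one without the other. The sketch below uses the built-in Map and Set instead of the multimap package, and the class name TwoWayMultimap and its methods are invented for this example. It assumes each value belongs to exactly one key, as in the sample data.

```javascript
// A multimap that keeps a reverse index (value -> key) in sync on every write.
class TwoWayMultimap {
  constructor() {
    this.forward = new Map(); // key -> Set of values
    this.reverse = new Map(); // value -> key
  }

  set(key, value) {
    // If the value already belongs to a key, detach it first so both maps agree.
    if (this.reverse.has(value)) this.delete(this.reverse.get(value), value);
    if (!this.forward.has(key)) this.forward.set(key, new Set());
    this.forward.get(key).add(value);
    this.reverse.set(value, key);
  }

  delete(key, value) {
    const values = this.forward.get(key);
    if (!values || !values.delete(value)) return false;
    if (values.size === 0) this.forward.delete(key);
    this.reverse.delete(value);
    return true;
  }

  // All values stored under a key, as an array (empty if the key is unknown).
  get(key) {
    return [...(this.forward.get(key) || [])];
  }

  // The key a value belongs to, or undefined.
  keyOf(value) {
    return this.reverse.get(value);
  }
}

const twoWay = new TwoWayMultimap();
twoWay.set(1, "Value1A");
twoWay.set(1, "Value1B");
twoWay.set(1, "Value1C");
twoWay.set(2, "Value2A");
twoWay.set(2, "Value2B");
console.log(twoWay.keyOf("Value1C"), twoWay.get(2));
```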
Use more CPU
In this case you need to search through all the values, which has O(n) complexity. That is not good if your list is big, and even worse in a single-threaded environment like Node.js.
Check the code example below:
function findValueInMultiMap(map, value, callback){
map.forEachEntry(function (entry, key) {
for(var e in entry){
if(entry[e]==value){
callback(map.get(key));
}
}
});
}
findValueInMultiMap(map, 'Value1C', function(values){
console.log(values);
});
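The question asks for the key rather than the values, so a variant of the linear search that returns the key directly may be closer to what is needed. This sketch uses a built-in Map of arrays as a stand-in for the multimap package, and findKeyByValue is a hypothetical name.

```javascript
// Return the first key whose value list contains `value`, or undefined.
// This is a linear scan over every stored value, so it is O(n).
function findKeyByValue(multimap, value) {
  for (const [key, values] of multimap) {
    if (values.includes(value)) return key;
  }
  return undefined;
}

// The sample data from the question, stored as key -> array of values.
const data = new Map([
  [1, ["Value1A", "Value1B", "Value1C"]],
  [2, ["Value2A", "Value2B"]],
]);

console.log(findKeyByValue(data, "Value1C"));
```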
I'm using a MongoDB mapReduce to implement a ranking feed algorithm. It almost works, but the last thing left to implement is pagination. mapReduce supports limiting the results, but how could I implement the offset (skipping), based for example on the last viewed _id of the results, given that I'm using mongoose?
This is the procedure I wrote:
o = {};
o.map = function() {
//log10(likes+comments) / elapsed hours from the post creation
emit(Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1), this);
};
o.reduce = function(key, values) {
//sort the values, when they have the same score
values.sort(function(a, b) {
return a.createdAt - b.createdAt;
});
//serialize the values, because mongoose does not support multiple returned values
return JSON.stringify(values);
};
o.scope = {now: new Date()};
o.limit = 15;
Posts.mapReduce(o, function(err, results) {
if (err) return console.log(err);
console.log(results);
});
Also, if mapReduce is not the way to go, can you suggest another way to implement something like this?
What you need is a page delimiter which is not the id of the latest viewed as you say, but your sorting property. In this case, it seems to be the formula Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1).
So your mapReduce query needs to hold a where condition on the value of that formula above; specifically, a comparison of the form formula >= the score of the last item on the previous page. It also needs to hold the createdAt value of that last item, since you don't sort by it (assuming createdAt is unique). So the query of your mapReduce would say: where: theFormulaExpression, createdAt: { $lt: lastCreatedAt }
If you do allow multiple identical createdAt values, you have to play a little outside of the database itself.
So you just search by formula.
Ideally, that gives you one element with exactly that value, and the next ones sorted after it. So before replying to the module caller, remove this first element from the array (and make sure you actually ask for more results than you need because of this).
Now, since you allow for multiple similar values, you need another identifying prop, say, object id or created_at. Your consumer (caller of this module) will have to provide both (last value of the score, createdAt of the last object). Say you have a page split exactly in the middle: one or more objects are on the previous page, another set on the next. You'd have to not simply remove the top value (because that same score is already served on the previous page), but possibly several of them from the top.
Then it goes really crazy, because potentially your whole page was already served: compare the _ids and look for the first one after the one your module caller has provided you with. Or look into the data, determine how many matching values like that there are, and try to get at least that many more values from mapReduce than your actual page size.
Aside from that, I would do this with aggregation instead; it should be much more performant.
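The delimiter idea above can be sketched in plain JavaScript, independent of mapReduce or mongoose. The score function copies the formula from the question; compareRanked, getPage, the cursor shape and the numeric createdAt timestamps are assumptions made for this illustration. Ties on score are broken by createdAt (newest first) and then by _id, so a cursor built from the last served item is enough to find the next page. Note that the scores depend on now, so the cursor is only meaningful while the same now is used across pages.

```javascript
// Score from the question: log10(likes + comments + 1) divided by an age term.
// createdAt and now are assumed to be numeric millisecond timestamps.
function score(post, now) {
  const age = Math.abs((now - post.createdAt) / 6e7 + 1);
  return Math.log(post.likes + post.comments + 1) / Math.LN10 / age;
}

// Total order: score descending, then createdAt descending, then _id ascending.
function compareRanked(a, b) {
  if (a.score !== b.score) return b.score - a.score;
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  return a._id < b._id ? -1 : a._id > b._id ? 1 : 0;
}

// Return one page of ranked items that sort strictly after `cursor`.
// `cursor` is the last item of the previous page (or undefined for page one).
function getPage(posts, now, pageSize, cursor) {
  const ranked = posts
    .map((p) => ({ _id: p._id, createdAt: p.createdAt, score: score(p, now) }))
    .sort(compareRanked);
  const rest = cursor ? ranked.filter((r) => compareRanked(cursor, r) < 0) : ranked;
  const items = rest.slice(0, pageSize);
  const last = items[items.length - 1];
  return { items, nextCursor: items.length === pageSize ? last : null };
}

const now = 1700000000000;
const posts = [
  { _id: "a", likes: 10, comments: 2, createdAt: now - 3600000 },
  { _id: "b", likes: 10, comments: 2, createdAt: now - 3600000 }, // same score as "a"
  { _id: "c", likes: 50, comments: 9, createdAt: now - 7200000 },
  { _id: "d", likes: 0, comments: 0, createdAt: now - 60000 },
  { _id: "e", likes: 3, comments: 1, createdAt: now - 86400000 },
];

const firstPage = getPage(posts, now, 2);
const secondPage = getPage(posts, now, 2, firstPage.nextCursor);
console.log(firstPage.items.map((i) => i._id), secondPage.items.map((i) => i._id));
```

In a real query the filter step would be pushed into the database, as the answer suggests, instead of ranking everything in memory.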
I have a Couchdb database with documents of the form: { Name, Timestamp, Value }
I have a view that shows a summary grouped by name with the sum of the values. This is a straightforward reduce function.
Now I want to filter the view to only take into account documents whose timestamp falls within a given range.
AFAIK this means I have to include the timestamp in the emitted key of the map function, e.g. emit([doc.Timestamp, doc.Name], doc)
But as soon as I do that, the reduce function no longer sees the rows grouped together to calculate the sum. If I put the name first I can group at level 1 only, but how do I filter at level 2?
Is there a way to do this?
I don't think this is possible with only one HTTP fetch and/or without additional logic in your own code.
If you emit([time, name]) you would be able to query startkey=[timeA]&endkey=[timeB]&group_level=2 to get items between timeA and timeB grouped where their timestamp and name were identical. You could then post-process this to add up whenever the names matched, but the initial result set might be larger than you want to handle.
An alternative would be to emit([name,time]). Then you could first query with group_level=1 to get a list of names [if your application doesn't already know what they'll be]. Then for each one of those you would query startkey=[nameN,timeA]&endkey=[nameN,timeB]&group_level=1 to get the summary for that name within the time range.
(Note that in my query examples I've left the JSON start/end keys unencoded, so as to make them more human readable, but you'll need to apply your language's equivalent of JavaScript's encodeURIComponent on them in actual use.)
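As a small illustration of the encoding note and of the post-processing step mentioned for the emit([time, name]) variant, here is a sketch in plain JavaScript. The helper names buildViewQuery and sumByName, and the sample rows, are made up for this example; the row shape ({key: [time, name], value}) is assumed to match what a group_level=2 query returns.

```javascript
// Turn view parameters into a query string.
// Each value is JSON-encoded first, then URL-encoded.
function buildViewQuery(params) {
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
}

// Post-process rows keyed by [time, name] into one total per name.
function sumByName(rows) {
  const totals = {};
  for (const { key: [, name], value } of rows) {
    totals[name] = (totals[name] || 0) + value;
  }
  return totals;
}

console.log(buildViewQuery({ startkey: [100], endkey: [200], group_level: 2 }));

// Hypothetical rows, as a time-range query with group_level=2 might return them.
const rows = [
  { key: [100, "alice"], value: 5 },
  { key: [120, "bob"], value: 2 },
  { key: [150, "alice"], value: 7 },
];
console.log(sumByName(rows));
```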
You cannot build a view on top of a view. You need to write another map-reduce view that does the filtering in the map step and the grouping at the end. Something like:
map:
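function(doc) {
  // start and end are placeholders for the range limits;
  // they have to be written into the view source as literal values
  if (doc.Timestamp > start && doc.Timestamp < end) {
    emit(doc.Name, doc.Value);
  }
}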
reduce:
function(key, values, rereduce) {
return sum(values);
}
I suppose you cannot store this view, because the range is hard-coded in the map function, and have to send it as an ad-hoc query from your application.