Couchdb how to map reduce - node.js

This is my sample JSON structure stored in couchdb:
[{"_id":"567567983d6229ccf572c1a2fcad2fbd6","_rev":"1-8666754b35b18c92f005bb64d9c04712a5f","startTime":1467985647,"uuid":"216743afa424dfsf","from":"IN","to":"NG","duration":"121"},{"_id":"4774f983d6229ccf572c1a2fcad2fbd6","_rev":"1-8e9fb35b18c92f005bb64d9c04712a5f","startTime":1467983347,"uuid":"2134jl13k4j343l243","from":"US","to":"DE","duration":"210"}]
Using a reduce function, can we produce output like this:
{
outgoing : {US:1, IN:1}, inbound: {NG:1, DE:1}, duration:331
}

I would not use a reduce function for this. The view documentation says:
If you don’t reduce your values to a single scalar value or a small fixed-sized object or array with a fixed number of scalar values of small sizes, you are probably doing it wrong.
Instead, you could use a list function, which allows you to transform the rows of a given view result in any way you like.
I found this guide helpful: Rendering Content Based-On Multiple Documents with List Functions
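As a rough sketch of the aggregation such a list function would perform, the plain JavaScript below builds the requested shape from view rows. It assumes a view whose rows carry from, to and duration in row.value; the names summarize, listSummary and sampleRows are made up for illustration, and the wrapper only shows where CouchDB's getRow() and send() would fit.

```javascript
// Sketch: aggregate view rows into the shape asked for in the question.
// Assumption: each row.value holds the document's from, to and duration fields.
function summarize(rows) {
  var result = { outgoing: {}, inbound: {}, duration: 0 };
  rows.forEach(function (row) {
    var v = row.value;
    result.outgoing[v.from] = (result.outgoing[v.from] || 0) + 1;
    result.inbound[v.to] = (result.inbound[v.to] || 0) + 1;
    // duration is stored as a string in the sample documents
    result.duration += Number(v.duration);
  });
  return result;
}

// Hypothetical list-function wrapper (defined here but not called).
// Inside CouchDB, getRow() and send() are provided by the runtime, and
// summarize would have to be inlined into the design document.
function listSummary(head, req) {
  var rows = [], row;
  while ((row = getRow())) { rows.push(row); }
  send(JSON.stringify(summarize(rows)));
}

// Rows shaped like a view response for the two sample documents
var sampleRows = [
  { id: "doc1", key: "doc1", value: { from: "IN", to: "NG", duration: "121" } },
  { id: "doc2", key: "doc2", value: { from: "US", to: "DE", duration: "210" } }
];
var summary = summarize(sampleRows);
console.log(JSON.stringify(summary));
// {"outgoing":{"IN":1,"US":1},"inbound":{"NG":1,"DE":1},"duration":331}
```

The same summarize function could also run on the client against the raw view response if a list function is not wanted.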

Related

How to get value from IMap (Hazelcast) given the list of keys?

Problem we are trying to solve:
Given a list of keys, what is the best way to get the values from an IMap when the number of entries is around 500K?
Also we need to filter the values based on fields.
Here is the example map we are trying to read from.
Given IMap[String, Object]
We are using protobuf to serialize the object
Object can be say
Message test{ Required mac_address eth_mac = 1, ….// size can be around 300 bytes }
You can use IMap.getAll(keySet) if you know the keys beforehand. It's much better than single gets, since a bulk operation needs far fewer network trips.
For filtering, you can use predicates with IMap.values(predicate), IMap.entrySet(predicate) or IMap.keySet(predicate), depending on what you want to get back.
See more: http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#distributed-query

Storing a list of mixed types in Cassandra

In Cassandra, when specifying a table and fields, one has to give each field a type (text, int, boolean, etc.). The same applies to collections: you have to lock a collection to a specific type (set<text> and such).
I need to store a list of mixed types in Cassandra. The list may contain numbers, strings and booleans. So I would need something like list<?>.
Is this possible in Cassandra, and if not, what workaround would you suggest for storing a list of mixed-type items? I sketched a few, but none of them seems the right way to go...
Cassandra's CQL interface is strictly typed, so you will not be able to create a table with an untyped collection column.
I basically see two options:
Create a list field, and convert everything to text (not too nice, I agree)
Use the Thrift API and store everything as is.
As suggested at http://www.mail-archive.com/user#cassandra.apache.org/msg37103.html, I decided to encode the various values as binary and store them in a list<blob>. The collection values can still be queried (in Cassandra 2.1+); one just needs to encode the values in the query.
In Python, the simplest way is probably to pickle and hex-encode when storing data (Python 3 syntax shown; in Python 2 this was written as pickle.dumps('Hello world').encode('hex')):
pickle.dumps('Hello world').hex()
And to load it:
pickle.loads(bytes.fromhex(item))
Using pickle ties the implementation to Python, but it automatically converts to the correct type (int, string, boolean, etc.) when loading, so it's convenient.
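If tying the data to Python is a concern, a language-neutral variant of the same idea is to encode each value as JSON instead of pickle before hex-encoding it. This is a swapped-in technique, not what the answer above used; it is a minimal sketch assuming Node.js and its built-in Buffer, and encodeValue and decodeValue are made-up names.

```javascript
// Encode any JSON-representable value (number, string, boolean) as a hex string.
// Assumption: JSON replaces pickle here so that other languages can read the data.
function encodeValue(value) {
  return Buffer.from(JSON.stringify(value), "utf8").toString("hex");
}

// Decode a hex string back to the original value, with its type restored
function decodeValue(hex) {
  return JSON.parse(Buffer.from(hex, "hex").toString("utf8"));
}

var mixed = [42, "Hello world", true];
var encoded = mixed.map(function (v) { return encodeValue(v); });
var decoded = encoded.map(function (h) { return decodeValue(h); });
console.log(decoded);
```

As with pickle, the types survive the round trip: the number comes back as a number, the string as a string and the boolean as a boolean.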

Mapping arbitrary objects to indices

Let's assume that we have some objects (strings, for example). It is well known that working with indices (i.e. with numbers 1,2,3...) is much more convenient than with arbitrary objects.
Is there any common way of assigning an index to each object? One can create a hash_map and store an index in the value, but that becomes memory-expensive when there are too many objects to fit in memory.
Thanks.
You can store the string objects in a sorted file.
This way, you are not storing the objects in memory.
Your mapping function can search for the required object in the sorted file.
You can create a hash map to optimize the search.
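The lookup described above can be sketched as a binary search in which the position of a string in the sorted order is its index. In this sketch an in-memory sorted array stands in for the sorted file (an assumption made to keep the example runnable), indices are 0-based rather than starting at 1, and indexOfObject is a made-up name.

```javascript
// Binary search over sorted strings; the position found is the object's index.
// Assumption: the array stands in for the sorted file from the answer above.
function indexOfObject(sorted, target) {
  var lo = 0;
  var hi = sorted.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >>> 1;
    if (sorted[mid] === target) { return mid; }
    if (sorted[mid] < target) {
      lo = mid + 1; // target sorts after the middle element
    } else {
      hi = mid - 1; // target sorts before the middle element
    }
  }
  return -1; // not present
}

var objects = ["pear", "apple", "fig", "banana"].sort();
console.log(indexOfObject(objects, "fig"));  // 2
console.log(indexOfObject(objects, "kiwi")); // -1
```

Against a real file, the same search would seek to record offsets instead of array positions, which is easiest when the records have a fixed width or a separate offset table exists.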

Grouping using Map and Reduce

I have some documents with a "status" field of "Green", "Red", "Amber".
I'm sure it's possible to use MapReduce to produce a grouped response containing three keys (one for each status), each with a value containing an array of all the documents with that key. However, I'm struggling with how to use (re)reduce functions.
Map function:
function(doc) {
emit(doc.status, doc);
}
Reduce function: ???
This is not a problem that reduce is intended to solve; reduce in CouchDB is for aggregation.
If I understand you correctly, you want this:
Map:
function(doc) {
  // status is a single string such as "Green", so emit it directly as the key
  if (doc.status) {
    emit(doc.status, null);
  }
}
You can then find all docs of status Green with:
/_design/foo/_view/bar?key="Green"&include_docs=true
This will return a list of all docs with that status. If you wish to find docs of more than one status in a single query, use an HTTP POST with a body of this form:
{"keys":["Green", "Red"]}
HTH,
B.
Generally speaking, you will not use a reduce function to obtain your list of documents. A reduce is meant to take a list, and reduce it to a single value. In fact, there is an upper limit to the size of a reduce value anyways, and using entire documents will trigger a reduce_overflow error. Examples of reduces are counts, sums, averages, etc. Stick with the map query, and you will have your values collated and sorted by the status value.
On another, possibly unrelated note, I would not emit the document from your view. You can just use the include_docs view query parameter and achieve the same effect while saving disk space. The trade-off is that internally each doc has to be retrieved one by one (but since they're already indexed by _id anyway, it's usually a negligible difference).
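If the grouped shape from the question is still wanted, it can be built on the client from the map-only view result. The sketch below assumes the view was queried with include_docs=true, so each row has a key and a doc; the sample response and the name groupByKey are made up for illustration.

```javascript
// Group the rows of a view response by key, collecting the attached documents.
// Assumption: the view was queried with include_docs=true, so row.doc is set.
function groupByKey(rows) {
  var groups = {};
  rows.forEach(function (row) {
    if (!groups[row.key]) { groups[row.key] = []; }
    groups[row.key].push(row.doc);
  });
  return groups;
}

// Hypothetical response body; CouchDB returns the rows sorted by key
var response = {
  rows: [
    { id: "a", key: "Amber", value: null, doc: { _id: "a", status: "Amber" } },
    { id: "b", key: "Green", value: null, doc: { _id: "b", status: "Green" } },
    { id: "c", key: "Green", value: null, doc: { _id: "c", status: "Green" } },
    { id: "d", key: "Red", value: null, doc: { _id: "d", status: "Red" } }
  ]
};

var grouped = groupByKey(response.rows);
console.log(Object.keys(grouped));  // [ 'Amber', 'Green', 'Red' ]
console.log(grouped.Green.length);  // 2
```

Because the rows already arrive collated by status, the grouping is a single pass and needs no reduce step on the server.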

Need Explanation of couchdb reduce function

From http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
The couchdb reduce function is defined as
function (key, values, rereduce) {
return sum(values);
}
key will be an array whose elements are arrays of the form [key,id]
values will be an array of the values emitted for the respective elements in keys
i.e. reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false)
I am having trouble understanding when/why the array of keys would contain different key values. If the array of keys does contain different key values, how would I deal with it?
As an example, assume that my database contains movements between accounts of the form:
{"amount":100, "CreditAccount":"account_number", "DebitAccount":"account_number"}
I want a view that gives the balance of an account.
My map function does:
emit( doc.CreditAccount, doc.amount )
emit( doc.DebitAccount, -doc.amount )
My reduce function does:
return sum(values);
I seem to get the expected results, however I can't reconcile this with the possibility that my reduce function gets different key values.
Is my reduce function supposed to group key values first? What kind of result would I return in that case?
By default, Futon "groups" your results, which means you get a fresh reduce per key—in your case, an account. The group feature is for exactly this situation.
Over the raw HTTP API, you will get one total reduce for all accounts, which is probably not useful. So remember to use group=true in your own application to be sure you get summaries per account.
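To make the difference concrete, here is a small simulation in plain JavaScript of the account example, with and without grouping. It is only a sketch: it does not reproduce CouchDB's B-tree or rereduce passes, the two sample documents are invented, and sum is a stand-in for CouchDB's built-in helper.

```javascript
// Two invented movement documents in the form given in the question
var docs = [
  { _id: "m1", amount: 100, CreditAccount: "A", DebitAccount: "B" },
  { _id: "m2", amount: 40, CreditAccount: "B", DebitAccount: "A" }
];

// Stand-in for CouchDB's built-in sum() helper
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The reduce function from the question
function reduce(keys, values, rereduce) {
  return sum(values);
}

// Collect the rows the map function would emit
var rows = [];
docs.forEach(function (doc) {
  rows.push({ key: doc.CreditAccount, id: doc._id, value: doc.amount });
  rows.push({ key: doc.DebitAccount, id: doc._id, value: -doc.amount });
});

// Call reduce with keys as [key, id] pairs and the matching values
function runReduce(subset) {
  var keys = subset.map(function (r) { return [r.key, r.id]; });
  var values = subset.map(function (r) { return r.value; });
  return reduce(keys, values, false);
}

// No grouping: one call sees the rows of every account, so the keys differ
var total = runReduce(rows);

// group=true: one result per distinct key, each computed from that key's rows
var balances = {};
rows.forEach(function (r) {
  if (!(r.key in balances)) {
    balances[r.key] = runReduce(rows.filter(function (x) { return x.key === r.key; }));
  }
});

console.log(total);    // 0
console.log(balances); // { A: 60, B: -60 }
```

The reduce function itself never needs to group anything; the mixed keys only show up when the query does not ask for grouping, and in that case the single total is what comes back.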

Resources