Running couchdb view with group=false (SELECT DISTINCT type behavior) - couchdb

I was following this guide on couchdb http://guide.couchdb.org/draft/cookbook.html#unique in order to return a distinct list from a view.
My map function looks like:
function(doc) {
if(doc.PartnerName !=null) {
emit(doc.PartnerName, null);
}
}
And, I have a reduce function:
function(keys, values) {
return true;
}
When I run this by hitting:
/dbName/_design/Partners/_view/my-view-name
I get this back:
{"rows":[
{"key":null,"value":true}
]}
If I add ?reduce=false to the end, I get back sort of the desired result:
{
"total_rows":11,"offset":0,
"rows":[
{"id":"a","key":"PARTNER_ONE","value":null},
{"id":"b","key":"PARTNER_ONE","value":null},
{"id":"c","key":"PARTNER_ONE","value":null},
{"id":"d","key":"PARTNER_ONE","value":null},
{"id":"e","key":"PARTNER_ONE","value":null},
{"id":"f","key":"PARTNER_ONE","value":null},
{"id":"g","key":"PARTNER_TWO","value":null},
{"id":"h","key":"PARTNER_TWO","value":null},
{"id":"i","key":"PARTNER_TWO","value":null},
{"id":"j","key":"PARTNER_THREE","value":null},
{"id":"k","key":"PARTNER_FOUR","value":null}
]}
However, I'm ideally trying to get a distinct list, so in the above example, it'd be PARTNER_ONE, PARTNER_TWO, PARTNER_THREE, PARTNER_FOUR

I think you are missing the group=true parameter. Try to query
/dbName/_design/Partners/_view/my-view-name?group=true
and see if that gives you the correct result.

Related

Using find{ } on a map where the whole map is evaluated not each element

I created some mixin methods. Code and example below:
URL.metaClass.withCreds = { u, p ->
delegate.openConnection().tap {
setRequestProperty('Authorization', "Basic ${(u + ':' + p).bytes.encodeBase64()}")
}
}
URLConnection.metaClass.fetchJson = {
delegate.setRequestProperty('Accept', 'application/json')
delegate.connect()
def code = delegate.responseCode
def result = new JsonSlurper().parse(code >= 400 ? delegate.errorStream : delegate.inputStream as InputStream)
[
ok : code in (200..299),
body: result,
code: code
]
}
example usage:
new URL("$baseUrl/projects/$name").withCreds(u, p).fetchJson().find {
it.ok
}?.tap{
it.repos = getRepos(it.key).collectEntries { [(it.slug): it] }
}
}
When I dont use find(), my object is, as expected, a map with those 3 elements. When I use find it is a Map.Entry with key ok and value true
which produces this error:
groovy.lang.MissingPropertyException: No such property: ok for class: java.util.LinkedHashMap$Entry
Possible solutions: key
It occured to me when I wrote this post that it was treated the map as an iterable and thus looking at every entry which I have subsequently verified. How do I find on the whole map? I want it.ok because if it's true, I need to carry it forward
There is no such method in Groovy SDK. Map.find() runs over an entry set of the map you call method on. Based on expectation you have defined I'm guessing you are looking for a function that tests map with a given predicate and returns the map if it matches the predicate. You may add a function that does to through Map.metaClass (since you already add methods to URL and URLConnection classes). Consider following example:
Map.metaClass.continueIf = { Closure<Boolean> predicate ->
predicate(delegate) ? delegate : null
}
def map = [
ok : true,
body: '{"message": "ok"}',
code: 200
]
map.continueIf { it.ok }?.tap {
it.repos = "something"
}
println map
In this example we introduced a new method Map.continueIf(predicate) that tests if map matches given predicate and returns a null otherwise. Running above example produces following output:
[ok:true, body:{"message": "ok"}, code:200, repos:something]
If predicate is not met, map does not get modified.
Alternatively, for more strict design, you could make fetchJson() method returning an object with corresponding onSuccess() and onError() methods so you can express more clearly that you add repos when you get a successful response and optionally you create an error response otherwise.
I hope it helps.

how to check if any of given queries return any result in knex

I have two queries:
a) select id from ingredietns where name = my_param;
b) select word_id from synonyms where name = my_param;
Both return 0 or 1 row. I can also add limit 1 if needed (or in knex first()).
I can translate each into knex like this:
knex("ingredients").select('id').where('name', my_param) //do we need first()?
knex("synonyms").select('word_id').where('name', my_param) //do we need first()?
I need function called "ingredientGetOrCreate(my_param)". This function would
a) check if any of above queries return result
b) if any of these return, then return ingredients.id or synonyms.word_id - only one could be returned
c) if record doesn't eixst in any of tables, I need to do knex inesrt aand return newly added id from function
d) later I am not sure I also understand how to call this newly create function.
Function ingredientGetOrCreate would be used later as seperate function or in the following scenario (like "loop") that doesn't work for me either:
knex("products") // for each product
.select("id", "name")
.map(function (row) {
var descriptionSplitByCommas = row.desc.split(",");
Promise.all(descriptionSplitByCommas
.map(function (my_param) {
// here it comes - call method for each param and do insert
ingredientGetOrCreate(my_param)
.then(function (id_of_ingredient) {
knex('ingredients_products').insert({ id_of_ingredient });
});
...
I am stuck with knex and Promise queries because of asynchronouse part. Any clues, please?
I though I can somehow use Promise.all or Promise.some to call both queries.
P.S. This is my first day with nodejs, Promise and knex.
As far as I decode your question, it consists of two parts:
(1) You need to implement upsert logic (get-or-create logic).
(2) Your get part requires to query not a single table, but a pair of tables in specific order. Table names imply that this is some sort of aliasing engine inside of your application.
Let's start with (2). This could definitely be solved with two queries, just like you sense it.
function pick_name (rows)
{
if (! rows.length) return null
return rows[0].name
}
// you can sequence queries
function ingredient_get (name)
{
return knex('ingredients')
.select('id').where('name', name)
.then(pick_name)
.then(name =>
{
if (name) return name
return knex('synonyms')
.select('word_id').where('name', name)
.then(pick_name)
})
}
// or run em parallel
function ingredient_get (name)
{
var q_ingredients = knex('ingredients')
.select('id').where('name', name)
.then(pick_name)
var q_synonyms = knex('synonyms')
.select('word_id').where('name', name)
.then(pick_name)
return Promise.all([ q_ingredients, q_synonyms ])
.then(([name1, name2]) =>
{
return name1 || name2
})
}
Important notions here:
Both forms works well and return first occurence or JS' null.
First form optimizes count of queries to DB.
Second form optimizes answer time.
However, you can go deeper and use more SQL. There's a special tool for such task called COALESCE. You can consult your SQL documentation, here's COLASCE of PostgreSQL 9. The main idea of COALESCE is to return first non-NULL argument or NULL otherwise. So, you can leverage this to optimize both queries and answer time.
function ingredient_get (name)
{
// preparing but not executing both queries
var q_ingredients = knex('ingredients')
.select('id').where('name', name)
var q_synonyms = knex('synonyms')
.select('word_id').where('name', name)
// put them in COALESCE
return knex.raw('SELECT COALESCE(?, ?) AS name', [ q_ingredients, q_synonyms ])
.then(pick_name)
This solution guarantees single query and furthermore DB engine can optimize execution in any way it sees appropriate.
Now let's solve (1): We now got ingredient_get(name) which returns Promise<string | null>. We can use its output to activate create logic or return our value.
function ingredient_get_or_create (name, data)
{
return ingredient_get(name)
.then(name =>
{
if (name) return name
// …do your insert logic here
return knex('ingredients').insert({ name, ...data })
// guarantee homohenic output across get/create calls:
.then(() => name)
})
}
Now ingredient_get_or_create do your desired upsert logic.
UPD1:
We already got ingredient_get_or_create which returns Promise<name> in any scenario (both get or create).
a) If you need to do any specific logic after that you can just use then:
ingredient_get_or_create(…)
.then(() => knex('another_table').insert(…))
.then(/* another logic after all */)
In promise language that means «do that action (then) if previous was OK (ingredient_get_or_create)». In most of the cases that is what you need.
b) To implement for-loop in promises you got multiple different idioms:
// use some form of parallelism
var qs = [ 'name1', 'name2', 'name3' ]
.map(name =>
{
return ingredient_get_or_create(name, data)
})
var q = Promise.all(qs)
Please, note, that this is an agressive parallelism and you'll get maximum of parallel queries as your input array provides.
If it's not desired, you need to limit parallelism or even run tasks sequentially. Bluebird's Promise.map is a way to run map which analogous to example above but with concurrency option available. Consider the docs for details.
There's also Bluebird's Promise.mapSeries which conceptually is an analogue to for-loop but with promises. It's like map which runs sequentially. Look the docs for details.
Promise.mapSeries([ 'name1', 'name2', 'name3' ],
(name) => ingredient_get_or_create(name, data))
.then(/* logic after all mapSeries are OK */)
I believe the last is what you need.

Map/Reduce differences between Couchbase & CloudAnt

I've been playing around with Couchbase Server and now just tried replicating my local db to Cloudant, but am getting conflicting results for my map/reduce function pair to build a set of unique tags with their associated projects...
// map.js
function(doc) {
if (doc.tags) {
for(var t in doc.tags) {
emit(doc.tags[t], doc._id);
}
}
}
// reduce.js
function(key,values,rereduce) {
if (!rereduce) {
var res=[];
for(var v in values) {
res.push(values[v]);
}
return res;
} else {
return values.length;
}
}
In Cloudbase server this returns JSON like:
{"rows":[
{"key":"3d","value":["project1","project3","project8","project10"]},
{"key":"agents","value":["project2"]},
{"key":"fabrication","value":["project3","project5"]}
]}
That's exactly what I wanted & expected. However, the same query on the Cloudant replica, returns this:
{"rows":[
{"key":"3d","value":4},
{"key":"agents","value":1},
{"key":"fabrication","value":2}
]}
So it somehow only returns the length of the value array... Highly confusing & am grateful for any insights by some M&R ninjas... ;)
It looks like this is exactly the behavior you would expect given your reduce function. The key part is this:
else {
return values.length;
}
In Cloudant, rereduce is always called (since the reduce needs to span over multiple shards.) In this case, rereduce calls values.length, which will only return the length of the array.
I prefer to reduce/re-reduce implicitly rather than depending on the rereduce parameter.
function(doc) { // map
if (doc.tags) {
for(var t in doc.tags) {
emit(doc.tags[t], {id:doc._id, tag:doc.tags[t]});
}
}
}
Then reduce checks whether it is accumulating document ids from the identical tag, or whether it is just counting different tags.
function(keys, vals, rereduce) {
var initial_tag = vals[0].tag;
return vals.reduce(function(state, val) {
if(initial_tag && val.tag === initial_tag) {
// Accumulate ids which produced this tag.
var ids = state.ids;
if(!ids)
ids = [ state.id ]; // Build initial list from the state's id.
return { tag: val.tag,
, ids: ids.concat([val.id])
};
} else {
var state_count = state.ids ? state.ids.length : state;
var val_count = val.ids ? val.ids.length : val;
return state_count + val_count;
}
})
}
(I didn't test this code, but you get the idea. As long as the tag value is the same, it doesn't matter whether it's a reduce or rereduce. Once different tags start reducing together, it detects that because the tag value will change. So at that point just start accumulating.
I have used this trick before, although IMO it's rarely worth it.
Also in your specific case, this is a dangerous reduce function. You are building a wide list to see all the docs that have a tag. CouchDB likes tall lists, not fat lists. If you want to see all the docs that have a tag, you could map them.
for(var a = 0; a < doc.tags.length; a++) {
emit(doc.tags[a], doc._id);
}
Now you can query /db/_design/app/_view/docs_by_tag?key="3d" and you should get
{"total_rows":287,"offset":30,"rows":[
{"id":"project1","key":"3d","value":"project1"}
{"id":"project3","key":"3d","value":"project3"}
{"id":"project8","key":"3d","value":"project8"}
{"id":"project10","key":"3d","value":"project10"}
]}

Getting a list of documents with a maximum field value from CouchDB view

Let's say I have blog entries like these in my CouchDB database:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Mary", "postdate":"20110411", "subject":"and this", "message":"blah-blah"}
{"name":"Joe", "postdate":"20110410", "subject":"And other thing", "message":"yada-yada"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}
It's pretty easy to get a list of all posts. But how do I get a list of latest posts from all the users?
Like that:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}
Try with this map function:
function(doc) {
if (doc.postdate && doc.name) {
emit([doc.name, doc.postdate], 1);
}
}
and the following reduce function:
function(keys, values, rereduce) {
var max = 0,
ks = rereduce ? values : keys;
for (var i = 1, l = ks.length; i < l; ++i) {
if (ks[max][0][1] < ks[i][0][1]) max = i;
}
return ks[max];
}
and query it with group_level=1. It gives you the _id of the posts, then you can retrieve them all with a single query with the keys parameter or using a POST.
I am not sure if this is the best approach, but it seems to work.
UPDATE: fixed map to handle rereduce correctly.
You're going to emit the postdate as the key because keys are sorted. For example, this is what your map function will look like...
function(doc) {
if(doc.postdate) {
emit(doc.postdate, doc);
}
}
That will give you all the docs sorted ascending by postdate. If you want descending then query with ?descending=true
Cheers.

Select top/latest 10 in couchdb?

How would I execute a query equivalent to "select top 10" in couch db?
For example I have a "schema" like so:
title body modified
and I want to select the last 10 modified documents.
As an added bonus if anyone can come up with a way to do the same only per category. So for:
title category body modified
return a list of latest 10 documents in each category.
I am just wondering if such a query is possible in couchdb.
To get the first 10 documents from your db you can use the limit query option.
E.g. calling
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?limit=10
You get the first 10 documents.
View rows are sorted by the key; adding descending=true in the querystring will reverse their order. You can also emit only the documents you are interested using again the querystring to select the keys you are interested.
So in your view you write your map function like:
function(doc) {
emit([doc.category, doc.modified], doc);
}
And you query it like this:
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?startkey=["youcategory"]&endkey=["youcategory", date_in_the_future]&limit=10&descending=true
here is what you need to do.
Map function
function(doc)
{
if (doc.category)
{
emit(['category', doc.category], doc.modified);
}
}
then you need a list function that groups them, you might be temped to abuse a reduce and do this, but it will probably throw errors because of not reducing fast enough with large sets of data.
function(head, req)
{
% this sort function assumes that modifed is a number
% and it sorts in descending order
function sortCategory(a,b) { b.value - a.value; }
var categories = {};
var category;
var id;
var row;
while (row = getRow())
{
if (!categories[row.key[0]])
{
categories[row.key[0]] = [];
}
categories[row.key[0]].push(row);
}
for (var cat in categories)
{
categories[cat].sort(sortCategory);
categories[cat] = categories[cat].slice(0,10);
}
send(toJSON(categories));
}
you can get all categories top 10 now with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories
and get the docs with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories?include_docs=true
now you can query this with a multiple range POST and limit which categories
curl -X POST http://localhost:5984/database/_design/doc/_list/top_ten/by_categories -d '{"keys":[["category1"],["category2",["category3"]]}'
you could also not hard code the 10 and pass the number in through the req variable.
Here is some more View/List trickery.
slight correction. it was not sorting untill I added the "return" keyword in your sortCategory function. It should be like this:
function sortCategory(a,b) { return b.value - a.value; }

Resources