How to filter by document attributes in couchdb? - couchdb

Could someone explain to me how I can filter documents with multiple attributes by using arrays and keys?
For example, I have a document with the attributes a, b, c and d. I would like to filter by a user-selected value of attribute "a". Later I would like to narrow the results with a value of attribute "c" or maybe a value of attribute "d".
Does anyone have suggestions on how to accomplish this elegantly?

Assuming your doc looks like:
{ 'a': 123, 'b': 456, 'c': 789, ... }
You can create a view like this:
function(doc){
emit([doc.a, doc.b, doc.c], doc)
}
You can then use the startkey and endkey parameters to access the views, whilst restricting results to a specific subset:
...&startkey=[123]&endkey=[123,{}] // Shows all results with doc.a=123
...&startkey=[123]&endkey=[123,456,{}] // Shows all results with doc.a=123 and doc.b<=456
However, all rows are sorted into a single list, and a query can only ever return one contiguous section of that list. So if you want documents where 123 <= doc.a <= 456 and doc.b is also between 123 and 456, you'll have to create two separate views, one keyed on doc.a and one keyed on doc.b, and then have your client app identify the documents returned by both views.
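The client-side step described above could be sketched as follows. This is only a minimal sketch, assuming each view response has already been parsed into the usual rows array in which every row carries the document id. The function name intersectById and the sample rows are hypothetical.

```javascript
// Keep only the rows whose document id appears in both view results.
// rowsA and rowsB are assumed to be the parsed `rows` arrays of two view responses.
function intersectById(rowsA, rowsB) {
  var idsInB = {};
  rowsB.forEach(function (row) {
    idsInB[row.id] = true;
  });
  return rowsA.filter(function (row) {
    return idsInB[row.id] === true;
  });
}

// Hypothetical sample data: one result set per view.
var byA = [{ id: "doc1", key: 123 }, { id: "doc2", key: 200 }, { id: "doc3", key: 456 }];
var byB = [{ id: "doc2", key: 130 }, { id: "doc3", key: 400 }, { id: "doc4", key: 450 }];

var matches = intersectById(byA, byB).map(function (row) { return row.id; });
console.log(matches); // ids present in both result sets
```

Building a lookup table from one result set keeps the comparison linear in the number of rows, which matters if either view returns a large range.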

Related

Find a document in a nested query mongodb

I have two collections, A and B, where B has a field that holds the id of a document in A, as shown below. The code below returns nothing.
So I am trying to figure out, in the mongodb shell, how I can use the cursor returned from A to get the ID values and pass them as the value of the AID key.
db.B.find({
'AID': db.A.find({'item': 'x'},
{'id': 1})
});

How to get count of documents that would be added if one selects another aggregation options of an array-field in elastic search

Let's say we have four documents with a tags field. It can contain multiple strings, let's say foo, bar and baz.
docA.tags = ['foo']
docB.tags = ['bar']
docC.tags = ['foo', 'bar']
docD.tags = ['foo', 'baz']
I query the docs using aggregations so I get the four documents and a list of three buckets with the count that matches the specific tag.
buckets = [
{key: 'bar', doc_count: 2}, // docB, docC
{key: 'foo', doc_count: 3}, // docA, docC, docD
{key: 'baz', doc_count: 1} // docD
]
If I now run another query and add one of those tags (let's say foo) as a terms filter to the query, I only get the docs (docA, docC, docD) that have this tag. That's what I want.
But I also get another list of possible aggregations with updated counts.
buckets = [
{key: 'bar', doc_count: 1}, // docC
{key: 'baz', doc_count: 1}, // docD
]
But these counts don't really match what's happening. They reflect the count of documents that match both of the tags, the one I selected in the first place (foo) AND the one of the bucket (bar or baz).
But if I then select a second tag – let's say baz – I get documents that have been tagged with foo OR baz. That's because I use the terms filter.
So what I really want is this
buckets = [
{key: 'bar', doc_count: 1}, //docB
{key: 'baz', doc_count: 0},
]
How can I get counts that reflect the number of documents that would be added if I select the second tag? An example of this is here.
I already tried using post_filter, but that always gives me the first result. Then I tried adding a min_doc_count flag to the aggs, but this only shows me the combinations that would result in count=0.
I have a solution for this, but it seems pretty complicated to me. I would have to run another request for each aggregation, with the filter criteria inverted. So in the example above I would have to query all docs that don't have the tag foo and match the rest of the query. The aggregation results would be exactly what I need.
It sounds like you're trying to do something a little atypical for facets/aggregations.
(However, it's not invalid... it makes a lot of sense to understand how the size of your selection will change through the application of a filter)
What I think you're asking for is:
Display results for: QUERY AND FILTER
Display term aggregation counts for: QUERY NOT FILTER
You mentioned you're doing subsequent request(s) for counts? You should be able to construct this aggregation request inside your main search request.
Structurally it's:
match: (QUERY) or match_all
aggregations:
filter: { not: (FILTER) }
aggregations: { terms: ... }
post_filter: (FILTER)
That post_filter is executed after the aggregations are calculated (but still applied to the search results) so your results will be what you expect.
The aggregations are working in the scope of the search query alone. (The postfilter has not been applied yet.)
The filter aggregation excludes all documents matching FILTER from the search query results before the Terms Aggregation calculates the counts.
(giving you the left outside edge of the Venn shown above, but just for the counts)
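The structure outlined above could be written out as a request body along these lines. This is a sketch under assumptions, not a tested query: the answer spells the negation as a not filter, while this version uses a bool query with must_not, and which form is accepted depends on the Elasticsearch version. The function name buildSearchBody and the aggregation names not_selected and tags are made up for illustration.

```javascript
// Build a search body: hits are narrowed by `filter` (via post_filter), while
// the term counts are computed over documents that match the query but NOT the filter.
function buildSearchBody(query, filter, tagField) {
  return {
    query: query,
    aggs: {
      not_selected: {
        // Filter aggregation: drop everything that matches FILTER.
        filter: { bool: { must_not: [filter] } },
        aggs: {
          // Terms aggregation over what is left.
          tags: { terms: { field: tagField } }
        }
      }
    },
    // Applied to the hits only, after the aggregations have been calculated.
    post_filter: filter
  };
}

var body = buildSearchBody(
  { match_all: {} },
  { terms: { tags: ["foo"] } },
  "tags"
);
console.log(JSON.stringify(body, null, 2));
```

With the four example documents from the question and foo selected, only docB fails the filter, so the nested terms aggregation would see docB alone.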

couchDB sorting complex key

I have a couchDB database which has several different document "types" which all relate to a main "type".
In the common blog / post example, the main type is the blog post, and the others are comments (though there are 3 different types of comments).
All of the types have a date on them, however, I wish to sort blog posts by date, but return all of the data from the comments as well. I can write an emit which produces keys like so:
[date, postID, docTypeNumber]
where docTypeNumber is 1 for post and > 1 for the different comment document types.
e.g:
["2013-03-01", 101, 1]
[null, 101, 2]
[null, 101, 2]
[null, 101, 3]
["2013-03-02", 102, 1]
[null, 102, 2]
[null, 102, 3]
Of course, if I emit this, all the nulls get sorted together. Is there a way to ignore the nulls and group the rows by the second item in the array, but sort them by the first if it is not null?
Or, do I have to get all the documents to record the post date in order for sort to work?
I do not want to use lists, they are way too slow and I'm dealing with a potentially large data set.
You can do this by using conditionals in your map function.
if(date != null) {
emit([date, postID, docTypeNumber]);
}
else {
emit([postID, docTypeNumber]);
}
I don't know if you want your array length to be variable or not. If not, you could add the sort variable first. The following snippet could work since date and postID presumably never have the same values.
var sortValue;
if(date != null) {
sortValue = date;
}
else {
sortValue = postID;
}
// emit takes a single key, so the parts go into one array
emit([sortValue, date, postID, docTypeNumber]);
Update: I thought about this a little more. In general, I make my views based on queries I want to perform. So I ask myself, what do I need to query? It seems that in your case, you might have two distinct queries here. If so, I suggest having two different views. There is a performance penalty to pay since you would run two views instead of one, but I doubt it is perceivable to the user. And it might take up more disk space. The benefit for you would be clearer and more explicit code.
It seems you want to sort all the data (both the posts and the comments) by the post's date. Since in your design the comment document does not contain the post date (just the comment date), this is difficult with the view collation pattern. I suggest changing the database design so that the blog post ID is meaningful and contains the date, e.g. the date concatenated with the author ID. This way, if you emit [doc._id, doc.type] from the post and [doc.post, doc.type] from the comment document, you will have posts and comments grouped and sorted by date.
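A rough sketch of that design, with several assumptions: doc.type holds the numeric type code from the question (1 for a post, greater than 1 for comments), comments store the post ID in doc.post, and the ID format "date:author" is made up for illustration. In CouchDB emit is a global; it is passed in as a parameter here only so the sketch can run outside the database.

```javascript
// Map function in the shape suggested above: posts and their comments share
// the same first key element (the date-prefixed post ID).
function mapPostsAndComments(doc, emit) {
  if (doc.type === 1) {
    emit([doc._id, doc.type], null);
  } else if (doc.type > 1) {
    emit([doc.post, doc.type], null);
  }
}

// Hypothetical documents, deliberately out of order.
var docs = [
  { _id: "2013-03-02:alice", type: 1 },
  { _id: "c1", type: 2, post: "2013-03-01:bob" },
  { _id: "2013-03-01:bob", type: 1 },
  { _id: "c2", type: 3, post: "2013-03-02:alice" },
  { _id: "c3", type: 2, post: "2013-03-02:alice" }
];

var keys = [];
docs.forEach(function (doc) {
  mapPostsAndComments(doc, function (key, value) { keys.push(key); });
});

// Stand-in for view collation, sufficient for [string, number] keys only.
keys.sort(function (x, y) {
  if (x[0] !== y[0]) { return x[0] < y[0] ? -1 : 1; }
  return x[1] - y[1];
});
console.log(JSON.stringify(keys));
```

Each post row is followed by its comment rows, and the groups are ordered by the date at the start of the post ID.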

What is in the reduce function arguments in CouchDB?

I understand that the reduce function is supposed to somewhat combine the results of the map function but what exactly is passed to the reduce function?
function(keys, values){
// what's in keys?
// what's in values?
}
I tried to explore this in the Futon temporary view builder but all I got were reduce_overflow_errors. So I can't even print the keys or values arguments to try to understand what they look like.
Thanks for your help.
Edit:
My problem is the following. I'm using the temporary view builder of Futon.
I have a set of document representing text files (it's for a script I want to use to make translation of documents easier).
text_file:
id // the id of the text file is its path on the file system
I also have some documents that represent text fragments appearing in the said files, and their position in each file.
text_fragment:
id
file_id // correspond to a text_file document
position
I'd like to get for each text_file, a list of the text fragments that appear in the said file.
Update
Note on JavaScript API change: Prior to Tue, 20 May 2008 (Subversion revision r658405) the function to emit a row to the map index, was named "map". It has now been changed to "emit".
That's the reason why map is used there instead of emit: the function was renamed. Sorry, I have corrected my code to be valid in recent versions of CouchDB.
Edit
I think what you are looking for is a has-many relationship or a join in sql db language. Here is a blog article by Christopher Lenz that describes exactly what your options are for this kind of scenario in CouchDB.
In the last part there is a technique described that you can use for the list you want.
You need a map function of the following format
function(doc) {
if (doc.type == "text_file") {
emit([doc._id, 0], doc);
} else if (doc.type == "text_fragment") {
emit([doc.file_id, 1], doc);
}
}
Now you can query the view in the following way:
my_view?startkey=["text_file_id"]&endkey=["text_file_id", 2]
This gives you a list of the form
text_file
text_fragment_1
text_fragment_2
..
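On the client, the rows of such a result could be folded into one object per file. A minimal sketch, assuming the rows come from the parsed view response (each with key and value as emitted by the map function above); the name collectFile and the sample rows are hypothetical.

```javascript
// Fold the rows of one file's key range into { file, fragments }.
// Rows are assumed to arrive in view order: the [fileId, 0] row first,
// followed by the [fileId, 1] rows.
function collectFile(rows) {
  var result = { file: null, fragments: [] };
  rows.forEach(function (row) {
    if (row.key[1] === 0) {
      result.file = row.value;
    } else {
      result.fragments.push(row.value);
    }
  });
  return result;
}

// Hypothetical rows for a single text file.
var sampleRows = [
  { key: ["/docs/a.txt", 0], value: { _id: "/docs/a.txt", type: "text_file" } },
  { key: ["/docs/a.txt", 1], value: { _id: "f1", type: "text_fragment", file_id: "/docs/a.txt", position: 0 } },
  { key: ["/docs/a.txt", 1], value: { _id: "f2", type: "text_fragment", file_id: "/docs/a.txt", position: 42 } }
];

var collected = collectFile(sampleRows);
console.log(collected.file._id, collected.fragments.length);
```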
Old Answer
Directly from the CouchDB Wiki
function (key, values, rereduce) {
return sum(values);
}
Reduce functions are passed three arguments in the order key, values and rereduce
Reduce functions must handle two cases:
When rereduce is false:
key will be an array whose elements are arrays of the form [key,id], where key is a key emitted by the map function and id is that of the document from which the key was generated.
values will be an array of the values emitted for the respective elements in keys
i.e. reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false)
When rereduce is true:
key will be null
values will be an array of values returned by previous calls to the reduce function
i.e. reduce(null, [intermediate1,intermediate2,intermediate3], true)
Reduce functions should return a single value, suitable for both the value field of the final view and as a member of the values array passed to the reduce function.
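To make the two cases concrete, here is a small sketch of a reduce function that counts rows. The name countReduce is made up, and a plain loop is used instead of CouchDB's built-in sum so the sketch also runs outside the view server.

```javascript
// Count the rows emitted by the map function.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // keys is null; values holds counts returned by earlier reduce calls.
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // keys is [[key1, id1], [key2, id2], ...]; values holds the emitted values.
  return values.length;
}

// First pass over three emitted rows, then a rereduce over two partial counts.
var firstPass = countReduce([["a", "id1"], ["a", "id2"], ["b", "id3"]], [1, 1, 1], false);
var combined = countReduce(null, [firstPass, 2], true);
console.log(firstPass, combined);
```

Note that the return value is a single number in both cases, so it is valid both as a final result and as an input to a later rereduce call.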

How to restrict rows in a list inside a document?

I have a document something like:
{"name":"Stock levels",
"content":[
{"sku":"328143",
"name":"Battery",
"stocklevel":"100",
"warehouse":"london"},
{"sku":"328143",
"name":"Battery",
"stocklevel":"20",
"warehouse":"manchester"},
{"sku":"328143",
"name":"Battery",
"stocklevel":"30",
"warehouse":"brighton"}]}
Where the list "content" could have quite a lot of rows.
What I want to do is return an internal row count and just one row from the list.
e.g.
{"name":"Stock levels",
"rows" : "2300",
"content":[
{"sku":"328143",
"name":"Battery",
"stocklevel":"100",
"warehouse":"london"}]}
How might I achieve this in CouchDb? My initial thought is using a list to effectively rebuild the document and inserting the extra rows field and restricting the number of rows return internally, but I am not sure if this is the best approach.
Thanks
You can use a view.
The following example lets you look up the result by document id
(which is emitted as the key):
function(doc)
{
if (doc._id == "xxx")
{
emit(doc._id, {name:doc.name, rows:doc.content.length, content:doc.content[0]});
}
}
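A variant without the hard-coded id could look like the sketch below, so that the document is selected at query time with the key parameter instead. This is an assumption about what is wanted, not part of the answer above: it wraps the first row in an array to match the output shape asked for in the question, the name mapStockSummary is hypothetical, and emit is passed in as a parameter only so the sketch runs outside CouchDB (there it is a global).

```javascript
// Emit a summary for every document that has a non-empty content list.
function mapStockSummary(doc, emit) {
  if (doc.content && doc.content.length > 0) {
    emit(doc._id, {
      name: doc.name,
      rows: doc.content.length,     // internal row count
      content: [doc.content[0]]     // only the first row of the list
    });
  }
}

// Hypothetical document in the shape from the question.
var stockDoc = {
  _id: "stock-levels",
  name: "Stock levels",
  content: [
    { sku: "328143", name: "Battery", stocklevel: "100", warehouse: "london" },
    { sku: "328143", name: "Battery", stocklevel: "20", warehouse: "manchester" },
    { sku: "328143", name: "Battery", stocklevel: "30", warehouse: "brighton" }
  ]
};

var emitted = [];
mapStockSummary(stockDoc, function (key, value) {
  emitted.push({ key: key, value: value });
});
console.log(JSON.stringify(emitted[0].value));
```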

Resources