I have a Couchdb that stores documents each of each has a prefix field. Prefixes are unique so they can actually be used as IDs
Say:
_id=1 {prefix="AAABBBCCC", ...}
_id=2 {prefix="AAABBBDDD", ...}
_id=3 {prefix="AAABBE", ...}
_id=4 {prefix="AAAFF", ...}
I need to query these documents retrieving an appropriate document (always one full match on the prefix) using a key that is longer but completly matches the prefix. Prefix length varies, key length is constant.
query_key = AAABBBCCC123 => _id1
query_key = AAABBBDDD456 => _id2
query_key = AAABBEEEEEEE => _id3
query_key = AAABxxxxxxxx => Null
Any idea how can this be done in Couch?
Make a view emitting doc.prefix. Then query descending with startkey set to your query key with limit=1. The resulting prefix might be yours but you have to confirm.
You can either confirm the prefix in the client, or with a _list function. A _list function probably does not help with performance so I would consider doing it in the client, unless you have many clients in many languages, and you can standardize on a single URL to query with the same output.
The map function should look like this
function(doc) {
emit(doc.prefix, doc);
}
and you should search for documents with the substring function in the key.
Like this:
_design/doc/_view/viewname?key=QUERY_KEY.substring(0, FIXED_KEY_LENGTH)
Related
For example I have a thousands of documents with same structure, for example:
{
"key_1":"value_1",
"key_2":"value_2",
"key_3":"value_3",
...
...
}
And I need to get, let's say key_1, key_3 and key_23 from some set of documents with known IDs, for example, I need to process only 5 documents while my DB contains several thousands. Each time I have a different set of keys and document IDs. Is it possible to get that information for a one request?
You can use a list function (see: this, this, and this).
Since you know the ids, you can then query _all_docs with the list function:
POST /{db}/_design/{ddoc}/_list/{func}/_all_docs?include_docs=true&columns=["key_1","key_2","key_3"]
Accept: application/json
Content-Length: {whatever}
{
"keys": [
"docid002",
"docid005"
]
}
The list function needs to look at documents, and send the appropriate JSON for each one. Not tested:
(function (head, req) {
send('{"total_rows":' + head.total_rows + ',"offset":' + head.offset + ',"rows":[');
var columns = JSON.parse(req.query.columns);
var delim = '';
var row;
while (row = getRow()) {
var doc = {};
for (var k in columns) {
doc[k] = row.doc[k];
}
row.doc = doc;
send(delim + toJSON(row));
delim = ',';
}
send(']}');
})
Whether this is a good idea, I'm not sure. If your documents are big, and bandwidth savings important, it might.
Yes, that’s possible. Your question can be broken up into two distinct problems:
Getting only a part of the document (in your example: key_1, key_3 and key_23). This can be done using a view. A view is saved into a design document. See the wiki for more info on how to create views.
Retrieving only certain documents, which are defined by their ID. When querying views, you cannot only specify a single ID (or rather key), but also an array of keys, which is what you would need here. Again, see the section on querying views in the wiki for explanations and examples.
Even though you only need a subset of values from a document, you may find that the system as a whole performs better if you just ask for the entire document then select the values you need from that result.
To only get the specific key value pairs you need to create a view that has view entries with a multipart key consisting of the doc id and doc item name, with value of the corresponding doc item.
So your map function would look something like:
function(doc){
for(var i = 1; i < doc.keysInDoc; i++){
var k = "key_"+i;
emit([doc._id, k], doc.[k]);
}
}
You can then use multi key lookup with each key being of the form ["docid12345", "key_1"], ["docid56789", "key_23"], etc.
So a query like:
http://host:5984/db/_design/design/_view/view?&keys=[["docid002","key_8"],["docid005","key_7"]]
will return
{"total_rows":84,"offset":67,"rows":[
{"id":"docid002","key":["docid002","key_8"],"value":"value d2_k8"},
{"id":"docid005","key":["docid005","key_12"],"value":"value d5_k12"}
]}
suppose i have the following data in my database:
[1,2],[2,1],[1,3],[3,1]...
were the numbers represent the a and b values of the formula a*x+b
what i now want is a query that returns the difference to a given point x,y.
for example: the point [2,6] is given. i want my query to return
[1,2] = -2 (1*2+2=4 4-6=-2)
[2,1] = -1 (2*2+1=5 5-6=-1)
[1,3] = -1 (1*2+3=5 4-6=-1)
[3,1] = 1 (3*2+1=7 7-6=-1)
I know how to do this in SQL but the data is already in a couchdb. I'm quite new to the NoSQL world and was wondering if something like this would be possible in couchdb.
what you can do is to use the standard MapReduce functionality of CouchDB.
Map is function you put in a view, which finds your data. You can have various criteria how to locate the docs you need. Next, if you specify so in the query with reduce=true, a reduce function is executed on each document that matched the map condition. You can use JavaScript to perform various operations on the document's values.
In your case, the map can look something like this:
function(doc) {
if(doc.a && doc.b) {
emit(doc._id,[doc.a, doc.b]);
}
}
then, the reduce gets called, like this:
function(keys, values, rereduce) {
var res;
//do something with values...
return res;
}
In your case keys will be list of document ID's and values will be the array of your a & b fields.
When you call the MapReduce (depending what method you use to access the DB), you should specify reduce=true.
Good resources on MapReduce (and on Views, Sorting and List funtions) are:
http://guide.couchdb.org/draft/views.html
http://www.slideshare.net/okurow/couchdb-mapreduce-13321353
Another way to go is to use a list function on the Map result, if you want to output the result in HTML. A good reason to use List function is that you can pass arguments to it with querystring, in your case it may be the point for which you want to calculate distances.
For detailed description on List functions, have a look here:
http://guide.couchdb.org/draft/transforming.html
Hope this helps.
I understand that the reduce function is supposed to somewhat combine the results of the map function but what exactly is passed to the reduce function?
function(keys, values){
// what's in keys?
// what's in values?
}
I tried to explore this in the Futon temporary view builder but all I got were reduce_overflow_errors. So I can't even print the keys or values arguments to try to understand what they look like.
Thanks for your help.
Edit:
My problem is the following. I'm using the temporary view builder of Futon.
I have a set of document representing text files (it's for a script I want to use to make translation of documents easier).
text_file:
id // the id of the text file is its path on the file system
I also have some documents that represent text fragments appearing in the said files, and their position in each file.
text_fragment:
id
file_id // correspond to a text_file document
position
I'd like to get for each text_file, a list of the text fragments that appear in the said file.
Update
Note on JavaScript API change: Prior to Tue, 20 May 2008 (Subversion revision r658405) the function to emit a row to the map index, was named "map". It has now been changed to "emit".
That's the reason why there is mapused instead of emitit was renamed. Sorry I corrected my code to be valid in the recent version of CouchDB.
Edit
I think what you are looking for is a has-many relationship or a join in sql db language. Here is a blog article by Christopher Lenz that describes exactly what your options are for this kind of scenario in CouchDB.
In the last part there is a technique described that you can use for the list you want.
You need a map function of the following format
function(doc) {
if (doc.type == "text_file") {
emit([doc._id, 0], doc);
} else if (doc.type == "text_fragment") {
emit([doc.file_id, 1], doc);
}
}
Now you can query the view in the following way:
my_view?startkey=["text_file_id"]&endkey;=["text_file_id", 2]
This gives you a list of the form
text_file
text_fragement_1
text_fragement_2
..
Old Answer
Directly from the CouchDB Wiki
function (key, values, rereduce) {
return sum(values);
}
Reduce functions are passed three arguments in the order key, values and rereduce
Reduce functions must handle two cases:
When rereduce is false:
key will be an array whose elements are arrays of the form [key,id], where key is a key emitted by the map function and id is that of the document from which the key was generated.
values will be an array of the values emitted for the respective elements in keys
i.e. reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false)
When rereduce is true:
key will be null
values will be an array of values returned by previous calls to the reduce function
i.e. reduce(null, [intermediate1,intermediate2,intermediate3], true)
Reduce functions should return a single value, suitable for both the value field of the final view and as a member of the values array passed to the reduce function.
I have a Couchdb database with documents of the form: { Name, Timestamp, Value }
I have a view that shows a summary grouped by name with the sum of the values. This is straight forward reduce function.
Now I want to filter the view to only take into account documents where the timestamp occured in a given range.
AFAIK this means I have to include the timestamp in the emitted key of the map function, eg. emit([doc.Timestamp, doc.Name], doc)
But as soon as I do that the reduce function no longer sees the rows grouped together to calculate the sum. If I put the name first I can group at level 1 only, but how to I filter at level 2?
Is there a way to do this?
I don't think this is possible with only one HTTP fetch and/or without additional logic in your own code.
If you emit([time, name]) you would be able to query startkey=[timeA]&endkey=[timeB]&group_level=2 to get items between timeA and timeB grouped where their timestamp and name were identical. You could then post-process this to add up whenever the names matched, but the initial result set might be larger than you want to handle.
An alternative would be to emit([name,time]). Then you could first query with group_level=1 to get a list of names [if your application doesn't already know what they'll be]. Then for each one of those you would query startkey=[nameN]&endkey=[nameN,{}]&group_level=2 to get the summary for each name.
(Note that in my query examples I've left the JSON start/end keys unencoded, so as to make them more human readable, but you'll need to apply your language's equivalent of JavaScript's encodeURIComponent on them in actual use.)
You can not make a view onto a view. You need to write another map-reduce view that has the filtering and makes the grouping in the end. Something like:
map:
function(doc) {
if (doc.timestamp > start and doc.timestamp < end ) {
emit(doc.name, doc.value);
}
}
reduce:
function(key, values, rereduce) {
return sum(values);
}
I suppose you can not store this view, and have to put it as an ad-hoc query in your application.
How are multiple range queries implemented in CouchDB? For a single range condition, startkey and endkey combination works fine, but the same thing is not working with a multiple range condition.
My View function is like this:
"function(doc){
if ((doc['couchrest-type'] == 'Item')
&& doc['loan_name']&& doc['loan_period']&&
doc['loan_amount'])
{ emit([doc['template_id'],
doc['loan_name'],doc['loan_period'],
doc['loan_amount']],null);}}"
I need to get the whole docs with loan_period > 5 and
loan_amount > 30000. My startkey and endkey parameters are like this:
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{},{},{}],:include_docs => true}
Here, I am not getting the desired result. I think my startkey and endkey params are wrong. Can anyone help me?
A CouchDB view is an ordered list of entries. Queries on a view return a contiguous slice of that list. As such, it's not possible to apply two inequality conditions.
Assuming that your loan_period is a discrete variable, this case would probably be best solved by emit'ing the loan_period first and then issuing one query for each period.
An alternative solution would be to use couchdb-lucene.
You're using arrays as your keys. Couchdb will compare arrays by comparing each array element in increasing order until two element are not equal.
E.g. to compare [1,'a',5] and [1,'c',0] it will compare 1 whith 1, then 'a' with 'c' and will decide that [1,'a',5] is less than [1,'a',0]
This explains why your range key query fails:
["7446567e45dc5155353736cb3d6041c0",nil,5,30000] is greater ["7446567e45dc5155353736cb3d6041c0",nil,5,90000]
Your emit statement looks a little strange to me. The purpose of emit is to produce a key (i.e. an index) and then the document's values that you are interested in.
for example:
emit( doc.index, [doc.name, doc.address, ....] );
You are generating an array for the index and no data for the view.
Also, Couchdb doesn't provide for an intersection of views as it doesn't fit the map/reduce paradigm very well. So your needs boil down to trying to address the following:
Can I produce a unique index which I can then extract a particular range from? (using startkey & endkey)
Actually CouchDB allows views to have complex keys which are arrays of values as given in the question:
[template_id, loan_name, loan_period, loan_amount]
Have you tried
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{}],:include_docs => true}
or perhaps
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0","\u0000",5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0","\u9999",{}],:include_docs => true}