I have a design document in CouchDB. I've set up views and filters.
{
"_id": "_design/my_index_id",
"_rev": "17-fa5c543fcc80f4420aa98d58f7a07130",
"views": {
"jobsbyid": {
"map": "function (doc,req) {if (doc.type === 'job') {emit(doc.id);}}"
}
},
"filters": {
"myfilter": "function (doc, req) {return req.query.type === 'job'}"
}
}
What's the difference between views and filters in terms of performance, use cases, and usage? When should I use views, and when should I use filters?
CouchDB offers several filtering options for the replication process. All of them are documented here: CouchDB filtering options
Keep in mind that filtering is one of the most expensive operations in CouchDB and can lead to performance degradation as the database grows. See this answer: Filtered Sync between CouchDB and PouchDB
Filters and views perform almost the same when used for filtering, because both run against the whole database on every filtered request. The documentation states this:
Using _view filter doesn’t queries the view index files, so you cannot
use common view query parameters to additionally filter the changes
feed by index key. Also, CouchDB doesn’t returns the result instantly
as it does for views - it really uses the specified map function as
filter.
Moreover, you cannot make such filters dynamic e.g. process the
request query parameters or handle the User Context Object - the map
function is only operates with the document.
The advantage of using views for filtering is that you reuse existing map functions.
So the use cases of both approaches are very similar, except that filter functions can access the query parameters and the security context.
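To make the difference concrete, here is a minimal sketch in plain JavaScript that mimics how each kind of function is called. The helper names (runFilter, runViewAsFilter), the emit stand-in, and the sample documents are assumptions made for illustration, not CouchDB APIs. It only shows that a filter receives (doc, req), while a map function used as a filter sees the document alone and passes it when it emits at least one row.

```javascript
// Hypothetical sample documents.
const docs = [
  { _id: "a", type: "job" },
  { _id: "b", type: "user" },
  { _id: "c", type: "job" },
];

// Stand-in for CouchDB's built-in emit(); collects rows for the current doc.
let emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// A filter function: it gets the document AND the request, so it can be dynamic.
function typeFilter(doc, req) {
  return doc.type === req.query.type;
}

// A map function: it gets only the document, so the condition is hard-coded.
function jobsById(doc) {
  if (doc.type === "job") {
    emit(doc._id, null);
  }
}

// Mimics ?filter=...&type=job: keep documents for which the filter returns true.
function runFilter(filterFn, req) {
  return docs.filter((doc) => filterFn(doc, req)).map((doc) => doc._id);
}

// Mimics ?filter=_view: keep documents for which the map function emits something.
function runViewAsFilter(mapFn) {
  return docs
    .filter((doc) => {
      emitted = [];
      mapFn(doc);
      return emitted.length > 0;
    })
    .map((doc) => doc._id);
}

const viaFilter = runFilter(typeFilter, { query: { type: "job" } });
const viaView = runViewAsFilter(jobsById);
console.log(viaFilter, viaView);
```

Both calls select the same documents here; the filter variant could select a different type on each request, while the map variant cannot.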
Related
I'm switching from Mongo to PouchDB (with Cloudant). I like the "one database per user" concept, but is there a way to create multiple collections/tables per database?
Example
- Peter
- History
- Settings
- Friends
- John
- History
- Settings
- Friends
etc...
CouchDB does not have the concept of collections. However, you can achieve similar results by using type identifiers on your documents in conjunction with CouchDB views.
Type Identifiers
When you save a document in CouchDB, add a field that specifies the type. For example, you would store a friend like so:
{
_id: "XXXX",
type: "Friend",
first_name: "John",
...
}
And you would store history like this:
{
_id: "XXXX",
type: "History",
url: "http://www.google.com",
...
}
Both of these documents would be in the same database, and if you queried all documents on that database then you would receive both.
Views
You can create views that filter on type and then query those views directly. For example, create a view to retrieve friends like so (in Cloudant you can go to add new Design Document and you can copy and paste this directly):
{
"_id" : "_design/friends",
"views" : {
"all" : {
"map" : "function(doc){ if (doc.type && doc.type == 'Friend') { emit(doc._id, doc._rev)}}"
}
}
}
Let's expand the map function:
function(doc) {
if (doc.type && doc.type == "Friend") {
emit(doc._id, doc._rev);
}
}
Essentially, this map function adds a document to the view only if it has type == "Friend". Now we can query this view and only friends will be returned:
http://SERVER/DATABASE/_design/friends/_view/all
Where friends = name of the design document and all = name of the view. Replace SERVER with your server and DATABASE with your database name.
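As a rough illustration of what comes back from that URL, the sketch below parses a made-up response body. The ids and revisions are hypothetical; the overall shape (total_rows, offset, rows with id, key, and value) is what a map-only view returns, with key and value being whatever the map function emitted (here doc._id and doc._rev).

```javascript
// Hypothetical response body for GET /DATABASE/_design/friends/_view/all.
const responseBody = JSON.stringify({
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "friend-1", key: "friend-1", value: "1-abc" },
    { id: "friend-2", key: "friend-2", value: "1-def" },
  ],
});

// Turn each row into { id, rev }, using the emitted key and value.
function friendRevisions(body) {
  return JSON.parse(body).rows.map((row) => ({ id: row.key, rev: row.value }));
}

const friends = friendRevisions(responseBody);
console.log(friends);
```

Because the view emits only the revision as its value, fetching the full documents would need either a second request per id or the include_docs=true query parameter on the view request.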
You can find more information about views here:
https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
You could look into relational-pouch for something like this. Else you could do "3 databases per user." ;)
I may not fully understand what you need here but in general you can achieve what you describe in 3 different ways in CouchDB/Cloudant/PouchDB.
Single document per person (Peter, John). This works if the collections are not enormous and, more importantly, if they are not updated concurrently by different users (or worse, in different database instances), which would lead to conflicts. In the JSON, each collection is just an element holding an array, so you can manipulate everything with one document. Makes access a breeze.
Single document per collection (Peter History, Peter Settings, etc.). Similar constraints apply, but you could create a document to hold each of these collections. Provided they will not be concurrently modified often, you would then have a document for Peter's History, and another for Peter's Settings.
Single document per item. This is the finest grain approach - lots of small simple documents each containing one element (say a single History entry for Peter). The code gets slightly simpler because removing items becomes a delete and many clients can update items simultaneously, but now you depend on Views to bring all the items into a list. A view with keys [person, listName, item] for example would let you access what you want.
Generally your data schema decisions come down to concurrency. You mention PouchDB so it may be that you have a single threaded client and option 1 is nice and easy?
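For the third option, the view keyed on [person, listName, item] could look roughly like the sketch below. The field names person, list, and item are assumptions about how such documents might be shaped, and emit is a local stand-in for CouchDB's built-in function so the example runs on its own.

```javascript
// Collects emitted rows; stands in for CouchDB's built-in emit().
const rows = [];
let currentDoc = null;
function emit(key, value) {
  rows.push({ id: currentDoc._id, key: key, value: value });
}

// Map function for "one document per item".
function byPersonAndList(doc) {
  if (doc.person && doc.list) {
    emit([doc.person, doc.list, doc.item], null);
  }
}

// Hypothetical item documents.
const docs = [
  { _id: "1", person: "Peter", list: "History", item: "http://www.google.com" },
  { _id: "2", person: "Peter", list: "Friends", item: "John" },
  { _id: "3", person: "John", list: "History", item: "http://couchdb.apache.org" },
];
for (const doc of docs) {
  currentDoc = doc;
  byPersonAndList(doc);
}

// Against a real view this would be a range query such as
// startkey=["Peter","History"]&endkey=["Peter","History",{}];
// here a prefix match over the in-memory rows plays that role.
function rowsWithPrefix(prefix) {
  return rows.filter((row) => prefix.every((part, i) => row.key[i] === part));
}

const petersHistory = rowsWithPrefix(["Peter", "History"]);
console.log(petersHistory.map((row) => row.key[2]));
```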
I'd like to use a _list function to format the output of the _all_docs view.
I see a patch was merged to support this use case; however, from the docs and the comments I can't figure out what the endpoint for this would be.
I've tried on Cloudant, which doesn't seem to work:
/db/_design/[design-doc]/_list/[list-name]/_all_docs
Is it the case that this is not supported on Cloudant? I don't have a CouchDB install at hand to test with.
This feature is not currently supported in Cloudant, but should be coming soon. As a workaround, you can create a custom view called _all_docs in the same design document as your list function:
…
"views": {
"_all_docs": {
"map": "function(doc) { emit(doc._id, {\"rev\": doc._rev}) }"
},
…
This will create a redundant index, which isn't ideal. This custom _all_docs also won't return design documents, where the "real" _all_docs will return design documents.
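A list function that formats the rows of that custom view might look like the sketch below. getRow and send are the helpers CouchDB makes available inside list functions; they are replaced here by local stand-ins, and the sample rows are made up, so that the example runs outside the database. The function name asPlainText is hypothetical.

```javascript
// Stand-ins for the helpers CouchDB provides inside a list function.
const pendingRows = [
  { id: "doc-a", key: "doc-a", value: { rev: "1-aaa" } },
  { id: "doc-b", key: "doc-b", value: { rev: "2-bbb" } },
];
let output = "";
function getRow() {
  return pendingRows.shift() || null;
}
function send(chunk) {
  output += chunk;
}

// List function body: streams one "id rev" line per view row.
function asPlainText(head, req) {
  let row;
  while ((row = getRow())) {
    send(row.id + " " + row.value.rev + "\n");
  }
}

asPlainText({}, {});
console.log(output);
```

In a design document the function would be stored as a string under "lists", next to the "views" entry shown above.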
I'm preparing to use CouchDB in my project, but I cannot find a way to implement a view like the SQL SELECT * FROM Employees WHERE LastName NOT IN (SELECT LastName FROM Managers). In other words, I want the set of rows that are in view A but not in view B. Question: how do I implement a NOT IN condition in CouchDB?
Keeping the employee and manager lists as different sets of documents imposes a relational structure on a database that is not relational. If, for some reason, you are forced to do that, you need some way to tell which scheme a document belongs to (which table it came from). Let's say you do it with a scheme field:
{ _id: "EMPL_ID", scheme: "employee", ... }
{ _id: "MNGR_ID", scheme: "manager", employee: "EMPL_ID", ... }
Then you can use map:
function (doc) {
if (!doc.scheme) return;
if (doc.scheme != "manager") emit(doc.last_name, doc);
}
If, for some strange reason, you cannot do that, and you only have the reference to the employee doc in the manager doc, you can emit rows for both documents:
function (doc) {
if (some_test_for_being_employee_scheme(doc))
emit([doc._id, 1], doc);
if (doc.emp_id)
emit([doc.emp_id, 0], null);
}
You will get the list of employees with keys ["employee_id", 1], and each manager's row is preceded by a marker row (key [..., 0]). This takes some extra space, but with a list function you can filter out the managers easily, and the client will receive only the non-managers from the DB.
Keep in mind that this is only a workaround for not having a proper DB design.
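The filtering step over the collated rows can be sketched as follows. The rows are hypothetical and already in key order, as a view would return them; inside a real list function the same loop would read rows with getRow() instead of taking an array. The function name nonManagers is made up for this example.

```javascript
// Hypothetical view rows, sorted by key: [id, 0] marker rows come before [id, 1].
const rows = [
  { key: ["emp-1", 0], value: null }, // marker: emp-1 is a manager
  { key: ["emp-1", 1], value: { last_name: "Adams" } },
  { key: ["emp-2", 1], value: { last_name: "Baker" } },
  { key: ["emp-3", 0], value: null }, // marker: emp-3 is a manager
  { key: ["emp-3", 1], value: { last_name: "Clark" } },
];

// Keep only employee rows that are not preceded by a marker for the same id.
function nonManagers(sortedRows) {
  const result = [];
  let lastManagerId = null;
  for (const row of sortedRows) {
    const employeeId = row.key[0];
    const kind = row.key[1];
    if (kind === 0) {
      lastManagerId = employeeId; // remember the marker row
    } else if (employeeId !== lastManagerId) {
      result.push(row.value); // employee with no preceding marker
    }
  }
  return result;
}

const remaining = nonManagers(rows);
console.log(remaining.map((employee) => employee.last_name));
```

The approach relies on the sort order: because 0 sorts before 1, a marker row always arrives immediately before the employee row it refers to.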
If you change the model to make it fit a document-oriented database, this would be easy. I generally keep a "type" key in all of my documents to keep different types of documents straight. If you have a single "person" type and decorate all "person" documents who are also "manager" with a separate key, you could easily emit view keys only for non-managerial personnel. If you opt to have a separate "manager" type, you could similarly restrict emitted view keys to non-managers only.
I think the answer is simply: you can't mix view results. Views are independent.
However, there is a strategy called view collation that probably solves your problems. I suggest reading this: http://wiki.apache.org/couchdb/View_collation
To summarize it: You need to use different document types and then use a single view to collate the results.
As specified here, a filter can be used with the _changes feed like this:
curl "$HOST/db/_changes?filter=app/important"
Now I am trying to use this pattern with a standard view access, like this:
curl -X GET $HOST/db/_design/live_data/_view/all-comments&filter=live_data/bytag?tag=testing
I have also tried ? instead of &:
curl -X GET $HOST/db/_design/live_data/_view/all-comments?filter=live_data/bytag?tag=testing
But the filter has no effect: all documents are shown, even those that the filter should reject.
The filter that I am using is:
function(doc, req)
{
for( var i in doc.tags ) {
if(doc.tags[i] == req.query.tag) {
return true;
}
}
return false;
}
Am I doing something wrong in the curl calls?
Is it at all possible to use views together with filters, or are filters limited to the _changes feed? I have seen no examples of filters except in relation to _changes.
Yes, it seems that filters are limited to _changes requests only.
If you want to filter data from views, you can use "startkey" and "endkey" parameters with possibly more complex json keys and/or reduce grouping levels to achieve your desired results.
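For the tag case in the question, one way to get the same effect with a view is to emit one row per tag and then query by key. The sketch below simulates that in plain JavaScript; the view name byTag, the sample comments, and the local emit stand-in are assumptions for illustration.

```javascript
// Stand-in for CouchDB's emit(); records which document produced each row.
const index = [];
let currentId = null;
function emit(key, value) {
  index.push({ id: currentId, key: key, value: value });
}

// Map function: one row per tag, so the tag itself becomes the lookup key.
function byTag(doc) {
  if (Array.isArray(doc.tags)) {
    doc.tags.forEach(function (tag) {
      emit(tag, null);
    });
  }
}

// Hypothetical comment documents.
const comments = [
  { _id: "c1", tags: ["testing", "draft"] },
  { _id: "c2", tags: ["release"] },
  { _id: "c3", tags: ["testing"] },
];
comments.forEach(function (doc) {
  currentId = doc._id;
  byTag(doc);
});

// Equivalent of querying the view with ?key="testing".
const testingIds = index
  .filter((row) => row.key === "testing")
  .map((row) => row.id);
console.log(testingIds);
```

Unlike the filter function, the per-tag rows are computed once when the index is built, so a lookup by key does not have to visit every document.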
If this worked, CouchDB would have to iterate through all of the view's rows and execute the filter function on each one. That is not a good way of doing things, especially when you can pre-index documents using views and array keys (like [date, tag]).
Nothing prevents you from implementing this filter in a backend of your own, though. You would only have to load all the view rows from CouchDB, run the filter function on them, and return the result. But that wouldn't be fast.
It seems you are unable to nest databases in CouchDB. How do people work around this limitation? For example, assume I want to create a blogging engine where each domain has a separate database. Within each database I might want a Users database, an Orders database, etc. to contain the various user documents, order documents, and so forth.
The obvious way seems to be a flat structure where the database name demarcates the artificial boundary between database nesting levels with a hyphen:
myblog.com-users
myblog.com-posts
myblog.com-comments
anotherblog.com-users
anotherblog.com-posts
anotherblog.com-comments
...hundreds more...
Another solution would be to keep the lower-level databases and mark each document with the top-level value:
users database containing a document User1, with field instance="Test" or a field domain="myblog.com"
I think you're misusing the term database here. There is no reason you can't store the users, posts, and comments data in a single CouchDB database. Your CouchDB views can separate the user documents from the post documents and from the comment documents.
Example map function for user documents in a CouchDB database:
function(doc) {
if (doc.type === 'user') { // only return user documents
emit([doc.domain, doc.id], doc); // the returned docs will be sorted by domain
}
}
See the View API for ways to restrict that view's results by domain using startkey and endkey with view collation.
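A range query for one domain could be built as in the sketch below. The design document and view names in the URL (users, by_domain) are hypothetical; startkey and endkey are standard view query parameters, and their values are JSON, so they are JSON-encoded first and then URL-encoded.

```javascript
// Builds a view URL limited to the rows whose key starts with the given domain.
function usersByDomainUrl(baseUrl, domain) {
  const startkey = encodeURIComponent(JSON.stringify([domain]));
  // An object sorts after any string in CouchDB's view collation,
  // so [domain, {}] closes the range for this domain.
  const endkey = encodeURIComponent(JSON.stringify([domain, {}]));
  return baseUrl + "?startkey=" + startkey + "&endkey=" + endkey;
}

const url = usersByDomainUrl(
  "http://SERVER/DATABASE/_design/users/_view/by_domain", // hypothetical names
  "myblog.com"
);
console.log(url);
```

With the map function above emitting [doc.domain, doc.id] as the key, this range covers every user row of myblog.com and nothing from other domains.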
I think the best solution is to have one database per domain, each storing domain specific data.