CouchDB set-difference/not-in condition - couchdb

I am planning to use CouchDB for my project, but I cannot find a way to implement a view equivalent to the SQL query SELECT * FROM Employees WHERE LastName NOT IN (SELECT LastName FROM Managers). In other words, I want the set of rows that are in view A but not in view B. Question: how do I implement a NOT IN condition in CouchDB?

Keeping the employee and manager lists as separate sets of documents imposes a relational structure on a database that is not relational. If, for some reason, you are forced to do that, you need some way to tell which scheme a document follows (that is, which table it would come from). Let's say you do it with a scheme field:
{ _id: "EMPL_ID", scheme: "employee", ... }
{ _id: "MNGR_ID", scheme: "manager", employee: "EMPL_ID", ... }
Then you can use map:
function (doc) {
  if (!doc.scheme) return;
  if (doc.scheme != "manager") emit(doc.last_name, doc);
}
If, for some strange reason, you cannot do that, and all you have is the reference to the employee doc inside the manager doc, you can emit a row for both documents:
function (doc) {
  if (some_test_for_being_employee_scheme(doc))
    emit([doc._id, 1], doc);
  if (doc.emp_id)
    emit([doc.emp_id, 0], null);
}
You will get the list of employees with keys ["employee_id", 1], and each employee who is a manager is preceded by a marker row with the key ["employee_id", 0]. This takes some extra space in the index, but with a list function you can filter out the managers easily, so the client receives only the non-managers from the DB.
Keep in mind that this is only a workaround for not having a proper DB design.
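The filtering step described above can be sketched in plain JavaScript. This is not the CouchDB list function API, only the logic that a list function (or the client) could apply to rows already sorted by key; the function name nonManagers and the sample rows are made up for illustration.

```javascript
// Rows are assumed to arrive sorted by key, as a CouchDB view returns them,
// so a marker row [id, 0] always comes right before the employee row [id, 1].
function nonManagers(rows) {
  const result = [];
  let managerId = null; // employee id of the most recent marker row
  for (const row of rows) {
    const [employeeId, kind] = row.key;
    if (kind === 0) {
      managerId = employeeId; // remember: this employee is a manager
    } else if (employeeId !== managerId) {
      result.push(row.value); // employee row with no preceding marker
    }
  }
  return result;
}

// Hypothetical sample data: e1 is a manager, e2 is not.
const sampleRows = [
  { key: ["e1", 0], value: null },
  { key: ["e1", 1], value: { _id: "e1", last_name: "Adams" } },
  { key: ["e2", 1], value: { _id: "e2", last_name: "Baker" } },
];
const remaining = nonManagers(sampleRows);
```

With the sample rows, remaining holds only the e2 document.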

If you change the model to make it fit a document-oriented database, this would be easy. I generally keep a "type" key in all of my documents to keep different types of documents straight. If you have a single "person" type and decorate all "person" documents who are also "manager" with a separate key, you could easily emit view keys only for non-managerial personnel. If you opt to have a separate "manager" type, you could similarly restrict emitted view keys to non-managers only.
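A minimal sketch of the first variant, assuming a single "person" type and a hypothetical boolean field is_manager that decorates managers (the field name is an assumption, not something CouchDB prescribes). The small emit stand-in is only there so the map function can be run outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), collecting rows so the sketch runs anywhere.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: emit a key only for persons who are not managers.
function map(doc) {
  if (doc.type === "person" && !doc.is_manager) {
    emit(doc.last_name, null);
  }
}

// Hypothetical sample documents.
const docs = [
  { _id: "p1", type: "person", last_name: "Adams", is_manager: true },
  { _id: "p2", type: "person", last_name: "Baker" },
  { _id: "x1", type: "invoice", total: 10 },
];
docs.forEach(map);
```

Only the row for "Baker" ends up in the view; the manager and the unrelated document type emit nothing.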

I think the answer is simply: you can't mix view results. Views are independent.
However, there is a strategy called view collation that probably solves your problems. I suggest reading this: http://wiki.apache.org/couchdb/View_collation
To summarize it: You need to use different document types and then use a single view to collate the results.

Related

MongoDB and Storing Relationships Between Objects

I am currently planning the development of an application using Node and I am stuck as to whether or not I should use MongoDB as a database. Ideally I would like to use it. I understand how it works in general, but what I don't understand is how to reference other objects within a document model.
For example, let's say I have two objects; a User and an Order object.
{
  Order: {
    Id: 1,
    Amount: 23.95
  }
}
{
  User: {
    Id: 1,
    Orders: [ ]
  }
}
Essentially, a User will place an order, and upon creation of that Order object, I would like for the User object to update the Orders array appropriately.
First of all, I hear a lot about MongoDB lacking relational functionality. So would I be able to store a reference to that order in the Orders array, perhaps by ID? Or should I just store a duplicate of the order object in the array?
If I were you, I would add a field named userId to Order that references the user who created the order. The relation between User and Order is one-to-many: a User may have many Orders, but each Order has only one User.
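To illustrate the suggestion, here is a small in-memory sketch in plain JavaScript, with no database involved. The document shapes and the helper ordersForUser are assumptions for illustration; against MongoDB, the same lookup would be a query on the orders collection filtered by userId.

```javascript
// Each order carries a userId that references the user who placed it.
const users = [{ _id: 1 }, { _id: 2 }];
const orders = [
  { _id: 1, userId: 1, amount: 23.95 },
  { _id: 2, userId: 1, amount: 5.0 },
  { _id: 3, userId: 2, amount: 12.5 },
];

// Find all orders for one user by following the reference.
function ordersForUser(userId) {
  return orders.filter((order) => order.userId === userId);
}

const firstUserOrders = ordersForUser(users[0]._id);
```

With this layout, creating an order never requires updating the user document, because the reference lives on the "many" side of the relation.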

Can I create multiple collections per database?

Switching from Mongo to PouchDB (with Cloudant), I like the "one database per user" concept, but is there a way to create multiple collections/tables per database?
Example
- Peter
  - History
  - Settings
  - Friends
- John
  - History
  - Settings
  - Friends
etc...
CouchDB does not have the concept of collections. However, you can achieve similar results using type identifiers on your documents in conjunction with CouchDB views.
Type Identifiers
When you save a document in CouchDB, add a field that specifies the type. For example, you would store a friend like so:
{
  _id: "XXXX",
  type: "Friend",
  first_name: "John",
  ...
}
And you would store history like this:
{
  _id: "XXXX",
  type: "History",
  url: "http://www.google.com",
  ...
}
Both of these documents would be in the same database, and if you queried all documents on that database then you would receive both.
Views
You can create views that filter on type and then query those views directly. For example, create a view to retrieve friends like so (in Cloudant you can go to add new Design Document and you can copy and paste this directly):
{
  "_id" : "_design/friends",
  "views" : {
    "all" : {
      "map" : "function(doc){ if (doc.type && doc.type == 'Friend') { emit(doc._id, doc._rev)}}"
    }
  }
}
Let's expand the map function:
function(doc) {
  if (doc.type && doc.type == "Friend") {
    emit(doc._id, doc._rev);
  }
}
Essentially, this map function tells the view to include only documents that have type == "Friend". Now we can query this view and only friends will be returned:
http://SERVER/DATABASE/_design/friends/_view/all
Here friends is the name of the design document and all is the name of the view. Replace SERVER with your server and DATABASE with your database name.
You can find more information about views here:
https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
You could look into relational-pouch for something like this. Else you could do "3 databases per user." ;)
I may not fully understand what you need here, but in general you can achieve what you describe in three different ways in CouchDB/Cloudant/PouchDB:
1. Single document per person (Peter, John). This works if the collections are not enormous and, more importantly, if they are not updated concurrently by different users (or, worse, in different database instances), which would lead to conflicts. In the JSON you just have one element per collection, each holding an array, and you can manipulate everything with a single document. It makes access a breeze.
2. Single document per collection (Peter History, Peter Settings, etc.). Similar constraints apply, but you create one document to hold each of these collections. Provided they are not often modified concurrently, you would have one document for Peter's History and another for Peter's Settings.
3. Single document per item. This is the finest-grained approach: lots of small, simple documents, each containing one element (say, a single History entry for Peter). The code gets slightly simpler because removing an item becomes a delete and many clients can update items simultaneously, but now you depend on views to bring all the items into a list. A view with keys [person, listName, item], for example, would let you access what you want.
Generally, your data schema decisions come down to concurrency. You mention PouchDB, so perhaps you have a single-threaded client and option 1 is nice and easy?
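For the third option, a view keyed on [person, listName, item] might look roughly like the sketch below. The field names person, list and item are assumptions about how the per-item documents are shaped, and the emit stand-in and the itemsFor helper exist only so the example runs outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), collecting rows so the sketch runs anywhere.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: one row per item document, keyed by [person, listName, item].
function map(doc) {
  if (doc.person && doc.list) {
    emit([doc.person, doc.list, doc.item], null);
  }
}

// Hypothetical per-item documents.
const docs = [
  { _id: "a", person: "Peter", list: "History", item: "http://www.google.com" },
  { _id: "b", person: "Peter", list: "Settings", item: "theme=dark" },
  { _id: "c", person: "John", list: "History", item: "http://couchdb.apache.org" },
];
docs.forEach(map);

// Mimics a key-range query for one person's list.
function itemsFor(person, listName) {
  return rows
    .filter((row) => row.key[0] === person && row.key[1] === listName)
    .map((row) => row.key[2]);
}

const peterHistory = itemsFor("Peter", "History");
```

In CouchDB itself the equivalent selection would be a range query on the view, such as startkey=["Peter","History"]&endkey=["Peter","History",{}].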

Is a type property the correct way to store different data entities in CouchDB?

I'm trying to wrap my head around CouchDB. I'm trying to switch from MongoDB to CouchDB because I think the concept of views is more appealing to me. In CouchDB it looks like all records are stored in a single database. There is no concept of collections or anything like that, as there is in MongoDB. So, when storing different data entities such as users, blog posts, comments, etc., how do you differentiate between them from within your map reduce functions? I was thinking about just using some sort of type property, and for each item I'd just have to make sure to always specify the type. This line of thought was sort of reinforced when I read over the CouchDB cookbook website, in which an example does the same thing.
Is this the most reliable way of doing this, or is there a better method? I was thinking of alternatives, and I think the only other alternative way is to basically embed as much as I can into logical documents. Like, the immediate records inside of the database would all be User records, and each User would have an array of Posts, in which you just add all of the Posts to. The downside here would be that embedded documents wouldn't get their own id properties, correct?
Using type is convenient and fast when creating views. Alternatively, you can use the structure of the JSON document itself. That is, instead of defining:
{
  type: "user",
  firstname: "John",
  lastname: "Smith"
}
You would have:
{
  user: {
    firstname: "John",
    lastname: "Smith"
  }
}
And then in the view for emitting documents containing user information, instead of using:
function (doc) {
  if (doc.type === "user") emit(null, doc);
}
You would write:
function (doc) {
  if (doc.user) emit(null, doc);
}
As you can see, there is not much difference. As you have already realized, the first approach is the most widely used, but the second (as far as I know) is also well accepted.
Regarding the question of storing all Posts of one User in a single document: it depends on how you plan to update the document. Remember that you need to write the whole document each time you update it (unless you use attachments). That means that each time a user writes a new Post, you need to retrieve the document containing the array of Posts, add or modify one element, and write the document back. That is probably too heavy.

Mongoose: Only return one embedded document from array of embedded documents

I've got a model which contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can be part of one or several activities, it makes sense to keep these activities in an array. Now I want to extract the hall of fame, the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
  .desc("stats.points")
  .limit(10)
  .run (err, users) ->
(If you are wondering about the syntax, it's CoffeeScript.)
where "stats" is the array of embedded documents/activities.
Now this actually works, but currently I'm only testing with accounts that have only one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to return only the embedded document where "activity" == "soccer" alongside the top-level document?
By the way, I realize I can do this another way, by having stats in its own collection with a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array with the element, you'll get back all of it and the sort will apply across all points, not just the ones "paired" with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this, though. Don't store the activity name as a value; use it as the key.
{
  _id: userId,
  email: email,
  stats: [
    {soccer: points},
    {rugby: points},
    {dance: points}
  ]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that once you move to MongoDB version 2.2 (currently only available as the unstable development version 2.1), you will be able to use the aggregation framework to get exactly the results you want (only the particular subset of an array or subdocument that matches your query) without changing your schema.
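As a rough sketch of what such an aggregation could look like against the original schema (stats as an array of documents with activity and points fields, as in the question), the pipeline below unwinds the array and keeps only the soccer entries before sorting. The variable name hallOfFame is made up here, and the pipeline has not been run against a real server.

```javascript
// Pipeline stages, intended to be passed to the collection's aggregate method.
const hallOfFame = [
  // Cheap pre-filter: only users who have a soccer entry at all.
  { $match: { "stats.activity": "soccer" } },
  // One output document per element of the stats array.
  { $unwind: "$stats" },
  // Keep only the unwound soccer entries, dropping other activities.
  { $match: { "stats.activity": "soccer" } },
  // Highest soccer score first.
  { $sort: { "stats.points": -1 } },
  // Top ten only.
  { $limit: 10 },
  // Return just the email and the matching stats entry.
  { $project: { email: 1, stats: 1 } },
];
```

Because the sort happens after the second $match, points from other activities can no longer influence the ordering.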

Sorting on date after querying on multiple specific keys

Document structure:
{
  "Type":"post",
  "LastModified":"2010-11-01 21:55",
  "CategoryID":3,
  "ID":12
}
Having a bunch of different post docs in different categories is great. But I can't seem to figure out how to make a view which returns the documents ordered by date when selecting the ones in, e.g., categories 3 and 5. The categories are not known in advance, and the limit query parameter should still work.
I've tried different approaches to the view but nothing comes close to achieving the desired result.
In SQL it could probably be done by something like this:
SELECT * FROM document WHERE document.CategoryID in (3,5) ORDER BY document.LastModified DESC;
I could just query a view like this the required number of times, manually sorting and paging the data:
function(doc) {
  emit(doc.CategoryID, doc.ID);
}
So does anyone know if it's possible to avoid doing that and just have CouchDB be a bit smarter?
I can think of two possible solutions:
1. Emit both CategoryID and LastModified in the map function.
function(doc) {
  emit([doc.CategoryID, doc.LastModified], null);
}
Now you can query the view for ?startkey=[3]&endkey=[3, {}] to get all the docs with CategoryID=3 sorted by LastModified. To get docs for multiple CategoryIDs, you need to merge the sorted results.
2. Use couchdb-lucene to build the index. couchdb-lucene can handle complex queries.
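For the first solution, the client-side merge could be sketched as follows. It assumes each category was queried separately and that each result list is already sorted by LastModified in descending order; the function name mergeByLastModifiedDesc and the sample rows are made up for illustration.

```javascript
// Merge several row lists, each sorted by LastModified descending,
// into one list of at most `limit` rows in the same order.
// Dates like "2010-11-01 21:55" are fixed-width, so string comparison works.
function mergeByLastModifiedDesc(rowLists, limit) {
  const positions = rowLists.map(() => 0); // next unread index per list
  const merged = [];
  while (merged.length < limit) {
    let best = -1;
    for (let i = 0; i < rowLists.length; i++) {
      if (positions[i] >= rowLists[i].length) continue; // list exhausted
      const candidate = rowLists[i][positions[i]].key[1];
      if (best === -1 || candidate > rowLists[best][positions[best]].key[1]) {
        best = i;
      }
    }
    if (best === -1) break; // every list is exhausted
    merged.push(rowLists[best][positions[best]]);
    positions[best] += 1;
  }
  return merged;
}

// Hypothetical rows for categories 3 and 5, keyed by [CategoryID, LastModified].
const category3 = [
  { id: "12", key: [3, "2010-11-01 21:55"], value: null },
  { id: "7", key: [3, "2010-10-01 10:00"], value: null },
];
const category5 = [{ id: "20", key: [5, "2010-10-15 08:30"], value: null }];
const newestTwo = mergeByLastModifiedDesc([category3, category5], 2);
```

Since only the first few rows of each list can appear in the output, each per-category query can itself be limited to the page size.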
