Couchdb-lucene and ad-hoc queries for the authenticated user - couchdb

I'm using CouchDB to store data coming from various sources and couchdb-lucene to allow ad-hoc queries. That's important for me because I display the data in a feed and I want this feed to be filterable. CL seems perfect for that.
However, I also want to introduce permissions to the feed app - a user should only be able to see a feed item if he/she has the permission to see it.
Now, I would like to be able to run ad-hoc queries and only return the feed items that the currently authenticated user has permissions to read.
The only solution that I could figure out (so far) was to add a 'permissions' field to each feed item where I store all the permissions for the other users (obviously skipping the users that have no permissions for this item at all):
permissions: [{user_id: '123', read: true, write: true}, ...]
and then index this array in CL.
While this will probably work, I feel kind of bad being forced to nest the permissions metadata in the feed item...it might even be a better solution than keeping it separate, but I just don't like that I don't seem to have a choice here.
The only other solution (well, other than dumping CouchDB) would be to run the ad-hoc query without being concerned about the permissions, then run a second query on the server that selects all "my items" and do a set intersection. But those sets can be huge (and if I chunk it, it would require possibly many DB requests => slow).
Is my solution fine or is there anything better? Or is CouchDB just not a good fit for such queries?
Cheers!

You are on the right path with keeping that permission data on the document itself. This will be the easiest way for you to build views later on, which will enable you to check for user permissions. So don't worry and just let it flow in that direction. Feeling bad about nesting that data probably comes from previous ages when you were using SQL and RDBMSes, where you'd want to normalize the hell out of each table. This time it's completely different :)
Btw, the only possibility to do "JOINS" in CouchDB is to use Linked Documents. If you are interested you can give that a try. However, it won't let you look inside the linked document while creating a view.
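As a rough illustration of keeping permissions on the document, the sketch below flattens the `permissions` array from the question into a list of reader ids (the shape you would hand to the index) and wraps an ad-hoc Lucene query so that it only matches items the current user can read. The field name `read_by` and both helper names are hypothetical, and a real couchdb-lucene index function would be written in JavaScript inside a design document; this only shows the logic.

```python
def reader_ids(item):
    """Return the ids of users whose permission entry grants read access."""
    return [p["user_id"] for p in item.get("permissions", []) if p.get("read")]


def restrict_query(user_query, user_id, field="read_by"):
    """Wrap an ad-hoc Lucene query so it also requires the user's id.

    `field` is a hypothetical name for the indexed list of reader ids.
    """
    # Escape backslashes and quotes so the id stays inside the quoted phrase.
    safe_id = user_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'({user_query}) AND {field}:"{safe_id}"'


item = {
    "title": "release notes",
    "permissions": [
        {"user_id": "123", "read": True, "write": True},
        {"user_id": "456", "read": False, "write": False},
    ],
}
print(reader_ids(item))
print(restrict_query("title:release", "123"))
```

The important design point is that the restriction is added on the server, from the authenticated session, and never taken from the client's own query string.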

Related

How to structure relationships in Azure Cosmos DB?

I have two sets of data in the same collection in cosmos, one are 'posts' and the other are 'users', they are linked by the posts users create.
Currently my structure is as follows;
// user document
{
  id: 123,
  postIds: ['id1', 'id2']
}
// post document
{
  id: 'id1',
  ownerId: 123
}
{
  id: 'id2',
  ownerId: 123
}
My main issue with this setup is how fragile it is: code has to enforce the link, and if there's a bug, data will very easily be lost with no clear way to recover it.
I'm also concerned about performance: if a user has 10,000 posts, that's 10,000 lookups I'll have to do to resolve all the posts.
Is this the correct method for modelling entity relationships?
As David said, it's a long discussion, but it is a very common one, so, since I have an hour or so of "free" time, I'm more than glad to try to answer it, once and for all, hopefully.
WHY NORMALIZE?
First thing I notice in your post: you are looking for some level of referential integrity (https://en.wikipedia.org/wiki/Referential_integrity) which is something that is needed when you decompose a bigger object into its constituent pieces. Also called normalization.
While this is normally done in a relational database, it is now also becoming popular in non-relational databases, since it helps a lot to avoid data duplication, which usually creates more problems than it solves.
https://docs.mongodb.com/manual/core/data-model-design/#normalized-data-models
But do you really need it? Since you have chosen to use a JSON document database, you should leverage the fact that it's able to store the entire document and then just store the document ALONG WITH all the owner data: name, surname, or all the other data you have about the user who created the document. Yes, I’m saying that you may want to consider not having posts and users, but just posts, with the user info inside them. This may actually be very correct, as you will be sure to get the EXACT data for the user as it existed at the moment of post creation. Say, for example, I create a post and I have biography "X". I then update my biography to "Y" and create a new post. The two posts will have different author biographies, and this is just right, as they have exactly captured reality.
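A minimal sketch of the embedding idea, using the biography example above. The helper `make_post` and the field names `body` and `author` are made up for illustration; the point is only that each post carries its own snapshot of the author.

```python
import copy


def make_post(post_id, body, author):
    """Build a post document that embeds a snapshot of the author."""
    return {
        "id": post_id,
        "body": body,
        # Deep copy so later edits to `author` do not leak into this post.
        "author": copy.deepcopy(author),
    }


author = {"id": "123", "name": "Ada", "biography": "X"}
first = make_post("id1", "first post", author)

author["biography"] = "Y"  # the author updates their biography
second = make_post("id2", "second post", author)

print(first["author"]["biography"], second["author"]["biography"])
```

Each post keeps the biography that was current when it was written, which is the "captured reality" behaviour described above.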
Of course you may want to also display a biography in an author page. In this case you'll have a problem. Which one you'll use? Probably the last one.
If all authors, in order to exist in your system, MUST have a blog post published, that may well be enough. But maybe you want to let an author write a biography and be listed in your system even before writing a blog post.
In such a case you need to NORMALIZE the model and create a new document type, just for authors. If this is your case, then you also need to figure out how to handle the situation described before. When the author updates their biography, will you just update the author document, or create a new one? If you create a new one, so that you can keep track of all changes, will you also update all the previous posts so that they reference the new document, or not?
As you can see the answer is complex, and REALLY depends on what kind of information you want to capture from the real world.
So, first of all, figure out if you really need to keep posts and users separated.
CONSISTENCY
Let’s assume that you really want to have posts and users kept in separate documents, and thus you normalize your model. In this case, keep in mind that Cosmos DB (like NoSQL databases in general) DOES NOT OFFER any kind of native support to enforce referential integrity, so you are pretty much on your own. Indexes can help, of course, so you may want to index the ownerId property, so that before deleting an author, for example, you can efficiently check whether there are any blog posts by him/her that would otherwise be left orphaned.
Another option is to manually create and keep updated ANOTHER document that, for each author, keeps track of the blog posts he/she has written. With this approach you can just look at this document to understand which blog posts belong to an author. You can try to keep this document automatically updated using triggers, or do it in your application. Just keep in mind that when you normalize in a NoSQL database, keeping data consistent is YOUR responsibility. This is exactly the opposite of a relational database, where your responsibility is to keep data consistent when you de-normalize it.
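The check-before-delete idea can be sketched in application code as below. This runs against plain in-memory lists and dicts as a stand-in for the database; `index_by_owner` and `delete_author` are hypothetical helpers, and in Cosmos DB the lookup would be a query on the indexed `ownerId` property.

```python
from collections import defaultdict


def index_by_owner(posts):
    """Group post ids by their ownerId (stand-in for an index on ownerId)."""
    by_owner = defaultdict(list)
    for post in posts:
        by_owner[post["ownerId"]].append(post["id"])
    return by_owner


def delete_author(author_id, authors, posts):
    """Delete an author only if no post would be left orphaned."""
    orphans = index_by_owner(posts).get(author_id, [])
    if orphans:
        raise ValueError(f"author {author_id} still owns posts: {orphans}")
    authors.pop(author_id, None)


authors = {"123": {"id": "123"}}
posts = [{"id": "id1", "ownerId": "123"}, {"id": "id2", "ownerId": "123"}]

try:
    delete_author("123", authors, posts)
except ValueError as err:
    print("refused:", err)
```

Nothing in the database enforces this rule, so every code path that deletes authors has to go through the same check.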
PERFORMANCE
Performance COULD be an issue, but you don't usually model in order to support performance in the first place. You model in order to make sure your model can represent and store the information you need from the real world, and then you optimize it in order to have decent performance with the database you have chosen to use. As different databases will have different constraints, the model will then be adapted to deal with those constraints. This is nothing more and nothing less than the good old “logical” vs “physical” modeling discussion.
In Cosmos DB case, you should not have queries that go cross-partition as they are more expensive.
Unfortunately, partitioning is something you choose once and for all, so you really need to have clear in your mind which are the most common use cases you want to support best. If the majority of your queries are done on a per-author basis, I would partition per author.
Now, while this may seem a clever choice, it will be only if you have A LOT of authors. If you have only one, for example, all data and queries will go into just one partition, limiting your performance A LOT. Remember, in fact, that Cosmos DB RUs are split among all the available partitions: with 10,000 RU, for example, you usually get 5 partitions, which means that all your values will be spread across 5 partitions. Each partition will have a top limit of 2,000 RU. If all your queries use just one partition, your real maximum performance is that 2,000 and not 10,000 RU.
I really hope this helps you start to figure out the answer. And I really hope this helps to foster and grow a discussion (how to model for a document database) that I think is really due and mature now.
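The arithmetic behind those numbers, as a tiny sketch. It assumes the even split described above and uses the answer's own figures (10,000 RU, 5 partitions); the actual number of partitions is decided by the service, and `per_partition_ru` is just an illustrative helper.

```python
def per_partition_ru(total_ru, partitions):
    """Throughput ceiling of one partition, assuming an even split."""
    return total_ru / partitions


total_ru = 10_000
partitions = 5

ceiling = per_partition_ru(total_ru, partitions)
# If every query lands on a single partition, this is the real limit,
# no matter how much throughput was provisioned in total.
print(ceiling)
```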

Combine CouchDB databases with replication while recording source db

I’m just starting out with CouchDB (2.1), and I’m planning to use it to replicate confidential per-user data from a mobile app up to my server. I’ve read that per-user databases are the best way to do this, and I’ve set that up. Each database has a mix of user-created documents of types Foo and Bar.
Now, I’d also like to be able to collect multi-user slices of that data together into one database and build views on it for admin reporting. Say I want a database which contains all the Foos from all users. So far so good, an entry in _replicator with a filter from each user database to one target does the job.
But looking at the combined database, I can’t tell which user a given Foo came from. I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?
CouchDB's replicator simply tries to match up the exact state of a given document in the target database — and if it can't, it stores ± the exact source contents anyway (as a conflicting version).
Furthermore the _rev field of a document, which the replication system uses to check if a document needs to be updated, is actually based on (a hash over) the other document fields.
So unfortunately you can't add metadata during replication. This would indeed be handy for this and other per-user vs. shared replication situations, but it's not something CouchDB currently supports, and it would break some optimizations to add support for it.
I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?
Including something like a .user field in each document is the right solution.
As far as being redundant, I wouldn't think of it that way, or at least not as a bad thing. You'll find that with CouchDB (and other NoSQL stores like it) there's a trend to "denormalize" data to begin with. Especially given the things replication lets me do operationally and architecturally, I'd much rather have a self-contained document than one that relies on metadata derived from a database name.
I'm not sure exactly how in your case an extra field will make validation more complex, so I can't fully speak to that. You do want to make sure the user writing the document has set it "honestly", and so yes there is a bit more complication, but usually not too burdensome in most cases.
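To give a feel for how small the "honesty" check can be, here is a sketch of the logic in Python. In CouchDB itself this would live in a JavaScript `validate_doc_update` function in a design document, which receives the new document and a user context containing the authenticated name; the Python function below and its name are only an illustration of the comparison.

```python
def validate_user_field(new_doc, user_ctx):
    """Reject a write whose `user` field is not the authenticated user."""
    if new_doc.get("user") != user_ctx.get("name"):
        raise PermissionError("doc.user must equal the authenticated user name")


# Accepted: the document is stamped with the writer's own name.
validate_user_field({"type": "Foo", "user": "alice"}, {"name": "alice"})

# Rejected: alice tries to write a document attributed to bob.
try:
    validate_user_field({"type": "Foo", "user": "bob"}, {"name": "alice"})
except PermissionError as err:
    print("rejected:", err)
```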

Complex Finds in Domain Driven Design

I'm looking into converting part of a large existing VB6 system into .NET. I'm trying to use domain driven design, but I'm having a hard time getting my head around some things.
One thing that I'm completely stumped on is how I should handle complex find statements. For example, we currently have a screen that displays a list of saved documents, that the user can select and print off, email, edit or delete. I have a SavedDocument object that does the trick for all the actions, but it only has the properties relevant to it, and I need to display the client name that the document is for and their email address if they have one. I also need to show the policy reference that this document may have come from. The Client and Policy are linked to the SavedDocument but are their own aggregate roots, so are not loaded at the same time the SavedDocuments are.
The user is also allowed to specify several filters to reduce the list down. These too can be on properties that are stored on the SavedDocument or on the Client and Policy.
I'm not sure how to handle this from a Domain driven design point of view.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocuments, that I then have to turn into a different object or DTO, and fill with the additional client and policy information? That seems a little slow as I have to load all the details using multiple calls.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocumentsForList objects that contain just the information I want? This seems the quickest but doesn't feel like I'm using DDD.
Do I load everything from their objects and do all the filtering and column selection in a service? This seems the slowest, but also appears to be very domain orientated.
I'm just really confused about how to handle these situations, and I've not really seen any other people asking questions about it, which makes me feel that I'm missing something.
Queries can be handled in a few ways in DDD. Sometimes you can use the domain entities themselves to serve queries. This approach can become cumbersome in scenarios such as yours when queries require projections of multiple aggregates. In this case, it is easier to use objects explicitly designed for the respective queries - effectively DTOs. These DTOs will be read-only and won't have any behavior. This can be referred to as the read-model pattern.
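A sketch of what such a read-only DTO could look like for the list screen in the question. The class name, field names and the `find_list_items` helper are invented for illustration, and the rows are plain dicts standing in for the result of a single joined query over documents, clients and policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SavedDocumentListItem:
    """Read model for one row of the list screen: data only, no behavior."""
    document_id: int
    title: str
    client_name: str
    client_email: Optional[str]
    policy_reference: Optional[str]


def find_list_items(rows, client_name_contains=""):
    """Project already-joined rows into DTOs, applying one example filter."""
    needle = client_name_contains.lower()
    return [
        SavedDocumentListItem(**row)
        for row in rows
        if needle in row["client_name"].lower()
    ]


rows = [
    {"document_id": 1, "title": "Quote", "client_name": "Acme Ltd",
     "client_email": "info@acme.example", "policy_reference": "P-100"},
    {"document_id": 2, "title": "Renewal", "client_name": "Bolt & Co",
     "client_email": None, "policy_reference": None},
]
for item in find_list_items(rows, client_name_contains="acme"):
    print(item)
```

The aggregates (SavedDocument, Client, Policy) stay untouched and are still loaded individually when the user prints, edits or deletes a document.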

Why isn't there a read analog of validate_doc_update in couchdb?

I am posing this as a suggested feature of CouchDB because that is the best way to express what I would like to achieve, and as a rant because I have not found a good reason for its absence:
Why not have a validate_doc_read(doc, userCtx) function so that I can implement per-document read control? It would work exactly as validate_doc_update works, by throwing an error when you want to deny the read. What am I missing? Has someone found a workaround for per-document read control?
I'm not sure what the actual reason is, but having read validation would make reads very slow, and view indexes very hard to update incrementally (or perhaps impossible meaning that you'd basically have to have a per-user index).
The way to implement what you want is via filtered replication, so you create a new DB with only the documents you want a given user to be able to read.
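As a sketch of what setting this up might involve, the function below builds the body of a `_replicator` document for one user. The `source`, `target`, `filter`, `query_params`, `create_target` and `continuous` fields are standard replication document fields; the database names, the document id and the `access/readable_by` filter (which would be a JavaScript filter function in a design document named `access`) are hypothetical.

```python
import json


def replication_doc(source_db, user_id):
    """Build a _replicator document that copies one user's readable docs."""
    return {
        "_id": f"readable-by-{user_id}",
        "source": source_db,
        "target": f"user-{user_id}",
        # Hypothetical filter: design doc "access", filter "readable_by".
        "filter": "access/readable_by",
        # Passed to the filter function as request query parameters.
        "query_params": {"user_id": user_id},
        "create_target": True,
        "continuous": True,
    }


print(json.dumps(replication_doc("feed", "123"), indent=2))
```

The user is then given read access to their own target database only, so views and queries there never see documents they are not allowed to read.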
The main problem with creating a validate_doc_read is how reduce functions would work with that behavior.
I can't believe that a validate_doc_read is the best solution, because we would give away one feature in favour of another.
Instead, you have to restrict view access using a proxy.

How to implement CQS with in memory changes?

Having watched this video by Greg Young on DDD
http://www.infoq.com/interviews/greg-young-ddd
I was wondering how you could implement Command-Query Separation (CQS) with DDD when you have in memory changes?
With CQS you have two repositories, one for commands, one for queries.
As well as two object groups, command objects and query objects.
Command objects only have methods, and no properties that could expose the shape of the objects, and aren't to be used to display data on the screen.
Query objects on the other hand are used to display data to the screen.
In the video the commands always go to the database, and so you can use the query repository to fetch the updated data and redisplay on the screen.
Could you use CQS with something like an edit screen in ASP.NET, where changes are made in memory and the screen needs to be updated several times with the changes before the changes are persisted to the database?
For example
I fetch a query object from the query repository and display it on the screen
I click edit
I refetch a query object from the query object repository and display it on the form in edit mode
I change a value on the form, which autoposts back and fetches the command object and issues the relevant command
WHAT TO DO: I now need to display the updated object, as the command made changes to the calculated fields. As the command object has not been saved to the database, I can't use the query repository. And with CQS I'm not meant to expose the shape of the command object to display on the screen. How would you get a query object back with the updated changes to display on the screen?
A couple of possible solutions I can think of is to have a session repository, or a way of getting a query object from the command object.
Or does CQS not apply to this type of scenario?
It seems to me that in the video changes get persisted straight away to the database, and I haven't found an example of DDD with CQS that addresses the issue of batching changes to a domain object and updating the view of the modified domain object before finally issuing a command to save the domain object.
So what it sounds like you want here is a more granular command.
EG: the user interacts with the web page (let's say doing a check out with a shopping cart).
The multiple pages gathering information are building up a command. The command does not get sent until the user actually checks out, at which point all the information is sent up in a single command to the domain; let's call it a "CheckOut" command.
Presentation models are quite helpful at abstracting this type of interaction.
Hope this helps.
Greg
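One way to picture the "CheckOut" idea is a mutable presentation-side draft that the pages fill in, which only becomes an immutable command at the very end. All class and field names below are invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CheckOutDraft:
    """Presentation model: filled in page by page, never sent to the domain."""
    items: list = field(default_factory=list)
    shipping_address: str = ""
    payment_token: str = ""

    def is_complete(self):
        return bool(self.items and self.shipping_address and self.payment_token)


@dataclass(frozen=True)
class CheckOut:
    """The single command sent to the domain once the user checks out."""
    items: tuple
    shipping_address: str
    payment_token: str


def to_command(draft):
    """Turn a finished draft into the command; refuse incomplete drafts."""
    if not draft.is_complete():
        raise ValueError("draft is incomplete, nothing is sent to the domain")
    return CheckOut(tuple(draft.items), draft.shipping_address, draft.payment_token)


draft = CheckOutDraft()
draft.items.append("sku-1")          # page 1: cart
draft.shipping_address = "1 Main St"  # page 2: shipping
draft.payment_token = "tok-abc"       # page 3: payment
print(to_command(draft))
```

The screen can be redrawn from the draft as often as needed; the domain only ever sees the finished command.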
If you really want to use CQS for this, I would say that the Query repo and the Write repo both have a reference to the same backing store. Usually this reference is via an external database, but in your case it could be a List<T> or similar.
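A minimal sketch of two repositories over one in-memory store, using a Python dict in place of the `List<T>` mentioned above. The class and method names are illustrative only.

```python
class WriteRepository:
    """Command side: the only object allowed to change the store."""

    def __init__(self, store):
        self._store = store

    def save(self, entity_id, state):
        # Store a copy so callers cannot mutate the store behind our back.
        self._store[entity_id] = dict(state)


class QueryRepository:
    """Query side: hands out read-only snapshots for display."""

    def __init__(self, store):
        self._store = store

    def get(self, entity_id):
        return dict(self._store[entity_id])


store = {}  # shared in-memory backing store (e.g. kept in the session)
writes = WriteRepository(store)
reads = QueryRepository(store)

writes.save(1, {"quantity": 2, "total": 20})
writes.save(1, {"quantity": 3, "total": 30})  # command before final persist
print(reads.get(1))
```

Because both sides share the store, the query side sees each in-memory change immediately, and persisting to the real database can happen later as a separate step.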
Also for the rest of your concerns ...
These are more concerns with eventual consistency than with CQRS. You do not need to be eventually consistent with CQRS: you can make the processing of the command also write to the reporting store (or use the same physical store for both, as mentioned) in a consistent fashion. I actually recommend that people do this as their base architecture and later come through and introduce eventual consistency where needed, as there are costs associated with it.
In memory, you would usually use the Observer design pattern.
Actually, you always want to use this pattern but most databases don't offer an efficient way to call a method in your app when something in the DB changes.
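For the in-memory case, the Observer pattern could look roughly like the sketch below: the domain object notifies subscribers after each command, and the view refreshes itself from the notification. `Observable`, `Order` and the callback are hypothetical names for illustration.

```python
class Observable:
    """Minimal subject: keeps callbacks and calls them on change."""

    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def _notify(self):
        for callback in self._observers:
            callback(self)


class Order(Observable):
    def __init__(self, quantity, unit_price):
        super().__init__()
        self.quantity = quantity
        self.unit_price = unit_price

    @property
    def total(self):
        # Calculated field that the screen needs to redisplay.
        return self.quantity * self.unit_price

    def change_quantity(self, quantity):
        """Command: change state, then tell observers about it."""
        self.quantity = quantity
        self._notify()


displayed_totals = []
order = Order(quantity=2, unit_price=10)
order.subscribe(lambda o: displayed_totals.append(o.total))

order.change_quantity(3)
order.change_quantity(5)
print(displayed_totals)
```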
The Unit of Work design pattern from Patterns of Enterprise Application Architecture matches CQS very well - it is basically a big Command that persist stuff in the database.
JdonFramework is a CQRS/DDD Java framework; it supplies a domain events + asynchronous pattern. More details: https://jdon.dev.java.net/
