Architecture decisions for a system comprising a mobile app, a cloud database, and varying user restriction levels - Azure

I am looking to develop an app that will be used by a fairly small number of people and that has to store and recall data from a cloud database. Users should have various access levels: some can create content, some can only read, others can modify, some can do anything, and so on, just like on a file system.
I am currently considering Azure (I'm very new to it) and thinking about which components the project would involve. Obviously, a mobile app (Xamarin.Forms) would be the front end, with some kind of Cosmos DB or another database in the cloud, plus Blob storage for the media files created by users. But my main question is how to implement control over which user can take which actions on which data.
A simple way would be to do it within the app itself, but that is counterintuitive and a security risk. Even though this is an internal app used by people in the same or sister organizations, it really sounds bad.
The best option would be for the database itself to handle this, but I am not aware of such a mechanism existing. Hopefully it actually does, and someone will point me in the right direction.
The only other way I see is having some kind of middle layer, still on the back end but sitting just in front of the database. However, that also seems clunky, and I am also unaware of how to even implement it "in the cloud".
What would be my actual options?
To clarify, it's about having permissions assigned based on certain columns of a table, for example, and not about having different tables for different users that share parts of the data.
That's why it is an "architecture decisions" question, and not "how do I give read permissions to user X of my database Y".
An answer might be "database X has what you want". Or, least favourably, "there's no way to offload that to the DB; you will have to keep all data separately, so that users can only operate on their own set of data, and then collate things on the backend". Or something in between, perhaps.

I'm not knowledgeable about Azure or any of that other stuff, but every DBMS has user accounts that enable different permissions, e.g. Apache Derby, MySQL, etc.
I would never implement authentication on the client side.
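To add one concrete illustration: PostgreSQL, for one, supports column-level grants and row-level security natively, which maps onto the "permissions on certain columns" requirement. A minimal sketch, issued here from a Node/TypeScript backend through the pg client; the table, column, and role names are all invented for illustration:

    // Hypothetical sketch: column- and row-level permissions in PostgreSQL.
    // Assumes the "pg" package and an owner/superuser connection string.
    import { Client } from "pg";

    async function applyPermissions(): Promise<void> {
      const client = new Client({ connectionString: process.env.DATABASE_URL });
      await client.connect();

      // Column-level grant: role "reader" may SELECT only these two columns.
      await client.query(`GRANT SELECT (id, title) ON documents TO reader`);

      // Row-level security: each role sees only the rows it owns.
      await client.query(`ALTER TABLE documents ENABLE ROW LEVEL SECURITY`);
      await client.query(`CREATE POLICY owner_rows ON documents
                            USING (owner_name = current_user)`);

      await client.end();
    }

Whether Cosmos DB offers an equivalent is a separate question, but it shows that "handled by the database itself" does exist in at least some engines.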

Related

Is CouchDB/PouchDB a viable solution for my project? Any advice is welcome

I have been reading up a lot about CouchDB (and PouchDB) and am still unsure what the best option would be for a project of mine.
I do have a possible way to solve the project in my head based on what I have read so far, but I am unsure about things like performance and would love to get some insights. Or perhaps there's a better place to ask this question? Please let me know if that's the case! (Already tried their IRC channel and the mailing list, but no answers there as of yet)
So the project is basically an 'offline-first' mobile application. The users are device installers. They get assigned a few locations and devices to install every day. They need to walk around buildings and update the data (e.g. device X has been installed at location Y, or property A of device B at location C has been changed to D, etc.).
Some more info about the basic data.
There are users, they are the device installers. They need to log into the app.
There are locations, all the places that the device installers need to visit.
There are devices, all the different devices that can be installed by the users.
There are todos, basically a planned installation for a specific user at a specific location for specific devices.
Of course I have tried to simplify the data, but this should convey the gist.
Now, these are important characteristics of the application:
Users, locations and devices can be changed by an administrator (back-end software).
Todos can be planned by an administrator (back-end software).
App user (device installer) only sees his/her own todos/planning for today + 1 week ahead.
Multiple app users (device installers) might be assigned to the same location and/or todos, because for a big building there might be multiple installers at work.
Automatic synchronization between the data in each app in use and the global database.
Secure, it should only be possible for user X to request his/her own todos/planning.
Taking into account these characteristics, I currently have the following in mind:
One global 'master' database containing all users, locations, devices, todos.
Filtered replication/sync using a selector object, which for every user replicates only the data that may be accessible to this specific user (see the sketch after this list).
Ionic application using PouchDB which does full/normal replication/sync with his/her own user database.
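For illustration, a selector-based replication document might look roughly like this; it would live in the server's _replicator database. The assignedTo field, database names, and user name are all assumptions about the data model:

    // Sketch of a per-user filtered replication document. The selector
    // limits which documents from the master reach the user's database.
    const replicationDoc = {
      _id: "master-to-userdb-jane",
      source: "https://couch.example.com/master",
      target: "https://couch.example.com/userdb-jane",
      continuous: true,
      selector: {
        // only docs that list this installer as an assignee
        assignedTo: { $elemMatch: { $eq: "jane" } },
      },
    };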
Am I correct in assuming the following?
The user of the application using PouchDB will have full read access on his own user database which has been filtered server-side?
For updating data, I can make use of validate_doc_update to check whether the user may or may not modify something? (See the sketch after this list.)
Any changes done on the PouchDB database will be replicated to the 'user' database?
These changes will then also be replicated from the 'user' database to the global 'master' database?
Any changes done on the global 'master' database will be replicated to the 'user' database, but only if required (only if there have been new/changed(/deleted) documents for this user)?
These changes will then also be replicated from the 'user' database to the PouchDB database for the mobile app?
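On the validate_doc_update point specifically: that is indeed the CouchDB hook for write-side checks. A rough sketch of what it could look like for todos, stored as plain JavaScript in a design document; the assignedTo field and the admin-role convention are assumptions:

    // Runs server-side on every write; throwing {forbidden: ...} rejects it.
    function validate_doc_update(newDoc, oldDoc, userCtx) {
      var isAdmin = userCtx.roles.indexOf("_admin") !== -1;

      // only admins (the back-end software) may create or delete todos
      if ((!oldDoc || newDoc._deleted) && !isAdmin) {
        throw { forbidden: "Only admins may create or delete todos." };
      }

      // installers may only update todos they are assigned to
      if (oldDoc && !isAdmin &&
          (oldDoc.assignedTo || []).indexOf(userCtx.name) === -1) {
        throw { forbidden: "You are not assigned to this todo." };
      }
    }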
If all this holds true, then it might be a good fit for this project. At least I think so? (Correct me if I'm wrong!) But I did read about some performance problems regarding filtered replication. Suppose there are hundreds of users (device installers); there aren't that many right now, but there might be in the future. Would it be a problem to have this filtered replication running against hundreds of 'user' databases?

I did read that CouchDB 2.0 and 2.1 have a selector object for filtered replication, instead of the usual JS map/reduce filter, which is supposed to be up to 10x faster. But my question remains: does this work well even for hundreds (or even thousands) of 'filtered' databases? I don't know enough about the underlying algorithms and limitations, but I am wondering whether any change to the global 'master' database requires expensive calculations to decide which 'filtered' databases to replicate to. And if it does... does it matter in practice?
Please, any advice would be welcome. I did also consider using other databases; my first approach would actually have been a relational database. But one of the required characteristics of this app is real-time synchronization. In the past I have handled this myself using revision fields in an RDBMS and a lot of code, but I would really prefer something as elegant as CouchDB/PouchDB for the synchronization. This is really an area that would save me a lot of headache. Keeping this in mind, what are my options? Am I heading down a workable path, or could performance become an issue down the road?
Note that I have also thought about having a separate database for each user ('one database per user'), but I think it might not be the best fit for this project, because some todos might be assigned to multiple users, and when one user updates something for a todo, it must be updated for the other users as well.
Hopefully some CouchDB experts can shed some light on my questions. Much appreciated!
I understand there might be some debate but I am only interested in the facts and expertise of others.

Node.js API gateway implementation and Passport authentication

I am working on implementing a microservices-based application using Node.js. While searching for examples of how to implement the API gateway, I came across the following article, which seems to provide one: https://memz.co/api-gateway-microservices-docker-node-js/. Examples of the API gateway pattern in Node.js seem to be a little hard to come by so far, so this article seemed like a really good one.
There are a few items that are still unclear and that I am still having trouble finding documentation on.
1) Security is a major item for the app I am developing. I am having trouble seeing where the authentication should take place (i.e., using Passport, should I add the authentication items in the API gateway and pass the JWT along with the request to the corresponding microservice, since the user's logged-in information is needed for certain activities?). The only issue here seems to be that all of the microservices would need Passport in order to verify the JWT and get the user's profile information. Also, would the microservices technically be inaccessible to the outside world except through the API gateway, as this seems to be the aim?
2) How does this scenario change if I need to scale to multiple servers with Docker images on each one? How would this affect load balancing? It seems like something would have to sit at a higher level to deal with it.
I can tell that much depends on your application requirements. Really.
I'm now past five years of experience with production microservices, using several languages, on systems ranging from medium to very large scale.
None of them shared the same requirements, and without a deep understanding of what you need and what your business (product) requirements are, it would be hard to know the right answer. That said, I'll try to share some experience to help you get it right.
Ideally you want the security to be encapsulated in an external service, so that you can update and apply new policies faster. You'll also be able to invalidate all existing tokens should you find a breach in your system, or if someone on your team inadvertently pushes some secret key (or cert) to an external service.
You could handle authentication in each single service or in an edge network tool (such as the API gateway). Be careful choosing how to handle it, because each option has its own trade-offs:
Choosing the API gateway, your services remain lighter and do not need to know anything about the authentication steps. But surely, at some point, you'll need to know who the authenticated user is, and you'll need some plain reference to them (a JSON record, or a link or ID to a "user profile" service). How you do that is up to your requirements, and we could go even deeper into the pros and cons of each choice applicable to your case.
Choosing to handle it at the service level requires you (and your teams) to understand the security process better (you can hide it behind a good library), and you'll need support from your security team (which may well be yourself; and the more services implement security, the more things you'll have to think about to avoid adding unnecessary features). The big problem here is that you'll often end up interrupting your tasks to think about what would help on this particular service, and you'll be tempted to extend your authentication service (and, unless you really know what you're doing, don't add a single call not needed for authentication purposes).
One thing is easy to determine: you surely need to think about tokens (JWT, JWE, or, again, whatever your requirements impose).
JWTs have good benefits, but the payload is only encoded, not encrypted, so never put sensitive data in there or things you wouldn't publicly share about your user (e.g., an ID is probably fine, while security questions or 2FA answers are not). JWE is an encrypted form of the spec. An opaque token (one with no meaning) requires a backend lookup to get the data, but it works much like cookie sessions, and the data never leaves your servers.
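As one concrete (and hedged) illustration of the gateway-side approach with JWTs: a middleware like the following could verify tokens once at the edge and forward only a plain user reference downstream, so the services never need Passport or token parsing at all. This assumes Express and the jsonwebtoken package, plus an invented header name; it's a sketch, not a drop-in implementation:

    // Gateway middleware: verify the JWT once, forward only the user ID.
    import express from "express";
    import jwt from "jsonwebtoken";

    const app = express();

    app.use((req, res, next) => {
      const token = (req.headers.authorization || "").replace(/^Bearer /, "");
      try {
        // verify() checks the signature and expiry; the secret stays at
        // the gateway, so downstream services never handle tokens
        const claims = jwt.verify(token, process.env.JWT_SECRET as string);
        // pass a plain reference downstream (invented header name)
        req.headers["x-user-id"] = String((claims as any).sub);
        next();
      } catch {
        res.status(401).json({ error: "invalid or missing token" });
      }
    });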
You need to define the boundaries of your services yourself, and do yourself a favor: make each service's boundaries clean, defined, and standard.
Try to define common policies and standardize interactions. I know it may be easier to add a queue here, a REST endpoint there, an RPC over there, but you'll soon end up with a pile of IPC mechanisms you can no longer handle, and it will catch up with you.
Also, if your business solution is already heavy to build, I don't think it's a good idea to write the API gateway, security, and so on yourself. I'd go with open-source, community-supported (or even company-backed, if you have some budget), production-tested solutions.
By definition, microservice architectures are very dynamic; you'll fight to keep them immutable between deployment versions, but unless you're a big firm you cannot afford to keep thousands of servers live. This means you'll discover bugs that only appear under certain circumstances you cannot spot in other environments (it often happens that they can't be reproduced at all).
By choosing to develop the whole stack yourself, you agree to deal with maintenance and bug discovery across that whole stack. So when you try to load a page that has 25 services interacting, you know it may be failing because of a bug in your API gateway, your security implementation, your token parser, your user account service, your business services A to N, your database service (if any), your database load balancer (if any), or your database instance.
I know it's tempting to do everything yourself, but try to keep it simple and do only what you need to do. By following this path you'll think about your product, which I believe is the most important thing to focus on right now.
To complete my answer, about the scaling issues:
It doesn't matter: whatever choice you pick, it will scale seamlessly.
The API gateway should be able to work against a pool of backends, so from that server you should be able to redirect to N backend machines that you can bring live as needed. You can even have an API to support automatic registration of new instances, or, even more simply, point it at the IP of an Elastic Load Balancer, HAProxy, or an equivalent, and as you add backends to them it will just work; you have simply moved the multiple-IPs issue from the API gateway one layer down.
If you handle authentication at the service level (and you have an API gateway), see the previous point.
If you handle authentication at the service level (without an API gateway), then you need to look at some other level in your stack: load balancing (layer 3 or layer 7) or DNS, where you can use several features to rotate the IPs answering, even advanced ones like anycast if you need latency-based distribution.
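To make the first point concrete: a deliberately naive sketch of a gateway fronting a static pool of backends, using the http-proxy package (that package choice and the addresses are assumptions; in production an ELB/HAProxy in front is usually the better move):

    // Naive round-robin gateway over a static backend pool.
    import http from "http";
    import httpProxy from "http-proxy";

    const backends = ["http://10.0.0.2:3001", "http://10.0.0.3:3001"];
    const proxy = httpProxy.createProxyServer();
    let next = 0;

    http.createServer((req, res) => {
      // rotate through the pool; a real registry would track health too
      const target = backends[next++ % backends.length];
      proxy.web(req, res, { target }, () => {
        res.statusCode = 502;
        res.end("backend unavailable");
      });
    }).listen(8080);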
I know this answer introduced a lot of other questions, but I really tried to answer yours. The fact is that you need to understand and evaluate a lot of things when planning a microservice architecture, and I wouldn't write a single line of code without a well-written plan printed on every wall of my office.
You'll often need to step back mentally from a single service to review the global vision and check that everything is going fine.
I don't want to scare you; rather, I'm trying to make you think so that you succeed.
I just want you to make sure you have correctly evaluated all of the possibilities before deciding to do everything from scratch.
P.S. Should you choose to go with an API gateway, be sure to limit services to accept requests only through it. On a single machine, just listen on localhost (see the sketch below); across multiple machines you'll need more advanced networking rules, depending on your operating system.
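For the single-machine case, the loopback binding is a one-liner; a minimal sketch (the port is arbitrary):

    // A downstream service reachable only from the local machine:
    // binding to 127.0.0.1 hides it from the outside world, so the
    // co-located gateway becomes the only public entry point.
    import http from "http";

    http.createServer((req, res) => {
      res.end("internal service response");
    }).listen(3001, "127.0.0.1");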
Good Luck!

Postgresql database objects security

I am developing an application which will use a PostgreSQL database backend. I want to prevent users from seeing some of the DB objects, stored procedure implementations in particular.
The "obvious" (but perhaps wrong) way is to GRANT CRUD access to the database to a specific user who will have an encrypted uname/pwd?
Is there another way of doing this?
Note: My target audience are largely non-programmers, so I don't need anything that is "unbreakable" (assuming such a thing existed).
PostgreSQL doesn't really support limiting visibility of procedure source code, user or database lists, etc. The best thing to do is accept that, or implement the procedure in C or PL/Java where it's somewhat harder to examine at the cost of considerably greater complexity of implementation.
In general, you should not have the database/table owner be the day-to-day operational user of the DB. Create a new user and GRANT it only the rights that it needs.
Most of the system catalogs have default SELECT rights granted to public, so to limit access you would need to explicitly REVOKE that access and then GRANT it back to the database owner and any other users that should have it. You're likely to want to restrict pg_proc if you want to hide procedure sources, for example. Such an approach is limited and fragile (root can always gain PostgreSQL superuser access, and from there do anything), but you've said that's probably OK for your purposes.
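With all the caveats in the next paragraph, here is a hedged sketch of that REVOKE/GRANT dance, issued through the pg client from Node; the app_owner role name is invented, and the statements require superuser rights:

    // Unsupported and fragile, as noted: hide procedure sources from public.
    import { Client } from "pg";

    async function hideProcSources(): Promise<void> {
      const client = new Client({ connectionString: process.env.DATABASE_URL });
      await client.connect();
      // remove the default public read access to the procedures catalog...
      await client.query(`REVOKE SELECT ON pg_catalog.pg_proc FROM PUBLIC`);
      // ...then grant it back only to the roles that genuinely need it
      await client.query(`GRANT SELECT ON pg_catalog.pg_proc TO app_owner`);
      await client.end();
    }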
Messing with the system catalogs isn't really supported and can cause issues with metadata access in JDBC, psql, etc. See this related answer. If you mess with the catalogs and something breaks, you get to keep the pieces.
BTW, if you modify the catalogs, please try to avoid asking questions about databases with hand-modified catalogs here. At minimum specify extremely clearly that you have messed with the system catalogs and exactly what you have done. If possible, reproduce the issue on an unmodified database first.

Is CouchDB per-user database approach feasible for users with lots of shared data?

I want to implement a webapp - a feed that integrates data from various sources and displays them to users. A user should only be able to see the feed items that he has permissions to read (e.g. because they belong to a project that he is a member of). However, a feed item might (and will) be visible by many users.
I'd really like to use CouchDB (mainly because of the cool _changes feed and map/reduce views). I was thinking about implementing the app as a pure couchapp, but I'm having trouble with the permissions model. AFAIK, there are no per-document permissions in CouchDB and this is commonly implemented using per-user databases and replication.
But when there is a lot of overlap between what various users see, that would introduce a LOT of overhead... stuff would be replicated all over the place and duplicated in many databases. I like the elegance of this approach, but the massive overhead just feels like a dealbreaker... (Let's say I have 50 users and they all see the same data...)
Any ideas on that, please? Alternative solutions?
You can enforce read permissions as described in CouchDB Authorization on a Per-Database Basis.
For write permissions you can use validation functions, as described in CouchDB: The Definitive Guide - Security.
You can create a database for each project and enforce the permissions there; then all the data is shared efficiently between the users. If a user shares a feed himself and needs permissions on that as well, you can turn the user into a "project" so the same logic applies everywhere.
Using this design you can authorize a user or a group of users (roles) for each project.
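For reference, per-database read authorization comes down to each database's _security object. A sketch with invented database and role names; members may read the project database, admins may also modify design documents:

    // Per-project CouchDB security object, PUT to the database's
    // _security endpoint (admin credentials required).
    const security = {
      admins:  { names: [], roles: ["project-x-admin"] },
      members: { names: [], roles: ["project-x-member"] },
    };

    async function setProjectSecurity(): Promise<void> {
      await fetch("https://couch.example.com/project-x/_security", {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(security),
      });
    }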
Other than (as victorsavu3 has suggested already) handling your read auth in a proxy between your app and couch, there are only two other alternatives that I can think of.
The first is to just not care: disk is cheap, and while having multiple copies of the data may seem like a lot of unnecessary duplication, it massively simplifies your architecture, and you get some automatic benefits, like easily scaling up to handle load (by just moving some of your users' DBs off to other servers).
The second is to split the shared data out into a different DB. This will occasionally limit what you can do in views (e.g., no "linked documents"), but that is not a big deal in many situations.

Windows BackupRead / BackupWrite and ACLs

I have been trying to understand the right way to use BackupRead and BackupWrite for backing up data on a computer, and especially how to restore it reliably.
Now I understand how to use the API and have been successful. However there's one thing that bothers me.
Besides the file content itself, you can back up any alternate data streams and also the security information (ACLs).
Now, if I store the ACL data in the backup and the data later needs to be restored on a different machine OR a newly set-up machine, what should I do with the SIDs referenced by the ACLs?
The SIDs are most likely no longer valid for that machine, so how should the right users be selected?
Looking at this on a bigger scale: say this is a computer with multiple users and hundreds or thousands of objects with different settings; it would be a mess to restore the data with the security settings applied to it again.
Is this something that the user of the software, if they wish to back up the security settings, has to take care of themselves and update accordingly?
Additionally, BackupRead and BackupWrite give me the raw binary data of those items, which is not all that hard to use; however, this API obviously does not even attempt to address this issue.
Does anyone have an idea how a backup application should handle this situation? What are your thoughts, or any pointers to guidelines on this specific topic?
Thanks a lot.
I think you understand the problems of backing up and restoring data correctly, and a correct understanding of a problem is half of solving it. I suppose that you, like most users of the Stack Overflow site, are primarily a software developer and not the administrator of a large network, so you see the problem from the software developer's side rather than the administrator's. An administrator knows the restrictions on backing up and restoring ACLs and already works within them.
In general, you should understand that the main purpose of backups is to save the data and restore it later on the same computer or server. Another standard case is restoring a backup from one server to another after a hardware change, where the old server will no longer exist. Mostly, one backs up servers and organizes the clients' work so that no important data is saved on the client computers.
In most cases the backed-up data has domain group SIDs, domain user SIDs, well-known SIDs, or SID aliases from the BUILTIN domain in its security descriptors. In those cases no SID changes are needed at all. If the administrator does want to make some changes to an ACL, he can use existing utilities like SubInACL.exe.
If you are writing backup/restore software that you want to use for moving data together with its security information, you can include in the backup some additional meta-information about the local SIDs of the accounts/groups used in the saved security descriptors. In the restore software you can then offer the ability to replace SIDs in the saved security descriptors. Many years ago I wrote, for one large customer, some utilities to clean up the SIDs in security descriptors in the file system, registry, and services after a domain migration. It was not very complex, so I suggest you could implement the same feature in your backup/restore software.
I believe the Backup* APIs are primarily intended for backup and restore on the same machine, which would render the SID problem irrelevant. However, assuming a scenario where you need to restore a backup onto a fresh install, here are my thoughts on solutions.
For well-known SIDs such as Everyone, Creator Owner and so on, there isn't really any problem.
For domain-dependent SIDs you can store them as-is, and upon restore you could fix up the domain part if needed. You should probably store the domain name as well for such SIDs.
For local users and groups, you should at least store the user/group name for each SID. Fixup on restore could be partially automatic based on those names, or manual (assuming the application has a user interface), where you ask the user whether they wish to map this user to a new local user, convert these SIDs to a well-known SID, or keep them as-is.
Most of the issues related to such SIDs can (and probably typically will) be handled automatically. I'd certainly appreciate a backup application that was smart enough to do the restore I asked it to and figure out that "Erik" on the old machine must be "Erik" on the new machine as well; a sketch of that name-based fixup follows.
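To illustrate just the name-based mapping logic (SIDs are treated as plain strings here; a real implementation would resolve names through Win32 calls such as LookupAccountName, and all the type and function names below are invented):

    // Remap saved ACE SIDs to the restore machine's accounts by name.
    interface SavedAce {
      sid: string;         // SID as saved from the source machine
      accountName: string; // account name captured alongside the SID
      accessMask: number;  // the ACE's rights, carried over unchanged
    }

    function remapAces(
      saved: SavedAce[],
      resolveLocalSid: (name: string) => string | undefined, // stand-in for LookupAccountName
    ): SavedAce[] {
      return saved.map((ace) => {
        const localSid = resolveLocalSid(ace.accountName);
        // keep the original SID when no matching local account exists;
        // an interactive restore could prompt the user here instead
        return localSid ? { ...ace, sid: localSid } : ace;
      });
    }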
And a side note: if you do decide to go with such a solution, remember how annoying it is to start an overnight data transfer just to come back to something 5% done, blocked on a popup it could just as easily have deferred. :)
