Putting permission and authorization logic in the database - security

I have a conceptual/architectural question I'd like to get some input on. Before I go into the details, I'd like to mention that I am well-aware of the arguments against putting application logic into the database, and the importance of maintaining abstraction and separation of concerns.
That being said, the application in question is a fairly simple one, in which performance is relatively more important than best practices. It is a new app, built with very modern technologies but on old school principles (stored procs, no ORM, etc).
I have a fairly complex "summary view" which is going to be driven by data provided by a stored procedure. Most of the elements of this view will have permission logic (not trivial, but nothing too complex) that changes both the appearance and the nature of the data based on the permissions of the currently logged-in user (i.e. some data could be anonymized, other data could be hidden, etc.).
All data, as well as membership and ACL records, are stored in the same database.
So, the question is where to put the logic of applying user rights. The two options are:
1) Bring back all relevant data from the database into domain objects, then apply permissions in the middle-tier
2) Pass a user ID to the stored proc, and have it pass back an already prepared result to the middle-tier
At first glance, the conceptual no-brainer seems to be to put it in the middle tier (option 1) and leave the database concerned with what it does best: reading and writing data. However, the stored procedure will already be "tailored" to this specific view (think something along the lines of a report) and not used for anything else. It therefore seems easier and lighter to process the permissions inside the stored proc and bring back a prepared result with the permissions already applied (a result that contains less data), rather than bringing ALL the data into the middle tier, processing permissions there, and eventually discarding half of it anyway.
I am a bit torn and would appreciate some input. Intuitively, the 2nd option seems like a better fit, but "feels" very wrong.

Design the stored procedure to accept the enabled privileges or features to apply as parameters, e.g.
PROCEDURE retrieve_data (id_or_other_params,
anonymize := FALSE,
hide := FALSE,
some_other_feature := FALSE);
The procedure is now concerned with data retrieval only, with the modifying options supplied from an external source. Let the middle tier authenticate the user, decide which options to use, and pass them to the procedure.
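For illustration, a minimal sketch of how the middle tier might make such a call over JDBC. The UserPermissions type and its checks are hypothetical names, and the exact callable syntax and boolean-parameter support depend on your database and driver (Oracle procedures returning result sets typically need a ref cursor instead):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SummaryViewDao {

    // Hypothetical permission holder resolved by the middle tier's auth layer.
    public interface UserPermissions {
        boolean canSeeIdentities();
        boolean canSeeRestrictedItems();
    }

    public ResultSet retrieveSummary(Connection conn, long id, UserPermissions perms)
            throws SQLException {
        // The middle tier makes the authorization decisions...
        boolean anonymize = !perms.canSeeIdentities();
        boolean hide = !perms.canSeeRestrictedItems();

        // ...and the procedure only receives the already-made decisions.
        CallableStatement call = conn.prepareCall("{call retrieve_data(?, ?, ?, ?)}");
        call.setLong(1, id);
        call.setBoolean(2, anonymize);
        call.setBoolean(3, hide);
        call.setBoolean(4, false); // some_other_feature
        return call.executeQuery();
    }
}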

Related

Entity Framework database first approach using stored procedures

I am looking to build the data access layer of my MVC5 application. In our project we are going for a database-first approach with stored procedures only, as the team is more conversant with SQL and would like to perform all CRUD operations via stored procedures.
I am looking for good examples that show the implementation of this approach. I want to see how the entities are mapped, since in this case stored procedures in the database are mapped to classes in .NET.
I think it's time for your team to become "conversant" with EF if you are going to use it. Doing every single CRUD operation with stored procedures is not the path I would take. If the stored procedure is doing something as simple as:
Get the company record with ID 1
then I would not use a stored procedure; I would use EF. For more complex operations, stored procedures can be used. Therefore, you and your team may want to have a working session to decide when to use stored procedures and when not to. Once you have decided, the whole team should stick to that approach. If you need to change it, have another meeting and make sure everyone is in the know. It is important for everyone to follow the same pattern once the team has agreed to it.
How to use stored procedures with EF?
I would start with one test stored procedure to see how the whole thing works. Once you and your team know exactly how the process works with EF, then put together a design, conventions etc. and then the whole team should follow the same pattern.
1. Write a test stored procedure which returns a resultset.
2. Create the EDMX by connecting to your database from Visual Studio.
3. Add the stored procedure to your EDMX.
4. Use the model browser to add a Function Import. This will create a method in your context which you can call like any other method, but underneath it will call your stored procedure. Please see this answer for more on how to do this step.
Step 4 will create a class based on your stored procedure's resultset.
Note
You may need to set this flag to off, TEMPORARILY, for EF to create the complex type based on your stored procedure result set.
SET FMTONLY OFF
See this answer for more about the flag.

CQRS design: nosql data view

This is a "language agnostic" question.
I started to study the CQRS pattern.
I have a simple question. Am I supposed to have two different storage layers: a relational one (MySQL, etc.) for the commands and a NoSQL one (Mongo, Cassandra, etc.) for the queries?
Let me explain a little example:
1) As a user I want to insert a "Todo task"
Command: "Create Task" and will insert a new task into a database which have the User and the Todo tables.
2) As a user I'm able to see a list of created task
Query: "GetTasks" that will return a "view" with a collection of task taken from a non sql table named "UserTasks" which have a user and a list of created task.
Is this the right approach? I'm sorry if the language is poor; it's just a small example.
If it seems like a good approach (again, don't worry about the details), what is the best way to keep the data stores up to date?
I'm thinking of raising an event like "TaskCreated", then taking the new task and inserting that information into the NoSQL storage.
Thanks!
I can't quite tell what you're looking for, but... typically, a command would be something that results in side effects, while queries don't cause side effects. GetTasks wouldn't really be a command, but a query.
Your "CreateTask" would be a command, which would result in the task added to the relevant data store(s). Your GetTasks query would retrieve that information from a datastore. It doesn't really matter if you're using a SQL or NoSQL store for this.
The "CommandStore" is typically the store that has just enough data to enforce invariants. In your case, what data is required for that? Is some information required to decide whether or not a task can be registered? For example, say, you have a requirement that a user can have at most 3 "todo"s. In this case, a table in the "Command Store" storing (UserId, Todo Count) is enough. You could also use (UserId, [TodoId]) - ie. store a list of todo ids so that you can gain idempotence. All other information about the user and tasks would be query data, and would be in the query store.
Hope that makes sense.
While there are times when you may wish to store commands, you generally don't. Rather, a popular approach is to store the domain events that occur as a result of the commands. This is referred to as Event Sourcing. This would make 'STOREA' a store of events, or to put it another way, an event stream. 'STOREB' is typically referred to as the Read Model. It has a de-normalised structure optimised for read speed. It is kept up to date via de-normalisers which respond to specific events. A key point to note here is that there is often a lag between the event being raised and the read model being updated. This, in my opinion, is a good thing, but it needs to be thought about when designing the UI.
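To make that flow concrete, here is an illustrative (not prescriptive) sketch of a de-normaliser: it listens for a TaskCreated event and appends to the "UserTasks" read model. The event and store types are assumptions, not a specific library:

// Hypothetical event and read-model types; any messaging mechanism would do.
record TaskCreated(String userId, String todoId, String title) {}

interface ReadModelStore {
    void appendToUserTasks(String userId, String todoId, String title);
}

class UserTasksDenormaliser {
    private final ReadModelStore readModel;

    UserTasksDenormaliser(ReadModelStore readModel) { this.readModel = readModel; }

    // Invoked when the event is published; the update is eventually consistent,
    // so the UI may briefly show a stale task list.
    void on(TaskCreated event) {
        readModel.appendToUserTasks(event.userId(), event.todoId(), event.title());
    }
}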
For more info take a look at CQRS – A Step-by-Step Guide to the Flow of a typical Application
I hope that helps

Couchdb-lucene and ad-hoc queries for the authenticated user

I'm using CouchDB to store data coming from various sources and couchdb-lucene to allow ad-hoc queries. That's important for me because I display the data in a feed and I want this feed to be filterable. CL seems perfect for that.
However, I also want to introduce permissions to the feed app - a user should only be able to see a feed item if he/she has the permission to see it.
Now, I would like to be able to run ad-hoc queries and only return the feed items that the currently authenticated user has permissions to read.
The only solution that I could figure out (so far) was to add a 'permissions' field to each feed item, where I store all the permissions for the other users (obviously skipping the users that have no permissions for this item at all):
permissions: [{user_id: '123', read: true, write: true}, ...]
and then index this array in CL.
While this will probably work, I feel kind of bad being forced to nest the permissions metadata in the feed item...it might even be a better solution than keeping it separate, but I just don't like that I don't seem to have a choice here.
The only other solution (well, other than dumping CouchDB) would be to run the ad-hoc query without being concerned about the permissions, then run a second query on the server that selects all "my items" and do a set intersection. But those sets can be huge (and if I chunk it, it would require possibly many DB requests => slow).
Is my solution fine or is there anything better? Or is CouchDB just not a good fit for such queries?
Cheers!
You are on the right path with keeping that permission data on the document itself. This will be the easiest way for you to build views later on, which will enable you to check for user permissions. So don't worry and just let it flow in that direction. Feeling bad about nesting that data probably comes from previous ages when you were using SQL and RDBMSes, where you'd want to normalize the hell out of each table. This time it's completely different :)
Btw, the only way to do "JOINs" in CouchDB is to use Linked Documents. If you are interested, you can give that a try. However, it won't let you look inside the linked document while creating a view.

Code generation against Sprocs?

I'm trying to understand choices for code generation tools/ORM tools and discover what solution will best meet the requirements that I have and the limitations present.
I'm creating a foundational solution to be used for new projects. It consists of ASP.NET MVC 3.0 plus layers for business logic and data access. The data access layer will need to go against Oracle for now, and then switch to SQL Server this year once the database migration is finished.
From a DTO standpoint, mapping to custom types in the solution, which ORM/code-generation tool will work for creating the code I need while being able to access ONLY stored procs in Oracle and SQL Server?
Meaning, I need to generate the custom objects that are the artifacts coming back from the stored procedures and being pushed to them as parameters; I don't need to generate the sprocs themselves, they already exist. I'm looking for a representation of what the sproc needs and what it gives back to be generated into DTOs. In some cases I can go against views and generate DTOs, and I'm assuming most tools already do this. But 90% of the time I don't have direct access to any tables or views, only stored procs.
Does this make sense?
ORMs are best at mapping objects to tables (and/or views), not mapping objects to sprocs.
Very few tools can do automated code generation against whatever output a sproc may generate, depending on the complexity of the sproc. It's much more straightforward to code-generate the input to a sproc, as that is generally well defined and clear.
I would say if you are stuck with sprocs, your options for using third party code to help reduce your development and maintenance time are severely limited.
I believe either LinqToSql or EntityFramework (or both?) is capable of some magic with regard to SQL Server to try to mostly-automatically figure out what a sproc may be returning. I don't think it works all the time; it's just sophisticated guesswork, and I seriously doubt it would work with Oracle. I am not aware of anything else software-wise that even attempts to figure out what a sproc may return.
A sproc can return multiple diverse record sets that can be built dynamically by the sproc depending on the input and data in the database. A technical solution to automatically anticipating sproc output seems like it would require the following:
A static set of underlying data in the database
The ability to pass all possible inputs to the sproc and execute the sproc without any negative impact or side effects
That would give you a static set of possible outputs for any given valid input. A small change in the data in the database could invalidate everything.
If I recall correctly, the magic Microsoft did was something like calling the sproc passing NULL for all input parameters and assuming the output is always exactly the first recordset that comes back from the database. That is clearly an incomplete solution to the problem, but in simple cases it appears to be magic because it can work very well some of the time.

Hiding sensitive/confidential information in log files

How would you go about keeping sensitive information out of log files? Yes, you can consciously choose not to log sensitive bits of information in the first place, but there are general cases where you blindly log error messages upon failures, or trace messages while investigating a problem, and end up with sensitive information landing in your log files.
For example, you could be trying to insert an order record that contains the credit card number of a customer into the database. Upon a database failure, you may want to log the SQL statement that was just executed. You would then end up with the credit card number of the customer in a log file.
Is there a design paradigm that can be employed to "tag" certain bits of information as sensitive so that a generic logging pipeline can filter them out?
My current practice for the case in question is to log a hash of such sensitive information. This enables us to identify log records that belong to a specific claim (for example a specific credit-card number) but does not give anybody the power to just grab the logs and use the sensitive information for their evil purposes.
Of course, doing this consistently requires good coding practices. I usually choose to log all objects using their toString overloads (in Java or .NET), which serialize the hash of the values of any fields marked with a Sensitive attribute.
Of course, SQL strings are more problematic, but we rely on our ORM for data persistence and log the state of the system at various stages rather than logging SQL queries, so this becomes a non-issue.
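As an illustration of that approach, here is a hedged Java sketch: a hypothetical Sensitive annotation plus a reflective helper that renders annotated fields as a SHA-256 hash, so records stay correlatable in the logs without exposing the raw value. None of these names come from a specific library.

import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Hypothetical marker annotation for fields that must never be logged in clear text.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Sensitive {}

final class LogSafe {

    // Renders an object for logging: plain fields as-is, @Sensitive fields as a SHA-256 hash.
    static String toLogString(Object o) {
        StringBuilder sb = new StringBuilder(o.getClass().getSimpleName()).append('{');
        for (Field f : o.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            try {
                Object value = f.get(o);
                String rendered = f.isAnnotationPresent(Sensitive.class)
                        ? sha256(String.valueOf(value))
                        : String.valueOf(value);
                sb.append(f.getName()).append('=').append(rendered).append(' ');
            } catch (IllegalAccessException e) {
                sb.append(f.getName()).append("=<unreadable> ");
            }
        }
        return sb.append('}').toString();
    }

    private static String sha256(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (Exception e) {
            return "<hash-error>";
        }
    }
}

An Order class could then mark its card-number field with @Sensitive and implement its toString as return LogSafe.toLogString(this);, so every log statement that prints the object goes through the same filter.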
I would personally regard the log files themselves as sensitive information and make sure to restrict access to them.
Logging a credit card number could be a PCI violation. And if you aren't PCI compliant, you will be charged higher card-processing fees. Either don't log sensitive information, or encrypt your entire log files.
Your idea of "tagging" sensitive information is intriguing. You could have a special data type for Sensitive information, that wrapped the real, underlying data type. Whenever this object is rendered as a character string, it just returns "***" or whatever.
However, this could require widespread coding changes, and it requires a level of conscious vigilance similar to that needed to avoid logging sensitive information in the first place.
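A minimal sketch of that wrapper idea, assuming a hypothetical Masked type (a real implementation would also need to think about equals, serialization, and debugger views):

// A wrapper that never exposes its value through toString(), so accidental
// logging prints a mask instead of the underlying data.
final class Masked<T> {
    private final T value;

    private Masked(T value) { this.value = value; }

    static <T> Masked<T> of(T value) { return new Masked<>(value); }

    T unwrap() { return value; }  // explicit, deliberate access only

    @Override
    public String toString() { return "***"; }
}

Masked.of("4111111111111111") would then render as *** anywhere it is logged via toString(), and the raw value is only reachable through the explicit unwrap() call.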
In your example, you should be encrypting the credit card number or, better yet, not even storing it in the first place.
If, say, you were logging something else, like a login, you might want to explicitly replace a password with *****.
However, this manages to neatly avoid answering the question you've posed in the first place. In general, when dealing with sensitive information, it should be encrypted on its way to any form of permanent storage, be it a database file or a log file. Assume that a Bad Guy is going to be able to get their hands on either, and protect the information accordingly.
If you know what you're trying to filter, you can run your log output through a regex cleaning expression before you log it.
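For example, a hedged sketch of such a scrubber in Java; the pattern below is a naive stand-in for "looks like a card number" and would need tuning for real use (and is no substitute for not logging the value at all):

import java.util.regex.Pattern;

// Masks anything resembling a 13-16 digit card number before the message is logged.
final class LogScrubber {
    private static final Pattern CARD_NUMBER =
            Pattern.compile("\\b(?:\\d[ -]?){13,16}\\b");

    static String scrub(String message) {
        return CARD_NUMBER.matcher(message).replaceAll("****-****-****-****");
    }
}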
Regarding SQL statements specifically, if your language supports it, you should be using parameters instead of putting values in the statement itself. In other words:
select * from customers where credit_card = ?
Then set the parameter to the credit card number.
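In Java, for instance, that would look roughly like the following sketch using plain JDBC; the table and column names are taken from the statement above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class CustomerDao {
    // The SQL text never contains the card number, only the placeholder,
    // so logging the statement text does not leak the value.
    ResultSet findByCard(Connection conn, String cardNumber) throws SQLException {
        PreparedStatement ps =
                conn.prepareStatement("select * from customers where credit_card = ?");
        ps.setString(1, cardNumber);
        return ps.executeQuery();
    }
}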
Of course, if you plan to log SQL statements with parameters filled in, you'd need some other way to filter out sensitive data.
Refer to this tool, created exactly for this use case.
If you want to mask only selected fields during logging and keep the other field values as-is, you can try this:
https://github.com/senthilaru/sp-util
<dependency>
    <groupId>com.immibytes</groupId>
    <artifactId>sp-utils</artifactId>
    <version>1.0.0-RELEASE</version>
</dependency>
