I am implementing a RESTful service that has a security model requiring authorization at three levels:
Resource-level authorization - determining whether the user has access to the resource (entity) at all. For example, does the current user have permission to view customers in general? If not, then everything stops there.
Instance-level authorization - determining whether the user has access to a specific instance of a resource (entity). Due to various rules and the state of the entity, the current user may not be granted access to one or more customers within the set of customers. E.g., a customer may be able to view their own information but not the information of another customer.
Property-level authorization - determining which properties the user has access to on an instance of the resource (entity). We have many business rules that determine whether a user may see and/or change individual properties of a resource (entity). For example, the current user may be able to see the customer's name but not their address or phone number, while being able to both see and add notes.
Implementing resource-level authorization is straightforward; however, the other two are not. I believe the solution for instance-level authorization will reveal itself along with a solution to the (harder, IMO) property-level authorization question. The latter issue is complicated by the fact that I need to communicate the authorization decisions, by property, in the response message (à la hypermedia) - in other words, this isn't something I can simply enforce in property setters.
With each request to the service, I must use the current user's information to perform these authorization checks. In the case of a GET request for either a list of resources or an individual resource, I need to tell the API layer which attributes the current user can see (is visible) and whether the attribute is read-only or editable. The API layer will then use this information to create the appropriate response message. For instance, any property that is not visible will not be included in the message. Read-only properties will be flagged so the client application can render the property in the appropriate state for the user.
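To make that response shape concrete, here is a minimal sketch (invented names throughout: `build_response`, `_fields`, and the `hidden`/`read`/`edit` permission values are illustrative, not part of the question) of an API layer that omits invisible properties and flags read-only ones:

```python
# Hypothetical sketch: the API layer builds a response carrying
# per-property authorization metadata. Permission values are assumed
# to be one of "hidden", "read", or "edit".

def build_response(resource: dict, permissions: dict) -> dict:
    """Omit invisible properties entirely; flag read-only ones."""
    body, meta = {}, {}
    for prop, value in resource.items():
        perm = permissions.get(prop, "hidden")
        if perm == "hidden":
            continue                      # not included in the message at all
        body[prop] = value
        meta[prop] = {"readOnly": perm == "read"}
    return {"data": body, "_fields": meta}

customer = {"name": "Ada", "address": "10 Downing St", "notes": ""}
perms = {"name": "read", "notes": "edit"}  # address stays hidden
print(build_response(customer, perms))
```

The client can then render `read` properties as disabled fields and skip rendering anything absent from the message.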
Solutions like application services, aspects, etc. work great for resource-level authorization and can even be used for instance-level checks, but I am stuck determining how best to model my domain so the business rules and checks enforcing the security constraints are included.
NOTE: Keep in mind that this goes way beyond role-based security in that I am getting the final authorization result based on business rules using the current state of the resource and environment (along with verifying access using the permissions granted to the current user via their roles).
How should I model the domain so I have enforcement of all three types of authorization checks (in a testable way, with DI, etc)?
Initial Assumptions
I suppose the following architecture:
             Stateless Scaling              Sharding
                 .                             .
                 .                             .
                 +=========================================+
                 |              Service Layer              |
+----------+     | +-----------+       +---------------+   |                    +------------+
|          | HTTP| |           |       |               |   | Driver, Wire, etc. |            |
|  Client  |<====> |  RESTful  | <===> |  Data Access  | <====================> |  Database  |
|          | JSON| |  Service  |  DTO  |     Layer     |   |   ORM, Raw, etc.   |            |
+----------+     | +-----------+       +---------------+   |                    +------------+
                 +=========================================+
                 .                             .
                 .                             .
Initially, let us suppose the Client authenticates with the Service Layer and obtains a token that encodes the authentication and authorization information.
My first thought is to process all requests fully and only then filter the results based on authorization. This keeps the whole thing much simpler and easier to maintain. Of course, some requests may require expensive processing, in which case this approach is not effective at all. On the other hand, the heavy-load requests will most probably involve resource-level access, which, as you have stated, is easy to organise, and which can be detected and authorised in the Service Layer at the API level, or at least at the Data Access level.
Further Thoughts
As for instance- and property-level authorization, I would not even try to put it into the Data Access Layer; I would isolate it entirely in front of the API level, i.e. from the Data Access Layer down, no layer would even be aware of it. Even if you request a list of 1M objects and want to emit only one or two properties of each object for that particular client, it is preferable to fetch the whole objects and only then hide the properties.
Another assumption is that your model is a plain DTO, i.e. simply a data container, and all the business logic is implemented in the Service Layer, particularly at the API level. Say you pass the data across HTTP encoded as JSON; then somewhere in front of the API layer you are going to have a small serialization stage to transform your model into JSON. This stage, I think, is the ideal place for instance and property authorization.
Suggestion
When it comes to property-level authorization, I think there is no reasonable way to isolate the model from security logic. Be it rule-based, role-based or whatever-based authorisation, the check is going to be performed against a piece of data from the authentication/authorisation token provided by the Client. So, at the serialisation stage, you will receive essentially two parameters, the token and the model, and accordingly serialise the appropriate properties, or the instance as a whole.
When it comes to defining the rules, roles and whatevers per property of the model, it can be done in various ways depending on the available paradigms, i.e. on the language the Service Layer is implemented in. The definitions can heavily utilise Annotations (Java) or Decorators (Python). For emitting specific properties, Python comes in handy with its dynamic typing and hacky features, e.g. Descriptors. In the case of Java, you might end up encapsulating properties in a template class, say AuthField<T>.
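As a rough illustration of the Python route, a decorator can attach a rule to each property, and the serialiser can evaluate those rules against the token. The names here (`auth_field`, `_auth_rule`, the token's `roles` key) are invented for this sketch:

```python
# Sketch of decorator-driven property authorization evaluated at the
# serialization stage. Rules are callables (token) -> bool.

def auth_field(rule):
    """Decorator attaching an authorization rule to a model property."""
    def wrap(fn):
        fn._auth_rule = rule
        return fn
    return wrap

class Customer:
    def __init__(self, name, address):
        self._name, self._address = name, address

    @auth_field(lambda token: True)                     # everyone sees the name
    def name(self):
        return self._name

    @auth_field(lambda token: "admin" in token["roles"])  # admins only
    def address(self):
        return self._address

def serialize(model, token):
    """Emit only the properties the token is authorized to see."""
    out = {}
    for attr in dir(model):
        member = getattr(model, attr)
        rule = getattr(member, "_auth_rule", None)
        if rule is not None and rule(token):
            out[attr] = member()
    return out

print(serialize(Customer("Ada", "10 Downing St"), {"roles": ["user"]}))
# → {'name': 'Ada'}
```

The same mechanism extends naturally to a read-only flag by attaching a second rule per property.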
Summary
Summing up, I would suggest putting instance and property authorization in front of the API Layer, at the serialisation stage. Basically, the roles/rules will be declared on the model and the authorisation will be performed in the serialiser, which is given the model and the token.
Since a comment was added recently, I thought I'd update this post with my learnings since I originally asked the question...
Simply put, my original logic was flawed and I was trying to do too much in my business entities. Following a CQRS approach helped make the solution clearer.
For state changes, the "write model" uses the business entity and a "command handler"/"domain service" performs authorization checks to make sure the user has the necessary permissions to update the requested object and, where applicable, change specific properties of that object. I still debate whether the property-level checks belong inside the business entity methods or outside (in the handler/service).
In the case of the "read model", a "query handler"/"domain service" checks the resource- and instance-level authorization rules so only objects to which the user has access are returned. The handler/service uses a mapper object that applies the property-level authorization rules to the data when constructing the DTO(s) to return. In other words, the data access layer returns the "projection" (or view) regardless of authorization rules and the mapper ignores any properties to which the current user does not have access.
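A minimal sketch of that read-model flow, with hypothetical `can_view` and `visible_fields` rules standing in for the real authorization logic:

```python
# Illustrative sketch: the query handler applies instance-level checks,
# then a mapper strips unauthorized properties while building the DTOs.
# The data access layer returns projections regardless of authorization.

def query_customers(user, rows, can_view, visible_fields):
    """rows: projections from the data access layer; filtering happens
    here in the handler, not in the DAL."""
    dtos = []
    for row in rows:
        if not can_view(user, row):              # instance-level check
            continue
        allowed = visible_fields(user, row)      # property-level rules
        dtos.append({k: v for k, v in row.items() if k in allowed})
    return dtos

rows = [{"id": 1, "name": "Ada", "ssn": "x"}, {"id": 2, "name": "Bob", "ssn": "y"}]
user = {"id": 1}
dtos = query_customers(
    user, rows,
    can_view=lambda u, r: r["id"] == u["id"],    # own record only
    visible_fields=lambda u, r: {"id", "name"},  # never expose ssn
)
print(dtos)  # → [{'id': 1, 'name': 'Ada'}]
```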
I dabbled with using dynamic query generation based on the list of fields to which the user has access but found that to be overkill in our case. But, in a more complex solution with larger, more complex entities, I can see that being more viable.
Related
It is a multi-tenant serverless system.
The system has groups with permissions.
Users derive permissions based on the groups they are in.
If it makes a difference, we are using Cognito for authentication and it is a stateless application.
For example:
GET endpoint for sites (so sites that the logged-in user has access to based on the groups they are in)
GET endpoint for devices (so devices that the logged-in user has access to based on the groups they are in)
In REST APIs: "The idea is that the data returned by an endpoint should depend solely on the parameters passed, meaning two different users should receive the same result for the identical request."
What should the REST URI look like to satisfy the idea stated above? Since the deciding factor for the list here is "groups", and thus effective permissions, I was thinking we could pass the groups a user is in via the URI, in sorted order, to leverage caching on GET endpoints as well. Is there a better way to do it?
In REST APIs: "The idea is that the data returned by an endpoint should depend solely on the parameters passed, meaning two different users should receive the same result for the identical request."
No, this is not strictly true. It can be a desirable property, but it is absolutely not required. In fact, if you build a proper hypermedia REST API, you will likely want to hide links/actions that the current user is not allowed to use.
Furthermore, a cache will never store a response and serve it to different users if an Authorization header is present on the request.
Anyway, there could be other reasons to want this; maybe it's a simpler design for your case, and there is a pretty reasonable solution.
What I'm inferring from your question is that you might have two endpoints:
/sites
/devices
They return different things depending on who's accessing them. Instead of using those kinds of routes, you could just do:
/user/1234/sites
/user/1234/devices
Now every user has their own separate 'sites' and 'devices' collections. The additional benefit is that if you ever want to let a user find the list of sites or devices of another user, the API is ready to support that.
The idea is that the data returned by an endpoint should depend solely on the parameters passed
This is called the statelessness constraint, but if you check, the parameters always include auth parameters because of it. The idea is to keep session data on the client side, because managing sessions becomes a problem when you have several million users and multiple servers all around the world. Since the parameters include auth data, the response can depend on that data, so you can use the exact same endpoints for users with different permissions.
As for the responses, you might want to send back hyperlinks representing the available operations. The concept is the same here: if the user does not have permission for an operation, they won't get a hyperlink for it, and in theory they should never get a 403 status either, because you must follow the hyperlinks you got from the service instead of hardcoding URI templates into your client. That way you have fewer errors and junk requests to handle, and you can also change your URI templates without breaking the clients. This is called hypermedia as the engine of application state; it is part of the uniform interface constraint.
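For illustration, a representation builder might emit a link only when the corresponding permission is present. The link relations (`edit`, `delete`) and permission names (`site:edit`, `site:delete`) below are made up for this sketch:

```python
# Sketch: hypermedia links are included only for operations the current
# user may perform, so clients never need to hardcode URI templates.

def site_representation(site: dict, permissions: set) -> dict:
    links = {"self": f"/sites/{site['id']}"}
    if "site:edit" in permissions:
        links["edit"] = f"/sites/{site['id']}"
    if "site:delete" in permissions:
        links["delete"] = f"/sites/{site['id']}"
    return {**site, "_links": links}

# A user who may edit but not delete gets only "self" and "edit" links:
print(site_representation({"id": 7, "name": "HQ"}, {"site:edit"}))
```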
Our application sends/receives a lot of data to/from a third party we work with.
Our domain model is mainly populated with that data.
The 'problem' we're having is identifying a 'good' candidate as domain identity for the aggregate.
It seems like we have 3 options:
Generate a domain identity (UUID or DB-sequence...);
Use the External-ID, which comes along with all data from the external source, as the domain identity;
Use an internal domain identity AND the External-ID as a separate id that 'might' be used for retrieval operations; the internal id is always leading.
About the External-ID:
It is 100% guaranteed the ID will never change
The ID is always managed by the external source
Other domains in our system might use the external-id for retrieval operations
Especially the last point above convinced us that the external-id is not an infrastructural concern but really belongs to the domain.
Which option should we choose?
** UPDATE **
Maybe I was not clear about the term '3rd party'.
Actually, the external source is our client, who is active in the car industry. Our application uses the client's master data to complete several 'things'. We have several Bounded Contexts (BC) like 'Client management', 'Survey', 'Appointment', 'Maintenance', etc.
Our client sends us 'Tasks' that describe something that needs to be done.
That 'something' might be:
'let client X complete survey Y'
'schedule/cancel appointment for client X'
'car X for client Y is scheduled for maintenance at position XYZ'
Those 'Tasks' always have a 'task-id' that is guaranteed to be unique.
We store all incoming 'Tasks' in our database (active record style). Every possible action on a task maps with a domain event. (Multiple BCs might be interested in the same task)
Every BC contains one or more aggregates which distribute some domain events to other BCs. For instance, when an appointment is canceled a domain event is triggered, maintenance listens to that event to get some things done.
However, our client expects some message after every action that is related to a Task. Therefore we always need to use the 'task-id'.
To summarize things:
Tasks have a task-id
Tasks might be related to multiple BCs
Every BC sends some 'result message' to the client with the related task-id
Task-ids are distributed by domain events
We keep every (internally) persisted task up-to-date
Hopefully, I was clear enough about the use of the external-id (= task-id) and our different BCs.
My gut feeling would be to manage your own identity and not rely on a third-party service for this, so option 3 above. It is difficult to say without context, though. What is the 3rd party system? What is your domain?
Would you ever switch the 3rd party service?
You say other parts of your domain might use the external id for querying - what are they querying? Your internal systems or the 3rd party service?
[Update]
Based on the new information it sounds like a correlationId. I'd store it alongside the other information relevant to the aggregates.
As a general rule, I would veto using a DB-sequence number as an identifier; the domain model should be independent of the choice of persistence. The domain model writes the identifier to the database, rather than the other way around (if the DB wants to track a sequence number for its own purposes, that's fine).
I'm reluctant to use the external identifier, although it can make sense in some circumstances. A given entity, like "Customer" might have representations in a number of different bounded contexts - it might make sense to use the same identifier for all of them.
My default: I would reach for a name based uuid, using the external ID as part of the seed, which gives a simple mapping from external to internal.
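In Python, for example, that mapping can be done with a name-based (version 5) UUID; the namespace value below is an arbitrary placeholder, not a recommendation:

```python
import uuid

# Sketch of the name-based UUID mapping from external id to internal id.
# The namespace is an arbitrary example; pick one fixed value per system.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")

def internal_id(external_id: str) -> uuid.UUID:
    """Deterministically derive the internal id from the external id."""
    return uuid.uuid5(APP_NAMESPACE, external_id)

# The same external id always maps to the same internal id:
assert internal_id("task-42") == internal_id("task-42")
print(internal_id("task-42"))
```

Because the derivation is deterministic, other bounded contexts holding only the external id can recompute the internal id without a lookup table.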
In DDD, the Application layer is supposed to just perform coordination tasks, whereas the Domain layer is responsible for validating the business rules.
My question is about validating the domain object properties. For example, I need to validate that a required property has some value in it before persisting it to the database through repositories.
In terms of DDD, is it acceptable to perform this sort of property validation in the Application layer?
Kinds of validation
In the situation you describe, there are two different validation steps that you need to consider separately:
Input validation. This is the responsibility of an app service. The goal is to ensure that no garbage or harmful data enters the system.
Protecting model invariants. This is your domain logic. Whenever something in the domain changes, you need to make sure that the changes are valid within your domain, i.e. all invariants still hold.
Validating domain invariants as part of an app service
Note that sometimes you also want to validate domain invariants in an app service. This could be necessary if you need to communicate invariant violations back to the client. Doing this in the domain would make your domain logic client-specific, which is not what you want.
In this situation, you need to take care that the domain logic does not leak into the app service. One way to overcome this problem and at the same time make a business rule accessible to both the domain and the app service is the Specification Pattern.
Here is an answer of mine to another question that shows an example implementation for the specification pattern.
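For illustration only (this is not the linked answer), a bare-bones sketch of the pattern: the rule lives in one class that both the domain and the app service call. All names (`CustomerIsActive`, `rename_customer`, `rename_handler`) are invented:

```python
# Specification pattern sketch: one rule object shared by the domain
# and the application service, so the logic is never duplicated.

class CustomerIsActive:
    """Business rule as an object; both layers depend on this class."""
    def is_satisfied_by(self, customer: dict) -> bool:
        return customer.get("status") == "active"

spec = CustomerIsActive()

# Domain: protect the invariant unconditionally.
def rename_customer(customer: dict, new_name: str) -> None:
    if not spec.is_satisfied_by(customer):
        raise ValueError("inactive customers cannot be renamed")
    customer["name"] = new_name

# App service: pre-check so the violation can be reported to the client
# without relying on a domain exception for control flow.
def rename_handler(customer: dict, new_name: str) -> dict:
    if not spec.is_satisfied_by(customer):
        return {"error": "customer is not active"}
    rename_customer(customer, new_name)
    return {"ok": True}

print(rename_handler({"status": "inactive", "name": "A"}, "X"))
```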
You can validate incoming data in your UI layer. For example, you can use Symfony Forms validation, or just check for the necessary data inside your REST layer.
As for the Domain layer, it depends; you didn't specify what kind of domain object it is.
Mostly you do this kind of validation by creating a Value Object with the creation logic inside. For example, an Email Value Object: you can't create an invalid one, because otherwise it will throw an exception.
Aggregates can perform validation before executing a method; such checks are called invariants. For example, a user has a method becomeVIP, and inside the method there is a constraint that only a user with the name 'Andrew' can become a VIP.
So you don't validate after the action but before it. You don't let your aggregate get into a wrong state.
If you have logic that is not correlated with a single aggregate, you put it in a domain service, for example an email uniqueness check.
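A minimal sketch of such an Email Value Object in Python (the regex is deliberately simplistic and only for illustration):

```python
import re

class Email:
    """Value Object: construction either yields a valid Email or raises,
    so an invalid Email can never exist in the domain."""
    _PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplistic check

    def __init__(self, value: str):
        if not self._PATTERN.match(value):
            raise ValueError(f"invalid email: {value!r}")
        self.value = value

    def __eq__(self, other):          # value objects compare by value
        return isinstance(other, Email) and other.value == self.value

print(Email("a@b.com").value)
```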
Rather than "validating that a required property has some value in it" at the periphery of the Domain, I prefer to make sure that it can never become null in the Domain in the first place.
You can do that by forcing consumers of the constructors, factories and methods of that entity to always pass a value for the property.
That being said, you can also enforce it at the Application level and in the Presentation layer (most web application frameworks provide convenient ways of checking it these days). Better two or three verifications than one, but the domain should be the primary source of consistency.
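For example, a sketch of a constructor that refuses to produce an entity with a missing name (the `Customer` entity here is hypothetical):

```python
class Customer:
    """Entity whose constructor makes a missing name impossible,
    instead of leaving the check to validators at the periphery."""
    def __init__(self, name: str):
        if not name or not name.strip():
            raise ValueError("name is required")
        self.name = name.strip()

print(Customer("Alice").name)
```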
I am building an application that will expose part of its features through RESTful services, and my application's packages are organized as below:
Application --> This package contains the RESTful services
Model --> Contains the domain model: the aggregates, Value Objects, ...
Infrastructure --> Contains the set of classes required to access the database
Mongo DB --> My DB
The application package exposes the endpoint
CastReview(UUID reviewedEntityId, string review)
The review is retrieved from the body of the request, and it is mandatory.
Now my question is where the validation should occur:
Should I keep the validation logic inside the aggregate, and in the application package just construct an instance of the aggregate and check whether it is valid?
Or should I have the validation inside the application package as well as inside the aggregate?
For Aggregates, I wouldn't call it validation but invariant enforcement, since they are supposed to always be valid. You don't just modify an aggregate and then have it checked by an external validator; aggregates enforce their own invariants.
Some rules are clearly domain invariants, since you need deep knowledge of the aggregate's data to enforce them, and some are definitely applicative rules (e.g. email confirmation == email). But sometimes the lines are blurred. I would definitely check at the client side and at the applicative level that the review is not null or empty, and at the same time I wouldn't consider a Review aggregate OK if it had a null review, so I would do both. But this might be domain-dependent, and YMMV.
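A sketch of doing both, using invented names (`Review`, `cast_review`): the application service rejects empty input early, and the aggregate enforces the same invariant so it can never hold an empty review:

```python
class Review:
    """Aggregate (simplified): enforces its own invariant."""
    def __init__(self, entity_id: str, text: str):
        if not text:                      # invariant, not "validation"
            raise ValueError("review text is required")
        self.entity_id, self.text = entity_id, text

def cast_review(entity_id: str, text: str) -> dict:
    """Application service: applicative input check before touching the domain."""
    if not text or not text.strip():
        return {"status": 400, "error": "review must not be empty"}
    review = Review(entity_id, text.strip())
    return {"status": 201, "id": review.entity_id}

print(cast_review("e1", "  great  "))  # → {'status': 201, 'id': 'e1'}
```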
Integrity constraints (or "invariants", if you prefer that term) should be defined in the (domain/design/data) Model. Then they should be checked multiple times:
In the front-end User Interface (on input/change and on submit) for getting responsive validation.
In the back-end Application or Infrastructure before save.
And in the DBMS (before commit), if your DB is shared with other applications.
See also my article Integrity Constraints and Data Validation.
Consider a typical Breeze controller that limits the results of a query to entities that the logged in user has access to. When the browser calls SaveChanges, does Breeze verify on the server that the entities reported as modified are from the original set?
To put it another way, does the EFContextProvider (in the case of Entity Framework) keep track of entities that have been handed out, so it can check against malicious data passed to SaveChanges? Or does BeforeSaveEntity need to validate that the user has access to the changed entities?
You must guard against malicious data in your BeforeSaveEntity or BeforeSaveEntities methods.
The idea that the EFContextProvider would keep track of entities that have already been handed out is probably something that we would NOT want to do, because:
The EFContextProvider would no longer be stateless, which was a design goal to facilitate scaling.
You would still need to guard against malicious data for "Added" entities in the BeforeXXX methods.
It is actually a valid use case for some of our users to "modify" entities without having first queried them.