T-SQL schemata to organize code - security

I have a MS SQL Server database with a growing number of stored procedures and user-defined functions, and I see some need to organize the code better. My idea was to split SPs and functions over several schemata. The default schema would hold the SPs called from the outside; the API of the database, in other words. A second schema would hold internal code that should not be called from the outside. I would probably do the same for tables: some contain "raw" data, some hold precalculated data for optimizations, ...
As I have never used schemas, I have several questions:
Does this make sense at all?
Are there any implications that I'm not aware of? For example, performance issues when an SP in schema A is using a table in schema X?
Is it possible to restrict the "outer world" to use only SPs in a certain schema? For example: user A is only allowed to call objects in schema A, but SPs in schema A are still allowed to use tables in schema B?
As this question is somewhat subjective, I have marked it as "community wiki". Hope that is ok.

Yes, it makes sense.
No difference in performance if all schemas have the same owner (ownership chaining).
Yes: grant permissions on schemas explicitly per client, or have some check internally.
We use schemas to separate data, internal SPs, internal functions, and then SPs per client.
One advantage is that we GRANT permissions on the schema, not on individual objects, which is what I personally needed to clarify in my own question before we started using them.
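For illustration, a minimal T-SQL sketch of that setup (the schema, table, procedure and user names are just examples, not from the original post):

    CREATE SCHEMA api AUTHORIZATION dbo;
    GO
    CREATE SCHEMA internal AUTHORIZATION dbo;
    GO
    CREATE TABLE internal.RawData (Id int PRIMARY KEY, Payload nvarchar(100));
    GO
    -- An "API" procedure that reads a table in the internal schema.
    CREATE PROCEDURE api.GetPayload @Id int
    AS
        SELECT Payload FROM internal.RawData WHERE Id = @Id;
    GO
    -- The outside caller may execute anything in the api schema ...
    GRANT EXECUTE ON SCHEMA::api TO app_user;
    -- ... but gets no grant on the internal schema. Because both schemas are
    -- owned by dbo, ownership chaining lets api.GetPayload read
    -- internal.RawData without any permission on the table itself.

Adding a new procedure to the api schema then requires no extra GRANT, which is the main maintenance win.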

Azure Cosmos DB Update Pattern

I have recently started using Cosmos DB for a project and I am running into a few design issues. Coming from a SQL background, I understand that related data should be nested within documents in a NoSQL DB. This does mean that documents can become quite large, though.
Since partial updates are not supported, what is the best design pattern to implement when you want to update a single property on a document?
Should I be reading the entire document server side, updating the value and writing the document back immediately in order to perform an update? This seems problematic if the documents are large, which they inevitably would be if all your data is nested.
If I take the approach of making many smaller documents and inferring relationships based on IDs, I think this would solve the read-then-write concern for updates, but it feels like I am going against the concept of NoSQL and in essence I am building a relational DB.
Thanks
Locking and latching. That's what needs to happen if partial updates become possible. It's a difficult engineering problem to keep a <15ms write latency SLA with locking.
This seems problematic if the documents are large, which they inevitably would be if all your data is nested.
Define your fear: burnt Request Units, app host memory, ingress/egress network traffic? You believe this is a problem but you're not stating concrete results. I'm not saying you're wrong or doubting the efficiency of the partial update approach, I'm just saying the argument is thin.
Usually you want to JOIN nothing in NoSQL, so I'm totally with you on the last paragraph.
Whenever you are designing a document, try to consider this:
Does that part of the document need separate access? If yes, create a referenced document; if not, create an embedded document.
And if you want to know what to choose, I think you should take a look at this question. It's for MongoDB but it will help you: Embedded vs Referenced Document
Embed or reference is the most common problem I face while designing document structure in the NoSQL world.
In an embedded relationship, child entities are embedded in the parent document. In a referenced relationship, child entities live in separate documents from their parent, so you end up with two (or more) types of documents.
There is no single relationship pattern that fits all cases. The approach you should take depends on how the data being designed will be retrieved and updated:
1. Do you need to retrieve all the child entities along with the parent entity? If yes, use the embedded pattern.
2. Does your use case allow entities to be retrieved individually? In that case, use the referenced pattern.
In the majority of the use cases I have worked on I used the referenced pattern, for example: Social Graph (profiles with a relationship tree), Proximity Points (GeoJSON-based proximity search), Classified Listings, etc.
The referenced pattern is also easier to update and maintain, as the entities are stored in individual documents.
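As a rough sketch of the two shapes (an order with items; all field names here are made up for illustration):

    # Embedded: child entities live inside the parent document.
    order_embedded = {
        "id": "order-1",
        "customer": "alice",
        "items": [
            {"sku": "A-100", "qty": 2},
            {"sku": "B-200", "qty": 1},
        ],
    }

    # Referenced: child entities are separate documents pointing back to the
    # parent, so they can be read and updated individually.
    order = {"id": "order-1", "customer": "alice"}
    order_items = [
        {"id": "item-1", "orderId": "order-1", "sku": "A-100", "qty": 2},
        {"id": "item-2", "orderId": "order-1", "sku": "B-200", "qty": 1},
    ]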
Partial Updates are now supported by Cosmos DB:
Azure Cosmos DB Partial Document Update feature (also known as Patch API) provides a convenient way to modify a document in a container. Currently, to update a document the client needs to read it, execute Optimistic Concurrency Control checks (if necessary), update the document locally and then send it over the wire as a whole document Replace API call.
Partial document update feature improves this experience significantly. The client can only send the modified properties/fields in a document without doing a full document replace operation.
Read more here: https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update
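For illustration, roughly how the two approaches look with the azure-cosmos Python SDK; the account, database, container and document names are placeholders, and the exact keyword arguments may vary slightly between SDK versions:

    from azure.core import MatchConditions
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("shop").get_container_client("orders")

    # Old pattern: read the whole document, change it locally, replace it,
    # using the ETag for optimistic concurrency control.
    doc = container.read_item(item="order-1", partition_key="customer-42")
    doc["status"] = "shipped"
    container.replace_item(
        item=doc["id"],
        body=doc,
        etag=doc["_etag"],
        match_condition=MatchConditions.IfNotModified,
    )

    # Partial document update (Patch API): send only the changed property.
    container.patch_item(
        item="order-1",
        partition_key="customer-42",
        patch_operations=[{"op": "replace", "path": "/status", "value": "shipped"}],
    )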

Multiple Data Transfer Objects for same domain model

How do you solve a situation where you have multiple representations of the same object, depending on the view?
For example, let's say you have a book store. Within a book store, you have two main representations of Books:
In lists (search results, browse by category, author, etc.): this is a compact representation that might have some aggregates, for example NumberOfAuthors and NumberOfReviews. Each Author and Review is an entity itself, saved in the DB.
DetailsView: here you wouldn't have aggregates but the real values for each Author, as Book has a property AuthorsList.
Case 2 is clear: you get everything from the DB and show it. But how do you solve case 1 if you want to reduce the number of connections and the payload to/from the DB? That is, if you don't want to get all the actual Authors and Reviews from the DB, but just two ints holding the count of each.
A fully normalized solution works for case 2, but case 1 seems to require either some denormalization or creating two different entities, BookDetails and BookCompact, within the business layer.
Important: I am not talking about view DTOs, but about actually getting data from the DB that doesn't fit into the business-layer Book class.
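Purely for illustration, a minimal sketch (Python here, but the language doesn't matter) of the two shapes being described; the field names are assumptions:

    from dataclasses import dataclass

    @dataclass
    class Author:
        id: int
        name: str

    @dataclass
    class Review:
        id: int
        rating: int

    @dataclass
    class BookCompact:            # for lists: aggregates only
        id: int
        title: str
        number_of_authors: int
        number_of_reviews: int

    @dataclass
    class BookDetails:            # for the details view: the real related entities
        id: int
        title: str
        authors: list[Author]
        reviews: list[Review]

BookCompact can then be filled from a single query that counts the related rows (e.g. a GROUP BY with COUNTs) instead of loading the Authors and Reviews themselves.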
To me this sounds like multiple query models (QMs).
I used DDD with a CQRS/ES style, so aggregate roots produce events based on the commands passed in. Multiple QMs are subscribed to those events, so I create multiple "views" based on requirements.
Event sourcing (ES) is very powerful here: I can introduce additional QMs later by replaying the stored events.
It sounds like managing a lot of similar, or even duplicate, data, but it makes sense to me.
QMs can be, and are, optimized to contain just enough data/structure/indexes for a given purpose. This is the way out of the "shared data model". I see huge evil in the one-RDBMS-model-for-everything approach: you will always get lost in the complexity of managing a shared model, like you are now.
I had a very good result with the following design:
a domain package containing the @Entity classes, which hold all the data that is stored in the database
a dto package containing the view(s) of an entity that will be returned from the service
A DTO should have a constructor that takes the entity as a parameter. To copy the data more easily you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you share only a minimal amount of information, and it is returned in an object which does not have any functionality.

Could someone please explain this quote, preferably in beginner's language?

On SQLAlchemy's documentation page the author starts with a philosophy:
SQL databases behave less like object collections the more size and performance start to matter; object collections behave less like tables and rows the more abstraction starts to matter.
I'm scratching my head trying to understand the idea behind these two sentences, but I've failed. Could someone give an example to illustrate the idea here? Thanks.
When you are creating an application using an Object Oriented language and a SQL database, you are simultaneously working with two very different conceptual models for storing information:
The relational model says how to store data in tables and rows and how to link elements through keys and joins.
The object model establishes a way to store entities with attributes in memory (usually) and how to set links between them using pointers or references.
So, let's say that you have a User entity that is linked to addresses and to other users in your application. Those entities will need to be stored in the form of several tables in the database (a users table, an addresses table and a many-to-many table for associating users with users, for instance). At the same time, if your code uses object-oriented constructs, users and addresses will exist in memory in the form of objects with references between them, pointing to objects of the same or a different kind.
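For example, with SQLAlchemy the mapping might look roughly like this (a minimal sketch; table and class names are illustrative):

    from sqlalchemy import Column, ForeignKey, Integer, String, Table
    from sqlalchemy.orm import declarative_base, relationship

    Base = declarative_base()

    # Association table backing the many-to-many "user knows user" link.
    friendships = Table(
        "friendships", Base.metadata,
        Column("user_id", ForeignKey("users.id"), primary_key=True),
        Column("friend_id", ForeignKey("users.id"), primary_key=True),
    )

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(100))
        addresses = relationship("Address", back_populates="user")
        friends = relationship(
            "User",
            secondary=friendships,
            primaryjoin=id == friendships.c.user_id,
            secondaryjoin=id == friendships.c.friend_id,
        )

    class Address(Base):
        __tablename__ = "addresses"
        id = Column(Integer, primary_key=True)
        user_id = Column(Integer, ForeignKey("users.id"))
        city = Column(String(100))
        user = relationship("User", back_populates="addresses")

In memory you work with User and Address objects holding references to each other; on disk the same information lives in three tables, and the ORM translates between the two.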
The thing is, moving information between those two different worlds is much much more difficult than it looks at first:
You might associate one object with one row in a table, but that is not always possible, and sometimes a single object must be associated with multiple rows in different tables.
Inheritance and polymorphic behavior are particularly difficult to map to a relational model.
Traversing objects and querying the database are vastly different actions.
Performance factors to take into account in an object model and a relational model are completely different.
And those are just a few examples. ORMs such as SQLAlchemy are essentially translators that convert information from one world into the other and back.
What I think Mike Bayer was trying to convey is this: the more you adapt your entity information to the object model (lots of inheritance, polymorphism, traversal of objects, ...), the less it will resemble the natural structure of a relational model and the more performance concessions you will be making. And the other way around: the more you design your tables to perform well and be optimized for your queries, the less they will adapt to a natural structure of objects.
Martin Fowler has a nice write-up about the need for this translation in his article ORM Hate.
Edit: further clarification on the abstraction vs performance issue
In the end, I think the bottom line of that SQLAlchemy quote is this: many ORMs hide the relational side of the object-relational translation to make things easier. With them you only have to worry about the object-oriented side, and the library takes on the burden of dealing with the database. You get persistence for your objects without having to deal with SQL. However, they incur a performance penalty in doing so, because the details of working with the database are abstracted away and you have no control over them, and those details are essential when you have to optimize performance. SQLAlchemy takes the opposite approach. It hides nothing of the relational side; you are in control of how the SQL is generated and of when to use (and not use) joins, subqueries and other SQL constructs. That makes it a much more complex library to learn, but at the same time you are in control of the whole object-relational translation process.

DDD: How to handle large collections

I'm currently designing a backend for a social-networking-related application in REST. I'm very intrigued by the DDD principles. Now let's assume I have a User object who has a collection of Friends. There can be thousands of them if the app and the user become very successful. Every Friend has some properties as well; it is basically a User.
Looking at the DDD Cargo application example, the fully expanded Cargo object is stored in and retrieved from the CargoRepository from time to time. Wow: if there is a list in the aggregate root, over time this would eventually trigger an OOM. This is why pagination and lazy loading exist, if you approach the problem from a data-centric point of view. But how can you cope with these large collections in persistence-unaware DDD?
As @JefClaes mentioned in the comments: you need to determine whether your User AR indeed requires a collection of Friends.
Ownership does not necessarily imply that a collection is necessary.
Take an Order / OrderLine example. An OrderLine has no meaning without being part of an Order. However, the Customer that an Order belongs to does not have a collection of Orders. It may, possibly, have a collection of ActiveOrders if a customer is limited to a maximum number (or amount) of active orders. Keeping a collection of historical orders would be unnecessary.
I suspect the large-collection problem is not limited to DDD. If one were to receive an Order with many thousands of lines there may be design trade-offs, but more likely the order would simply be split into smaller orders.
In your case I would assert that the inclusion / exclusion of a Friend has very little to do with the consistency of the User AR.
Something to keep in mind is that as soon as you start using your domain model for querying, you start running into weird sorts of problems. So always try to think in terms of a read/query model with a simple query interface that can access your data directly without going through your domain model. This may simplify things.
So perhaps a Relationship AR may assist in this regard.
If paging or other optimization techniques are part of your domain, there is nothing wrong with designing domain classes with this ability.
Some solutions I've thought about
If User is the aggregate root, you can give your UserRepository a method GetUserWithFriends(int userId, int firstFriendNo, int lastFriendNo) that encapsulates construction of the specific user object. In the same way you can also populate the user model with some counters, etc.
On the other side, it is possible to implement lazy loading for the User instance's _friends field. Thus, the User instance can itself decide which "part" of the friends list to load.
Finally, you can use the UserRepository to get all friends of a certain user with respect to paging or other filtering conditions. It doesn't violate any DDD principles (see the sketch below).
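A rough sketch of the repository-level paging idea (names such as UserRepository and get_user_with_friends follow the answer above; the underlying users/friends tables are an assumption):

    from dataclasses import dataclass, field

    @dataclass
    class User:
        id: int
        name: str
        friends: list["User"] = field(default_factory=list)  # only the requested page

    class UserRepository:
        def __init__(self, connection):
            self._conn = connection  # e.g. a sqlite3/DB-API connection

        def get_user_with_friends(self, user_id: int, offset: int, limit: int) -> User:
            row = self._conn.execute(
                "SELECT id, name FROM users WHERE id = ?", (user_id,)
            ).fetchone()
            user = User(id=row[0], name=row[1])
            friend_rows = self._conn.execute(
                "SELECT u.id, u.name FROM friends f JOIN users u ON u.id = f.friend_id "
                "WHERE f.user_id = ? ORDER BY u.name LIMIT ? OFFSET ?",
                (user_id, limit, offset),
            ).fetchall()
            user.friends = [User(id=r[0], name=r[1]) for r in friend_rows]
            return user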
DDD is too broad a topic to claim it's not for CRUD. Programming in a DDD way, you should always take technical limitations into account and adapt your domain to satisfy them.
Do not prematurely optimize. If you are worried about heavy load, benchmark your application and perform stress tests.
You need to have a table like so:
friends
id, user_id1, user_id2
to handle the n-m relation. Index your fields there.
Also, you need to decide whether friendship is symmetrical. If so, then you need a single row for two people who are friends. If not, then you might have one row showing that a user considers the other a friend; if the other person considers the first a friend as well, you need another row.
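For the symmetrical case, one possible MySQL-style shape (enforcing user_id1 < user_id2 so each friendship is stored exactly once is just one way to do it):

    CREATE TABLE friends (
        id        INT AUTO_INCREMENT PRIMARY KEY,
        user_id1  INT NOT NULL,
        user_id2  INT NOT NULL,
        CONSTRAINT uq_pair  UNIQUE (user_id1, user_id2),
        CONSTRAINT chk_pair CHECK (user_id1 < user_id2)  -- enforced in MySQL 8.0.16+
    );
    -- user_id1 is covered by the unique index; index the other side too.
    CREATE INDEX idx_friends_user2 ON friends (user_id2);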
Lazy loading can be achieved with hidden (AJAX) requests so users will have the impression that it is faster than it really is. However, I would not worry about such problems for now, as later you can migrate the content of the tables to a new structure which is unknown now due to the infinite possible evolutions of your project.
Your aggregate root can have a collection of different objects that contain only a small subset of the information, as references to the actual business objects. Then, when needed, those items can be used to fetch the entire information from the underlying repository.

Modelling Access Control in MongoDB

Does anyone have an example of modelling access control in MongoDB? The situation I'm thinking of is:
There are a set of resources, each being their own document (e.g. cars, people, trees etc.).
A user can gain access to a resource through an explicit grant, or implicitly by being the owner of a resource, by existing in another collection (e.g. a role), or in some other implicit way.
In a single collection.find() call, which could have skip and limit options applied (for pagination), is there a way to check all these explicit and implicit paths and produce the set of resources a user has access to?
In MySQL we have modelled this using a grants table with resource id, granting user id, authorized user id and operation (read, write etc.). We then, in one query, select all resources where at least one subquery is true, and the subqueries check all the different paths to access, e.g. one checks for a grant, one checks for ownership, etc.
I just can't wrap my head around doing this in MongoDB, I'm not sure if it's even possible...
Thanks
You can't query more than one collection at a time. Ideally, shouldn't access control be part of the business logic? Your backend (PHP/C#/whatever language) ought to ensure that the current request is authorized; if so, simply query the requested document.
If you feel you need to implement the exact same structure in MongoDB, which I suggest you don't, then you will need to embed all those fields (the ones from the other MySQL tables that help you identify whether the request is authorized) in each and every document of every collection. You will be duplicating data (denormalizing it), which brings the headache of ensuring that all the copies are updated and hold the same value.
Edit 1:
Let's talk about a Car document. To track its owner, you will have an owner property (containing the _id of the owner document). To track all users who can 'use' the car (an explicit grant), you will have an array allowedDrivers (containing the _id of each user document). Let's assume the current user making the request belongs to the 'admin' role. The user document will have an array applicableRoles that stores the _id of each applicable role document.
To retrieve all cars that the user has access to, you only need to make two queries: one to fetch his roles. If he is an admin, return ALL cars. If he is not, make another query where owner equals his id or allowedDrivers contains his id.
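A pymongo sketch of those two queries (collection and field names follow the example above; looking the admin role up by name is an assumption):

    from bson import ObjectId
    from pymongo import MongoClient

    db = MongoClient()["app"]

    def cars_visible_to(user_id: ObjectId, skip: int = 0, limit: int = 20):
        # Query 1: the user's roles.
        user = db.users.find_one({"_id": user_id}, {"applicableRoles": 1})
        admin_role = db.roles.find_one({"name": "admin"}, {"_id": 1})
        if admin_role and admin_role["_id"] in user.get("applicableRoles", []):
            query = {}                                      # admins see every car
        else:
            # Query 2: implicit access via ownership OR an explicit grant.
            query = {"$or": [{"owner": user_id},
                             {"allowedDrivers": user_id}]}
        return db.cars.find(query).skip(skip).limit(limit)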
I understand your actual use case may be more complicated, but chances are there is a document-oriented way of solving it. You have to realize that the way data is modelled in documents is vastly different from how you would model it in an RDBMS.
Doing it in business logic would be painfully slow and inefficient.
How so? This is business logic: if user A owns post B then let them do the action (MVC style), otherwise don't.
That sounds like business logic to me, and most frameworks place this kind of logic within the controller action (of the MVC paradigm); e.g. in PHP with Yii:
Yii::app()->roles->hasAccess('some_view_action_for_a_post', $post)
I think that by doing it at the database end you have confused your storage layer with your business layer.
Also, given how complex some role-based permission checks can get, the queries you run must be pretty big, with many subselects. Considering how MySQL creates and handles result sets (subselects ARE NOT joins), I have a feeling these queries do not scale particularly well.
Also, you have to consider that when you want to change the roles, or a function that defines a role, that can access a certain object, you will have to change your SQL queries directly (i.e. make code changes) instead of just adding the role to a roles table, assigning the object properties for that role and assigning users that role.
So I would seriously look into how frameworks in other languages (and your own) do their RBAC, because I think you have blurred the line and made your life quite hard with what you have done. In fact, here might be a good place to start: Group/rule-based authorization approach in node.js and express.js
