Using federations to partition for multiple tenants - azure

I have gleaned the following "facts" from reading around this:
Federations are separate databases from the moment they are created.
As copies of the original, they will not alter automatically if I alter the original's schema.
As separate databases you cannot cross join.
Each federation is priced as a separate db.
I will have to provide a TenantId field to each table I want to federate.
If these are correct, what are the advantages of using federations to achieve multi-tenancy over simply using separate dbs? Or if they're not correct, please put me straight.
Note, we have a small number of tenants, maybe 20.

Your understanding is correct.
There are a few interesting aspects of Federations that you may find useful. First, Federations give you a relatively flexible partitioning environment. For example, you can group 10 tenants into the first member and 50 into the second, based on the usage patterns of your customers. Or you could simply isolate a single customer that is using the system more than the others.
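To make the grouping idea concrete, here is a minimal sketch of range-based routing from a TenantId to a federation member. It is not the Federations API; the member names, the boundaries, and both helper functions are hypothetical, and it only models the idea that each member owns a contiguous range of tenant ids that can later be split.

```python
from bisect import bisect_right

# Hypothetical layout: each member owns a contiguous TenantId range that
# starts at the listed low boundary (inclusive).
MEMBER_LOW_BOUNDS = [0, 10, 60]   # 0-9 (10 tenants), 10-59 (50 tenants), 60+
MEMBER_NAMES = ["member_a", "member_b", "member_c"]


def member_for_tenant(tenant_id, low_bounds=MEMBER_LOW_BOUNDS, names=MEMBER_NAMES):
    """Return the name of the member whose range contains tenant_id."""
    if tenant_id < low_bounds[0]:
        raise ValueError(f"tenant_id {tenant_id} is below the first range")
    index = bisect_right(low_bounds, tenant_id) - 1
    return names[index]


def split_member(low_bounds, names, split_at, new_name):
    """Return new (low_bounds, names) with an extra range starting at split_at."""
    if split_at in low_bounds:
        raise ValueError("a boundary already exists at this value")
    position = bisect_right(low_bounds, split_at)
    return (
        low_bounds[:position] + [split_at] + low_bounds[position:],
        names[:position] + [new_name] + names[position:],
    )
```

Isolating one busy tenant, say tenant 25, would then be two splits (at 25 and at 26), which leaves that tenant alone in its own range.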
Another important concept is that you can have multiple federations per database. So you could have a Customer federation and a SalesHistory federation for example.
Last but not least you may want to read this article that discusses connection pool fragmentation that occurs in traditional sharding models, but is not an issue with SQL Database Federations.

Related

What is the recommended approach towards multi-tenant databases in Cassandra?

I'm thinking of creating a multi-tenant app using Apache Cassandra.
I can think of three strategies:
All tenants in the same keyspace using tenant-specific fields for security
Table per tenant in a single shared DB
Keyspace per tenant
The voice in my head is suggesting that I go with option 3.
Thoughts and implications, anyone?
There are several considerations that you need to take into account:
Option 1: In pure Cassandra, this option works only if access to the database always goes through a "proxy" (an API, for example) that enforces filtering on the tenant field. Otherwise, if you provide CQL access, everybody can read all data. In this case, you also need to design the data model carefully so that the tenant is part of a composite partition key. DataStax Enterprise (DSE) has additional functionality called row-level access control (RLAC) that allows setting permissions on rows within a table.
Options 2 & 3: These are quite similar, except that with a keyspace per tenant you have the flexibility to set up a different replication strategy per tenant; this could be useful for storing customers' data in different data centers bound to different geographic regions. But in both cases there are limitations on the number of tables in the cluster: a reasonable number of tables is around 200, with a "hard stop" at more than 500. The reason is that you need additional resources, such as memory, to keep auxiliary data structures (bloom filters, etc.) for every table, and this consumes both heap and off-heap memory.
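As a rough illustration of option 1, the sketch below puts the tenant into the partition key and lets callers reach the data only through a proxy bound to their tenant. The table definition, the class names, and the in-memory dictionary standing in for Cassandra are all assumptions made for the example; no driver calls are shown.

```python
# Hypothetical CQL for a shared table; tenant_id is part of the composite
# partition key, so every read has to name a tenant.
ORDERS_TABLE_CQL = """
CREATE TABLE orders (
    tenant_id text,
    order_id  text,
    line_no   int,
    sku       text,
    PRIMARY KEY ((tenant_id, order_id), line_no)
)
"""


class TenantScopedStore:
    """In-memory stand-in for a table keyed by (tenant_id, order_id)."""

    def __init__(self):
        self._partitions = {}

    def write(self, tenant_id, order_id, row):
        self._partitions[(tenant_id, order_id)] = dict(row)

    def read(self, tenant_id, order_id):
        return self._partitions.get((tenant_id, order_id))


class TenantApi:
    """Proxy that binds every call to the authenticated caller's tenant."""

    def __init__(self, store, authenticated_tenant_id):
        self._store = store
        self._tenant_id = authenticated_tenant_id

    def put_order(self, order_id, row):
        self._store.write(self._tenant_id, order_id, row)

    def get_order(self, order_id):
        # The caller never supplies a tenant id, so it cannot ask for
        # another tenant's partition.
        return self._store.read(self._tenant_id, order_id)
```

The point of the design is that the tenant id comes from authentication, not from the request, which is what "enforce filtering on the tenant field" amounts to in practice.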
I've done this for a few years now at large scale in the retail space, and my belief is that the recommended way to handle multi-tenancy in Cassandra is not to. No matter how you do it, the tenants will be hit by the "noisy neighbor" problem. Just wait until one tenant runs a BATCH update with 60k writes batched to the same table, and everyone else's performance falls off.
But the bigger problem is that there's no way you can guarantee that each tenant will even have a similar ratio of reads to writes. In fact, they will likely be quite different. That's going to be a problem for options #1 and #2, as disk IOPS will be going to the same directory.
Option #3 is really the only way it realistically works. But again, all it takes is one ill-considered BATCH write to crush everyone. Also, want to upgrade your cluster? Now you have to coordinate it with multiple teams, instead of just one. Using SSL? Make sure multiple teams get the right certificate, instead of just one.
When we have new teams use Cassandra, each team gets their own cluster. That way, they can't hurt anyone else, and we can support them with fewer question marks about who is doing what.

Choosing the ideal multi-tenancy architecture for an ASP.NET Core application

I am currently working on an application that will be hosted on Azure. As it does not make sense to have an instance of it running for each customer (you'll see why), it's going to be a multi-tenancy solution.
To be honest: I'm only starting to gather experience with web applications, so I apologize if the answer to my question is obvious.
Question: Which multi-tenancy concept will be most beneficial for my application, considering the following assumptions:
Many tenants (ideally hundreds or even more, we'll see...)
consisting of few user accounts per tenant (<5-10 in most cases, up to 200 for a handful of tenants)
dealing with mostly small amounts of data (<100 entries in <20 tables)
changes in data occur a few times a day (approx. <50 changes per user per day)
The application needs to stay responsive (of course)
My thoughts:
Database-per-Tenant: Does not make sense as the DB won't be utilized much, therefore not cost effective at all
Table-per-Tenant: Could be a good solution, guess this should scale pretty well?
Tenant-column within the entities: Could be a problem with scaling, right? Could be better when using sharding on the tenant id?
I would really appreciate your help and some "shared experience" in order to choose the not-so-painful path.
A good summary of the different models can be found here:
https://www.linkedin.com/pulse/database-design-multi-tenant-applications-dharmendar-kumar/
Based on my experience on Azure I would recommend CosmosDB with the following options:
partitioned collections: if tenants are evenly distributed and have similar requirements
collection per tenant: if some tenants have scale or special requirements
mix between the preceding two.
Cosmos DB has a lot of benefits, e.g. sharding, global distribution, performance, a choice of consistency models, and good SQL support.
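For the "sharding on the tenant id" idea, a minimal sketch of hash partitioning might look like the following. This is only an illustration of the general technique: the function name and the partition count are made up, and Cosmos DB applies its own internal hashing to whatever partition key you choose.

```python
import hashlib


def partition_for_tenant(tenant_id: str, partition_count: int) -> int:
    """Map a tenant id to a partition number in [0, partition_count)."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    # A stable hash keeps a tenant on the same partition across processes
    # and restarts (Python's built-in hash() is randomized for strings).
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

With many small tenants of similar size this spreads the load fairly evenly; a tenant with special scale requirements is the case where a dedicated collection makes more sense.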

Data access layer patterns using azure function

We are currently working on a design using Azure functions with Azure storage queue binding.
Each message in the queue represents a complete transaction. An Azure function will be bound to that queue so that the function will be triggered as soon as there is a new message in the queue.
The function will then commit the transaction in a SQL DB.
The first-cut implementation is also complete, and it's working fine. However, in retrospect, we are considering the following:
In a typical DAL, there are well-established design patterns using Entity Framework, repository patterns, etc. However, we didn't find similar guidance or best practices for implementing a DAL within server-less code.
Therefore, my question is: should such patterns be implemented with Azure functions (this would be challenging :) ), or should the server-less code be kept as light as possible or this is not a use-case for azure functions, at all?
It doesn't take anything too special. We're using a routine set of library DLLs for all kinds of things -- database, interacting with other parts of Azure (like retrieving Key Vault secrets for connection strings), parsing file uploads, business rules, and so on. The libraries are targeting netstandard20 so we can more easily migrate to Functions v2 when the right triggers become available.
Mainly just design your libraries so they're highly modularized, so you can minimize how much you load to get the job done (assuming reuse in other areas of the system is important, which it usually is).
It would be easier if dependency injection was available today. See this for a few ways some of us have hacked it together until we get official DI support. (DI is on the roadmap for Functions, I believe the 3.0 release.)
At first I was a little worried about startup time with the library approach, but the underlying WebJobs stack itself is already pretty heavy, and Functions startup performance seems to vary wildly anyway (on the cheaper tiers, at least). During testing, one of our infrequently executed Functions has varied from ~300ms to a peak of ~3800ms to parse the exact same test file, with all but ~55ms spent on startup.
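A minimal sketch of that split, assuming a hypothetical message format with an "orders" list: the data access code lives in a small library class, and the function entry point only parses the message and delegates. sqlite3 stands in for the SQL DB here, and none of the names come from the Azure Functions SDK.

```python
import json
import sqlite3


class OrderRepository:
    """Data access kept in a reusable library, independent of the trigger."""

    def __init__(self, connection):
        self._connection = connection

    def create_schema(self):
        self._connection.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(id TEXT PRIMARY KEY, amount REAL NOT NULL)"
        )

    def save_all(self, orders):
        # "with connection" commits on success and rolls back on an
        # exception, so one queue message maps to one database transaction.
        with self._connection:
            self._connection.executemany(
                "INSERT INTO orders (id, amount) VALUES (:id, :amount)", orders
            )

    def count(self):
        return self._connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def handle_queue_message(message_body, repository):
    """Thin entry point: parse the message, delegate, report rows written."""
    orders = json.loads(message_body)["orders"]
    repository.save_all(orders)
    return len(orders)
```

Because the repository is passed in, the same handler can be exercised in a test with an in-memory database, which is a workable substitute while proper dependency injection is not available.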
should such patterns be implemented with Azure functions (this would be challenging :) ), or should the server-less code be kept as light as possible or this is not a use-case for azure functions, at all?
My answer is NO.
There should be patterns to follow, but the traditional repository patterns and CRUD operations do not seem to be valid in the cloud era.
Many strong concepts we were raised to adhere to have become invalid these days.
Denormalizing the database has become something not only acceptable but preferable.
Designing a pattern now depends on the database you selected for your solution, and also on the type of your application and the type of your data.
Here is a link to general guidelines for Table Storage design.
Is your application read-heavy or write-heavy ? The design will vary accordingly.
Are you using Azure Tables or Mongo? There are design decisions based on that. Indexing is important in Mongo, while there is none you can do in Azure Tables.
Sharding consideration.
Redundancy Consideration.
In modern development and architecture, many principles have changed: each microservice has its own database, which might be totally different from any other microservice's.
If you read along the guidelines that I provided, you will see what I mean.
Designing your Table service solution to be read efficient:
Design for querying in read-heavy applications. When you are designing your tables, think about the queries (especially the latency sensitive ones) that you will execute before you think about how you will update your entities. This typically results in an efficient and performant solution.
Specify both PartitionKey and RowKey in your queries. Point queries such as these are the most efficient table service queries.
Consider storing duplicate copies of entities. Table storage is cheap so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
Consider denormalizing your data. Table storage is cheap so consider denormalizing your data. For example, store summary entities so that queries for aggregate data only need to access a single entity.
Use compound key values. The only keys you have are PartitionKey and RowKey. For example, use compound key values to enable alternate keyed access paths to entities.
Use query projection. You can reduce the amount of data that you transfer over the network by using queries that select just the fields you need.
Designing your Table service solution to be write efficient:
Do not create hot partitions. Choose keys that enable you to spread your requests across multiple partitions at any point of time.
Avoid spikes in traffic. Smooth the traffic over a reasonable period of time and avoid spikes in traffic.
Don't necessarily create a separate table for each type of entity. When you require atomic transactions across entity types, you can store these multiple entity types in the same partition in the same table.
Consider the maximum throughput you must achieve. You must be aware of the scalability targets for the Table service and ensure that your design will not cause you to exceed them.
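To show how a few of these guidelines fit together (point queries, duplicate copies, compound keys, projection), here is a small sketch. The in-memory table class and the key prefixes are hypothetical stand-ins for illustration, not calls from the Azure Table storage SDK.

```python
class InMemoryTable:
    """Stand-in for a table addressed only by (PartitionKey, RowKey)."""

    def __init__(self):
        self._entities = {}

    def upsert(self, partition_key, row_key, properties):
        self._entities[(partition_key, row_key)] = dict(properties)

    def point_query(self, partition_key, row_key, select=None):
        """Look up one entity by both keys, optionally projecting fields."""
        entity = self._entities.get((partition_key, row_key))
        if entity is None:
            return None
        if select is None:
            return dict(entity)
        return {name: entity[name] for name in select if name in entity}


def save_employee(table, department, employee_id, email, name):
    properties = {"name": name, "email": email, "employee_id": employee_id}
    # The same entity is stored twice in one partition; the prefixed
    # RowKey values give two keyed access paths to it.
    table.upsert(department, f"id_{employee_id}", properties)
    table.upsert(department, f"email_{email}", properties)
```

Both lookups, by id and by email, stay point queries, at the cost of writing each entity twice.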
Another good source is this link:

Microservices Per DB table

I ran into a microservices architecture for an e-commerce application where each table has its own microservice, basically with CRUD operations (something like a REST client for each table).
Now I am thinking about combining them and modeling them around business domains. Before that, I wanted to know whether anyone has encountered such a situation and whether it is the right architecture or not.
Any suggestions will be very helpful.
Thanks.
Each microservice should have its own set of SQL tables that no other microservice can access. But having one microservice per SQL table, and having each microservice just support CRUD operations is generally an anti-pattern: it turns a powerful DBMS and query language into a simple record manager: no cross-table transactions, joins, filtering, sorting, pagination, etc.
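As a rough sketch of the alternative, here is one service modeled around the "order" domain that owns two tables and uses a cross-table transaction and a join, the things a service-per-table design gives up. The schema and method names are invented for the example, and sqlite3 stands in for whatever DBMS the service would really use.

```python
import sqlite3


class OrderService:
    """One business capability, several privately owned tables."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
        )
        self._db.execute(
            "CREATE TABLE order_items ("
            "order_id INTEGER NOT NULL REFERENCES orders(id), "
            "sku TEXT NOT NULL, "
            "quantity INTEGER NOT NULL CHECK (quantity > 0))"
        )

    def place_order(self, customer_id, items):
        # Both tables change in one transaction: all rows commit or none do.
        with self._db:
            cursor = self._db.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
            order_id = cursor.lastrowid
            self._db.executemany(
                "INSERT INTO order_items (order_id, sku, quantity) VALUES (?, ?, ?)",
                [(order_id, sku, quantity) for sku, quantity in items],
            )
        return order_id

    def units_per_customer(self):
        # A join and an aggregate that the database does in a single query.
        rows = self._db.execute(
            "SELECT o.customer_id, SUM(i.quantity) FROM orders o "
            "JOIN order_items i ON i.order_id = o.id "
            "GROUP BY o.customer_id ORDER BY o.customer_id"
        )
        return dict(rows.fetchall())
```

With one service per table, the failed-order case would leave an orphaned row in one service unless the caller implemented its own compensation logic.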
You're mixing up different, unrelated things.
(Micro)services are logical entities that do some specific task. They communicate with other services to perform a larger-scope task.
Tables/CRUD/SQL/NoSQL come from an entirely different level: it's where data is saved and how it's accessed.
It's true that services use SQL and have tables. It's also probably a good idea to have separate tables for each service. I would even go as far as saying that if 2 services directly use the same table, you're probably looking at a design problem.
But you can't equate services with tables; conceptually, they belong in different worlds.
Microservices are logical blocks of an application; combining them at the SQL level doesn't make any sense.
For example, let's say you create an order service, which allows customers to place orders.
An order contains order items and may have a reference to a customer object; for all of these you might end up creating multiple tables. So don't think of SQL tables and microservices as the same thing.
If you still have doubts, post a more exact question; that will help :)

Cloudant number of database limitation

I'm planning on having my database stored in Cloudant.
Our application is multi-tenant. We currently separate tenants based on a value in some of our tables, which will naturally translate to a value in a document. Another way is to have a database per tenant. We currently have around 100 tenants and hopefully will grow to 500-2000 in our best projections.
What are the pros and cons of all tenants in one db vs. a db per tenant?
Is there a limit on the number of databases we can create and work with concurrently?
This is a good and involved question. There are pros and cons to both models. The main advantage to one large database is that you can analyze (search, mapreduce, etc) across all users very easily. The main advantage of one-db-per-user is that every user has their own data "sandbox", which may be nice for your SLA. Additionally, that means that the amount of data in each user database can be relatively small.
If you can provide more details about the data you are storing, the relational modeling, and the queries you hope to be able to do, I can probably give you a more satisfying answer.
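To make the trade-off a bit more tangible, here is a small sketch with plain Python lists standing in for the databases. The document shape (a "tenant_id" and a "type" field) and the function names are assumptions for the example, and no Cloudant API is involved.

```python
from collections import Counter


def count_by_type_shared(shared_db):
    """One database for everyone: a single pass covers every tenant."""
    return Counter(doc["type"] for doc in shared_db)


def docs_for_tenant_shared(shared_db, tenant_id):
    """One database for everyone: isolation depends on filtering by tenant."""
    return [doc for doc in shared_db if doc["tenant_id"] == tenant_id]


def count_by_type_per_tenant(tenant_dbs):
    """Database per tenant: cross-tenant analysis must visit every database."""
    totals = Counter()
    for docs in tenant_dbs.values():
        totals.update(doc["type"] for doc in docs)
    return totals
```

In the per-tenant layout the "sandbox" comes for free because a tenant's documents are simply `tenant_dbs[tenant_id]`, while any cross-tenant report grows with the number of databases.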
