What are the best practices in building applications that support multiple tenants such as Software as a Service?
Links to white papers that expand on this topic are greatly appreciated.
For the database:
A. Put everything on the same database, put a tenant_id column on your tables
Pros: Easy to do
Cons: Very prone to bugs: it's easy to leak data from one tenant to another.
B. Put everything on the same database, but put each tenant in its own namespace (PostgreSQL calls them schemas)
Pros: Provides better data leak protection than option A
Cons: Not supported by all databases. AFAIK PostgreSQL and Oracle support it.
C. Set up one database per tenant
Pros: Absolutely no chance of data leaking from one tenant to another
Cons: Setting up new tenants is more complicated. Database connections are expensive.
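To make the trade-off in option A concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `invoices` table, its columns, and both helper functions are made up for illustration; they are not taken from the presentation. The point is only that every query has to carry the tenant filter, and that nothing stops you from forgetting it.

```python
import sqlite3

# Hypothetical schema for option A: one shared table with a tenant_id column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 99.0)],
)


def invoices_for(tenant_id):
    """Return only the amounts that belong to one tenant."""
    rows = conn.execute(
        "SELECT amount FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [amount for (amount,) in rows]


def invoices_unscoped():
    """The bug option A invites: the WHERE clause was forgotten."""
    rows = conn.execute("SELECT amount FROM invoices ORDER BY id")
    return [amount for (amount,) in rows]


print(invoices_for(1))       # tenant 1 sees only its own rows
print(invoices_unscoped())   # every tenant's rows come back
```

With options B and C the same mistake is harder to make, because the tenant is chosen once when the connection or schema is selected rather than in every query.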
I only learned the above ideas from Guy Naor. Here's a link to his presentation:
http://aac2009.confreaks.com/06-feb-2009-14-30-writing-multi-tenant-applications-in-rails-guy-naor.html
You might find some valuable advice in a series of blog posts by Oren Eini.
This is one of the last posts in the series, with links to previous posts: http://ayende.com/Blog/archive/2008/08/16/Multi-Tenancy--Approaches-and-Applicability.aspx
I am currently working on an application that will be hosted on Azure. As it does not make sense to have an instance of it running for each customer (you'll see why), it's going to be a multi-tenancy solution.
To be honest: I'm only starting to gather experience with web applications, so I apologize if the answer to my question is obvious.
Question: Which multi-tenancy concept will be most beneficial for my application, considering the following assumptions:
Many tenants (ideally hundreds or even more, we'll see...)
consisting of few user accounts per tenant (<5-10 in most cases, up to 200 for a handful of tenants)
dealing with mostly small amounts of data (<100 entries in <20 tables)
changes in data occur a few times a day (approx. <50 changes per user per day)
The application needs to stay responsive (of course)
My thoughts:
Database-per-Tenant: Does not make sense as the DB won't be utilized much, therefore not cost effective at all
Table-per-Tenant: Could be a good solution, guess this should scale pretty well?
Tenant-column within the entities: Could be a problem with scaling, right? Could be better when using sharding on the tenant id?
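For the last idea (a tenant column plus sharding on the tenant id), routing can be as small as a stable hash of the tenant id. The sketch below is only an illustration: the shard names and the `shard_for` function are hypothetical, and a real deployment would map each shard to a connection string.

```python
import hashlib

# Hypothetical shard names, used purely for illustration.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(tenant_id: str) -> str:
    """Map a tenant id to a shard using a hash that is stable across runs."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


# The same tenant always lands on the same shard.
print(shard_for("tenant-42"))
```

hashlib is used instead of Python's built-in `hash()` because string hashes from `hash()` are randomized per process by default. Note that plain modulo routing moves most tenants to a different shard when the number of shards changes, so this only fits a shard count that stays fixed.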
I would really appreciate your help and some "shared experience" in order to choose the not-so-painful path.
A good summary of the different models can be found here:
https://www.linkedin.com/pulse/database-design-multi-tenant-applications-dharmendar-kumar/
Based on my experience on Azure I would recommend Cosmos DB with the following options:
partitioned collections: if tenants are evenly distributed and have similar requirements
collection per tenant: if some tenants have scale or special requirements
mix between the preceding two.
Cosmos DB has a lot of benefits, e.g. sharding, global distribution, performance, a choice of consistency models, and good SQL support.
I currently have an application written in PHP using the Symfony framework. Rather than have separate installs for each customer on a hosted server, I would like to move to a SaaS model with one install for all customers, possibly running on Google Code or another cloud-based service. I am not tied to PHP, though I would like to have the benefits of a good framework.
So the challenge: if all customers are using the same application, we then have to find a way of isolating each customer's data. Customers do, for example, have admin access and can manage their own users and privileges. At a simplistic level you could just have an organisation identifier in each table and add that to all database operations. However, most application frameworks use an ORM of some kind, and I have not been able to find one that will easily and seamlessly facilitate this at a level that has minimum impact on the application code.
Has anyone looked at this? Are there any good approaches to this problem?
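To illustrate what "minimum impact on the application code" could look like, here is a rough sketch of a wrapper that adds the organisation identifier to every operation so that calling code never mentions it. It uses Python's sqlite3 rather than a real ORM, and the `TenantScopedTable` class, the `users` table, and the `org_id` column are all invented for this example.

```python
import sqlite3


class TenantScopedTable:
    """Hypothetical wrapper that applies the organisation filter everywhere."""

    def __init__(self, conn, table, org_id):
        # Assumption: table and column names come from application code,
        # never from user input, since they are interpolated into the SQL.
        self._conn = conn
        self._table = table
        self._org_id = org_id

    def insert(self, **fields):
        fields["org_id"] = self._org_id  # always stamped by the wrapper
        columns = ", ".join(fields)
        marks = ", ".join("?" for _ in fields)
        self._conn.execute(
            f"INSERT INTO {self._table} ({columns}) VALUES ({marks})",
            tuple(fields.values()),
        )

    def all(self):
        cursor = self._conn.execute(
            f"SELECT * FROM {self._table} WHERE org_id = ? ORDER BY id",
            (self._org_id,),
        )
        return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, org_id INTEGER NOT NULL, name TEXT)"
)

acme_users = TenantScopedTable(conn, "users", org_id=1)
globex_users = TenantScopedTable(conn, "users", org_id=2)
acme_users.insert(name="alice")
globex_users.insert(name="bob")

print(acme_users.all())    # only organisation 1's rows
print(globex_users.all())  # only organisation 2's rows
```

The design choice is that the organisation id is bound once, when the wrapper is created for the current request, so the rest of the application cannot forget the filter.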
As Itay says, a multi-tenant system is a common requirement. A while back I was doing some research on this problem and came across a pretty good presentation on the different ways to handle this issue, and the pros and cons of each: http://aac2009.confreaks.com/06-feb-2009-14-30-writing-multi-tenant-applications-in-rails-guy-naor.html
This particular presentation is targeted to a Rails audience, but the principles are the same as with any language.
The approach you described is common, and PHP (this is one of its strengths) will allow you to go into the ORM code comparatively easily and modify it to your needs.
A second approach is to create a separate DB for each organization and a joint DB for shared resources.
A bit of a design challenge (but just a bit).
If you are really big, then you will even need to consider a separate DB server for each organization (I would say this is serious overkill in 99.99999% of the cases).
This MSDN article gives you a very good overview of Data Architecture in Multi-tenancy: http://msdn.microsoft.com/en-us/library/aa479086.aspx
I'm writing a 'proof of concept' application to investigate the possibility of moving a bespoke ASP.NET ecommerce system over to Windows Azure during a necessary re-write of the entire application.
I'm tempted to look at using Azure Table Storage as an alternative to SQL Azure, as the entities being stored are likely to change their schema (properties) over time as the application matures further, and I won't need to make endless database schema changes. In addition, we can build referential integrity into the application code, so the case for considering Azure Table Storage is a strong one.
The only potential issue I can see at this time is that we do a small amount of simple reporting - i.e. value of sales between two dates, number of items sold for a particular product etc.
I know that Table Storage doesn't support aggregate type functions, and I believe we can achieve what we want with clever use of partitions, multiple entity types to store subsets of the same data, and possibly pre-aggregation, but I'm not 100% sure how to go about it.
Does anyone know of any in-depth documents about Azure Table Storage design principles, so that we make proper and efficient use of Tables, PartitionKeys, entity design etc.?
There are a few simplistic documents around, and the current books available tend not to go into this subject in much depth.
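As a rough illustration of the pre-aggregation idea, the sketch below keeps one running total per day and answers a "sales between two dates" query by summing those daily rows. It uses an in-memory dict as a stand-in for a table; the key layout (one partition per month, one row per day) and the property names `Total` and `Orders` are assumptions made for this example, not a recommended design.

```python
from collections import defaultdict
from datetime import date

# In-memory stand-in for a table: (PartitionKey, RowKey) -> entity properties.
# The key layout below is an assumption made for illustration.
daily_sales = defaultdict(lambda: {"Total": 0.0, "Orders": 0})


def record_order(order_date: date, amount: float) -> None:
    """Update the pre-aggregated row for the order's day at write time."""
    partition_key = order_date.strftime("%Y-%m")  # one partition per month
    row_key = order_date.strftime("%d")           # one row per day
    row = daily_sales[(partition_key, row_key)]
    row["Total"] += amount
    row["Orders"] += 1


def sales_between(start: date, end: date) -> float:
    """Sum the daily rows in [start, end] instead of scanning every order."""
    total = 0.0
    for (partition_key, row_key), row in daily_sales.items():
        day = date.fromisoformat(f"{partition_key}-{row_key}")
        if start <= day <= end:
            total += row["Total"]
    return total


record_order(date(2011, 1, 30), 25.0)
record_order(date(2011, 1, 30), 15.0)
record_order(date(2011, 2, 2), 60.0)
record_order(date(2011, 3, 1), 5.0)

print(sales_between(date(2011, 1, 30), date(2011, 2, 28)))
```

Against a real table the read-modify-write in `record_order` would need some form of concurrency control, which this sketch leaves out.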
FYI - the ecommerce site has about 25,000 customers and takes about 100,000 orders per year.
Have you seen this post?
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx
Pretty thorough coverage of tables
I think there are three potential issues in porting your app to Table Storage.
The lack of reporting - including aggregate functions - which you've already identified
The limited availability of transaction support - with 100,000 orders per year I think you'll end up missing this support.
Some problems with costs - $1 per million operations is only a small cost, but you need to factor this in if you get a lot of page views.
Honestly, I think a hybrid approach might suit you better - perhaps EF or NH to SQL Azure for critical data, with large objects stored in Table/Blob.
Enough of my opinion! For "in depth":
try the storage team's blog http://blogs.msdn.com/b/windowsazurestorage/ - I've found this very good
try the PDC sessions from Jai Haridas (couldn't spot a link - but I'm sure it's there still)
try articles inside Eric's book - http://geekswithblogs.net/iupdateable/archive/2010/06/23/free-96-page-book---windows-azure-platform-articles-from.aspx
there's some very good best practice based advice on - http://azurescope.cloudapp.net/ - but this is somewhat performance orientated
If you have started looking at Azure storage such as tables, it would do no harm to look at other NOSQL offerings in the market (especially around document databases). This would give you insight into the NOSQL space and how solutions around such stores are designed.
You can also think about a hybrid approach of SQL DB + NOSQL solution. Parts of the system may lend themselves very well to Azure table storage model.
NOSQL solutions such as Azure table have their own challenges such as
Schema changes for data. Check here and here
Transactional support
ACID constraints. Check here
All table design papers I have seen are pretty much exclusively focused on the topics of scalability and search performance. I have not seen anything related to design considerations for reporting or BI.
Now, Azure tables are accessible through REST APIs and via the Azure SDK. Depending on what reporting you need, you might be able to pull out the information you require with minimal effort. If your reporting requirements are very sophisticated, then perhaps SQL Azure together with Windows Azure SQL Reporting services might be a better option to consider?
I'm just wondering if anyone who has experience on Azure Table Storage could comment on if it is a good idea to use 1 table to store multiple types?
The reason I want to do this is so I can do transactions. However, I also want to get a sense in terms of development, would this approach be easy or messy to handle? So far, I'm using Azure Storage Explorer to assist development and viewing multiple types in one table has been messy.
To give an example, say I'm designing a community site of blogs. If I store all blog posts, categories and comments in one table, what problems would I encounter? On the other hand, if I don't, then how do I ensure some consistency between category and post, for example (assume 1 post can have only 1 category)?
Or are there any other different approaches people take to get around this problem using table storage?
Thank you.
If your goal is to have perfect consistency, then using a single table is a good way to go about it. However, I think that you are probably going to be making things more difficult for yourself and get very little reward. The reason I say this is that table storage is extremely reliable. Transactions are great and all if you are dealing with very, very important data, but in most cases, such as a blog, I think you would be better off just 1) allowing for some very small percentage of inconsistent data and 2) handling failures in a more manual way.
The biggest issue you will have with storing multiple types in the same table is serialization. Most of the current table storage SDKs and utilities were designed to handle a single type. That being said, you can certainly handle multiple schemas either manually (i.e. deserializing your object to a master object that contains all possible properties) or interacting directly with the REST services (i.e. not going through the Azure SDK). If you used the REST services directly, you would have to handle serialization yourself and thus you could more efficiently handle the multiple types, but the trade off is that you are doing everything manually that is normally handled by the Azure SDK.
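If you do go down the manual route, one common way to keep it manageable is to store a type discriminator on every row and dispatch on it when reading. The sketch below shows the idea with plain dicts standing in for table rows; the `EntityType` property, the `Post` and `Comment` classes, and the key values are all invented for this example and are not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class Post:
    PartitionKey: str
    RowKey: str
    Title: str


@dataclass
class Comment:
    PartitionKey: str
    RowKey: str
    Author: str
    Text: str


# Hypothetical discriminator values mapped to the classes they represent.
TYPES = {"Post": Post, "Comment": Comment}


def to_row(entity) -> dict:
    """Flatten an entity into a row and record which type it was."""
    row = dict(vars(entity))
    row["EntityType"] = type(entity).__name__
    return row


def from_row(row: dict):
    """Rebuild the right class by looking at the discriminator property."""
    fields = dict(row)
    cls = TYPES[fields.pop("EntityType")]
    return cls(**fields)


# Both rows share a PartitionKey, so they live side by side in one table.
table = [
    to_row(Post("blog-1", "post-001", "Hello")),
    to_row(Comment("blog-1", "post-001-comment-001", "bob", "Nice post")),
]
entities = [from_row(row) for row in table]
print(entities)
```

This avoids the "master object with all possible properties" approach, at the cost of maintaining the type registry yourself.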
There really is no right or wrong way to do this. Both situations will work, it is just a matter of what is most practical. I personally tend to put a single schema per table unless there is a very good reason to do otherwise. I think you will find table storage to be reliable enough without the use of transactions.
You may want to check out the Windows Azure Toolkit. We have designed that toolkit to simplify some of the more common azure tasks.
I'm rebuilding an application from the ground up. At some point in the future...not sure if it's near or far yet, I'd like to move it to Azure. What decisions can I make today that will make that migration easier?
I'm going to be dealing with large amounts of data, and like the idea of Azure Tables...are there some specific persistence choices I can make now that will mimic Azure Tables so that when the time comes the pain of migration will be lessened?
A good place to start is the Windows Azure Guidance
If you want to use Azure Tables eventually, you could design your database where all tables are a primary key, plus a field with XML data.
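As a rough sketch of that "key plus XML field" layout, here it is with Python's sqlite3 and xml.etree.ElementTree. Splitting the key into `partition_key` and `row_key` is my own assumption to mirror how Azure Tables address entities; the table name, the `save` and `load` helpers, and the sample data are likewise made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Assumed layout: a two-part key plus one XML column holding every property.
conn.execute(
    "CREATE TABLE entities ("
    "partition_key TEXT, row_key TEXT, data TEXT, "
    "PRIMARY KEY (partition_key, row_key))"
)


def save(partition_key, row_key, properties):
    """Serialize the properties to XML and upsert the row."""
    root = ET.Element("entity")
    for name, value in properties.items():
        ET.SubElement(root, name).text = str(value)
    conn.execute(
        "INSERT OR REPLACE INTO entities VALUES (?, ?, ?)",
        (partition_key, row_key, ET.tostring(root, encoding="unicode")),
    )


def load(partition_key, row_key):
    """Return the properties as a dict of strings, or None if not found."""
    row = conn.execute(
        "SELECT data FROM entities WHERE partition_key = ? AND row_key = ?",
        (partition_key, row_key),
    ).fetchone()
    if row is None:
        return None
    return {child.tag: child.text for child in ET.fromstring(row[0])}


save("customer-17", "order-0001", {"Status": "shipped", "Total": 42.5})
print(load("customer-17", "order-0001"))
```

Values come back as strings, so type information has to be handled by the application. Lookups by key stay cheap, but anything that filters on a property inside the XML needs a scan, which is roughly the constraint you would face later anyway.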
I would advise planning along the lines of almost-infinitely scalable solutions (see Pat Helland's paper on Life beyond distributed transactions) and the CQRS approach in general. This way you'll be able to avoid the common pitfalls of distributed apps in general and the peculiarities of Azure table storage.
This really helps us to work with Azure and Cloud Computing at Lokad (data-sets are quite large plus various levels of scalability are needed).