Limitations on Windows Azure Table Storage accounts

I am designing a multi-tenant web-based SaaS application that will be hosted on Windows Azure and use Table Storage.
The only limits I have found so far are:
5 storage accounts per subscription
100 TB maximum per storage account
1 MB per entity
I am deciding how to best partition my storage for multiple customers:
Option 1: Give each customer their own storage account. Not likely, considering the 5 account default limit.
Option 2: Give each customer their own set of tables. Prefix the table names with customer identifiers, such as a Books table split as "CustA_Books", "CustB_Books", etc.
Option 3: Have one set of tables, but prefix the partition keys to split the customers. So one "Books" table with partition keys of "CustA_Fiction", "CustA_NonFiction", "CustB_Fiction", "CustB_NonFiction", etc.
What are the pros and cons for options 2 and 3? Is there a limit to the number of tables in a single account that might affect option 2?

There are no limits to the number of tables you can create in Windows Azure. Your only limits are the ones you have already listed. Well... I guess there are other limits if you consider that an entity attribute is always 64 KB or less, or if you consider batch operations (100 entities or 4 MB, whichever is less).
Anyhow, the thing to keep in mind here is that your PartitionKey is going to be the most important design decision you make. If you create a PK with the customer name in it, you get some good partitioning benefits. The downside is that if you mix customers' data in the same table, you make it harder on yourself to delete data (if you ever need to delete a customer). So, you can use the table as another level of partitioning. The PK you create is scoped to the table you create it under.
What I would consider here is whether you ever need to delete data in bulk or query data across customers (tenants). For the first case, it makes a ton of sense to use separate tables per customer, so a delete is one operation versus, at best, one operation per 100 entities. However, if you need to query across tenants, it is harder to join this data when you have multiple tables (that would require multiple queries).
All things being equal, I would use the tables as another level of partitioning if there is no overlap in tenant functionality and make my life easier should I want to delete a tenant. So, I guess that is option 2.
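To make the two options concrete, here is a minimal sketch of per-tenant tables (option 2) versus per-tenant partition keys (option 3), assuming the Python azure-data-tables package; the connection string and all table, key, and property names are placeholders. Note that table names only allow letters and digits, so the underscore from the example names has to be dropped for option 2.

```python
from azure.data.tables import TableServiceClient

conn_str = "<storage-account-connection-string>"  # placeholder
service = TableServiceClient.from_connection_string(conn_str)

# Option 2: one table per customer, e.g. "CustABooks".
books_a = service.create_table_if_not_exists("CustABooks")
books_a.create_entity({
    "PartitionKey": "Fiction",        # partitions now only need the category
    "RowKey": "978-0001",             # book id
    "Title": "Example Novel",
})
# Deleting the tenant is a single call:
# service.delete_table("CustABooks")

# Option 3: one shared table, tenant encoded in the PartitionKey.
books = service.create_table_if_not_exists("Books")
books.create_entity({
    "PartitionKey": "CustA_Fiction",  # tenant + category
    "RowKey": "978-0001",
    "Title": "Example Novel",
})
# Deleting the tenant means querying its partitions and batch-deleting,
# at most 100 entities per batch.
```

The single-call table delete is exactly the clean-up advantage described above.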
HTH

I highly suggest Option 2
We are also going this route because it adds a nice level of federation for the customer data. As the accepted answer mentions, it is easier to manage adding/deleting customers. Another benefit that we have noticed is the 'copy-ability' of a customer's data. This approach makes it much easier to move customer-specific data to other storage accounts or to development environments for testing without affecting the entire lot.
In the SaaS world it also enables customers to get a copy of their own data with little effort, which is also a concern of many SaaS users.

Another alternative:
Imagine you have N storage accounts; the limit is 100 storage accounts per subscription. Each storage account has a table per customer.
For table operations with a partition key, like Insert, Update, Delete, or a point query, you calculate the hash value of the customer name + partition key, take it modulo N (the total number of storage accounts), find the index of the target storage account, and forward the request to the correct storage account / table.
For read requests with no partition key, like a range query, you would need to broadcast the request to all storage accounts and merge the results.
One other thing to keep in mind is how you name the multiple storage accounts. Avoid naming the accounts lexicographically; that will cause them to be served from the same partition server on the Azure back end, which goes against the recommended scalability best practices. If you have N storage accounts, prefix each storage account name with a 3-digit hash so they are evenly distributed.
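A rough sketch of that routing (plain Python; the hashing scheme, account names, and helper functions are illustrative assumptions, not an official pattern):

```python
import hashlib

NUM_ACCOUNTS = 8  # N storage accounts


def short_hash(text: str, digits: int = 3) -> str:
    """Stable short hash, used for routing and for account-name prefixes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:digits]


# Account names carry a hash prefix so they are not lexicographically sequential.
ACCOUNT_NAMES = [f"{short_hash(f'tenantdata{i}')}tenantdata{i}" for i in range(NUM_ACCOUNTS)]


def route_account(customer: str, partition_key: str) -> str:
    """Pick the storage account for a point operation (insert/update/delete/get)."""
    digest = hashlib.sha256(f"{customer}:{partition_key}".encode("utf-8")).hexdigest()
    return ACCOUNT_NAMES[int(digest, 16) % NUM_ACCOUNTS]


def broadcast_accounts() -> list:
    """Range queries without a partition key fan out to every account."""
    return list(ACCOUNT_NAMES)


# All operations for this customer/partition land on the same account.
print(route_account("CustA", "Fiction"))
```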

Related

How to identify per user cost and manage in azure

I am developing a website that uses Azure B2C and Azure Storage (Blobs, Tables, Queues and file share), among others. I want to restrict user transactions... say, file uploads/downloads to some number of gigabytes, and then give users a message that their quota is over for this month.
Is it possible to keep track of individual B2C customers in Azure as the website owner? What's the best approach available to handle this?
Thanks in Advance,
Murthy
Actually, Azure Storage doesn't have any feature to restrict a customer's consumption.
The only way that might meet your need is to use a script, in whatever language Azure supports.
To be brief, the script's logic could be:
Create a table with customers' information.
Set the limit for every user. Write a function to track usage and the remaining quota automatically, and store both the usage value and the remaining-quota value in the table. I use 'Last' to represent the remaining quota in the table.
When the upload API is called, compare the file size with the customer's remaining quota. If the 'Last' value exceeds the file size by at least 10 KB, allow the upload; otherwise, deny the request.
If the upload succeeds, get the file size whenever the customer uploads/downloads a file from storage, and store it in the table.
The table could look something like the sketch below (just an example; modify it for your needs):
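A rough sketch of that logic, assuming the Python azure-data-tables package; the table name, the 'Limit'/'Used' property names, and the connection string are made up, while 'Last' and the 10 KB margin follow the description above:

```python
from azure.data.tables import TableClient, UpdateMode

conn_str = "<storage-account-connection-string>"  # placeholder
quota_table = TableClient.from_connection_string(conn_str, table_name="CustomerQuota")


def register_customer(customer_id: str, limit_bytes: int) -> None:
    """One row per customer holding their limit, usage, and remaining quota ('Last')."""
    quota_table.upsert_entity({
        "PartitionKey": "quota",
        "RowKey": customer_id,
        "Limit": limit_bytes,
        "Used": 0,
        "Last": limit_bytes,
    })


def can_upload(customer_id: str, file_size: int) -> bool:
    """Allow the upload only if the remaining quota exceeds the file size by 10 KB."""
    row = quota_table.get_entity(partition_key="quota", row_key=customer_id)
    return row["Last"] - file_size >= 10 * 1024


def record_transfer(customer_id: str, file_size: int) -> None:
    """After a successful upload/download, update usage and the remaining quota."""
    row = quota_table.get_entity(partition_key="quota", row_key=customer_id)
    row["Used"] += file_size
    row["Last"] = row["Limit"] - row["Used"]
    quota_table.update_entity(row, mode=UpdateMode.MERGE)
```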

Azure Search with multiple indexes

I need to enable full-text and faceted search for a service that stores each customer's data in a separate Azure SQL database. Each database in turn stores the customer's data for multiple projects, and a database can contain any number of projects. Each project's data is accessed as an isolated data repository, so I need search and facets to be limited to each project's data. Since Azure Search supports a finite number of indexes, I am not sure how to best leverage it in my scenario. Moreover, the searchable data varies across projects, so the fields in the index will differ from project to project in each database.
How do I best address this problem with Azure Search?
Take a look at the Design patterns for multitenant SaaS applications and Azure Search. In particular, in some cases you can share an index across tenants and use filters to isolate data - see this section. The drawback of this approach is that sharing data across tenants can affect search relevance (since term frequency / document frequency are scoped to an index), but in many scenarios this is acceptable.
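For example, a shared index can carry a project (or tenant) discriminator field, and every query adds a filter on it so each request only sees its own data. A rough sketch against the Azure Search REST API (the service name, index name, field names, and API version are assumptions):

```python
import requests

SERVICE = "https://<your-search-service>.search.windows.net"
INDEX = "projects-shared"       # one index shared across projects
API_VERSION = "2020-06-30"      # use whichever version your service supports
HEADERS = {"api-key": "<query-key>", "Content-Type": "application/json"}


def search_project(project_id: str, text: str) -> dict:
    """Full-text + faceted search scoped to a single project via a filter."""
    body = {
        "search": text,
        "filter": f"projectId eq '{project_id}'",  # isolates the project's documents
        "facets": ["category"],                    # facets are computed over the filtered set
    }
    resp = requests.post(
        f"{SERVICE}/indexes/{INDEX}/docs/search?api-version={API_VERSION}",
        headers=HEADERS,
        json=body,
    )
    resp.raise_for_status()
    return resp.json()
```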

How to scale multi tenant application for almost 2000 tenants

I want to create a multi-tenant application. How should I scale or partition the database, given that there will be more than 2000 tenants?
Is it correct to have an individual database for each tenant?
Is it correct to split tenants by geographical region? Reporting will be a problem, whether pulling it from an individual tenant or generating a report across multiple tenants.
This question may be off topic as too opinion-based, but here are a few things to note about DocumentDB:
You are limited to 5 databases by default.
Pricing is by collection, so it would be very costly to partition tenants into their own collection.
The common way to do this is to have a field with tenantID on each document and put all of your tenants into one collection. You may be surprised how much data fits in one collection and you can spill-over to a new collection when one fills up or you are constantly exceeding your resource unit limit.
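A sketch of that pattern with the Python azure-cosmos SDK (the database, container, field names, and values are placeholders; the older DocumentDB SDK calls differ, but the idea is the same):

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("saasdb").get_container_client("items")

# Every document carries its tenant id.
container.create_item({
    "id": "order-001",
    "tenantId": "tenant-0042",
    "total": 129.95,
})

# All reads filter by tenant, so one collection can hold every tenant's data.
orders = container.query_items(
    query="SELECT * FROM c WHERE c.tenantId = @tid",
    parameters=[{"name": "@tid", "value": "tenant-0042"}],
    enable_cross_partition_query=True,
)
for doc in orders:
    print(doc["id"], doc["total"])
```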

Microsoft Azure DocumentDB vs Azure Table Storage

For the last several years, Microsoft has offered a "NoSQL" key/value store called "Table Storage" (http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/).
Table Storage offers high performance, scalability (via partitioning) and relatively low cost. A primary drawback of Tables is that only the partition and row keys are indexed, so querying on other values is very inefficient.
Recently Microsoft announced a new "NoSQL" service, called "DocumentDB" (http://azure.microsoft.com/en-us/documentation/services/documentdb/)
Instead of storing a list of properties (like Tables do), DocumentDB stores JSON objects. The whole object is indexed, so efficient queries can be created on every property and any nested property of stored objects.
Microsoft says that DocumentDB provides high performance and scalability as well.
If that's so, why would anyone use Table Storage over DocumentDB? It sounds like DocumentDB provides the same functionality as Tables, but with additional capabilities such as the ability to index anything.
I would be glad if someone could compare DocumentDB and Table Storage, highlighting the cons and pros of each.
Both are NoSQL technologies, but they are massively different. Azure Tables is a simple key/value store and does not support complex functionality like complex queries (most of them will require a full partition/table scan anyway, which will kill your performance and your cost savings), custom indexing (indexing is based on PartitionKey and RowKey only; you currently can't index any other entity property, and searching for anything other than a PartitionKey/RowKey combination will require a partition/table scan), or stored procedures. You also can't batch read requests for multiple entities (though batch write requests are supported if all the entities belong to the same partition). For a real-life application of Azure Tables, see HERE.
If your data needs (particularly around querying) are simple (like in the example above), then Azure Tables provides what you need, and you might end up using it in favor of DocDB due to pricing, performance and storage capacity. For example, the Azure Tables performance target is 20,000 operations per second. Trying to get that same level of performance on DocDB will have a significantly higher service cost for you. Also, Azure Tables are limited by the capacity of your Azure storage account (500 TB), whereas DocDB storage is limited by the capacity units you buy.
Table Services is mainly a key/value-type NoSQL store, and DocumentDB is (as the name suggests) a document-type NoSQL store. What you are asking is essentially the difference between these two types of NoSQL approaches. If you shape your research around this, you should be able to get a better understanding for sure.
Just to keep things simple, I suggest you consider the differences between how DocumentDB and Table Services are priced. Not only do the costs of these services vary a lot from each other, but the fact that DocumentDB works on a "provision first" model while Table Services are offered on purely consumption-based pricing might give you some clues for your compare/contrast.
Let me ask you this: why would I use DocumentDB if the features in Table Services serve my needs well? ;) I suggest you take a look at how the current Azure Diagnostics tooling uses Azure Storage, and how Storage Metrics uses Azure Storage itself, to get a sense of how useful Table Services can be and how overkill DocumentDB might be in some situations.
Hope this helps.
I think that the comparison is all about trading price for performance. Table Services are just Storage Services, which seem to cap out at 20,000 ops/second; paying for that kind of throughput all the time on DocumentDB (whereas Storage gives it to us all the time) is about $1,200/month. Crazy money.
Table services have simple indexes, so queries are very limited. Good for anything that is written and read via IDs. DocumentDB indexes the entire document, so a query can be done on any property.
And lastly, Table Services are bound by the storage capacity of the storage account they live in (which could get crazy high given direct negotiation with Microsoft), whereas DocumentDB storage seems unlimited.
So it's a balance. Do you have a LOT of data (hundreds of gigs, or terabytes) that you need in one place? DocumentDB. Do you need to support complex queries? DocumentDB. Do you have data that needs to come and go fast, but based on a 1-to-2 property lookup? Table services. Would you trade having to code around a simple index in order to avoid paying through the nose for throughput? Table services.
And Redis, someone mentioned that... man, I dunno. Even the existence of persistence in a caching framework (which Redis offers) doesn't turn it into a tech of choice... There is a huge difference between a persistent store that holds data that is "often used, but may be missing or time-retired", like a cache would, and a persistent store that guarantees your data to be there.
A real life example:
I have to store some tokens, retrieve them, and delete them. The only query ever done will be based on User ID.
So I use Table Storage, as it fulfills my requirement perfectly. I save the token against the User ID.
DocumentDB seemed to be overkill for this.
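A minimal sketch of that token store, assuming the Python azure-data-tables package (the table and property names are made up):

```python
from azure.data.tables import TableClient

tokens = TableClient.from_connection_string(
    "<storage-account-connection-string>", table_name="UserTokens"
)

user_id, token_id = "user-123", "token-abc"

# Save the token against the User ID (PartitionKey = user, RowKey = token).
tokens.create_entity({
    "PartitionKey": user_id,
    "RowKey": token_id,
    "Value": "opaque-token-material",
})

# Point read: the only lookup ever needed is by User ID (+ token id).
entity = tokens.get_entity(partition_key=user_id, row_key=token_id)

# Or list every token for a user with a single-partition query.
for t in tokens.query_entities(f"PartitionKey eq '{user_id}'"):
    print(t["RowKey"])

# Delete when done.
tokens.delete_entity(partition_key=user_id, row_key=token_id)
```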
Here is the answer from Microsoft's official docs:
Common attributes of Cosmos DB, Azure Table Storage, and Azure SQL Database:
99.99% availability SLA
Fully managed database services
ISO 27001, HIPAA and EU Model Clauses Compliant
The following table shows the uncommon attributes of Azure Cosmos DB and Azure Table Storage:

Windows Azure Cache with a multi tenant application

I have a multi-tenant application in development that I am deploying to Azure.
I would like to take advantage of the Windows Azure Cache service, as it looks like it will be a great performance improvement versus hitting the database on every call.
Let's say I have 2 tables: Businesses and Customers. A business can have multiple customers, and the Businesses table contains details about the business.
Business details don't change often but customer information is changing constantly for each of the different tenants.
I assume I need 2 named instances (1 for business details and 1 for customers)
Are 2 named caches enough, or do I need to separate these for each of the tenants? I think 2 would be OK, as creating a separate cache for each tenant would get expensive pretty quickly.
Thank you.
Using different named caches is interesting if you have different cache requirements (Expiry policy, default TTL, Notifications, High Availability, ...).
In your case you could simply look at using different regions per tenant:
Windows Azure Cache supports the creation and use of user-defined regions. A region is a subgroup for cached items. Regions also support the annotation of cached items with additional descriptive strings called tags. Regions support the ability to perform search operations on any tagged items in that region.
This would allow you to split your named cache (you would only need one) into regions per tenant, each holding the businesses and customers for that tenant. And if the businesses don't change that often, you can simply change the TTL for those items to 1, 2, ... hours.
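The DataCache regions API itself is .NET-only, but just to illustrate the shape of the idea (one cache, one region per tenant, longer TTLs for the slow-changing business entries), here is a purely conceptual Python sketch; none of these names come from the Azure SDK:

```python
import time

class TenantRegionCache:
    """Toy in-memory stand-in for a single named cache divided into per-tenant regions."""

    def __init__(self):
        self._regions = {}  # region name -> {key: (value, expires_at)}

    def put(self, region, key, value, ttl_seconds):
        self._regions.setdefault(region, {})[key] = (value, time.time() + ttl_seconds)

    def get(self, region, key):
        value, expires_at = self._regions.get(region, {}).get(key, (None, 0.0))
        return value if time.time() < expires_at else None

    def clear_region(self, region):
        """Dropping one tenant's cached data is just dropping its region."""
        self._regions.pop(region, None)


cache = TenantRegionCache()
# Businesses change rarely: cache for hours. Customers change constantly: minutes.
cache.put("tenant-42", "business:7", {"name": "Acme"}, ttl_seconds=2 * 60 * 60)
cache.put("tenant-42", "customer:1001", {"name": "Jane"}, ttl_seconds=5 * 60)
```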
