I need to enable full text & faceted search for a service that stores each customer data in a separate Azure SQL database. Each database in turn stores customer's multiple projects data. Each database can contain n number of projects. Each customer's project's data is accessed as a isolated data repository. Therefore, I need search and facets to be limited to each project's data. Since Azure search supports finite number of Indexes, I am not sure how to best leverage it in my scenario? Moreover, searchable data across projects will have different set of information that needs to be searched. Therefore, columns in Index will vary from project to project in each database.
How to best address this problem through Azure search?
Take a look at the Design patterns for multitenant SaaS applications and Azure Search. In particular, in some cases you can share an index across tenants and use filters to isolate data - see this section. The drawback of this approach is that sharing data across tenants can affect search relevance (since term frequency / document frequency are scoped to an index), but in many scenarios this is acceptable.
Related
I have a single Azure SQL Server and a single database in it. I want a solution to store specific records of selected tables in this database in different regions.
as an example, I have a users table with all PII data in it. these users can be from anywhere from the world. but i would want to store user records who are from EU region to be stored only in EU region.
To add it - i want all the other table records related to a specific user as well to get stored in that user's region.
from application perspective, i would be able to query across all users and all related tables to have dashboard data for the global users.
Any pointers to solve this scenario would be helpful for me.
Another approach could be sharding the database. Use horizontal sharding to store the rows for each country/region in a separate database in that country/region. The Elastic Database Client library will use a shardmap do most of the sharding work for you (assuming you are using .NET). You can use the country code in your shardmap to split regional data.
Reference Architecture: https://learn.microsoft.com/en-us/azure/architecture/patterns/sharding
Elastic Database Client: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-elastic-database-client-library
Here is one approach... When your user/tenant registers for your service they will need to pick where their data should reside. This is referred to as data residency. Then on subsequent requests to read or write data your application's repository layer needs to be aware of who the request is executing as so it can lookup the appropriate connection string and connect to that database to retrieve/write the data.
The routing data can be replicated to multiple regions and/or housed in a single location as it would not contain PII. The Azure Web App can be single region hosted (as depicted in the image below) or it can be replicated to multiple regions and traffic routed to it via a global traffic manager.
This approach supports the case where an European user picks to have their data reside in France but happens to be visiting the united states.
This picture shows how this might look. A guy named Barry Luijbregts has a nice pluralsight video that delves into this approach. https://www.pluralsight.com/courses/azure-paas-building-global-app
Good luck!
We are developing a mobile app that should scale for thousands of users and we are using Azure Search as our main storage. According to Azure pricing model the query limits are set to 15 queries per second/per unit for the standard plan. With these limits and with a system that should scale with thousands of concurent users we would hit the limits pretty quickly.
In our situation is Azure Search not the right option when scaling for thousands of concurrent users?
Would DocumentDB be a better option?
Thanks!
Interesting that you're using Azure Search as your primary storage, as it's not built to be a database engine. The storage is specifically for search content (type typical pattern is to use Azure Search in conjunction with a database engine, such as SQL Database or DocumentDB, for example), using results to point back to the "system of record" content in your database.
The scale for Search is specifically for full-text-search queries your users will generate. And Azure Search scales per unit, with each unit offering 15 searches / second. So, you can scale far beyond 15/sec if you buy more search units.
However: Don't confuse this with database engine queries. You asked about DocumentDB, so using that as an example: You can query far beyond 15/second with that database engine, and that scales independently. Same goes for any VM-based database solution, SQL Database, etc - they all can scale.
This really comes down to whether you need full-text-search at high volume. If so, great - just scale Azure Search to the number of units you need, to handle your request traffic. If you can do more database-specific searches, without driving your request through Azure Search, then you don't need to scale out as much, and can take advantage of the native database query capabilities.
One thing to add to David's excellent answer - if your scenario is primarily search driven and you don't need to store data for purposes other than search and are OK with eventual consistency, then using Azure Search as the primary store may be fine.
Also, 15 requests per second query throughput of Azure Search is just a ballpark - it's neither a hard limit nor a promise. Depending on your data and query complexity, the actual throughput can be significantly (many times) higher or lower.
I have in development a multi tenant application that I am deploying to azure.
I would like to take advantage of the windows azure cache service as it looks like it will be a great performance improvement vs hitting the database for each call.
Lets say I have 2 tables . Businesses and Customers. A business can have multiple customers and the business table contains details about the business.
Business details don't change often but customer information is changing constantly for each of the different tenants.
I assume I need 2 named instances (1 for business details and 1 for customers)
Is 2 named caches enough or do I need separate these for each of the tenants? I think 2 would be ok as if I have to create separate for each it will get expensive pretty quickly.
Thank you.
Using different named caches is interesting if you have different cache requirements (Expiry policy, default TTL, Notifications, High Availability, ...).
In you case you could simply look at using different Regions per tenant:
Windows Azure Cache supports the creation and use of user-defined regions. A region is a subgroup for cached items. Regions also support the annotation of cached items with additional descriptive strings called tags. Regions support the ability to perform search operations on any tagged items in that region.
This would allow you to split your named cache (you would only need one), in regions per tenant holding the businesses and customers for that tenant. And if the businesses don't change that often, you can simple change the TTL for those items to 1, 2, .. hours.
I have a few questions regarding Microsoft SQL Azure Federations:
1) Can I created a federated DB on an active Database or do I need to deploy federations ahead of time?
2) Do I need to make any changes to the SQL queries to comply with how I query federations, or I can continue to use my regular queries as I was working against one SQL Server Database?
3) When I split my database and after some time I see that one of the shards is very busy and almost full, how I tackle this problem using federations? - Do I need to split only that single federated table that is 90% full, or I need to recreate the splitting strategy by using a a less broader range. The problem is that one specific user can be very active, so what strategy I use to making sure that I won't need to re-create the federated strategy due to one very active federated table / user?
4) When I have different tables that I want to split with different primary keys, how the sharding will work then. for example:
From what I understand:
[Blogs]
blog_id
info
[Blog_Posts]
id
blog_id
post_content
So if I decide to shard based on the blog_id from 0-1000, 1-2001 I will have two federated tables. But how much more federated tables I have if I add more tables that have different keys other than blog_id, will I have more federated tables?
Thanks
Please be more precise and concrete and ask one question at a time. You have better chance for getting an answer to all of the questions when asked separately. Now let me try covering some of your questions.
1) Can I created a federated DB on an active Database or do I need to
deploy federations ahead of time?
You can certainly create a Federation(s) within an existing DB. There is no limitation to creating Federations in just a new/empty DB. However, creating a federation in an Active DB will do nothing for you. You have to realize that Federations are separate DBs. A Federation (or Federation member) knows nothing about the Federations Root DB (the DB where you created the federation). So you have to think on migrating schema/data from the Active DB (or the Federations Root) once you create your federation.
2) Do I need to make any changes to the SQL queries to comply with how
I query federations, or I can continue to use my regular queries as I
was working against one SQL Server Database?
Most probably YES. Windows Azure SQL Database Federations is a Scale-Out mechanism for the DB tier. This means, that like any Web Application needs a "special" design to work in a farm-like environment (i.e. scale-out environment like Windows Azure), a DataBase will also need a "special" design to work in a scale-out environment. There is no magic-wand with SQL Azure Federations that will make your code work. You have to design it to Work.
3) When I split my database and after some time I see that one of the
shards is very busy and almost full, how I tackle this problem using
federations? - Do I need to split only that single federated table
that is 90% full, or I need to recreate the splitting strategy by
using a a less broader range. The problem is that one specific user
can be very active, so what strategy I use to making sure that I won't
need to re-create the federated strategy due to one very active
federated table / user?
This is all about partitioning strategy. You have to very carefully design your federation key and how you partition your data across different shards. You can always SPLIT any federation, as long as you keep the Atomic Units in single shard.
4) When I have different tables that I want to split with different
primary keys, how the sharding will work then.
If you want to split different tables on different keys, than you will have different federations, each one with its own federation key and own tables.
A good video worth watching if you are up for SQL Federations: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI408
I am designing a multi-tennant web-based SaaS application that will be hosted on Windows Azure and use Table Storage.
The only limits I have found so far are:
5 storage accounts per subscription
100 TB maximum per storage account
1 MB per entity
I am deciding how to best partition my storage for multiple customers:
Option 1: Give each customer their own storage account. Not likely, considering the 5 account default limit.
Option 2: Give each customer their own set of tables. Prefix the table names with customer identifiers, such as a Books table split as "CustA_Books", "CustB_Books", etc.
Option 3: Have one set of tables, but prefix the partition keys to split the customers. So one "Books" table with partition keys of "CustA_Fiction", "CustA_NonFiction", "CustB_Fiction", "CustB_NonFiction", etc.
What are the pros and cons for options 2 and 3? Is there a limit to the number of tables in a single account that might affect option 2?
There are no limits to the number of tables you can create in Windows Azure. Your only limits ar the ones you have already listed. Well... I guess there are other limits if you consider the size of the entity attribute is always 64KB or less or if you consider batch options (100 entities or 4MB, whatever is the lesser).
Anyhow, the thing to keep in mind here is that your PartitionKey is going to be the most important thing you make. If you create a PK with the customer name in it, you get some good partitioning benefits. The downside to this is that if you mix the customer data in the same table, you make it harder on yourself to delete data (if you ever need to delete a customer). So, you can use the table as another level of partitioning. The PK you create is scoped to the table you create it under.
What I would consider here is if you ever need to delete the data in bulk or if you ever need to query data across customers (tenants). For the first one, it makes a ton of sense to use separate tables per customer so a delete is one operation versus at best 1 per 100 entities. However, if you need to query across tenants it is harder to join this data when you have multiple tables (that would require multiple queries).
All things being equal, I would use the tables as another level of partitioning if there is no overlap in tenant functionality and make my life easier should I want to delete a tenant. So, I guess that is option 2.
HTH
I highly suggest Option 2
We are also going this route because it adds a nice level or federation for the customer data. As the answered comment mentions it is easier to manage adding/deleting customers. Another benefit that we have noticed is the 'copy-abilty' of a customers data. This approach makes it much easier to move customer specific data to other storage accounts or to development environments for testing without affecting the entire lot.
In the SaaS world it also enables customers to get a copy of their own data with little effort, which is also a concern of many SaaS users.
Another alternative:
Imagine you have N storage accounts, the limit is 100 storage accounts per subscription. Each storage account have a table per customer.
For table request operations with Partition Key, like Insert, Update, Delete or a point query, you calculate hash value of customer name + partition key, calculate its modular of base N (total number of storage accounts), find the index of the exact storage account and forward the request to the correct storage account / table.
For read requests with no partition key, like a range query. Then you would need to broadcast the request to all storage accounts and merge the results.
One of the other things to keep in mind specifically around naming multiple storage accounts. Avoid naming the accounts lexicographically, that will cause them to be served from the same partition server on Azure backend and against their recommended scalability best practises. If you have N storage accounts. prefix each storage account name with a 3 digit hash, so they would be evenly distributed.