Identity Column vs GUID for Scaling Up - azure

I am designing a SQL database for the Azure cloud and I am wondering about the use of IDENTITY vs GUID columns for primary keys, especially when scaling up the database. I already understand the size, performance and clustering differences between them but I am concerned about what will happen to an IDENTITY column if we scale up OR Geo-replicate the database? Would using an IDENTITY column eventually catch up to us in a bad way if we needed to scale?
I've tried searching online for best practices regarding this in SQL Azure but all the documentation I am finding seems to be from 2010 or 2012.
Thanks in advance for the help!

You only need to be concerned when writing to multiple primary databases that operate over the same dataset. This could be the case if you shard your DB, for example. As long as you write to only one DB, there should be no problem.
Geo-replication is not a concern as the secondaries are (if at all accessible) read only and IDs are only generated on the primary. The IDENTITY will work correctly after failovers.
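To make the sharding concern concrete, here is a minimal sketch in Python (not T-SQL). It assumes a toy model in which each writable shard keeps its own independent counter, the way separate databases would each keep their own IDENTITY seed. The `Shard` class and `identity_collisions` function are hypothetical names used only for illustration.

```python
import itertools
import uuid


class Shard:
    """Toy stand-in for one writable database with its own IDENTITY counter."""

    def __init__(self, name):
        self.name = name
        self._identity = itertools.count(1)  # every shard starts counting at 1

    def next_identity(self):
        return next(self._identity)


def identity_collisions(shards, rows_per_shard):
    """Return the identity values that were handed out by more than one shard."""
    seen, collisions = set(), set()
    for shard in shards:
        for _ in range(rows_per_shard):
            value = shard.next_identity()
            if value in seen:
                collisions.add(value)
            seen.add(value)
    return collisions


# Two independent primaries writing into the same logical dataset.
collided = identity_collisions([Shard("shard_a"), Shard("shard_b")], 3)
print(sorted(collided))

# GUIDs can be generated on each shard without any coordination.
guids = {uuid.uuid4() for _ in range(6)}
print(len(guids))
```

With a single writable primary (including a geo-replicated one with read-only secondaries) there is only one counter, so the collision case above does not arise.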

Related

Azure Table Storage for housing Application Configuration

-- I am exploring Azure functionality and am wondering if Azure Table Storage can be an easy way for holding application configuration for an entire environment. It would be easy to see and change (adding list values etc.). Can someone please guide me on whether this is a good idea? I would expect this table to hold no more than 2000 rows if all our applications were moved over to Azure.
Partition Key --> Project Name + Component Name (Azure Function/Logic App)
Row Key --> Parameter Key
Value column --> Parameter Value
-- For securing password/keys, I can use the Azure Key Vault.
There are different ways of storing application configurations:
Key Vault (as you stated) for sensitive information. Ex. tokens, keys, connection strings. It can be standardized and extended to any type of resources for ease of storing and retrieving these.
Application Settings, found under each App Service. This approach assumes you have an App Service for each of your apps.
Release Pipeline, such as Azure DevOps Services (AzDo). AzDo has variables that can be global to the release pipeline or specific to each stage.
I am exploring Azure functionality and am wondering if Azure Table
Storage can be an easy way for holding application configuration for
an entire environment. It would be easy to see and change (adding list
values etc.). Can someone please guide me on whether this is a good
idea?
Considering Azure Tables is a key/value pair store, it is certainly a good idea to store application configuration values there. Only thing I would recommend is that you incorporate some kind of caching layer between your application and table storage so that you don't end up making calls to table storage every time you need to fetch a setting.
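A caching layer of the kind suggested above could look roughly like the following sketch. It is written in Python and deliberately does not call any Azure SDK: `fetch` is a hypothetical callable standing in for whatever code reads one entity from table storage, and `CachedSettings`, the 60-second TTL and the sample keys are assumptions made for illustration.

```python
import time


class CachedSettings:
    """Small time-to-live cache in front of a slower settings store."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical: (partition_key, row_key) -> value
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._cache = {}         # (partition_key, row_key) -> (expires_at, value)

    def get(self, partition_key, row_key):
        key = (partition_key, row_key)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh, no call to the backing store
        value = self._fetch(partition_key, row_key)
        self._cache[key] = (now + self._ttl, value)
        return value


calls = []


def fake_fetch(partition_key, row_key):
    """Stand-in for a table storage read; records each call."""
    calls.append((partition_key, row_key))
    return "value-for-" + row_key


settings = CachedSettings(fake_fetch, ttl_seconds=60)
settings.get("Billing|InvoiceFunction", "RetryCount")
settings.get("Billing|InvoiceFunction", "RetryCount")
print(len(calls))  # number of reads that actually reached the backing store
```

The TTL is the trade-off to tune: a longer value means fewer storage calls but a longer delay before a changed setting is picked up.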
I would expect this table to hold no more than 2000 rows if all our
applications were moved over to Azure.
Considering the number of entities is going to be less than 2000, the design should have little impact on query performance, and I think your design is good. For best performance, please ensure that you're including both PartitionKey and RowKey while querying. At the very least, include PartitionKey in your query.
Please see this for more details: https://learn.microsoft.com/en-us/azure/cosmos-db/table-storage-design-guide.
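As a small illustration of "include both PartitionKey and RowKey", the sketch below builds the OData filter strings that table queries use. The helper names (`odata_quote`, `point_filter`, `partition_filter`) and the sample key values are hypothetical; how the resulting string is passed to a particular client library is not shown here.

```python
def odata_quote(value):
    """Quote a string literal for an OData filter; single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def point_filter(partition_key, row_key):
    """Filter that targets exactly one entity (the cheapest kind of query)."""
    return (
        f"PartitionKey eq {odata_quote(partition_key)} "
        f"and RowKey eq {odata_quote(row_key)}"
    )


def partition_filter(partition_key):
    """Filter that limits the query to a single partition."""
    return f"PartitionKey eq {odata_quote(partition_key)}"


print(point_filter("Billing|InvoiceFunction", "RetryCount"))
print(partition_filter("Billing|InvoiceFunction"))
```

A filter on the Value column alone would carry neither key and would therefore have to examine every entity in the table.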
For securing password/keys, I can use the Azure Key Vault.
That's the way to go for storing sensitive data in Azure.
Have you looked at the App Configuration service?
There are client libraries in .NET, Java, TypeScript and Python to interact with the service that you can leverage in your application.

Azure Storage Account for Tables

So first of all I'd like to say I'm no DBA nor coder, I'm just a regular IT person that works as support for network and infrastructure, however, I like to get familiar with technologies in general and understand the basics of it, let's say how they work, implemented with no additional specific details.
I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, I've come across other options such as blobs, tables and queues. I've read about them however I'm trying to get the main functionality of tables for a coder.
Correct me if I am wrong, when you code an app with a database, you can put the database on same/different server, and that can be on premise or on the cloud and you kind of link both together.
And as far as I understand, from what I was able to find out investigating on the web, these tables are NoSQL and have no constraints; you create the tables and data through Visual Studio thanks to an API, and then that information is reflected in your storage.
How is this useful when using it for the app you're developing?
I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, I've come across other options such as blobs, tables and queues. I've read about them however I'm trying to get the main functionality of tables for a coder.
And as far as I understand, from what I was able to find out investigating on the web, these tables are NoSQL and have no constraints; you create the tables and data through Visual Studio thanks to an API, and then that information is reflected in your storage.
An Azure Storage Account is a "box" that keeps your Blobs, Tables, Queues and Files organised from the management point of view and for access control. Each storage type is good for its specific tasks.
If there were one super storage that solved every possible case for storing, querying and managing data, there would not be such a variety of databases, storage types etc. available.
If you need to share the files as a "network folder" - try Azure Files.
If your coders need database storage, then the first question would be: what are their requirements for the database? What would the purpose of that database be, etc. Azure in particular has a lot of different database solutions, and again, each of them is good for some specific task and may not be a good choice for others.
As to Azure Tables, from the official docs:
Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design.
So, if your coders do need to store such data, then yes, that would be one of the possible choices.
Correct me if I am wrong, when you code an app with a database, you can put the database on same/different server, and that can be on premise or on the cloud and you kind of link both together.
Correct. You can also either have your own server with the database, which you need to manage yourself, or choose a cloud service that provides the database for you and keeps the underlying server and other maintenance activity managed for you, so you do not need to worry about or spend your time on that.
How is this useful when using it for the app you're developing?
It is important to understand what your requirements are for data storage in order to pick a proper one. This question perhaps should be addressed not to you, but to your coders, who are building the app and can consolidate their requirements for the database store. Usually, they will tell you exactly what they need, and you may give them some ideas or advice on the alternatives, if any (that may be a similar solution with extra functionality, a different way the data is stored or processed, more built-in integrations that may be important for you, or a decision whether to keep your own installation or use a cloud managed service).
For your further possible question about When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site? see this thread
Update based on further questions:
If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?
Perhaps you need to better understand the relationship between the App (Application) and the DB (Database). The Database is a standalone system, which stores the data and replies to incoming queries (receives a request, processes it, returns the result). On the whole, it does not matter to the DB who is requesting the data. It is a "passive" system. (There are some cases when a DB can trigger further processes in data processing pipelines, but that is beyond this scope.)
The App, by contrast, is the active system in the App<->DB relationship. (We also leave aside more advanced designs where the App is not just one system.) The App receives requests, processes them (it may make external requests to other "services" if that is necessary), and gives a response (with or without data) to the requester. In the App<->DB relationship, those external requests are what is happening. At some point the App needs some data from the DB, so the App makes a request to the DB, obtains the response and continues its own logic.
Where the App server and the DB server are placed is not that important (for simplicity). The important part is whether the DB server is accessible to the App's requests. The DB can be on-prem with a public static IP address; it can be in the cloud on your own server with a public static IP address (sometimes that is achieved in different ways, but we skip that for simplicity); or it can be a Database as a Service cloud solution, where you do not need to run a server or configure the database, but have a URL endpoint that you use to query the DB.
I appreciate the answer, and I pretty much agree with what you're saying.
But my question goes beyond what the requirements are for the developers.
I'll modify the question. If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?
Azure Storage Tables are the "Notepad" of NoSQL databases. If you want quick and easy key/value pairs, Tables is the way to go. If you are looking for the "Word" of NoSQL in Azure, then Cosmos DB is where it's at. Cosmos DB offers global distribution, better features and a better SLA (see comparison). Tables are cheaper, though.
Azure also supports MySQL, PostgreSQL, MariaDB and MSSQL as PaaS offerings if you wish to use a traditional database.

Microsoft Azure DocumentDB vs Azure Table Storage

For several years, Microsoft has offered a "NoSQL" key/value storage service called "Table Storage" (http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/)
Table Storage offers high performance, scalability (via partitioning) and relatively low cost. A primary drawback of Tables is that only the Partition and Row keys are indexed, so queries on other values are very inefficient.
Recently Microsoft announced a new "NoSQL" service, called "DocumentDB" (http://azure.microsoft.com/en-us/documentation/services/documentdb/)
Instead of storing a list of properties (like Tables do), DocumentDB stores JSON objects. The whole object is indexed, so efficient queries can be written against any property and any nested property of the stored objects.
Microsoft says that DocumentDB provides high performance and scalability as well.
If that's so - why would anyone use Table Storage over DocumentDB? It sounds like DocumentDB provides the same functionality as Tables, but with additional capabilities such as the ability to index anything.
I would be glad if someone could make a comparison between DocumentDB and Table Storage, highlighting the pros and cons of each one.
Both are NoSQL technologies, but they are massively different. Azure Tables is a simple Key/Value store and does not support complex functionality like complex queries (most of them will require a full partition/table scan anyway, which will kill your performance and your cost savings), custom indexing (indexing is based on PartitionKey and RowKey only, you currently can't index on any other entity property and searching for anything other than PartitionKey/RowKey combination will require a partition/table scan), or stored procedures. You also can't batch read requests for multiple entities (though batch write requests are supported if all the entities belong to the same partition). For a real-life application of Azure Tables, see HERE.
If your data needs (particularly around querying) are simple (like in the example above), then Azure Tables provide what you need, and you might end up using them in favor of DocDB due to pricing, performance and storage capacity. For example, the Azure Tables performance target is 20,000 operations per second. Trying to get that same level of performance on DocDB will have a significantly higher service cost for you. Also, Azure Tables are limited by the capacity of your Azure storage account (500TB), whereas DocDB storage is limited by the capacity units you buy.
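The difference between a PartitionKey/RowKey lookup and a query on any other property can be illustrated with a toy in-memory model. This is a Python sketch of the access pattern only, not of the Azure service: `ToyTable` and its methods are hypothetical, and "entities examined" is just a counter standing in for the work a real scan would do.

```python
class ToyTable:
    """In-memory model: entities are indexed by (PartitionKey, RowKey) only."""

    def __init__(self):
        self._partitions = {}  # partition_key -> {row_key: entity properties}

    def insert(self, partition_key, row_key, **props):
        self._partitions.setdefault(partition_key, {})[row_key] = props

    def point_lookup(self, partition_key, row_key):
        """Direct lookup by both keys. Returns (entity, entities_examined)."""
        entity = self._partitions.get(partition_key, {}).get(row_key)
        examined = 1 if entity is not None else 0
        return entity, examined

    def scan(self, predicate):
        """Query on a non-key property: every entity has to be examined."""
        examined = 0
        matches = []
        for rows in self._partitions.values():
            for entity in rows.values():
                examined += 1
                if predicate(entity):
                    matches.append(entity)
        return matches, examined


table = ToyTable()
for user in range(100):
    table.insert(f"user-{user}", "token", value=f"t{user}")

entity, examined = table.point_lookup("user-42", "token")
print(entity, examined)

matches, scanned = table.scan(lambda e: e["value"] == "t42")
print(matches, scanned)
```

Both calls find the same entity, but the scan touches all 100 entities to do so, which is the cost the answer above warns about.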
Table Services is mainly a key-value type NOSQL and DocumentDB is (as the name suggests) a Document Type NoSQL store. What you are asking is essentially the difference between these two types of NOSQL approaches. If you shape your research according to this you should be able to get a better understanding for sure.
Just to keep things simple, I suggest you consider the differences between how DocumentDB and Table Services are priced. Not only do the costs of these services vary a lot from each other, but the fact that DocumentDB works on a "provision first" model while Table Services are offered on pure consumption-based pricing might give you some clues for your compare/contrast.
Let me ask you this: why would I use DocumentDB if the features in Table Services serve my needs well? ;) I suggest you take a look at how the current Azure Diagnostics tooling uses Azure Storage Services, and how Storage Metrics uses Azure Storage itself, to get a sense of how useful Table Services can be and how much of an overkill DocumentDB might be in some situations.
Hope this helps.
I think that the comparison is all about trading price for performance. Table Services are just Storage Services, which seem to cap out at 20,000 ops/second, but paying for that kind of throughput all the time (because Storage gives it to us all the time) is $1,200/month. Crazy money.
Table services have simple indexes, so queries are very limited. Good for anything that is written and read via IDs. DocumentDB indexes the entire document, so a query can be done on any property.
And lastly, Table services are bound by the storage constraint of the Storage account it's on (which could get crazy high given negotiation with Microsoft directly), where DocumentDB storage seems unlimited.
So it's a balance. Do you have a LOT of data (hundreds of gigs, or terabytes) that you need in one place? DocumentDB. Do you need to support complex queries? DocumentDB. Do you have data that needs to come and go fast, but based on a 1-to-2 property lookup? Table services. Would you trade having to code around a simple index in order to avoid paying through the nose for throughput? Table services.
And Redis, someone mentioned that... man, I dunno. Even the existence of persistence in a caching framework (which Redis offers) doesn't turn it into a tech of choice... There is a huge difference between a persistent store that holds data that is "often used, but may be missing or time-retired", like a cache would, and a persistent store that guarantees your data to be there.
A real life example:
I have to store some tokens, retrieve them, delete them. Only query ever done will be based on User ID.
So I use Table Storage, as it fulfills my requirement perfectly. I save the token against User ID.
Document DB seemed to be overkill for this.
Here is the answer from Microsoft's official docs:
Common attributes of Cosmos DB, Azure Table Storage, and Azure SQL Database:
99.99% availability SLA
Fully managed database services
ISO 27001, HIPAA and EU Model Clauses Compliant
The following table shows the uncommon attributes of Azure Cosmos DB,
Azure Table Storage

SQL Azure Federation Splitting Design and Querying

I have a few questions regarding Microsoft SQL Azure Federations:
1) Can I create a federated DB on an active database or do I need to deploy federations ahead of time?
2) Do I need to make any changes to the SQL queries to comply with how I query federations, or can I continue to use my regular queries as if I were working against one SQL Server database?
3) When I split my database and after some time I see that one of the shards is very busy and almost full, how do I tackle this problem using federations? - Do I need to split only that single federated table that is 90% full, or do I need to recreate the splitting strategy by using a narrower range? The problem is that one specific user can be very active, so what strategy do I use to make sure that I won't need to re-create the federation strategy due to one very active federated table / user?
4) When I have different tables that I want to split with different primary keys, how will the sharding work then? For example:
From what I understand:
[Blogs]
blog_id
info
[Blog_Posts]
id
blog_id
post_content
So if I decide to shard based on the blog_id from 0-1000, 1-2001 I will have two federated tables. But how many more federated tables will I have if I add more tables that have different keys other than blog_id?
Thanks
Please be more precise and concrete and ask one question at a time. You have better chance for getting an answer to all of the questions when asked separately. Now let me try covering some of your questions.
1) Can I create a federated DB on an active database or do I need to
deploy federations ahead of time?
You can certainly create a Federation (or several) within an existing DB. Federations are not limited to new/empty DBs. However, creating a federation in an active DB will do nothing for you by itself. You have to realize that Federations are separate DBs. A Federation (or Federation member) knows nothing about the Federation Root DB (the DB where you created the federation). So you have to think about migrating schema/data from the active DB (or the Federation Root) once you create your federation.
2) Do I need to make any changes to the SQL queries to comply with how
I query federations, or can I continue to use my regular queries as if
I were working against one SQL Server database?
Most probably YES. Windows Azure SQL Database Federations is a scale-out mechanism for the DB tier. This means that, just as any web application needs a "special" design to work in a farm-like environment (i.e. a scale-out environment like Windows Azure), a database will also need a "special" design to work in a scale-out environment. There is no magic wand with SQL Azure Federations that will make your code work. You have to design it to work.
3) When I split my database and after some time I see that one of the
shards is very busy and almost full, how do I tackle this problem using
federations? - Do I need to split only that single federated table
that is 90% full, or do I need to recreate the splitting strategy by
using a narrower range? The problem is that one specific user
can be very active, so what strategy do I use to make sure that I won't
need to re-create the federation strategy due to one very active
federated table / user?
This is all about partitioning strategy. You have to very carefully design your federation key and how you partition your data across different shards. You can always SPLIT any federation, as long as you keep the Atomic Units in single shard.
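The idea of range-based members that can be split later can be sketched as follows. This is a conceptual Python model, not the Federations T-SQL syntax: `RangeFederation`, `member_for` and `split_at` are hypothetical names, and the boundaries are arbitrary sample values. Because routing depends only on the federation key, all rows that share a key value (one atomic unit) always land in the same member.

```python
import bisect


class RangeFederation:
    """Toy model of range partitioning over an integer federation key."""

    def __init__(self):
        # Sorted low boundaries; member i covers [lows[i], lows[i + 1]).
        self._lows = [0]

    def member_for(self, key):
        """Return the index of the member whose range contains the key."""
        if key < self._lows[0]:
            raise ValueError("key is below the lowest boundary")
        return bisect.bisect_right(self._lows, key) - 1

    def split_at(self, boundary):
        """Split the member containing `boundary` into two at that value."""
        if boundary in self._lows:
            raise ValueError("a member already starts at this boundary")
        bisect.insort(self._lows, boundary)

    def ranges(self):
        """List (low, high) pairs; high is None for the open-ended last member."""
        highs = self._lows[1:] + [None]
        return list(zip(self._lows, highs))


fed = RangeFederation()
fed.split_at(1000)          # two members: [0, 1000) and [1000, open)
print(fed.member_for(999), fed.member_for(1000))

fed.split_at(500)           # only the busy first member is split again
print(fed.ranges())
print(fed.member_for(999))
```

In this model a hot member is handled by splitting just that member. A single very active key cannot be split further, which is why the choice of federation key matters so much.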
4) When I have different tables that I want to split with different
primary keys, how will the sharding work then?
If you want to split different tables on different keys, then you will have different federations, each one with its own federation key and its own tables.
A good video worth watching if you are up for SQL Federations: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI408

Windows Azure and multiple storage accounts

I have an ASP.NET MVC 2 Azure application that I am trying to switch from being single tenant to multi-tenant. I have been reviewing many blogs and posts and questions here on Stack Overflow, but am still trying to wrap my head around the specifics of what's right for this particular app.
Currently the application stores some information in a SQL Azure database, as well as some other info in an Azure Storage Account. I'm considering writing the tenant provisioning code to simply create a new database for a new tenant, along with a new azure storage account. This brings me to the following question:
How will I go about testing this approach locally? As far as I can tell, the local Azure Storage Emulator only has 1 storage account. I'm not sure if I'm able to create others locally. How will I be able to test this locally? Or will it be possible?
There are many aspects to consider with multitenancy, one of which is data architecture. You also have billing, performance, security and so forth.
Regarding data architecture, let's first explore SQL storage. You have the following options available to you: add a CustomerID (or other identifier) that your code will use to filter records, use different schema containers for different customers (each customer has its own copy of all the database objects owned by a dedicated schema in a database), linear sharding (in which each customer has its own database) and Federation (a feature of SQL Azure that offers progressive sharding based on performance and scalability needs). All these options are valid, but have different implications on performance, scalability, security, maintenance (such as backups), cost and of course database design. I couldn't tell you which one to choose based on the information you provided; some models are easier to implement than others if you already have a code base. Generally speaking, a linear shard is the simplest model and provides strong customer isolation, but is perhaps the most expensive of all. A schema-based separation is not too hard, but requires a good handle on security requirements and can introduce cross-customer performance issues because this approach is not shared-nothing (for customers on the same database). Finally, Federations requires the use of a customer identifier and has a few limitations; however, this technology gives you more control over performance distribution and long-term scalability (because, like a linear shard, Federation uses a shared-nothing architecture).
Regarding storage accounts, using different storage accounts per customer is definitely the way to go. The primary issue you will face if you don't use separate storage accounts is performance limitations, such as the maximum number of transactions per second that can be executed using a single storage account. As you point out, testing locally may be a problem; however, consider this: the local emulator does not offer 100% parity with an Azure Storage Account (some functions are not supported in the emulator). So I would only use the local emulator for initial development and troubleshooting. Any serious testing, including multitenant testing, should be done using real storage accounts. This is the only way you can fully test an application.
You should consider not creating separate databases, but instead creating different object namespaces within a single SQL database. Each tenant can have their own set of tables.
Depending on how you are using storage, you can create separate storage containers or message queues per client.
Given these constraints you should be able to test locally with the storage emulator and local SQL instance.
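If you go with one container per tenant inside a single storage account (which also works against the single-account emulator), you need a stable way to derive the container name from a tenant identifier. The sketch below is one possible approach in Python. The `tenant_container_name` function and the "tenant" prefix are hypothetical; it assumes the usual container naming rules (lowercase letters, digits and single hyphens, 3 to 63 characters).

```python
import re


def tenant_container_name(tenant_id, prefix="tenant"):
    """Derive a container name from a tenant identifier (hypothetical scheme)."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", tenant_id.lower()).strip("-")
    if not slug:
        raise ValueError("tenant_id has no usable characters")
    name = f"{prefix}-{slug}"
    if len(name) > 63:
        raise ValueError("container name would exceed 63 characters")
    return name


print(tenant_container_name("Contoso Ltd."))
print(tenant_container_name("ACME__42"))
```

Note that two different tenant identifiers can normalize to the same name under this scheme, so a real provisioning step would also need to check for an existing container before creating one.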
Please let me know if you need further explanation.

Resources