Best practices for storing data with Azure Functions

I've been working a lot with microservices recently, and the common pattern is that every service is responsible for its own data. Thus service "A" cannot access service "B"'s data directly without talking to service "B" via some HTTP API or message queue.
Now I've started to pick up some work with Azure Functions for the first time. I've looked at a fair few examples, and they all seem to have any old function just dabbling with data in a shared data store (which seems like going back to the old style of having a massive monolithic database).
I was just wondering whether there is a common pattern to follow for data storage when using Functions as a Service, and where the responsibilities lie.

The following screenshot shows an example of the event-driven, distributed model of business processors in a cloud-based solution without a monolithic database. More details about this concept and technique can be found in my article Using Azure Lease Blob.
Note that each Business Context has its own Lease Blob for holding the state of its processing, with references to other resources such as metadata, config, data, results, etc. This concept allows you to create a matrix (multi-dimensional) business processing model, where each nested sub-process can have its own Lease Blob.
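The article itself is .NET-oriented, but the core idea is easy to sketch. Below is a rough Python illustration, assuming a hypothetical "state" container and made-up context/resource names (none of this comes from the article): one blob per business context holds that context's processing state as JSON with references to its other resources, and a short lease guards the blob while the state is replaced.

```python
import json

from azure.core.exceptions import ResourceNotFoundError
from azure.storage.blob import BlobServiceClient

# Hypothetical layout: one "state" container, one blob per business context.
service = BlobServiceClient.from_connection_string("<storage connection string>")
container = service.get_container_client("state")

def update_context_state(context_name: str, state: dict) -> None:
    """Persist the processing state of a single business context in its own blob."""
    blob = container.get_blob_client(f"{context_name}/state.json")
    payload = json.dumps(state).encode("utf-8")
    try:
        # A short lease keeps concurrent writers out while the state is replaced.
        lease = blob.acquire_lease(lease_duration=15)
    except ResourceNotFoundError:
        # First write for this context: the blob does not exist yet, so there is nothing to lease.
        blob.upload_blob(payload)
        return
    try:
        blob.upload_blob(payload, overwrite=True, lease=lease)
    finally:
        lease.release()

# The state document is small; it mostly points at the context's other resources.
update_context_state("invoice-processing", {
    "status": "running",
    "metadata": "https://.../config/invoice-metadata.json",
    "results": "https://.../results/2024-01/",
})
```

Each nested sub-process can repeat the same pattern with its own blob, which is what gives the model its multi-dimensional shape.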

Related

Azure Storage Account for Tables

So first of all I'd like to say I'm no DBA or coder; I'm just a regular IT person who works in network and infrastructure support. However, I like to get familiar with technologies in general and understand the basics of how they work, without the implementation-specific details.
I've been reading about Azure Storage Accounts with regard to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, and in doing so I came across other options such as blobs, tables and queues. I've read about them, but I'm trying to understand the main purpose of tables from a coder's point of view.
Correct me if I am wrong: when you code an app with a database, you can put the database on the same or a different server, that server can be on premises or in the cloud, and you link the two together.
As far as I could find out investigating on the web, these tables are NoSQL and have no constraints; you create the tables and data through Visual Studio thanks to an API, and that information is then reflected in your storage.
How is this useful for the app you're developing?
I've been reading about Azure Storage Accounts with regard to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, and in doing so I came across other options such as blobs, tables and queues. I've read about them, but I'm trying to understand the main purpose of tables from a coder's point of view.
As far as I could find out investigating on the web, these tables are NoSQL and have no constraints; you create the tables and data through Visual Studio thanks to an API, and that information is then reflected in your storage.
An Azure Storage Account is a "box" that keeps your Blobs, Tables, Queues and Files organised from a management point of view and for access control. Each storage type is good for its own specific tasks.
If the world had just one super-storage that solved all possible cases for storing, querying and managing data, there would not be such a variety of different databases, storage types, etc. available.
If you need to share files as a "network folder", try Azure Files.
If your coders need database storage, then the first question is: what requirements do they have for that database? What would its purpose be, and so on. Azure in particular has a lot of different database solutions, and again, each of them is good for some specific task and can be a poor choice for others.
As to Azure Tables, from the official docs:
Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design.
So, if your coders do need to store such data, then yes, that would be one of the possible choices.
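To make the "key/attribute store with a schemaless design" concrete, here is a minimal sketch using the Python azure-data-tables package (the same operations exist in the .NET SDK the coders would use from Visual Studio; the table and property names here are made up):

```python
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<storage connection string>")
table = service.create_table_if_not_exists("customers")

# Every entity needs a PartitionKey and a RowKey; everything else is schemaless,
# so two rows in the same table can carry completely different properties.
table.upsert_entity({
    "PartitionKey": "tenant-a",
    "RowKey": "42",
    "Name": "Contoso",
    "Tier": "gold",
})

entity = table.get_entity(partition_key="tenant-a", row_key="42")
print(entity["Name"])
```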
Correct me if I am wrong: when you code an app with a database, you can put the database on the same or a different server, that server can be on premises or in the cloud, and you link the two together.
Correct. But note that you can either run your own server with the database, which you need to manage yourself, or choose a cloud service that provides the database for you and keeps the underlying server and other maintenance activities managed on your behalf, so you don't need to worry about or spend your time on that.
How is this useful for the app you're developing?
It is important to understand what your requirements are for data storage in order to pick the proper one. This question should perhaps be addressed not to you but to your coders, who are building the app and can consolidate their requirements for the data store. Usually they will tell you exactly what they need, and you may give them some ideas or advice about alternatives, if any (that may be a similar solution with extra functionality, a different way the data is stored or processed, more built-in integrations that matter to you, or a decision on whether to keep your own installation or use a cloud-managed service).
For your further possible question, "When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site?", see this thread.
Update based on further questions:
If I develop an application with a database whose tables are on Azure, can I call, let's say, functions or data from it in my main application that is hosted on premises? What's the benefit of doing that versus hosting the tables on premises, other than it being largely scalable and highly available?
Perhaps you need to better understand the relationship between the App (application) and the DB (database). The database is a standalone system which stores the data and replies to incoming queries (it receives a request, processes it and returns the result). Overall, it doesn't matter to the DB who is requesting the data; it is a "passive" system. (There are cases where a DB can trigger further processes in data-processing pipelines, but that is beyond this scope.)
The App, in contrast, is the active party in the App<->DB relationship. (Let's also leave aside more advanced designs where the App is not just one system.) The App receives requests, processes them (possibly making external requests to other "services" if necessary) and gives a response (with or without data) to the requester. In the App<->DB relationship, those external requests are what happens: at some point the App needs some data from the DB, so it makes a request to the DB, obtains the response and continues its own logic.
Where the App server and the DB server are placed is not that important (for simplicity). The important part is whether the DB server is accessible to those requests. The DB can be on premises with a public static IP address, it can be in the cloud on your own server with a public static IP address (sometimes this is achieved in different ways, but let's skip that for simplicity), or it can be a Database-as-a-Service cloud solution, where you do not need to run a server and configure the database yourself, but instead get a URL endpoint which you use to query the DB.
I appreciate the answer, and I pretty much agree with what you're saying.
But my question goes beyond what the requirements are for the developers.
I'll modify the question. If I develop an application with a database whose tables are on Azure, can I call, let's say, functions or data from it in my main application that is hosted on premises? What's the benefit of doing that versus hosting the tables on premises, other than it being largely scalable and highly available?
Azure Storage Tables are the "Notepad" of NoSQL databases. If you want quick and easy key/value pairs, Tables is the way to go. If you are looking for the "Word" of NoSQL in Azure, then Cosmos DB is where it's at. Cosmos DB offers global distribution, better features and a better SLA (see the comparison). Tables are cheaper too.
Azure also supports MySQL, PostgreSQL, MariaDB and MSSQL as PaaS offerings if you wish to use a traditional database.

Service Fabric Stateful services with single DB Persistence service

I'm about to start a project that requires very fast response times and high availability. I have done a few Service Fabric projects before, so I'm feeling pretty confident about that.
I'm currently leaning towards a specific design, based on stateful content services as the main data source, with a single data persistence service saving to a database of some sort.
Read operations are done via a Web API.
Write operations are done via Azure Service Bus communication, with Rebus as the handler.
Content services
The content services are stateful services which, on commit, send a message to the persistence service containing the object saved in the reliable dictionary, serialized as JSON.
The content services themselves will be responsible for JSON deserialization in the event that they need to restore the data.
Restore scenarios could be when the entire dictionary is lost for some reason, or when a reset message is put on the bus.
Persistence service
Receives a message from the bus and stores the included entity in a data store (not yet decided, maybe table storage).
Serves an entire repository of data when a service needs to reload its data.
Only concerns itself with storing and retrieving data, no integrity checks (a rough sketch of this handler follows below).
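The design above is a .NET Service Fabric solution using Rebus, so the following is only a language-agnostic sketch of the persistence handler's core loop in Python (queue, table and property names are hypothetical): receive the JSON-serialized entity from the bus and write it to the store as-is, without integrity checks.

```python
import json

from azure.data.tables import TableServiceClient
from azure.servicebus import ServiceBusClient

table = TableServiceClient.from_connection_string("<storage connection string>") \
    .create_table_if_not_exists("entities")

with ServiceBusClient.from_connection_string("<service bus connection string>") as bus:
    with bus.get_queue_receiver(queue_name="persist") as receiver:
        for message in receiver:
            raw = message.body
            body = raw if isinstance(raw, (bytes, bytearray)) else b"".join(raw)
            entity = json.loads(body)
            # Store the serialized object verbatim; the content services do their own
            # deserialization when they need to restore.
            table.upsert_entity({
                "PartitionKey": entity["collection"],  # hypothetical property names
                "RowKey": str(entity["id"]),
                "Json": body.decode("utf-8"),
            })
            receiver.complete_message(message)
```

A "serve the entire repository" call for the restore scenario is then just a query over the relevant PartitionKey.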
I'm really unsure about whether this is a feasible way of designing a system that also holds a large amount of user data.
What are your thoughts on this design?
I ended up pursuing this solution and it works quite well and performs very well, but it needs extensive testing in order to make sure that everything works as expected.

Multiple Instances of Azure Worker Roles for non-transactional integration tasks

We have an upcoming project where we'll need to integrate with 3rd parties over a variety of transports to get data from them.
Things like WCF endpoints and Web API REST endpoints are fine.
However, in two scenarios we'll need to either pick up auto-generated emails containing XML from a POP3 account or pull the XML files from an external SFTP account.
I'm about to start prototyping these now, but I'm wondering: are there any standard practices, patterns or guidelines for dealing with these non-transactional systems in a multi-instance worker role environment? For example:
What happens if two workers connect to the POP account, or to the same FTP site, at the same time?
What happens if one worker deletes the file from the FTP site while another is mid-download?
Controlling duplication shouldn't be an issue, as we'll be logging everything on the application side to a database and everything should be uniquely identifiable, so we'll be able to add if-not-exists-create-else-skip logic to the workers. But I'm just wondering whether there is anything else I should be considering to make it more resilient/idempotent.
Just thinking out loud: since the data is primarily files and emails, one possible thing you could do is, instead of processing them directly in your worker roles, first save them in blob storage. So there would be some worker role instances which periodically poll the POP3 server / SFTP site, pull the data from there and push it into blob storage. When the blob is written, the same instance can delete the data from the source as well. With this approach you don't have to worry about duplicate records, because the blob will simply be overwritten (assuming each message/file has a unique identifier and the name of the blob is that identifier).
Once the file is in your blob storage, you can write a message to a Windows Azure Queue with details about this blob (maybe the blob URL, etc.). Then, using the 'Get' semantics of Windows Azure Queues, your worker role instances start fetching and processing these messages. Because of the Get semantics, once a message is fetched from the queue it becomes invisible to other callers (worker role instances in this case). This way you can take care of duplicate message processing.
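The question is about classic .NET worker roles, but the shape of this blob-plus-queue hand-off is easy to sketch with today's storage SDKs; here is a rough Python version in which the container, queue and function names are all made up:

```python
from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient

CONN = "<storage connection string>"
blobs = BlobServiceClient.from_connection_string(CONN).get_container_client("incoming-xml")
queue = QueueClient.from_connection_string(CONN, "incoming-xml-events")

def store_and_announce(unique_id: str, xml_bytes: bytes) -> None:
    """Polling instance: save the fetched file under its unique id, then announce it."""
    # Naming the blob by the unique id means re-fetching the same file just overwrites it.
    blobs.upload_blob(unique_id, xml_bytes, overwrite=True)
    queue.send_message(unique_id)

def process_next_batch() -> None:
    """Processing instances: 'Get' semantics hide a message from the others while it is handled."""
    for message in queue.receive_messages(visibility_timeout=300):
        xml = blobs.download_blob(message.content).readall()
        # ... parse and import the XML here ...
        queue.delete_message(message)  # only now does the message disappear for good
```

If a processing instance dies part-way through, the message simply reappears after the visibility timeout and another instance picks it up, which is what makes the hand-off resilient.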
UPDATE
So I'm trying to guard against two competing instances pulling the same file from the SFTP site at the same moment.
For this, I'll pitch my favorite Master/Slave concept :). Essentially the idea is that each instance will try to acquire a lease on a single blob. The instance which acquires the lease becomes the master and the others become slaves. The master fetches the data from SFTP while the slaves wait. I've described this concept in my blog post, which you can read here: http://gauravmantri.com/2013/01/23/building-a-simple-task-scheduler-in-windows-azure/, though the context of the blog post is somewhat different.
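The blog post builds this in .NET against the older storage client, so purely as an illustration of the same idea, lease-based election with the current Python SDK might look roughly like this (the "coordination" container and blob name are invented for the sketch):

```python
from azure.core.exceptions import HttpResponseError, ResourceExistsError
from azure.storage.blob import BlobServiceClient

blob = BlobServiceClient.from_connection_string("<storage connection string>") \
    .get_blob_client("coordination", "sftp-poller-lock")

# Make sure the election blob exists; a lease can only be taken on an existing blob.
try:
    blob.upload_blob(b"", overwrite=False)
except ResourceExistsError:
    pass

def pull_files_from_sftp() -> None:
    ...  # placeholder for the actual SFTP download logic

def try_become_master() -> bool:
    try:
        # Only one instance can hold the lease at a time; everyone else gets a conflict error.
        lease = blob.acquire_lease(lease_duration=60)
    except HttpResponseError:
        return False  # someone else is master; stay a slave and wait for the next attempt
    try:
        pull_files_from_sftp()  # only the master touches the SFTP site
    finally:
        lease.release()
    return True
```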
Have a look at the recently released Cloud Design Patterns guide; you might be able to find the corresponding pattern and sample code for what you need.

Transferring objects between Azure worker roles

I'm trying to find a good way to send objects between the worker roles of two different Azure applications. They are very simple objects (only about 20 properties each) and there could be as many as 10,000 of these objects transferred at a time. I'm thinking Azure Table storage would do the job, but I am not sure if there's anything else out there that would do a better job. I thought about serializing them and using blob storage, but I'd like to know what the correct approach would be.
I think Azure Table Storage is an excellent system for that. If you need to manage which worker role processes individual items, you might consider using a Queue to communicate between the worker roles. Microsoft Patterns and Practices created the "CQRS Journey" set of documentation to formalize how you can manage the flow of data through your Azure services with multiple roles.
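A rough sketch of that combination in Python (the table name, batch id and writer/reader split are hypothetical, not from the answer): the sending role writes the objects as table entities under a single partition, the receiving role queries the partition back out, and a queue message carrying the batch id can then tell the other application which partition to read.

```python
from azure.data.tables import TableServiceClient

table = TableServiceClient.from_connection_string("<storage connection string>") \
    .create_table_if_not_exists("transferobjects")

def push_batch(batch_id: str, objects: list) -> None:
    """Sending worker role: store up to ~10,000 small objects under one partition."""
    for start in range(0, len(objects), 100):  # a table transaction takes at most 100 operations
        chunk = objects[start:start + 100]
        operations = [
            ("upsert", {**obj, "PartitionKey": batch_id, "RowKey": str(start + i)})
            for i, obj in enumerate(chunk)
        ]
        table.submit_transaction(operations)

def read_batch(batch_id: str) -> list:
    """Receiving worker role: pull the whole batch back out in one query."""
    return list(table.query_entities(f"PartitionKey eq '{batch_id}'"))
```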

Windows Azure and multiple storage accounts

I have an ASP.NET MVC 2 Azure application that I am trying to switch from being single tenant to multi-tenant. I have been reviewing many blogs and posts and questions here on Stack Overflow, but am still trying to wrap my head around the specifics of what's right for this particular app.
Currently the application stores some information in a SQL Azure database, as well as some other info in an Azure Storage Account. I'm considering writing the tenant provisioning code to simply create a new database for each new tenant, along with a new Azure storage account. This brings me to the following question:
How will I go about testing this approach locally? As far as I can tell, the local Azure Storage Emulator only has one storage account, and I'm not sure if I'm able to create others locally. Will it even be possible to test this locally?
There are many aspects to consider with multitenancy, one of which is data architecture. You also have billing, performance, security and so forth.
Regarding data architecture, let's first explore SQL storage. You have the following options available to you: add a CustomerID (or other identifier) that your code will use to filter records; use different schema containers for different customers (each customer has its own copy of all the database objects, owned by a dedicated schema in a database); linear sharding (in which each customer has its own database); and Federation (a feature of SQL Azure that offers progressive sharding based on performance and scalability needs). All these options are valid, but they have different implications for performance, scalability, security, maintenance (such as backups), cost and of course database design. I couldn't tell you which one to choose based on the information you provided; some models are easier to implement than others if you already have a code base. Generally speaking, a linear shard is the simplest model and provides strong customer isolation, but it is perhaps the most expensive of all. A schema-based separation is not too hard, but it requires a good handle on security requirements and can introduce cross-customer performance issues, because this approach is not shared-nothing (for customers on the same database). Finally, Federation requires the use of a customer identifier and has a few limitations; however, this technology gives you more control over performance distribution and long-term scalability (because, like a linear shard, Federation uses a shared-nothing architecture).
Regarding storage accounts, using different storage accounts per customer is definitely the way to go. The primary issue you will face if you don't use separate storage accounts is performance limitations, such as the maximum number of transactions per second that can be executed against a single storage account. As you point out, however, testing locally may be a problem; but consider this: the local emulator does not offer 100% parity with an Azure Storage Account (some functions are not supported in the emulator). So I would only use the local emulator for initial development and troubleshooting. Any serious testing, including multitenant testing, should be done using real storage accounts. This is the only way you can fully test an application.
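A small sketch of how the "emulator for development, real accounts for real testing" split tends to look in code (the per-tenant environment-variable convention here is invented for the example): the application only ever sees a connection string, so pointing a tenant at the local emulator or at its own real storage account becomes a pure configuration decision.

```python
import os

from azure.storage.blob import BlobServiceClient

def storage_for_tenant(tenant_id: str) -> BlobServiceClient:
    # The value is the local emulator's connection string during development, or the
    # tenant's dedicated storage account connection string in test and production.
    conn = os.environ[f"STORAGE_{tenant_id.upper()}"]
    return BlobServiceClient.from_connection_string(conn)
```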
You should consider not creating separate databases, but instead creating different object namespaces within a single SQL database. Each tenant can have their own set of tables.
Depending on how you are using storage, you can create separate storage containers or message queues per client.
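For the container-or-queue-per-client layout, provisioning a tenant is just a couple of calls against the single shared account; a minimal sketch, with illustrative names:

```python
from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueServiceClient

CONN = "<single shared storage connection string>"
blob_service = BlobServiceClient.from_connection_string(CONN)
queue_service = QueueServiceClient.from_connection_string(CONN)

def provision_tenant(tenant_id: str) -> None:
    """Create the per-tenant blob container and message queue inside the shared account."""
    name = f"tenant-{tenant_id}".lower()  # container and queue names must be lowercase
    blob_service.create_container(name)
    queue_service.create_queue(name)
```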
Given these constraints you should be able to test locally with the storage emulator and local SQL instance.
Please let me know if you need further explanation.
