Document-centric event scheduling on Azure - azure

I'm aware of the many different ways of scheduling system-centric events in Azure. E.g. Azure Scheduler, Logic Apps, etc. These can be used for things like backups, sending batch emails, or other maintenance functions.
However, I'm less clear on what technology is available for events relating to a large volume of documents or records.
For example, imagine I have 100,000 documents in Cosmos and some of the datetime properties on those documents relate to events: e.g. expiry, reminders, escalations, timeouts, etc. Each record has a different set of dates and times.
What approaches are there to fire off code whenever one of those datetimes is reached?
Stuff I've thought of so far:
Have a scheduled task that runs once per minute and looks for anything relating to that particular minute in Cosmos then does "stuff".
Schedule tasks on Service Bus queues with a future date as-and-when the Cosmos records are created and then have something to receive those messages and do "stuff".
But are there better ways of doing this? Is there a ready-made Azure service that would take away much of the background infrastructure work and just let me schedule a single one-off event at a particular point in time and hit a webhook or something like that?
Am I mis-categorising Azure Scheduler as something that you'd use for a handful of regularly scheduled tasks rather than the mixed bag of dates and times you'd find in 100,000 Cosmos records?
FWIW, in my use-case there isn't really a precision issue - stuff scheduled for 10:05:00 happening at 10:05:32 would be perfectly acceptable, for example.
Appreciate your thoughts.

First of all, Azure Scheduler will be replaced by Azure Logic Apps:
Azure Logic Apps is replacing Azure Scheduler, which is being retired. To schedule jobs, follow this article for moving to Azure Logic Apps instead.
(source)
That said, Azure Logic Apps is one of your options, since you can define a logic app that starts a one-time job by using a delay action. See the docs for details.
It scales very well and you can pay for what you use (or use a fixed pricing model).
Another option is using a Durable Azure Function with a timer in it. Once the timer elapses, you can do your thing. You can use a consumption plan as well, so you pay only for what you use, or you can use a fixed pricing model. It also scales very well, so hundreds of those instances won't be a problem.
In both cases you have to trigger the function or logic app when the Cosmos records are created. Put the due time as context in the trigger and there you go.
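As a rough illustration of "put the due time as context in the trigger", here is a minimal Python sketch that turns the datetime properties of one document into scheduling requests. The field names (`expiry`, `reminder`, `escalation`) and the payload keys are assumptions made up for the example, not part of any Azure API; whatever starts the logic app or orchestration would receive one such payload per event.

```python
from datetime import datetime, timezone

# Hypothetical names of the datetime properties stored on each document.
EVENT_FIELDS = ("expiry", "reminder", "escalation")


def build_schedule_requests(doc, now):
    """Return one scheduling request per event datetime found on the document."""
    requests = []
    for field in EVENT_FIELDS:
        raw = doc.get(field)
        if raw is None:
            continue
        due = datetime.fromisoformat(raw)
        # Clamp to zero so an already-overdue event fires immediately.
        delay = max((due - now).total_seconds(), 0.0)
        requests.append({
            "documentId": doc["id"],
            "event": field,
            "dueTime": due.isoformat(),
            "delaySeconds": delay,
        })
    return requests


now = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
doc = {
    "id": "doc-1",
    "expiry": "2024-01-01T10:05:00+00:00",
    "reminder": "2023-12-31T09:00:00+00:00",  # already in the past
}
requests = build_schedule_requests(doc, now)
```

Passing the absolute due time as well as the delay lets the receiving side recompute the wait if the trigger itself is delivered late.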
Now, given your statement
I'm aware of the many different ways of scheduling system-centric events in Azure. E.g. Azure Scheduler, Logic Apps, etc. These can be used for things like backups, sending batch emails, or other maintenance functions.
That is up to you. You can do anything you want. You don't specify in your question what work needs to be done when the due time is reached but I doubt it is something you can't do with those services.

Related

How to handle an Azure Function rerunning when using message queue binding?

I have a v1 Azure Function that is triggered by a message being written to the Azure Storage message queue.
The Azure Function needs to perform multiple updates to SharePoint Online. Occasionally these operations fail. This results in the message being returned to the queue and being reprocessed.
When I developed the function, I didn't consider that it might partially complete and then restart. I've done a little research and it sounds like I need to modify it to be re-entrant.
Is there a design pattern that I should follow to cater for this without having to add a lot of checks to determine if an operation has already been carried out by a previous execution? Alternatively, is there any Azure functionality that can help (beyond the existing message retries and poison queue)
It sounds like you will need to do some re-engineering. Our team had a similar issue and wrote a home-grown solution years ago. But we eventually scrapped our solution and went with Azure Durable Functions.
Not gonna lie - this framework has some complexity and it took me a bit to wrap my head around it. Check out the function chaining pattern.
We have processing that requires multiple steps that all must be completed. We're spanning multiple data stores (updating Cosmos DB, Azure SQL, Blob Storage, etc.), so there's no support for distributed transactions across multiple PaaS offerings. Durable Functions will allow you to break your process up into discrete steps. If a step fails, an orchestrator will re-run that step based on a retry policy.
So in a nutshell, we use Durable Task Activity functions to attempt each step. If the step fails due to what we think is a transient error, we retry. If it's an unrecoverable error, we don't retry.
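To make the idea concrete without the framework, here is a plain-Python sketch of the same pattern: named steps, a record of which steps have already finished, and retries only for errors classed as transient. The `TransientError` class, the step names, and the in-memory `completed` set are assumptions for illustration; in a real function the set would have to be persisted somewhere durable, which is roughly the bookkeeping Durable Functions takes off your hands.

```python
class TransientError(Exception):
    """An error worth retrying (for example a throttled request)."""


def run_steps(steps, completed, max_attempts=3):
    """Run named steps in order, skipping any already recorded in `completed`."""
    for name, action in steps:
        if name in completed:
            continue  # finished by an earlier execution of the same message
        for attempt in range(1, max_attempts + 1):
            try:
                action()
                break
            except TransientError:
                if attempt == max_attempts:
                    raise  # give up; the message goes back to the queue
            # Any other exception is treated as unrecoverable and propagates.
        completed.add(name)


calls = []
failures = {"update_list": 1}  # this step fails once, then succeeds


def make_step(name):
    def step():
        if failures.get(name, 0) > 0:
            failures[name] -= 1
            raise TransientError(name)
        calls.append(name)
    return step


steps = [(n, make_step(n)) for n in ("create_site", "update_list", "set_permissions")]
completed = {"create_site"}  # pretend a previous run got this far
run_steps(steps, completed)
```

Because finished steps are skipped, re-running the whole message after a partial failure does not repeat work that already succeeded.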

How to process current and previous IoT Hub event from an azure function?

I have a simple scenario where I want to take the diff between current value of a parameter and previous value from IoT hub telemetry messages and attach this result and send to Time Series Insights environment (via an event hub if required so).
How can I achieve this? I am studying about Azure functions but not able to figure out how to exactly go about it.
The minimum timestamp difference between messages is 1 second and only edge devices (at max perhaps 3) will send the telemetry data. Each edge device might be collecting data from around 500 devices.
I am looking for guidance on the logical steps involved and a few critical pieces of Python code.
Are these telemetry messages or property changes? Also what's the scale (number of devices)? To do this effectively you need to ensure you have both the current and previous values, which means storing the last reported value and timestamp externally as it could be a long time between. The Event Hub is not guaranteed to have all past messages (default is 24h), so if there's a long lag between messages it's not the right store to rely on.
Durable Entities can be used to store state (using something similar to the Actor Model). These are persisted in Azure Storage so at extremely high throughput a memory-only calculation option might make sense with delayed persistence, but you can build a memory-caching layer into your function to help if needed. This is likely going to be the best bet for what you want to do.
For most people the performance hit of going to Azure storage and back is minimal and Durable Entities will be the easiest path forward.
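Whatever store you pick, the core logic is small: keep the last timestamp and value per device, and emit the difference when a newer reading arrives. The sketch below uses an in-memory dictionary as a stand-in for the external state (a Durable Entity, a table, a cache); the class name, the key shape, and the choice to ignore late or duplicate readings are assumptions for the example.

```python
class LastValueStore:
    """Keeps the most recent (timestamp, value) per key and computes deltas."""

    def __init__(self):
        self._last = {}

    def diff(self, key, timestamp, value):
        """Return (value delta, seconds elapsed), or None if no delta applies."""
        previous = self._last.get(key)
        if previous is not None and timestamp <= previous[0]:
            return None  # late or duplicate reading: keep the stored state
        self._last[key] = (timestamp, value)
        if previous is None:
            return None  # first reading for this key, nothing to compare with
        return value - previous[1], timestamp - previous[0]


store = LastValueStore()
key = ("edge-1", "device-42")  # hypothetical edge/device identifiers
first = store.diff(key, 100.0, 20.5)
second = store.diff(key, 101.0, 21.0)
late = store.diff(key, 100.5, 99.0)
third = store.diff(key, 103.0, 20.0)
```

The delta can then be attached to the outgoing message before it is forwarded to the event hub that feeds Time Series Insights.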
If you are doing it in a near-real-time stream, the best solution is to use Azure Stream Analytics with the LAG operator. ASA has a bunch of useful features that you will need, such as PARTITION BY and event ordering policies. Beware: ASA can be expensive to run and hard to work with, but it is a good service for commercial solutions.
If you don't need near-real time, a plain ol' Python script that queries (blob) persisted data is a good option, and can be wrapped up in an Azure Function if it doesn't take too long to run.
Azure functions are not recommended for stateful message processing. You simply have insufficient control of the number of function instances running, the size of the batch, etc. So it is impossible to consistently and confidently know what the 'previous' timeseries value is. With Azure functions, you have to develop assuming that concurrency is never going to be an issue, which you cannot do with streaming IoT data.

What's the best solution to execute scheduled tasks for later?

This is not really a code related question, but more of a solutions oriented question.
To provide some context, I have a web application with a serverless Node.js backend which lets users define a recurring time period at which an endpoint will execute and perform some action. For example, a user decides that every 9 days, the action will be fired.
I am well aware of solutions such as GCP Cloud Tasks (which is what I'm currently using) and AWS SQS, but they have limits. For instance, GCP's Cloud Tasks has a max schedule limit of 720 hours (30 days). This means that my users can only schedule tasks at a future date within 720 hours in the future whereas I would prefer to give them the flexibility to schedule tasks for up to one year.
Is there currently a cloud solution that would allow me to perform such a feature?
I am suspecting that this is definitely possible because of Stripe's subscriptions. They allow yearly subscriptions, which are similar to what I'm using, but with an extended limit. I'm not exactly sure how Stripe engineers accomplished this under the hood (while keeping scalability as a requirement since cronjobs are quite expensive), but after googling solutions to no avail, this problem had me wondering if there existed a more appropriate solution or whether my solution is fine as it is.
I'm also aware that there are workarounds to "extend" the Cloud Task 720 hours limit, but I want to explore options before diving into those kind of workarounds. (e.g. Cronjobs for each user would work well but expensive at scale, One cronjob checking for all the current day's tasks to schedule but at scale, the cloud function might time out since I'm using serverless backend, etc.)
You should have a look at Cloud Workflows. For each user request, launch an execution with the parameters you need, such as "delay" or "url to call".
When an execution starts, it waits for the delay and then calls the URL. An execution can last up to 1 year, which fits your requirements.
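If you would rather stay on a queue with a maximum scheduling horizon, the "extend the limit" workaround mentioned in the question can be sketched as a chain of hops: each task either fires the real action or re-enqueues itself for the next hop. This is only a sketch of the scheduling arithmetic; `next_hop` is a made-up helper, and the 720-hour constant is the limit quoted in the question.

```python
from datetime import datetime, timedelta, timezone

MAX_HORIZON = timedelta(hours=720)  # scheduling limit quoted in the question


def next_hop(now, due):
    """Return (schedule_time, is_final) for the next task to enqueue."""
    if due - now <= MAX_HORIZON:
        return max(due, now), True  # close enough: schedule the real action
    return now + MAX_HORIZON, False  # too far: schedule a relay task


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
due = now + timedelta(days=100)
hops = []
while True:
    when, is_final = next_hop(now, due)
    hops.append(when)
    if is_final:
        break
    now = when  # the relay task wakes up here and re-enqueues itself
```

For a due date 100 days out this produces three relay hops at 30-day intervals and a final task at the due time.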

using cloud services to aggregate and group real-time statistics in a time window to trigger notifications

I'm trying to build a real-time achievements processor for things like:
every time there is a new participant in a thread, send a notification to the last 3 participants
group and aggregate activity stream notifications by type per day
This description of event stream processing seems like a good fit for what I need https://en.wikipedia.org/wiki/Event_stream_processing
If the use case were just to update or trigger from single events, I can use one of the many cloud queue or publisher services from amazon or azure, things like Kinesis or SQS and use say an AWS lambda function to process messages from the queue. Azure seems like it offers something called an Event Hub which can act as the data stream broadcaster. Essentially, have a cloud queue of all actions/events and multiple notification processors as subscribers to the events stream(s) and the logic triggers and aggregations and achievement awards are encapsulated in each achievement processor.
However, since I need to group items by some arbitrary rules (each achievement can have many grouping parameters), I can't just simply look at the latest event in the action queue to process each achievement in real-time. Would I have to keep a set in memory to make this efficient? The alternative is to have each achievement processor do a database lookup with every event (e.g. to select all events for the day that match this type) but I'm worried if I do that it will not be very performant. I've heard mention of things like spark streaming and snowplow, so I'm wondering if there is both a pattern and a product on either AWS or Azure cloud services that can be useful to solve this in a very scalable and simple manner - and if the existing data streaming services on azure and aws (event hubs and kinesis) would fit this data-aggregation use case.
Both Azure and AWS now offer something that can fit this use case:
https://azure.microsoft.com/en-us/services/stream-analytics/
and
https://aws.amazon.com/kinesis/analytics/
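To show the kind of state such a stream processor keeps between events, here is a small in-memory Python sketch covering the two examples from the question: notifying the last 3 participants when someone new joins a thread, and counting events by type per day. The class and method names are made up, and "last 3 participants" is interpreted here as the three most recently active users, which is an assumption.

```python
from collections import Counter, defaultdict, deque


class ThreadActivity:
    """Per-thread state needed to evaluate the two example rules."""

    def __init__(self, notify_last=3):
        self._recent = defaultdict(lambda: deque(maxlen=notify_last))
        self._seen = defaultdict(set)
        self.daily_counts = Counter()  # (day, event_type) -> count

    def on_event(self, thread_id, user, event_type, day):
        """Process one event; return the users to notify (possibly empty)."""
        self.daily_counts[(day, event_type)] += 1
        recent = self._recent[thread_id]
        is_new = user not in self._seen[thread_id]
        to_notify = list(recent) if is_new else []
        self._seen[thread_id].add(user)
        if user in recent:
            recent.remove(user)  # move the user to the most-recent position
        recent.append(user)  # deque drops the oldest entry beyond maxlen
        return to_notify


activity = ThreadActivity()
users = ["alice", "bob", "alice", "carol", "dave", "erin"]
results = [activity.on_event("t1", u, "reply", "2024-01-01") for u in users]
```

Keeping only the small per-thread window in memory avoids a database lookup on every event, which is the trade-off the question asks about.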
disclaimer: I'm a Product Manager at Striim
Just for the sake of answering the question, Striim lets you run SQL queries on lives streams of data, aggregate it with time/count/hybrid windows, and trigger alerts. It's horizontally scalable as well.
Striim is available on both the Azure and AWS marketplaces. The other nice thing is that the same pipeline can easily be transferred between clouds and also run on premises.

Architecting multi-service enterprise applications using Azure cloud services

I have some questions regarding architecting enterprise applications using azure cloud services.
Back Story
We have a system made up of about a dozen WCF Windows Services on a SQL backend. We currently have about 10 clients but expect that to grow to potentially a hundred with perhaps a hundred fold increase in the throughput demands on the system. The current system is poorly engineered and is simply not capable of scaling. So now appears to be the appropriate juncture to reengineer on the azure platform.
Process Flow
Let me briefly describe a simplified set of the services and the process flow and then ask some questions I have regarding utilizing azure cloud services to build the new system.
Service A is logged on to an external system and downloads data continuously
Service B is logged on to a second external system and downloads data continuously
There can only ever be one logged in instance each of services A and B.
Both A and B hand off their data to Service C which reconciles the data from the two external sources.
Validated and reconciled data is then passed from C to Service D which performs some accounting functions and then passes the resulting data to Services E and F.
Service E is continually logged in to an external system and uploads data to it.
Service F generates reports and publishes them to clients via FTP etc
The system is actually far more complex than this but the above illustrates the processes involved. The system runs 24 hours a day 6 days a week. Queues will be used to buffer messaging between all the services.
We could just build this system using Azure persistent VMs and utilise the service bus, queues etc., but that would tie us in to a vertical scaling strategy. How could we utilise cloud services to implement it, given the following questions?
Questions
Given that Service A, B and E are permanently logged in to external systems there can only ever be one active instance of each. If we implement these as single instance worker roles there is the issue with downtime and patching (which is unacceptable). If we created two instances of each is there a standard way to implement active-passive load balancing with worker roles on azure or would we have to build our own load balancer? Is there another solution to this problem that I haven’t thought of?
Services C and D are good candidates to scale using multiple worker role instances. However, each instance would have to process related data. For example, we could have 4 instances each processing data for 5 individual clients. How can we get messages to be processed in groups (client centric) by each instance? Also, how would we redistribute load from one instance to the remaining instances when patching takes place etc.? For example, if instance 1, which processes data for 5 clients, goes down for OS patching, the data for its clients would then have to be processed by the remaining instances until it came back up again. Similarly, how could we redistribute the load if we decide to spin up additional worker roles?
Any insights or suggestions you are able to offer would be greatly appreciated.
Mat
Question #1: you will have to implement your own load-balancing. This shouldn't be terribly complex as you could use Blob storage Lease functionality to keep a mutex on some blob in the storage from one instance while holding the connection active to your external system. Every X period of time you could renew the lease if you know that connection is still active and successful. Every other worker in the Role could be checking on that lease to see if it expires. If it ever expires, the next worker would jump in and acquire the lease, and then open the connection to the external source.
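The lease logic described above can be sketched in a few lines of Python. `LeaseStore` is an in-memory stand-in for the blob lease (not the Azure Storage SDK), time is passed in explicitly so the behaviour is easy to follow, and the 60-second duration is an assumed value.

```python
class LeaseStore:
    """In-memory stand-in for a blob lease: one holder at a time, with expiry."""

    def __init__(self, duration):
        self.duration = duration
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, worker, now):
        if self.holder is None or now >= self.expires_at:
            self.holder, self.expires_at = worker, now + self.duration
            return True
        return False

    def renew(self, worker, now):
        if self.holder == worker and now < self.expires_at:
            self.expires_at = now + self.duration
            return True
        return False


def tick(store, worker, now, connection_ok=True):
    """One polling cycle; returns True if this worker is the active instance."""
    if store.holder == worker:
        if not connection_ok:
            return False  # stop renewing so the lease can lapse
        if store.renew(worker, now):
            return True
    return store.try_acquire(worker, now)


store = LeaseStore(duration=60.0)
log = [
    tick(store, "A", 0.0),    # A acquires the lease and connects
    tick(store, "B", 0.0),    # B stays passive
    tick(store, "A", 30.0),   # A renews; lease now runs until t=90
    tick(store, "A", 60.0, connection_ok=False),  # A's connection has failed
    tick(store, "B", 60.0),   # lease not yet expired, B keeps waiting
    tick(store, "B", 90.0),   # lease expired, B takes over
    tick(store, "A", 90.0),   # A is passive now
]
```

The gap between the failure at t=60 and the takeover at t=90 is the worst-case failover delay, so the lease duration and polling interval together set how long the external connection can be down.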
Question #2: Look into Azure Service Bus. It has a capability to allow clients to process related messages. More info here: http://geekswithblogs.net/asmith/archive/2012/04/02/149176.aspx
All queuing methodologies imply that if a message gets picked up but does not get processed within a configurable amount of time, it goes back onto the queue so that the next available instance can pick it up and process it.
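That redelivery behaviour is what lets the remaining instances pick up the work of an instance that goes down mid-message. A toy in-memory version, with made-up names and an assumed 30-second timeout, looks like this:

```python
class VisibilityQueue:
    """Toy queue where a received message is hidden until a timeout passes."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._messages = []  # each entry is [body, invisible_until]

    def put(self, body):
        self._messages.append([body, 0.0])

    def receive(self, now):
        """Return the first visible message and hide it for `timeout` seconds."""
        for entry in self._messages:
            if now >= entry[1]:
                entry[1] = now + self.timeout
                return entry[0]
        return None

    def complete(self, body):
        """Delete a message once it has been processed successfully."""
        self._messages = [m for m in self._messages if m[0] != body]


queue = VisibilityQueue(timeout=30.0)
queue.put("client-7/batch-1")
taken = queue.receive(now=0.0)       # instance 1 picks it up, then goes down
hidden = queue.receive(now=10.0)     # still invisible to other instances
retaken = queue.receive(now=30.0)    # timeout passed, instance 2 gets it
queue.complete(retaken)
empty = queue.receive(now=100.0)
```

Because the same message can be delivered more than once, the processing in services C and D needs to tolerate duplicates.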
You can use something like AzureWatch to monitor the depth of your queues (storage or service bus) and auto-scale number of instances in your C and D roles to match; and monitor instance statuses for roles A, B and E to make sure there is always a healthy instance there and auto-scale if quantity of ready instances drop to 0.
HTH
First, back up a step. One of the first things I do when looking at application architecture on Windows Azure is to qualify whether or not the app is a good candidate for migration to Windows Azure. I particularly look at how much integration is in the application — integration is always more difficult than expected, doubly so when doing it in the cloud. If most of your workload needs to be done through a single, always-on connection, then you are going to struggle to get the availability and scalability that we turn to the cloud for.
Without knowing the detail of your application, but by way of example, assume services A & B are feeds from a financial data provider. Providers of data feeds are really good at what they do, have high availability, and provide 'enterprise grade' (whatever that means) service for enterprise grade costs. Their architectures are also old-school and, in some cases, very rigid. So first off, consider asking your feed provider (that gives you a login/connection and expects you to pull data) to push data to you via a web service. Exposed web services are the solution to scaling and performance, and are used from table storage on Azure, to high throughput database services like DynamoDB. (I'll challenge any enterprise data provider to explain how a service like Amazon S3 is mickey-mouse.) If your data supplier pushed data to a web service via an agreed API, you could perform all sorts of scaling and availability on the service for a low engineering cost.
Your alternative is, as you are discovering, to build a whole lot of stuff to make sure that your architecture fits in with the single-node model of your data supplier. While it can be done, you are going to spend a lot of engineering cash on hand-rolling a whole bunch of distributed computing principles. If you are going to have an active-passive architecture, you need to implement a leader election algorithm in order to determine when a passive node should become active. This is not as trivial as it sounds as an active node may look like it has disappeared, but is still processing — and you don't want to slot another one in its place. So then you will implement a heartbeat, or even a separate 'witness' node that does nothing other than keep an eye on which nodes are alive in order to do something about them. You mention that downtime and patching is unacceptable. So what is acceptable? A few minutes or a few seconds, or less than a second? Do you want the passive node to take over from where the other left off, or start again?
You will probably find that the development cost to implement all of this is higher than the cost of building and hosting a highly available physical server. Perhaps you can separate the loads and run the data feed services in a co-lo on a physical box, and have the heavy lifting of the processing done on Windows Azure. I wouldn't even look at Azure VMs, because although they don't recycle as much as roles, they are subject to occasional problems, at least more than enterprise-grade hardware. Start off with discussions with your supplier of the data feeds. They may have a solution, or one that can be cobbled together (e.g. two logins for the price of one, and the 'second' account/instance mostly throws away its data).
Be very careful of traditional enterprise integration. They ask for things that seem odd in today's cloud-oriented world. I've had a request that my calling service have a fixed ip address, for example. You may find that the code that you have to write to work around someone else's architecture would be better spent buying physical servers. Push back on the data providers — it is time that they got out of the 90s.
[Disclaimer] 'Enterprises', particularly those that are in financial services, keep saying that their requirements are special — higher throughput, higher security, high regulations and higher availability. With the exception of a very few cases (e.g. high frequency trading), I tend to call 'bull' on most of this. They are influenced by large IT budgets and vendors of expensive kit taking them to fancy lunches, and are indoctrinated to their server-hugging beliefs. My individual view on the enterprise hardware/software/services business has influenced this answer. Your mileage may vary.
