Designing a timer-triggered processor that relies on data from events - Azure

I am trying to design a timer-triggered processor (all in Azure) that will process a set of records laid out for it to consume. It will group the records by a column, create files from the groups, and write those files to a blob container. The records it consumes are generated in response to an event: when the event is raised it contains a key, which is used to query different services for the data that makes up the record.
This is what I am thinking currently:
1. An event is raised to an Event Grid topic.
2. An Azure Function (ConsumerApp) is event triggered; it reads the key, calls a service API to get all the data, and stores that record in a storage table with a flag marking it ready to be consumed.
3. An Azure Function (ProcessorApp) is timer triggered; it reads from the storage table, groups the records by another column, and creates and dumps the files. It can then mark the records as processed, if they have not already been updated by ConsumerApp.
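Roughly, here is a minimal sketch of what I have in mind, using the Azure Functions Python v2 programming model (the staging table name, the group column, and fetch_record_from_service are placeholders, not an existing implementation):

    import json
    import os

    import azure.functions as func
    from azure.data.tables import TableClient

    app = func.FunctionApp()
    STORAGE_CONN = os.environ["STORAGE_CONNECTION_STRING"]   # placeholder app setting


    def fetch_record_from_service(key: str) -> dict:
        """Placeholder for the service API call that assembles the record for a key."""
        raise NotImplementedError


    # ConsumerApp: fires per Event Grid event, pulls the data and stages it as 'Ready'.
    @app.event_grid_trigger(arg_name="event")
    def consumer_app(event: func.EventGridEvent):
        key = event.get_json()["key"]               # assumes the event payload carries the key
        record = fetch_record_from_service(key)
        table = TableClient.from_connection_string(STORAGE_CONN, table_name="staging")
        table.upsert_entity({
            "PartitionKey": record["groupColumn"],  # the column the processor groups on
            "RowKey": key,
            "Payload": json.dumps(record),
            "Status": "Ready",
        })


    # ProcessorApp: runs on a schedule, groups the ready rows, writes one file per group.
    @app.timer_trigger(schedule="0 */15 * * * *", arg_name="timer")
    def processor_app(timer: func.TimerRequest):
        table = TableClient.from_connection_string(STORAGE_CONN, table_name="staging")
        groups: dict = {}
        for entity in table.query_entities("Status eq 'Ready'"):
            groups.setdefault(entity["PartitionKey"], []).append(entity)
        # ...write each group to a blob, then mark or delete the staged rows.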
Apart from suggestions for a better way to do this overall, my questions are:
1. The table storage is going to fill up quickly, which will in turn slow down reading the 'ready' records. Is there a better way to store this intermediate, temporary data? One option I considered is to flush the table regularly, or to have the consumer app delete each record instead of marking it as 'processed'.
2. The service API is called once per event, which might put strain on that service and its database. Should I batch the calls into a single API request, since the processor only runs at a set interval, or is there a better approach here?
Any feedback on this approach, or on a different design, will be appreciated.

If you don't have to process the data from step 2 individually, you can save it in a blob as well and record only the blob path in Azure Table Storage, to keep the row count minimal.
Azure Table Storage has partitions that you can use to partition your data and keep your read operations fast; a partition scan is faster than a table scan. In addition, Azure Table Storage is cheap, but if pricing is a concern you can write a cleanup function that periodically removes the processed rows. Keeping processed rows around for a reasonable time is usually a good idea, because you may need them for debugging issues.
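For example, a minimal cleanup sketch with the azure-data-tables package (the table name, status column, and seven-day retention window are assumptions):

    from datetime import datetime, timedelta, timezone

    from azure.data.tables import TableClient

    def cleanup_processed_rows(conn_str: str, table_name: str = "staging", keep_days: int = 7):
        """Delete 'Processed' rows older than the retention window."""
        table = TableClient.from_connection_string(conn_str, table_name=table_name)
        cutoff = datetime.now(timezone.utc) - timedelta(days=keep_days)
        old_rows = list(table.query_entities(
            "Status eq 'Processed' and Timestamp lt @cutoff",
            parameters={"cutoff": cutoff},
        ))
        for row in old_rows:
            table.delete_entity(partition_key=row["PartitionKey"], row_key=row["RowKey"])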
By batching multiple calls into a single call you can reduce network I/O delay, but resource contention will remain at the service level. You can try moving that API to a separate service, if possible, so it can be scaled independently.
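If the service API offers (or can be given) a batch endpoint, the batching could look something like this; the URL and payload shape below are hypothetical:

    import requests

    def fetch_records(keys, batch_size=50):
        """Fetch records for many keys in a few round trips instead of one call per event."""
        records = []
        for i in range(0, len(keys), batch_size):
            chunk = keys[i:i + batch_size]
            resp = requests.post(
                "https://service.example.com/records/batch",   # hypothetical batch endpoint
                json={"keys": chunk},
                timeout=30,
            )
            resp.raise_for_status()
            records.extend(resp.json()["records"])
        return records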

Related

Reading change feed from specific date/time in multiregion account

We have an API whose configuration is stored in a container in Cosmos DB. We are considering using the Cosmos change feed, via a change feed processor, to subscribe to configuration changes so we can remove configurations from the cache when they change. We have deployments in multiple Azure regions, so our account is a multi-region write account. Now, I read in the documentation that
Starting the change feed processor at a specific date and time is not supported in multi-region write accounts.
What does this mean in practice? Will the processor read and handle all changes from the beginning every time the API process is restarted? Is there any way to work around this limitation?
Your Cosmos DB account either has one write region (with as many read-region replicas as you want) or has all regions acting as both write and read regions. Reference: https://learn.microsoft.com/azure/cosmos-db/sql/how-to-multi-master
You can start a change feed processor with 3 different starting points:
Now
The beginning of the collection lifetime
Some particular point in time
This note means that if your account has multiple write regions (instead of a single write region), you can only start a change feed from Now or from the beginning; you cannot start a change feed from a specific point in time.
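For illustration, with the Python SDK's pull model the two starting points that remain available on a multi-region write account look like this (endpoint, key, database, and container names are placeholders):

    from azure.cosmos import CosmosClient

    # Placeholder endpoint, key, database and container names.
    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("config-db").get_container_client("configurations")

    # Option 1: start from the beginning of the container's lifetime.
    changes = container.query_items_change_feed(is_start_from_beginning=True)

    # Option 2 (default): start from "now", i.e. only changes made after this call.
    # changes = container.query_items_change_feed()

    for item in changes:
        print(item["id"])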

Scale CosmosDB binding for Azure Functions per logical partition

I would like my Azure Function to scale per logical partition instead of per physical partition. I've tested the Azure Functions binding and it does scale out when I have multiple physical partitions (in my test I needed to increase our RUs from 2000 to 20000). But I don't need that many RUs, since I'm using the container as an event store: I'm not querying the data, just processing each message through my Azure Function. So I'm wondering if there is a way to let Azure Functions scale out per logical partition. I see that in the new v3 lib there is a ChangeFeedOptions.PartitionKey property, but that class is internal and I'm not sure it does what I want.
I basically want to have as many Azure Functions running as there are new messages grouped per logical partition. What would be the best way to achieve that?
As of today this is not possible. It's not up to the user of the CF SDK to do the lease management; the CF SDK does that for us and there is nothing we can do to change it.
The only way to theoretically have one lease per logical partition is to have a logical partition big enough to occupy an entire physical partition. That, however, means you are about to hit 10 GB of data in a single partition, which would be the bigger concern at that point.
I wouldn't worry about the scaling, though. The CF will spawn as many leases as it needs to scale seamlessly, and this scaling depends solely on the volume of data in the database and the amount of RUs allocated.

Move data between EventHubs in different regions

I have web apps spread out across a number of different regions. Each app puts data into a region-local event hub. After this I want to collect all the data in a central event hub so I can process it all in one place. What is the best way to move data from one event hub to another? Each region produces on the order of 1000 messages per second that need to go into the hubs.
Ideas I have tried:
1. Let the web app write directly to the central event hub. The downside is that the connection between regions can be bad; every day I would get a lot of timeouts between Southeast Asia and North Europe.
2. Use a Stream Analytics job to move data from one hub to the other. This seems to work OK, except that it is not 100% reliable under high load: my job stopped for no reason and had to be manually restarted (after 15 minutes of downtime) to work again.
While my first answer would have been to try your #2 above, it didn't work for you (for whatever reason; I haven't tried Stream Analytics myself), so you pretty much know what you have to do: copy data from one event hub to the other.
So write an Event Hub consumer that copies each message from one event hub to the other, potentially wrapping it in an envelope if you need to bring some of the metadata along with it (the enqueued time, for example). If your destination event hub goes down, just keep retrying, and don't commit progress until you succeed in sending the message (since, unless you parse the bodies, you shouldn't have poison messages). Whichever solution you use, you're going to get duplicate messages arriving in the central event hub, so plan for that by including unique IDs inside the payload or otherwise designing for idempotency.
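A bare-bones sketch of such a copier with the azure-eventhub Python package (connection strings, hub names, and the checkpoint blob container are placeholders):

    import json

    from azure.eventhub import EventData, EventHubConsumerClient, EventHubProducerClient
    from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

    # Placeholder connection strings, hub names and checkpoint container.
    checkpoint_store = BlobCheckpointStore.from_connection_string(
        "<storage-connection-string>", "copier-checkpoints")
    producer = EventHubProducerClient.from_connection_string(
        "<central-namespace-connection-string>", eventhub_name="central-hub")
    consumer = EventHubConsumerClient.from_connection_string(
        "<local-namespace-connection-string>", consumer_group="$Default",
        eventhub_name="local-hub", checkpoint_store=checkpoint_store)


    def on_event(partition_context, event):
        # Wrap the original body in an envelope that carries some metadata along.
        envelope = EventData(json.dumps({
            "enqueued_time": event.enqueued_time.isoformat(),
            "source_partition": partition_context.partition_id,
            "body": event.body_as_str(),
        }))
        batch = producer.create_batch()
        batch.add(envelope)
        producer.send_batch(batch)                   # raises on failure, so nothing is dropped silently
        partition_context.update_checkpoint(event)   # commit progress only after the send succeeded


    with consumer, producer:
        consumer.receive(on_event=on_event, starting_position="-1")   # "-1" = from the start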
Obviously ensure that the central event hub has enough partitions to handle the load from all the other ones, and you'll certainly want plenty of partitions on the local hubs too, since 1000 messages/second is the per-partition write limit.
You still have to choose whether to run the copier locally or centrally; my inclination is locally, but you can test it both ways with the same code (though your commit/offset tracker should probably live wherever the copier runs).
So yes, stuff can go down; just make sure to start it up again, preferably automatically, when it does (and put monitoring in place for how far behind your copying processes are). It would be great if Stream Analytics did this reliably enough, but alas.
You also have choices in how partitions are assigned to copier workers. Constant assignment is not a bad choice if the workers are guaranteed to restart quickly (i.e. they run on something managed that keeps X instances alive). Automatic partition assignment seems somewhat likely to leave partitions forgotten for brief periods before rebalancing, but choose your poison.

Creating a multi-threaded Windows service

I need guidance on creating a multi-threaded Windows service.
The service needs to read records from a database table and save them to another service (ServiceB). The table might contain thousands of records; the service should read them 100 at a time, do some processing, save the batch to ServiceB, then take the next 100 records from the table and process them, and so on until it has worked through all the records.
At the same time, the service will get results back from ServiceB and update the table with those results.
So the service needs to do two things simultaneously.
Any idea of the best method to do this? Any help is appreciated.
I think you need to look at the Thread class. Microsoft has a good tutorial for Thread.
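As an illustration of that pattern (sketched here in Python for brevity; the same two-thread structure applies in a .NET Windows service, and the data-access and ServiceB calls below are stubs):

    import threading
    import time

    BATCH_SIZE = 100
    stop = threading.Event()

    # Stubs standing in for the real table access and the ServiceB client.
    def read_unprocessed_records(count):
        return []                        # query the table for the next `count` rows

    def send_to_service_b(records):
        pass                             # push the processed batch to ServiceB

    def get_results_from_service_b():
        return []                        # pull any finished results from ServiceB

    def update_table_with_result(result):
        pass                             # write the result back to the table


    def push_batches():
        """Thread 1: read 100 records at a time, process them, send them to ServiceB."""
        while not stop.is_set():
            batch = read_unprocessed_records(BATCH_SIZE)
            if not batch:
                break
            send_to_service_b(batch)


    def pull_results():
        """Thread 2: poll ServiceB for results and write them back to the table."""
        while not stop.is_set():
            for result in get_results_from_service_b():
                update_table_with_result(result)
            time.sleep(5)


    threading.Thread(target=push_batches, name="push").start()
    threading.Thread(target=pull_results, name="pull").start()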

How to implement critical section in Azure

How do I implement critical section across multiple instances in Azure?
We are implementing a payment system on Azure.
Whenever an account balance is updated in SQL Azure, we need to make sure that the value is 100% correct.
But we have multiple web roles running, so they can service two requests concurrently, from different customers, that both update the current balance for a single product. Both instances may read the old amount from the database at the same time, both add the purchase to that old value, and both store the new amount in the database. Whoever saves first will have their change overwritten. :-(
Thus we need to implement a critical section around all updates to the account balance in the database. But how do we do that in Azure? Guides suggest using Azure storage queues for inter-process communication. :-)
They ensure that a message does not get deleted from the queue until it has been processed.
Even if a process crashes, we can be sure the message will be processed by the next process (as Azure guarantees to launch a new process if something hangs).
I thought about running a singleton worker role to service requests on the queue, but Azure does not guarantee good uptime unless you run at least two instances in parallel. Also, when I deploy new versions to Azure, I would have to stop the running instance before I can start a new one. Our application cannot accept the "critical section worker role" failing to process queue messages within 2 seconds.
Thus we would need multiple worker roles to keep downtime sufficiently small, in which case we are back to the same problem of implementing critical sections across multiple instances in Azure.
Note: if an update transaction has not completed within 2 seconds, we should roll it back and start over.
Any idea how to implement a critical section across instances in Azure would be deeply appreciated.
Doing synchronisation across instances is a complicated task, and it's best to try to think around the problem so you don't have to do it.
In this specific case, if it is as critical as it sounds, I would just leave this up to SQL Server (it's pretty good at dealing with data contention). Rather than have the instances say "the new total value is X", call a stored procedure in SQL where you simply pass in the value of this transaction and the account you want to update. Something basic like this:
    UPDATE Account
    SET AccountBalance = AccountBalance + @TransactionValue
    WHERE AccountId = @AccountId
If you need to update more than one table, do it all in the same stored procedure and wrap it in a SQL transaction. I know it doesn't use any sexy technologies or frameworks, but it's much less complicated than any alternative I can think of.
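Calling such a procedure from application code then stays trivial; a hedged sketch with pyodbc, where the connection string and procedure name are assumptions:

    import pyodbc

    # Placeholder connection string and stored procedure name.
    CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=<server>.database.windows.net;Database=<db>;..."

    def apply_transaction(account_id, amount):
        conn = pyodbc.connect(CONN_STR)
        try:
            # Pass the delta, not a precomputed total, so SQL applies it atomically.
            conn.execute(
                "EXEC dbo.ApplyAccountTransaction @AccountId = ?, @TransactionValue = ?",
                account_id, amount,
            )
            conn.commit()
        finally:
            conn.close()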
