I want to get job change notifications from Azure Media Services using a Storage queue. I read the statement "It is also possible that some state change notifications get skipped" here.
Can someone tell me how reliable this service is? Is it a very small percentage of all messages that get skipped, or is it quite noticeable?
We do not provide a guarantee on delivery, but I would say that it is highly reliable.
There are cases where notifications can get dropped (for example, a storage outage that prevents Media Services from delivering notifications to the queue for an extended period of time). I would recommend having a fallback plan: if a job hasn’t been updated by a notification for some expected duration, have backup code GET the job details from the API directly.
The “expected duration” is a little bit difficult to predict since job processing times vary by operation and input content size/complexity. There also can be queuing depending on the number of jobs submitted and the number of reserved units you have.
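As a rough illustration of that fallback (a sketch only, using the v3 azure-mgmt-media Python SDK; the resource group, account, transform, and job names are placeholders):

```python
# Fallback: if no notification has arrived for a job within the expected window,
# GET the job from the API directly and read its state.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.media import AzureMediaServices

client = AzureMediaServices(DefaultAzureCredential(), "<subscription-id>")

def check_stale_job(last_notification_utc: datetime,
                    expected_duration: timedelta = timedelta(minutes=30)):
    """Poll the job state if notifications have gone quiet for too long."""
    if datetime.now(timezone.utc) - last_notification_utc < expected_duration:
        return None  # notifications are still flowing; nothing to do
    job = client.jobs.get("<resource-group>", "<media-account>",
                          "<transform-name>", "<job-name>")
    return job.state  # e.g. Queued, Scheduled, Processing, Finished, Error
```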
Hope that helps,
John
I have a simple scenario: I want to take the diff between the current value of a parameter and its previous value from IoT Hub telemetry messages, attach this result, and send it to a Time Series Insights environment (via an event hub if required).
How can I achieve this? I am studying Azure Functions but am not able to figure out exactly how to go about it.
The minimum timestamp difference between messages is 1 second and only edge devices (at max perhaps 3) will send the telemetry data. Each edge device might be collecting data from around 500 devices.
I am looking for guidance on the logical steps involved and a few critical pieces of Python code.
Are these telemetry messages or property changes? Also, what's the scale (number of devices)? To do this effectively you need to ensure you have both the current and previous values, which means storing the last reported value and timestamp externally, as it could be a long time between messages. The Event Hub is not guaranteed to have all past messages (the default retention is 24 hours), so if there's a long lag between messages it's not the right store to rely on.
Durable Entities can be used to store state (using something similar to the Actor Model). These are persisted in Azure Storage, so at extremely high throughput a memory-only calculation with delayed persistence might make sense, but you can build a memory-caching layer into your function to help if needed. This is likely going to be the best bet for what you want to do.
For most people the performance hit of going to Azure storage and back is minimal and Durable Entities will be the easiest path forward.
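To make that concrete, here is a minimal sketch of a durable entity in Python that keeps the last reported value per device and hands back the diff when a new reading arrives (the operation names and payload shape are my own assumptions, not a prescribed design):

```python
import azure.durable_functions as df


def entity_function(context: df.DurableEntityContext):
    # One entity instance per device; state is the last reported value and timestamp.
    state = context.get_state(lambda: {"value": None, "timestamp": None})

    if context.operation_name == "report":
        reading = context.get_input()  # assumed shape: {"value": 42.0, "timestamp": "..."}
        previous = state["value"]
        diff = None if previous is None else reading["value"] - previous
        context.set_state({"value": reading["value"], "timestamp": reading["timestamp"]})
        # Returned to the calling orchestrator, which could forward it to
        # an event hub / Time Series Insights.
        context.set_result(diff)
    elif context.operation_name == "get":
        context.set_result(state)


main = df.Entity.create(entity_function)
```

An orchestrator (triggered per incoming telemetry batch) would call the entity keyed by device ID, so each device's previous value lives in its own small piece of state.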
If you are doing it on a near-real-time stream, the best solution is to use Azure Stream Analytics with the LAG operator. ASA has a bunch of useful features that you will need, such as PARTITION BY and event ordering policies. Beware: ASA can be expensive to run and hard to work with, but it is a good service for commercial solutions.
If you don't need near-real time, a plain old Python script that queries (blob-)persisted data is a good option, and it can be wrapped up in an Azure Function if it doesn't take too long to run.
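If you go that route, a rough sketch of such a script might look like this (assuming the telemetry is persisted as line-delimited JSON; the container, path, and field names are placeholders):

```python
import json

import pandas as pd
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<storage-connection-string>", container_name="telemetry")

# Pull the persisted readings for one device, oldest to newest.
records = []
for blob in container.list_blobs(name_starts_with="device-01/"):
    data = container.download_blob(blob.name).readall()
    records.extend(json.loads(line) for line in data.splitlines() if line)

frame = pd.DataFrame(records).sort_values("timestamp")
frame["diff"] = frame["value"].diff()  # difference from the previous reading
print(frame[["timestamp", "value", "diff"]].tail())
```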
Azure Functions are not recommended for stateful message processing. You simply have insufficient control over the number of function instances running, the size of each batch, etc., so it is impossible to consistently and confidently know what the 'previous' time-series value is. With Azure Functions, you have to develop as if concurrency will never be an issue, which you cannot assume with streaming IoT data.
I have a system where losing messages from Azure Service Bus would be a disaster, that is, the data would be lost forever with no practical means to repair the damage without major disruption.
Would I ever be able to rely on ASB entirely in this situation? (even if it was to go down for hours, it would always come back up with the same messages it was holding when it failed)
If not, what is a good approach for solving this problem where messages are crucial and cannot be lost? Is it common to simply persist the message to a database prior to publishing so that it can be referred to in the event of a disaster?
Azure Service Bus doesn’t lose messages. Once a message is successfully received by the broker, it’s there and doesn’t go anywhere. Where things usually go wrong is with message receiving and processing, which is custom user code. That’s one of the reasons why Service Bus has the PeekLock receive mode and dead-lettering based on delivery count.
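To illustrate the PeekLock pattern (a sketch with the azure-servicebus Python SDK; the queue name, connection string, and process() are placeholders): the message is only deleted when your code completes it, otherwise it is redelivered, and after MaxDeliveryCount failed deliveries it moves to the dead-letter queue.

```python
from azure.servicebus import ServiceBusClient, ServiceBusReceiveMode

client = ServiceBusClient.from_connection_string("<service-bus-connection-string>")

with client, client.get_queue_receiver(
        queue_name="orders",  # placeholder
        receive_mode=ServiceBusReceiveMode.PEEK_LOCK) as receiver:
    for message in receiver.receive_messages(max_message_count=10, max_wait_time=5):
        try:
            process(message)                    # your custom code; this is where things go wrong
            receiver.complete_message(message)  # only now is the message removed from the queue
        except Exception:
            receiver.abandon_message(message)   # lock released; the broker will redeliver it
```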
If you need strong guarantees, don’t reinvent the wheel yourself. Use messaging frameworks that do it for you, such as NServiceBus or MassTransit.
With Azure Service Bus you can do this and be 99.99% sure;
in the worst case you will find your message in the dead-letter queue, but it will never be deleted.
Another choice is to use an Azure Storage Queue and set the TTL to -1, which gives the message an infinite lifetime.
But because I'm a little bit old school and want to be 101% sure, I would suggest a manual solution using Azure Table Storage,
so it's you who decides when to add, delete or update a row, given the criticality of the information and data you work with.
I'm aware of the many different ways of scheduling system-centric events in Azure. E.g. Azure Scheduler, Logic Apps, etc. These can be used for things like backups, sending batch emails, or other maintenance functions.
However, I'm less clear on what technology is available for events relating to a large volume of documents or records.
For example, imagine I have 100,000 documents in Cosmos and some of the datetime properties on those documents relate to events: e.g. expiry, reminders, escalations, timeouts, etc. Each record has a different set of dates and times.
What approaches are there to fire off code whenever one of those datetimes is reached?
Stuff I've thought of so far:
Have a scheduled task that runs once per minute and looks for anything relating to that particular minute in Cosmos then does "stuff".
Schedule tasks on Service Bus queues with a future date as-and-when the Cosmos records are created and then have something to receive those messages and do "stuff".
But are there better ways of doing this? Is there a ready-made Azure service that would take away much of the background infrastructure work and just let me schedule a single one-off event at a particular point in time and hit a webhook or something like that?
Am I mis-categorising Azure Scheduler as something that you'd use for a handful of regularly scheduled tasks rather than the mixed bag of dates and times you'd find in 100,000 Cosmos records?
FWIW, in my use-case there isn't really a precision issue - stuff scheduled for 10:05:00 happening at 10:05:32 would be perfectly acceptable, for example.
Appreciate your thoughts.
First of all, Azure Scheduler will be replaced by Azure Logic Apps:
Azure Logic Apps is replacing Azure Scheduler, which is being retired. To schedule jobs, follow this article for moving to Azure Logic Apps instead.
(source)
That said, Azure Logic Apps is one of your options, since you can define a logic app that starts a one-time job by using a delay action. See the docs for details.
It scales very well and you can pay for what you use (or use a fixed pricing model).
Another option is using a durable Azure Function with a timer in it. Once the timer elapses, you can do your thing. You can use a Consumption plan as well, so you pay only for what you use, or you can use a fixed pricing model. It also scales very well, so hundreds of those instances won't be a problem.
In both cases you have to trigger the function or logic app when the Cosmos records are created. Put the due time as context in the trigger and there you go.
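A minimal sketch of that durable-function pattern in Python (the activity name and the trigger payload shape are assumptions): the orchestrator is started when the Cosmos record is created, sleeps on a durable timer until the record's due time, then runs the actual work.

```python
from datetime import datetime

import azure.durable_functions as df


def orchestrator_function(context: df.DurableOrchestrationContext):
    # Assumed input: {"id": "doc-123", "due": "2024-01-01T10:05:00+00:00"}
    item = context.get_input()
    due_time = datetime.fromisoformat(item["due"])

    # Durable timer: the orchestration is unloaded while it waits, so this costs next to nothing.
    yield context.create_timer(due_time)

    # Once the timer fires, do the actual work (call a webhook, escalate, expire, ...).
    yield context.call_activity("HandleDueDocument", item["id"])


main = df.Orchestrator.create(orchestrator_function)
```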
Now, given your statement
I'm aware of the many different ways of scheduling system-centric events in Azure. E.g. Azure Scheduler, Logic Apps, etc. These can be used for things like backups, sending batch emails, or other maintenance functions.
That is up to you. You can do anything you want. You don't specify in your question what work needs to be done when the due time is reached but I doubt it is something you can't do with those services.
There must be a solution to this already, but I'm having trouble finding it.
We have data stored in table storage and we are syncing it with an offline capable client web app over a restful api (Web API).
We are using a high watermark (currently a datetime) to make sure we only download the data which has been changed or added.
e.g. clients/get?watermark=2013-12-16 10:00
The problem we are facing with this approach is what happens in the edge case where multiple servers are inserting data whilst a get happens. There is a possibility that data could be inserted with a timestamp lower than the client's timestamp.
Should we worry about this or can someone recommend a better way of doing this?
I believe our main issue is inserting the data into the store. At this point there is no way to guarantee the timestamp used, or that the Azure box has the correct time relative to the other Azure boxes.
Are you able to insert data into queues when inserting data into table storage? If you are able to do so, you can build off a sync that monitors the queue and inserts data based upon what's in the queue. This will allow you to not worry about timestamps and date-sync issues.
It will also make your table storage scanning faster, as you'll be able to go directly to table storage by the Partition/Row keys that would presumably be in the queue messages.
Edited to provide further information:
I re-read your question and realized you're looking to sync with many client applications and not necessarily with a single on-premises sync system, which I assumed originally.
In this case, I'm slightly tweaking my suggestion:
Consider using Service Bus and publishing a message to a Service Bus Topic every time you change or insert an Azure Table Storage (ATS) entity. This message could contain an individual PartitionKey/RowKey or perhaps some other meta information about which ATS entities have been changed.
Your individual disconnectable clients would subscribe to the Service Bus Topic through an individual Service Bus Topic Subscription and be able to pull and handle individual Service Bus messages and sync whatever ATS entities are described in those messages.
This way you'll not really care about last-modified timestamps of your entities and only care about handling pulling messages from the service bus topic. If your client pulls all of the messages from a topic and synchronizes all of the entities that those messages describe, it has synchronized itself, regardless of the number of workers that are inserting data into ATS and timestamps with which they insert those entities.
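For illustration, the publisher side could look something like this sketch (Python shown; the table name, topic name, and connection strings are placeholders): every insert or update also drops a small message on the topic identifying the changed entity.

```python
import json

from azure.data.tables import TableClient
from azure.servicebus import ServiceBusClient, ServiceBusMessage

table = TableClient.from_connection_string("<storage-connection-string>", table_name="clients")
bus = ServiceBusClient.from_connection_string("<service-bus-connection-string>")


def upsert_and_publish(entity: dict) -> None:
    table.upsert_entity(entity)
    with bus.get_topic_sender(topic_name="ats-changes") as sender:
        sender.send_messages(ServiceBusMessage(json.dumps({
            "PartitionKey": entity["PartitionKey"],
            "RowKey": entity["RowKey"],
        })))
```

Each client then drains its own subscription and fetches the changed entities by PartitionKey/RowKey.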
When you're working in a disconnected/distributed environment it is hard to keep things in sync based on actual time (for this to work correctly, the time needs to be in sync between all actors).
Instead you should try looking at logical clocks (like a vector clock). You'll find plenty of Java examples but if you're planning to do this in .NET the examples are pretty limited.
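For a flavour of the idea, here's a minimal vector clock sketch in Python (the concept ports directly to .NET): each actor bumps its own counter on a local change, clocks are merged on receive, and comparing two clocks tells you whether one update happened before the other or they are concurrent.

```python
class VectorClock:
    """Minimal vector clock: actor id -> logical counter."""

    def __init__(self, clock=None):
        self.clock = dict(clock or {})

    def tick(self, actor: str) -> "VectorClock":
        # Local event on `actor`: increment its own counter.
        self.clock[actor] = self.clock.get(actor, 0) + 1
        return self

    def merge(self, other: "VectorClock") -> "VectorClock":
        # On receiving another clock, keep the element-wise maximum.
        for actor, count in other.clock.items():
            self.clock[actor] = max(self.clock.get(actor, 0), count)
        return self

    def happened_before(self, other: "VectorClock") -> bool:
        # True if every counter in self <= its counterpart in other, and the clocks differ.
        return (all(v <= other.clock.get(k, 0) for k, v in self.clock.items())
                and self.clock != other.clock)


# Two servers updating the same record independently:
a = VectorClock().tick("server-a")
b = VectorClock().tick("server-b")
print(a.happened_before(b), b.happened_before(a))  # False False -> concurrent updates
```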
On the other hand you might want to take a look at how the Sync Framework handles synchronization.
I am trying to understand how I can make an Azure Service Bus Topic scalable enough to handle >10,000 requests/second from more than 50 different clients. I found this article from Microsoft - http://msdn.microsoft.com/en-us/library/windowsazure/hh528527.aspx. It provides a lot of good input on scaling Azure Service Bus, like creating multiple message factories, sending and receiving asynchronously, and doing batched send/receive.
But all of this input is from the publisher and subscriber client perspective. What if the node running the Topic cannot handle the huge number of transactions? How do I monitor that? How do I have the Topic running on multiple nodes? Any input on that would be helpful.
Also wondering if anyone has done any capacity testing with a Topic/Queue; I am eager to see those results...
Thanks,
Prasanna
If you need 10K or 100K or 1M or more requests per second, take a look at what's being done on the highway. More traffic, more lanes.
You can get effectively arbitrary flow rates out of Service Bus by partitioning your traffic across multiple entities. Service Bus gives a number of assurances about reliability, e.g. that we don't lose messages once we've taken them from you, or that we assign gapless sequence numbers, and that has a throughput impact on an individual entity like a single Topic. That's exactly like a highway lane only being able to handle X cars/hour. Make more lanes.
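A rough sketch of the "more lanes" idea (Python shown; the topic names, connection string, and hashing scheme are assumptions): hash a partition key to pick one of N topics, so all messages for the same key stay ordered on one entity while aggregate throughput scales with the number of entities.

```python
import zlib

from azure.servicebus import ServiceBusClient, ServiceBusMessage

TOPICS = ["orders-0", "orders-1", "orders-2", "orders-3"]  # placeholder topic names

client = ServiceBusClient.from_connection_string("<service-bus-connection-string>")


def send_partitioned(partition_key: str, body: str) -> None:
    # Stable hash so a given key always maps to the same "lane".
    topic = TOPICS[zlib.crc32(partition_key.encode()) % len(TOPICS)]
    with client.get_topic_sender(topic_name=topic) as sender:
        sender.send_messages(ServiceBusMessage(body))
```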
Since these replies, Microsoft has released a ton of new capability.
Azure Auto-Scale can monitor the number of messages in a queue (or CPU load) and start or stop instances to maintain a target.
Service Bus introduced partitioned queues (and topics). These let you send messages over multiple queues, but they look like a single queue to your API, dramatically increasing the throughput of a queue.
Before you do that, I'd recommend you try:
Async & Batched writes to the queue.
Change the Prefetch parameter on the Reads.
Also look at Receive.OnMessage() to ensure you get the messages the millisecond they are available.
This will improve your perf from ~5 messages/sec to many hundreds or thousands per sec.
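For reference, a rough Python-SDK sketch of the batched-writes and prefetch suggestions (Receive.OnMessage() is .NET-specific; the queue name and connection string below are placeholders):

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

client = ServiceBusClient.from_connection_string("<service-bus-connection-string>")

with client:
    # Batched writes: many messages, one network call.
    with client.get_queue_sender(queue_name="orders") as sender:
        batch = sender.create_message_batch()
        for i in range(100):
            batch.add_message(ServiceBusMessage(f"payload {i}"))
        sender.send_messages(batch)

    # Prefetch: the receiver pulls messages ahead of processing, cutting per-message round trips.
    with client.get_queue_receiver(queue_name="orders", prefetch_count=100) as receiver:
        for message in receiver.receive_messages(max_message_count=100, max_wait_time=5):
            receiver.complete_message(message)
```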
Service Bus has its limitations ("Capacity and Quotas"); check out this article for a very good overview of these: https://learn.microsoft.com/en-gb/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted
I suggest you reach out to your local MSFT specialist if you have a use case that will push the boundaries of Azure Service Bus. MSFT has dedicated teams in Redmond (and around the world) that can help you design for and push these boundaries at massive scale: the Windows Azure CAT (Customer Advisory Team). Their goal is to solve real-world customer problems, and it sounds like you might have one...
You need to performance- and load-test to get the answers to your questions above, based on your specific scenario.
The Azure CAT team has a wealth of metrics on capacity and load testing with Service Bus (and Azure in general); these are not always publicly available, so again, reach out if you can...
If it can handle that many requests, you want to make sure that you receive the messages in such a way that you don't hit the max size of the topic. You can use multiple instances of a worker role in Azure to listen to specific subscriptions, so you can process messages faster without getting near the max size.