Proper way of implementing an Azure Stream Analytics notifications/alarms service

I'm working with sensor systems where each sensor sends a new reading every 15 seconds.
Each sensor type also has defined rules that, when triggered, generate an alarm output - e.g. a sensor of type "temperature" sends a value higher than the MAX temperature allowed.
Let's assume a sensor with ID "XXX_01" sends 2 readings in 30 seconds, each with a value higher than the MAX allowed.
Event in: 01/10/2018 12:00:00
{ "id": "XXX_01", "value": 90, "type": "temperature" }
Event in: 01/10/2018 12:15:00
{ "id": "XXX_01", "value": 95, "type": "temperature" }
Now, I want to notify the end user that there is an alarm - I have to send out some sort of notification to the end user(s). The problem is that I do not want to send the same alarm twice.
Assuming I use something like Twilio to send SMS, or just send email notifications, I don't want to spam my end users with a new notification every 15 seconds while incoming sensor readings stay above the MAX value allowed.
What kind of Azure service, architecture or design paradigm could I use to avoid this issue?

I have to say that requirement A (don't spam users with notifications) and requirement B (alarm on high temperature as soon as it touches the MAX line) are somewhat contradictory, which makes this hard to implement.
In my opinion, you can send notifications to users at a fixed frequency.
1. In that frequency period, say 1 minute, use the Azure Stream Analytics service to receive the sensor data that arrives every 15 seconds.
2. Output the data to an Azure Storage queue.
3. Use a timer-triggered Azure Function to read the latest temperature value from the queue's current messages every minute. If it touches the MAX line, send a notification to the end users. If you want to notify the user that the MAX line was touched even if the value has since dropped, just sort the messages by value and check the highest one (a sketch follows this list).
4. Finally, empty the queue.
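A minimal sketch of steps 3 and 4, assuming a C# Azure Function with a timer trigger and the WindowsAzure.Storage queue SDK; the queue name, Reading shape, MaxAllowed threshold and SendNotificationAsync are all illustrative names, not part of the original answer:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;

public static class AlarmDigest
{
    // Hypothetical shape of a queued sensor reading
    private class Reading { public string Id { get; set; } public double Value { get; set; } }

    private const double MaxAllowed = 85; // illustrative threshold

    [FunctionName("AlarmDigest")]
    public static async Task Run([TimerTrigger("0 * * * * *")] TimerInfo timer, ILogger log)
    {
        var queue = CloudStorageAccount
            .Parse(Environment.GetEnvironmentVariable("StorageConnection"))
            .CreateCloudQueueClient()
            .GetQueueReference("sensor-readings");

        double max = double.MinValue;
        foreach (var msg in await queue.GetMessagesAsync(32)) // up to 32 per batch
        {
            var reading = JsonConvert.DeserializeObject<Reading>(msg.AsString);
            max = Math.Max(max, reading.Value);
            await queue.DeleteMessageAsync(msg); // step 4: empty the queue
        }

        if (max > MaxAllowed)
            await SendNotificationAsync(max); // hypothetical Twilio/email call
    }

    private static Task SendNotificationAsync(double value) => Task.CompletedTask; // stub
}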

Using Azure Stream Analytics, you can trigger the alert when the threshold is passed AND it is the first time in, say, the last 30 seconds.
Here is sample SQL for this example:
SELECT *
FROM input
WHERE ISFIRST(second, 30) OVER (WHEN value > 90) = 1
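If several sensors share the same input (as in the question), you would presumably also want to partition by the sensor id so that one sensor's alarm doesn't suppress another's; a hedged variant, assuming the id field from the question's events:

SELECT *
FROM input
WHERE ISFIRST(second, 30) OVER (PARTITION BY id WHEN value > 90) = 1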
Let us know if you have any further questions.

I also agree with Jay's response about the contradiction.
But there is one more way we can handle it. I faced a similar issue in one of my assignments; what I tried was keeping track of already-sent alarms in a cache (e.g. Redis, Memcached, etc.) and, on every reading, checking whether the alarm was already sent - if so, don't send it again. The obvious trade-off is that we need to check the cache every time, but that's a concern you need to weigh.
The same approach can be extended to notify the user when the temperature resets to normal.
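A minimal sketch of that cache idea, assuming Redis via StackExchange.Redis; the key name, TTL, sensor id and SendAlarm are illustrative:

using System;
using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();
string sensorId = "XXX_01";

// SET ... NX with a TTL: true only if no alarm was sent in the last 10 minutes
bool firstTime = db.StringSet(
    key: $"alarm:{sensorId}",
    value: "sent",
    expiry: TimeSpan.FromMinutes(10), // re-alert after 10 quiet minutes
    when: When.NotExists);            // atomic: only set if the key is absent

if (firstTime)
    SendAlarm(sensorId); // hypothetical SMS/email notifier

// Once a reading drops back below MAX, delete the key so the next breach
// alerts immediately (this also enables the "back to normal" notification):
// db.KeyDelete($"alarm:{sensorId}");

void SendAlarm(string id) => Console.WriteLine($"ALARM for {id}");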
Hope this helps.

Related

Google PubSub: drop nacked message after n retries

Is there a way to configure a pull subscription so that messages which caused an error and were nacked are re-queued (and therefore redelivered) no more than n times?
Ideally, if processing also fails on the last attempt, I would like to handle this case (for example, log that this message is being given up on and will be dropped).
Or perhaps it's possible to find out how many times a received message has already been delivered for processing?
I use Node.js. I can see a lot of different options in the source code but am not sure how I should achieve the desired behaviour.
Cloud Pub/Sub supports Dead Letter Queues that can be used to drop nacked messages after a configurable number of retries.
Currently, there is no way in Google Cloud Pub/Sub to automatically drop messages that were redelivered some designated number of times. The message will stop being delivered once the retention deadline has passed for that message (by default, seven days). Likewise, Pub/Sub does not keep track of or report the number of times a message was delivered.
If you want to handle these kinds of messages, you'd need to maintain persistent storage keyed by message ID that you could use to keep track of the delivery count. If the delivery count exceeds your desired threshold, you could write the message to a separate topic that you use as a dead letter queue and then acknowledge the original message.
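A rough sketch of that pattern, shown here in C# with Google.Cloud.PubSub.V1 (the Node client has equivalent primitives); the in-memory dictionary stands in for the persistent store the answer recommends, and the project, subscription and topic names are made up:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.PubSub.V1;

var deliveries = new ConcurrentDictionary<string, int>();
const int MaxAttempts = 5;

var subscriber = await SubscriberClient.CreateAsync(
    SubscriptionName.FromProjectSubscription("my-project", "my-sub"));
var dlq = await PublisherClient.CreateAsync(
    TopicName.FromProjectTopic("my-project", "my-dead-letter"));

await subscriber.StartAsync(async (PubsubMessage msg, CancellationToken ct) =>
{
    // Count deliveries per message ID (a real system would persist this)
    int attempt = deliveries.AddOrUpdate(msg.MessageId, 1, (_, n) => n + 1);
    if (attempt > MaxAttempts)
    {
        await dlq.PublishAsync(new PubsubMessage { Data = msg.Data });
        return SubscriberClient.Reply.Ack; // ack so it stops being redelivered
    }
    try
    {
        Process(msg); // hypothetical handler
        return SubscriberClient.Reply.Ack;
    }
    catch
    {
        return SubscriberClient.Reply.Nack; // nack -> redelivery
    }
});

void Process(PubsubMessage msg) { /* your logic */ }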

How to set the number of retries for Azure DocumentDB output binding in Azure Function?

Based on this question, it seems that writes to the Azure DocumentDB output binding in an Azure Function are retried 10 times if throttled (HTTP 429). I haven't verified this myself though.
I would like to increase this limit on the number of retries. My data comes in big bursts over a short time, followed by very long periods of downtime, which means that getting a 429 and waiting a bit is okay for my purpose. I must guarantee, though, that no data is dropped.
One way for me to solve this is to increase the RU (Request Unit) limit in DocumentDB to make sure I don't get 429s while the big bursts of data come in, but it's already at about 2.5 times what I need during the downtime period. Is there any way to make the retries run indefinitely until they succeed, or, less ideally, to increase the number of retries to something more than 10?
Why don't you change the approach? Instead of inserting documents right away, you can make use of Service Bus and implement a dead-letter queue. Here are some links:
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dead-letter-queues
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-service-bus
https://blog.jeroenmaes.eu/2017/01/process-service-bus-dead-letter-message-with-azure-functions/
The idea is to have something like this:
The current function, instead of saving the data in DocumentDB, will send it to the Service Bus (you just change the output binding).
Another function will process every message from the Service Bus; if processing fails, you can manage a timeout in the function and then move the message to a dead-letter queue.
Another function will process any messages in the dead-letter queue.
You just need to make a small change in the first function and create two more; it might sound complicated, but you'll have strong consistency in the data (a sketch of the second function follows). All of the above links contain examples of what I mentioned here.
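A rough sketch of the second function, assuming Azure Functions C# with a Service Bus trigger and the DocumentDB client SDK; throwing on a 429 abandons the message, so Service Bus redelivers it, and after MaxDeliveryCount attempts the message lands in the dead-letter queue for the third function. All names here are illustrative:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.WebJobs;
using Newtonsoft.Json;

public static class SaveReading
{
    private static readonly DocumentClient Client = new DocumentClient(
        new Uri("https://myaccount.documents.azure.com"), "myAuthKey"); // illustrative

    [FunctionName("SaveReading")]
    public static async Task Run(
        [ServiceBusTrigger("readings", Connection = "ServiceBusConnection")] string message)
    {
        try
        {
            await Client.CreateDocumentAsync(
                UriFactory.CreateDocumentCollectionUri("mydb", "readings"),
                JsonConvert.DeserializeObject(message));
        }
        catch (DocumentClientException ex) when ((int?)ex.StatusCode == 429)
        {
            // Throttled: rethrow so the message is abandoned and redelivered;
            // Service Bus dead-letters it after MaxDeliveryCount attempts.
            throw;
        }
    }
}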

pubnub calculate publish latencies

I'd like to calculate PubNub publish latencies for PubNub clients before they actually begin publishing. Is there a preferred way to do this with PubNub?
To make my use case clearer: I'm trying to synchronize clients, and these clients do not need to be synchronized to wall-clock time, since they could be global. Hence this solution wouldn't be necessary in my case (but it did point me in the right direction).
So I could still obtain a per-client latency calculation based on the above link, but that's for fetching the timetoken using the Time API. It was relevant for the above use case, which depended on clients syncing to a particular wall-clock time, so a timetoken had to be fetched anyway.
However, in my case I don't need a timetoken. All clients can be synced using a simple wait of (k - latency), where k is a constant for all clients.
Therefore, while I can use the timetoken method of calculating latency, I would prefer to know the actual publish latencies (unless there is no great difference between the two).
Here are some steps I worked out myself to determine the publish latency:
1. Determine the local time (in milliseconds): start = now()
2. The client sends out a message with payload [ {"Type": "latencyCheck"}, {"me": "MyPubNubUUID"} ]
3. When the client receives a message with the above signature and its own ID, it sets another variable: end = now()
4. The latency to send a message and receive it yourself is: end - start
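A sketch of those steps in C#; IPubSubClient is a hypothetical wrapper around whichever PubNub SDK is in use, since only the timing logic matters here:

using System;
using System.Diagnostics;

public interface IPubSubClient
{
    void Publish(string channel, string payload);
    event Action<string> MessageReceived;
}

public static class LatencyProbe
{
    public static void Measure(IPubSubClient client, string myUuid)
    {
        var sw = Stopwatch.StartNew(); // step 1: start = now()
        client.MessageReceived += payload =>
        {
            if (payload.Contains(myUuid)) // step 3: our own echo came back
            {
                sw.Stop();
                // end - start is a round trip; one-way publish latency is roughly half
                Console.WriteLine($"Round trip: {sw.ElapsedMilliseconds} ms");
            }
        };
        // step 2: publish the latency-check payload
        client.Publish("latency", $"[{{\"Type\":\"latencyCheck\"}},{{\"me\":\"{myUuid}\"}}]");
    }
}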

How to get events count from Microsoft Azure EventHub?

I want to get the event count from Microsoft Azure EventHub.
I can use EventHubReceiver.Receive(maxcount), but it is slow for a large number of big events.
There is the NamespaceManager.GetEventHubPartition(..).EndSequenceNumber property that seems to do the trick, but I am not sure if it is the correct approach.
EventHub doesn't have a notion of message count, as EventHub is a high-throughput, low-latency, durable stream of events in the cloud. The CORRECT current count at a given point in time could be wrong the very next millisecond, and hence it wasn't provided. :)
Hmm, we should have named EventHubs something like StreamHub, which would make this obvious!
If what you are looking for is how much the receiver is lagging behind, then EventHubClient.GetPartitionRuntimeInformation().LastEnqueuedSequenceNumber is your best bet.
As long as no messages are sent to the partition, this value remains constant. :)
On the receiver side, receivedEventData.SequenceNumber indicates the current sequence number of the message being processed. The difference between EventHubClient.GetPartitionRuntimeInformation().LastEnqueuedSequenceNumber and EventData.SequenceNumber indicates how far the receiver of a partition is lagging behind; based on that, the receiver process can scale the number of workers up or down (work-distribution logic). A sketch follows.
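A minimal fragment of that lag calculation, assuming the Microsoft.Azure.EventHubs SDK and that connectionString, partitionId and a just-received eventData are in scope:

using Microsoft.Azure.EventHubs;

var client = EventHubClient.CreateFromConnectionString(connectionString);
var info = await client.GetPartitionRuntimeInformationAsync(partitionId);

// eventData = the event this receiver is currently processing
long lag = info.LastEnqueuedSequenceNumber
           - eventData.SystemProperties.SequenceNumber;
// Scale workers up while 'lag' keeps growing, down when it stays near zero.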
more on Event Hubs...
You can use Stream Analytics, with a simple query:
SELECT COUNT(*)
FROM YourEventHub
GROUP BY TUMBLINGWINDOW(DURATION(hh, <Number of hours in which the events happened>))
Of course you will need to specify a time window, but you can potentially run it from when you started collecting data to now.
You will be able to output to SQL/Blob/Service Bus et cetera.
Then you can get the message out of the output from code and process it. It is quite complicated for a one-off count, but if you need it frequently and you have to write some code around it anyway, it could be the solution for you.

How does Windows Azure Service Bus Queues Duplicate Detection work?

I know that you can set duplicate detection to work over a time period with an Azure Service Bus queue. However, does anyone know whether this works based on the objects in the queue?
So if I have an object with an id of "SO_1" which gets put on the queue and is subsequently consumed, is the duplicate detection still valid?
What I think I'm asking is: is it the timeframe and the object, or just the timeframe, that makes the queue decide what is a duplicate?
http://blog.iquestgroup.com/en/windows-azure-service-bus-duplicate-detection/#.UaiXrd7frIU
When we activate duplicate detection, the Windows Azure Service Bus will start to store a history of our messages. This period of time can be configured, ranging from only a few minutes to days. If a duplicate message is sent to the Service Bus, the service will automatically ignore it.
Posting this to clarify a couple of misconceptions in the responses above.
Enabling duplicate detection keeps track of the application-controlled MessageId of all messages sent into a queue or topic during a specified time window. If any new message is sent carrying a MessageId that has already been logged during the time window, the message is reported as accepted (the send operation succeeds), but the newly sent message is instantly ignored and dropped. No parts of the message other than the MessageId are considered (the blog referenced in one of the responses says the message content cannot be duplicated, which is not correct).
The default duplicate detection history window is now 30 seconds; the value can range between 20 seconds and 7 days.
Refer to this documentation for more details.
This actually just bit me. The default seems to be to have it enabled, and the default time is 10 minutes. The "key" is the MessageId. In our case duplicate detection is fine in most scenarios, but in some it was bad news (especially with the 10-minute window). To get around this, we introduced a "breaker":
// For this message, we need to prevent dups from being detected
msg.MessageId = messageId + "_" + DateTime.Now.ToString("u");
If you just want to prevent "spamming" you might consider setting the duplicate detection window to the minimum (20 seconds). (Personally, I would love to see a threshold as low as 5 seconds).
The current ranges allowed are 20 seconds to 7 days.
You will have to create the message id based on the object (e.g. a hash of the object) and enable duplicate message detection on the topic/queue.
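A minimal sketch of that, assuming the Microsoft.Azure.ServiceBus SDK and Json.NET; the payload object and queueClient are assumed to be in scope:

using System;
using System.Security.Cryptography;
using System.Text;
using Microsoft.Azure.ServiceBus;
using Newtonsoft.Json;

string body = JsonConvert.SerializeObject(reading); // 'reading' = your payload object
byte[] bytes = Encoding.UTF8.GetBytes(body);

using (var sha = SHA256.Create())
{
    // Identical objects produce identical MessageIds, so the queue drops repeats
    var msg = new Message(bytes)
    {
        MessageId = BitConverter.ToString(sha.ComputeHash(bytes)).Replace("-", "")
    };
    await queueClient.SendAsync(msg); // queueClient: Microsoft.Azure.ServiceBus.QueueClient
}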
Azure Service Bus duplicate detection points to keep in mind:
- A duplicate is identified based on SessionId (if present), PartitionKey (if present), and MessageId, within a time window.
- Duplicate detection time window:
  - 20 seconds to 7 days (default: 10 minutes).
  - A larger window can impact throughput because of the matching, so it's better to keep the window as small as possible.
- Duplicate detection can be enabled only while creating the topic/queue; the window can be updated at any point in time.
- Duplicate messages will be ignored/dropped.
ref: https://learn.microsoft.com/en-us/azure/service-bus-messaging/duplicate-detection
