Event retention in Microsoft Azure Event Hubs

I was checking the details of message retention in Event Hubs.
Suppose I have set the retention policy to 1 day and have sent some messages. If I then change the retention policy to 3 days, will the existing event data also be retained for 3 days?

Absolutely yes.
One more important detail about the retention policy: Event Hubs does not apply it at the message level; it is applied at the file-system level. Event Hubs is a high-throughput event ingestion pipeline, in short a stream of events in the cloud. To provide higher throughput and performance, it does not deal with any event-level operations (for comparison, the equivalent offering, Service Bus queues/topics, has a TimeToLive property on each message). Behind the covers, Event Hubs actually stores data in pages (of, let's say for explanation's sake, 10 MB). The retention policy is applied only to these pages. So some of your messages sent 10 days ago might still be present, even with a retention policy of 1 day, if data rates on the hub are very low and those old messages share a page with newer ones.

Yes, if you use UpdateEventHubAsync to update the message retention period. However, the actual message cleanup time is not guaranteed; the Azure infrastructure may decide when to clean up based on its own dynamics.

Related

Is it possible to query historical data from azure event hub?

I read that Event Hubs has a retention period of up to 60 days. So is it possible to query the historical data from an event hub?
Will it delete the processed events automatically? If not, what is the point of storing processed messages?
Event Hubs represents a persistent stream of events, meaning that data is not deleted until its retention period is reached. Once an event is older than the retention period, it is removed from the stream and no longer available to be read.
There is no concept of processed or unprocessed events; readers may request any position in the stream and re-read data as many times as they like. It is an application's responsibility to track which events they have processed and position readers accordingly.
Event Hubs retention periods vary by tier, the maximum of which is 90 days (premium and dedicated). Details can be found in Event Hubs Quotas. The Event Hubs FAQ adds a bit more detail in What is the maximum retention period for events?
is it possible to query the historical data from event hub?
Adding to Jesse Squire's answer: it is also possible to keep historical data from Event Hubs by enabling Capture when creating the event hub, which sends the data in the event hub to a storage account.
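The "no processed/unprocessed events" model above can be sketched in a few lines: the stream is an append-only log, and each reader owns its own position. This is a conceptual sketch, not the Azure SDK; all class and method names here are invented for illustration.

```python
class EventStream:
    """Toy model of an Event Hubs partition: an append-only log with no
    notion of 'processed' events."""

    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def read_from(self, offset, count):
        # Any reader may start at any retained offset and re-read freely;
        # reading never removes anything from the log.
        return self._log[offset:offset + count]

class Reader:
    def __init__(self, stream, offset=0):
        self.stream = stream
        self.offset = offset          # checkpoint owned by the application

    def receive(self, count):
        batch = self.stream.read_from(self.offset, count)
        self.offset += len(batch)    # 'checkpointing' is just advancing this
        return batch
```

A second reader created later still sees the whole retained stream from offset 0, which is exactly why the application, not the service, must track what it has processed.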

Does using Azure Service Bus filters on a Subscription incur the cost of an operation?

Suppose a single Azure Service Bus topic with 10 subscriptions exists. I put a message on the topic and it goes to all 10 subscriptions. From the docs I assume this incurs the cost of 10 operations.
https://azure.microsoft.com/en-gb/pricing/details/service-bus
However, if we added a filter to all 10 to only allow certain messages, would it still incur the cost of one operation regardless, i.e., to process the filter even if the message does not reach the subscription?
if we added a filter to all 10 to only allow certain Messages, would it still incur the cost of one operation?
Yes. Even when filters are used on a subscription, receiving a message from a topic counts as an operation, and message retrievals after abandon, deferral, or dead-lettering are counted as independent, billable operations as well.
So it will incur an operational charge, assuming all messages are delivered to all subscriptions.
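The billing intuition above can be sketched as a counting exercise. This is my reading of the answer (one operation per subscription rule evaluation, whether or not the message matches), not an official pricing formula; the function and parameter names are invented.

```python
def billable_operations(messages, subscription_filters):
    """Count operations assuming each subscription's filter is evaluated
    against every message, matched or not (assumption from the answer)."""
    ops = 0
    delivered = {name: [] for name in subscription_filters}
    for msg in messages:
        for name, matches in subscription_filters.items():
            ops += 1                        # evaluating the rule is an operation
            if matches(msg):
                delivered[name].append(msg)  # only matches reach the subscription
    return ops, delivered
```

With 10 subscriptions and one message, this yields 10 operations even if only half the filters match, which is the scenario the question asks about.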

Can I create monitoring alerts for azure event grid domain topics?

I would like to set up the following alerts for domain topics, triggered when:
Delivery Failed Events (at domain) exceed x in y amount of time
Delivery Failed Events (at domain topic 1) exceed x in y amount of time
Delivery Failed Events (at domain topic 2) exceed x in y amount of time
The reason I want domain-topic granularity is that the customer for topic 1 may be fine while the customer for topic 2 is having issues. Say the customer for topic 2 is currently down and in an extended outage (which may last more than a day). I want to be able to disable the alert for topic 2 only, and re-enable it once that customer is up and running again, while keeping all other topic-level alerts enabled.
I did not see a way to configure the above in the portal. Is it possible to configure this in any other way at this time? If so, please provide direction on how to achieve it.
The AEG (Azure Event Grid) provides durable, at-least-once delivery of each event message to each subscriber based on its subscription. More details can be found in the docs.
When AEG cannot successfully deliver a message after retrying, the dead-lettering feature (configured per subscriber) can be used for notification and/or analysis via storage eventing, where the dead-lettered message is stored.
On the publisher side, a standard HTTP response is received from the event domain endpoint immediately after posting; see more details in the docs.
The current version of AEG is not integrated with Diagnostic settings (as, for instance, Event Hubs is), which would enable pushing metrics and/or logs to a stream pipeline for analysis.
However, as a workaround, the Azure Monitor REST API can help you.
Using the list-metrics operation for an event domain, we can obtain metrics such as Publish Succeeded, Publish Failed, and Unmatched.
The following is an example of the REST GET:
https://management.azure.com/subscriptions/{myId}/resourceGroups/{myRG}/providers/Microsoft.EventGrid/domains/{myDomain}/providers/Microsoft.Insights/metrics?api-version=2018-01-01&interval=PT1M&aggregation=none&metricnames=PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,DroppedEventCount
Based on this polling technique, you can push the event domain metric values to a stream pipeline for analysis, monitoring, alerting, etc. using an Azure Stream Analytics job. Your management requirements (for instance, that publisher_topic1 is disabled) can be fed to the stream job as reference input.
Note that the event domain metrics do not provide topic-level granularity, and there is no activity event log at that level either. I recommend raising this on the AEG feedback page.
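The polling workaround above can be sketched as two small helpers: one building the metrics URL from the GET example, and one pulling the latest data point per metric out of a response. The URL pieces follow the example shown; the response shape in `latest_values` is an assumption about the Azure Monitor list-metrics payload, and both function names are invented.

```python
from urllib.parse import urlencode

def domain_metrics_url(subscription_id, resource_group, domain,
                       metric_names, api_version="2018-01-01"):
    """Build the Azure Monitor metrics URL following the GET example above."""
    resource = (f"/subscriptions/{subscription_id}"
                f"/resourceGroups/{resource_group}"
                f"/providers/Microsoft.EventGrid/domains/{domain}")
    query = urlencode({
        "api-version": api_version,
        "interval": "PT1M",
        "aggregation": "none",
        "metricnames": ",".join(metric_names),
    })
    return (f"https://management.azure.com{resource}"
            f"/providers/Microsoft.Insights/metrics?{query}")

def latest_values(metrics_response):
    """Pull the most recent data point per metric from a response shaped
    like the Azure Monitor list-metrics payload (structure assumed here)."""
    out = {}
    for metric in metrics_response.get("value", []):
        name = metric["name"]["value"]
        points = [p for ts in metric.get("timeseries", [])
                  for p in ts.get("data", [])]
        out[name] = points[-1] if points else None
    return out
```

A poller would call `domain_metrics_url`, issue the GET with a bearer token, and feed `latest_values` output to the alerting pipeline.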

Azure Service Bus synchronize all masterdata

Let's say I've got an Azure Service Bus in a microservice scenario.
One microservice pushes master data changes to the other services via a subscription.
Now suppose a new service is introduced and subscribes to the master data service. How can I make sure that the new service receives all necessary data?
Do I have to resend all master data from the master data service, or does Azure Service Bus (or an alternative) provide some feature for that?
As far as I know there is no way to achieve what you want within the capabilities of Azure Service Bus. Also, I don't think this is what Service Bus is for.
Of course there is a configurable "time to live" value for messages within queues and topics, which could probably be set to some really high value, but this would still not make your master data be infinitely available for future services. And - but this is just my opinion and I'm far from being an expert - I wouldn't want to load up my service bus with potentially thousands or even millions of messages (depending on what you're doing) without them being processed quickly.
For your specific concern I'd rather implement something like a "master data import service" without any service bus integration. Details of this, however, depend on your environment and specific requirements.
A couple of points:
1) This is not possible with Azure Service Bus. Even if you set TTL at the topic level, messages are only delivered to the subscriptions that exist at that point in time; you can't read messages directly from a topic.
2) You can consider Event Hubs, where you can create a new consumer group and start reading from whatever offset you want. However, Event Hubs has a maximum retention period of 7 days (in the standard tier). If you need message retention beyond that, enabling Event Hubs Capture on your event hub pulls the data into a storage account, but you would then need additional logic to read from that storage account to replay the messages.
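The replay logic in point 2 can be sketched as follows: a new service first reads everything Capture wrote to storage, then switches over to the live stream. This is a conceptual sketch, not the Azure SDK; the function name, the blob/event shapes, and the last-write-wins merge are all assumptions made for illustration.

```python
def bootstrap_new_service(capture_blobs, live_stream, from_offset):
    """Replay captured history, then tail the live stream (sketch)."""
    state = {}
    replayed = 0
    for blob in capture_blobs:                 # each blob = a captured batch
        for event in blob:
            state[event["key"]] = event["value"]   # last-write-wins master data
            replayed += 1
    for event in live_stream[from_offset:]:   # then continue from the hub itself
        state[event["key"]] = event["value"]
    return state, replayed
```

The tricky part in practice is picking `from_offset` so that nothing is missed between the last captured blob and the first live event, which is exactly the "additional logic" the answer refers to.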

Does Microsoft's Service Bus replicate message for every subscription in a topic?

Does the Azure Service Bus and its on-premise version, Service Bus for Windows Server, replicate a message for every subscriber?
For example, let's say that there is a single topic with five subscribers, then is that message stored in the service bus' database five times - once for each subscriber - or is that message only stored once with business logic to determine which subscribers have read the message?
It would be nice if there is an official site and/or documentation to provide as a reference.
The behavior of Azure Service Bus seems to be that it keeps a copy per subscriber. I tested this by creating a topic with two subscriptions. I sent in a single message and saw that the size of the topic in bytes was 464 (using topic.SizeInBytes). When I received one message off a subscription, the size dropped in half to 232. I tested it with three subscriptions and the same behavior occurred: 696 bytes.
Even if they aren't keeping a copy of the message per subscription they are counting the size of the message times the number of subscriptions against the maximum size of the topic, which may be what you were trying to determine.
I agree it would be nice if they documented the behavior, especially for Service Bus for Windows Server since that could affect planning for the amount of storage you need to set aside. As for the Azure Service Bus side, I'm not sure the implementation behind the scenes matters as much as knowing how it factors towards the max size of the topic.
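The measurements above fit a simple per-subscription-copy model, which can be stated as a one-line formula (the function name is mine; the numbers are the ones reported in the answer):

```python
def topic_size_bytes(message_size, subscription_count):
    """Model from the experiment above: the topic is charged
    message_size bytes once per subscription."""
    return message_size * subscription_count
```

This reproduces both observations: 232 bytes times 2 subscriptions gives 464, and times 3 gives 696.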
A subscription to a topic resembles a virtual queue that receives copies of the messages that were sent to the topic. You can optionally register filter rules for a topic on a per-subscription basis, which allows you to filter/restrict which messages to a topic are received by which topic subscriptions.
I think it copies messages. If it did not copy, it would always have to check whether all subscribers had received a message before deleting it, and with filters it would have to track only the matching subscribers. I think the cost of copying plus a simple consume implementation is lower than the cost of not copying.