Is it possible to query historical data from Azure Event Hubs?

I read that Event Hubs has a retention period of up to 60 days. So is it possible to query historical data from an event hub?
Will it delete processed events automatically? I assume so; if not, what is the point of storing processed messages?

Event Hubs represents a persistent stream of events, meaning that data is not deleted until its retention period is reached. Once an event is older than the retention period, it is removed from the stream and no longer available to be read.
There is no concept of processed or unprocessed events; readers may request any position in the stream and re-read data as many times as they like. It is the application's responsibility to track which events it has processed and to position its readers accordingly.
Event Hubs retention periods vary by tier; the maximum is 90 days (Premium and Dedicated). Details can be found in "Event Hubs quotas". The Event Hubs FAQ adds a bit more detail under "What is the maximum retention period for events?"
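To make the "persistent stream" idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Azure SDK: the class RetainedStream and its methods are hypothetical names made up for illustration. It only shows that reading never removes anything and that age alone makes events disappear.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class RetainedStream:
    """Toy stand-in for one partition (hypothetical, not the Azure SDK)."""
    retention: timedelta
    _events: list = field(default_factory=list)  # (sequence_number, enqueued_at, body)
    _next_seq: int = 0

    def append(self, body, enqueued_at):
        self._events.append((self._next_seq, enqueued_at, body))
        self._next_seq += 1

    def expire(self, now):
        # Only age removes events; there is no "processed" flag anywhere.
        cutoff = now - self.retention
        self._events = [e for e in self._events if e[1] >= cutoff]

    def read_from(self, sequence_number):
        # Any reader may ask for any retained position, as often as it likes.
        return [(seq, body) for seq, _, body in self._events if seq >= sequence_number]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
stream = RetainedStream(retention=timedelta(days=1))
stream.append("a", start)
stream.append("b", start + timedelta(hours=12))
stream.append("c", start + timedelta(hours=30))

first_read = stream.read_from(0)
second_read = stream.read_from(0)   # re-reading returns the same events
stream.expire(now=start + timedelta(hours=30))
after_expiry = stream.read_from(0)  # "a" is now older than one day and is gone
```

In the real service the expiry happens on the service side on its own schedule; the explicit expire call here only exists so the effect can be shown in a few lines.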

Is it possible to query the historical data from event hub?
Adding to @Jesse Squire's answer: yes, you can keep historical data from Event Hubs by enabling Capture when you create the event hub, which sends the data in the event hub to a storage account.

Related

Receive and Delete messages from Event Hub

I know that the messages in an event hub expire after a certain period of time, depending on how we configure it, but is there any way to delete the events received in an event hub, through code or through configuration in the Azure portal, as soon as we receive them?
Please go through this documentation for sending and receiving messages in Event Hubs:
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-java-get-started-send
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dotnet-framework-getstarted-send
At this time, there is no mechanism to delete all messages. Messages expire automatically once they are past their 24-hour retention. If you care only about messages from the time you subscribe, you can do a one-time subscription with a SubscribeRecency of newest (check the Java SDK for the exact value).
Subscriptions are durable. If you disconnect and reconnect, you will see only the newest messages the first time you subscribe, not each time: message delivery starts from the newest messages on the first subscription, and subsequent reconnects resume delivery of messages published since you last connected.
If you want messages deleted once you received them, you might want to consider Azure Service Bus Queues - they support exactly that.
Event Hubs provides an immutable, append-only log; in other words, events are not supposed to be changed once they are created. Think of an Event Hubs message as an event at a point in time: you cannot go back in time and change it.
If you need mutable messaging then consider Azure Service Bus.

Is it possible to configure Azure Event Hub to retain message if Azure Function fails processing it?

I have an Azure Function that listens for messages in an Event Hub. The function takes messages from the Event Hub, processes them, and passes them to another Hub. At this point the messages are removed from the Event Hub.
If the Function fails processing the message for whatever reason, is it possible to tell the Event Hub to not remove the message, and to try to deliver it to the Function again at some point in the future?
I understand that the Event Hubs have a maximum retention period of 7 days. I would like for the Event Hub & Function to continue trying during that period.
Readers never "remove" messages from an Event Hub. They are different from Service Bus Topics and Queues in this.
Event Hubs rely on clients to maintain their own bookmarks for each partition. The high-level API EventProcessorHost does this for you:
"The EventProcessorHost class also implements an Azure storage-based checkpointing mechanism. This mechanism stores the offset on a per partition basis, so that each consumer can determine what the last checkpoint from the previous consumer was."
But the lower-level EventHubReceiver exposes the StartingSequenceNumber property for you to control this explicitly.
However, a desire for guaranteed delivery strongly suggests that you may want to copy the messages that require it from the Event Hub to a Service Bus Topic or Queue, or perhaps to an Azure SQL Database table, for processing.
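As a rough illustration of the bookmark idea, the sketch below advances a checkpoint only after an event is handled successfully, so a failed event is simply read again on the next pass. It is plain Python, not EventProcessorHost or the Azure Functions binding; process_partition and flaky_handler are hypothetical names, and the events list stands in for one partition.

```python
def process_partition(events, checkpoint, handler):
    """Handle events after `checkpoint`; return the new checkpoint.

    events: ordered list of (sequence_number, body) for one partition.
    checkpoint: last successfully handled sequence number, or -1 for none.
    """
    for seq, body in events:
        if seq <= checkpoint:
            continue  # already handled on an earlier pass
        try:
            handler(body)
        except Exception:
            break  # leave the checkpoint alone so this event is retried
        checkpoint = seq
    return checkpoint


events = [(0, "a"), (1, "b"), (2, "c")]
remaining_failures = {"b": 1}  # "b" fails once, then succeeds
handled = []


def flaky_handler(body):
    if remaining_failures.get(body, 0) > 0:
        remaining_failures[body] -= 1
        raise RuntimeError("transient failure")
    handled.append(body)


first_checkpoint = process_partition(events, -1, flaky_handler)
second_checkpoint = process_partition(events, first_checkpoint, flaky_handler)
```

The events stay in the stream either way; the only thing that moves is the reader's own bookmark. Retries like this work only while the event is still inside the retention period.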

Can I create thousands of event hubs in one Azure Event Hubs namespace

I need to send messages from a few thousand devices to a central hub and be able to get a live stream of messages for a specific device from that hub. So far, Azure Event Hubs seems to be the cheapest option in terms of message count. An Event Hubs namespace allows you to create distinct event hubs in it.
Can I create a few thousand such hubs, one per device?
Is it a good idea? What could be the potential drawbacks?
How is the price calculated: per namespace or per event hub? (I think per namespace, but I cannot find this info.)
If per namespace, does that mean the purchased throughput units are shared among all event hubs? If so, will a single namespace with 1000 event hubs consume the same amount of resources as a single namespace with one event hub that receives messages from 1000 devices?
No, you are limited to 10 Event Hubs per namespace.
Event Hub per device is not the recommended usage. Usual scenario is to put all messages from all devices to the same Event Hub, and then you can separate them again in the processing side. This will scale much better.
See "Event Hubs quotas" for the limits.
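A minimal sketch of "separate them again on the processing side", assuming every event carries a device identifier. The field names device_id and payload are assumptions for illustration, and the list stands in for events read from the single shared hub.

```python
from collections import defaultdict


def split_by_device(events):
    """Group payloads by device, preserving arrival order within each device."""
    streams = defaultdict(list)
    for event in events:
        streams[event["device_id"]].append(event["payload"])
    return dict(streams)


def stream_for_device(events, device_id):
    """Lazily yield payloads for one device from the shared stream."""
    for event in events:
        if event["device_id"] == device_id:
            yield event["payload"]


shared_hub_events = [
    {"device_id": "dev-1", "payload": 20.5},
    {"device_id": "dev-2", "payload": 19.0},
    {"device_id": "dev-1", "payload": 21.0},
]

per_device = split_by_device(shared_hub_events)
dev1_only = list(stream_for_device(shared_hub_events, "dev-1"))
```

Because stream_for_device is a generator, the same filter also works over an unbounded iterator of incoming events, which is closer to a live per-device stream.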
Azure Event Hubs is an event ingestion service to which event publishers send events. The events are available in the event hub's partitions, which different consumer groups subscribe to. The partitions can be designed to accept only specific kinds of events.
You can also create multiple event hubs within an Event Hubs namespace. You can create a maximum of 10 event hubs per namespace, 32 partitions per event hub, and 20 consumer groups per event hub. So you can use partitions to separate the events from the event publishers and consume them easily on the processing side.
The pricing is at the event hub level, not at the namespace level. The features you get depend on the tier you choose:
Basic tier: you can have only 1 consumer group.
Standard and Dedicated tiers: you can create up to 20 consumer groups.
For example, if you choose the Basic or Standard tier in the East US region, you are charged $0.028 per million events for ingress and $0.015 per throughput unit per hour.
If you choose the Dedicated tier, you are charged $6.849 per hour, which includes unlimited ingress and throughput, but the minimum charge is 4 hours.
The main advantages of the Dedicated tier are that the message retention period is 7 days, whereas in the Basic and Standard tiers it is just 1 day, and that the message size is up to 1 MB, whereas in the Basic and Standard tiers it is just 256 KB.
See https://azure.microsoft.com/en-in/pricing/details/event-hubs/.

Event retention in Microsoft Azure EventHub

I was checking on details about message retention in event hub.
Suppose I have set the retentionPolicy to 1 day and have sent some messages. If I then change the retentionPolicy to 3 days, will the existing eventData also be retained for 3 days?
Absolutely yes.
One more important detail about the retention policy: Event Hubs does not apply it at the message level; it is applied at the file-system level. Event Hubs is a high-throughput event ingestion pipeline, in short a stream of events in the cloud. To provide higher throughput and performance, we don't deal with any event-level operations (by contrast, the equivalent offering, Service Bus queues and topics, has a TimeToLive property on each message). Behind the covers, Event Hubs stores data in pages (say 10 MB each, for the sake of explanation), and the retention policy is applied only to these pages. So if the hub has a very low data rate, some messages sent 10 days ago might still be present even with a retention policy of 1 day, because they ended up in a page that has not yet been cleaned up.
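A toy sketch of why page-level retention can keep old messages around. The page size and the eviction rule (drop a page only once every event in it is past retention) are assumptions made for illustration; the actual storage behaviour is internal to the service and not documented here.

```python
def surviving_event_ages(event_ages_days, page_size, retention_days):
    """Return the ages (in days) of events still readable.

    Assumed rule: events are packed into fixed-size pages in arrival order,
    and a page is dropped only when even its youngest event has expired.
    """
    pages = [
        event_ages_days[i:i + page_size]
        for i in range(0, len(event_ages_days), page_size)
    ]
    kept = []
    for page in pages:
        if min(page) <= retention_days:  # youngest event is still in retention
            kept.extend(page)
    return kept


ages = [10, 9, 0.5, 0.1]  # oldest first, as they arrived

# Low data rate: all four events share one page, so the old ones survive.
low_rate = surviving_event_ages(ages, page_size=4, retention_days=1)

# Higher data rate (pages fill sooner): the old page is dropped as a whole.
high_rate = surviving_event_ages(ages, page_size=2, retention_days=1)
```

The practical takeaway is the same as in the answer: treat the retention period as a minimum guarantee, not as an exact deletion time.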
Yes, if you use UpdateEventHubAsync to update the message retention period. However, the actual cleanup time is not guaranteed; the Azure infrastructure may decide when to clean up based on its own dynamics.

What is Partition Id, Offset, Host Name in Azure Event Hub Receiver?

I am working with Azure Event Hubs and have some questions.
What is the Partition Id in an Azure event hub receiver? Is this Id the same as the partition key in an Azure event hub publisher?
What is the Offset? What is it used for in an Azure event hub consumer?
Can I consume messages without using a consumer group?
Can I consume messages with a single receiver?
What is the blob used for in an event hub consumer? I only want to view the messages I sent.
This article Event Hubs Overview should answer your questions in detail, but to summarize:
When you create a new Event Hub in the portal, you specify how many partitions you need. The Publisher hashes the partition key of an event to determine which partition to send the event to. An event hub receiver receives events from those partitions.
An event hub consumer tracks which events it has received by using an offset into each partition. By changing the offset you can, for example, re-read events from a partition.
You must have at least one consumer group (there is a default one). Each consumer group has its own view of the partitions (different offset values), which lets it read events from the partitions independently of the other consumer groups.
Typically, you have one receiver per partition to enable scale out. An event hub has between 8 and 16 partitions.
Offset values are managed by the client. You can checkpoint your latest position in each partition to enable you to restart at the latest event if the client restarts. The checkpoint mechanism writes the latest offset values to blob storage.
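The pieces in this summary can be sketched in a few lines of plain Python. This is a toy model, not the Event Hubs client: partition_for uses SHA-256 purely as a stand-in (the service uses its own hash, so real partition assignments will differ), and CheckpointStore is a hypothetical in-memory substitute for the blob-backed checkpoint store.

```python
import hashlib


def partition_for(partition_key, partition_count):
    """Map a partition key to a partition id (stand-in hash, see note above)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


class CheckpointStore:
    """In-memory stand-in for checkpoints that would be kept in blob storage."""

    def __init__(self):
        self._offsets = {}

    def save(self, consumer_group, partition_id, offset):
        self._offsets[(consumer_group, partition_id)] = offset

    def load(self, consumer_group, partition_id, default=0):
        return self._offsets.get((consumer_group, partition_id), default)


PARTITIONS = 4
pid_first = partition_for("device-42", PARTITIONS)
pid_again = partition_for("device-42", PARTITIONS)  # same key, same partition

store = CheckpointStore()
store.save("$Default", pid_first, offset=128)

default_group_offset = store.load("$Default", pid_first)
other_group_offset = store.load("analytics", pid_first)  # independent view
```

The partition id is what the receiver reads from; the partition key is only an input the publisher hashes to pick a partition. Each consumer group keeps its own offsets, which is why the second group above starts from its default position.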

Resources