I'm using the EventProcessorHost to get messages off an event hub. Is there an easy way to change the maximum number of messages that are pulled off at a time? Right now the default is 10, and I know that when using a normal EventReceiver it is relatively easy to change the default, but I couldn't find any documentation for the EventProcessorHost.
I want the maximum number of messages passed in when ProcessEventsAsync is called to be less than 10.
You can do it by providing an EventProcessorOptions instance with the MaxBatchSize property modified when registering the event processor (https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.eventprocessoroptions.maxbatchsize.aspx).
For example:
var eventProcessorHost = new EventProcessorHost(...);

// MaxBatchSize caps the number of events handed to ProcessEventsAsync per call.
await eventProcessorHost.RegisterEventProcessorAsync<MyEventProcessor>(
    new EventProcessorOptions { MaxBatchSize = 5 });
Related
I have an Event Hub and an Azure Function connected to it. With small amounts of data everything works well, but when I tested it with 10 000 events, I got very peculiar results.
For test purposes I send numbers from 0 to 9999 into the Event Hub and log the data in Application Insights and in Service Bus. For the first test I can see in Azure that the hub got exactly 10 000 events, but Service Bus and AI got all messages between 0 and 4500 and only every second message after 4500 (so it lost about 30%). In the second test, I got all messages from 0 to 9999, but every second message between 3200 and 3500 was duplicated. I would like to get all messages exactly once; what did I do wrong?
public async Task Run([EventHubTrigger("%EventHubName%", Connection = "AzureEventHubConnectionString")] EventData[] events, ILogger log)
{
    int id = _random.Next(1, 100000);
    _context.Log.TraceInfo("Started. Count: " + events.Length + ". " + id); // AI log

    foreach (var message in events)
    {
        // log with ASB
        var mess = new Message();
        mess.Body = message.EventBody.ToArray();
        await queueClient.SendAsync(mess);
    }

    _context.Log.TraceInfo("Completed. " + id); // AI log
}
By using EventData[] events, you are reading events from the hub in batch mode; that's why you see X events being processed at a time and then, a few seconds later, the next batch being processed.
Instead of EventData[], simply use EventData.
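For example, a single-event version of your function might look roughly like this (a sketch reusing the names from your snippet, so queueClient and _context are assumed to exist as in the question):

public async Task Run([EventHubTrigger("%EventHubName%", Connection = "AzureEventHubConnectionString")] EventData message, ILogger log)
{
    // One invocation per event; no batch loop is needed.
    var mess = new Message();
    mess.Body = message.EventBody.ToArray();
    await queueClient.SendAsync(mess);
}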
When you send events to the hub, check that all events are sent with the same partition key if you want to try batch processing; otherwise they can be split across several partitions, depending on TU (throughput units), PU (processing units), and CU (capacity units).
Refer to this document for the throughput limits for Basic, Standard, and Premium:
Egress: up to 2 MB per second or 4096 events per second.
There are a couple of things likely happening, though I can only speculate with the limited context that we have. Knowing more about the testing methodology, tier of your Event Hubs namespace, and the number of partitions in your Event Hub would help.
The first thing to be aware of is that the timing between when an event is published and when it is available in a partition to be read is non-deterministic. When a publish operation completes, the Event Hubs broker has acknowledged receipt of the events and taken responsibility for ensuring they are persisted to multiple replicas and made available in a specific partition. However, it is not a guarantee that the event can immediately be read.
Depending on how you sent the events, the broker may also need to route events from a gateway by performing a round-robin or applying a hash algorithm. If you're looking to optimize the time from publish to availability, taking ownership of partition distribution and publishing directly to a partition can help, as can ensuring that you're publishing with the right degree of concurrency for your host environment and scenario.
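For illustration, here is a minimal sketch of publishing directly to a partition with the Azure.Messaging.EventHubs SDK; the connection string, hub name, and partition id are placeholders:

using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Sketch only: replace the connection string, hub name, and partition id with your own.
await using var producer = new EventHubProducerClient("<connection-string>", "<event-hub-name>");

// Request a batch bound to a specific partition so the gateway does not route the events.
using EventDataBatch batch = await producer.CreateBatchAsync(new CreateBatchOptions { PartitionId = "0" });
batch.TryAdd(new EventData(new BinaryData("sample-event")));

await producer.SendAsync(batch);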
With respect to duplication, it's important to be aware that Event Hubs offers an "at least once" guarantee; your consuming application should expect some duplicates and needs to be able to handle them in the way that is appropriate for your application scenario.
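As a rough illustration of handling duplicates in this particular test, where each event body is a unique number, you could track which ids have already been seen; the in-memory set below is only a stand-in for whatever store suits your scenario:

// In-memory stand-in for a durable idempotency store; ids 0-9999 match the test data above.
private static readonly HashSet<int> _processedIds = new HashSet<int>();

private static bool TryMarkProcessed(int messageId)
{
    lock (_processedIds)
    {
        // Returns false if the id was already processed, so the caller can skip the duplicate.
        return _processedIds.Add(messageId);
    }
}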
Azure Functions uses a set of event processors in its infrastructure to read events. The processors collaborate with one another to share work and distribute the responsibility for partitions between them. Because collaboration takes place using storage as an intermediary to synchronize, there is an overlap of partition ownership when instances are scaled up or scaled down, during which time the potential for duplication is increased.
Functions makes the decision to scale based on the number of events that it sees waiting in partitions to be read. In the case of your test, if your publishing pattern ramps up rapidly and Functions sees the event backlog grow to the point that it feels the need to scale out by multiple instances, you'll see more duplication than you otherwise would for a period of 10-30 seconds, until partition ownership normalizes. To mitigate this, gradually increasing the publishing speed over a 1-2 minute period can help to smooth out the scaling and reduce (but not eliminate) duplication.
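As a sketch of that ramp-up idea, assuming an EventHubProducerClient named producer (as in the earlier snippet) and the 10 000 numeric test events from the question:

for (int i = 0; i < 10000; i++)
{
    await producer.SendAsync(new[] { new EventData(new BinaryData(i.ToString())) });

    // Shrink the delay linearly from 120 ms to 0 over the first 2000 events (~2 minutes),
    // then publish at full speed; this smooths out the scale-out decisions.
    if (i < 2000)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(120 - (i * 120 / 2000)));
    }
}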
So I've been playing around with MassTransit and Azure Service Bus Premium; here's a sample of one of my consumer configurations. The hypothetical initial load for one publisher would be about 1000 messages a second. However, whenever I attempt to configure a consumer, it seems to average out at about 20-40 messages per loop.
cfg.ReceiveEndpoint("ReceivePoint", e =>
{
    e.PrefetchCount = 500;
    e.MaxConcurrentCalls = 20;
    e.Batch<IBlahContract>(b =>
    {
        b.MessageLimit = 500;
        b.TimeLimit = TimeSpan.FromSeconds(1);
        b.Consumer(() => new BatchBlahConsumer(
            provider.GetRequiredService<IRepository>(),
            provider.GetRequiredService<ILogger<BatchBlahConsumer>>()));
    });
});
I did try the throughput test, which managed a thousand-plus messages a second. Does anyone have any tips on how to achieve optimal performance? And might it make more sense to consider a managed instance of RabbitMQ, since this needs to scale? It just feels like Azure Service Bus isn't really suited to such high throughput.
Edit: A slight addition to this: I suspect it's related to a requirement to keep prefetch at about 20, and that consumer concurrency is what really defines performance. So basically it needs consumer-level configuration based on estimated requirements, which would make me lean more towards using RabbitMQ.
Your batch message limit is 500, which is honestly way too high. With MaxConcurrentCalls set at 20, you'll always hit the timeout instead of the batch size limit, because the Azure client library will only ever deliver 20 messages at once, and the batch size is significantly higher than that value (500 vs 20). You need to set the concurrent call limit high enough that a batch can actually be filled, or you'll always be completing batches on timeout alone.
Lower the batch size and increase MaxConcurrentCalls so that they are the same, or at least so that the batch size is less than the concurrent call limit, so that batches can be completed upon message receipt instead of waiting to time out.
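Following that reasoning, a configuration along these lines might be a better starting point; the numbers are illustrative only and should be tuned against your actual workload:

cfg.ReceiveEndpoint("ReceivePoint", e =>
{
    e.PrefetchCount = 400;           // keep enough messages buffered ahead of the consumers
    e.MaxConcurrentCalls = 100;      // at least as large as the batch message limit
    e.Batch<IBlahContract>(b =>
    {
        b.MessageLimit = 100;        // small enough to fill before the time limit expires
        b.TimeLimit = TimeSpan.FromSeconds(1);
        b.Consumer(() => new BatchBlahConsumer(
            provider.GetRequiredService<IRepository>(),
            provider.GetRequiredService<ILogger<BatchBlahConsumer>>()));
    });
});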
I'm new to Azure Stream Analytics. I have an Event Hub as the input source and now I'm trying to execute a simple query on this stream. An example query looks like this:
SELECT
count(*)
INTO [output1]
FROM
[input1] TIMESTAMP BY Time
GROUP BY TumblingWindow(second, 10)
So I want to count the events which arrived within a certain time frame.
When executing this query, I always get the following error:
Request exceeded maximum allowed size limit
I already narrowed down the checked time window, and I'm certain that the number of events within this time frame is not very big (at most several hundred), so I'm not sure how to avoid this error.
Do you have a hint?
Thanks!
Request exceeded maximum allowed size limit
This error (I believe it should be more explicit) indicates that you violated the Azure Stream Analytics resource and object limits.
It's not just about quantity, it's also about size. Please check the size of your source inputs, or try to reduce the window size and test again.
1. Does the record size of the source query mean that one event can only be 64 KB, or does this parameter mean 64 K events?
It means the size of one event should be below 64 KB.
2. Is there a possibility to use Stream Analytics to select only certain subfields of the event, or is the only way to reduce the event size before it is sent to the event hub?
As far as I know, ASA only collects data and processes it, so the size all depends on the source side and your query SQL. Since you need to use COUNT, I'm afraid you have to do something on the Event Hub side. Please refer to my thoughts:
Use an Event Hub trigger for an Azure Function: when an event streams into the Event Hub, trigger the function, pick only the key-values you need, and save the result into another Event Hub namespace (just to reduce the size of the source events). Since you only need to COUNT records, I think this would work for you.
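A rough sketch of such a function, assuming the events are JSON with a Time field (as the TIMESTAMP BY Time clause suggests); the hub names, connection settings, and the fields kept are placeholders you would adjust:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json.Linq;

public static class TrimEvents
{
    [FunctionName("TrimEvents")]
    [return: EventHub("trimmed-hub", Connection = "TrimmedEventHubConnection")]
    public static string Run(
        [EventHubTrigger("source-hub", Connection = "SourceEventHubConnection")] string eventPayload,
        ILogger log)
    {
        // Keep only the fields the Stream Analytics query needs and forward a much smaller event.
        var source = JObject.Parse(eventPayload);
        var trimmed = new JObject { ["Time"] = source["Time"] };
        return trimmed.ToString();
    }
}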
I am using "node-rdkafka" npm module for our distributed service architecture written in Nodejs. We have a use case for metering where we allow only a certain amount of messages to be consumed and processed every n seconds. For example, a "main" topic has 100 messages pushed by a producer and "worker" consumes from main topic every 30 seconds. There is a lot more to the story of the use case.
The problem I am having is that I need to progamatically get the lag of a given topic(all partitions).
Is there a way for me to do that?
I know that I can use "bin/kafka-consumer-groups.sh" to access some of the data I need but is there another way?
Thank you in advance
You can retrieve that information directly from your node-rdkafka client via several methods:
Client metrics:
The client can emit metrics at a defined interval that contain the current and committed offsets as well as the end offset, so you can easily calculate the lag.
You first need to enable the metrics events by setting, for example, 'statistics.interval.ms': 5000 in your client configuration. Then set a listener on the event.stats event:
consumer.on('event.stats', function(stats) {
console.log(stats);
});
The full stats are documented on https://github.com/edenhill/librdkafka/wiki/Statistics but you probably are mostly interested in the partition stats: https://github.com/edenhill/librdkafka/wiki/Statistics#partitions
Query the cluster for offsets:
You can use queryWatermarkOffsets() to retrieve the first and last offsets for a partition.
consumer.queryWatermarkOffsets(topicName, partition, timeout, function(err, offsets) {
    var high = offsets.highOffset; // offset of the next message to be written to the partition
    var low = offsets.lowOffset;   // earliest offset still retained in the partition
    // lag for this partition = high - (the consumer's current or committed offset)
});
Then use the consumer's current position (position()) or committed (committed()) offsets to calculate the lag.
Kafka exposes "records-lag-max" mbean which is the max records in lag for a partition via jmx, so you can get the lag querying this mbean
Refer to below doc for the exposed jmx mbean in detail .
https://docs.confluent.io/current/kafka/monitoring.html#consumer-group-metrics
I want to push a message into Azure Service Bus, let's say of size 3 MB.
For this, I have written:
QueueInfo queueInfo = new QueueInfo("sq-jcibe-microservice-plm-qa");
long maxSizeInMegabytes = 5120;
queueInfo.setMaxSizeInMegabytes(maxSizeInMegabytes);
service.updateQueue(queueInfo);
service.sendQueueMessage("sq-jcibe-microservice-plm-qa", brokeredMessage);
I am getting the following exception:
com.microsoft.windowsazure.exception.ServiceException: com.sun.jersey.api.client.UniformInterfaceException: PUT https://sb-jcibe-microservice-qa.servicebus.windows.net/sq-jcibe-microservice-plm-qa?api-version=2013-07 returned a response status of 400 Bad Request
Response Body: <Error><Code>400</Code><Detail>SubCode=40000. For a Partitioned Queue, ordering is supported only if RequiresSession is set to true.
Parameter name: SupportOrdering. TrackingId:59bb3ae1-95f9-45e1-8896-d0f6a9ac2be8_G3, SystemTracker:sb-jcibe-microservice-qa.servicebus.windows.net:sq-jcibe-microservice-plm-qa, Timestamp:11/30/2016 4:52:22 PM</Detail></Error>
I don't understand what this means or how I should resolve the problem. Please help me out with this scenario.
maxSizeInMegabytes refers to the maximum total size of the messages on a queue, not individual message size.
Individual message size cannot exceed 256KB for a standard tier, and 1MB for a premium tier (including headers for both).
If you wish to send messages larger than the maximum message size, you'll have to either implement a claim check pattern (http://www.enterpriseintegrationpatterns.com/patterns/messaging/StoreInLibrary.html) or use a framework that does it for you. Implementing it yourself would mean something along the lines of storing the payload as a Storage blob, with the message containing the blob URI.
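For what it's worth, here is a C# sketch of the claim-check idea (the same pattern applies from Java): upload the large payload to blob storage and put only its URI on the queue. The connection strings, the container name, and largePayloadBytes are placeholders; only the queue name is taken from your code.

using System;
using System.IO;
using Azure.Messaging.ServiceBus;
using Azure.Storage.Blobs;

// largePayloadBytes is a placeholder for your serialized ~3 MB payload (byte[]).
// Store it in a blob (the "payloads" container is assumed to exist)...
var blobClient = new BlobContainerClient("<storage-connection-string>", "payloads")
    .GetBlobClient(Guid.NewGuid().ToString());
await blobClient.UploadAsync(new MemoryStream(largePayloadBytes));

// ...and send only the claim (the blob URI) on the queue, which stays well under the size limit.
await using var sbClient = new ServiceBusClient("<servicebus-connection-string>");
ServiceBusSender sender = sbClient.CreateSender("sq-jcibe-microservice-plm-qa");
await sender.SendMessageAsync(new ServiceBusMessage(blobClient.Uri.ToString()));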