Kafka-enabled Azure Event Hub: Invalid session timeout in Receiver

I'm trying to use the exact code provided here to send/receive data from a Kafka enabled Azure Event Hub.
https://github.com/Azure/azure-event-hubs-for-kafka/tree/master/quickstart/dotnet/EventHubsForKafkaSample
I'm successful in sending messages to the event hub, but each time I try to initialize the receiver, I get this invalid session timeout error.
7|2018-11-14 19:10:52.967|ssarkar#consumer-1|SEND| [thrd:sasl_ssl://ssarkar-test.servicebus.windows.net:9093/bootstrap]: sasl_ssl://ssarkar-test.servicebus.windows.net:9093/0: Sent JoinGroupRequest (v0, 109 bytes # 0, CorrId 6)
7|2018-11-14 19:10:52.992|ssarkar#consumer-1|RECV| [thrd:sasl_ssl://ssarkar-test.servicebus.windows.net:9093/bootstrap]: sasl_ssl://ssarkar-test.servicebus.windows.net:9093/0: Received JoinGroupResponse (v0, 16 bytes, CorrId 6, rtt 24.28ms)
7|2018-11-14 19:10:52.992|ssarkar#consumer-1|REQERR| [thrd:main]: sasl_ssl://ssarkar-test.servicebus.windows.net:9093/0: JoinGroupRequest failed: Broker: Invalid session timeout: actions Permanent
The only timeout I am specifying is request.timeout.ms, and I have tried without it as well, but the error won't go away. I have also tried various values of session.timeout.ms, and the error persists.
There is some info online about making sure the session timeout value falls between the broker's minimum and maximum group session timeout values. But I don't have a way to view the broker configs on Azure Event Hubs, so I have no idea what they are supposed to be.

EH allows session timeouts between 6000 ms and 300000 ms. We also reject your JoinGroup request if the request's rebalance timeout is less than its session timeout.
Quick note - we aren't actually running real Kafka brokers, so there is a bit of added complexity to exposing broker configs. However, we will update our GitHub repository with configuration values/ranges!
11/22/19 edit - the configuration doc can be found here: https://github.com/Azure/azure-event-hubs-for-kafka/blob/master/CONFIGURATION.md
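Given those limits, a quick client-side sanity check can catch the misconfiguration before the JoinGroup round trip. A minimal sketch in Python; the helper name and constants are mine, while the 6000-300000 ms range and the rebalance-not-less-than-session rule come from the answer above (in most Kafka clients the rebalance timeout is taken from max.poll.interval.ms):

```python
# Event Hubs' documented bounds for the Kafka endpoint's session timeout.
EH_SESSION_MIN_MS = 6000
EH_SESSION_MAX_MS = 300000

def check_consumer_timeouts(session_timeout_ms, max_poll_interval_ms):
    """Return True if these settings should pass Event Hubs' JoinGroup checks."""
    if not (EH_SESSION_MIN_MS <= session_timeout_ms <= EH_SESSION_MAX_MS):
        return False  # outside the 6000-300000 ms window
    # The join is also rejected when the rebalance timeout (sent from
    # max.poll.interval.ms in most clients) is below the session timeout.
    return max_poll_interval_ms >= session_timeout_ms
```

Running this against your config dict before constructing the consumer would surface the error locally rather than as a broker REQERR in the trace.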

Related

Azure Service Bus message receipt and publishing fails after changing database DTUs

I have an Azure Kubernetes Service that subscribes to a topic on an Azure Service Bus. As messages are received, a number of operations happen before calling a stored proc in an Azure Database. It then publishes a message to another topic on the same Service Bus. This runs, processing thousands of messages without issue.
When the DBA changes the DTUs on that Azure Database, the k8s service stops receiving messages: the logs indicate a message exists, but none are received. It also begins showing "Send Link Closed" errors for the topic the app attempts to publish to.
It never recovers from this state.
Subscribed to topic messages
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290-Receiver: RenewLockAsync start. MessageCount = 1, LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290-Receiver: RenewLockAsync done. LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290: Processor RenewMessageLock complete. LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290: Processor RenewMessageLock start. MessageCount = 1, LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-eaf2b5f2-2d34-43e0-88b1-76414175422e-Receiver: ReceiveBatchAsync done. Received '0' messages. LockTokens =
Published to topic messages
Send Link Closed. Identifier: published-to-topic-995469e6-d697-433a-aaea-112366bdc58a, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G24:260656346:amqps://my-sb-resource.servicebus.windows.net/-34d9f631;1:454:455' is force detached. Code: publisher(link83580724). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot..
Send Link Closed. Identifier: published-to-topic-6ba720c4-8894-474e-b321-0f84f569e6fc, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G24:260657004:amqps://my-sb-resource.servicebus.windows.net/-34d9f631;1:456:457' is force detached. Code: publisher(link83581007). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot..
Send Link Closed. Identifier: published-to-topic-865efa89-0775-4f5f-a5d0-9fde35fdabce, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G24:260657815:amqps://my-sb-resource.servicebus.windows.net/-34d9f631;1:458:459' is force detached. Code: publisher(link83581287). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot..
I can't think of a reason changing DTUs would have any impact whatsoever on maintaining a connection with a service bus. We've reproduced the behavior three straight times, though.

Apache Pulsar Client - Broker notification of Closed consumer - how to resume data feed?

TL;DR: we use the Python client library to subscribe to a Pulsar topic. The logs show "broker notification of consumer closed" when something happens server-side. The subscription appears to be re-established according to the logs, but we later find the backlog growing on the cluster because no messages are being sent to our subscription to consume.
We are running into an issue where an Apache Pulsar cluster we use, which is opaque to us and has a namespace defined where we publish/consume topics, is losing its connection with our consumer.
We have a python client consuming from a topic (with one Pulsar Client subscription per thread).
We have run into an issue where, due to an issue on the pulsar cluster, we see the following entry in our client logs:
"Broker notification of Closed consumer"
followed by:
"Created connection for pulsar://houpulsar05.mycompany.com:6650"
....for every thread in our agent.
Then we see the usual periodic log entries like this:
{"log":"2022-09-01 04:23:30.269 INFO [139640375858944] ConsumerStatsImpl:63 | Consumer [persistent://tenant/namespace/topicname, subscription-name, 0] , ConsumerStatsImpl (numBytesRecieved_ = 0, totalNumBytesRecieved_ = 6545742, receivedMsgMap_ = {}, ackedMsgMap_ = {}, totalReceivedMsgMap_ = {[Key: Ok, Value: 3294], }, totalAckedMsgMap_ = {[Key: {Result: Ok, ackType: 0}, Value: 3294], })\n","stream":"stdout","time":"2022-09-01T04:23:30.270009746Z"}
This gives the appearance that a connection has been re-established to some other broker.
However, no messages are being consumed. We have an alert on a Grafana dashboard which shows us the backlog on topics and subscriptions. Eventually it hits a count or rate threshold which alerts us that there is a problem. When we restart our agent, the subscription is re-established and the backlog can immediately be seen heading to 0.
Has anyone experienced such an issue?
Our code is typical:
import pulsar

client = pulsar.Client('pulsar://houpulsar05.mycompany.com:6650')
consumer = client.subscribe(
    topic='my-topic',
    subscription_name='my-subscription',
    consumer_type=my_consumer_type,
    consumer_name=my_agent_name
)
while True:
    msg = consumer.receive()
    ex = msg.value()
    consumer.acknowledge(msg)  # acknowledge after processing so the backlog drains
I haven't yet found a readily available way (docker-compose or anything else) to run a multi-broker Pulsar installation locally on Docker Desktop so I can try killing off a broker and see how the consumer reacts.
Currently the Python client only supports configuring one broker address and doesn't support retry for lookup yet. Here are two related PRs to support it:
https://github.com/apache/pulsar/pull/17162
https://github.com/apache/pulsar/pull/17410
Therefore, setting up a multi-node cluster might be no different from a standalone.
If you only specified one broker in the service URL, you can simply test with a standalone. Run a consumer and a producer sending messages periodically, then restart the standalone. The "Broker notification of Closed consumer" message appears when the broker actively closes the connection, e.g. when your consumer has sent a SEEK command (via a seek call); the broker will then disconnect the consumer and the log appears.
BTW, it's better to show your Python client version. And a GitHub issue might be a better place to track this.
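Until lookup retry lands, one pragmatic client-side workaround is a watchdog: if the consumer goes quiet for too long after a "Closed consumer" notification, tear the client down and rebuild it, which is exactly what the manual agent restart achieves. A sketch; the threshold and the rebuild() helper are hypothetical, not part of the pulsar-client API:

```python
import time

STALL_THRESHOLD_S = 300  # assumption: treat 5 minutes of silence as a stall

def is_stalled(last_message_ts, now=None, threshold=STALL_THRESHOLD_S):
    """True when no message has arrived for more than `threshold` seconds."""
    now = time.time() if now is None else now
    return (now - last_message_ts) > threshold

# Hypothetical receive loop built on the helper above: on a stall, close
# and recreate the Pulsar client so a fresh broker lookup is performed.
# last_seen = time.time()
# while True:
#     try:
#         msg = consumer.receive(timeout_millis=10000)
#         consumer.acknowledge(msg)
#         last_seen = time.time()
#     except Exception:  # the Python client raises on receive timeout
#         if is_stalled(last_seen):
#             client.close()
#             client, consumer = rebuild()  # re-create Client and subscribe
#             last_seen = time.time()
```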

Connection to Azure Service Bus using Java Spring Application - Timeout

I have written a client which tries to connect to Azure Service Bus. As soon as the server starts up, I get the errors below and receive none of the messages present on the queue. I tried replacing the sb protocol with amqpwss, but it didn't help.
2020-05-25 21:23:11 [ReactorThreadeebf108d-444b-4acd-935f-c2c2c135451d] INFO c.m.a.s.p.RequestResponseLink - Internal send link 'RequestResponseLink-Sender_0480eb_c31e1cc239bf471e811e53a30adc6488_G51' of requestresponselink to '$cbs' encountered error.
com.microsoft.azure.servicebus.primitives.ServiceBusException: com.microsoft.azure.servicebus.amqp.AmqpException: The connection was inactive for more than the allowed 60000 milliseconds and is closed by container 'LinkTracker'. TrackingId:c31e1cc239bf471e811e53a30adc6488_G51, SystemTracker:gateway7, Timestamp:2020-05-25T21:23:10
at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:55)
at com.microsoft.azure.servicebus.primitives.RequestResponseLink$InternalSender.onClose(RequestResponseLink.java:759)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:66)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.onLinkRemoteClose(BaseLinkHandler.java:42)
at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:176)
at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:491)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.servicebus.amqp.AmqpException: The connection was inactive for more than the allowed 60000 milliseconds and is closed by container 'LinkTracker'. TrackingId:c31e1cc239bf471e811e53a30adc6488_G51, SystemTracker:gateway7, Timestamp:2020-05-25T21:23:10
... 10 common frames omitted
There is a similar issue opened on GitHub:
What you posted here is the trace, not the error. Yes, the service
closes idle connections after 10 minutes. The client traces it and
reopens the connection. It is seamless and doesn't throw any exceptions
to the application, so that can't be your problem. If your sends are
failing, there may be another problem, but not this one.
As I see it, the second line is about the timeout of 6 secs; can you check the troubleshooting page to see if it helps? Also this.
We recommend adding "sync-publish=true" to the connection URL.

Azure website timing out after long process

Team,
I have an Azure website published on Azure. The application reads around 30,000 employees from an API and, after the read is successful, updates the secondary Redis cache with all 30,000 employees.
The timeout occurs in the second step, when it updates the secondary Redis cache with all the employees. From my local machine it works fine, but as soon as I deploy this to Azure, it gives me a
500 - The request timed out.
The web server failed to respond within the specified time
From the blogs I came to know that the default timeout is set to 4 minutes for an Azure website.
I have tried all the fixes suggested on the blogs, like setting SCM_COMMAND_IDLE_TIMEOUT in the application settings to 3600.
I even tried putting the Azure Redis cache session state provider settings in the web.config with inflated timeout figures:
<add type="Microsoft.Web.Redis.RedisSessionStateProvider" name="MySessionStateStore" host="[name].redis.cache.windows.net" port="6380" accessKey="QtFFY5pm9bhaMNd26eyfdyiB+StmFn8=" ssl="true" abortConnect="False" throwOnError="true" retryTimeoutInMilliseconds="500000" databaseId="0" applicationName="samname" connectionTimeoutInMilliseconds="500000" operationTimeoutInMilliseconds="100000" />
The offending code responsible for the timeout is this:
public void Update(ReadOnlyCollection<ColleagueReferenceDataEntity> entities)
{
    //Trace.WriteLine("Updating the secondary cache with colleague data");
    var secondaryCache = this.Provider.GetSecondaryCache();
    foreach (var entity in entities)
    {
        try
        {
            secondaryCache.Put(entity.Id, entity);
        }
        catch (Exception ex)
        {
            // If a record fails, log and continue.
            this.Logger.Error(ex, string.Format("Error updating a colleague in secondary cache: Id {0}", entity.Id));
        }
    }
}
Is there anything I can change in this code?
Please, can anyone help me... I have run out of ideas!
You're doing it wrong! Redis is not the problem. The main request thread itself is getting terminated before the process completes. You shouldn't let a request wait that long: there's a hard-coded 230-second limit on in-flight requests which can't be changed.
Read here: Why does my request time out after 230 seconds?
Assumption #1: You're loading the data on the very first request from the client side!
Solution: If the 30,000-employee record set is for the whole application, and not per specific user, you can trigger the data load on app start-up, not on a user request.
Assumption #2: You have individual users, and for each of them you have to store the 30,000 employees' data on the first request from the client side.
Solution: Add a background job (maybe a WebJob/Azure Function) to process the task. Upon request from the client, return a 202 (Accepted) with the job-status location in the header. The client can then poll for the status of the task at a certain frequency and update the user accordingly!
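The 202-Accepted flow described above can be sketched framework-free; the endpoint shapes and the in-memory jobs dict are illustrative stand-ins (a real WebJob/Function setup would use durable job storage):

```python
import uuid

jobs = {}  # job_id -> status; stand-in for durable job storage

def start_import():
    """Client's initial request: enqueue the work, answer 202 + status URL."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = "pending"  # a WebJob/Function would later mark it "done"
    return 202, {"Location": f"/jobs/{job_id}"}

def job_status(job_id):
    """Polling endpoint the client calls until the job reports completion."""
    return 200, {"status": jobs.get(job_id, "unknown")}
```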
Edit 1:
For Assumption #1 - you can try batching the objects while pushing them to Redis. Currently you're updating one object at a time, which means 30,000 requests; that will definitely exhaust the 230-second limit. As a quick solution, batch multiple objects into one request to Redis. I hope it does the trick!
UPDATE:
As you're using StackExchange.Redis, use the pattern already mentioned here to batch the objects:
Batch set data from Dictionary into Redis
The number of objects per request varies depending on the payload size and the bandwidth available. As your site is hosted on Azure, I do not think bandwidth will be much of a concern.
Hope that helps!
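The batching advice can be sketched as follows. This is an illustration in Python with redis-py rather than the question's C#/StackExchange.Redis; the chunk size, client setup, and entity shape are assumptions:

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical usage: one pipelined round trip per batch instead of one
# request per object (30,000 objects at size=500 -> 60 round trips).
# import json, redis
# client = redis.Redis(host="myname.redis.cache.windows.net", port=6380,
#                      ssl=True, password="...")
# for batch in chunked(entities, 500):
#     pipe = client.pipeline()
#     for entity in batch:
#         pipe.set(entity["id"], json.dumps(entity))
#     pipe.execute()
```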

Camel AMQP autoAck failed to resolve an endpoint

I am trying to set autoAck to false while reading from an Azure Service Bus queue, which I am connecting to over AMQP. Below is the code.
from("amqp:queue:testqueue?autoAck=false&concurrentConsumers=1")
But I am getting an error msg:
Failed to create route route1: Route(route1)[[From[amqp:queue:testqueue?autoAck=false&concu... because of Failed to resolve endpoint: amqp://queue:testqueue?autoAck=false&concurrentConsumers=1 due to: Failed to resolve endpoint: amqp://queue:testqueue?autoAck=false&concurrentConsumers=1 due to: There are 1 parameters that couldn't be set on the endpoint. Check the uri if the parameters are spelt correctly and that they are properties of the endpoint. Unknown parameters=[{autoAck=false}]
I am trying to process messages from the Service Bus queue but don't want them removed until processing is complete.
I finally found the answer: in order to remove messages from the queue only once a consumer has accepted a message and acknowledged that it was successfully processed, you need to add "acknowledgementModeName" to the route:
from("amqp:queue:testqueue?acknowledgementModeName=CLIENT_ACKNOWLEDGE&concurrentConsumers=1")
For more clarification visit this page http://camel.apache.org/jms.html
