Timeout exceptions when using the Azure SDK cause the thread to hang

I am using the following Azure SDK dependencies in my project:
implementation 'com.azure:azure-core:1.30.0'
implementation 'com.azure:azure-core-http-netty:1.12.3'
implementation 'com.azure:azure-core-management:1.7.0'
implementation('com.azure.resourcemanager:azure-resourcemanager:2.17.0') {
    exclude group: 'com.fasterxml.jackson', module: 'core'
    exclude group: 'com.fasterxml.jackson', module: 'dataformat'
    exclude group: 'com.azure', module: 'azure-core'
    exclude group: 'com.azure', module: 'azure-core-http-netty'
    exclude group: 'com.azure', module: 'azure-core-management'
}
I am constructing the AzureResourceManager like so:
HttpClient httpClient = new NettyAsyncHttpClientBuilder()
    .configuration(Configuration.NONE)
    .readTimeout(Duration.ofSeconds(10))
    .connectTimeout(Duration.ofSeconds(10))
    .responseTimeout(Duration.ofSeconds(10))
    .writeTimeout(Duration.ofSeconds(10))
    .build();
TokenCredential credential = new ClientSecretCredentialBuilder()
    .clientId(applicationId)
    .clientSecret(secretKey)
    .tenantId(tenantId)
    .httpClient(httpClient)
    .maxRetry(0)
    .build();
AzureProfile profile = new AzureProfile(tenantId, subscriptionId, AzureEnvironment.AZURE);
return AzureResourceManager.configure()
    .withRetryPolicy(new RetryPolicy(new RetryOptions(new FixedDelayOptions(1, Duration.ofSeconds(30)))))
    .withHttpClient(httpClient)
    .withLogLevel(HttpLogDetailLevel.BASIC)
    .authenticate(credential, profile)
    .withDefaultSubscription();
Then I use AzureResourceManager to make API calls to get Azure resources such as VNETs. Occasionally those requests time out and I see an exception like this:
[reactor-http-epoll-4] i.n.c.AbstractChannelHandlerContext [] [||] - An exception 'java.lang.NoClassDefFoundError: io/netty/handler/ssl/SslClosedEngineException' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.util.concurrent.TimeoutException: Channel response timed out after 10000 milliseconds.
at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.responseTimedOut(ResponseTimeoutHandler.java:58)
at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.lambda$handlerAdded$0(ResponseTimeoutHandler.java:45)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:835)
After this, we find that the thread on which the Azure SDK call was made is hung. The stack above looks that way because the Azure SDK call itself is async and we're waiting for the Future to complete.
A couple of questions:
Is there anything wrong in the way we are using the SDK?
Is this some sort of a known issue with the SDK or HttpClient?
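Whatever the root cause turns out to be, one defensive measure is to bound the wait on the calling side. The log above shows a NoClassDefFoundError thrown inside exceptionCaught() while the timeout was being handled; if that prevents the timeout from reaching the waiting Future (an assumption, not something confirmed here), the caller would block forever. Below is a minimal sketch using only java.util.concurrent. The class BoundedWait and the method awaitWithTimeout are hypothetical names, and the sketch assumes the SDK call is available as a CompletableFuture.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Azure SDK.
final class BoundedWait {
    private BoundedWait() {}

    /**
     * Waits for the future for at most `limit`. On timeout the future is
     * cancelled and the TimeoutException is rethrown, so the calling thread
     * is released instead of waiting indefinitely.
     */
    static <T> T awaitWithTimeout(CompletableFuture<T> future, Duration limit)
            throws TimeoutException, ExecutionException, InterruptedException {
        try {
            return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Mark the future as cancelled so later waiters also stop waiting.
            future.cancel(true);
            throw e;
        }
    }
}
```

If the call is consumed as a Reactor Mono instead, Mono.block(Duration) gives a similar upper bound on the wait. The limit should be somewhat larger than the HTTP client timeouts plus any retry delay, so the client-side timeout normally fires first.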

Related

Catch exception from ServiceBusTrigger Azure function v4 out of process(isolated, worker)

I have a dotnet worker azure function v4 (aka isolated, out of process).
I have a serviceBusTrigger for which I listen for new messages published.
I want to create a subscription at startup of the Azure Function. I am using the Program.cs and HostBuilder approach to set up the DI.
I am using the following code and it works fine:
IHost host = new HostBuilder()
    .ConfigureAppConfiguration(builder =>
    {
        // omitted for brevity
    })
    .ConfigureServices((_, services) =>
    {
        // omitted for brevity
    })
    .Build();
// Create the subscription here if it does not exist
ServiceBusAdmin serviceBusAdmin = new();
await serviceBusAdmin.CreateOrUpdateAzureServiceBusSubscriptionsAsync();
await host.RunAsync();
The problem I encounter is that the ServiceBusTrigger tries to connect to Azure Service Bus before bootstrapping is completed, and it shows a lot of errors in the terminal like the following:
[2023-02-13T] System.Private.CoreLib: The messaging entity ':Topic:my-topic|my-sub' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions. TrackingId:2c20, SystemTracker:'Topic:my-topic|my-sub', Timestamp:2023-02-13T15:12:36 TrackingId:043a, SystemTracker:gateway7, Timestamp:2023-02-13T15:12:36 (MessagingEntityNotFound).
What I want to do is make sure I can create the subscription on the Azure Function side and somehow handle the errors.
I scoured the internet for information and looked at the official repository for the dotnet worker to find an event I could subscribe to in order to catch the exception, but couldn't find any.
P.S.: The name 'Topic:my-topic|my-sub' has been changed because the original contains the name of the company I work for.

OData PATCH request results in IllegalArgumentException

Last week we started using SDK version 3.34.1 (and also tested this with 3.35.0). When we send a PATCH request to an SAP service, we get an HTTP 204 No Content response back from our SAP service (SAP Gateway). When the SDK reads that response, it tries to parse the response body, which is empty. This leads to the following exception:
2020-12-17 16:13:51.767 ERROR 106363 --- [ut.sap.cases]-0] .s.c.s.d.o.c.r.ODataRequestResultGeneric :
Failed to buffer HTTP response. Unable to buffer HTTP entity.
java.lang.IllegalArgumentException: Wrapped entity may not be null
at org.apache.http.util.Args.notNull(Args.java:54)
at org.apache.http.entity.HttpEntityWrapper.<init>(HttpEntityWrapper.java:59)
at org.apache.http.entity.BufferedHttpEntity.<init>(BufferedHttpEntity.java:59)
at com.sap.cloud.sdk.datamodel.odata.client.request.ODataRequestResultGeneric.lambda$getHttpResponse$4f00ca4e$1(ODataRequestResultGeneric.java:180)
at io.vavr.control.Try.of(Try.java:75)
at com.sap.cloud.sdk.datamodel.odata.client.request.ODataRequestResultGeneric.getHttpResponse(ODataRequestResultGeneric.java:180)
at com.sap.cloud.sdk.datamodel.odata.client.request.ODataHealthyResponseValidator.requireHealthyResponse(ODataHealthyResponseValidator.java:44)
at io.vavr.control.Try.andThenTry(Try.java:250)
at com.sap.cloud.sdk.datamodel.odata.client.request.ODataRequestGeneric.tryExecute(ODataRequestGeneric.java:194)
at com.sap.cloud.sdk.datamodel.odata.client.request.ODataRequestGeneric.tryExecuteWithCsrfToken(ODataRequestGeneric.java:225)
at com.sap.cloud.sdk.datamodel.odata.client.request.ODataRequestUpdate.execute(ODataRequestUpdate.java:136)
at com.sap.cloud.sdk.datamodel.odata.helper.FluentHelperUpdate.executeRequest(FluentHelperUpdate.java:372)
at com.alliander.gvrn.pmd.adapter.out.sap.cases.SapCasesClient.updateCase(SapCasesClient.java:103)
at com.alliander.gvrn.pmd.adapter.out.sap.cases.SapCasesClient.persistOn(SapCasesClient.java:81)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at ....
We use a typed OData V2 client, generated from our EDMX files as per the SDK documentation.
Below is a code snippet of the function that calls the service. The matrixCase is an autogenerated object. The OData PATCH is properly handled by the SAP service.
private void updateCase(final ExternalId caseId, final PMDFlow pmdFlow, String jwtToken) {
    final HttpDestination sapMatrix = httpDestinationProvider.providePrincipalPropagationDestination(jwtToken);
    // Create matrixCase object with key
    MatrixCase matrixCase = MatrixCase.builder()
        .psReference(caseId.getValue())
        .build();
    // Set PmdAppControl explicitly, otherwise the generated client doesn't know which fields are updated.
    matrixCase.setPMDAppControl(pmdFlow.getSapNotation());
    try {
        casesService
            .updateMatrixCase(matrixCase)
            .executeRequest(sapMatrix);
    } catch (ODataException e) {
        OdataLogger.logODataException(e);
        throw new SapClientException(e);
    }
}
We updated to SDK 3.34.1 due to other issues. Before that we used 3.32.0, and I don't remember having this issue in that version.
Any ideas?
Danny

Yes, your observation is correct that 204 represents a valid answer and is not worth logging an error. Hence, the Cloud SDK team will adjust the log entry level to be less alarming.
Regards,
Tanvi

How to keep an Azure Event Hub connection alive for receiving Batch diagnostics using AMQP

We enabled the diagnostics feature on our Batch account to stream events to Event Hub, which we capture in our application to take action based on Batch task states. However, we are noticing that the connection gets closed automatically (probably because no events occur overnight), so we have to restart the server every once in a while to start receiving the events/messages again.
We still rely on Java 7, and here are the dependencies that we added for batch processing:
// azure dependency
compile('com.microsoft.azure:azure-storage:7.0.0')
compile('com.microsoft.azure:azure-batch:5.0.1') {
    // do not get transitive dependency com.nimbusds:nimbus-jose-jw because spring security still rely on old version of it
    excludes group: 'com.nimbusds', module: 'nimbus-jose-jw'
}
compile('com.fasterxml.jackson.core:jackson-core:2.9.8')
compile('org.apache.qpid:qpid-amqp-1-0-common:0.32')
compile('org.apache.qpid:qpid-amqp-1-0-client:0.32')
compile('org.apache.qpid:qpid-amqp-1-0-client-jms:0.32')
compile('org.apache.qpid:qpid-jms-client:0.40.0')
compile('org.apache.geronimo.specs:geronimo-jms_1.1_spec:1.1.1')
// end of azure dependency
And here is the code snippet that makes the connection. We used the code example given here: http://theitjourney.blogspot.com/2015/12/sendreceive-messages-using-amqp-in-java.html since we couldn't find any working example for Java 7 in the Azure docs.
/**
 * Set up connection to the service bus using AMQP mechanism.
 * NOTE: Messages received from the message bus are not guaranteed to follow order.
 */
MessageConsumer initiateConsumer(MessageListener messageListener, Integer partitionInx, BatchEventHubConfig batchEventHubConfig) {
    // set up JNDI context
    String queueName = "EventHub"
    String connectionFactoryName = "SBCFR"
    Hashtable<String, String> hashtable = new Hashtable<>()
    hashtable.put("connectionfactory.${connectionFactoryName}", batchEventHubConfig.getAMQPConnectionURI())
    hashtable.put("queue.${queueName}", "${batchEventHubConfig.name}/ConsumerGroups/${batchEventHubConfig.consumerGroup}/Partitions/${partitionInx}")
    hashtable.put(Context.INITIAL_CONTEXT_FACTORY, "org.apache.qpid.amqp_1_0.jms.jndi.PropertiesFileInitialContextFactory")
    Context context = new InitialContext(hashtable)
    ConnectionFactory factory = (ConnectionFactory) context.lookup(connectionFactoryName)
    Destination queue = (Destination) context.lookup(queueName)
    Connection connection = factory.createConnection(batchEventHubConfig.sasPolicyName, batchEventHubConfig.sasPolicyKey)
    connection.setExceptionListener(new BatchExceptionListener())
    connection.start()
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    MessageConsumer messageConsumer = session.createConsumer(queue)
    messageConsumer.setMessageListener(messageListener)
    messageConsumer
}
So is there a way to track if a connection was closed, and if so re-start the connection again?
Any information to further diagnose this issue will be appreciated as well.

I think I found the issue. I had used "SBCFR" as connectionFactoryName; looking closely at the example in the link, I should have used "SBCF". I also updated the library "org.apache.qpid:qpid-jms-client" from version "0.40.0" to "0.41.0".
Also, in the above code I shouldn't have used AUTO_ACKNOWLEDGE. For the longest time I thought something was wrong because I was never receiving the events in my local setup. It turned out that other machines were also connected to the same consumer group and had already acknowledged the messages.
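On the original question of restarting a closed connection: one possible approach is to have the ExceptionListener that is registered on the connection trigger a reconnect with backoff. Below is a minimal sketch of the retry part only, written without lambdas so it stays Java 7 compatible. The class Reconnector and its parameters are hypothetical; the connect action is assumed to wrap something like initiateConsumer(...) from the question.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a connect action with capped exponential backoff. */
final class Reconnector<C> {
    private final Callable<C> connect;      // assumed to open a fresh connection/consumer
    private final int maxAttempts;
    private final long initialDelayMillis;
    private final long maxDelayMillis;

    Reconnector(Callable<C> connect, int maxAttempts, long initialDelayMillis, long maxDelayMillis) {
        this.connect = connect;
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Calls the connect action until it succeeds or maxAttempts is reached. */
    C connectWithRetry() throws Exception {
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread is being shut down.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay = Math.min(delay * 2, maxDelayMillis);
                }
            }
        }
        throw new IllegalStateException("Could not connect after " + maxAttempts + " attempts", last);
    }
}
```

In the question's setup, connectWithRetry() would be invoked from the listener's onException callback, preferably handed off to a separate thread so the JMS provider's callback thread is not blocked while sleeping between attempts.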

Outbound TCP connection issue caused by sending data to Event Hub and Data Lake from an Azure Function

I'm working on an Azure Function with an HTTP POST trigger. Once a client calls it and posts JSON data, I send the data to Event Hub and save it to Data Lake.
When it is hit by high traffic (20k/hour), the Azure Function generates a high number of outbound TCP connections, which exceeds the limit (1920) of the plan.
Is the high number of outbound TCP connections caused by writing to Event Hub, to Data Lake, or both?
Is there a way to reduce it so I don't have to pay more to upgrade our plan?
How can I debug and troubleshoot the problem?
Here is the code that sends data to Event Hub:
EventHubClient ehc = EventHubClient.CreateFromConnectionString(cn);
try
{
    log.LogInformation($"{CogniPointListener.LogPrefix}Sending {batch.Count} Events: {DateTime.UtcNow}");
    await ehc.SendAsync(batch);
    await ehc.CloseAsync();
}
catch (Exception exception)
{
    log.LogError($"{CogniPointListener.LogPrefix}SendingMessages: {DateTime.UtcNow} > Exception: {exception.Message}");
    throw;
}
Here is the code that sends data to Data Lake:
var creds = new ClientCredential(clientId, clientSecret);
var clientCreds = ApplicationTokenProvider.LoginSilentAsync(tenantId, creds).GetAwaiter().GetResult();
// Create ADLS client object
AdlsClient client = AdlsClient.CreateClient(adlsAccountFQDN, clientCreds);
try
{
    using (var stream = client.CreateFile(fileName, IfExists.Overwrite))
    {
        byte[] textByteArray = Encoding.UTF8.GetBytes(str);
        stream.Write(textByteArray, 0, textByteArray.Length);
    }
    // Debug
    log.LogInformation($"{CogniPointListener.LogPrefix}SaveDataLake saved ");
}
catch (System.Exception caught)
{
    // Environment.NewLine must be interpolated, otherwise it is logged as literal text
    string err = $"{caught.Message}{Environment.NewLine}{caught.StackTrace}{Environment.NewLine}";
    log.LogError(err, $"{CogniPointListener.LogPrefix}SaveDataLake");
    throw;
}
Thanks,

I just raised an issue with Azure SDK https://github.com/Azure/azure-sdk-for-net/issues/26884 reporting the problem of socket exhaustion when using ApplicationTokenProvider.LoginSilentAsync.
The current version 2.4.1 of Microsoft.Rest.ClientRuntime.Azure.Authentication uses the old version 4.3.0 of Microsoft.IdentityModel.Clients.ActiveDirectory that creates a new HttpClientHandler on every call.
Creating an HttpClientHandler on every call is bad. After the HttpClientHandler is disposed, the underlying socket connections remain active for a significant time (in my experience 30+ seconds).
There's a thing called HttpClientFactory that ensures HttpClientHandler is not created frequently. Here's a guide from Microsoft explaining how to use HttpClient and HttpClientHandler properly - Use IHttpClientFactory to implement resilient HTTP requests.
I wish they reviewed their SDKs to ensure they follow their own guidelines.
Possible workaround
Microsoft.IdentityModel.Clients.ActiveDirectory since version 5.0.1-preview supports passing a custom HttpClientFactory.
IHttpClientFactory myHttpClientFactory = new MyHttpClientFactory();
AuthenticationContext authenticationContext = new AuthenticationContext(
    authority: "https://login.microsoftonline.com/common",
    validateAuthority: true,
    tokenCache: <some token cache>,
    httpClientFactory: myHttpClientFactory);
So it should be possible to replicate what ApplicationTokenProvider.LoginSilentAsync does in your codebase to create AuthenticationContext passing your own instance of HttpClientFactory.
The things you might need to do:
Ensure Microsoft.IdentityModel.Clients.ActiveDirectory version 5.0.1-preview or later is added to the project.
Since the code is used in Azure Functions, HttpClientFactory needs to be set up and injected. More info can be found in another StackOverflow answer.
Replace calls to ApplicationTokenProvider.LoginSilentAsync(tenantId, creds) with something like the following (this code is an inlined version of LoginSilentAsync that passes httpClientFactory to AuthenticationContext):
var settings = ActiveDirectoryServiceSettings.Azure;
var audience = settings.TokenAudience.OriginalString;
var context = new AuthenticationContext(settings.AuthenticationEndpoint + domain,
    settings.ValidateAuthority,
    TokenCache.DefaultShared,
    httpClientFactory);
var authenticationProvider = new MemoryApplicationAuthenticationProvider(clientCredential);
var authResult = await authenticationProvider.AuthenticateAsync(clientCredential.ClientId, audience, context).ConfigureAwait(false);
var credentials = new TokenCredentials(
    new ApplicationTokenProvider(context, audience, clientCredential.ClientId, authenticationProvider, authResult),
    authResult.TenantId,
    authResult.UserInfo == null ? null : authResult.UserInfo.DisplayableId);
I really don't like replicating the logic in the workaround, but I don't think there's any other option until it's fixed properly in Microsoft.Rest.ClientRuntime.Azure.Authentication.
Good luck!

The number of TCP connections is limited depending on the plan your functions run on (Consumption or a static plan at any level B/S/P).
For high workloads I prefer to either
A: Use a queue with a separate function and limit the concurrency through the function batch size and other settings
or
B: Use a SemaphoreSlim to control the concurrency of outgoing traffic. (https://learn.microsoft.com/de-de/dotnet/api/system.threading.semaphoreslim?redirectedfrom=MSDN&view=netframework-4.7.2)
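To illustrate option B, here is a minimal sketch of the same idea in Java, using java.util.concurrent.Semaphore as the counterpart of SemaphoreSlim. The class OutboundThrottle and its method names are hypothetical; the point is only that every outbound call must hold a permit, so at most maxConcurrent calls (and therefore connections) are in flight at once.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/** Hypothetical helper: caps the number of outbound calls running at the same time. */
final class OutboundThrottle {
    private final Semaphore permits;

    OutboundThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Blocks until a permit is free, runs the call, and always returns the permit. */
    <T> T run(Callable<T> call) throws Exception {
        permits.acquire();
        try {
            return call.call();
        } finally {
            // Release in finally so a failed call does not leak a permit.
            permits.release();
        }
    }

    /** Number of permits currently free (useful for diagnostics). */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A single OutboundThrottle instance would be shared by all invocations in the process, with maxConcurrent chosen well below the plan's connection limit. This only limits concurrency within one instance; it does not reduce the connections opened by creating a new client per request.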

Azure Redis Cache: "role discovery data is unavailable"

I'm trying to connect to an instance of Azure Redis Cache from my local dev machine. I'm using StackExchange.Redis like so:
var lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
{
    return ConnectionMultiplexer.Connect(
        $"{redisServerUrl},abortConnect=false,ssl=true,password={redisServerKey},connectTimeout=10000,syncTimeout=10000");
});
When lazyConnection is called I get an InvalidOperationException with the message:
"role discovery data is unavailable"
and this one-liner stack trace:
Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.get_CurrentRoleInstance()
Why is the exception thrown and how can I avoid it?

StackExchange.Redis tries to discover the RoleInstance name under the covers if you don't specify a ConfigurationOptions.ClientName value. It is odd that you are getting this error bubbled out to your code because the code in question handles all exceptions and defaults back to returning the Computer name.
I suspect that if you add ",name=XXX" to your connection string, the error will go away because you will avoid that code path.
