Intermittent 501 response from InvokeDeviceMethodAsync - Azure IoT - azure

InvokeDeviceMethodAsync is intermittently (and only recently) returning a status code of 501 within the responses (the response body is null).
I understand this means Not Implemented. However, the method IS implemented - in fact, it's the only method that is. The device is using Microsoft.Azure.Devices.Client (1.32.0-preview-001 since we're also previewing the Device Streams feature).
Setup, device side
This is all called at startup. After this, some invocations succeed, some fail.
var deviceClient = DeviceClient.CreateFromConnectionString(connectionDetails.ConnectionString, TransportType.Mqtt);
await deviceClient.SetMethodHandlerAsync("RedactedMethodName", RedactedMethodHandler, myObj, cancel).ConfigureAwait(true);
Call, server side
var methodInvocation = new CloudToDeviceMethod("RedactedMethodName")
{
    ResponseTimeout = TimeSpan.FromSeconds(60),
    ConnectionTimeout = TimeSpan.FromSeconds(60)
};
var invokeResponse = await _serviceClient.InvokeDeviceMethodAsync(iotHubDeviceId, methodInvocation, CancellationToken.None);
What have I tried?
Check code, method registration
Looking for documentation about 501: can't find any
Looking through the source for the libraries (https://github.com/Azure/azure-iot-sdk-csharp/search?q=501). Just looks like "not implemented", i.e. nothing registered
Turning on Distributed Tracing from the Azure portal, with sampling rate 100%. Waited a long time, but still says "Device is not synchronised with desired settings"
Exploring intellisense on the DeviceClient object. Not much there!
What next?
Well, I'd like to diagnose.
What possible reasons are there for the 501 response?
Are there any diagnostic tools, e.g. logging, that I have access to?

It looks like the method did not respond within the responseTimeoutInSeconds value. For testing purposes (and to see the real response error), try invoking the device method through the REST API.
You should receive one of the HTTP status codes described here.
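As a rough illustration of that REST call, here is a minimal C# sketch. It assumes you already have the hub's host name, the device ID, and a valid SAS token; the class name, the placeholder values, and the api-version string are assumptions for illustration and should be checked against the current IoT Hub REST documentation. The method name is assumed to need no JSON escaping.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class DirectMethodRest
{
    // Builds the POST request for invoking a direct method on a device.
    // hubHostName is e.g. "myhub.azure-devices.net" (hypothetical value).
    public static HttpRequestMessage BuildRequest(
        string hubHostName, string deviceId, string methodName, string sasToken, int timeoutSeconds)
    {
        // The api-version value is an assumption; use the one your hub supports.
        var uri = new Uri(
            $"https://{hubHostName}/twins/{Uri.EscapeDataString(deviceId)}/methods?api-version=2018-06-30");

        // Minimal body with an empty payload.
        string body =
            "{\"methodName\":\"" + methodName + "\"," +
            "\"responseTimeoutInSeconds\":" + timeoutSeconds + "," +
            "\"connectTimeoutInSeconds\":" + timeoutSeconds + "," +
            "\"payload\":{}}";

        var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };

        // The SAS token is sent as-is in the Authorization header.
        request.Headers.TryAddWithoutValidation("Authorization", sasToken);
        return request;
    }

    // Sends the request and returns the raw status code and body for inspection.
    public static async Task<string> InvokeAsync(HttpClient client, HttpRequestMessage request)
    {
        using (HttpResponseMessage response = await client.SendAsync(request).ConfigureAwait(false))
        {
            string responseBody = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            return $"{(int)response.StatusCode}: {responseBody}";
        }
    }
}
```

Calling the hub this way bypasses the service SDK, so the status code and body you see are whatever the hub itself returned.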

Related

Azure Function Timer Trigger & API management - Manual execution returns 404

I have a function app with:
a few functions triggered by a Timer Trigger
and some triggered by the HTTP Trigger.
I have also an Azure API Management service set up for the function app, where the HTTP Triggered functions have their endpoints defined.
I am trying to trigger one of my timer triggered functions manually as per the guide here https://learn.microsoft.com/en-us/azure/azure-functions/functions-manually-run-non-http
I am however getting a 404 result in Postman, despite the seemingly correct URL and x-functions-key.
(The original post showed screenshots of the function, the key, and the request here.)
I also noticed that:
if I don't include the x-functions-key header, then I get 401 Unauthorized result
and if I include an incorrect key, then I get 403 Forbidden.
Could it be related to the API management service being set up for the function app?
How can I troubleshoot this further?
I have managed to solve it.
It turns out that the Azure Functions timer trigger requires a six-part CRON expression (I was only aware of the five-part style).
Without that, it does not work. Sadly, this is not easy to notice in the UI.
I realized this by investigating the Application Insights logs, while the function page showed that everything was fine.
Changing the CRON format fixed the 404 issue, and I started getting a 202 Accepted response.
As a bonus note, I have to add:
Even though the response was 202 Accepted, the triggering didn't work correctly, because my function return type was Task<IActionResult> which is not accepted for timer triggered functions.
Again, only ApplicationInsights showed that anything is wrong:
The 'MonkeyUserRandom' function is in error: Microsoft.Azure.WebJobs.Host: Error indexing method 'MonkeyUserRandom'. Microsoft.Azure.WebJobs.Host: Cannot bind parameter '$return' to type IActionResult&. Make sure the parameter Type is supported by the binding. If you're using binding extensions (e.g. Azure Storage, ServiceBus, Timers, etc.) make sure you've called the registration method for the extension(s) in your startup code (e.g. builder.AddAzureStorage(), builder.AddServiceBus(), builder.AddTimers(), etc.).
That's a bonus tip for anyone who finds that manually triggering a non-HTTP function does not work.
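To make the difference between the two CRON styles concrete: the six-field format puts a seconds field first ({second} {minute} {hour} {day} {month} {day-of-week}). The helper below is a hypothetical convenience, not part of the Functions SDK; it simply prepends a 0 seconds field to a five-field expression.

```csharp
using System;

public static class CronFields
{
    // Counts the whitespace-separated fields in a CRON expression.
    public static int CountFields(string expression) =>
        expression.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

    // Returns a six-field expression: five-field input gets "0" seconds prepended,
    // six-field input is returned unchanged (trimmed), anything else is rejected.
    public static string ToSixField(string expression)
    {
        int count = CountFields(expression);
        if (count == 6) return expression.Trim();
        if (count == 5) return "0 " + expression.Trim();
        throw new ArgumentException($"Expected 5 or 6 fields but found {count}.", nameof(expression));
    }
}
```

For example, `CronFields.ToSixField("*/5 * * * *")` returns `"0 */5 * * * *"`, i.e. every five minutes at second zero.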
I tested it on my side and it works fine. Please refer to the screenshot below:
Please check that you are requesting https://xxx.azurewebsites.net/admin/functions/TimerTrigger1 and not https://xxx.azurewebsites.net/admin/functions/TimerTrigger. Note that the name is "TimerTrigger1".
In my first test I requested ..../TimerTrigger, because the documentation uses QueueTrigger as its example, and it responded with 404.
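For reference, the manual-trigger request can also be built in C# rather than Postman. This is only a sketch: the class and parameter names are hypothetical, the empty JSON object body reflects my reading of the linked guide for functions that take no input, and the key is assumed to be one the admin endpoint accepts (the guide describes using the app's master key).

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class ManualTrigger
{
    // Builds a POST to /admin/functions/{functionName}.
    // functionName must match the deployed function's name exactly
    // (e.g. "TimerTrigger1", not "TimerTrigger").
    public static HttpRequestMessage BuildRequest(string appHostName, string functionName, string functionsKey)
    {
        var uri = new Uri($"https://{appHostName}/admin/functions/{functionName}");
        var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            // An empty JSON object is assumed to be enough for a timer trigger.
            Content = new StringContent("{}", Encoding.UTF8, "application/json")
        };
        request.Headers.TryAddWithoutValidation("x-functions-key", functionsKey);
        return request;
    }
}
```

Sending the request with `HttpClient.SendAsync` and logging the status code makes it easy to distinguish the 401, 403, 404, and 202 cases described above.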

How to catch errors raised in Azure Device SDK?

I am using the Azure Device SDK for .NET Core in order to connect my devices to Azure IoT Hub. From time to time the server rejects some messages (like twin updates or telemetry messages) from the devices and responds with status code 400. As a result there are exceptions thrown on client side but due to its asynchronous nature they are swallowed somewhere inside the Azure SDK and never thrown at my code.
How can I actually be notified about these errors so I can handle and display them?
I can also see from the Azure Device SDK code that it uses some kind of logging (EventSource) but this is never enabled in the code:
From Logging.Common.cs:
Log.IsEnabled() // always returns false
Can you point me to some way where I can 1) actually enable logging in the Azure Device SDK and 2) find the content that was actually logged?
Update: Details regarding exception that is swallowed somewhere
// Fired here after I send twin reported properties to server:
AmqpTransportHandler.VerifyResponseMessage:
if (status >= 400)
{
    throw new InvalidOperationException("Service rejected the message with status: " + status);
}
// Then becomes caught and re-fired here:
AmqpTransportHandler.SendTwinPatchAsync:
throw AmqpClientHelper.ToIotHubClientContract(exception);
// Then it disappears somewhere in the "dance" of the async tasks
You can capture traces: https://github.com/Azure/azure-iot-sdk-csharp/tree/master/tools/CaptureLogs
Our sample demonstrates best practice regarding exception catching, for example: https://github.com/Azure/azure-iot-sdk-csharp/blob/master/iothub/device/samples/DeviceClientMqttSample/Program.cs
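If you would rather see the SDK's EventSource output directly in your own process, one option is a small EventListener. The sketch below uses only the .NET base class library; the source-name prefix "Microsoft-Azure-Devices" is an assumption about how the SDK names its event sources and should be verified against the SDK version you use. Once a listener has enabled a source, checks such as Log.IsEnabled() should no longer return false for it.

```csharp
using System;
using System.Diagnostics.Tracing;

public sealed class SdkEventListener : EventListener
{
    // Assumed prefix of the SDK's EventSource names; verify for your SDK version.
    private const string SourcePrefix = "Microsoft-Azure-Devices";

    // Called for every EventSource in the process, including ones created later.
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        base.OnEventSourceCreated(eventSource);
        if (eventSource.Name.StartsWith(SourcePrefix, StringComparison.Ordinal))
        {
            EnableEvents(eventSource, EventLevel.Verbose);
        }
    }

    // Writes each received event to the console; swap in your own logger here.
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        string payload = eventData.Payload == null
            ? string.Empty
            : string.Join(", ", eventData.Payload);
        Console.WriteLine(
            $"{eventData.EventSource.Name} [{eventData.Level}] {eventData.EventName}: {payload}");
    }
}
```

Create one instance at startup (for example `var listener = new SdkEventListener();`) and keep it alive for as long as you want the output; disposing it stops the logging.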

Limiting the number of concurrent jobs on Azure Functions queue

I have a Function app in Azure that is triggered when an item is put on a queue. It looks something like this (greatly simplified):
public static async Task Run(string myQueueItem, TraceWriter log)
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri(Config.APIUri);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
        HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
        response.EnsureSuccessStatusCode();
        string json = await response.Content.ReadAsStringAsync();
        ApiResponse apiResponse = JsonConvert.DeserializeObject<ApiResponse>(json);
        log.Info($"Activity data successfully sent to platform in {apiResponse.elapsed}ms. Tracking number: {apiResponse.tracking}");
    }
}
This all works great and runs pretty well. Every time an item is put on the queue, we send the data to some API on our side and log the response. Cool.
The problem happens when there's a big spike in "the thing that generates queue messages" and a lot of items are put on the queue at once. This tends to happen around 1,000 - 1,500 items in a minute. The error log will have something like this:
2017-02-14T01:45:31.692 mscorlib: Exception while executing function:
Functions.SendToLimeade. f-SendToLimeade__-1078179529: An error
occurred while sending the request. System: Unable to connect to the
remote server. System: Only one usage of each socket address
(protocol/network address/port) is normally permitted
123.123.123.123:443.
At first, I thought this was an issue with the Azure Function app running out of local sockets, as illustrated here. However, then I noticed the IP address. The IP address 123.123.123.123 (of course changed for this example) is our IP address, the one that the HttpClient is posting to. So, now I'm wondering if it is our servers running out of sockets to handle these requests.
Either way, we have a scaling issue going on here. I'm trying to figure out the best way to solve it.
Some ideas:
If it's a local socket limitation, the article above has an example of increasing the local port range using Req.ServicePoint.BindIPEndPointDelegate. This seems promising, but what do you do when you truly need to scale? I don't want this problem coming back in 2 years.
If it's a remote limitation, it looks like I can control how many messages the Functions runtime will process at once. There's an interesting article here that says you can set serviceBus.maxConcurrentCalls to 1 and only a single message will be processed at once. Maybe I could set this to a relatively low number. Now, at some point our queue will be filling up faster than we can process them, but at that point the answer is adding more servers on our end.
Multiple Azure Functions apps? What happens if I have more than one Azure Functions app and they all trigger on the same queue? Is Azure smart enough to divvy up the work among the Function apps and I could have an army of machines processing my queue, which could be scaled up or down as needed?
I've also come across keep-alives. It seems to me if I could somehow keep my socket open as queue messages were flooding in, it could perhaps help greatly. Is this possible, and any tips on how I'd go about doing this?
Any insight on a recommended (scalable!) design for this sort of system would be greatly appreciated!
I think the code error is because of: using (var client = new HttpClient())
Quoted from Improper instantiation antipattern:
this technique is not scalable. A new HttpClient object is created for
each user request. Under heavy load, the web server may exhaust the
number of available sockets.
I think I've figured out a solution for this. I've been running these changes for the past 6 hours, and I've had zero socket errors. Before, I would get these errors in large batches every 30 minutes or so.
First, I added a new class to manage the HttpClient.
public static class Connection
{
    public static HttpClient Client { get; private set; }

    static Connection()
    {
        Client = new HttpClient();
        Client.BaseAddress = new Uri(Config.APIUri);
        Client.DefaultRequestHeaders.Add("Connection", "Keep-Alive");
        Client.DefaultRequestHeaders.Add("Keep-Alive", "timeout=600");
        Client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    }
}
Now we have a static instance of HttpClient that we use for every call to the function. From my research, keeping HttpClient instances around for as long as possible is highly recommended, everything is thread safe, and HttpClient will queue up requests and optimize requests to the same host. Notice I also set the Keep-Alive headers (I think this is the default, but I figured I'd be explicit).
In my function, I just grab the static HttpClient instance like:
var client = Connection.Client;
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
I haven't really done any in-depth analysis of what's happening at the socket level (I'll have to ask our IT guys if they're able to see this traffic on the load balancer), but I'm hoping it just keeps a single socket open to our server and makes a bunch of HTTP calls as the queue items are processed. Anyway, whatever it's doing seems to be working. Maybe someone has some thoughts on how to improve.
If you use consumption plan instead of Functions on a dedicated web app, #3 more or less occurs out of the box. Functions will detect that you have a large queue of messages and will add instances until queue length stabilizes.
maxConcurrentCalls only applies per instance, allowing you to limit per-instance concurrency. Basically, your processing rate is maxConcurrentCalls * instanceCount.
The only way to control global throughput would be to use Functions on dedicated web apps of the size you choose. Each app will poll the queue and grab work as necessary.
The best scaling solution would improve the load balancing on 123.123.123.123 so that it can handle any number of requests from Functions scaling up/down to meet queue pressure.
Keep alive afaik is useful for persistent connections, but function executions aren't viewed as a persistent connection. In the future we are trying to add 'bring your own binding' to Functions, which would allow you to implement connection pooling if you liked.
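Separately from the host-level maxConcurrentCalls setting, per-instance concurrency toward the downstream API can also be capped inside the function code. This is a different technique from the one described above, not something the Functions runtime provides: a plain SemaphoreSlim gate shared by all executions in one process. The class name and the limit are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ConcurrencyLimiter
{
    private readonly SemaphoreSlim _gate;

    // maxConcurrent is the number of operations allowed to run at once in this process.
    public ConcurrencyLimiter(int maxConcurrent)
    {
        _gate = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Waits for a free slot, runs the work, and always releases the slot afterwards.
    public async Task<T> RunAsync<T>(Func<Task<T>> work)
    {
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await work().ConfigureAwait(false);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

A function could hold one static limiter and wrap its outbound call, for example `await limiter.RunAsync(() => client.PostAsync("/api/devices/data", httpContent))`. Like maxConcurrentCalls, this only limits a single instance, so the overall rate still scales with the instance count.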
I know the question was answered long ago, but in the meantime Microsoft have documented the anti-pattern that you were using.
Improper Instantiation antipattern

Rebus - Send delayed message to another queue (Azure ServiceBus)

I have a website and a webjob, where the website is a one-way client and the webjob is the worker.
I use the Azure ServiceBus transport for the queue.
I get the following error:
InvalidOperationException: Cannot use ourselves as timeout manager
because we're a one-way client
when I try to send Bus.Defer from the website bus.
Since Azure ServiceBus has built-in support for a timeout manager, shouldn't this work even from a one-way client?
The documentation on Bus.Defer says: "Defers the delivery of the message by attaching a header to it and delivering it to the configured timeout manager endpoint
(defaults to be ourselves). When the time is right, the deferred message is returned to the address indicated by the header."
Could I fix this by setting the ReturnAddress like this:
headers.Add(Rebus.Messages.Headers.ReturnAddress, "webjob-worker");
Could I fix this by setting the ReturnAddress like this: headers.Add(Rebus.Messages.Headers.ReturnAddress, "webjob-worker");
Yes :)
The problem is this: When you await bus.Defer a message with Rebus, it defaults to return the message to the input queue of the sender.
When you're a one-way client, you don't have an input queue, and thus there is no way for you to receive the message after the timeout has elapsed.
Setting the return address fixes this, although I admit the solution does not exactly reek of elegance. A nicer API would be if Rebus had a Defer method on its routing API, which could be called like this:
var routingApi = bus.Advanced.Routing;
await routingApi.Defer(recipient, TimeSpan.FromSeconds(10), message);
but unfortunately it does not have that method at the moment.
To sum it up: Yes, setting the return address explicitly on the deferred message makes a one-way client capable of deferring messages.

Which status codes should I expect when using Azure Table Storage

I want to do something when/if an insert operation on Azure Table Storage fails. Assume that I want to return false from the below code when I receive an error. _table is of type CloudTable and the code below works.
public bool InsertEntity(TableEntity entity)
{
    var insertOperation = TableOperation.Insert(entity);
    var result = _table.Execute(insertOperation);
    return (result.HttpStatusCode == (int)System.Net.HttpStatusCode.OK);
}
I get the result 203 when the operation succeeds. But there are other possible results like "200 OK".
How can I write a piece of code that will allow me to understand from the status code that something went wrong?
Using the .NET SDK, any situation that needs to be handled will throw an exception, i.e. any status code that is not 2xx will cause an exception.
To handle situations where something went wrong, I don't have to manually check the status code of the result for every request. All I have to do is to write exception handling code. Like below:
try
{
    var result = _table.Execute(insertOperation);
}
catch (Exception)
{
    Log("Something went wrong in table operation.");
}
From this page:
REST API operations for Azure storage services return standard HTTP
status codes, as defined in the HTTP/1.1 Status Code Definitions.
So every successful operation against table service will return 2XX status code. To find out about the exact code returned, I would recommend checking out each operation on the REST API Documentation page. For example, Create Table operation returns 201 status code if the operation is successful.
Similarly, for errors in table service you will get error code in 400 range (that would mean you provided incorrect data e.g. 409 (Conflict) error if you're trying to create a table which already exists) or in 500 range (for example, table service is unavailable). You can find the list of all Table Service Error Codes here: https://msdn.microsoft.com/en-us/library/azure/dd179438.aspx.
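The ranges described above can be captured in a small helper, so that code like InsertEntity does not have to compare against one specific success code. This is a sketch with hypothetical names, not part of the Storage SDK.

```csharp
public enum StatusClass
{
    Success,      // 2xx
    ClientError,  // 4xx, e.g. 409 Conflict
    ServerError,  // 5xx, e.g. service unavailable
    Other         // anything else (1xx, 3xx, out of range)
}

public static class HttpStatusHelper
{
    // Maps a numeric HTTP status code to its broad class.
    public static StatusClass Classify(int statusCode)
    {
        if (statusCode >= 200 && statusCode < 300) return StatusClass.Success;
        if (statusCode >= 400 && statusCode < 500) return StatusClass.ClientError;
        if (statusCode >= 500 && statusCode < 600) return StatusClass.ServerError;
        return StatusClass.Other;
    }

    // True for any 2xx code.
    public static bool IsSuccess(int statusCode) =>
        Classify(statusCode) == StatusClass.Success;
}
```

With this in place, the last line of InsertEntity could become `return HttpStatusHelper.IsSuccess(result.HttpStatusCode);`, which accepts any 2xx result rather than only 200.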
Basically, any return in 2xx is "OK". In this example:
https://msdn.microsoft.com/en-us/library/system.net.httpstatuscode%28v=vs.110%29.aspx
203 Non-Authoritative Information:
Indicates that the returned metainformation is from a cached copy
instead of the
origin server and therefore may be incorrect.
This Azure white paper elaborates further:
http://go.microsoft.com/fwlink/?LinkId=153401
9.6.5 Error handling and reporting
The REST API is designed to look like a standard HTTP server interacting with existing HTTP clients
(e.g., browsers, HTTP client libraries, proxies, caches, and so on).
To ensure the HTTP clients handle errors properly, we map each Windows
Azure Table error to an HTTP status code.
HTTP status codes are less expressive than Windows Azure Table error
codes and contain less information about the error. Although the HTTP
status codes contain less information about the error, clients that
understand HTTP will usually handle the error correctly.
Therefore, when handling errors or reporting Windows Azure Table
errors to end users, use the Windows Azure Table error code along with
the HTTP status code as it contains more information about the error.
Additionally, when debugging your application, you should also consult
the human readable element of the XML error
response.
These links are also useful:
Microsoft Azure: Status and Error Codes
Clean way to catch errors from Azure Table (other than string match?)
If you are using Azure Storage SDK accessing Azure Table Storage, the SDK would throw a StorageException on the client side for unexpected Http Status Codes returned from the table storage service. To extract the actual HttpStatusCode you would need to wrap your code in a try {} catch(StorageException ex){} block. And then parse the actual exception object to extract the HttpStatusCode embedded in it.
Have a look at Azure Storage Exception parser I implemented in Nuget:
https://www.nuget.org/packages/AzureStorageExceptionParser/
This extracts the HttpStatusCode and many other useful fields from Azure StorageExceptions. You can use the same library across table, blob, queue clients etc., as they all follow the same StorageException pattern.
Note that some exceptions thrown by the Azure Storage SDK are not StorageExceptions. Those are mostly client-side request validation exceptions, and naturally they do not contain any HttpStatusCode. (Hence you need a catch specifically for StorageException to extract HttpStatusCodes.)
As a separate note, the Azure Storage SDK has a fairly robust retry mechanism for failed requests. Below is the snippet from the SDK source code where it decides whether the failed response is retryable or not.
https://github.com/Azure/azure-storage-net/blob/master/Lib/Common/RetryPolicies/ExponentialRetry.cs
if ((statusCode >= 300 && statusCode < 500 && statusCode != 408)
    || statusCode == 501 // Not Implemented
    || statusCode == 505 // Version Not Supported
    || lastException.Message == SR.BlobTypeMismatch)
{
    return false; // Do not retry if we get here; otherwise retry while within the max retry count.
}
