Limiting the number of concurrent jobs on Azure Functions queue

I have a Function app in Azure that is triggered when an item is put on a queue. It looks something like this (greatly simplified):
public static async Task Run(string myQueueItem, TraceWriter log)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(Config.APIUri);
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
ApiResponse apiResponse = JsonConvert.DeserializeObject<ApiResponse>(json);
log.Info($"Activity data successfully sent to platform in {apiResponse.elapsed}ms. Tracking number: {apiResponse.tracking}");
}
}
This all works great and runs pretty well. Every time an item is put on the queue, we send the data to some API on our side and log the response. Cool.
The problem happens when there's a big spike in "the thing that generates queue messages" and a lot of items are put on the queue at once. This tends to happen around 1,000 - 1,500 items in a minute. The error log will have something like this:
2017-02-14T01:45:31.692 mscorlib: Exception while executing function: Functions.SendToLimeade. f-SendToLimeade__-1078179529: An error occurred while sending the request. System: Unable to connect to the remote server. System: Only one usage of each socket address (protocol/network address/port) is normally permitted 123.123.123.123:443.
At first, I thought this was an issue with the Azure Function app running out of local sockets, as illustrated here. However, then I noticed the IP address. The IP address 123.123.123.123 (of course changed for this example) is our IP address, the one that the HttpClient is posting to. So, now I'm wondering if it is our servers running out of sockets to handle these requests.
Either way, we have a scaling issue going on here. I'm trying to figure out the best way to solve it.
Some ideas:
1. If it's a local socket limitation, the article above has an example of increasing the local port range using Req.ServicePoint.BindIPEndPointDelegate. This seems promising, but what do you do when you truly need to scale? I don't want this problem coming back in 2 years.
2. If it's a remote limitation, it looks like I can control how many messages the Functions runtime will process at once. There's an interesting article here that says you can set serviceBus.maxConcurrentCalls to 1 and only a single message will be processed at once. Maybe I could set this to a relatively low number. Now, at some point our queue will be filling up faster than we can process them, but at that point the answer is adding more servers on our end.
3. Multiple Azure Functions apps? What happens if I have more than one Azure Functions app and they all trigger on the same queue? Is Azure smart enough to divvy up the work among the Function apps and I could have an army of machines processing my queue, which could be scaled up or down as needed?
4. I've also come across keep-alives. It seems to me if I could somehow keep my socket open as queue messages were flooding in, it could perhaps help greatly. Is this possible, and any tips on how I'd go about doing this?
Any insight on a recommended (scalable!) design for this sort of system would be greatly appreciated!
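The second idea above (capping how many messages are processed at once) is essentially a semaphore around the outgoing calls. The runtime setting does this for you; the TypeScript sketch below (TypeScript being the other language used on this page) only illustrates the mechanism. `runWithLimit`, the item count and the limit are hypothetical, and the `delay` call stands in for the HTTP POST.

```typescript
// Minimal sketch: run an async worker over many items, but never more than
// `limit` at a time. Not the Functions runtime's own implementation.
async function runWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each lane pulls the next unprocessed item until none remain,
  // so at most `limit` worker calls are in flight at any moment.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Usage: process 20 hypothetical queue items, at most 4 concurrently.
let inFlight = 0;
let peak = 0;
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const done = runWithLimit(
  Array.from({ length: 20 }, (_, i) => i),
  4,
  async (item) => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await delay(5); // stands in for the HTTP POST to the API
    inFlight--;
    return item * 2;
  },
);
```

Raising the limit increases throughput until the backend becomes the bottleneck, which is the trade-off the question describes.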

I think the code error is because of: using (var client = new HttpClient())
Quoted from Improper instantiation antipattern:
this technique is not scalable. A new HttpClient object is created for
each user request. Under heavy load, the web server may exhaust the
number of available sockets.

I think I've figured out a solution for this. I've been running these changes for the past 6 hours, and I've had zero socket errors. Before, I would get these errors in large batches every 30 minutes or so.
First, I added a new class to manage the HttpClient.
public static class Connection
{
public static HttpClient Client { get; private set; }
static Connection()
{
Client = new HttpClient();
Client.BaseAddress = new Uri(Config.APIUri);
Client.DefaultRequestHeaders.Add("Connection", "Keep-Alive");
Client.DefaultRequestHeaders.Add("Keep-Alive", "timeout=600");
Client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
}
Now, we have a static instance of HttpClient that we use for every call to the function. From my research, keeping HttpClient instances around for as long as possible is highly recommended, its request methods (GetAsync, PostAsync, SendAsync, etc.) are thread safe, and HttpClient will queue up requests and optimize requests to the same host. Notice I also set the Keep-Alive headers (I think this is the default, but I figured I'd be explicit).
In my function, I just grab the static HttpClient instance like:
var client = Connection.Client;
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
I haven't really done any in-depth analysis of what's happening at the socket level (I'll have to ask our IT guys if they're able to see this traffic on the load balancer), but I'm hoping it just keeps a single socket open to our server and makes a bunch of HTTP calls as the queue items are processed. Anyway, whatever it's doing seems to be working. Maybe someone has some thoughts on how to improve.
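The same "create once, reuse everywhere" idea carries over to other languages. In a Node/TypeScript function app, a value created at module scope survives between invocations for as long as the host process stays warm. A minimal sketch, assuming a hypothetical `ApiClient` class that stands in for a real HTTP client with its own connection pool (the URL is a placeholder):

```typescript
// Hypothetical stand-in for an HTTP client that owns a pool of keep-alive sockets.
class ApiClient {
  static instancesCreated = 0;
  constructor(readonly baseUrl: string) {
    ApiClient.instancesCreated++;
  }
}

// Module-level cache: created on first use, then reused by every invocation
// running in the same host process (the analogue of the static Connection class).
let sharedClient: ApiClient | undefined;

function getClient(): ApiClient {
  if (sharedClient === undefined) {
    sharedClient = new ApiClient("https://api.example.invalid"); // placeholder URL
  }
  return sharedClient;
}

// Simulated invocations: every one of them receives the same instance.
const clientsSeen = new Set<ApiClient>();
for (let invocation = 0; invocation < 1000; invocation++) {
  clientsSeen.add(getClient());
}
```

A cold start or a scale-out creates a new process, so expect one client per instance rather than one in total.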

If you use the Consumption plan instead of Functions on a dedicated web app, your idea #3 (several Function app instances triggered by the same queue) more or less happens out of the box. Functions will detect that you have a large queue of messages and will add instances until the queue length stabilizes.
maxConcurrentCalls only applies per instance, allowing you to limit per-instance concurrency. Basically, your maximum concurrency is maxConcurrentCalls * instanceCount, and your processing rate scales with it.
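To make the formula concrete: the product bounds how many calls can be in flight at once, and the message rate then depends on how long each call takes. A rough TypeScript sketch, assuming a hypothetical average latency per call (the 16 / 5 / 200 ms figures are made up for illustration):

```typescript
// Upper bound on simultaneous calls reaching the backend:
// each instance processes at most maxConcurrentCalls messages at a time.
function maxInFlight(maxConcurrentCalls: number, instanceCount: number): number {
  return maxConcurrentCalls * instanceCount;
}

// Rough ceiling on messages per second, assuming every call takes
// avgLatencyMs on average (in-flight calls divided by seconds per call).
function maxMessagesPerSecond(
  maxConcurrentCalls: number,
  instanceCount: number,
  avgLatencyMs: number,
): number {
  return (maxInFlight(maxConcurrentCalls, instanceCount) * 1000) / avgLatencyMs;
}

// Illustrative numbers only: 16 calls per instance, 5 instances, 200 ms per call.
const inFlightCeiling = maxInFlight(16, 5);           // 80
const rateCeiling = maxMessagesPerSecond(16, 5, 200); // 400 messages/second
```

By that estimate, the 1,000 to 1,500 items per minute from the question (roughly 17 to 25 per second) need far less than the ceiling, so the practical limit is how much concurrency the backend can absorb.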
The only way to control global throughput would be to use Functions on dedicated web apps of the size you choose. Each app will poll the queue and grab work as necessary.
The best scaling solution would be to improve the load balancing on 123.123.123.123 so that it can handle any number of requests from Functions scaling up/down to meet queue pressure.
Keep alive afaik is useful for persistent connections, but function executions aren't viewed as a persistent connection. In the future we are trying to add 'bring your own binding' to Functions, which would allow you to implement connection pooling if you liked.

I know the question was answered long ago, but in the meantime Microsoft has documented the antipattern that you were using.
Improper Instantiation antipattern

Related

How should one properly handle opening and closing of Azure Service Bus/Topic Clients in an Azure Function?

I'm not sure of the proper way to manage the lifespans of the various clients necessary to interact with Azure Service Bus. From my understanding there are three different but similar clients to manage: ServiceBusClient, a Topic/Queue/Subscription Service, and then a Sender of some sort. In my case, it's a TopicService and a Sender. Should I close the sender after every message? After a certain amount of downtime? And the same with all the others? I feel like I should keep the ServiceBusClient open until the function is entirely complete, so that probably carries over to the Topic Client as well. There are just so many ways to skin this one that I'm not sure where to draw the line. I'm pretty sure it's not this extreme:
async function sendMessage(message: SendableMessageInfo) {
let client = createServiceBusClientFromConnectionString(connectionString)
let tClient = createTopicClient(client);
const sender = tClient.createSender();
await sender.send(message); // wait for the send to finish before closing anything
await sender.close();
await tClient.close();
await client.close();
}
But leaving everything open all the time seems like a memory leak waiting to happen. Should I handle this all through error handling? Try-catch, then close everything in a finally block?
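If you do open and close clients per call, the try/finally shape the question asks about looks roughly like this TypeScript sketch. `Closable`, `MessageSender` and the factory functions are hypothetical stand-ins, not the @azure/service-bus API; the point is only that `finally` closes the sender and then the client even when `send` throws. The answers below suggest reusing a long-lived client instead, which avoids this per-message cost.

```typescript
// Hypothetical minimal shapes; the real @azure/service-bus types differ by version.
interface Closable {
  close(): Promise<void>;
}
interface MessageSender extends Closable {
  send(message: string): Promise<void>;
}

// Opens a client and a sender, sends one message, and always closes both
// (sender first, then client), even when send() throws.
async function sendOnce(
  openClient: () => Closable,
  openSender: (client: Closable) => MessageSender,
  message: string,
): Promise<void> {
  const client = openClient();
  try {
    const sender = openSender(client);
    try {
      await sender.send(message);
    } finally {
      await sender.close();
    }
  } finally {
    await client.close();
  }
}

// Usage with fakes that record what happened.
const closeLog: string[] = [];
const fakeClient: Closable = { close: async () => { closeLog.push("client closed"); } };
const failingSender: MessageSender = {
  send: async () => { throw new Error("send failed"); },
  close: async () => { closeLog.push("sender closed"); },
};
const outcome = sendOnce(() => fakeClient, () => failingSender, "hello")
  .then(() => "sent", (err: Error) => err.message);
```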
I could also just use the Azure Function binding, correct me if I'm wrong:
const productChanges: AzureFunction = async function (context: Context, products: product[]): Promise<void> {
context.bindings.product_changes = []
for (let product of products) {
if(product.updated) {
let message = createMessage(product) // your own helper that builds the message
context.bindings.product_changes.push(message)
}
}
// No context.done() here: an async function signals completion by resolving its promise
}
I can't work out from the docs or source which would be better (both in terms of performance and finances) for an extremely high throughput Topic (at surge, ~100,000 requests/sec).
Any advice would be appreciated!
In my opinion, you should either use the Azure Functions output binding or make the client static, rather than creating the client every time. With the binding you don't need to worry about closing the sender at all; with a static client that's fine too. Both solutions perform well, and there is no difference in cost between the two (see the Service Bus pricing page: https://azure.microsoft.com/en-us/pricing/details/service-bus/). Hope this helps.
I know this is a late reply, but I'll try to explain the concepts behind the clients below in case someone lands here looking for answers.
Version 1
ServiceBusClient (maintains the connection)
|_ TopicClient
   |_ Sender (sender link)
Version 7
ServiceBusClient (maintains the connection)
|_ ServiceBusSender (sender link)
In both version 1 and version 7 of the @azure/service-bus SDK, when you use the sendMessages method (or the equivalent send method) for the first time, a connection is created on the ServiceBusClient if there is none yet, and a new sender link is created.
The sender link remains active for a while and is cleared on its own (by the SDK) if there is no activity. Even if it has been closed for inactivity, a subsequent send call, even after a long wait, works just fine, since it creates a new sender link.
Once you're done using the ServiceBusClient, you can close it; all the internal senders and receivers are closed along with it if they have not already been closed individually.
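The lifecycle described above (a link created lazily on first send, dropped after inactivity, transparently recreated on the next send, and closed together with the client) can be modelled in a few lines. This is a hypothetical TypeScript model with an explicit clock and an assumed idle timeout, not the SDK's implementation, which clears idle links on its own timers:

```typescript
// Hypothetical model of a sender link; not the SDK's code.
class SenderLink {
  closed = false;
  constructor(readonly id: number, public lastUsed: number) {}
}

class ModelClient {
  private link: SenderLink | undefined;
  private readonly allLinks: SenderLink[] = [];

  constructor(private readonly idleTimeoutMs: number) {}

  // Reuses the current link, or creates one on first use / after an idle close.
  send(now: number): SenderLink {
    let link = this.link;
    if (link && now - link.lastUsed > this.idleTimeoutMs) {
      link.closed = true; // cleared because it sat idle too long
      link = undefined;
    }
    if (!link) {
      link = new SenderLink(this.allLinks.length + 1, now);
      this.allLinks.push(link);
    }
    link.lastUsed = now;
    this.link = link;
    return link;
  }

  // Closing the client closes every link it created.
  close(): void {
    for (const l of this.allLinks) l.closed = true;
    this.link = undefined;
  }

  get created(): number {
    return this.allLinks.length;
  }
}

const model = new ModelClient(60000);    // assumed 60 s idle timeout
const first = model.send(0);             // first send creates a link
const second = model.send(1000);         // still active: same link reused
const later = model.send(10 * 60000);    // idle too long: a fresh link is created
model.close();
```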
The latest version 7.0.0 of @azure/service-bus has been released recently.
@azure/service-bus - 7.0.0
Samples for 7.0.0
Guide to migrate from @azure/service-bus v1 to v7

Checking connection to Azure Service Bus

I have some code that depends on Azure Service Bus. I've created an endpoint that checks the availability of my Azure Service Bus topic using the following code:
var connectionString = CloudConfigurationManager.GetSetting("servicebusconnectionstring");
var manager = NamespaceManager.CreateFromConnectionString(connectionString);
var sub = manager.GetSubscription("mytopic", "mysubscription");
var count = sub.MessageCount;
This actually works, but I have two questions (since I'm constantly experiencing timeouts using this code).
Question 1: Is there an easier/better way of checking Service Bus connectivity from C#?
Question 2: When using the code above, which instances should I configure as singletons in my IoC container? I suspect that creating all the instances every time I ping this endpoint is causing the timeouts, since I don't see problems in my other endpoints, where I reuse a TopicClient.
Getting MessageCount is potentially an expensive operation, especially if the value is high.
You could run a simple operation like a check whether the topic exists:
var ns = NamespaceManager.CreateFromConnectionString("...");
ns.TopicExists("mytopic");
which will throw an exception (probably MessagingCommunicationException) if communication to Service Bus fails.
It's ok to reuse NamespaceManager between requests, so you can make it singleton. Not sure if that brings any measurable performance benefit though.
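Since the question mentions constant timeouts, it may also help to bound how long the health endpoint waits. A minimal TypeScript sketch, assuming `probe` is any cheap call that round-trips to Service Bus (for example a topic-existence check) and that the 50 ms timeout is purely illustrative:

```typescript
interface HealthResult {
  healthy: boolean;
  reason?: string;
}

// Runs a cheap probe but gives up after timeoutMs, so a hung connection is
// reported as unhealthy instead of hanging the endpoint.
async function checkHealth(
  probe: () => Promise<boolean>,
  timeoutMs: number,
): Promise<HealthResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    const exists = await Promise.race([probe(), timeout]);
    return exists ? { healthy: true } : { healthy: false, reason: "topic not found" };
  } catch (err) {
    return { healthy: false, reason: err instanceof Error ? err.message : String(err) };
  } finally {
    if (timer !== undefined) clearTimeout(timer); // don't leave the timer running
  }
}

// Hypothetical probes: one answers immediately, one takes 200 ms.
const fastProbe = async () => true;
const slowProbe = () => new Promise<boolean>((resolve) => setTimeout(() => resolve(true), 200));
const probeResults = Promise.all([checkHealth(fastProbe, 50), checkHealth(slowProbe, 50)]);
```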

Azure WebJobs getting initialized randomly

We have WebJobs consisting of several methods in a single Functions.cs file. They have Service Bus triggers on topics/queues, so they keep listening to the topic/queue for a BrokeredMessage. As soon as a message arrives, our processing logic does a lot of work. However, we find that sometimes all the WebJobs suddenly get reinitialized. I found a few articles saying that WebJobs do get reinitialized and that this is normal.
But I'm not sure whether that is the only explanation. Can we prevent them from being reinitialized? We call brokeredMessage.Complete as soon as we receive the BrokeredMessage, since we don't want it to be processed again and again.
Also, we have a few WebJobs in one App Service and a few in another App Service, and we find that all of the WebJobs in both App Services get reinitialized at the same time. Why would that be?
You should design your process to be able to deal with occasional disconnects and failures, since this is a "feature" of applications living in the cloud.
Use a transaction to manage the critical area of your code.
Pseudo/commented code below, and a link to the Microsoft documentation is here.
var msg = receiver.Receive();
using (var scope = new TransactionScope()) // requires using System.Transactions;
{
// Do whatever work is required
// Starting with computation and business logic.
// Finishing with any persistence or new message generation,
// giving your application the best chance of success.
// Keep in mind that all BrokeredMessage operations are enrolled in
// the transaction. They will all succeed or fail.
// If you have multiple data stores to update, you can use brokered messages
// to send new individual messages to do the operation on each store,
// giving eventual consistency.
msg.Complete(); // mark the message as done
scope.Complete(); // declare the transaction done
}
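The key ordering in that pseudocode is that the message is completed last, after the work succeeds, rather than as soon as it is received. A minimal TypeScript sketch of that ordering, with a hypothetical `QueueMessage` shape (real SDK message types differ): if the process is recycled mid-work, the message was never completed, so Service Bus can redeliver it once its lock expires, which also means the same message may be processed more than once.

```typescript
// Hypothetical message shape; real BrokeredMessage / SDK message APIs differ.
interface QueueMessage {
  body: string;
  complete(): Promise<void>;
  abandon(): Promise<void>;
}

// Settle the message only after the work has succeeded.
async function handle(msg: QueueMessage, work: (body: string) => Promise<void>): Promise<void> {
  try {
    await work(msg.body);
  } catch (err) {
    await msg.abandon(); // release it so it can be delivered again
    throw err;
  }
  await msg.complete(); // only reached when the work finished without throwing
}

// Fake messages that record how they were settled.
function fakeMessage(body: string, log: string[]): QueueMessage {
  return {
    body,
    complete: async () => { log.push(`complete ${body}`); },
    abandon: async () => { log.push(`abandon ${body}`); },
  };
}

const settleLog: string[] = [];
const settled = handle(fakeMessage("ok", settleLog), async () => {})
  .then(() => handle(fakeMessage("bad", settleLog), async () => { throw new Error("boom"); }))
  .catch((err: Error) => err.message);
```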

Pusher Account over quota

We use Pusher in our application in order to have real-time updates.
Something very strange happens: while Google Analytics says that we have around 200 simultaneous connections, Pusher says that we have 1500.
I would like to monitor Pusher connections in real time but could not find any method to do so. Can somebody help?
Currently there's no way to get realtime stats on the number of connections you currently have open for your app. However, it is something that we're investigating currently.
In terms of why the numbers vary between Pusher and Google Analytics, it's usually down to the fact that Google Analytics uses different methods of tracking whether or not a user is on the site. We're confident that our connection counting is correct, however, that's not to say that there isn't a potentially unexpected reason for your count to be high.
A connection is counted as a WebSocket connection to Pusher. When using the Pusher JavaScript library a new WebSocket connection is created when you create a new Pusher instance.
var pusher = new Pusher('APP_KEY');
Channel subscriptions are created over the existing WebSocket connection (known as multiplexing), and do not count towards your connection quota (there is no limit on the number allowed per connection).
var channel1 = pusher.subscribe('ch1');
var channel2 = pusher.subscribe('ch2');
// All done over a single connection
// more subscriptions
// ...
var channel100 = pusher.subscribe('ch100');
// Still just 1 connection
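The counting rule can be expressed as a tiny model: one connection per client instance, regardless of how many channels it subscribes to. `RealtimeClientModel` is hypothetical and only counts; it does not talk to Pusher:

```typescript
// Hypothetical model: constructing a client opens one connection,
// subscribing multiplexes channels over it.
class RealtimeClientModel {
  static openConnections = 0;
  private readonly channels = new Set<string>();
  private connected = true;

  constructor() {
    RealtimeClientModel.openConnections++; // a new instance means a new socket
  }

  subscribe(channel: string): void {
    this.channels.add(channel); // no extra connection
  }

  get channelCount(): number {
    return this.channels.size;
  }

  disconnect(): void {
    if (this.connected) {
      this.connected = false;
      RealtimeClientModel.openConnections--;
    }
  }
}

const tab1 = new RealtimeClientModel();
for (let i = 1; i <= 100; i++) tab1.subscribe(`ch${i}`);
const afterOneTab = RealtimeClientModel.openConnections;  // 1, despite 100 channels
const tab2 = new RealtimeClientModel();                   // second tab, or a second instance on one page
const afterTwoTabs = RealtimeClientModel.openConnections; // 2
tab2.disconnect();
```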
Common reasons why connections are higher than expected
Users open multiple tabs
If a user has multiple tabs open to the same application, multiple instances of Pusher will be created and therefore multiple connections will be used e.g. 2 tabs open will mean 2 connections are established.
Incorrectly coded applications
As mentioned above, a new connection is created every time a new Pusher object is instantiated. It is therefore possible to create many connections in the same page.
Using an older version of one of our libraries
Our connection strategies have improved over time, and we recommend that you keep up to date with the latest versions.
Specifically, in newer versions of our JS library, we carry out ping-pong requests between server and client to verify that the client is still around.
Other remedies
While our efforts are always to keep a connection going indefinitely to an application, it is possible to disconnect manually if you feel this works in your scenario. It can be achieved by making a call to Pusher.disconnect(). Below is some example code:
var pusher = new Pusher("APP_KEY");
var timeoutId = null;
function startInactivityCheck() {
timeoutId = window.setTimeout(function(){
pusher.disconnect();
}, 5 * 60 * 1000); // called after 5 minutes
};
// called by something that detects user activity
function userActivityDetected(){
if(timeoutId !== null) {
window.clearTimeout(timeoutId);
}
startInactivityCheck();
};
How this disconnection is transmitted to the user is up to you but you may consider prompting them to let them know that they will not receive any further real-time updates due to a long period of inactivity. If they wish to start receiving real-time updates again they should click a button.

.NET 4.5 Increase WCF Client Calls Async?

I have a .NET 4.5 WCF client app that uses the async/await pattern to make large volumes of calls. My development machine is dual-proc with 8 GB RAM (production will be 5 CPUs with 8 GB RAM at Amazon AWS). The remote WCF service called by my code uses out and ref parameters on a web method that I need. My code instantiates a proxy client each time, writes any results to a public ConcurrentDictionary, and then returns null.
I ran Perfmon, watching the thread count on the system, and it goes between 28-30. It takes hours for my client to complete the volumes of calls that are made. Yes, hours. The remote service is backed by a big company, they have many servers to receive my WCF calls, so the more calls I can throw at them, the better.
I think that things are actually still happening synchronously, even though the method that makes the WCF call is decorated with "async" because the proxy method cannot have "await". Is that true?
My code looks like this:
async private void CallMe()
{
Console.WriteLine( DateTime.Now );
var workTasks = this.AnotherConcurrentDict.Select( oneB => GetData( etcetcetc ) ).Cast<Task>().ToList();
await Task.WhenAll( workTasks );
}
private async Task<WorkingBits> GetData(etcetcetc)
{
var commClient = new RemoteClient();
var cpResponse = new GetPackage();
var responseInfo = commClient.GetData( name, password , ref (cpResponse.aproperty), filterid , out cpResponse.Identifiers);
foreach (var onething in cpResponse.Identifiers)
{
// add to the ConcurrentDictionary
}
return null; // I already wrote to the ConcurrentDictionary so no need to return anything
}
responseInfo is not awaitable because the WCF call has ref and out parameters.
I was thinking that a way to speed this up is not to put async/await in this method, but instead to create a wrapper method where I can make things async/await, but I am not sure that is the smartest/safest way to do it.
What is a smart way to get more outbound calls to the service (expand IO completion thread pool, trick calls into running in the background so Task.WhenAll can complete quicker)?
Thanks for all ideas/samples/pointers. I am hitting a bottleneck somewhere.
1) Make sure you're really calling it asynchronously, rather than just blocking on the calls. Code samples would help here.
2) You may need to do this:
ServicePointManager.DefaultConnectionLimit = 100;
By default it only allows 2 simultaneous connections to the same server.
3) Make sure you dispose the proxy object after the call is complete so you're not tying up resources.
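On point 1: the suspicion in the question is correct. A method marked async that never awaits runs its whole body synchronously when called (the C# compiler flags this with warning CS1998), so the blocking proxy calls execute one after another. The same effect is easy to see in a TypeScript sketch, where `blockingCall` busy-waits to stand in for the synchronous proxy call (all names are hypothetical):

```typescript
// Stand-in for the synchronous proxy call with ref/out parameters (hypothetical).
function blockingCall(id: number, trace: string[]): void {
  trace.push(`start ${id}`);
  const until = Date.now() + 20;
  while (Date.now() < until) {
    // busy-wait: nothing else can run on this thread meanwhile
  }
  trace.push(`end ${id}`);
}

// Marked async but never awaits: the body runs to completion during the call,
// so each "task" is already finished before the next one starts.
async function getDataBlocking(id: number, trace: string[]): Promise<void> {
  blockingCall(id, trace);
}

// Genuinely asynchronous: yields at the await while the (simulated) I/O is pending.
async function getDataNonBlocking(id: number, trace: string[]): Promise<void> {
  trace.push(`start ${id}`);
  await new Promise<void>((resolve) => setTimeout(resolve, 20));
  trace.push(`end ${id}`);
}

const blockingTrace: string[] = [];
const nonBlockingTrace: string[] = [];
const bothDone = Promise.all([
  Promise.all([1, 2, 3].map((id) => getDataBlocking(id, blockingTrace))),
  Promise.all([1, 2, 3].map((id) => getDataNonBlocking(id, nonBlockingTrace))),
]).then(() => {
  console.log(blockingTrace.join(", "));    // start 1, end 1, start 2, end 2, start 3, end 3
  console.log(nonBlockingTrace.join(", ")); // start 1, start 2, start 3, end 1, end 2, end 3
});
```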
If you're doing things asynchronously the threadpool size shouldn't be a bottleneck. To get a better idea of what kind of problem you're having, you can use Interlocked.Increment and Interlocked.Decrement to track the number of pending calls and see if it's being limited somewhere.
You could also substitute your real call with a call to a very simple method that you know will not have any bottlenecks, to see if the problem is in the client or server.
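A TypeScript sketch of the counting approach combined with a stub call. JavaScript is single-threaded, so plain `++`/`--` play the role of Interlocked.Increment/Decrement here; `PendingCallTracker` and `stubCall` are hypothetical. If `peak` with the stub reaches the number of calls launched but stays low with the real call, that suggests the limit lies in the real call path (for example the connection limit from point 2) rather than in the loop that launches the calls.

```typescript
// Counts how many calls are pending at once and remembers the highest value seen.
class PendingCallTracker {
  pending = 0;
  peak = 0;

  async track<T>(call: () => Promise<T>): Promise<T> {
    this.pending++;
    this.peak = Math.max(this.peak, this.pending);
    try {
      return await call();
    } finally {
      this.pending--; // runs whether the call succeeds or fails
    }
  }
}

// Trivial stub with no bottleneck of its own, used in place of the real remote call.
const stubCall = () => new Promise<number>((resolve) => setTimeout(() => resolve(42), 10));

const tracker = new PendingCallTracker();
const allCalls = Promise.all(Array.from({ length: 50 }, () => tracker.track(stubCall)));
```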
