Azure Socket Leaks?

I have an ASP.NET Core website with a lot of simultaneous users, and it crashes many times during the day; I have scaled up and out, but no luck.
I have been told by numerous Azure support staff that the issue is that I'm sending out a lot of database calls, although database utilization improved after creating indexes. Can you kindly advise what you think the problem is, as I have done my best...
I was told that I have "socket leaks".
Please note:
I don't have any external service calls except to SendGrid.
I have not used ConfigureAwait(false).
I'm not using "using" statements or explicitly disposing contexts.
This is my connection string, in case it helps:
Server=tcp:sarahah.database.windows.net,1433;Initial Catalog=SarahahDb;Persist Security Info=False;User ID=********;Password=******;MultipleActiveResultSets=True;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;Max Pool Size=400;
These are some code examples:
In Startup.cs:
services.AddDbContext<ApplicationDbContext>(options =>
    options.UseSqlServer(Configuration.GetConnectionString("DefaultConnection")));
Main class:
private readonly ApplicationDbContext _context;
public MessagesController(ApplicationDbContext context, IEmailSender emailSender, UserManager<ApplicationUser> userManager)
{
    _context = context;
    _emailSender = emailSender;
    _userManager = userManager;
}
This is one important method, for example:
string UserId = _userManager.GetUserId(User);
var user = await _context.Users.Where(u => u.Id.Equals(UserId)).Include(u => u.Messages).FirstOrDefaultAsync();
// some other code
return View(user.Messages);
Please advise, as I have tried my best, but this is very embarrassing for me in front of my customers.

Without the error messages that you're seeing, here are a few ideas that you can check.
I'd start with going to your Web App's Overview blade in the Azure Portal. Update the monitoring graph to a time period when you're experiencing problems. Are you CPU bound? Have you exhausted memory? Also, check the HTTP Queue length. If your HTTP queue is really long, it's because your server is choking trying to service the requests and users are experiencing timeout issues.
Next, jump over to your SQL Server's Overview blade in the Azure Portal, and look at the resource utilization chart. Set the time period on the chart to when you're experiencing problems. Have you pegged out your DTUs for your database? If so, it's a sign of poor indexing, poor schema design, or you're just undersized and need to scale up.
Turn on Application Insights if you haven't already. You can use the Application Insights API to insert your own trace statements into your code. Or, you might be able to see the exceptions causing the issue without having to do your own tracing.
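For example, here is a minimal sketch of custom tracing with the Application Insights API. It assumes services.AddApplicationInsightsTelemetry() is registered so TelemetryClient can be injected; the MessageService class and its method are hypothetical placeholders.
using System;
using Microsoft.ApplicationInsights;

// Hypothetical service class; the same pattern works in a controller.
public class MessageService
{
    private readonly TelemetryClient _telemetry;

    public MessageService(TelemetryClient telemetry)
    {
        _telemetry = telemetry;
    }

    public void Process(string userId)
    {
        // Shows up under Traces in Application Insights.
        _telemetry.TrackTrace($"Processing messages for user {userId}");
        try
        {
            // ... the work you want to instrument ...
        }
        catch (Exception ex)
        {
            // Shows up under Failures / Exceptions.
            _telemetry.TrackException(ex);
            throw;
        }
    }
}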
Check the Kudu logs for your Web Apps.
I agree with Tseng - your usage of EF and .NET Core's DI framework looks correct.
Let us know how the troubleshooting goes and provide additional information on exactly what kind of errors you're seeing. Best of luck!

It looks like a DI issue to me. You are injecting ApplicationDbContext context, which means the ApplicationDbContext will be resolved from the DI container and stay open for the entire request, as Tseng pointed out. It should be scoped.
You can inject IServiceScopeFactory scopeFactory in your controller and do something like:
using (var scope = _scopeFactory.CreateScope())
{
    // The context resolved from this scope is disposed as soon as the scope ends.
    var context = scope.ServiceProvider.GetRequiredService<ApplicationDbContext>();
}
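Putting it together, a rough sketch of what that might look like in the controller from the question (the action name and query are illustrative, reusing the entities shown above):
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

public class MessagesController : Controller
{
    private readonly IServiceScopeFactory _scopeFactory;

    public MessagesController(IServiceScopeFactory scopeFactory)
    {
        _scopeFactory = scopeFactory;
    }

    public async Task<IActionResult> Index(string userId)
    {
        using (var scope = _scopeFactory.CreateScope())
        {
            // Resolve a short-lived context; it is disposed when the scope ends.
            var context = scope.ServiceProvider.GetRequiredService<ApplicationDbContext>();
            var user = await context.Users
                .Include(u => u.Messages)
                .FirstOrDefaultAsync(u => u.Id.Equals(userId));
            return View(user.Messages);
        }
    }
}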
Note that if you are using ASP.NET Core 1.1 and want to be sure that all your services are being resolved correctly, change your ConfigureServices method in Startup to:
public IServiceProvider ConfigureServices(IServiceCollection services)
{
    // Register services
    return services.BuildServiceProvider(validateScopes: true);
}

Related

No exceptions or stack traces in Azure Application Insights

I have an ASP.NET Core 3.1 solution deployed into an Azure Web App hooked up to Application Insights. I can't for the life of me get exceptions and stack traces to log to Application Insights; instead I get a basic request trace with no exception information attached:
I've tried most combinations of setting up logging and Application Insights telemetry; here are some of the things I've tried:
services.AddApplicationInsightsTelemetry(); in the ConfigureServices() method of Startup.cs
Adding logging.AddApplicationInsights(); to my logging builder in Program.cs
Removing the custom error page exception handler in case that was affecting things
I have the APPINSIGHTS_INSTRUMENTATIONKEY environment variable set on my Web App in Azure.
I'm using the following code to generate exceptions in Application Insights:
[AllowAnonymous]
[Route("autoupdate")]
public async Task<IActionResult> ProfileWebhook()
{
    var formData = await this.Request.ReadFormAsync();
    var config = TelemetryConfiguration.CreateDefault();
    var client = new TelemetryClient(config);
    client.TrackException(new Exception(string.Join("~", formData.Keys)));
    logger.LogError(new Exception(string.Join("~", formData.Keys)), "Fail");
    throw new Exception(string.Join("~", formData.Keys));
}
Nothing is working and I'm going crazy! Any help greatly appreciated.
Usually, Application Insights guarantees that all kinds of telemetry (exceptions, traces, events, etc.) will arrive within about 5 minutes; please refer to this doc: How long does it take for telemetry to be collected?. But there is still a chance that it will take longer due to a backend issue (a very small chance).
If you're using Visual Studio, you can check whether the telemetry is sent via Application Insights Search.
You can also check whether you're using the correct instrumentation key, or whether you have sampling enabled.
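If sampling turns out to be the cause, here is a minimal sketch of disabling adaptive sampling when registering the telemetry in Startup.ConfigureServices (EnableAdaptiveSampling is an option on ApplicationInsightsServiceOptions; adjust to your own setup):
using Microsoft.ApplicationInsights.AspNetCore.Extensions;
using Microsoft.Extensions.DependencyInjection;

public void ConfigureServices(IServiceCollection services)
{
    // Keep every exception/trace instead of letting adaptive sampling drop items.
    var aiOptions = new ApplicationInsightsServiceOptions
    {
        EnableAdaptiveSampling = false
    };
    services.AddApplicationInsightsTelemetry(aiOptions);
}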
But if this behavior persists on your side, you should consider contacting Microsoft support to find the root cause.
Hope it helps.

Azure Functions missing rows in AppInsights

I need to log information from all invocations, successful or not, of a function app in Azure. I first tried just using log.LogInformation() and found that messages were not being written for all function invocations. Doing some research, I came to understand that in high-load scenarios (mine is a high-load scenario), the runtime sometimes decides not to log some of the successful invocations. Fair enough.
I then tried using custom events to do logging and capture the info I needed:
TelemetryConfiguration config = TelemetryConfiguration.CreateDefault();
TelemetryClient tc = new TelemetryClient(config);
Dictionary<string, string> props = new Dictionary<string, string>();
props["msgid"] = msgid;
tc.TrackEvent("MsgToBenefitsService", props);
Still no luck: in some runs I did, I saw only 82 rows in App Insights from 1000 invocations. I haven't been able to find any documentation saying that custom events might not be logged, so I expected to see 1000 events logged for 1000 invocations.
Is there anything wrong with the logging code above? And are there any options to guarantee that I can write information from an invocation to App Insights? Or am I stuck with having to explicitly log myself from the function app?
As background, this function app has a service bus trigger to read messages off a topic. I'm using v3 of the runtime.
Any help would be appreciated.
Thanks.
Please disable sampling in host.json (note that for the v2/v3 runtime the setting is nested under the logging section):
{
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": false
      }
    }
  }
}
applicationInsights.samplingSettings
Sampling in Application Insights

How to set Infinite Timeout for Azure Function app v2.0

I have a very long-running process hosted in an Azure Function App (though it's not recommended for long-running processes) targeting v2.0. Earlier it was targeting the v1.0 runtime, so I didn't face any function timeout issue.
But now, after updating the runtime to target v2.0, I am not able to find any way to set the function timeout to infinite, as was possible with v1.0.
Can someone please help me out on this ?
From your comments it looks like breaking the work up into smaller functions or using something other than Functions isn't an option for you currently. In that case, AFAIK you can still do it with v2.0 as long as you're ready to use an App Service plan.
The max limit of 10 minutes only applies to the Consumption plan.
In fact, the documentation explicitly suggests that if you have functions that run continuously or near continuously, an App Service plan can be more cost-effective as well.
You can use the "Always On" setting. Read about it in the Microsoft Docs here.
Azure Functions scale and hosting
Also, the documentation clearly states that the default timeout value with an App Service plan is 30 minutes, but it can be set to unlimited manually.
Changes in features and functionality
UPDATE
From our discussion in the comments, since a null value isn't working for you like it did in version 1.x, please try taking out the "functionTimeout" setting completely.
I came across 2 different SO posts mentioning something similar, and the Microsoft documentation text also says there is no real limit. Here are the links to the SO posts I came across:
SO Post 1
SO Post 2
One way of doing this is to implement eternal orchestrations with Durable Functions. They allow you to implement an infinite loop with dynamic intervals. Of course, you need to modify your code slightly by adding support for stopping/starting the function at any time (you must pass the state between calls).
[FunctionName("Long_Running_Process")]
public static async Task Run(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    var initialState = context.GetInput<object>();
    var state = await context.CallActivityAsync<object>("Run_Long_Running_Process", initialState);
    if (state == ???) // stop execution when the long-running process is completed
    {
        return;
    }
    context.ContinueAsNew(state);
}
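For completeness, a rough sketch of a client function that starts the eternal orchestration above (the starter's name and HTTP trigger are hypothetical; it assumes the same Durable Functions 1.x types used in the orchestrator):
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class LongRunningProcessStarter
{
    [FunctionName("Long_Running_Process_HttpStart")]
    public static async Task<HttpResponseMessage> HttpStart(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [OrchestrationClient] DurableOrchestrationClient starter)
    {
        // Kick off the eternal orchestration with whatever initial state you need.
        string instanceId = await starter.StartNewAsync("Long_Running_Process", (object)null);
        // Returns status-query / terminate URLs so the instance can be managed later.
        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}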
You cannot set an Azure Function App timeout to infinite. I believe the longest any Azure Function app will consistently run is 10 minutes. As you stated, Azure Functions are not meant for long-running processes. You may need to find a new solution for your app, especially if you will need to scale the app up at all in the future.

Limiting the number of concurrent jobs on Azure Functions queue

I have a Function app in Azure that is triggered when an item is put on a queue. It looks something like this (greatly simplified):
public static async Task Run(string myQueueItem, TraceWriter log)
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri(Config.APIUri);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
        HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
        response.EnsureSuccessStatusCode();
        string json = await response.Content.ReadAsStringAsync();
        ApiResponse apiResponse = JsonConvert.DeserializeObject<ApiResponse>(json);
        log.Info($"Activity data successfully sent to platform in {apiResponse.elapsed}ms. Tracking number: {apiResponse.tracking}");
    }
}
This all works great and runs pretty well. Every time an item is put on the queue, we send the data to some API on our side and log the response. Cool.
The problem happens when there's a big spike in "the thing that generates queue messages" and a lot of items are put on the queue at once. This tends to happen around 1,000 - 1,500 items in a minute. The error log will have something like this:
2017-02-14T01:45:31.692 mscorlib: Exception while executing function:
Functions.SendToLimeade. f-SendToLimeade__-1078179529: An error
occurred while sending the request. System: Unable to connect to the
remote server. System: Only one usage of each socket address
(protocol/network address/port) is normally permitted
123.123.123.123:443.
At first, I thought this was an issue with the Azure Function app running out of local sockets, as illustrated here. However, then I noticed the IP address. The IP address 123.123.123.123 (of course changed for this example) is our IP address, the one that the HttpClient is posting to. So, now I'm wondering if it is our servers running out of sockets to handle these requests.
Either way, we have a scaling issue going on here. I'm trying to figure out the best way to solve it.
Some ideas:
If it's a local socket limitation, the article above has an example of increasing the local port range using Req.ServicePoint.BindIPEndPointDelegate. This seems promising, but what do you do when you truly need to scale? I don't want this problem coming back in 2 years.
If it's a remote limitation, it looks like I can control how many messages the Functions runtime will process at once. There's an interesting article here that says you can set serviceBus.maxConcurrentCalls to 1 and only a single message will be processed at once. Maybe I could set this to a relatively low number. Now, at some point our queue will be filling up faster than we can process them, but at that point the answer is adding more servers on our end.
Multiple Azure Functions apps? What happens if I have more than one Azure Functions app and they all trigger on the same queue? Is Azure smart enough to divvy up the work among the Function apps and I could have an army of machines processing my queue, which could be scaled up or down as needed?
I've also come across keep-alives. It seems to me if I could somehow keep my socket open as queue messages were flooding in, it could perhaps help greatly. Is this possible, and any tips on how I'd go about doing this?
Any insight on a recommended (scalable!) design for this sort of system would be greatly appreciated!
I think the error is caused by: using (var client = new HttpClient())
Quoted from Improper instantiation antipattern:
this technique is not scalable. A new HttpClient object is created for
each user request. Under heavy load, the web server may exhaust the
number of available sockets.
I think I've figured out a solution for this. I've been running these changes for the past 6 hours, and I've had zero socket errors. Before, I would get these errors in large batches every 30 minutes or so.
First, I added a new class to manage the HttpClient.
public static class Connection
{
    public static HttpClient Client { get; private set; }

    static Connection()
    {
        Client = new HttpClient();
        Client.BaseAddress = new Uri(Config.APIUri);
        Client.DefaultRequestHeaders.Add("Connection", "Keep-Alive");
        Client.DefaultRequestHeaders.Add("Keep-Alive", "timeout=600");
        Client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    }
}
Now we have a static instance of HttpClient that we use for every call to the function. From my research, keeping HttpClient instances around for as long as possible is highly recommended, everything is thread-safe, and HttpClient will queue up and optimize requests to the same host. Notice I also set the Keep-Alive headers (I think this is the default, but I figured I'd be explicit).
In my function, I just grab the static HttpClient instance like:
var client = Connection.Client;
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
I haven't really done any in-depth analysis of what's happening at the socket level (I'll have to ask our IT guys if they're able to see this traffic on the load balancer), but I'm hoping it just keeps a single socket open to our server and makes a bunch of HTTP calls as the queue items are processed. Anyway, whatever it's doing seems to be working. Maybe someone has some thoughts on how to improve.
If you use the Consumption plan instead of Functions on a dedicated web app, #3 more or less occurs out of the box. Functions will detect that you have a large queue of messages and will add instances until the queue length stabilizes.
maxConcurrentCalls only applies per instance, allowing you to limit per-instance concurrency. Basically, your processing rate is maxConcurrentCalls * instanceCount.
The only way to control global throughput would be to use Functions on dedicated web apps of the size you choose. Each app will poll the queue and grab work as necessary.
The best scaling solution would improve the load balancing on 123.123.123.123 so that it can handle any number of requests from Functions scaling up/down to meet queue pressure.
Keep-alive, AFAIK, is useful for persistent connections, but function executions aren't viewed as a persistent connection. In the future we are trying to add "bring your own binding" to Functions, which would allow you to implement connection pooling if you liked.
I know the question was answered long ago, but in the meantime Microsoft has documented the anti-pattern that you were using.
Improper Instantiation antipattern

ServiceStack RedisMqServer not always handling messages published from separate application

Context
I have a RedisMqServer configured to handle a single message on my ServiceStack web service. The messages on that MQ originate from another application and show up in the .inq with all the correct properties. Everything is on 4.0.38.
My configuration in MyAppHost.cs:
public override void Configure(Container container)
{
    var redisFactory = new PooledRedisClientManager(0, "etc:etc");
    redisFactory.ConnectTimeout = 5;
    redisFactory.IdleTimeOutSecs = 30;
    redisFactory.PoolTimeout = 3;
    container.Register<IRedisClientsManager>(redisFactory);

    //Plugins, Filters, other Registrations omitted

    var mqHost = new RedisMqServer(redisFactory, retryCount: 2);
    mqHost.DisablePublishingResponses = true;
    mqHost.RegisterHandler<CreateVisitor>(ServiceController.ExecuteMessage);
    mqHost.Start();
}
And then in Global.asax.cs:
void Application_Start(object sender, EventArgs e)
{
    new MyAppHost().Init();
}
Problem
The messages are not consistently handled when I deploy this elsewhere. They wait in the .inq until whenever. Nothing is lost, just delayed for an indeterminate duration.
As of this moment, the only things that come to mind are:
I'm using IIS Express locally, and the server is using IIS.
Application_Start needs to happen before it can handle messages.
I've tried initializing the service by making other API calls over HTTP, before and after queuing messages, with more failure than success. Sometimes the service starts to handle them, but I am unable to identify and thus influence when this happens.
Note
I do have several other console applications and windows services that listen on other MQs and handle messages placed by other applications, and those have always worked flawlessly. This is the first time I've tried this from within an existing web service, however.
It's hard to know what the issue is from this description (are messages getting lost or just delayed?), but this sounds like it's due to ASP.NET AppDomain recycling, in which case you can disable AppDomain recycling or set up a continuous ping route to hit your ASP.NET web application and keep the AppDomain alive.
If the ASP.NET service is available on the Internet, you can use services like https://uptimerobot.com or https://www.pingdom.com and configure them to ping your service at regular intervals (e.g. every 5-10 minutes); otherwise, if this is an internal service, you can use a Scheduled Task.
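If you go the Scheduled Task route, here is a rough sketch of a tiny console pinger you could schedule every few minutes; the URL is a placeholder for your own service.
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Run this from a Windows Scheduled Task every few minutes to keep the AppDomain warm.
class KeepAlivePing
{
    static async Task Main()
    {
        using (var http = new HttpClient())
        {
            // Any lightweight route on the service works; the response content doesn't matter.
            var response = await http.GetAsync("https://my-internal-service/ping");
            Console.WriteLine($"{DateTime.UtcNow:O} -> {(int)response.StatusCode}");
        }
    }
}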
