ServiceStack RedisMqServer not always handling messages published from separate application - servicestack

Context
I have a RedisMqServer configured to handle a single message type on my ServiceStack web service. The messages on that MQ originate from another application and show up in the .inq with all the correct properties. Everything is on ServiceStack 4.0.38.
My configuration in MyAppHost.cs:
public override void Configure(Container container)
{
    var redisFactory = new PooledRedisClientManager(0, "etc:etc");
    redisFactory.ConnectTimeout = 5;
    redisFactory.IdleTimeOutSecs = 30;
    redisFactory.PoolTimeout = 3;
    container.Register<IRedisClientsManager>(redisFactory);

    //Plugins, Filters, other Registrations omitted

    var mqHost = new RedisMqServer(redisFactory, retryCount: 2);
    mqHost.DisablePublishingResponses = true;
    mqHost.RegisterHandler<CreateVisitor>(ServiceController.ExecuteMessage);
    mqHost.Start();
}
And then in Global.asax.cs:
void Application_Start(object sender, EventArgs e)
{
    new MyAppHost().Init();
}
Problem
The messages are not consistently handled when I deploy this elsewhere. They wait in the .inq for an indeterminate duration; nothing is lost, just delayed.
As of this moment, the only things that come to mind are:
I'm using IIS Express locally, and the server is using IIS.
Application_Start needs to happen before it can handle messages.
I've tried initializing the service by making other API calls over HTTP, before and after queuing messages, with more failure than success. Sometimes the service starts to handle them, but I am unable to identify and thus influence when this happens.
Note
I do have several other console applications and windows services that listen on other MQs and handle messages placed by other applications, and those have always worked flawlessly. This is the first time I've tried this from within an existing web service, however.

It's hard to know what the issue is from this description (are messages getting lost or just delayed?), but this sounds like it's due to ASP.NET AppDomain recycling, in which case you can disable AppDomain recycling or set up a continuous ping route to hit your ASP.NET Web Application and keep the AppDomain alive.
If the ASP.NET Service is available on the Internet, you can use services like https://uptimerobot.com or https://www.pingdom.com to ping your Service at regular intervals (e.g. every 5-10 minutes); otherwise, if this is an internal Service, you can use a Scheduled Task.
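For the ping route itself, a minimal no-op ServiceStack service is enough. A sketch, assuming a hypothetical Ping DTO and /ping route (not from the original post):
// Hypothetical keep-alive endpoint: any request to /ping keeps the AppDomain warm.
[Route("/ping")]
public class Ping : IReturn<string> { }

public class PingService : Service
{
    public object Any(Ping request)
    {
        return "OK";
    }
}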

Related

How to stop outbound HTTP connections from timing out

Background:
I'm currently hosting an ASP.NET application in Azure with the following specs:
ASP.NET Core 2.2
Using Flurl for HTTP requests
Kestrel Webserver
Docker (Linux - mcr.microsoft.com/dotnet/core/aspnet:2.2 runtime)
Azure App Service on P2V2 tier app service plan
I have a couple of background jobs that run on the service and make a lot of outbound HTTP calls to a 3rd party service.
Issue:
Under a small load (approximately 1 call per 10 seconds), all requests complete in under a second with no issue. The issue I'm having is that under a heavy load, when the service can make up to 3-4 calls in a 10-second span, some of the requests will randomly time out and throw an exception. When I was using RestSharp the exception would read "The operation has timed out". Now that I'm using Flurl, the exception reads "The call timed out".
Here's the kicker: if I run the same job from my laptop running Windows 10 / Visual Studio 2017, this problem does NOT occur. This leads me to believe I'm hitting some limit or running out of some resource in my hosted environment. It's unclear whether that is connection/socket or thread related.
Things I've tried:
Ensure all code paths to the request are using async/await to prevent lockouts
Ensure Kestrel Defaults allow unlimited connections (it does by default)
Ensure Docker's default connection limits are sufficient (2000 by default, more than enough)
Configuring ServicePointManager settings for connection limits
Here is the code in my startup.cs that I'm currently using to try and prevent this issue:
public class Startup
{
    public Startup(IHostingEnvironment hostingEnvironment)
    {
        ...

        // ServicePointManager setup
        ServicePointManager.UseNagleAlgorithm = false;
        ServicePointManager.Expect100Continue = false;
        ServicePointManager.DefaultConnectionLimit = int.MaxValue;
        ServicePointManager.EnableDnsRoundRobin = true;
        ServicePointManager.ReusePort = true;

        // Set service point timeouts
        var sp = ServicePointManager.FindServicePoint(new Uri("https://placeholder.thirdparty.com"));
        sp.ConnectionLeaseTimeout = 15 * 1000; // 15 seconds
        FlurlHttp.ConfigureClient("https://placeholder.thirdparty.com", cli => cli.Settings.ConnectionLeaseTimeout = new TimeSpan(0, 0, 15));
    }
}
Has anyone else run into a similar issue to this? I'm open to any suggestions on how to best debug this situation, or possible methods to correct the issue. I'm at a complete loss after researching this for several days.
Thank you in advance.
I had similar issues. Take a look at "Asp.net Core HttpClient has many TIME_WAIT or CLOSE_WAIT connections"; debugging via netstat helped identify the problem for me. As one possible solution, I suggest you use IHttpClientFactory. You can get more info from https://learn.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-2.2. It should be fairly easy to use, as described in "Flurl client lifetime in ASP.Net Core 2.1 and IHttpClientFactory".
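A minimal sketch of wiring up IHttpClientFactory in Startup, assuming a hypothetical client name and the placeholder host from the question:
// In ConfigureServices: register a named client. The factory pools and recycles
// the underlying handlers, which avoids socket exhaustion from ad-hoc HttpClients.
public void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient("thirdparty", client =>
    {
        client.BaseAddress = new Uri("https://placeholder.thirdparty.com");
        client.Timeout = TimeSpan.FromSeconds(15);
    });
}

// At a call site, inject IHttpClientFactory and create clients per request:
// var http = _httpClientFactory.CreateClient("thirdparty");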

Application Insights skipping events

I am using this code to send events to Application Insights in a console application:
TelemetryConfiguration.Active.InstrumentationKey = "XXXXXXXXX";
TelemetryClient telemetryClient = new TelemetryClient();
for (int i = 0; i < 100; i++)
{
    telemetryClient.TrackEvent("Hello World!");
    telemetryClient.TrackException(new OutOfMemoryException());
}
telemetryClient.Flush();
Task.Delay(60000).Wait();
The problem I am having is that it does not seem to log all my events; sometimes the Visual Studio toolbar says 44, sometimes 68, but never 100.
The type of information I am going to send is important, because I will be monitoring several console applications from this service.
Is there any way to have Application Insights send everything to Azure and not skip events? I think I am giving it enough time to send everything before exiting.
Without the full code, it's hard to say what configuration is used. A couple of things to look for:
Have you enabled sampling? If you really want an accurate count of events, then disable sampling (https://learn.microsoft.com/en-us/azure/azure-monitor/app/sampling).
Have you configured the channel explicitly? If not, the default will be InMemoryChannel, which does not do any retries for transient issues. It's best to use ServerTelemetryChannel to protect against data loss in the event of network issues or Application Insights backend transient issues.
var config = new TelemetryConfiguration(); // or TelemetryConfiguration.Active, or CreateDefault()
var channel = new ServerTelemetryChannel();
channel.Initialize(config);
config.TelemetryChannel = channel; // attach the channel, or the client keeps using the default

// Create the client from the config.
TelemetryClient tc = new TelemetryClient(config);
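One caveat worth adding (my note, based on the channel's documented behavior): ServerTelemetryChannel's Flush() only initiates the send, so keep a delay before the console app exits, as in the original snippet:
tc.TrackEvent("Hello World!");
tc.Flush();
// Flush() is not synchronous for ServerTelemetryChannel; give the buffer
// time to drain before the process exits.
Task.Delay(TimeSpan.FromSeconds(30)).Wait();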

Random 21/42 seconds timeout in outgoing traffic on Azure Web Sites

I have an ASP.NET MVC 5 application running in the Azure German cloud as an Azure Web App (single instance, Standard S3 size).
I'm calling a non-Azure-hosted REST/SOAP service on a particular host, and the web requests either succeed promptly or time out after 21/42 seconds.
I've load tested the requests, and the percentage of requests timing out ranges between 20 and 80.
One particular remarkable property of the timeout is, that they occur after exactly 21 or 42 seconds (this is serious, no reference to hitchhiker's guide to the galaxy intended).
Calling a different service from the web app works just fine, temporarily at least.
We've already checked the firewall of the non-Azure service: when the timeout occurs, not a single packet reaches the host.
This issue occurred once in the past, one year ago, and support was unable to tell what the cause was until the issue suddenly went away roughly two weeks after first occurring; the ticket got closed as having fixed itself, but now it's back.
The code is using https://github.com/canton7/RestEase (uses HttpClient underneath) and looks like
[Header("Content-Type", "application/json")]
public interface IApi
{
    [Post("/Login")]
    Task<LoginToken> Login([Body] LoginRequest request);
}

private static Dictionary<string, IApi> ApiClientsByHost = new Dictionary<string, IApi>();

private IApi GetApiForHost(string host)
{
    if (!ApiClientsByHost.TryGetValue(host, out var client))
    {
        lock (ApiClientsByHost)
        {
            if (!ApiClientsByHost.TryGetValue(host, out client))
            {
                ApiClientsByHost[host] = client = RestClient.For<IApi>(host);
            }
        }
    }
    return client;
}

var client = GetApiForHost("https://production/");
var loginToken = await client.Login(new LoginRequest { Username = username, Password = password });
By a different service, I mean using "https://testserver/" instead of "https://production/" (testserver is located in a different data center, with a different IP and all).
The API authentication passes a token via the query string, but it times out before even being able to get a token.
The code is caching the IApi to avoid the TCP starvation problems of disposing HttpClients (but I've never run into port exhaustion).
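As an aside, a ConcurrentDictionary (System.Collections.Concurrent) would express the same per-host cache without the hand-rolled double-checked lock, whose unlocked first TryGetValue is not thread-safe on a plain Dictionary while writers exist. A sketch, not the code in use:
private static readonly ConcurrentDictionary<string, IApi> ApiClients =
    new ConcurrentDictionary<string, IApi>();

private IApi GetApiForHost(string host)
{
    // GetOrAdd handles the check-then-create race internally; the factory may
    // run more than once under contention, which is harmless here.
    return ApiClients.GetOrAdd(host, h => RestClient.For<IApi>(h));
}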
Restarting the app does not resolve the issue, and the issue currently occurs only against production (but a year ago, when this issue occurred on production, we switched to testserver, which worked initially but after some time ran into the same problem).
EDIT: Found some explanation in the last answer as to where those magical 21 seconds are coming from.
EDIT: One workaround I've found is to set up an Azure VM with a proxy on it and configure defaultProxy to pass traffic through that VM.
That's TCP retransmission timing out. With Windows' default of two SYN retransmissions, an unanswered connection attempt gives up after 3 + 6 + 12 = 21 seconds, and 42 seconds is simply two such attempts back to back. It's odd that you are getting different values, though.

Azure Socket Leaks?

I have an ASP.NET Core website with a lot of simultaneous users which crashes many times during the day, and I scaled up and out but no luck.
I have been told by numerous Azure support staff that the issue is that I'm sending out a lot of database calls, although database utilization improved after creating indexes. Can you kindly advise what you think the problem is, as I have done my best...
I was told that I have "socket leaks".
Please note:
I don't have any external service calls except to SendGrid
I have not used ConfigureAwait(false)
I'm not using "using" statements or explicitly disposing contexts
This is my connection string, if it may help...
Server=tcp:sarahah.database.windows.net,1433;Initial Catalog=SarahahDb;Persist Security Info=False;User ID=********;Password=******;MultipleActiveResultSets=True;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;Max Pool Size=400;
These are some code examples:
In Startup.CS:
services.AddDbContext<ApplicationDbContext>(options =>
    options.UseSqlServer(Configuration.GetConnectionString("DefaultConnection")));
Main class:
private readonly ApplicationDbContext _context;

public MessagesController(ApplicationDbContext context, IEmailSender emailSender, UserManager<ApplicationUser> userManager)
{
    _context = context;
    _emailSender = emailSender;
    _userManager = userManager;
}
This is the code of an important method, for example:
string UserId = _userManager.GetUserId(User);
var user = await _context.Users.Where(u => u.Id.Equals(UserId)).Include(u => u.Messages).FirstOrDefaultAsync();
// some other code
return View(user.Messages);
Please advise, as I have tried my best; this is very embarrassing to me in front of my customers.
Without the error messages that you're seeing, here are a few ideas that you can check.
I'd start with going to your Web App's Overview blade in the Azure Portal. Update the monitoring graph to a time period when you're experiencing problems. Are you CPU bound? Have you exhausted memory? Also, check the HTTP Queue length. If your HTTP queue is really long, it's because your server is choking trying to service the requests and users are experiencing timeout issues.
Next, jump over to your SQL Server's Overview blade in the Azure Portal, and look at the resource utilization chart. Set the time period on the chart to when you're experiencing problems. Have you pegged out your DTUs for your database? If so, it's a sign of poor indexing, poor schema design, or you're just undersized and need to scale up.
Turn on ApplicationInsights if you haven't already. You can use the ApplicationInsights API to insert your own trace statements into your code. Or, you might be able to see exceptions causing the issue without having to do your own tracing.
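For example, a trace call might look like this (a minimal sketch, assuming the Microsoft.ApplicationInsights package; the message text is hypothetical):
var telemetry = new Microsoft.ApplicationInsights.TelemetryClient();
// SeverityLevel comes from Microsoft.ApplicationInsights.DataContracts.
telemetry.TrackTrace("Entering MessagesController.Index", SeverityLevel.Information);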
Check the Kudu logs for your Web Apps.
I agree with Tseng - your usage of EF and .NET Core's DI framework looks correct.
Let us know how the troubleshooting goes and provide additional information on exactly what kind of errors you're seeing. Best of luck!
It looks like a DI issue to me. You are injecting ApplicationDbContext context, which means the ApplicationDbContext will be resolved from the DI container, meaning it will stay open the entire request (transient), as Tseng pointed out. It should be scoped.
You can inject IServiceScopeFactory scopeFactory in your controller and do something like:
using (var scope = _scopeFactory.CreateScope())
{
    var context = scope.ServiceProvider.GetRequiredService<ApplicationDbContext>();
}
Note that if you are using ASP.NET Core 1.1 and want to be sure that all your services are being resolved correctly, change your ConfigureServices method in the Startup to:
public IServiceProvider ConfigureServices(IServiceCollection services)
{
    // Register services
    return services.BuildServiceProvider(validateScopes: true);
}

How can I keep my Azure WebJob running without "Always On"

I have a continuous WebJob associated with a website, and I am running that website in Shared mode. I don't want to go to the Always On option, as there is no real need for it in my application; I only want to process messages when calls are made to my website.
My issue is that the job keeps stopping after a few minutes, even though I am continuously calling a dummy keep-alive method on my website every 5 minutes that posts a message to the queue monitored by that WebJob.
My WebJob is a simple console application built using the WebJobs SDK that has code like this:
JobHost host = new JobHost(new JobHostConfiguration(storageConnectionString));
host.RunAndBlock();
and the message processing function looks like below:
public static void ProcessKeepAliveMessages([QueueTrigger("keepalive")] KeepAliveTrigger message)
{
    Console.WriteLine("Keep Alive message called on :{0}", message.MessageTime);
}
The message log for the job basically says:
[03/05/2015 18:51:02 > 4660f6: SYS INFO] WebJob is stopping due to website shutting down
I don't mind that it happens this way, but when the website starts with the next keep-alive call, the WebJob is not started. All the messages stay queued until I go to the management dashboard or the SCM portal shown below:
https://mysite.scm.azurewebsites.net/api/continuouswebjobs
I can see the status like this:
[{"status":"Starting","detailed_status":"4660f6 - Starting\r\n","log_url":"https://mysite.scm.azurewebsites.net/vfs/data/jobs/continuous/WebJobs/job_log.txt","name":"WebJobs","run_command":"mysite.WebJobs.exe","url":"https://mysite.scm.azurewebsites.net/api/continuouswebjobs/WebJobs","extra_info_url":"https://mysite.scm.azurewebsites.net/azurejobs/#/jobs/continuous/WebJobs","type":"continuous","error":null,"using_sdk":true,"settings":{}}]
I would really appreciate if someone can help me understand what is going wrong here.
I've run into a similar problem. I have a website (Shared mode) and an associated WebJob (continuous type). Looking at the WebJob logs, I found that the job enters the stopped state after about 15 minutes of inactivity and stops reacting to trigger messages. It seems contradictory to the concept of a continuous job but, apparently, to get it running truly continuously you have to subscribe to a paid tier. You get what you pay for...
That said, my website needs to be used only every few days, and running in Shared mode makes perfect sense. I don't mind that the site needs a bit of extra time to get started, as long as it restarts automatically. The problem with the WebJob is that once stopped, it won't restart by itself. So my goal was to restart it with the website.
I have noticed that a mere look at the webjob from Azure Management Portal starts it. Following this line of thinking, I have found that fetching webjob properties is enough to switch it to the running state. The only trick is how to fetch the properties programmatically, so that restarting the website will also restart the webjob.
Because the call to fetch webjob properties must be authenticated, the first step is to go to Azure Management Portal and download the website publishing profile. In the publishing profile you can find the authentication credentials: username (usually $<website_name>) and userPWD (hash of the password). Copy them down.
Here is a function that will get webjob properties and wake it up (if not yet running):
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class Program
{
    static void Main(string[] args)
    {
        string websiteName = "<website_name>";
        string webjobName = "<webjob_name>";
        string userName = "<from_publishing_profile>";
        string userPWD = "<from_publishing_profile>";
        string webjobUrl = string.Format("https://{0}.scm.azurewebsites.net/api/continuouswebjobs/{1}", websiteName, webjobName);
        var result = GetWebjobState(webjobUrl, userName, userPWD);
        Console.WriteLine(result);
        Console.ReadKey(true);
    }

    private static JObject GetWebjobState(string webjobUrl, string userName, string userPWD)
    {
        HttpClient client = new HttpClient();
        string auth = "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes(userName + ':' + userPWD));
        client.DefaultRequestHeaders.Add("authorization", auth);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        var data = client.GetStringAsync(webjobUrl).Result;
        var result = JsonConvert.DeserializeObject(data) as JObject;
        return result;
    }
}
You can use a similar function to get all webjobs in your website (use endpoint https://<website_name>.scm.azurewebsites.net/api/webjobs). You may also look at the returned JObject to verify the actual state of the webjob and other properties.
If you want the WebJob not to stop, you need to make sure your scm site stays alive.
So the keep-alive requests should go to https://sitename.scm.azurewebsites.net, and these requests need to be authenticated (basic auth using your deployment credentials).
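A minimal sketch of such a keep-alive request (the site name is a placeholder, and basic auth uses the deployment credentials from the publishing profile):
using System;
using System.Net.Http;
using System.Text;

class ScmKeepAlive
{
    static void Main()
    {
        var client = new HttpClient();
        string auth = Convert.ToBase64String(Encoding.UTF8.GetBytes("$mysite:<userPWD>"));
        client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);

        // Any authenticated GET against the scm site keeps it (and the WebJob host) awake.
        var response = client.GetAsync("https://mysite.scm.azurewebsites.net").Result;
        Console.WriteLine("scm ping: {0}", response.StatusCode);
    }
}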
