Debugging NServiceBus ServiceControl Heartbeat plugin - azure

I have an NServiceBus endpoint running on an Azure worker role. I installed the package ServiceControl.Plugin.Nsb5.Heartbeat. When I deploy directly from VS to the cloud service, my endpoint shows up in ServicePulse and I get my heartbeats as expected.
When I go through our automated deployment process, the endpoint isn't detected by ServicePulse, and I don't get any heartbeats. (Even if you don't have the heartbeat plugin installed, ServicePulse does detect the endpoint and tells you that the endpoint does not have the heartbeat plugin installed.)
When I log in through RD, I can see the heartbeat assembly in the approot. My config is the same for both scenarios, but I'll add it here for reference:
In my config appsettings:
<add key="Heartbeat/Interval" value="00:00:01" />
<add key="ServiceControl/Queue" value="xxx.xxx.servicecontrol" />
Rest of my config:
<section name="MessageForwardingInCaseOfFaultConfig" type="NServiceBus.Config.MessageForwardingInCaseOfFaultConfig, NServiceBus.Core" />
<MessageForwardingInCaseOfFaultConfig ErrorQueue="error" />
My ServiceControl instance is running on my local computer and monitoring the correct service bus. The error queue name is set to error, just like in the config, and the error forwarding queue name is set to error.log.
When the worker role starts and NServiceBus is started, I can find this in the logs (which, by the way, is exactly the same as what I find in the worker role that is sending out heartbeats):
Name: Heartbeats
Version: 2.0.0
Enabled by Default: Yes
Status: Enabled
Dependencies: None
Startup Tasks: HeartbeatStartup
I have absolutely no clue why the same code is behaving differently. It's the same code, the same config, the same setup, just deployed differently. When comparing deployed assemblies, I can't detect a difference. The heartbeat assembly is there, and it looks like NSB is picking it up as well. I'm just not receiving any heartbeats from that particular endpoint.
Any idea on what I could be missing? Or what I could try to fix this?
Thanks in advance!

It turns out that both endpoints are sending heartbeats, but ServicePulse shows them as one endpoint.
In ServicePulse I could see one endpoint: Endpoint#MachineA.
MachineA was the actual machine name of the worker role instance of my CloudService "Test". I could log in to this instance through RD and see NServiceBus's log activating the heartbeat functionality.
When I deployed through our automated deployment to CloudService "Dev", I got no additional endpoint in ServicePulse. So I decided to delete CloudService "Test" completely.
When I checked ServicePulse, endpoint Endpoint#MachineA was still up and receiving heartbeats every second. I couldn't figure out why since I had just deleted CloudService "Test" with that particular instance.
I decided to rename the endpoint, and deploy through our automated procedure to CloudService "Dev" (so CloudService "Test" does not exist). At that moment I saw the endpoint Endpoint#MachineA go down, and a new EndpointRenamed#MachineX go up, receiving heartbeat messages.
So it was a non-issue in the sense that both my endpoints were sending out heartbeats. The problem lies in the fact that ServicePulse somehow considered them to be the same endpoint. They did have the same name, but they were hosted in different cloud services on different machines, which should translate into separate endpoints in ServicePulse.
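If you hit the same thing, one workaround is to give each deployment an explicit, distinct endpoint name instead of relying on the default. A minimal sketch, assuming the NServiceBus 5 BusConfiguration API and made-up names:

using NServiceBus;

public class EndpointBootstrapper
{
    // Called from wherever the bus is created today (e.g. the worker role's OnStart).
    // "MyService.Dev" is a hypothetical name; use a distinct one per cloud service,
    // so ServicePulse shows e.g. MyService.Dev#MachineX and MyService.Test#MachineA
    // as two separate endpoints.
    public IBus StartBus()
    {
        var busConfiguration = new BusConfiguration();
        busConfiguration.EndpointName("MyService.Dev");

        // ... transport, persistence and the rest of the existing configuration ...

        return Bus.Create(busConfiguration).Start();
    }
}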
Hope that helps someone!

Related

Deploying Azure App Service Webjob Using .Net 6 Fails to Start "Failed to bind to address http://127.0.0.1:5000: address already in use"

I ran into an issue while migrating an Azure app service from .Net Core 5 to 6 while also updating the stack configuration in Azure Portal to use .Net version ".Net 6 (LTS)". The app service only contains continuous webjobs that process service bus messages. Locally, the webjob project runs fine but when deployed to Azure it fails to start. In Kudu tools I'm presented with an error:
[01/03/2023 18:21:32 > 1b0f90: ERR ] Unhandled exception. System.IO.IOException: Failed to bind to address http://127.0.0.1:5000: address already in use.
[01/03/2023 18:21:32 > 1b0f90: ERR ] ---> Microsoft.AspNetCore.Connections.AddressInUseException: Only one usage of each socket address (protocol/network address/port) is normally permitted.
[01/03/2023 18:21:32 > 1b0f90: ERR ] ---> System.Net.Sockets.SocketException (10048): Only one usage of each socket address (protocol/network address/port) is normally permitted.
Eventually I am able to get past the error by applying the app setting ASPNETCORE_URLS=http://localhost:5001 to the app service, and applying the same app setting to every .Net Core 6 app service running web jobs in the same app service plan, except I have to increment the port to something different each time. This does not seem to be a problem with non-webjob applications, and only occurs when I configure the app service stack to ".Net 6 (LTS)" in Azure Portal.
My question is: Is there another workaround to this issue? I find adding unique port assignments to every webjob running .Net 6 to be cumbersome and not ideal, and this issue will exist as a serious gotcha for future development.
Here are the dependencies I am pulling in:
Azure.Messaging.ServiceBus Version=7.11.0
Microsoft.Azure.WebJobs Version=3.0.32
Microsoft.ApplicationInsights.AspNetCore Version=2.21.0
Microsoft.ApplicationInsights.NLogTarget Version=2.21.0
Microsoft.Azure.Services.AppAuthentication Version=1.6.2
Microsoft.Azure.WebJobs.Extensions Version=4.0.1
Microsoft.Azure.WebJobs.Extensions.ServiceBus Version=5.3.0
Microsoft.Azure.WebJobs.Extensions.Storage Version=5.0.1
NLog Version=5.0.4
NLog.Targets.Seq Version=2.1.0
NLog.Web.AspNetCore Version=5.1.4
To reproduce:
Create two or more .Net Core 6 applications that only implement WebJobs. My WebJobs functions process Service Bus topic messages; I'm not sure whether this is important to reproduce (a sketch of such a function follows these steps).
Deploy the Webjob applications to the same App Service Plan
In the configuration blade settings tab for each web app make sure that the runtime stack is set to ".Net 6 (LTS)", keep the rest as default.
Now when you go to view the webjobs in Azure Portal you will see that the job is stuck in a restart cycle.
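For context, such a Service Bus topic-triggered WebJobs function looks roughly like the sketch below; the topic and subscription names are placeholders, and the host registers the extension with b.AddServiceBus() inside ConfigureWebJobs:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public class Functions
{
    // Triggered by messages arriving on a Service Bus topic subscription.
    // "my-topic" and "my-subscription" are placeholder names.
    public void ProcessTopicMessage(
        [ServiceBusTrigger("my-topic", "my-subscription")] string message,
        ILogger logger)
    {
        logger.LogInformation("Received message: {Message}", message);
    }
}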
The problem seems to be around setting the stack settings version to ".Net 6 (LTS)". From this article it seems that this setting makes the app service run Kestrel with YARP; I'm guessing the feature parity is not 1:1 with the previous stack.
An example project that can reproduce the issue can be found on GitHub. Follow the README found in .\Scripts to deploy the example to Azure.
Note: there seems to be an issue with the template setting the stack to .Net 6. This may need to be done manually post deployment to fully reproduce the issue.
I have created two .NET Core 6 applications and deployed them as Azure WebJobs in the same Azure App Service.
Make sure to enable the Always On option under App Service => Configuration => General Settings to ensure the WebJobs run continuously.
I have updated the stack settings runtime version to .NET 6.
Now when you go to view the webjobs in Azure Portal you will see that the job is stuck in a restart cycle.
Yes, I also got stuck with the same issue. The WebJob which I published second shows the Pending Restart status.
When I click on the logs, I can see the following error logged:
Make sure that you are setting a connection string named `AzureWebJobsDashboard` in your Microsoft Azure Website configuration by using the following format `DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY`, pointing to the Microsoft Azure Storage account where the Microsoft Azure WebJobs Runtime logs are stored.
In the second console app, I hadn't configured the WebJobs host and the storage account.
Updated the code and published again.
Now I can see that both jobs are in the Running state.
My Program.cs file:
// See https://aka.ms/new-console-template for more information
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

namespace WebJobApp
{
    class Program
    {
        static async Task Main()
        {
            var builder = new HostBuilder();
            builder.UseEnvironment(EnvironmentName.Development);
            builder.ConfigureWebJobs(b =>
            {
                b.AddAzureStorageCoreServices();
                b.AddAzureStorageQueues();
            });
            builder.ConfigureLogging((context, b) =>
            {
                b.AddConsole();
            });
            var host = builder.Build();
            using (host)
            {
                await host.RunAsync();
            }
        }
    }
}
Reference taken from MSDoc.

Deployed Azure WebApp gives 403

My issue:
When I try to access the main URL for my web app, Azure replies with a '403 - You do not have permission to view this directory or page'.
Context:
I have deployed a Python webapp to Azure using the Pipeline/Release on DevOps (Azure Web App Deploy task seems to run successfully with the artifact generated by the Pipeline). I have previously deployed Python Function Apps successfully with a similar pipeline (different app type of course, and sku).
The Kudu SCM page works e.g.,: myapp.scm.azurewebsites.net
All logs seem to indicate the webapp deployment was successful. If I use CMD or PowerShell from the SCM, I can see my app.py (for Flask) is in the correct location. The deployment has my requirements installed under site-packages, including Flask.
The app runs quite successfully on my local machine via 'flask run', after I activate the virtual environment.
Yet when I try to connect to myapp.azurewebsites.net, I get a 403 on the plain route. Anything after it, like /test or /myapi, returns a 404.
Something I do not see in any of the logs I can access via Kudu is mention of 'gunicorn', which I believe is what Azure uses by default. I just want to see some kind of log output somewhere to show that flask or gunicorn or something has successfully loaded app.py and is listening for incoming connections.
Maybe you do not know why I would get 403's, but you might know where I should be seeing the aforementioned logs.
TIA for any suggestions.
EDIT:
Something to add: if I enable logs and connect to the log stream, I do see logs generated as I access Kudu. This suggests some application and web server are running - at least for whatever container runs that side of things.
It even notes the failed connections from Postman for the actual myapp.azurewebsites.net, but has nothing other than a line indicating that there is a 403.
My app has been stripped down to the most bare app.py with no includes other than Flask and routes which simply return a string. Most includes in requirements.txt have also been stripped out.
Still same issue.
I do have an answer after a couple of days' worth of pulling my hair out.
Turns out that the 403s were not actually a permissions issue. Running the following pointed at the real cause:
az webapp list-runtimes --os windows
The list shows no runtimes available for a Python/Flask web app. This is why I could not find any gunicorn or Flask logs - neither is set up. Azure deployed the artifact's zip and called it a day.
To rectify this, the DevOps Pipeline/Release must run on Linux. The Azure Web App Deploy task, when set to "Web App on Linux", will have Python runtime stacks available. Once selected, these will allow for a startup command to be specified. (Such as flask run --host=0.0.0.0 --port=8000)
Furthermore, in azuredeploy.json the "Microsoft.Web/serverfarms" resource must have a "kind" specified that includes "linux". It also requires:
"properties": {"reserved" : true}
Once deployed, logs indicate that Docker is set to an internal port of 8000, while the default 'flask run' that gets executed would use 5000.
Ideally, use gunicorn with port mapping, but to get things going, tell Flask to use port 8000.

"Could not create SSL/TLS secure channel" when deploying via MSDeply to Web App

I'm having a very weird issue:
I'm deploying to an Azure web app via MSDeploy from an on-prem CI/CD pipeline with Bamboo.
I get:
27-Jun-2022 20:53:50 Verbose: Pre-authenticating to remote agent URL 'https://app-xxxxx-stg.scm.azurewebsites.net:443/msdeploy.axd?site=app-xxxxx-stg' as '$app-xxxxx-stg'.
27-Jun-2022 20:53:50 Error: Could not complete the request to remote agent URL 'https://app-xxxx-stg.scm.azurewebsites.net/msdeploy.axd?site=app-xxxxx-stg'.
27-Jun-2022 20:53:50 Error: The request was aborted: Could not create SSL/TLS secure channel.
27-Jun-2022 20:53:50 Error count: 1.
What is weird is that the same deployment script works just fine with other web apps that were created in the past (the dev and tst environments), and it also fails with the same error if I try to deploy to the prod environment (also just created).
The environments are created via ARM template, so they are exactly the same.
I've read other similar issues, but my web app is already configured to allow TLS 1.2 as the minimum. And as mentioned, all the web apps are configured the same way, and the deployments all start from the same machine.
What could be the issue? how can I solve this connection problem?
Thank you
Just to let everyone know: the problem is solved by forcing .NET Framework applications (like MSDeploy) to default to TLS 1.2.
As per this article: https://learn.microsoft.com/en-us/mem/configmgr/core/plan-design/security/enable-tls-1-2-client#bkmk_net
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions" = dword:00000001
"SchUseStrongCrypto" = dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319]
"SystemDefaultTlsVersions" = dword:00000001
"SchUseStrongCrypto" = dword:00000001
We had a discussion on this on a Microsoft Q&A thread; just posting here to benefit the community. If a self-hosted agent connects to the site using TLS 1.0 or 1.1, this error can occur. Hence, for a self-hosted/custom agent used for DevOps pipeline deployments, ensure that the agent has TLS 1.2 or later enabled.
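As a side note, MSDeploy itself needs the machine-level registry change above, but for a .NET Framework application whose code you control, the same opt-in can also be made in-process at startup; a minimal sketch:

using System.Net;

static class TlsBootstrap
{
    // Add TLS 1.2 to the protocols this process is willing to use,
    // without removing anything that is already enabled. Call once at startup,
    // before any outbound HTTPS requests are made.
    public static void EnableTls12()
    {
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
    }
}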

How do I Monitor Health of Bot Deployed in Azure to Spot Random Downtime

I'm trying to track when my bot stops responding. I think it stops responding when it doesn't get any usage for a while. I'll send it a few requests sometimes and then I won't get any responses back. I can fix this by stopping and restarting the App Service within the Azure Portal.
I was considering creating a cronjob that sends a POST request to the somebotname.azurewebsites.net/api/messages endpoint and e-mailing me if there's no response, but I'm not sure how to get a token so that this will pass. I was also considering doing a daily publish via azure devops but I'm not sure if this is even possible.
Is there a best practice for testing if a bot is still running?
The best way to do it is with Application Insights, which you can find here:
https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview
It's not hard to add to a project. Simply add the NuGet package and initialize it with the key (InstrumentationKey).
<ItemGroup>
  <PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.7.0" />
</ItemGroup>
And initialize it like so:
public void ConfigureServices(IServiceCollection services)
{
    // The following line enables Application Insights telemetry collection.
    services.AddApplicationInsightsTelemetry();

    // This code adds other services for your application.
    services.AddMvc();
}
With appsettings.json
{
  "ApplicationInsights": {
    "InstrumentationKey": "putinstrumentationkeyhere"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Warning"
    }
  }
}
For instance, here is a full example for .NET Core:
https://learn.microsoft.com/en-us/azure/azure-monitor/app/asp-net-core
In Application Insights you will see all errors in real time, and you can set up alerts to notify you of downtime.
If you have production issues, there is also a cool Snapshot Debugger feature to see what happened.
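If you still like the idea of a periodic ping (as described in the question), custom availability results can also be reported to Application Insights from your own code. A minimal sketch using the TelemetryClient API; the URL pinged here is a placeholder and should be an endpoint that does not require authentication:

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;

class BotPing
{
    // Pings a placeholder URL and reports the outcome as a custom
    // availability result, which can then drive an availability alert.
    public static async Task PingAsync(TelemetryClient telemetry, HttpClient http)
    {
        var timestamp = DateTimeOffset.UtcNow;
        var stopwatch = Stopwatch.StartNew();
        bool success;
        try
        {
            var response = await http.GetAsync("https://somebotname.azurewebsites.net/");
            success = response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            success = false;
        }
        stopwatch.Stop();
        telemetry.TrackAvailability("bot-ping", timestamp, stopwatch.Elapsed, "custom", success);
    }
}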
Check your "Always On" setting and make sure it's set to "On".
Azure WebSites Always On
Configuration->General Settings->Always On
Of course, you'll need to pay more in this case.

Azure Cloud Service (Classic) Roles do not start when I have Swashbuckle and ANY implementation of IOperationFilter

We have an ASP.NET Web API project deployed to an Azure Cloud Service (classic) that has been running fine with Swashbuckle for almost a year. We configure it like so...
GlobalConfiguration.Configuration
    .EnableSwagger(c =>
    {
        c.SingleApiVersion("v1", "PartnerAPI");
        c.UseFullTypeNameInSchemaIds();
    }).EnableSwaggerUi(c => { });
Recently we needed to tweak the Swagger-generated output by plugging in an IOperationFilter. However, our Azure Cloud Service (Classic) instance will not start if we create a class that implements IOperationFilter. We don't even try to configure Swagger to use it. Just the fact that there is a class implementing that interface in our solution causes the deploy to fail, stating...
2016-12-29T16:10:26.1066042Z ##[error]BadRequest : Your role instances have recycled a number of times during an update or upgrade operation. This indicates that the new version of your service or the configuration settings you provided when configuring the service prevent the role instances from running. Verify your code does not throw unhandled exceptions and that your configuration settings are correct and then start another update or upgrade operation.
Some Notes:
Everything runs fine on my machine, directly and in the Azure emulator
Everything runs fine on a teammate's machine, same as above
The following message in the event logs on the Azure machine (when I RDP into it) appears to be related:
File Server Resource Manager was unable to access the following file or volume: 'E:'. This file or volume might be locked by another application right now, or you might need to give Local System access to it.
Same problem in Swashbuckle versions 5 and 5.5
No new NuGet packages or references were added to the project
Only a "using Swashbuckle.Swagger;" was added to SwaggerConfig.cs
The Azure Portal reports the following for the "Instance Status Message"...
[12/29T16:43Z]Failed to load role entrypoint. System.Reflection.ReflectionTypeLoadException: Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information. at System.Reflection.RuntimeModule.GetTypes(RuntimeModule module) at System.Reflection.Assembly.GetTypes() at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.GetRoleEntryPoint(Assembly entryPointAssembly) --- End of inner exception stack trace --- at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.GetRoleEntryPoint(Assembly entryPointAssembly) at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.CreateRoleEntryPoint(RoleType roleTypeEnum) at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.InitializeRoleInternal(RoleType roleTypeEnum)' Last exit time: [2016/12/29, 16:43:59.525]. Last exit code: 0.
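For reference, the kind of class that triggers this is as small as the sketch below (the filter name is hypothetical, and it is never registered via c.OperationFilter<...>() anywhere); it uses Swashbuckle 5's Web API IOperationFilter signature:

using Swashbuckle.Swagger;
using System.Web.Http.Description;

// Hypothetical filter; simply having a class like this in the solution,
// without wiring it into the Swagger config, is enough to hit the role recycle.
public class AddCustomHeaderOperationFilter : IOperationFilter
{
    public void Apply(Operation operation, SchemaRegistry schemaRegistry, ApiDescription apiDescription)
    {
        // Tweak the generated operation here, e.g. add parameters or descriptions.
    }
}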
