Subscribing to Service Fabric cluster level events - azure

I am trying to create a service that will update an external list of Service Endpoints for applications running in my service fabric cluster. (Basically I need to replicate the Azure Load Balancer in my on premises F5 Load Balancer.)
During last month's Service Fabric Q&A, the team pointed me at RegisterServiceNotificationFilterAsync.
I made a stateless service using this method, and deployed it to my development cluster. I then made a new service by running the ASP.NET Core Stateless service template.
I expected that when I deployed the second service, the break point would hit in my first service, indicating that a service had been added. But no breakpoint was hit.
I have found very little in the way of examples for this kind of thing on the internet, so I am asking here hopping that someone else has done this and can tell me where I went wrong.
Here is the code for my service that is trying to catch the application changes:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
var fabricClient = new FabricClient();
long? filterId = null;
try
{
var filterDescription = new ServiceNotificationFilterDescription
{
Name = new Uri("fabric:")
};
fabricClient.ServiceManager.ServiceNotificationFilterMatched += ServiceManager_ServiceNotificationFilterMatched;
filterId = await fabricClient.ServiceManager.RegisterServiceNotificationFilterAsync(filterDescription);
long iterations = 0;
while (true)
{
cancellationToken.ThrowIfCancellationRequested();
ServiceEventSource.Current.ServiceMessage(this.Context, "Working-{0}", ++iterations);
await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
}
}
finally
{
if (filterId != null)
await fabricClient.ServiceManager.UnregisterServiceNotificationFilterAsync(filterId.Value);
}
}
private void ServiceManager_ServiceNotificationFilterMatched(object sender, EventArgs e)
{
Debug.WriteLine("Change Occured");
}
If you have any tips on how to get this going, I would love to see them.

You need to set the MatchNamePrefix to true, like this:
var filterDescription = new ServiceNotificationFilterDescription
{
Name = new Uri("fabric:"),
MatchNamePrefix = true
};
otherwise it will only match specific services. In my application I can catch cluster wide events when this parameter is set to true.

Related

Azure Functions Dependency Tracking for SQL Server and Service Bus Into Application Insights

Previously I have Azure Web App (.net core) and It successfully track the SQL Server and Service Bus dependency into Application Insights. It is not working some how with Azure Functions.
Environment
dotnet 6
dotnet-isolated mode
log level default set to "Information".
Azure Environment using Consumption plan for Azure Functions.
Application Insights key is configured.
I have Azure API management at front and backend is Azure Function and that call SQL Server and Service Bus.
Api Management Service to Azure function dependency successfully resolved but Azure Function to other component is not working.
I know I am posting my own answer. Also there are chance that in future there may be some good solution or it get integrated the way it is in in-process mode.
By then follow steps.
Add Package
Microsoft.ApplicationInsights.WorkerService
In program.cs in configuring host.
services.AddApplicationInsightsTelemetryWorkerService();
More info at
https://learn.microsoft.com/en-us/azure/azure-monitor/app/worker-service
The only way I've managed to solve this issue so far was by setting up custom Middleware
.ConfigureFunctionsWorkerDefaults(config =>
{
config.UseMiddleware<AiContextMiddleware>();
})
In the IServiceCollection you need to setup simply
.AddApplicationInsightsTelemetryWorkerService()
public class AiContextMiddleware : IFunctionsWorkerMiddleware
{
private readonly TelemetryClient _client;
private readonly string _hostname;
public AiContextMiddleware(TelemetryClient client)
{
_client = client;
_hostname = Environment.GetEnvironmentVariable("AI_CLOUD_ROLE_NAME");
}
public async Task Invoke(FunctionContext context, FunctionExecutionDelegate next)
{
var operationId = ExtractOperationId(context.TraceContext.TraceParent);
// Let's create and start RequestTelemetry.
var requestTelemetry = new RequestTelemetry
{
Name = context.FunctionDefinition.Name,
Id = context.InvocationId,
Properties =
{
{ "ai.cloud.role", _hostname},
{ "AzureFunctions_FunctionName", context.FunctionDefinition.Name },
{ "AzureFunctions_InvocationId", context.InvocationId },
{ "AzureFunctions_OperationId", operationId }
},
Context =
{
Operation =
{
Id = operationId,
ParentId = context.InvocationId,
Name = context.FunctionDefinition.Name
},
GlobalProperties =
{
{ "ai.cloud.role", _hostname},
{ "AzureFunctions_FunctionName", context.FunctionDefinition.Name },
{ "AzureFunctions_InvocationId", context.InvocationId },
{ "AzureFunctions_OperationId", operationId }
}
}
};
var operation = _client.StartOperation(requestTelemetry);
try
{
await next(context);
}
catch (Exception e)
{
requestTelemetry.Success = false;
_client.TrackException(e);
throw;
}
finally
{
_client.StopOperation(operation);
}
}
private static string ExtractOperationId(string traceParent)
=> string.IsNullOrEmpty(traceParent) ? string.Empty : traceParent.Split("-")[1];
}
It's definitely not a perfect solution as you then get two starting logs, but as end result, you get all logs traces + dependencies correlated to an operation.
I've solved this issue in the first place like that, now I'm revisiting whether there are any better ways to solve this.
Let me know too whether you managed to solve this issue on your side.

Application Insights telemetry correlation using log4net

I'm looking to have our distributed event logging have proper correlation. For our Web Applications this seems to be automatic. Example of correlated logs from one of our App Services API:
However, for our other (non ASP, non WebApp) services were we use Log4Net and the App Insights appender our logs are not correlated. I tried following instructions here: https://learn.microsoft.com/en-us/azure/azure-monitor/app/correlation
Even after adding unique operation_Id attributes to each operation, we're not seeing log correlation (I also tried "Operation Id"). Example of none correlated log entry:
Any help on how to achieve this using log4net would be appreciated.
Cheers!
Across services correlation IDs are mostly propagated through headers. When AI is enabled for Web Application, it reads IDs/Context from the incoming headers and then updates outgoing headers with the appropriate IDs/Context. Within service, operation is tracked with Activity object and every telemetry emitted will be associated with this Activity, thus sharing necessary correlation IDs.
In case of Service Bus / Event Hub communication, the propagation is also supported in the recent versions (IDs/Context propagate as metadata).
If service is not web-based and AI automated correlation propagation is not working, you may need to manually get incoming ID information from some metadata if any exists, restore/initiate Activity, start AI operation with this Activity. When telemetry item is generated in scope of that Activity, it will get proper IDs and will be part of the overarching trace. With that, if telemetry is generated from Log4net trace that was executed in the scope of AI operation context then that telemetry should get right IDs.
Code sample to access correlation from headers:
public class ApplicationInsightsMiddleware : OwinMiddleware
{
// you may create a new TelemetryConfiguration instance, reuse one you already have
// or fetch the instance created by Application Insights SDK.
private readonly TelemetryConfiguration telemetryConfiguration = TelemetryConfiguration.CreateDefault();
private readonly TelemetryClient telemetryClient = new TelemetryClient(telemetryConfiguration);
public ApplicationInsightsMiddleware(OwinMiddleware next) : base(next) {}
public override async Task Invoke(IOwinContext context)
{
// Let's create and start RequestTelemetry.
var requestTelemetry = new RequestTelemetry
{
Name = $"{context.Request.Method} {context.Request.Uri.GetLeftPart(UriPartial.Path)}"
};
// If there is a Request-Id received from the upstream service, set the telemetry context accordingly.
if (context.Request.Headers.ContainsKey("Request-Id"))
{
var requestId = context.Request.Headers.Get("Request-Id");
// Get the operation ID from the Request-Id (if you follow the HTTP Protocol for Correlation).
requestTelemetry.Context.Operation.Id = GetOperationId(requestId);
requestTelemetry.Context.Operation.ParentId = requestId;
}
// StartOperation is a helper method that allows correlation of
// current operations with nested operations/telemetry
// and initializes start time and duration on telemetry items.
var operation = telemetryClient.StartOperation(requestTelemetry);
// Process the request.
try
{
await Next.Invoke(context);
}
catch (Exception e)
{
requestTelemetry.Success = false;
telemetryClient.TrackException(e);
throw;
}
finally
{
// Update status code and success as appropriate.
if (context.Response != null)
{
requestTelemetry.ResponseCode = context.Response.StatusCode.ToString();
requestTelemetry.Success = context.Response.StatusCode >= 200 && context.Response.StatusCode <= 299;
}
else
{
requestTelemetry.Success = false;
}
// Now it's time to stop the operation (and track telemetry).
telemetryClient.StopOperation(operation);
}
}
public static string GetOperationId(string id)
{
// Returns the root ID from the '|' to the first '.' if any.
int rootEnd = id.IndexOf('.');
if (rootEnd < 0)
rootEnd = id.Length;
int rootStart = id[0] == '|' ? 1 : 0;
return id.Substring(rootStart, rootEnd - rootStart);
}
}
Code sample for manual correlated operation tracking in isolation:
async Task BackgroundTask()
{
var operation = telemetryClient.StartOperation<DependencyTelemetry>(taskName);
operation.Telemetry.Type = "Background";
try
{
int progress = 0;
while (progress < 100)
{
// Process the task.
telemetryClient.TrackTrace($"done {progress++}%");
}
// Update status code and success as appropriate.
}
catch (Exception e)
{
telemetryClient.TrackException(e);
// Update status code and success as appropriate.
throw;
}
finally
{
telemetryClient.StopOperation(operation);
}
}
Please note that the most recent version of Application Insights SDK is switching to W3C correlation standard, so header names and expected format would be different as per W3C specification.

Same Azure topic is processed multiple times

We have a job hosted in an azure website, the job reads entries from a topic subscription. Everything works fine when we only have one instance to host the website. Once we scale out to more than one instance we observe the message is processed as many times as instances we have. Each instance points to the same subscription. From what we read, once the item is read, it won't be available for any other process. The duplicated processing is happening inside the same instance, meaning that if we have two instances, the item is processed twice in one of the instances, it is not splitted.
What can be possible be wrong in the way we are doing things?
This is how we proceed to configure the connection to the queue, if the subscription does not exists, it is created:
var serviceBusConfig = new ServiceBusConfiguration
{
ConnectionString = transactionsBusConnectionString
};
config.UseServiceBus(serviceBusConfig);
var allRule1 = new RuleDescription
{
Name = "All",
Filter = new TrueFilter()
};
SetupSubscription(transactionsBusConnectionString,"topic1", "subscription1", allRule1);
private static void SetupSubscription(string busConnectionString, string topicNameKey, string subscriptionNameKey, RuleDescription newRule)
{
var namespaceManager =
NamespaceManager.CreateFromConnectionString(busConnectionString);
var topicName = ConfigurationManager.AppSettings[topicNameKey];
var subscriptionName = ConfigurationManager.AppSettings[subscriptionNameKey];
if (!namespaceManager.SubscriptionExists(topicName, subscriptionName))
{
namespaceManager.CreateSubscription(topicName, subscriptionName);
}
var subscriptionClient = SubscriptionClient.CreateFromConnectionString(busConnectionString, topicName, subscriptionName);
var rules = namespaceManager.GetRules(topicName, subscriptionName);
foreach (var rule in rules)
{
subscriptionClient.RemoveRule(rule.Name);
}
subscriptionClient.AddRule(newRule);
rules = namespaceManager.GetRules(topicName, subscriptionName);
rules.ToString();
}
Example of the code that process the topic item:
public void SendInAppNotification(
[ServiceBusTrigger("%eventsTopicName%", "%SubsInAppNotifications%"), ServiceBusAccount("OutputServiceBus")] Notification message)
{
this.valueCalculator.AddInAppNotification(message);
}
This method is inside a Function static class, I'm using azure web job sdk.
Whenever the azure web site is scaled to more than one instance, all the instances share the same configuration.
It sounds like you're creating a new subscription each time your new instance runs, rather than hooking into an existing one. Topics are designed to allow multiple subscribers to attach in that way as well - usually though each subscriber has a different purpose, so they each see a copy of the message.
I cant verify this from your code snippet but that's my guess - are the config files identical? You should add some trace output to see if your processes are calling CreateSubscription() each time they run.
I think I can access the message id, I'm using azure web job sdk but I think I can find a way to get it. Let me check it and will let you know.

Azure Autoscale Restarts Running Instances

I've been using Autoscale to shift between 2 and 1 instances of a cloud service in a bid to reduce costs. This mostly works except that from time to time (not sure what the pattern seems to be here), the act of scaling up (1->2) causes both instances to recycle, generating a service outage for users.
Assuming nothing fancy is going on in RoleEntry in response to topology changes, why would scaling from 1->2 restart the already running instance?
Additional notes:
It's clear both instances are recycling by looking at the Instances
tab in Management Portal. Outage can also be confirmed by hitting the
public site.
It doesn't happen consistently but I'm not sure what the pattern is. It feels like when the 1-instance configuration has been running for multiple days, attempts to scale up recycle both. But if the 1-instance configuration has only been running for a few hours, you can scale up and down without outages.
The first instance always comes back much faster than the 2nd instance being introduced.
This has always been this way. When you have 1 server running and you go to 2+, the initial server is restarted. In order to have a full SLA, you need to have 2+ servers at all time.
Nariman, see my comment on Brent's post for some information about what is happening. You should be able to resolve this with the following code:
public class WebRole : RoleEntryPoint
{
public override bool OnStart()
{
// For information on handling configuration changes
// see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.
IPHostEntry ipEntry = Dns.GetHostEntry(Dns.GetHostName());
string ip = null;
foreach (IPAddress ipaddress in ipEntry.AddressList)
{
if (ipaddress.AddressFamily.ToString() == "InterNetwork")
{
ip = ipaddress.ToString();
}
}
string urlToPing = "http://" + ip;
HttpWebRequest req = HttpWebRequest.Create(urlToPing) as HttpWebRequest;
WebResponse resp = req.GetResponse();
return base.OnStart();
}
}
You should be able to control this behavior. In the roleEntrypoint, there's an event you can trap for, RoleEnvironmentChanging.
A shell of some code to put into your solution will look like...
RoleEnvironment.Changing += RoleEnvironmentChanging;
private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
{
}
RoleEnvironment.Changed += RoleEnvironmentChanged;
private void RoleEnvironmentChanged(object sender, RoleEnvironmentChangedEventArgs e)
{
}
Then, inside the RoleEnvironmentChanged method, we can detect what the change is and tell Azure if we want to restart or not.
if ((e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange)))
{
e.Cancel = true; // don't recycle the role
}

How to detect if the environment is staging or production in azure hosted service worker role?

I have a worker role in my hosted service.
The worker is sending e-mail daily bases.
But in the hosted service, there are 2 environment, Staging and Production.
So my worker role sends e-mail 2 times everyday.
I'd like to know how to detect if the worker is in stagning or production.
Thanks in advance.
As per my question here, you'll see that there is no fast way of doing this. Also, unless you really know what you are doing, I strongly suggest you not do this.
However, if you want to, you can use a really nice library (Azure Service Management via C#) although we did have some trouble with WCF using it.
Here's a quick sample on how to do it (note, you need to include the management certificate as a resource in your code & deploy it to Azure):
private static bool IsStaging()
{
try
{
if (!CloudEnvironment.IsAvailable)
return false;
const string certName = "AzureManagement.pfx";
const string password = "Pa$$w0rd";
// load certificate
var manifestResourceStream = typeof(ProjectContext).Assembly.GetManifestResourceStream(certName);
if (manifestResourceStream == null)
{
// should we panic?
return true;
}
var bytes = new byte[manifestResourceStream.Length];
manifestResourceStream.Read(bytes, 0, bytes.Length);
var cert = new X509Certificate2(bytes, password);
var serviceManagementChannel = Microsoft.Toolkit.WindowsAzure.ServiceManagement.ServiceManagementHelper.
CreateServiceManagementChannel("WindowsAzureServiceManagement", cert);
using (new OperationContextScope((IContextChannel)serviceManagementChannel))
{
var hostedServices =
serviceManagementChannel.ListHostedServices(WellKnownConfiguration.General.SubscriptionId);
// because we don't know the name of the hosted service, we'll do something really wasteful
// and iterate
foreach (var hostedService in hostedServices)
{
var ad =
serviceManagementChannel.GetHostedServiceWithDetails(
WellKnownConfiguration.General.SubscriptionId,
hostedService.ServiceName, true);
var deployment =
ad.Deployments.Where(
x => x.PrivateID == Zebra.Framework.Azure.CloudEnvironment.CurrentRoleInstanceId).
FirstOrDefault
();
if (deployment != null)
{
return deployment.DeploymentSlot.ToLower().Equals("staging");
}
}
}
return false;
}
catch (Exception e)
{
// if something went wrong, let's not panic
TraceManager.AzureFrameworkTraceSource.TraceData(System.Diagnostics.TraceEventType.Error, "Exception", e);
return false;
}
}
If you're using an SQL server (either Azure SQL or SQL Server hosted in VM), you could stop the Staging worker role from doing work by only allowing the public IP of the Production instance access to the database server.

Resources