Azure Service Fabric Application stuck in Deleting state

I had a deployment on my Service Fabric cluster go wrong; I attempted to delete an application and, for some reason, the deletion never completed. Now the application is stuck in the Deleting state while all my deployments remain, and I can't delete or upgrade the application since its status is "deleting".
Is there a way to update the status of the application so I can then proceed to delete it (for real this time)?

You'll most likely need to use PowerShell and execute an application delete that way; I had this issue as well when starting out with Service Fabric.
For instructions on how to connect to the cluster using PowerShell, see the Service Fabric documentation.
# Connect to the cluster first; the endpoint below is a placeholder.
Connect-ServiceFabricCluster -ConnectionEndpoint "mycluster:19000"

# Force-remove every deployed replica of the stuck application on every node.
$nodes = Get-ServiceFabricNode
foreach ($node in $nodes)
{
    $replicas = Get-ServiceFabricDeployedReplica -NodeName $node.NodeName -ApplicationName "fabric:/AppNameHere"
    foreach ($replica in $replicas)
    {
        Remove-ServiceFabricReplica -ForceRemove -NodeName $node.NodeName -PartitionId $replica.PartitionId -ReplicaOrInstanceId $replica.ReplicaOrInstanceId
    }
}
Deletions that get stuck, in my experience, are often due to the application not honoring cancellation tokens. What kind of application did you deploy?
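For reference, honoring the token usually just means observing it in your RunAsync loop. A minimal sketch of a stateless service that does this (the names are illustrative, not from the question above):

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class MyStatelessService : StatelessService
{
    public MyStatelessService(StatelessServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        while (true)
        {
            // Throws when Service Fabric asks the service to stop (for example
            // during an application delete), so the runtime can tear the replica
            // down instead of leaving the application stuck in "Deleting".
            cancellationToken.ThrowIfCancellationRequested();

            // ... one unit of work here ...

            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
}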

Related

Triggered Azure Web Job goes into Aborted State

I have 2 Azure WebJobs: one is triggered and the other is continuous. Whenever there is a change in the server configuration or the app configuration of the WebJob, the triggered WebJob goes into the Aborted state. This happens because the change forces the application to restart, and somehow it is not able to restart properly. Need help on this.
Here is the source code that sets the status:
if (triggeredJobStatus.Status == JobStatus.Running)
{
    if (isLatest)
    {
        // If it is the latest run, make sure it's actually running
        string triggeredJobDataPath = Path.Combine(JobsDataPath, jobName);
        LockFile triggeredJobRunLockFile = TriggeredJobRunner.BuildTriggeredJobRunnerLockFile(triggeredJobDataPath, TraceFactory);
        if (!triggeredJobRunLockFile.IsHeld)
        {
            triggeredJobStatus.Status = JobStatus.Aborted;
        }
    }
    else
    {
        // If it's not the latest run it cannot be running
        triggeredJobStatus.Status = JobStatus.Aborted;
    }
}
Additionally, have you checked the logs for the WebJob and all of its triggered methods? Read through them and check whether anything abnormal is going on.
As the code above shows, a job's status is reported as Aborted if its status file says it is running but it is not actually running (the run lock file is not held).
Additional Reference:
https://github.com/MicrosoftDocs/azure-docs/issues/19686
Hope it helps.

Azure: what could be the cause of the error "Unable to edit or replace deployment"?

When I recreated my VM, I got the following error:
Problem occurred during request to Azure services. Cloud provider details: Unable to edit or replace deployment 'VM-Name': previous deployment from '8/20/2019 6:20:33 AM' is still active (expiration time is '8/27/2019 5:17:41 AM'). Please see https://aka.ms/arm-deploy for usage details.
Please help me understand: what could be the cause of this error?
UPDATED:
This deployment had not been started previously. Prior to it, I received several errors during creation, one at a time:
Azure is not available now. Please Try again later
and then I got the error related to:
Unable to edit or replace deployment
My assumption is this; tell me, am I right or not? I launched the image, then after some time I recreated it. Creation began, but at that moment the connection with Azure was lost. Then, when the connection was restored, we tried to run a deployment that had not been removed in the previous attempt (because there was no connection with Azure). As a result, we got this error. Does this theory make sense?
It's exactly what it says: there is another deployment with the same name in progress at this time. Either change the name of the deployment you are trying to queue, or wait for the other deployment to finish/fail.
This can also occur if you use Bicep templates for your ARM deployment and multiple modules or resources in the template have the same name:
module fooModule '../modules/foo.bicep' = {
    name: 'foo'
}
module barModule '../modules/bar.bicep' = {
    name: 'foo' // same deployment name as fooModule above, which triggers the error
}
I got the same error. Initially the pipeline was working, but when it was retriggered it took more time, so I canceled the deployment and did a fresh rerun, which hit this error. I think I need to wait until that canceled deployment fails.

Serilog not working in Service Fabric

I am using Serilog to write to a file, trying to get more information about an error that is occurring in my production cluster.
In my local dev cluster the log files are created fine, but they are not created on the VMs of my production cluster. I think this may be security related.
Has anyone ever had this?
My production cluster has 5 nodes, with a Windows 2016 VM on each.
Even stranger, this works on a single-node cluster in Azure.
public static ILogger ConfigureLogging(string appName, string appVersion)
{
    AppDomain.CurrentDomain.ProcessExit += (sender, args) => Log.CloseAndFlush();
    var configPackage = FabricRuntime.GetActivationContext().GetConfigurationPackageObject("Config");
    var environmentName = configPackage.GetSetting("appSettings", "Inspired.TradingPlatform:EnvironmentName");
    var loggerConfiguration = new LoggerConfiguration()
        .WriteTo.File(@"D:\SvcFab\applog-" + appName + ".txt", shared: true, rollingInterval: RollingInterval.Day)
        .Enrich.WithProperty("AppName", appName)
        .Enrich.WithProperty("AppVersion", appVersion)
        .Enrich.WithProperty("EnvName", environmentName);
    var log = loggerConfiguration.CreateLogger();
    log.Information("Starting {AppName} v{AppVersion} application", appName, appVersion);
    return Log.Logger = log;
}
Paul
I wouldn't recommend logging into local files in Service Fabric, since your service may be moved to another VM at any time and you won't have access to those files. Consider using another sink that writes to an external system (a database, a message bus, or a logging service like Loggly).
It is likely a permission issue: your service might be trying to log to a folder where it does not have write permission.
By default, your services run under the same account as the Fabric.exe process, which runs as NetworkService; you can find more information about this in the Service Fabric security documentation.
I would not recommend this approach, for many reasons; a few of them are:
Your services might be moved around the cluster so your files will be incomplete
You have to log on multiple machines to find the logs
The node might be gone with files (Scale up + Down, Failure, Disk error)
Multiple instances on same node trying to access the same file
and so on...
On Service Fabric, the recommended way is to use EventSource (or ETW) + EventFlow + Application Insights. They run smoothly together and bring you many features.
If you want to stay on Serilog, I would recommend Serilog + Application Insights instead; it will give you more flexibility in your monitoring. Take a look at the Application Insights sink for Serilog.
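A minimal sketch of wiring that up, assuming the Serilog.Sinks.ApplicationInsights package (the instrumentation key and property values are placeholders):

using Microsoft.ApplicationInsights.Extensibility;
using Serilog;

// Placeholder key; in a real cluster, read it from a config package.
var telemetryConfig = TelemetryConfiguration.CreateDefault();
telemetryConfig.InstrumentationKey = "00000000-0000-0000-0000-000000000000";

Log.Logger = new LoggerConfiguration()
    .Enrich.WithProperty("AppName", "MyApp")
    .WriteTo.ApplicationInsights(telemetryConfig, TelemetryConverter.Traces)
    .CreateLogger();

Log.Information("Logging to Application Insights instead of local files");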
This was actually user error! I was connecting to a different cluster of VMs than the one my service fabric was connected to! Whoops!

TransactionScope in azure webjobs

I have a WebJob running in Azure that is processing data sent to an Event Hub.
In the event processor I want to save information to a SQL Server database. To make sure that everything is inserted correctly, I want to use transactions.
When I run the code locally everything works perfectly, but when running in Azure nothing happens and no error is thrown.
From what I have read, it should be possible to use TransactionScope, but the example code below is not working.
using (TransactionScope scope = new TransactionScope())
{
    dataImportDao.StartProcessingMessage(mappedMessage);
    scope.Complete();
}
Any suggestions on how to solve this, or whether I should go with a different approach, are much appreciated.
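One thing worth checking: TransactionScope does not flow across await points by default, so if the event processor method is async, the ambient transaction can misbehave without a visible error. A sketch of the async-flow variant (this is an assumption about the cause, not a confirmed fix):

using System.Transactions;

// TransactionScopeAsyncFlowOption.Enabled (available since .NET 4.5.1) lets the
// ambient transaction flow across awaits inside the scope.
using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
{
    dataImportDao.StartProcessingMessage(mappedMessage);
    scope.Complete(); // commits only if no exception was thrown above
}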

Azure Document Db Worker Role

I am having problems getting the Microsoft.Azure.Documents library to initialize the client in an Azure worker role. I'm using NuGet package 0.9.1-preview.
I have mimicked what was done in the Azure DocumentDB example.
When running locally through the emulator I can connect fine with the documentdb and it runs as expected. When running in the worker role, I am getting a series of NullReferenceException and then ArgumentNullException.
The call stacks show that the NullReferenceExceptions start in this call, at the new DocumentClient:
var endpoint = "myendpoint"; // placeholder values
var authKey = "myauthkey";
var endpointUri = new Uri(endpoint);
DocumentClient client = new DocumentClient(endpointUri, authKey);
Nothing changes between running it locally vs. on the worker role other than the environment (obviously).
Has anyone gotten DocumentDb to work on a worker role or does anyone have an idea why it would be throwing null reference exceptions? The parameters getting passed into the DocumentClient() are filled.
UPDATE:
I tried to rewrite it more generically, which at least let the worker role run and let me attach a debugger. It is throwing the error on the new DocumentClient; it seems something security-related being passed is null. Both of the required initialization parameters are non-null. Is there a security setting I need to change for my worker role to be able to connect to my DocumentDB? (It still works fine locally.)
UPDATE 2:
I can get the instance to run in Release mode, but not Debug mode, so it must be some security or storage setting that is misconfigured, I guess?
It seems I'm getting System.Security.SecurityExceptions, but only when using DocumentDB; queues do not give me that error. All call stacks for that error seem to involve System.Diagnostics.EventLog. The very first exception I see in the IntelliTrace summary is System.Threading.WaitHandleCannotBeOpenedException.
More info: in the IntelliTrace summary exception data, top is the earliest and bottom is the latest, so the System.Security.SecurityException happens first, then the NullReference.
The solution for me, to get rid of the security exception and the null reference exception, was to disable IntelliTrace. Once I did that, I was able to deploy, attach the debugger, and see everything working.
Not sure what sits between IntelliTrace and the DocumentClient, but hopefully it's just an issue with the NuGet package that will be fixed in the next iteration.
Unable to repro.
I created a new Worker Role, single instance, and added the authkey & endpoint config to the cscfg.
Created a private static DocumentClient at the WorkerRole class level.
Initialized the DocumentClient in OnStart and disposed it in OnStop.
In RunAsync, inside the loop, executed a query. Works as expected.
Tested in the emulator: works.
Deployed as Release to the Production slot: works.
Deployed as Debug to Staging with Remote Debug: works.
Attached VS to the cloud service; breakpoint hit inside the loop.
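For reference, a sketch of that arrangement, assuming the early DocumentDB SDK and placeholder cscfg setting names:

using System;
using Microsoft.Azure.Documents.Client;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    // One shared client per role instance, as described in the steps above.
    private static DocumentClient client;

    public override bool OnStart()
    {
        // "DocDbEndpoint" and "DocDbAuthKey" are placeholder setting names.
        var endpoint = RoleEnvironment.GetConfigurationSettingValue("DocDbEndpoint");
        var authKey = RoleEnvironment.GetConfigurationSettingValue("DocDbAuthKey");
        client = new DocumentClient(new Uri(endpoint), authKey);
        return base.OnStart();
    }

    public override void OnStop()
    {
        client.Dispose();
        base.OnStop();
    }
}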
Working solution : http://ryancrawcour.blob.core.windows.net/samples/AzureCloudService1.zip
