Azure Service Fabric Changing Setting After Deployment

We're testing out Service Fabric and running into some issues.
First, the A1v2 VM type comes with 10 GB of disk, but the log file takes up over 9 GB almost immediately, so deployments then fail with an out-of-disk-space exception. I've discovered that I need to set SharedLogSizeInMB to a smaller value, as below:
"fabricSettings": [
  {
    "name": "KtlLogger",
    "parameters": [
      {
        "name": "SharedLogSizeInMB",
        "value": "256"
      }
    ]
  }
],
The problem now is that I am not sure how to apply this change. I can't find a way to do it in the portal. When I download the PowerShell setup scripts that the portal offers during the Service Fabric setup process and run them to create a new Service Fabric cluster, the run gets as far as deploying and then fails.
So my questions are:
1) How should I adjust this setting? I assume it can only be done via an ARM script.
2) Can these scripts only be used to create a new Service Fabric cluster, or can they also be run just to change a setting?
3) Should the vanilla script I export from Azure just work? The error messages I can find are very generic and unhelpful; there just seems to be an exception thrown in each VM when it tries to create Service Fabric. I am using pretty much all the default settings, nothing special.
Thanks,
Oliver
EDIT
The files mentioned in comment 2 just contain this, over and over:
2017/05/02-04:22:00.388,Info,4864,ImageStoreClient.ManagedFileLock,Obtained writer lock for D:\SvcFab\lock
2017/05/02-04:22:00.739,Info,4864,FabricDeployer.FabricDeployer,Executing Configure /fabricBinRoot:C:\Program Files\Microsoft Service Fabric\bin /fabricDataRoot:D:\SvcFab /fabricLogRoot:D:\SvcFab\Log /cm:C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.0.0.35\TempClusterManifest.xml /oldClusterManifestString: /im:C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.0.0.35\InfrastructureManifest.xml /instanceId: /targetVersion: /nodeName: /nodeTypeName: /runAsType: /runAsAccountName: /runAsPassword: /serviceStartupType: /output: /currentVersion: /error: /bootstrapMSIPath: /machineName: /fabricPackageRoot: /jsonClusterConfigLocation: /enableCircularTraceSession:False
2017/05/02-04:22:02.452,Info,4864,FabricDeployer.FabricDeployer,Running operation System.Fabric.FabricDeployer.ConfigureOperation
2017/05/02-04:22:02.576,Info,4864,FabricDeployer.FabricDeployer,Creating FabricDataRoot D:\SvcFab, if it doesn't exist on machine
2017/05/02-04:22:02.576,Info,4864,FabricDeployer.FabricDeployer,Creating FabricLogRoot D:\SvcFab\Log, if it doesn't exist on machine
2017/05/02-04:22:04.907,Info,4864,ImageStoreClient.ManagedFileLock,Released writer lock on D:\SvcFab\lock
My event logs show this over and over:
Failed starting service, Error: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Configure node failed with code -1
at Microsoft.Azure.ServiceFabric.Extension.Core.NodeBootstrapAgent.StartFabricHostService(Boolean isBootstrapping)
ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Configure node failed with code -1
at Microsoft.Azure.ServiceFabric.Extension.Core.NodeBootstrapAgent.StartFabricHostService(Boolean isBootstrapping)
at Microsoft.Azure.ServiceFabric.Extension.Core.NodeBootstrapAgent.d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceFabric.Extension.Core.NodeBootstrapAgent.d__0.MoveNext()
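For the KtlLogger fragment shown earlier in this question, one possible way to get it into an exported ARM template is to patch the template JSON before redeploying it. The following is only a minimal sketch in Python. It assumes the exported template contains a resource of type Microsoft.ServiceFabric/clusters whose properties hold the fabricSettings array; the helper name set_fabric_setting and the trimmed stand-in template are hypothetical, and the sketch does not perform the deployment itself.

```python
import copy
import json

CLUSTER_TYPE = "Microsoft.ServiceFabric/clusters"


def set_fabric_setting(template, section, name, value):
    """Return a copy of `template` with one fabricSettings parameter upserted."""
    patched = copy.deepcopy(template)
    for resource in patched.get("resources", []):
        if resource.get("type") != CLUSTER_TYPE:
            continue
        properties = resource.setdefault("properties", {})
        settings = properties.setdefault("fabricSettings", [])
        # Find the named section (e.g. "KtlLogger"), creating it if absent.
        entry = next((s for s in settings if s.get("name") == section), None)
        if entry is None:
            entry = {"name": section, "parameters": []}
            settings.append(entry)
        params = entry.setdefault("parameters", [])
        # Overwrite an existing parameter, otherwise append a new one.
        for param in params:
            if param.get("name") == name:
                param["value"] = str(value)
                break
        else:
            params.append({"name": name, "value": str(value)})
    return patched


# Hypothetical, heavily trimmed stand-in for an exported template.
template = {"resources": [{"type": CLUSTER_TYPE, "properties": {}}]}
patched = set_fabric_setting(template, "KtlLogger", "SharedLogSizeInMB", 256)
print(json.dumps(patched["resources"][0]["properties"], indent=2))
```

The function works on a copy, so the original template can be kept for comparison, and calling it twice with the same arguments leaves a single parameter entry rather than a duplicate.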

Related

Azure Function Node.js Failed to start a new language worker for runtime: node

I unexpectedly began receiving a 502 Bad Gateway error for all of my HTTP-triggered functions in an Azure Function app that has been running successfully for the past few months.
After digging into the kudu logs, I found the following -
Failed to start a new language worker for runtime: node.
Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException : Result: Failure
Exception: Worker was unable to load entry point "index.js": Found zero files matching the supplied pattern
Stack: Error: Worker was unable to load entry point "index.js": Found zero files matching the supplied pattern
at C:\Program Files (x86)\SiteExtensions\Functions\4.12.0\workers\node\dist\src\worker-bundle.js:2:44797
at Generator.next (<anonymous>)
at o (C:\Program Files (x86)\SiteExtensions\Functions\4.12.0\workers\node\dist\src\worker-bundle.js:2:44124)
at processTicksAndRejections (internal/process/task_queues.js:95:5)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Grpc.GrpcWorkerChannel.StartWorkerProcessAsync(CancellationToken cancellationToken) at /_/src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs : 271
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 154
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 146
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 137
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.<>c__DisplayClass56_0.<StartWorkerProcesses>b__0(??) at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 229
I have not changed the file path settings and I was able to find the index.js source file inside of /dist in Kudu as is specified in my function.json binding.
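A check like the one described above can also be scripted, so that every function folder is verified at once rather than by browsing Kudu. This is a rough sketch in Python, assuming each function lives in its own folder with a function.json whose optional scriptFile property is a path relative to that folder, and that index.js in the function folder is used when scriptFile is absent. The helper name missing_entry_points and the folder names HttpOk and HttpBroken are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def missing_entry_points(app_root):
    """Return (function name, resolved script path) for entry points not on disk."""
    missing = []
    for config_path in sorted(Path(app_root).glob("*/function.json")):
        function_dir = config_path.parent
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # Assumption: without "scriptFile", the worker looks for index.js
        # in the function's own folder.
        script = config.get("scriptFile", "index.js")
        resolved = (function_dir / script).resolve()
        if not resolved.is_file():
            missing.append((function_dir.name, str(resolved)))
    return missing


# Build a throwaway app layout: HttpOk has built output, HttpBroken does not.
with tempfile.TemporaryDirectory() as root:
    root_path = Path(root)
    (root_path / "dist" / "HttpOk").mkdir(parents=True)
    (root_path / "dist" / "HttpOk" / "index.js").write_text("// built output\n")
    for name in ("HttpOk", "HttpBroken"):
        (root_path / name).mkdir()
        (root_path / name / "function.json").write_text(
            json.dumps({"scriptFile": f"../dist/{name}/index.js", "bindings": []})
        )
    report = missing_entry_points(root_path)

print(report)
```

Run against a copy of the deployed wwwroot folder, an empty result would suggest the paths in function.json resolve correctly and the cause lies elsewhere.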
My application configuration settings have Functions extension version set to ~4 and runtime set to Node ~16.
[Screenshot: Function App configuration settings for the Functions extension version and runtime]
In my deployment pipeline, the logs state that the app is being deployed with Node version 16.17.1, and Kudu logs further state that the specific version of the Functions extension tools being used is 4.12.0.
I have tried the following: restarting my application; updating my app configuration to explicitly set the Functions extension version to 4.12.1 (the most recent release); setting my Node version to 14; changing my App Service plan from Consumption to Premium, in case the error was somehow related to cold starts; and explicitly setting the entry point of my HTTP-triggered functions in my function.json file. I have also updated my host.json file so that the extension bundle uses a 3.x version range:
{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[3.3.0, 4.0.0)"
  }
}
I have seen this problem referenced on Stack Overflow, GitHub and Microsoft support forums related to .NET projects but have not been able to use these resources to resolve my issue.
I don't know if you still have this problem. I have had the same problem since 0:00 UTC last night on one of my subscriptions, and I just found a workaround: in despair, I downgraded my Function Apps from "~4" to "~3", and the 502s and exceptions disappeared!
It is definitely not a long-term solution, but I have a working production environment again and my customers can access their app.
I was facing the same issue, and solved it by creating a new Function App and resolving the problems flagged when pre-validating the migration from version 3 to 4.
So my recommendation is to go to your Function App > Diagnose and Solve Problems menu, and search for "Functions 4.x Pre-Upgrade Validator".
There you can check for problems that may need to be solved before you can upgrade correctly to the newest ~4 version.
In my situation it was the name of the Function App, which was longer than 32 characters.
You can find info regarding this topic in the official Azure Function Migration Guide.
Regards!
I had this issue with a Function App hosted on Windows after upgrading the Node version to 18.x.x:
WEBSITE_NODE_DEFAULT_VERSION : ~18
Our runtime was 3.x, but it looks like Node 18 is only supported by runtime 4.x GA (Node.js 18, 16, & 14).
This meant we had to update the host.json accordingly to support Node 18, and the Function App configuration was updated as well:
FUNCTIONS_EXTENSION_VERSION : ~4
The final step was to restart the Function App.
After doing the above, the "Failed to start a new language worker for runtime: node" issue was solved.

Azure: what could be the cause of the error "Unable to edit or replace deployment"?

When I recreated my VM, I got the following error:
Problem occurred during request to Azure services. Cloud provider details: Unable to edit or replace deployment 'VM-Name': previous deployment from '8/20/2019 6:20:33 AM' is still active (expiration time is '8/27/2019 5:17:41 AM'). Please see https://aka.ms/arm-deploy for usage details.
Please help me understand.
What could be the cause of this error?
UPDATED:
This deployment has not been started previously.
Prior to this, errors were received during creation:
Azure is not available now. Please Try again later
There were several such errors, one after another, and then I got the error:
Unable to edit or replace deployment
Here are my assumptions about this.
Tell me, am I right or not?
I launched the image, then after some time I recreated it.
Creation began, but at that moment the connection with Azure was lost.
Then, when the connection was restored, we tried to run a deployment whose previous attempt had not been removed (because there was no connection with Azure).
As a result, we got this error.
Does this theory make sense?
Exactly what it says: there is another deployment with the same name in progress at this time. Either change the name of the deployment you are trying to queue, or wait for the other deployment to finish or fail.
This can also occur if you use Bicep templates for your ARM deployment and multiple modules or resources in the template have the same name:
module fooModule '../modules/foo.bicep' = {
  name: 'foo'
}
module barModule '../modules/bar.bicep' = {
  name: 'foo'
}
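Duplicates like the one above are easy to miss in a larger template, so a quick scan can help. Here is a minimal sketch in Python that looks for literal module names used more than once. It relies on a simple regular expression rather than a real Bicep parser, so it only handles modules whose name property is a plain quoted string appearing before any nested closing brace; the function name duplicate_module_names is hypothetical.

```python
import re
from collections import defaultdict

# Matches `module <symbol> '<path>' = { ... name: '<deployment name>'`.
MODULE_NAME = re.compile(
    r"\bmodule\s+(\w+)\s+'[^']*'\s*=\s*\{[^}]*?\bname:\s*'([^']*)'",
    re.DOTALL,
)


def duplicate_module_names(bicep_source):
    """Map each literal module name used more than once to its module symbols."""
    symbols_by_name = defaultdict(list)
    for symbol, name in MODULE_NAME.findall(bicep_source):
        symbols_by_name[name].append(symbol)
    return {name: syms for name, syms in symbols_by_name.items() if len(syms) > 1}


source = """
module fooModule '../modules/foo.bicep' = {
  name: 'foo'
}
module barModule '../modules/bar.bicep' = {
  name: 'foo'
}
"""
duplicates = duplicate_module_names(source)
print(duplicates)
```

For the two modules shown, the scan reports that the name foo is shared by fooModule and barModule. Names built with string interpolation are compared as written, so they would need a closer look by hand.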
I got the same error. Initially the pipeline was working, but when it was retriggered it took longer, so I cancelled the deployment and started a fresh run, which then hit this error. I think I need to wait until that cancelled deployment has failed.

how to debug error 500 Internal Server Error on an Azure App?

I got a "500 Internal Server Error - An error occurred while starting the application" after deploying my application: https://iidapp.azurewebsites.net/
I keep finding the following error message, but I am unable to find any information on the MSDN websites describing how to specify the SAS URL.
INFO: The app was working for a long period and I didn't have to set the SAS URL; I wonder why Azure is suddenly generating exceptions.
INFO2: The app works perfectly on my local machine.
Any help is welcome, as I couldn't find a solution by reading the related topics on Stack Overflow.
2017-04-05T18:51:32
System.ApplicationException: The trace listener AzureBlobTraceListener is disabled. ---> System.InvalidOperationException: The SAS URL for the cloud storage account is not specified. Use the environment variable 'DIAGNOSTICS_AZUREBLOBCONTAINERSASURL' to define it.
at Microsoft.WindowsAzure.WebSites.Diagnostics.AzureBlobTraceListener.RefreshConfig()
--- End of inner exception stack trace ---
An error was triggered by W3SVC-WP : {app name}02000780
I found out on eventid that the code 02000780 meant that a file was missing.
I eventually found out that it was possible to log further information by enabling the stdout log inside web.config. See "When a .NET Core Azure App Service won’t start: 502.5 Process Failure".
I opened the debug console (https://{app_id}.scm.azurewebsites.net/DebugConsole) and discovered that a directory could not be found.
Voilà! I corrected the code and the app is up and running!

Service Fabric Application PackageDeployment Operation Time Out exception

I have a Service Fabric cluster with 3 nodes, one on each of 3 interconnected systems, and I am able to connect to each of the nodes. The nodes run on Windows Server VMs that are hosted on-premises.
I am trying to manually deploy my package to the cluster, but I get an operation timeout exception. I used the commands below for the deployment.
Service Fabric PowerShell commands:
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath 'c:\sample\etc' -ApplicationPackagePathInImageStore 'abc.app.portaltype'
After I execute the above command, it runs for 2-3 minutes and then throws an operation timeout exception. My package is almost 250 MB and contains approximately 15,000 files. After I explicitly passed the extra parameter -TimeoutSec 600 (10 minutes) to the above command, it executed successfully and the package was copied to the Service Fabric image store.
Register-ServiceFabricApplicationType -ApplicationPathInImageStore 'abc.app.portaltype'
After the Copy-ServiceFabricApplicationPackage command, I executed the above Register-ServiceFabricApplicationType command to register my application type in the cluster, but it also throws an operation timeout exception. I explicitly passed the extra parameter -TimeoutSec 600 (10 minutes) to this command as well, but no luck: it throws the same operation timeout exception.
To check whether the timeout is caused by the number of files in the package, I created a simple empty Service Fabric ASP.NET Core app, packaged it, and deployed it to the same server using the commands above. It deployed within a fraction of a second and works smoothly.
Does anybody have any idea how to overcome this Service Fabric operation timeout issue?
How can the operation timeout be handled if the package contains a large number of files?
Any help/suggestion would be very appreciated.
Thanks,
If this is taking longer than the 10-minute default maximum, it's probably due to one of the following issues:
Large application packages (>100s of MB)
Slow network connections
A large number of files within the application package (>1000s).
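To see whether a given package falls into the first or third category, its size and file count can be measured before copying it to the image store. Below is a minimal sketch in Python; the thresholds are only a rough reading of the ">100s of MB" and ">1000s" figures in the list above, not documented Service Fabric limits, and the names package_stats and likely_slow are hypothetical.

```python
import os
import tempfile

# Rough thresholds read off the list above; illustrative assumptions only,
# not documented Service Fabric limits.
SIZE_WARN_BYTES = 100 * 1024 * 1024
FILE_COUNT_WARN = 1000


def package_stats(package_path):
    """Return (file count, total size in bytes) for a package directory."""
    file_count = 0
    total_bytes = 0
    for folder, _subdirs, files in os.walk(package_path):
        for file_name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(folder, file_name))
    return file_count, total_bytes


def likely_slow(file_count, total_bytes):
    """True when either rough threshold is exceeded."""
    return file_count > FILE_COUNT_WARN or total_bytes > SIZE_WARN_BYTES


# Tiny throwaway package standing in for a real application package folder.
with tempfile.TemporaryDirectory() as package_dir:
    os.makedirs(os.path.join(package_dir, "Code"))
    for index in range(3):
        file_path = os.path.join(package_dir, "Code", f"file{index}.bin")
        with open(file_path, "wb") as handle:
            handle.write(b"x" * 10)
    stats = package_stats(package_dir)

print(stats, likely_slow(*stats))
```

By these rough thresholds, the package described in the question (almost 250 MB, about 15,000 files) would be flagged on both counts.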
The following workarounds should help you.
Add the following settings to your cluster config:
"fabricSettings": [
  {
    "name": "NamingService",
    "parameters": [
      {
        "name": "MaxOperationTimeout",
        "value": "3600"
      }
    ]
  }
]
Also add the following section (it goes in the same fabricSettings array):
"fabricSettings": [
  {
    "name": "EseStore",
    "parameters": [
      {
        "name": "MaxCursors",
        "value": "32768"
      }
    ]
  }
]
There are a couple of additional features that are currently rolling out. For these to be present and functional, you need to be sure that the client is at least 2.4.28 and the runtime of your cluster is at least 5.4.157. If you’re staying up to date, these should already be present in your environment.
For Register-ServiceFabricApplicationType you can specify the -Async flag, which handles the registration asynchronously, so the timeout only needs to cover the time necessary to send the command, not to process the application package. You can then query the status of the registration with Get-ServiceFabricApplicationType. Release 5.5 fixes some issues with these commands, so if they aren't working for you, you'll have to wait for that release to reach your environment.

Azure Service Fabric Application stuck in Deleting state

I had a deployment on my Service Fabric cluster go wrong. I attempted to delete an application and, for some reason, the deletion never seemed to complete; now the application is stuck in the deleting state, while all my deployments remain. I can't delete or upgrade the application, since I get a status of "deleting".
Is there a way to update the status of the application so I can then proceed to delete it (for real) this time?
You'll most likely need to use PowerShell and execute an application delete that way. I had this issue as well when starting out with Service Fabric.
First connect to the cluster using PowerShell (Connect-ServiceFabricCluster), then run:
# Walk every node in the cluster.
$nodes = Get-ServiceFabricNode
foreach ($node in $nodes)
{
    # Replicas of the stuck application that are deployed on this node.
    $replicas = Get-ServiceFabricDeployedReplica -NodeName $node.NodeName -ApplicationName "fabric:/AppNameHere"
    foreach ($replica in $replicas)
    {
        # Force removal, bypassing the replica's graceful shutdown.
        Remove-ServiceFabricReplica -ForceRemove -NodeName $node.NodeName -PartitionId $replica.PartitionId -ReplicaOrInstanceId $replica.ReplicaOrInstanceId
    }
}
Deletions that get stuck, in my experience, are often due to the application not honoring cancellation tokens. What kind of application did you deploy?
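To illustrate what honoring cancellation means, here is a small language-neutral sketch in Python. It is only an analogy: Service Fabric Reliable Services receive a .NET CancellationToken in RunAsync, whereas this example uses a threading.Event as a stand-in for the token, and the function name run_service is made up. The point is that the work loop checks for cancellation regularly and returns promptly once it is requested.

```python
import threading
import time


def run_service(cancel_requested, work_interval=0.01):
    """Loop until cancellation is requested, then return promptly."""
    iterations = 0
    while not cancel_requested.is_set():
        iterations += 1  # stand-in for one unit of real work
        # Wait on the event instead of sleeping blindly, so a cancel
        # request interrupts the pause immediately.
        cancel_requested.wait(work_interval)
    return iterations


cancel_requested = threading.Event()
worker = threading.Thread(target=run_service, args=(cancel_requested,))
worker.start()

time.sleep(0.05)          # let the worker run for a moment
cancel_requested.set()    # the equivalent of the runtime signalling cancellation
worker.join(timeout=1.0)

stopped_cleanly = not worker.is_alive()
print("worker stopped after cancel:", stopped_cleanly)
```

A loop that never checks the event (or, in a Reliable Service, never observes the token) would keep running after the cancel request, which is the kind of behavior that can leave a delete waiting indefinitely.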
