Upgrade failed Azure AKS from 1.8.1 to 1.8.6 - azure

I'm trying to upgrade my Azure AKS cluster from version 1.8.1 to 1.8.6. Running az aks upgrade --name myaks --resource-group myresourcegr --kubernetes-version 1.8.6 fails with:
Deployment failed. Correlation ID: 09312a25-04f3-4e35-8f79-6b2337bb7f19. Operation failed with status: 200. Details: Resource state Failed
Here is the output of az aks show --name myaks --resource-group myresourcegr --output table:
Name Location ResourceGroup KubernetesVersion ProvisioningState Fqdn
------- ---------- --------------- ------------------- ------------------- -----------------------------------------
myaks westeurope myresourcegr 1.8.6 Failed myaks-81cf39b1.hcp.westeurope.azmk8s.io
Retrying the upgrade, I always get the same error.
Edit
Output of command az aks get-versions -g myresourcegr -n myaks
{
  "agentPoolProfiles": [
    {
      "kubernetesVersion": "1.8.6",
      "name": null,
      "osType": "Linux",
      "upgrades": [
        "1.8.6"
      ]
    }
  ],
  "controlPlaneProfile": {
    "kubernetesVersion": "1.8.6",
    "name": null,
    "osType": "Linux",
    "upgrades": [
      "1.8.6"
    ]
  },
  "id": "/subscriptions/xxx-98ec-4db6-bfed-946d93a62a7c/resourcegroups/myresourcegr/providers/Microsoft.ContainerService/managedClusters/myaks/upgradeprofiles/default",
  "name": "default",
  "resourceGroup": "myresourcegr",
  "type": "Microsoft.ContainerService/managedClusters/upgradeprofiles"
}

When planning an AKS upgrade, first add a new node pool at the target version to the cluster, then drain the nodes of the old node pool. This helps you identify upgrade-related issues early and is the approach to use for production upgrades.
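As a rough sketch of that approach (pool names are placeholders, and `az aks nodepool` requires a multiple-node-pool-capable cluster and a newer CLI than the 1.8.x-era cluster in the question):

```shell
# Add a new node pool at the target Kubernetes version (names are placeholders).
az aks nodepool add \
  --resource-group myresourcegr \
  --cluster-name myaks \
  --name newpool \
  --node-count 3 \
  --kubernetes-version 1.8.6

# Cordon and drain each old node so workloads reschedule onto the new pool.
kubectl cordon <old-node-name>
kubectl drain <old-node-name> --ignore-daemonsets

# Once everything is healthy on the new pool, remove the old one.
az aks nodepool delete \
  --resource-group myresourcegr \
  --cluster-name myaks \
  --name <old-pool-name>
```

Validating workloads on the new pool before deleting the old one gives you a rollback path: uncordon the old nodes if something breaks.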

Related

EMR 6.7 configuration in EMR 6.9 gives error Classification 'spark-log4j' is not valid for parent classification 'null'

I was using EMR 6.7 with the software configuration:
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "Classification": "spark-log4j",
    "Properties": {
      "log4j.rootCategory": "ERROR, console"
    }
  }
]
but for some reason, when I shifted to EMR 6.9, it started throwing the error:
Classification 'spark-log4j' is not valid for parent classification 'null'.
If I remove the spark-log4j block it works, but then I get unnecessary INFO and DEBUG logs.
How can I configure spark-log4j in EMR 6.9?
Change the classification from 'spark-log4j' to 'spark-log4j2' in your configuration, as newer EMR releases use Log4j 2 for Spark.
Here is the documentation for valid classifications: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-690-release.html#emr-690-class.
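For reference, the configuration from the question would become something like the following on EMR 6.9 (the `rootLogger.level` key is how I understand the old `log4j.rootCategory` setting maps to Log4j 2 per the AWS migration notes; check the linked docs to confirm the exact property names):

```json
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "Classification": "spark-log4j2",
    "Properties": {
      "rootLogger.level": "error"
    }
  }
]
```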

Is it possible to connect a DeploymentScript to a VNET?

When running a Bicep resource of type Microsoft.Resources/deploymentScripts
that runs a script needing access to a Key Vault which only allows selected networks, how can we make the following script work?
resource exampleScript 'Microsoft.Resources/deploymentScripts@2020-10-01' = {
  name: 'KeyVaultSecretFromProduct'
  location: resourceGroup().location
  kind: 'AzurePowerShell'
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '/subscriptions/${subscription().subscriptionId}/resourcegroups/${managedIdentity.scope}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${managedIdentity.name}': {}
    }
  }
  properties: {
    arguments: '-ResourceGroupName \\"${keyVaultSecretFromProduct.scope}\\" -SubscriptionKey \\"${subscriptionKey}\\" -KeyVault \\"${keyVaultSecretFromProduct.keyVault}\\"'
    azPowerShellVersion: '3.0'
    scriptContent: loadTextContent('../../membership-optimization/create-secret-for-product-key.ps1')
    retentionInterval: 'P1D'
  }
}
After running, it fails with the error:
New-AzResourceGroupDeployment: 15:37:50 - The deployment 'test_keyvault' failed with error(s). Showing 1 out of 1 error(s).
Status Message: At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details. (Code: DeploymentFailed)
- {
"status": "failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'failed'.",
"details": [
{
"code": "DeploymentScriptError",
"message": "The provided script failed with the following error:\r\nMicrosoft.Azure.KeyVault.Models.KeyVaultErrorException: Operation returned an invalid status code 'Forbidden'\n at Microsoft.Azure.Commands.KeyVault.Models.KeyVaultDataServiceClient.SetSecret(String vaultName, String secretName, SecureString secretValue, PSKeyVaultSecretAttributes secretAttributes)\n at Microsoft.Azure.Commands.KeyVault.SetAzureKeyVaultSecret.ExecuteCmdlet()\n at Microsoft.WindowsAzure.Commands.Utilities.Common.CmdletExtensions.<>c__3`1.<ExecuteSynchronouslyOrAsJob>b__3_0(T c)\n at Microsoft.WindowsAzure.Commands.Utilities.Common.CmdletExtensions.ExecuteSynchronouslyOrAsJob[T](T cmdlet, Action`1 executor)\n at Microsoft.WindowsAzure.Commands.Utilities.Common.CmdletExtensions.ExecuteSynchronouslyOrAsJob[T](T cmdlet)\n at Microsoft.WindowsAzure.Commands.Utilities.Common.AzurePSCmdlet.ProcessRecord()\r\nat <ScriptBlock>, /mnt/azscripts/azscriptinput/userscript.ps1: line 46\r\nat <ScriptBlock>, <No file>: line 1\r\nat <ScriptBlock>, /mnt/azscripts/azscriptinput/DeploymentScript.ps1: line 264. Please refer to https://aka.ms/DeploymentScriptsTroubleshoot for more deployment script information."
}
]
}
} (Code:Conflict)
CorrelationId: xxxxxxxxxxxxxxxxxxxx
A VNet with some subnets used by App Services was configured so that those App Services can access the Key Vault secrets.
Is there a way to solve this problem? Any workaround? Maybe a command that we can run that allows us to connect to the vnet?
A workaround could be to change the VNet settings on the vault, run the script, and then reset the VNet settings to their original state. It's clunky, but it is the only thing I got working when handling this situation (though in my case it was a PowerShell script).
Alternatively, you could run the script on a VM that is in an authorized subnet.
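A sketch of that temporary-open workaround with the Azure CLI (vault and group names are placeholders; note this briefly exposes the vault to all networks, so keep the window as short as possible):

```shell
# Temporarily allow access from all networks to the vault.
az keyvault update --name myvault --resource-group myresourcegr --default-action Allow

# ... run the deployment that executes the deployment script here ...

# Restore the firewall to deny-by-default afterwards.
az keyvault update --name myvault --resource-group myresourcegr --default-action Deny
```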

Error: Rotate certificates in Azure Kubernetes Service (AKS)

I used this link https://learn.microsoft.com/en-us/azure/aks/certificate-rotation to rotate certificates in AKS. The certificates got updated, but my cluster is in a failed state, and because of this my application is down.
I get the error below when running az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME:
ERROR: "error": { "code": "ErrorCodeRotateClusterCertificates", "message": "VMASAgentPoolReconciler retry failed: Category: ClientError; SubCode: OutboundConnFailVMExtensionError; Dependency: Microsoft.Compute/virtualMachines/extensions; OrginalError: Code=\"VMExtensionProvisioningError\" Message=\"VM has reported a failure when processing extension 'cse-agent-0'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=50\\n[stdout]\\n\\n[stderr]\\ncurl: option --proxy-insecure: is unknown\\ncurl: try 'curl --help' or 'curl --manual' for more information\\nCommand exited with non-zero status 2\\n0.00user 0.00system 0:00.00elapsed 100%!!(MISSING)C(string=VMAS agent pools reconciling)PU (0avgtext+0avgdata 7044maxresident)k\\n0inputs+8outputs (0major+372minor)pagefaults 0swaps\\n\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \"; AKSTeam: NodeProvisioning, Retriable: false" } }
Kubernetes version: 1.14.8
Please help me resolve this issue.
What version of Ubuntu are you running on your nodes? From that error, I'm guessing Ubuntu 16.04 or older.
I'm not sure if it will work, but instead of trying to rotate certificates, can you try upgrading the nodes?
You might also want to consider creating a new cluster that uses VMSS instead of VMAS.
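To check the node OS version asked about above, the wide node listing shows an OS-IMAGE column, and a node upgrade can be attempted with `az aks upgrade` (the target version must be one the cluster supports):

```shell
# The OS-IMAGE and KERNEL-VERSION columns reveal the Ubuntu release on each node.
kubectl get nodes -o wide

# Upgrading re-provisions the nodes on a current image.
az aks upgrade -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME --kubernetes-version <supported-version>
```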

Error while using hyperledger composer with explorer

I am trying to use Hyperledger Composer alongside Hyperledger Explorer. I've deployed a simple business network on fabric-dev-servers. On the Composer side it works fine and I can interact with the network, but when I try to integrate it with Hyperledger Explorer, I get the following error on startup.
console log
postgres://hppoc:password@127.0.0.1:5432/fabricexplorer
<<<<<<<<<<<<<<<<<<<<<<<<<< Explorer Error >>>>>>>>>>>>>>>>>>>>>
TypeError: Cannot read property 'size' of undefined
at Platform.initialize (/home/paradox/hyperledger/fabric/blockchain-explorer/app/platform/fabric/Platform.js:54:48)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:189:7)
(node:23248) DeprecationWarning: grpc.load: Use the @grpc/proto-loader module with grpc.loadPackageDefinition instead
Received kill signal, shutting down gracefully
Closed out connections
App log:
[2018-10-29 22:14:30.719] [DEBUG] Platform - ******* Initialization started for hyperledger fabric platform ******
[2018-10-29 22:14:30.719] [DEBUG] Platform - Setting admin organization enrolment files
db log:
[2018-10-29 22:14:22.055] [INFO] pgservice - Please set logger.setLevel to DEBUG in ./app/helper.js to log the debugging.
Following is my config.json:
{
  "network-config": {
    "org1": {
      "name": "Org1",
      "mspid": "Org1MSP",
      "peer1": {
        "requests": "grpcs://127.0.0.1:7051",
        "events": "grpcs://127.0.0.1:7053",
        "server-hostname": "peer0.org1.example.com",
        "tls_cacerts": "/home/paradox/hyperledger/fabric/fabric-dev-servers/fabric-scripts/hlfv12/composer/crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls/ca.crt"
      },
      "admin": {
        "key": "/home/paradox/hyperledger/fabric/fabric-dev-servers/fabric-scripts/hlfv12/composer/crypto-config/peerOrganizations/org1.example.com/users/Admin@org1.example.com/msp/keystore",
        "cert": "/home/paradox/hyperledger/fabric/fabric-dev-servers/fabric-scripts/hlfv12/composer/crypto-config/peerOrganizations/org1.example.com/users/Admin@org1.example.com/msp/signcerts"
      }
    }
  },
  "channel": "composerchannel",
  "orderers": [
    {
      "mspid": "OrdererMSP",
      "server-hostname": "orderer.example.com",
      "requests": "grpcs://127.0.0.1:7050",
      "tls_cacerts": "/home/paradox/hyperledger/fabric/fabric-dev-servers/fabric-scripts/hlfv12/composer/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tls/ca.crt"
    }
  ],
  "keyValueStore": "/tmp/fabric-client-kvs",
  "configtxgenToolPath": "/home/playground/fabric-samples/bin",
  "SYNC_START_DATE_FORMAT": "YYYY/MM/DD",
  "syncStartDate": "2018/9/01",
  "eventWaitTime": "30000",
  "license": "Apache-2.0",
  "version": 1.0
}
I faced a similar issue and solved it using the following steps.
Download Explorer 3.5 from this URL:
https://github.com/hyperledger/blockchain-explorer/tree/v0.3.5.1
Set up Hyperledger Composer (update config.json)
Build Hyperledger Explorer
Run Hyperledger Explorer
It finally worked for Fabric 1.2 and Composer 0.20.
I hope it helps!
Looking at the format of your config.json file, it looks like you might be either using an old version of Explorer, or an old config.json. Support for Fabric 1.2 (which it looks like you are using) was only added in Explorer 3.7, along with changes to the structure of config.json.
So, I would recommend the following:
Update to Explorer 3.7 (branch release-3.7).
Follow the instructions here.
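Roughly, getting onto that branch looks like this (build and run steps vary per release, so follow the README on the branch; treat this as a sketch):

```shell
git clone https://github.com/hyperledger/blockchain-explorer.git
cd blockchain-explorer
git checkout release-3.7
# Update config.json to the new 3.7 structure (see the branch README for the
# file location), then build and start Explorer as the README describes.
```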

"Win32Exception: The subsystem needed to support the image type is not present" when deploying Fabric Cluster on Nano Server

I am deploying a Service Fabric Cluster on Nano Server using the secure-cluster-5-node template (https://github.com/Azure/azure-quickstart-templates/tree/master/service-fabric-secure-cluster-5-node-1-nodetype)
I get the following error:
Operation: xxx
Tracking: xxx
Status: Conflict
Provisioning State: Failed
Timestamp: 6/22/2017 13:05:14
Duration: 6 minutes 11 seconds
Type: Microsoft.Compute/virtualMachineScaleSets
Resource Id: /subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Compute/virtualMachineScaleSets/nt1vm
Status Message: {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "VMExtensionHandlerNonTransientError",
        "message": "Handler 'Microsoft.Azure.ServiceFabric.ServiceFabricNode' has reported failure for VM Extension 'ServiceFabricNodeVmExt_vmNodeType0Name' with terminal error code '1007' and error message: 'Install failed for plugin (name: Microsoft.Azure.ServiceFabric.ServiceFabricNode, version 1.0.0.35). Exception:\nSystem.ComponentModel.Win32Exception: The subsystem needed to support the image type is not present\r\n at System.Diagnostics.Process.StartCore(ProcessStartInfo startInfo)\r\n at Microsoft.Azure.Agent.StateMachine.HandlerStateMachine.InvokeCommand(String command, PluginArtifacts pluginArtifact, String pluginVersion, String pluginFolder, String pluginLogFolder, Int32 processWaitTimeout, PluginEventType startType, PluginEventType endType)\r\n at Microsoft.Azure.Agent.StateMachine.HandlerStateMachine.InstallHandler(PluginArtifacts artifact)'"
      }
    ]
  }
}
The OS settings used in the ARM template are:
"vmImagePublisher": {
  "value": "MicrosoftWindowsServer"
},
"vmImageOffer": {
  "value": "WindowsServer"
},
"vmImageSku": {
  "value": "2016-Nano-Server"
},
"vmImageVersion": {
  "value": "latest"
},
Any idea on how to troubleshoot this?
For now, Nano Server is not supported by Service Fabric clusters. Please refer to this link.
You are able to create clusters on VMs running these operating systems:
Windows Server 2012 R2
Windows Server 2016
Linux Ubuntu 16.04 (in public preview)
You can also check this in the Azure portal: 2016-Nano-Server cannot be selected.
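For example, pointing the template at a supported full Windows Server image would look something like this (SKU name from the standard MicrosoftWindowsServer/WindowsServer gallery offer; pick whichever supported SKU fits your workload):

```json
"vmImageSku": {
  "value": "2016-Datacenter-with-Containers"
},
```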
