Proxmox - migration fails with exit code 255 - proxmox

Attempting to migrate a container between Proxmox nodes failed, reporting that the following command exited with code 255:
TASK ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=violet' root@172.20.20.1 pvecm mtunnel -migration_network 172.20.20.1/16 -get_migration_ip' failed: exit code 255
Running the command manually shows the underlying error message:
could not get migration ip: multiple, different, IP address configured for network 172.20.20.1/16

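It turned out that the target host had a second interface configured on the same network (for example, eno1 had 172.20.0.1 and eno3 had 172.20.20.3). Both addresses fall inside the /16 migration network, so the migration IP was ambiguous. Removing or disabling one of these interfaces resolved the issue.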
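
To see why two such addresses collide, here is a minimal Python sketch of the overlap check. It is not how pvecm itself works; the function names and the interface layout are assumptions taken from the example above, and the addresses would have to be collected by hand (for instance from `ip addr`).

```python
import ipaddress

def addresses_in_network(interfaces, cidr):
    """Return (interface, address) pairs whose address lies inside cidr."""
    # strict=False accepts a host address with a prefix, e.g. 172.20.20.1/16
    network = ipaddress.ip_network(cidr, strict=False)
    return [
        (name, addr)
        for name, addrs in interfaces.items()
        for addr in addrs
        if ipaddress.ip_address(addr) in network
    ]

def is_ambiguous(interfaces, cidr):
    """True when more than one distinct address matches the network."""
    return len({addr for _, addr in addresses_in_network(interfaces, cidr)}) > 1

# Hypothetical layout of the target host, based on the example in the post
target_host = {"eno1": ["172.20.0.1"], "eno3": ["172.20.20.3"]}
print(addresses_in_network(target_host, "172.20.20.1/16"))
print(is_ambiguous(target_host, "172.20.20.1/16"))
```

If `is_ambiguous` returns True for the migration network, one of the matching interfaces probably needs to be moved to another subnet or disabled.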

Related

Error: Rotate certificates in Azure Kubernetes Service (AKS)

I followed https://learn.microsoft.com/en-us/azure/aks/certificate-rotation to rotate certificates in AKS. The certificates were updated, but my cluster is now in a failed state, and because of this my application is down.
I get the error below when I run this command: az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME
ERROR: "error": { "code": "ErrorCodeRotateClusterCertificates", "message": "VMASAgentPoolReconciler retry failed: Category: ClientError; SubCode: OutboundConnFailVMExtensionError; Dependency: Microsoft.Compute/virtualMachines/extensions; OrginalError: Code=\"VMExtensionProvisioningError\" Message=\"VM has reported a failure when processing extension 'cse-agent-0'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=50\\n[stdout]\\n\\n[stderr]\\ncurl: option --proxy-insecure: is unknown\\ncurl: try 'curl --help' or 'curl --manual' for more information\\nCommand exited with non-zero status 2\\n0.00user 0.00system 0:00.00elapsed 100%!!(MISSING)C(string=VMAS agent pools reconciling)PU (0avgtext+0avgdata 7044maxresident)k\\n0inputs+8outputs (0major+372minor)pagefaults 0swaps\\n\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \"; AKSTeam: NodeProvisioning, Retriable: false" } }
Kubernetes version: 1.14.8
Please help me resolve this issue.
What version of Ubuntu are you running on your nodes? From that error, I'm guessing Ubuntu 16.04 or older.
I'm not sure if it will work, but instead of trying to rotate certificates, can you try upgrading the nodes?
You might also want to consider just creating a new cluster, and using VMSS instead of VMAS.

AKS nodes failed provisioning

I have an AKS cluster in a DEV environment which was working fine. Today I noticed that some pods that were being removed/uninstalled via Helm were stuck in the Terminating state.
I found out that none of the 3 nodes is ready. When I stopped the cluster and started it again, the VMs failed to create in the VMSS with the associated message:
VM has reported a failure when processing extension 'vmssCSE'. Error message: "Enable failed: failed to execute command: command terminated with exit status=50
From what I have found, it looks like the VMs in the scale set are missing outbound internet connectivity; however, the associated NSG has only the defaults:
When inspecting the VMSS status, it says the following:
VM has reported a failure when processing extension 'vmssCSE'. Error message: "Enable failed: failed to execute command: command terminated with exit status=50 [stdout] [stderr] nc: connect to mcr.microsoft.com port 443 (tcp) failed: Connection timed out Command exited with non-zero status 1 0.00user 0.00system 2:10.07elapsed 0%CPU (0avgtext+0avgdata 2360maxresident)k 0inputs+8outputs (0major+113minor)pagefaults 0swaps " More information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot
The troubleshooting guide doesn't seem helpful, as it states:
When restricting egress traffic from an AKS cluster, there are required and optional recommended outbound ports / network rules and FQDN / application rules for AKS. If your settings are in conflict with any of these rules, certain kubectl commands won't work correctly. You may also see errors when creating an AKS cluster.
Verify that your settings aren't conflicting with any of the required or optional recommended outbound ports / network rules and FQDN / application rules.
But the default rules have not changed, so I'm lost at this point.
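
The failing step in the log is a plain TCP connection to mcr.microsoft.com on port 443, so one way to narrow this down is to repeat that check from a machine in the same subnet. The following is a minimal Python sketch; `can_connect` is a hypothetical helper, not part of any Azure tooling, and it only tests TCP reachability, not TLS or DNS filtering rules.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False

# Example, to be run from a VM in the affected subnet:
#   can_connect("mcr.microsoft.com", 443)
```

A False result from inside the subnet while the same call succeeds elsewhere would point at routing, a firewall, or a missing outbound path rather than at the NSG defaults.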

prestodb: Failed to launch presto server and no nodes running

Below is the snapshot of the error
presto> select * from system.runtime.nodes;
Error running command: java.net.ConnectException: Failed to connect to localhost/127.0.0.1:8080
I tried using the local IP address and 10.0.2.2, yet the same error occurs.

Chef-server-ctl reconfigure/ Creating Admin User on chef server

I am fairly new to Linux (and brand new to Chef) and I have run into an issue when setting up my Chef server. I am trying to create an admin user with the command
sudo chef-server-ctl user-create admin Admin Ladmin admin@example.com examplepass -f admin.pem
but I keep getting this error:
ERROR: Connection refused connecting...
ERROR: Connection refused connecting to https://127.0.0.1/users/, retry 5/5
ERROR: Network Error: Connection refused - Connection refused connecting to https://..., giving up
Check your knife configuration and network settings
I also noticed that when I ran chef-server-ctl I got this output:
[2016-12-21T13:24:59-05:00] ERROR: Running exception handlers
Running handlers complete
[2016-12-21T13:24:59-05:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 01 seconds
[2016-12-21T13:24:59-05:00] FATAL: Stacktrace dumped to /var/opt/opscode/local-mode-cache/chef-stacktrace.out
[2016-12-21T13:24:59-05:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2016-12-21T13:24:59-05:00] FATAL: Chef::Exceptions::CannotDetermineNodeName: Unable to determine node name: configure node_name or configure the system's hostname and fqdn
I read that this error is caused by a missing prerequisite, but I'm uncertain what that means or how to fix it. Any input would be greatly appreciated.
Your server does not have a valid FQDN (aka full host name). You'll have to fix this before installing Chef server.
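
As a rough way to see what the machine currently reports, here is a small Python sketch. The helper `looks_like_fqdn` is hypothetical and only checks that the name has at least two non-empty dot-separated labels; Chef's own hostname detection may apply different rules, so treat this as a sanity check alongside `hostname -f`.

```python
import socket

def looks_like_fqdn(name):
    """Rough check: at least two non-empty labels separated by dots."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)

def check_local_fqdn():
    """Return the name the resolver reports for this host and whether it looks qualified."""
    name = socket.getfqdn()
    return name, looks_like_fqdn(name)

# Example: print(check_local_fqdn())
```

If the second value is False, the hostname and /etc/hosts entries probably need to be set to a fully qualified name before running the Chef server setup again.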

MemSQL node failed to start

When I tried to start my cluster today, I got the following error:
MemSQL node CCBA806E30C9A8E4430ADFEAF5DF435ED91B8F7F failed to start:
Failed to connect to MemSQL node
CCBA806E30C9A8E4430ADFEAF5DF435ED91B8F7F: No error in tracelog
The worst part is that I am not able to query anything. I get this error whenever I run a query:
ERROR 1777 (HY000): Partition memsqldb:0 has no master instance.
What is the exact problem here?
I tried running memsql-ops report:
memsql-ops report
2015-10-10 09:51:10: Jf3fc7e04 [INFO] Building a diagnostic report for agent A0cd4d79612c64a1298c66f7d346fe44a
2015-10-10 09:51:20: Jf3fc7e04 [WARNING] Could not get info for MemSQL node 378F1F82DE7DE6A3D4552A381F316399A579DF18: [Errno 2003] Can't connect to MySQL server on '192.168.104.184' (4)
I figured out that the problem was with the IP addresses, as my IP at the office is different from the one at home. So I did the following:
iptables -t nat -I OUTPUT --dest 192.168.104.184 -j DNAT --to-dest 10.69.0.2
memsql-ops cluster-start
This solved the problem.
