Kubernetes setup on centos 7 hangs - linux

I have a Kubernetes setup with 1 master and 2 slave nodes on CentOS machines.
There is also a Kubernetes dashboard and 6 Spring Boot services. MongoDB is installed as a service on the master node.
The setup works fine for 1 or 2 days, but then the system hangs and I need to do a forced restart. There are no memory or CPU panic logs and no Kubernetes error logs. CPU and memory usage is 2% of the total available.
Can anyone suggest what might be wrong with the setup? Below are the versions that I use:
**Docker** - Docker version 1.13.1, build 64e9980/1.13.1
**CentOS** -
```
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
```
**Kubernetes** -
```kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:39:11Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}```
**mongodb** -
v4.0.19

Related

docker networking not working after installing Docker Desktop in Ubuntu

After I installed Docker Desktop on my Linux machine, my Docker containers don't work any more. The containers can't communicate with each other or connect to the internet.
I found the self-diagnose tool. When I run it I get:
➜ ~ /opt/docker-desktop/bin/com.docker.diagnose check
[2022-10-21T21:53:23.963576537Z][com.docker.diagnose][I] set path configuration to OnHost
Starting diagnostics
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0034: is Context set to a Docker Desktop context?
[PASS] DD0003: is the Docker CLI working?
[FAIL] DD0014: are the backend processes running? 2 errors occurred:
* querying com.docker.backend process: is it running as a different user?: readlink /proc/22636/exe: permission denied
* querying com.docker.backend process: is it running as a different user?: readlink /proc/22654/exe: permission denied
[PASS] DD0007: is the backend responding?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[FAIL] DD0012: is the VM networking working? network checks failed: failed to ping host: exit status 1
[2022-10-21T21:53:24.648095036Z][com.docker.diagnose][I] ipc.NewClient: 4c9ff8fa-diagnose-network -> diagnosticd.sock diagnosticsd
[common/pkg/diagkit/gather/diagnose.runIsVMNetworkingOK()
[ common/pkg/diagkit/gather/diagnose/network.go:34 +0xd9
[common/pkg/diagkit/gather/diagnose.(*test).GetResult(0x11754a0)
[ common/pkg/diagkit/gather/diagnose/test.go:46 +0x43
[common/pkg/diagkit/gather/diagnose.Run.func1(0x11754a0)
[ common/pkg/diagkit/gather/diagnose/run.go:17 +0x5a
[common/pkg/diagkit/gather/diagnose.walkOnce.func1(0x2?, 0x11754a0)
[ common/pkg/diagkit/gather/diagnose/run.go:142 +0x77
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x1, 0x11754a0, 0xc000223728)
[ common/pkg/diagkit/gather/diagnose/run.go:151 +0x87
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x0, 0x11755a0, 0xc000223728)
[ common/pkg/diagkit/gather/diagnose/run.go:148 +0x52
[common/pkg/diagkit/gather/diagnose.walkOnce(0xb0fde0?, 0xc00035f890)
[ common/pkg/diagkit/gather/diagnose/run.go:137 +0xcc
[common/pkg/diagkit/gather/diagnose.Run(0x11755a0, 0x7fbd50ef6300?, {0xc00035fb20, 0x1, 0x1})
[ common/pkg/diagkit/gather/diagnose/run.go:16 +0x1d4
[main.checkCmd({0xc00012e010?, 0x6?, 0x4?}, {0x0, 0x0})
[ common/cmd/com.docker.diagnose/main.go:133 +0x105
[main.main()
[ common/cmd/com.docker.diagnose/main.go:99 +0x2a7
[2022-10-21T21:53:24.648917178Z][com.docker.diagnose][I] (c88fef01) 4c9ff8fa-diagnose-network C->S diagnosticsd POST /check-network-connectivity: {"ips":["169.254.5.211","169.254.5.199","192.168.86.22"]}
[2022-10-21T21:53:25.180233483Z][com.docker.diagnose][W] (c88fef01) 4c9ff8fa-diagnose-network C<-S d6792b26-diagnosticsd POST /check-network-connectivity (531.701612ms): failed to ping host: exit status 1
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0037: is the virtiofs setup correct?
[PASS] DD0036: is the credentials store configured correctly?
[PASS] DD0033: does the host have Internet access?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0032: do Docker networks overlap with host IPs?
Please investigate the following 2 issues:
1 : The test: are the backend processes running?
Failed with: 2 errors occurred:
* querying com.docker.backend process: is it running as a different user?: readlink /proc/22636/exe: permission denied
* querying com.docker.backend process: is it running as a different user?: readlink /proc/22654/exe: permission denied
Not all of the backend processes are running.
2 : The test: is the VM networking working?
Failed with: network checks failed: failed to ping host: exit status 1
VM seems to have a network connectivity issue. Check your host firewall and anti-virus settings in case they are blocking the VM.
Docker version:
➜ ~ docker version
Client: Docker Engine - Community
Cloud integration: v1.0.29
Version: 20.10.20
API version: 1.41
Go version: go1.18.7
Git commit: 9fdeb9c
Built: Tue Oct 18 18:20:18 2022
OS/Arch: linux/amd64
Context: desktop-linux
Experimental: true
Server: Docker Desktop 4.13.0 (89412)
Engine:
Version: 20.10.20
API version: 1.41 (minimum version 1.12)
Go version: go1.18.7
Git commit: 03df974
Built: Tue Oct 18 18:18:35 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
The issues are right there, but I have no idea how to fix them.

Virtual Machine with Windows Server from Azure doesn't run Linux-based Docker container

I am trying to run a Linux-based Docker container on an Azure Virtual Machine with Windows Server 2019.
I followed a lot of tutorials for this and enabled the experimental flags, so docker version shows:
PS C:\Users\azure> docker version
Client: Docker Engine - Enterprise
Version: 19.03.5
API version: 1.40
Go version: go1.12.12
Git commit: 2ee0c57608
Built: 11/13/2019 08:00:16
OS/Arch: windows/amd64
Experimental: false
Server: Docker Engine - Enterprise
Engine:
Version: 19.03.5
API version: 1.40 (minimum version 1.24)
Go version: go1.12.12
Git commit: 2ee0c57608
Built: 11/13/2019 07:58:51
OS/Arch: windows/amd64
Experimental: true
And docker info:
docker info
Client:
Debug Mode: false
Plugins:
cluster: Manage Docker clusters (Docker Inc., v1.2.0)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 4
Server Version: 19.03.5
Storage Driver: lcow (linux) windowsfilter (windows)
LCOW:
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics internal l2bridge l2tunnel nat null overlay private transparent
Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
Operating System: Windows Server 2019 Datacenter Version 1809 (OS Build 17763.1098)
OSType: windows
Architecture: x86_64
CPUs: 1
Total Memory: 2GiB
Name: xxx-yyy
ID: R2TB:P4GZ:MRU4:IU4A:BPTU:DPYY:GV7C:VNL3:JW6F:IRKJ:BTKW:BVNE
Docker Root Dir: C:\ProgramData\docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
But in the end, when I run any Linux container I get this error:
PS C:\Users\azure> docker run --platform=linux hello-world:linux
docker : C:\Program Files\Docker\docker.exe: Error response from daemon: failed to start
service utility VM (createreadwrite): hcsshim::CreateComputeSystem
2410bb8b9e431b1068750d0c79376b1fdc196eef97c0a48ec8571775349acde7_svm: The virtual machine
could not be started because a required feature is not installed.
At line:1 char:1
+ docker run --platform=linux hello-world:linux
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (C:\Program File... not installed.:String) [],
RemoteException
+ FullyQualifiedErrorId : NativeCommandError
(extra info: {"SystemType":"container","Name":"2410bb8b9e431b1068750d0c79376b1fdc196eef97c0
a48ec8571775349acde7_svm","Layers":null,"HvPartition":true,"HvRuntime":{"ImagePath":"C:\\Pr
ogram Files\\Linux Containers","LinuxInitrdFile":"initrd.img","LinuxKernelFile":"kernel"},"
ContainerType":"linux","TerminateOnLastHandleClosed":true}).
See 'C:\Program Files\Docker\docker.exe run --help'.
Am I missing something in Azure? In the VM config?
I solved my problem, and it wasn't a problem with the config, Docker, or Windows Server.
The problem was hardware: when you select the processor for the Azure VM, you should use one with nested virtualization. The solution is described here: https://blog.darrenjrobinson.com/azure-vm-docker-createcontainer-error-0xc0370102/

JBOSS not running

I come from a Linux server background and am very new to JBoss. I'm trying to set up an IoT application server which requires the JBoss service to provide
a web interface for the application server.
But when I check the JBoss server state it shows 'starting'; I need it to be 'running'.
# /opt/cgms/bin/jboss-cli.sh --connect controller=127.0.0.1 ":read-attribute(name=server-state)"
{
"outcome" => "success",
"result" => "starting"
}
I can see that the deployment fails when I start JBoss using the script standalone.sh. I've increased the deployment-timeout
up to 6000 seconds in standalone.xml, but the deployment still fails with the following message in /opt/cgms/standalone/deployments/cgms.ear.failed:
"JBAS015052: Did not receive a response to the deployment operation within the allowed timeout period [6000 seconds].
Check the server configuration file and the server logs to find more about the status of the deployment."
Here are my JBoss setup details:
[root@app-server ~]# /opt/cgms/bin/jboss-cli.sh --connect
[standalone@localhost:9999 /] version
JBoss Admin Command-line Interface
JBOSS_HOME: /opt/cgms
JBoss AS release: 7.3.0.Final-redhat-14 "Janus"
JBoss AS product: EAP 6.2.0.GA
JAVA_HOME: null
java.version: 1.8.0_65
java.vm.vendor: Oracle Corporation
java.vm.version: 25.65-b01
os.name: Linux
os.version: 3.10.0-229.el7.x86_64
When I check server.log, it is stuck at:
# tailf /opt/cgms/server/cgms/log/server.log
624: app-server: Aug 12 2017 05:45:01.506 +0000: %IOTFND-6-UNSPECIFIED: %[ch=StdSchedulerFactory][sev=INFO][tid=MSC service thread 1-1]: Quartz scheduler 'CgnmsQuartz' initialized from an externally provided properties instance.
625: app-server: Aug 12 2017 05:45:01.506 +0000: %IOTFND-6-UNSPECIFIED: %[ch=StdSchedulerFactory][sev=INFO][tid=MSC service thread 1-1]: Quartz scheduler version: 2.2.1
It does not go any further from here.
I've tried with Java 1.7, but the script standalone.sh failed with a Java error:
java.lang.UnsupportedClassVersionError: com/cisco/cgms/loglayout/LogHandler : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at org.jboss.modules.ModuleClassLoader.doDefineOrLoadClass(ModuleClassLoader.java:345)
at org.jboss.modules.ModuleClassLoader.defineClass(ModuleClassLoader.java:423)
at org.jboss.modules.ModuleClassLoader.loadClassLocal(ModuleClassLoader.java:261)
at org.jboss.modules.ModuleClassLoader$1.loadClassLocal(ModuleClassLoader.java:76)
Here are my server details,
OS - Red Hat Enterprise Linux Server release 7.1 (Maipo) - runs on Oracle VM VirtualBox
kernel - app-server 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
When I check netstat, ports 80 and 443 are listening.
Please help me fix this problem.
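A note on the UnsupportedClassVersionError above: the "52.0" is the class-file version of com/cisco/cgms/loglayout/LogHandler. Class-file major versions map to Java releases by a fixed offset (51 is Java 7, 52 is Java 8), so that error indicates the class was compiled for Java 8 and a Java 1.7 runtime cannot load it. A minimal shell sketch of the mapping follows; the function name class_major_to_java is a hypothetical helper introduced here for illustration, not part of JBoss or the JDK.

```shell
# Hypothetical helper (assumption, not a JBoss/JDK tool): convert a class-file
# version such as "52.0" into the Java release that produces it.
class_major_to_java() {
    major="${1%%.*}"    # keep only the major part: "52.0" -> "52"
    # Major versions from 49 (Java 5) upward follow: release = major - 44
    if [ "$major" -ge 49 ] 2>/dev/null; then
        echo "$((major - 44))"
    else
        # Non-numeric input or a version older than Java 5
        echo "unknown"
        return 1
    fi
}
```

For example, `class_major_to_java 52.0` prints 8 and `class_major_to_java 51` prints 7, which matches the behaviour described in the question: the application starts loading under Java 1.8 but is rejected under Java 1.7.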

system service is not started on fedora

Using the code below, I created a service.
Code snippet from agentInstaller.sh:
fileAgentController="agent_controller.sh"
# Register the init script with the distribution's service tooling
if [[ "$os" = "debian" ]]; then
    update-rc.d "$fileAgentController" defaults
else
    chkconfig --add "/etc/init.d/$fileAgentController"
fi
# Start the service through its init script
export start="start"
export command="/etc/init.d/$fileAgentController"
sh "$command" "${start}"
The above code successfully starts the service 'agent_controller.sh' on Amazon Linux AMI 2017.03 (amzn rhel fedora) and Ubuntu 16.04.2 LTS,
but it gives an error on a machine with the following details:
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.3 (Maipo)"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server"
Red Hat Enterprise Linux Server release 7.3 (Maipo)
I encountered the following error on the above machine:
Reloading systemd: [ OK ]
Starting agent_controller.sh (via systemctl): Failed to start
agent_controller.sh.service: Unit not found.
[FAILED]

docker run hello-world still fails, permission denied

I'm trying to run Docker but it still fails. Here is what I get:
root@c1170137:~# docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
c04b14da8d14: Extracting 974 B/974 B
docker: failed to register layer: ApplyLayer exit status 1 stdout: stderr: permission denied.
See 'docker run --help'.
kernel: 4.4.16-1-pve
I'm using Debian jessie
Distributor ID: Debian
Description: Debian GNU/Linux 8.5 (jessie)
Release: 8.5
Codename: jessie
Edit:
daemon.log
http://hastebin.com/qinufacuto.coffee
docker info
root@c1177124:~# docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 1.12.1
Storage Driver: vfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.16-1-pve
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 2 GiB
Name: c1177124
ID: 4YUJ:OL2E:WLJC:23WJ:5HRW:LRY3:QHKC:MKXO:JDWO:VWOQ:JMWN:V52W
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
127.0.0.0/8
By the way, the problem could be caused by the kernel.
Thank you for any ideas or solutions.
Use lxc.apparmor.profile: unconfined
Just put it at the end of the /etc/pve/lxc/ID.conf file and restart your LXC container.
Using lxc.aa_profile: unconfined is deprecated, as the option was renamed.
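The edit described above can be sketched as a small shell function. This is only an illustration under assumptions: set_apparmor_unconfined is a hypothetical helper name, and the path you pass in is your own container's config file (ID stays a placeholder, as in the answer). It appends the line only when no lxc.apparmor.profile entry exists yet, so running it twice does not duplicate the setting.

```shell
# Hypothetical helper (assumption): append "lxc.apparmor.profile: unconfined"
# to an LXC config file unless a line for that key is already present.
set_apparmor_unconfined() {
    conf_file="$1"
    if grep -q '^lxc\.apparmor\.profile:' "$conf_file"; then
        echo "lxc.apparmor.profile already set in $conf_file"
    else
        printf 'lxc.apparmor.profile: unconfined\n' >> "$conf_file"
        echo "appended lxc.apparmor.profile to $conf_file"
    fi
}

# Example call on the host (replace ID with your container's ID):
# set_apparmor_unconfined /etc/pve/lxc/ID.conf
```

The container still has to be restarted afterwards for the setting to take effect, as noted above.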
If you don't care about security or trust your Docker containers:
Edit the configuration file of your LXC container on the host in /etc/pve/lxc/ID.conf by adding lxc.aa_profile: unconfined at the end of the file.
Remove AppArmor: apt-get remove apparmor --purge
I solved this problem by executing these commands on the host:
lxc config set your-lxc-name security.nesting true
lxc config set your-lxc-name security.privileged true
I had the same error. In my case it was due to McAfee antivirus. I removed it and then the pull succeeded. McAfee was blocking the /etc/passwd file and Docker could not pull images.
Here people had the exact same problem:
https://github.com/moby/moby/issues/37817

Resources