I tried installing Kubernetes v1.2.0 in an Azure environment, but after installation I cannot access the Kubernetes API on port 8080.
The following services are running:
root 1473 0.2 0.5 536192 42812 ? Ssl 09:22 0:00 /home/weave/weaver --port 6783 --name 22:95:7a:6e:30:ed --nickname kube-00 --datapath datapath --ipalloc-range 10.32.0.0/12 --dns-effective-listen-address 172.17.42.1 --dns-listen-address 172.17.42.1:53 --http-addr 127.0.0.1:6784
root 1904 0.1 0.2 30320 20112 ? Ssl 09:22 0:00 /opt/kubernetes/server/bin/kube-proxy --master=http://kube-00:8080 --logtostderr=true
root 1907 0.0 0.0 14016 2968 ? Ss 09:22 0:00 /bin/bash -c until /opt/kubernetes/server/bin/kubectl create -f /etc/kubernetes/addons/; do sleep 2; done
root 1914 0.2 0.3 35888 22212 ? Ssl 09:22 0:00 /opt/kubernetes/server/bin/kube-scheduler --logtostderr=true --master=127.0.0.1:8080
root 3129 2.2 0.3 42488 25192 ? Ssl 09:27 0:00 /opt/kubernetes/server/bin/kube-controller-manager --master=127.0.0.1:8080 --logtostderr=true
curl -v http://localhost:8080 returns an error:
Rebuilt URL to: http://localhost:8080/
Trying 127.0.0.1...
connect to 127.0.0.1 port 8080 failed: Connection refused
Failed to connect to localhost port 8080: Connection refused
Closing connection 0
curl: (7) Failed to connect to localhost port 8080: Connection refused
The same setup works fine with v1.1.2.
I'm following the guide at https://github.com/kubernetes/kubernetes/tree/master/docs/getting-started-guides/coreos/azure and updated the line at https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/coreos/azure/cloud_config_templates/kubernetes-cluster-main-nodes-template.yml#L187 to use version v1.2.0.
The services you show running do not include the apiserver. As a quick breakdown, here is what each of the services you show does.
Weave: This is a software overlay network and assigns IP addresses to your pods.
kube-proxy: This runs on your nodes and routes traffic to the pods behind exposed services.
kubectl create: kubectl is actually the management CLI tool, but in this case the until ... do sleep 2; done loop retries kubectl create -f /etc/kubernetes/addons/ every 2 seconds until it succeeds, creating any objects (pods, replication controllers, services, etc.) defined in that folder. It cannot succeed while the apiserver is unreachable.
kube-scheduler: Responsible for scheduling pods onto nodes. Uses policies and rules.
kube-controller-manager: Manages the state of the cluster by always making sure the current state and desired state are the same. This includes starting/stopping pods and creating objects (services, replication-controllers, etc) that do not yet exist or killing them if they shouldn't exist.
All of these services interact with the kube-apiserver which should be a separate service that coordinates all of the information these other services use. You'll need the apiserver running in order for all of the other components to do their jobs.
I won't go into the details of getting it running in your environment, but from the comments on your original thread it looks like you found the missing documentation needed to get it running.
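The check described above (is the apiserver actually in the process list?) can be sketched in a few lines of Python. This is only an illustration: the helper name missing_components and the EXPECTED tuple are hypothetical, the list of binaries that should run on this node is an assumption, and the sample command lines are shortened from the ps output in the question.

```python
# Hypothetical helper: report which Kubernetes binaries are absent from ps output.
# Assumption: these four binaries are the ones expected on this node.
EXPECTED = ("kube-apiserver", "kube-scheduler", "kube-controller-manager", "kube-proxy")


def missing_components(ps_lines, expected=EXPECTED):
    """Return the expected binaries that no process command line mentions."""
    return [name for name in expected
            if not any(name in line for line in ps_lines)]


# Command lines shortened from the process list in the question.
sample = [
    "/opt/kubernetes/server/bin/kube-proxy --master=http://kube-00:8080 --logtostderr=true",
    "/opt/kubernetes/server/bin/kube-scheduler --logtostderr=true --master=127.0.0.1:8080",
    "/opt/kubernetes/server/bin/kube-controller-manager --master=127.0.0.1:8080 --logtostderr=true",
]

print(missing_components(sample))  # ['kube-apiserver']
```

With nothing listening on port 8080, every component started with --master=...:8080 will fail to connect in the same way curl does.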
I am running an Elasticsearch container as a Podman pod, using podman play kube and a YAML definition of the pod. The pod is created, a cluster of three nodes is created, and everything works as expected. However, the pod dies after sitting idle for a few days.
The podman pod ps command reports:
ERRO[0000] Error refreshing container af05fafe31f6bfb00c2599255c47e35813ecf5af9bbe6760ae8a4abffd343627: error acquiring lock 1 for container af05fafe31f6bfb00c2599255c47e35813ecf5af9bbe6760ae8a4abffd343627: file exists
ERRO[0000] Error refreshing container b4620633d99f156bb59eb327a918220d67145f8198d1c42b90d81e6cc29cbd6b: error acquiring lock 2 for container b4620633d99f156bb59eb327a918220d67145f8198d1c42b90d81e6cc29cbd6b: file exists
ERRO[0000] Error refreshing pod 389b0c34313d9b23ecea3faa0e494e28413bd15566d66297efa9b5065e025262: error retrieving lock 0 for pod 389b0c34313d9b23ecea3faa0e494e28413bd15566d66297efa9b5065e025262: file exists
POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS
389b0c34313d elasticsearch-pod Created 1 week ago af05fafe31f6 2
What's weird is that something is still listening when we look up the process ID bound to port 9200 or 9300:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 :::9200 :::* LISTEN 1328607/containers-
tcp6 0 0 :::9300 :::* LISTEN 1328607/containers-
The process that is hanging (and keeping the ports open) is:
user+ 1339220 0.0 0.1 45452 8284 ? S Jan11 2:19 /bin/slirp4netns --disable-host-loopback --mtu 65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /tmp/run-1002/netns/cni-e4bb2146-d04e-c3f1-9207-380a234efa1f tap0
The only actions I perform on the pod are the regular podman pod stop, podman pod rm, and podman play kube to start the pod.
What can be causing such strange behaviour of Podman? What may be causing the lock not to be released properly?
System information:
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3"
Red Hat Enterprise Linux release 8.3 (Ootpa)
Red Hat Enterprise Linux release 8.3 (Ootpa)
Podman version:
podman --version
podman version 2.2.1
The workaround that worked for me was to add the configuration file from the Podman repository [1] under /usr/lib/tmpfiles.d/ and /etc/tmpfiles.d/. This prevents systemd from removing Podman's temporary files from the /tmp directory [2]. As stated in [3], CNI additionally leaves network information in /var/lib/cni/networks when the system crashes or containers do not shut down properly. This behaviour happens when using rootless Podman and has been fixed in the latest Podman release [4].
Workaround
First, check the runRoot default directory set for your Podman rootless user:
podman info | grep runRoot
Create the temporary configuration file:
sudo vim /usr/lib/tmpfiles.d/podman.conf
Add the following content, replacing /tmp/podman-run-* with your default runRoot directory. For example, if your output is /tmp/run-6695/containers, use: x /tmp/run-*
# /tmp/podman-run-* directory can contain content for Podman containers that have run
# for many days. This following line prevents systemd from removing this content.
x /tmp/podman-run-*
x /tmp/containers-user-*
D! /run/podman 0700 root root
D! /var/lib/cni/networks
Copy the temporary file from /usr/lib/tmpfiles.d to /etc/tmpfiles.d/:
sudo cp -p /usr/lib/tmpfiles.d/podman.conf /etc/tmpfiles.d/
After you have done all the steps according to your configuration, the error should disappear.
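The step of turning the runRoot path into an exclusion pattern can be sketched as follows. This is a minimal illustration, not part of Podman: the function name tmpfiles_exclusion is hypothetical, and it assumes that runRoot lives under /tmp and that its first directory ends in the numeric user ID, as in the /tmp/run-6695/containers example.

```python
import re


def tmpfiles_exclusion(run_root):
    """Build an 'x' line for tmpfiles.d from a rootless runRoot path.

    Assumes runRoot lives under /tmp and that its first directory ends
    in a numeric user ID, as in /tmp/run-6695/containers.
    """
    parts = run_root.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "tmp":
        raise ValueError("runRoot is not under /tmp: %r" % run_root)
    # Replace the trailing UID with a glob so the rule covers every user.
    top = re.sub(r"\d+$", "*", parts[1])
    return "x /tmp/" + top


print(tmpfiles_exclusion("/tmp/run-6695/containers"))  # x /tmp/run-*
```

If runRoot is not under /tmp (the function raises ValueError in that case), the tmpfiles cleanup of /tmp described above should not apply to it.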
References
[1] https://github.com/containers/podman/blob/master/contrib/tmpfile/podman.conf
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1888988#c9
[3] https://github.com/containers/podman/commit/2e0a9c453b03d2a372a3ab03b9720237e93a067c
[4] https://github.com/containers/podman/pull/8241
I'm trying to set up a test cluster using etcd 2.3.7 installed from the CentOS RPM on CentOS 7.1. On Loader 1 I executed:
etcdctl member add loader2 http://10.11.51.231:2380
and received a response confirming that the operation completed successfully.
Similarly:
etcdctl member add loader3 http://10.11.51.231:2380
with all default settings, and here's what I see:
Loader 1 10.11.51.166
systemctl status etcd -ln1
etcd.service - Etcd Server
Loaded: loaded (/usr/lib/systemd/system/etcd.service; disabled)
Active: active (running) since Sun 2017-02-19 14:33:18 IST; 28min ago
Main PID: 19009 (etcd)
CGroup: /system.slice/etcd.service
└─19009 /usr/bin/etcd --name=default --data-dir=/var/lib/etcd/default.etcd --listen-client-urls=http://localhost:2379
Feb 19 15:02:03 loader3 etcd[19009]: cannot get the version of member a4803061db803edc (Get http://10.11.51.166:2380/version: dial tcp 10.11.51.166:2380: getsockopt: connection refused)
Tried to see cluster health:
etcdctl --debug cluster-health
Cluster-Endpoints: http://127.0.0.1:4001, http://127.0.0.1:2379
cURL Command: curl -X GET http://127.0.0.1:4001/v2/members
cURL Command: curl -X GET http://127.0.0.1:2379/v2/members
member ce2a822cea30bfca is unhealthy: got unhealthy result from http://localhost:2379
member da05b63349d818dc is unreachable: no available published client urls
cluster is unhealthy
Note how this ignores the two nodes added previously, but sends requests to a random port on localhost...
Loader 2 10.11.51.174
At first this machine started OK, but after I saw there was something wrong with Loader 1, I tried adding Loader 1 as a member from this machine, and now I see the same picture on this machine too. I.e. it tries to query this 4001 port, where nobody responds. On all machines:
netstat -tupln | grep etcd
tcp 0 0 127.0.0.1:7001 0.0.0.0:* LISTEN 4507/etcd
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 4507/etcd
tcp 0 0 127.0.0.1:2380 0.0.0.0:* LISTEN 4507/etcd
Nobody listens on 4001...
Loader 3 10.11.51.231
On this loader I didn't try to add new members. So it looks like this:
etcdctl --debug cluster-health
Cluster-Endpoints: http://127.0.0.1:4001, http://127.0.0.1:2379
cURL Command: curl -X GET http://127.0.0.1:4001/v2/members
cURL Command: curl -X GET http://127.0.0.1:2379/v2/members
member ce2a822cea30bfca is healthy: got healthy result from http://localhost:2379
cluster is healthy
In other words, it still sends requests to a random port, but this time it isn't bothered by the fact that nobody replied...
Below is the contents of the configuration files:
cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --name=\"${ETCD_NAME}\" --data-dir=\"${ETCD_DATA_DIR}\" --listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\""
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
And:
cat /etc/etcd/etcd.conf
# [member]
ETCD_NAME=default
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
#ETCD_WAL_DIR=""
#ETCD_SNAPSHOT_COUNT="10000"
#ETCD_HEARTBEAT_INTERVAL="100"
#ETCD_ELECTION_TIMEOUT="1000"
#ETCD_LISTEN_PEER_URLS="http://localhost:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379"
#ETCD_MAX_SNAPSHOTS="5"
#ETCD_MAX_WALS="5"
#ETCD_CORS=""
#
#[cluster]
#ETCD_INITIAL_ADVERTISE_PEER_URLS="http://localhost:2380"
# if you use different ETCD_NAME (e.g. test), set ETCD_INITIAL_CLUSTER value for this name, i.e. "test=http://..."
#ETCD_INITIAL_CLUSTER="default=http://localhost:2380"
#ETCD_INITIAL_CLUSTER_STATE="new"
#ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379"
#ETCD_DISCOVERY=""
#ETCD_DISCOVERY_SRV=""
#ETCD_DISCOVERY_FALLBACK="proxy"
#ETCD_DISCOVERY_PROXY=""
#ETCD_STRICT_RECONFIG_CHECK="false"
#
#[proxy]
#ETCD_PROXY="off"
#ETCD_PROXY_FAILURE_WAIT="5000"
#ETCD_PROXY_REFRESH_INTERVAL="30000"
#ETCD_PROXY_DIAL_TIMEOUT="1000"
#ETCD_PROXY_WRITE_TIMEOUT="5000"
#ETCD_PROXY_READ_TIMEOUT="0"
#
#[security]
#ETCD_CERT_FILE=""
#ETCD_KEY_FILE=""
#ETCD_CLIENT_CERT_AUTH="false"
#ETCD_TRUSTED_CA_FILE=""
#ETCD_PEER_CERT_FILE=""
#ETCD_PEER_KEY_FILE=""
#ETCD_PEER_CLIENT_CERT_AUTH="false"
#ETCD_PEER_TRUSTED_CA_FILE=""
#
#[logging]
#ETCD_DEBUG="false"
# examples for -log-package-levels etcdserver=WARNING,security=DEBUG
#ETCD_LOG_PACKAGE_LEVELS=""
#
#[profiling]
#ETCD_ENABLE_PPROF="false"
So... what is going on? The error messages given by etcd are the typical mindless nonsense produced by Go built-ins. The HTTP server that etcd uses is again, the Go built-in junk, that produces non-standard and absolutely worthless replies. So I cannot understand what was (if at all) misconfigured / missing.
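One thing the netstat output above does show is that every etcd socket is bound to 127.0.0.1, including the peer port 2380 that the other members are told to contact; a socket bound only to a loopback address cannot accept connections from other hosts. The following is a minimal Python sketch for flagging such listeners. The helper name loopback_only_listeners is hypothetical, and the sample rows are copied from the netstat output in the question.

```python
import ipaddress


def loopback_only_listeners(netstat_lines):
    """Return local addresses from netstat rows that are bound to a loopback IP."""
    found = []
    for line in netstat_lines:
        fields = line.split()
        # netstat -tupln rows: proto, recv-q, send-q, local address, ...
        if len(fields) < 4 or not fields[0].startswith("tcp"):
            continue
        host, _, _port = fields[3].rpartition(":")
        try:
            if ipaddress.ip_address(host).is_loopback:
                found.append(fields[3])
        except ValueError:
            continue  # skip anything that is not a plain IP literal
    return found


sample = [
    "tcp 0 0 127.0.0.1:7001 0.0.0.0:* LISTEN 4507/etcd",
    "tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 4507/etcd",
    "tcp 0 0 127.0.0.1:2380 0.0.0.0:* LISTEN 4507/etcd",
]

print(loopback_only_listeners(sample))
# ['127.0.0.1:7001', '127.0.0.1:2379', '127.0.0.1:2380']
```

This would be consistent with the "connection refused" logged for http://10.11.51.166:2380/version, though it is only a diagnostic aid and not a confirmed explanation of the whole problem.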
I am able to connect to Postgres from the terminal as well as with the python manage.py dbshell command.
But when I try to connect from Apache, I get the following error.
Error : OperationalError: could not connect to server: Permission denied
Is the server running on host "192.168.1.10" and accepting
TCP/IP connections on port 5432?
The listen address in my Postgres configuration file is 192.168.1.10.
pg_hba.conf allows: host all all 192.168.0.0/24 trust
I have also turned on the SELinux boolean httpd_can_network_connect_db.
The netstat output shows the port listening on 192.168.1.10:5432.
The database socket files are stored in the /tmp directory:
srwxrwxrwx. 1 postgres postgres 0 Dec 18 07:40 .s.PGSQL.5432
-rw-------. 1 postgres postgres 50 Dec 18 07:40 .s.PGSQL.5432.lock
It turned out that I had enabled the SELinux boolean httpd_can_network_connect_db on the database server instead of the web server.
The issue was solved after enabling httpd_can_network_connect_db on the web server.
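Since the mistake was checking the boolean on the wrong machine, it can help to read the value on the web server itself. getsebool prints lines of the form "name --> on" or "name --> off", and setsebool -P httpd_can_network_connect_db on makes the change persistent. Below is a minimal Python sketch for parsing that output; the function name parse_getsebool is hypothetical and the sample text is illustrative, not captured from the original machines.

```python
def parse_getsebool(output):
    """Parse 'name --> on|off' lines, as printed by getsebool, into a dict."""
    state = {}
    for line in output.splitlines():
        name, sep, value = line.partition("-->")
        if sep:
            state[name.strip()] = value.strip() == "on"
    return state


# Illustrative output, not captured from the original machines.
web_server = "httpd_can_network_connect_db --> off\nhttpd_can_network_connect --> off"

print(parse_getsebool(web_server)["httpd_can_network_connect_db"])  # False
```

A False value on the host that runs Apache would match the "Permission denied" error above, even when the same boolean is on at the database server.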
We are encountering clock drift issues with our MongoDB replica set running on AWS. This seemed to start only recently, after we added additional data to the set; before then we did not really notice the issue unless the system was under heavy load. The following error is logged sporadically in the mongod.log file even though the system is not under load.
To test this we have isolated a set of machines with the same dataset that are not in use by our web application, but the error still occurs:
2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target
because current sync target's most recent OpTime is Dec 12 13:32:42:c
which is more than 30 seconds behind member mongo1:27017 whose most
recent OpTime is 1418391230
The timestamps above show that one of the MongoDB replica set members is over a minute behind. The worst we have seen is 12 minutes out of sync.
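The gap between the two OpTimes in the log line can be worked out directly. This is a minimal Python sketch using only the values quoted above; the year and the UTC timezone for the first OpTime are assumptions taken from the log timestamp (2014-12-12T...+0000), and the trailing ":c" counter is ignored.

```python
from datetime import datetime, timezone

# Values copied from the log line above. The year and UTC are assumptions
# taken from the log timestamp (2014-12-12T13:33:51.333+0000).
sync_target_optime = datetime(2014, 12, 12, 13, 32, 42, tzinfo=timezone.utc)
other_member_optime = datetime.fromtimestamp(1418391230, tz=timezone.utc)

lag_seconds = int((other_member_optime - sync_target_optime).total_seconds())

print(other_member_optime.isoformat())  # 2014-12-12T13:33:50+00:00
print(lag_seconds)  # 68
```

So the sync target's last applied operation is 68 seconds older than the one on mongo1:27017, which is what trips the 30-second threshold mentioned in the message.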
This error in turn causes replication lag and we receive the notification about this from the Mongo Monitoring Service although it does correct itself.
The setup is 3 x r3.xlarge AWS Linux instances, one in each availability zone of the EU-West-1 region. The machines were set up using the Mongo-recommended settings, with a RAID array and the CloudFormation scripts provided by Mongo. The data is around 4GB in size.
We think the issue is related to NTP sync. By default, on the AWS Linux Amazon Machine Image the ntpd service is configured to use a pool of AWS NTP servers hosted on www.pool.ntp.org.
To try to rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync to. The issue still occurred, so we changed the maxpoll and minpoll settings for the ntpd service on the mongo machines to sync the time every 16 seconds from the NTP server, but the error is still occurring.
We increased the MongoDB OpLog size as well to see if that would make any difference but it didn’t.
Does anyone else encounter this type of issue? Is there something we are missing?
Cheers,
Colin.
ps -ef |grep ntp;
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv;
remote refid st t when poll reach delay offset jitter
==============================================================================
*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5#1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
After upgrading to MongoDB 3 using the WiredTiger storage engine we do not see this issue any more.
I am running a 6 node cluster of cassandra 1.2 on an Amazon Web Service VPC with Oracle's 64-bit JVM version 1.7.0_10.
When I'm logged on to one of the nodes (ex. 10.0.12.200) I can run nodetool -h 10.0.12.200 status just fine.
However, if I try to use another ip address in the cluster (10.0.32.153) from that same terminal I get Failed to connect to '10.0.32.153:7199: Connection refused'.
On the 10.0.32.153 node I am trying to connect to I've made the following checks.
From 10.0.12.200 I can run telnet 10.0.32.153 7199 and I get a connection, so it doesn't appear to be a security group/firewall issue to port 7199.
On 10.0.32.153 if I run netstat -ant|grep 7199 I see
tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN
so cassandra does appear to be listening on the port
The cassandra-env.sh file on 10.0.32.153 has all of the JVM_OPTS for jmx active
-Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
The only suggestion I found while searching for a solution, and it is a shot in the dark, is to set the following:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.32.153"
But when I do this I don't even get a response. It just hangs.
Any guidance would be greatly appreciated.
The issue did end up being a firewall/security group issue. While it is true that the JMX port 7199 is used, apparently other ports are chosen randomly for RMI (see the question "Cassandra port usage - how are the ports used?").
So the solution is to open up the firewalls and then configure cassandra-env.sh to include:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<ip>"
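Because a successful telnet to 7199 says nothing about the second, randomly chosen RMI port, it can be useful to test several ports from the client machine at once. The following is a minimal Python sketch using only the standard socket module; the function names port_open and unreachable_ports are hypothetical, and the RMI port to test has to be looked up on the target node since it is not known in advance.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_ports(host, ports, timeout=2.0):
    """Return the ports in `ports` that refuse the connection or time out."""
    return [p for p in ports if not port_open(host, p, timeout)]
```

For example, unreachable_ports("10.0.32.153", [7199, rmi_port]) run from 10.0.12.200, where rmi_port stands for whatever port the JVM actually picked, would list the ports still blocked by the security group.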