Cannot configure CoreOS cluster properly

I am stuck configuring a CoreOS cluster.
The cloud-config files for the two VMs (core001 first, then core002) are:
#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2EAAAA...
hostname: core001
coreos:
  etcd2:
    name: core001
    discovery: https://discovery.etcd.io/86567bce070bd5316bdc9357ee2600de
    # private networking need to use $public_ipv4:
    advertise-client-urls: http://192.168.128.156:2379,http://192.168.128.156:4001
    initial-advertise-peer-urls: http://192.168.128.156:2380
    # listen on the official ports 2379, 2380 and one legacy port 4001:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://192.168.128.156:2380
  fleet:
    public-ip: 192.168.128.156
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
write_files:
  - path: /etc/systemd/network/enp0s8.network
    permissions: 0644
    owner: root
    content: |
      [Match]
      Name=enp0s8

      [Network]
      Address=192.168.128.156/22
      Gateway=192.168.128.1
users:
  - name: test
    passwd: $1$yxV9YDKT$s.fAj5dlFyrPwrH0xAQJy/
    groups:
      - sudo
      - docker

#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1y...
hostname: core002
coreos:
  etcd2:
    name: core001
    discovery: https://discovery.etcd.io/86567bce070bd5316bdc9357ee2600de
    # private networking need to use $public_ipv4:
    advertise-client-urls: http://192.168.128.157:2379,http://192.168.128.157:4001
    initial-advertise-peer-urls: http://192.168.128.157:2380
    # listen on the official ports 2379, 2380 and one legacy port 4001:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://192.168.128.157:2380
  fleet:
    public-ip: 192.168.128.157
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
write_files:
  - path: /etc/systemd/network/enp0s8.network
    permissions: 0644
    owner: root
    content: |
      [Match]
      Name=enp0s8

      [Network]
      Address=192.168.128.157/22
      Gateway=192.168.128.1
users:
  - name: test
    passwd: $1$yxV9YDKT$s.fAj5dlFyrPwrH0xAQJy/
    groups:
      - sudo
      - docker
I have installed both nodes successfully, but when I run:
core@core001 ~ $ fleetctl list-machines
MACHINE IP METADATA
cd08747e... 192.168.128.156 -
I see only one machine. The same happens on the second node:
core@core002 ~ $ fleetctl list-machines
MACHINE IP METADATA
753caf1b... 192.168.128.157 -
I suspect there may be something wrong with etcd, but after going through tons of Google references I haven't found anything useful for this case.
Could you please help me with this issue?
I'm just starting to study CoreOS, so some aspects are still unclear to me.
Thanks in advance

You have created two separate single-node etcd clusters. The etcd logs should hint at why; my guess is that reusing the member name core001 on both nodes contributed.
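If it helps, a quick way to confirm this (assuming the default CoreOS tooling is available on each node) is to look at the etcd2 logs and the member list:
# why did this member form/join its own cluster?
journalctl -u etcd2.service --no-pager | tail -n 50
# which members does etcd actually know about? (etcdctl v2 API)
etcdctl member list
etcdctl cluster-health
If each node only lists itself, giving the second machine its own name (for example core002) in its etcd2 section and re-provisioning with a fresh discovery token would be a reasonable next step.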

Related

Grafana Loki with promtail: no JSON parsing

I have two Grafana Loki installations, both done with Helm from the official repository.
Both are configured exactly the same (except for DNS).
The only difference is that one runs on Azure and one on our own ESXi.
The problem I have is the log file parsing. The installation on Azure always seems to parse the log files with the - cri: {} setting and not with - docker: {}.
A quick look inside the promtail pods shows the - docker: {} setting in promtail.yaml, but I always get this output:
2023-01-16 10:39:15
2023-01-16T09:39:15.604384089Z stdout F {"level":50,"time":1673861955603,"service
On our own ESXi I get the correct:
2023-01-13 16:58:18
{"level":50,"time":1673625498068,"service"
From what I read, the stdout F prefix comes from - cri: {} parsing, promtail's default.
Any idea why this happens? My installation YAML is:
#helm upgrade --install loki --namespace=monitoring grafana/loki-stack -f value_mus.yaml
grafana:
  enabled: true
  admin:
    existingSecret: grafana-admin-credentials
  sidecar:
    datasources:
      enabled: true
      maxLines: 1000
  image:
    tag: latest
  persistence:
    enabled: true
    size: 10Gi
    storageClassName: managed-premium
    accessModes:
      - ReadWriteOnce
  grafana.ini:
    users:
      default_theme: light
    server:
      domain: xxx
    smtp:
      enabled: true
      from_address: xxx
      from_name: Grafana Notification
      host: xxx
      user: xxx
      password: xxx
      skip_verify: false
      startTLS_policy:
promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - docker: {}
Any help will be welcome.
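One way to narrow this down (a suggestion on my side, not from the original post) is to compare the values Helm actually applied with the promtail config that is really running on the Azure cluster; the config path below may differ by chart version:
# values applied to the release
helm get values loki -n monitoring
# pipeline stages in the running promtail config (config path may vary)
kubectl -n monitoring exec <promtail-pod> -- cat /etc/promtail/promtail.yaml | grep -A2 pipeline_stages
If the running config still shows cri, the override in value_mus.yaml is probably not reaching the promtail subchart.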

How to apply changes to a Linux user's group assignments inside a local Ansible playbook?

I'm trying to install Docker and create a Docker container within a local Ansible playbook containing multiple plays, adding the user to the docker group in between:
- hosts: localhost
  connection: local
  become: yes
  gather_facts: no
  tasks:
    - name: install docker
      ansible.builtin.apt:
        update_cache: yes
        pkg:
          - docker.io
          - python3-docker
    - name: Add current user to docker group
      ansible.builtin.user:
        name: "{{ lookup('env', 'USER') }}"
        append: yes
        groups: docker
    - name: Ensure that docker service is running
      ansible.builtin.service:
        name: docker
        state: started

- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Create docker container
      community.docker.docker_container:
        image: ...
        name: ...
When executing this playbook with ansible-playbook I get a permission denied error at the "Create docker container" task. Rebooting and running the playbook again solves the error.
I have tried manually executing some of the commands suggested here and then running the playbook again, which works, but I'd like to do everything from within the playbook.
Adding a task like
- name: allow user changes to take effect
  ansible.builtin.shell:
    cmd: exec sg docker newgrp `id -gn`
does not work.
How can I refresh the Linux user group assignments from within the playbook?
I'm on Ubuntu 18.04.
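For what it's worth, a common workaround (my own sketch, not from the original question) is to run the second play with become: yes, so the Docker socket is reachable as root and no group refresh is needed within the same run:
# hypothetical sketch: elevate only the container play
- hosts: localhost
  connection: local
  become: yes          # root can use /var/run/docker.sock directly
  gather_facts: no
  tasks:
    - name: Create docker container
      community.docker.docker_container:
        image: ...     # placeholders kept from the question
        name: ...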

Azure Containers deployment - "Operation failed with status 200: Resource State Failed"

We are trying to create a container in Azure Container Instances from a prepared YAML file. From the machine where we execute the az container create command, we can log in successfully to our private registry (e.g. fa-docker-snapshot-local.docker.comp.dev on JFrog Artifactory) after entering the password, and we can docker pull the image as well:
docker login fa-docker-snapshot-local.docker.comp.dev -u svc-faselect
Login succeeded
So we can pull it successfully, and the image path is the same as when doing a manual docker pull:
image: fa-docker-snapshot-local.docker.comp.dev/fa/ads:test1
We have a YAML file for the deployment and try to create the container using the az command from the SAME server. In the YAML file we have set up the same registry information (server, username and password) and the same image:
az container create --resource-group FRONT-SELECT-NA2 --file ads-azure.yaml
When we execute this command, it runs for 30 minutes and then displays the message: "Deployment failed. Operation failed with status 200: Resource State Failed"
Full YAML:
apiVersion: '2019-12-01'
location: eastus2
name: ads-test-group
properties:
  containers:
    - name: front-arena-ads-test
      properties:
        image: fa-docker-snapshot-local.docker.comp.dev/fa/ads:test1
        environmentVariables:
          - name: 'DBTYPE'
            value: 'odbc'
        command:
          - /opt/front/arena/sbin/ads_start
          - ads_start
          - '-unicode'
          - '-db_server test01'
          - '-db_name HEDGE2_ADM_Test1'
          - '-db_user sqldbadmin'
          - '-db_password pass'
          - '-db_client_user HEDGE2_ADM_Test1'
          - '-db_client_password Password55'
        ports:
          - port: 9000
            protocol: TCP
        resources:
          requests:
            cpu: 1.0
            memoryInGB: 4
        volumeMounts:
          - mountPath: /opt/front/arena/host
            name: ads-filesharevolume
  imageRegistryCredentials: # Credentials to pull a private image
    - server: fa-docker-snapshot-local.docker.comp.dev
      username: svcacct-faselect
      password: test
  ipAddress:
    type: Private
    ports:
      - protocol: tcp
        port: '9000'
  volumes:
    - name: ads-filesharevolume
      azureFile:
        sharename: azurecontainershare
        storageAccountName: frontarenastorage
        storageAccountKey: kdUDK97MEB308N=
  networkProfile:
    id: /subscriptions/746feu-1537-1007-b705-0f895fc0f7ea/resourceGroups/SELECT-NA2/providers/Microsoft.Network/networkProfiles/fa-aci-test-networkProfile
  osType: Linux
  restartPolicy: Always
tags: null
type: Microsoft.ContainerInstance/containerGroups
Can you please help us understand why this error occurs?
Thank you.
As far as I can tell there is nothing wrong with your YAML file, so I can only give you some possible reasons.
Make sure the configuration values are all right: the server URL, username and password, as well as the image name and tag.
Change the port from '9000' to 9000, that is, remove the quotes.
Take a look at the Note: maybe the volume mount makes the container crash. In that case you need to mount the file share to a new folder, i.e. a folder that does not already exist in the image.
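To get more detail than the generic status 200 message, it may also help (an extra suggestion on my side) to inspect the instance view and logs of the container group after the deployment attempt:
# events and per-container state for the group
az container show --resource-group FRONT-SELECT-NA2 --name ads-test-group --output json
# logs of the container, if it ever started
az container logs --resource-group FRONT-SELECT-NA2 --name ads-test-group --container-name front-arena-ads-test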

Kubernetes securityContext runAsNonRoot not working

I am testing with securityContext but I can't start a pod when I set runAsNonRoot to true.
I use Vagrant to deploy a master and two minions and SSH into the host machine as the user abdelghani:
id $USER
uid=1001(abdelghani) gid=1001(abdelghani) groups=1001(abdelghani),27(sudo)
Cluster information:
Kubernetes version: 4.4.0-185-generic
Cloud being used: (put bare-metal if not on a public cloud)
Installation method: manual
Host OS: ubuntu16.04.6
CNI and version:
CRI and version:
apiVersion: v1
kind: Pod
metadata:
  name: buggypod
spec:
  containers:
    - name: container
      image: nginx
      securityContext:
        runAsNonRoot: true
I do:
kubectl apply -f pod.yml
it says the pod buggypod was created, but when I check with:
kubectl get pods
the pod’s status is CreateContainerConfigError
What am I doing wrong?
I tried to run the pod based on your requirements. The reason it fails is that Nginx needs to modify some configuration under /etc/ that is owned by root, and when you set runAsNonRoot it cannot edit the default Nginx config.
This is the error you actually get when you run it:
10-listen-on-ipv6-by-default.sh: error: can not modify /etc/nginx/conf.d/default.conf (read-only file system?)
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2020/08/13 17:28:55 [warn] 1#1: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
2020/08/13 17:28:55 [emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
The spec I ran.
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: buggypod
  name: buggypod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
    - image: nginx
      name: buggypod
      resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
My suggestion is to build a custom Nginx image with a Dockerfile that also creates a user and grants that user permissions on /var/cache/nginx, /etc/nginx/conf.d and /var/log/nginx, so that you can run the container as non-root.
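As a side note (my own sketch, assuming Docker is available on a host), the same failure can be reproduced outside Kubernetes by forcing the stock image to run as a non-root UID, which shows the restriction comes from the image rather than from the cluster:
# runs the official nginx image as UID 1001; it should exit with the same
# "mkdir() /var/cache/nginx/client_temp ... Permission denied" error
docker run --rm --user 1001 nginx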
The Nginx service expects read and write permissions on its configuration path (/etc/nginx), and by default a non-root user does not have that access; that is why it is failing.
You only set runAsNonRoot, but that does not guarantee the container will start the service as user 1001. Try setting runAsUser explicitly to 1001 as below; this should resolve your issue.
apiVersion: v1
kind: Pod
metadata:
  name: buggypod
spec:
  containers:
    - name: container
      image: nginx
      securityContext:
        runAsUser: 1001

Error: release helm-kibana-security failed: timed out waiting for the condition

I have been using Helm charts to install Elasticsearch and Kibana into Kubernetes.
Using the default configuration everything went OK, but I want to enable security on both Elasticsearch and Kibana.
I did what is recommended in the documentation; security was enabled for Elasticsearch, but I have a problem upgrading Kibana with the security configuration. It gives me this error:
Error: release helm-kibana-security failed: timed out waiting for the condition
This happens when I run make (from /kibana/examples/security).
I even tried to install it directly without using the Makefile:
helm install --wait --timeout=600 --values ./security.yml --name helm-kibana-security ../../
but I am having the same issue. Can anyone help me please?
"failed: timed out waiting for the condition"
This message occurs when you install a release with the --wait flag but the pods are unable to start for some reason.
The problem is most likely in "./security.yml".
Try running the commands below to debug the issue:
kubectl describe pod kibana-pod-name
kubectl logs kibana-pod-name
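If the Kibana pod never even shows up, the namespace events can help too (an extra suggestion on my part, not from the original answer; the label selector may differ by chart version):
# recent events, e.g. failed mounts of the secrets referenced in security.yml
kubectl get events --sort-by=.metadata.creationTimestamp
# find the exact pod name generated by the chart
kubectl get pods -l app=kibana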
This is the security.yml file:
---
elasticsearchHosts: "https://security-master:9200"

extraEnvs:
  - name: 'ELASTICSEARCH_USERNAME'
    valueFrom:
      secretKeyRef:
        name: elastic-credentials
        key: username
  - name: 'ELASTICSEARCH_PASSWORD'
    valueFrom:
      secretKeyRef:
        name: elastic-credentials
        key: password

kibanaConfig:
  kibana.yml: |
    server.ssl:
      enabled: true
      key: /usr/share/kibana/config/certs/kibana/kibana.key
      certificate: /usr/share/kibana/config/certs/kibana/kibana.crt
    xpack.security.encryptionKey: something_at_least_32_characters
    elasticsearch.ssl:
      certificateAuthorities: /usr/share/kibana/config/certs/elastic-certificate.pem
      verificationMode: certificate

protocol: https

secretMounts:
  - name: elastic-certificate-pem
    secretName: elastic-certificate-pem
    path: /usr/share/kibana/config/certs
  - name: kibana-certificates
    secretName: kibana-certificates
    path: /usr/share/kibana/config/certs/kibana
