Generating Certificates using Certbot (as a CronJob) within Kubernetes - azure

I've unsuccessfully been trying to create a simple "run" command using kubectl whereby the container is started and passed in the (partial) arguments to initially create my certificates (which I will initially do manually through PowerShell) and could do with some input from the community.
My Environment:
(Local) Windows 10 with PowerShell
(Remote) Azure Kubernetes Cluster
My efforts consist of two key commands, the first being the creation of the overrides (in JSON) for the container (primarily so I can mount the Azure File Shares where I want certificates to be stored):
$override= '{ "spec": { "template": { "spec": { "containers": [ { "name": "certbot", "image": "certbot/certbot", "stdin": true, "tty": true, "volumeMounts": [{ "name": "certdata", "mountPath": "/etc/letsencrypt" }] } ], "volumes": [{ "name": "certdata", "persistentVolumeClaim": { "claimName": "azure-fileshare" } }] } } } }' | ConvertTo-Json
The second is then the kubectl run command which would be used as the basis for the CronJob (creating the CronJob itself is my next task once I've gotten this working correctly):
kubectl run -i --rm --tty certbot --namespace=prod --overrides=$override --image=certbot/certbot -- certonly --manual
I've been trying a number of variations, and this approach seems the cleanest. However, I'm currently getting the following response from Kubernetes:
Error attaching, falling back to logs: unable to upgrade connection: container certbot not found in pod certbot-9df67bd65-w96rq_prod
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Certbot doesn't know how to automatically configure the web server on this system. However, it can still get a certificate for you. Please run "certbot certonly" to do so. You'll need to manually configure your web server to use the resulting certificate.
The latter part of the warning indicates that certbot is not receiving any of the arguments (in this case "certonly" and "--manual"), but I can't figure out quite where I'm going wrong. I feel like I've sanity checked the commands with both the Kubernetes & certbot docs and can't see any obvious issues.
Can anyone point out the gremlin here?
Note: I've successfully tested this approach using Docker locally, and am now trying to recreate this within Azure.

You don't need to build a new image from the existing image to do that; just create a pod like this:
apiVersion: v1
kind: Pod
metadata:
  name: certbot
spec:
  containers:
  - name: certbot
    image: certbot/certbot
    command: ["/bin/sh"]  # this overrides the entrypoint
  restartPolicy: Never
https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/
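Separately from the Pod manifest above, one possible explanation for the lost arguments, offered as an assumption rather than a confirmed diagnosis: kubectl applies --overrides to the object it generates as a JSON merge patch, and a merge patch replaces arrays wholesale. The containers array in the override would then replace the generated one, including the args built from everything after "--". The Go sketch below illustrates that behaviour on a simplified, hypothetical Pod-shaped object; mergePatch and applyOverride are illustrative helpers, not kubectl code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies an RFC 7386 JSON merge patch to target.
// Objects are merged key by key; any other value, arrays included,
// replaces the target value entirely.
func mergePatch(target, patch any) any {
	patchObj, ok := patch.(map[string]any)
	if !ok {
		return patch
	}
	targetObj, ok := target.(map[string]any)
	if !ok {
		targetObj = map[string]any{}
	}
	for key, value := range patchObj {
		if value == nil {
			delete(targetObj, key)
			continue
		}
		targetObj[key] = mergePatch(targetObj[key], value)
	}
	return targetObj
}

// applyOverride decodes both documents, merges them, and re-encodes the result.
func applyOverride(generated, override string) (string, error) {
	var g, o any
	if err := json.Unmarshal([]byte(generated), &g); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(override), &o); err != nil {
		return "", err
	}
	out, err := json.Marshal(mergePatch(g, o))
	return string(out), err
}

// Hypothetical, simplified stand-in for the object kubectl generates.
const generatedPod = `{"spec":{"containers":[{"name":"certbot","image":"certbot/certbot","args":["certonly","--manual"]}]}}`

// Override shaped like the one in the question: no args on the container.
const overrideNoArgs = `{"spec":{"containers":[{"name":"certbot","image":"certbot/certbot","stdin":true,"tty":true}]}}`

// Same override with the certbot arguments repeated inside it.
const overrideWithArgs = `{"spec":{"containers":[{"name":"certbot","image":"certbot/certbot","stdin":true,"tty":true,"args":["certonly","--manual"]}]}}`

func main() {
	for _, override := range []string{overrideNoArgs, overrideWithArgs} {
		merged, err := applyOverride(generatedPod, override)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(merged)
	}
}
```

If this is what happens, repeating the arguments inside the override, as "args": ["certonly", "--manual"] on the certbot container, should carry them through to the container.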

Related

Authentication failed: ldap operation failed: unable to retrieve user bind DN

I'm really stuck here. I inherited a system which stores secrets in a Hashicorp vault, and I'm getting this error, Authentication failed: ldap operation failed: unable to retrieve user bind DN
I am not sure how to resolve this issue, and have been Googling for hours, and trying a lot of things.
I did see the post at ref. [A], but it isn't helpful.
Also the post at ref [B] gives some information about setting the binddn, but in the classic way to frustrate a new user, doesn't say where, how, or give any examples.
Hashicorp Vault v1.6.x
The vault is running in a Docker container on an AWS EC2 instance.
I have the .pem file and am able to ssh into the EC2 instance.
I am able to get a shell in the Docker container with root privileges, like so:
docker exec -it 123abc123abc sh
On the container, some vault commands work, e.g.:
vault version
--> Vault v1.6.0 (123asdf1234adsf1234adsf1234adsf13w4radsf1234asdff)
It is using an LDAP configuration.
When trying to retrieve the config and other info, I get this message:
"* missing client token"
How to proceed?
I'm not an expert with this, and would appreciate clear, full, command-line examples.
Thanks for your help.
Sincerely,
Keith
DOCKER COMPOSE FILE
$ cat docker-compose.yml
version: '3'
services:
  vault:
    image: vault:1.6.0
    cap_add:
      - IPC_LOCK
    environment:
      - VAULT_ADDR=http://127.0.0.1:8200
    command: vault server -config=/vault/config/config.json
    ports:
      - 80:8200
    volumes:
      - vault-data:/vault
      - ./config.json:/vault/config/config.json
volumes:
  vault-data:
ubuntu@ip-192-0-2-1:/home/tarjan-docker
VAULT CONFIG
/vault/config # cat config.json
{
  "backend": {
    "file": {
      "path": "/vault/data"
    }
  },
  "listener": {
    "tcp": {
      "address": "0.0.0.0:8200",
      "tls_disable": 1
    }
  },
  "default_lease_ttl": "30m",
  "max_lease_ttl": "30m",
  "log_level": "info",
  "ui": true
}
A. https://discuss.hashicorp.com/t/ldap-operation-failed-unable-to-retrieve-user-bind-dn/12926
B. https://support.hashicorp.com/hc/en-us/articles/5289574376083-Receiving-ldap-operation-failed-failed-to-bind-as-user-error-when-logging-in-via-LDAP-authentication-method
https://discuss.hashicorp.com/t/authentication-failed-ldap-operation-failed-unable-to-retrieve-user-bind-dn/50123

containerd error "failed to find user by uid" when creating ejbca docker container on azure

When I try to create an Azure container instance for EJBCA-ce I get an error and cannot see any logs.
I expect the following result :
But I get the following error :
Failed to start container my-azure-container-resource-name, Error response: to create containerd task: failed to create container e9e48a_________ffba97: guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown
Some context:
I run the container on an Azure Container Instance.
I tried:
- from an ARM template
- from the Azure Portal
- with a file share mounted
- with the database env variables
- without any env variables
It runs fine locally using the same env variables (database configuration).
It used to run with the same configuration a couple of weeks ago.
Here are some logs I get when I attach the container group from az cli.
(count: 1) (last timestamp: 2020-11-03 16:04:32+00:00) pulling image "primekey/ejbca-ce:6.15.2.3"
(count: 1) (last timestamp: 2020-11-03 16:04:37+00:00) Successfully pulled image "primekey/ejbca-ce:6.15.2.3"
(count: 28) (last timestamp: 2020-11-03 16:27:52+00:00) Error: Failed to start container aci-pulsy-ccm-ejbca-snd, Error response: to create containerd task: failed to create container e9e48a06807fba124dc29633dab10f6229fdc5583a95eb2b79467fe7cdffba97: guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown
An extract of the dockerfile from dockerhub
I suspect the issue might be related to the commands USER 0 and USER 10001 we found several times in the dockerfile.
COPY dir:89ead00b20d79e0110fefa4ac30a827722309baa7d7d74bf99910b35c665d200 in /
/bin/sh -c rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CMD ["/bin/bash"]
USER 0
COPY dir:893e424bc63d1872ee580dfed4125a0bef1fa452b8ae89aa267d83063ce36025 in /opt/primekey
COPY dir:756f0fe274b13cf418a2e3222e3f6c2e676b174f747ac059a95711db0097f283 in /licenses
USER 10001
CMD ["/opt/primekey/wildfly-14.0.1.Final/bin/standalone.sh" "-b" "0.0.0.0"
MAINTAINER PrimeKey Solutions AB
ARG releaseTag
ARG releaseEdition
ARM template
{
  "type": "Microsoft.ContainerInstance/containerGroups",
  "apiVersion": "2019-12-01",
  "name": "[variables('ejbcaContainerGroupName')]",
  "location": "[parameters('location')]",
  "tags": "[variables('tags')]",
  "dependsOn": [
    "[resourceId('Microsoft.DBforMariaDB/servers', variables('ejbcaMariadbServerName'))]",
    "[resourceId('Microsoft.DBforMariaDB/servers/databases', variables('ejbcaMariadbServerName'), variables('ejbcaMariadbDatabaseName'))]"
  ],
  "properties": {
    "sku": "Standard",
    "containers": [
      {
        "name": "[variables('ejbcaContainerName')]",
        "properties": {
          "image": "primekey/ejbca-ce:6.15.2.3",
          "ports": [
            {
              "protocol": "TCP",
              "port": 443
            },
            {
              "protocol": "TCP",
              "port": 8443
            }
          ],
          "environmentVariables": [
            {
              "name": "DATABASE_USER",
              "value": "[concat(parameters('mariadbUser'),'@', variables('ejbcaMariadbServerName'))]"
            },
            {
              "name": "DATABASE_JDBC_URL",
              "value": "[variables('ejbcaEnvVariableJdbcUrl')]"
            },
            {
              "name": "DATABASE_PASSWORD",
              "secureValue": "[parameters('mariadbAdminPassword')]"
            }
          ],
          "resources": {
            "requests": {
              "memoryInGB": 1.5,
              "cpu": 2
            }
          },
          "volumeMounts": [
            {
              "name": "certificates",
              "mountPath": "/mnt/external/secrets"
            }
          ]
        }
      }
    ],
    "initContainers": [],
    "restartPolicy": "OnFailure",
    "ipAddress": {
      "ports": [
        {
          "protocol": "TCP",
          "port": 443
        },
        {
          "protocol": "TCP",
          "port": 8443
        }
      ],
      "type": "Public",
      "dnsNameLabel": "[parameters('ejbcaContainerGroupDNSLabel')]"
    },
    "osType": "Linux",
    "volumes": [
      {
        "name": "certificates",
        "azureFile": {
          "shareName": "[parameters('ejbcaCertsFileShareName')]",
          "storageAccountName": "[parameters('ejbcaStorageAccountName')]",
          "storageAccountKey": "[parameters('ejbcaStorageAccountKey')]"
        }
      }
    ]
  }
}
It runs fine on my local machine on linux (ubuntu 20.04)
docker run -it --rm -p 8080:8080 -p 8443:8443 -h localhost -e DATABASE_USER="mymaridbuser@my-db" -e DATABASE_JDBC_URL="jdbc:mariadb://my-azure-domain.mariadb.database.azure.com:3306/ejbca?useSSL=true" -e DATABASE_PASSWORD="my-pwd" primekey/ejbca-ce:6.15.2.3
In the EJBCA-ce container image, I think they are trying to provide a user other than root to run the EJBCA server. According to the Docker documentation:
The USER instruction sets the user name (or UID) and optionally the user group (or GID) to use when running the image and for any RUN, CMD and ENTRYPOINT instructions that follow it in the Dockerfile
In the Dockerfile they reference two users, root, corresponding to UID 0, and another one, with UID 10001.
Typically, in Linux and UNIX systems, UIDs are organized in ranges. The details depend largely on the operating system and its user management practices, but it is very likely that the first user account created on a Linux system will be assigned UID 1001 or 10001, as in this case. Please see, for instance, the UID entry in Wikipedia or this article.
AFAIK, the user indicated by USER does not need to exist in your container for it to run correctly: in fact, if you run it locally, it starts without any problem.
The user with UID 10001 is actually set up in your container by the script run by the CMD defined in the Dockerfile, /opt/primekey/bin/start.sh, in this code fragment:
if ! whoami &> /dev/null; then
  if [ -w /etc/passwd ]; then
    echo "${APPLICATION_NAME}:x:$(id -u):0:${APPLICATION_NAME} user:/opt:/sbin/nologin" >> /etc/passwd
  fi
fi
Please, be aware that APPLICATION_NAME in this context takes the value ejbca and that the user which runs this script, as indicated in the Dockerfile, is 10001. That will be the value provided by the command id -u in this code.
You can verify it if you run your container locally:
docker run -it -p 8080:8080 -p 8443:8443 -h localhost primekey/ejbca-ce:6.15.2.3
And initiate bash into it:
docker exec -it container_name /bin/bash
If you run whoami, it will tell you ejbca.
If you run id it will give you the following output:
uid=10001(ejbca) gid=0(root) groups=0(root)
You can verify the user existence in the /etc/passwd as well:
bash-4.2$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
ejbca:x:10001:0:ejbca user:/opt:/sbin/nologin
Pierre did not get this output because he ran the container while overriding the provided CMD and, as a consequence, did not execute the start.sh script responsible for creating the user, as mentioned above.
For some reason, and this is where my knowledge fails me, when Azure tries to run your container, it fails because the USER 10001 specified in the Dockerfile does not exist.
I think it could be related to the use of containerd instead of Docker.
The error reported by Azure seems to be related to the Microsoft project opengcs.
They say about the project:
Open Guest Compute Service is a Linux open source project to further the development of a production quality implementation of Linux Hyper-V container on Windows (LCOW). It's designed to run inside a custom Linux OS for supporting Linux container payload.
And:
The focus of LCOW v2 as a replacement of LCOW v1 is through the coordination and work that has gone into containerd/containerd and its Runtime V2 interface. To see our containerd hostside shim please look here Microsoft/hcsshim/cmd/containerd-shim-runhcs-v1.
The error you see in the console is raised by the spec.go file that you can find in their code base, when they are trying to establish the user on behalf of whom the container process should be run:
func setUserID(spec *oci.Spec, uid int) error {
	u, err := getUser(spec, func(u user.User) bool {
		return u.Uid == uid
	})
	if err != nil {
		return errors.Wrapf(err, "failed to find user by uid: %d", uid)
	}
	spec.Process.User.UID, spec.Process.User.GID = uint32(u.Uid), uint32(u.Gid)
	return nil
}
This code is executed by this other code fragment - you can see the full function code here:
parts := strings.Split(userstr, ":")
switch len(parts) {
case 1:
	v, err := strconv.Atoi(parts[0])
	if err != nil {
		// evaluate username to uid/gid
		return setUsername(spec, userstr)
	}
	return setUserID(spec, int(v))
And the getUser function:
func getUser(spec *oci.Spec, filter func(user.User) bool) (user.User, error) {
	users, err := user.ParsePasswdFileFilter(filepath.Join(spec.Root.Path, "/etc/passwd"), filter)
	if err != nil {
		return user.User{}, err
	}
	if len(users) != 1 {
		return user.User{}, errors.Errorf("expected exactly 1 user matched '%d'", len(users))
	}
	return users[0], nil
}
As you can see, these are exactly the errors that Azure is reporting to you.
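To make the failure mode concrete, here is a minimal, self-contained Go sketch of the same kind of lookup. It is not the opengcs code: parsePasswd and findUserByUID are hypothetical helpers written for illustration, and the passwd contents are abbreviated from the listings in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// passwdEntry holds the fields of one /etc/passwd line that matter here.
type passwdEntry struct {
	Name string
	UID  int
	GID  int
}

// parsePasswd parses passwd-formatted text (name:password:uid:gid:...).
// Malformed lines are skipped.
func parsePasswd(content string) []passwdEntry {
	var entries []passwdEntry
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Split(strings.TrimSpace(line), ":")
		if len(fields) < 4 {
			continue
		}
		uid, uidErr := strconv.Atoi(fields[2])
		gid, gidErr := strconv.Atoi(fields[3])
		if uidErr != nil || gidErr != nil {
			continue
		}
		entries = append(entries, passwdEntry{Name: fields[0], UID: uid, GID: gid})
	}
	return entries
}

// findUserByUID requires exactly one passwd entry with the given UID,
// mirroring the "expected exactly 1 user" check quoted above.
func findUserByUID(content string, uid int) (passwdEntry, error) {
	var matches []passwdEntry
	for _, entry := range parsePasswd(content) {
		if entry.UID == uid {
			matches = append(matches, entry)
		}
	}
	if len(matches) != 1 {
		return passwdEntry{}, fmt.Errorf(
			"failed to find user by uid: %d: expected exactly 1 user matched '%d'",
			uid, len(matches))
	}
	return matches[0], nil
}

// Abbreviated passwd file of the image before start.sh has run.
const imagePasswd = "root:x:0:0:root:/root:/bin/bash\n" +
	"dbus:x:81:81:System message bus:/:/sbin/nologin\n"

// The line that start.sh appends at container start.
const ejbcaLine = "ejbca:x:10001:0:ejbca user:/opt:/sbin/nologin\n"

func main() {
	_, err := findUserByUID(imagePasswd, 10001)
	fmt.Println("image as published:", err)

	user, err := findUserByUID(imagePasswd+ejbcaLine, 10001)
	fmt.Println("with passwd entry:", user.Name, user.UID, user.GID, err)
}
```

The first lookup fails with the same message Azure reports, because the check runs against the image's /etc/passwd before start.sh has had a chance to append the ejbca entry.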
As a summary, I think they are providing a Windows LCOW solution that conforms to the OCI Image Format Specification suitable to run containers with containerd.
As you indicated that it used to run with the same configuration a couple of weeks ago, my best guess is that, perhaps, they switched your containers from a pure Linux containerd runtime implementation to one based on Windows and the above-mentioned software, and this is why your containers are now failing.
A possible workaround could be to create a custom image based on the official one provided by PrimeKey and create the user 10001, as Pierre also pointed out.
To accomplish this task, first, create a new custom Dockerfile. You can try, for instance:
FROM primekey/ejbca-ce:6.15.2.3
USER 0
RUN echo "ejbca:x:10001:0:ejbca user:/opt:/sbin/nologin" >> /etc/passwd
USER 10001
Please, note that you may need to define some of the environment variables from the official EJBCA image.
With this Dockerfile you can build your image with docker or docker compose with an appropriate docker-compose.yaml file, something like:
version: "3"
services:
  ejbca:
    image: <your repository>/ejbca
    build: .
    ports:
      - "8080:8080"
      - "8443:8443"
Please, customize it as you consider appropriate.
With this setup the new container will still run properly in a local environment in the same way as the original one: I hope it will be also the case in Azure.
A user with UID 10001 does not exist in your image. This does not prevent the USER command in your Dockerfile from working, nor does it make the image itself invalid, but it seems to cause issues with Azure containers.
I cannot find docs or any reference on why it doesn't work on Azure (will update if so), but adding the user to the image should solve the issue. Try adding something like this to your Dockerfile to create a user with UID 10001 (this must be done as root, i.e. with user 0):
RUN useradd -u 10001 myuser
Additional notes showing that user 10001 does not exist:
# When running container, not recognized by system
$ docker run docker.io/primekey/ejbca-ce:6.15.2.3 whoami
whoami: cannot find name for user ID 10001
# Not present in /etc/passwd
$ docker run docker.io/primekey/ejbca-ce:6.15.2.3 cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin

Can hyperledger-fabric get the peer node running status without entering the docker container?

Can hyperledger-fabric get the peer node running status without entering the docker container? If so, how should I get it?
In the docker-compose file, add the following environment variable to the peer service (use a different port for each service):
  - CORE_OPERATIONS_LISTENADDRESS=0.0.0.0:9440
Expose the port (choose port numbers according to availability), using a different port for each peer:
ports:
  - 9440:9440
Once all services are up, query the following path for the specific service (using the port defined above):
curl -X GET localhost:9440/healthz
You will get the following response if the service is running:
{
  "status": "OK",
  "time": "2009-11-10T23:00:00Z"
}
If the service is not available, you will get the following response:
{
  "status": "Service Unavailable",
  "time": "2009-11-10T23:00:00Z",
  "failed_checks": [
    {
      "component": "docker",
      "reason": "failed to connect to Docker daemon: invalid endpoint"
    }
  ]
}
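If you want to act on these responses from a program rather than read them by eye, a small decoder is enough. The Go sketch below parses the two response shapes shown above and turns them into a one-line summary; healthStatus, failedCheck and summarize are names made up for this example, and fetching the body from the /healthz endpoint is left out so the sketch runs without a peer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// failedCheck mirrors one element of the "failed_checks" array.
type failedCheck struct {
	Component string `json:"component"`
	Reason    string `json:"reason"`
}

// healthStatus mirrors the /healthz response body shown above.
type healthStatus struct {
	Status       string        `json:"status"`
	Time         string        `json:"time"`
	FailedChecks []failedCheck `json:"failed_checks"`
}

// summarize decodes a /healthz body and returns a one-line description.
func summarize(body []byte) (string, error) {
	var health healthStatus
	if err := json.Unmarshal(body, &health); err != nil {
		return "", fmt.Errorf("decode healthz response: %w", err)
	}
	if health.Status == "OK" && len(health.FailedChecks) == 0 {
		return "peer healthy", nil
	}
	var reasons []string
	for _, check := range health.FailedChecks {
		reasons = append(reasons, check.Component+": "+check.Reason)
	}
	return "peer unhealthy (" + health.Status + "): " + strings.Join(reasons, "; "), nil
}

// Sample bodies copied from the responses above.
const okBody = `{"status": "OK", "time": "2009-11-10T23:00:00Z"}`
const failedBody = `{"status": "Service Unavailable", "time": "2009-11-10T23:00:00Z",
  "failed_checks": [{"component": "docker", "reason": "failed to connect to Docker daemon: invalid endpoint"}]}`

func main() {
	for _, body := range []string{okBody, failedBody} {
		summary, err := summarize([]byte(body))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(summary)
	}
}
```

In a real check, the body would come from an HTTP GET against the port configured in CORE_OPERATIONS_LISTENADDRESS, one request per peer.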
The Operations Service might be what you are looking for. The simple check is the "health" endpoint, and the more detailed check is to look at the "metrics".
It is covered in the Fabric docs.

Ansible service module returns service status as stopped when the service is actually running

Trying to stop a service (dse datastax enterprise) using ansible 2.7
- name: Stop service dse, if started
  service:
    name: dse
    state: stopped
What I think ansible is saying is, I'm not doing anything because this service is already stopped. Part of the verbose output:
ok: [myhostname.domain.com] => {
    "changed": false,
    "invocation": {
        "module_args": {
            "daemon_reload": false,
            "enabled": null,
            "force": null,
            "masked": null,
            "name": "dse",
            "no_block": false,
            "scope": null,
            "state": "stopped",
            "user": null
        }
    },
    "name": "dse",
    "state": "stopped",
When I check the service on the remote host this is what I see
[user@remotehost ~]$ service dse status
dse is running
So what am I missing here?
FYI, it's recommended to run sudo service dse stop for this service; I don't know whether the lack of sudo makes such a difference.
My understanding is that this fails because I do not have unrestricted sudo and cannot execute /bin/sh.
The same command works when run directly on the server, but not through Ansible, because:
Ansible sends Python code to be executed on the targeted servers. Since Ansible is running Python code and generally not executing system commands directly, you can't limit system commands with sudo and expect them to work with Ansible.
More: https://gist.github.com/nanobeep/3b3d614a709086ff832a
Not sure everyone has this luxury but in my case modifying the sudoers file
from
TheGroupNameImPartOf ALL= ALL, !SU, !SHELLS
to
TheGroupNameImPartOf ALL= ALL
Did the magic!

How can I run a Docker container in AWS Elastic Beanstalk with non-default run parameters?

I have a Docker container that runs great on my local development machine. I would like to move this to AWS Elastic Beanstalk, but I am running into a small bit of trouble.
I am trying to mount an S3 bucket to my container by using s3fs. I have the Dockerfile:
FROM tomcat:7.0
MAINTAINER me@example.com
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential libfuse-dev libcurl4-openssl-dev libxml++2.6-dev libssl-dev mime-support automake libtool wget tar
# Add the java source
ADD . /path/to/tomcat/webapps/
ADD run_docker.sh /root/run_docker.sh
WORKDIR $CATALINA_HOME
EXPOSE 8080
CMD ["/root/run_docker.sh"]
And I install s3fs, mount an S3 bucket, and run the Tomcat server after the image has been created, by running run_docker.sh:
#!/bin/bash
#run_docker.sh
wget https://github.com/s3fs-fuse/s3fs-fuse/archive/master.zip -O /usr/src/master.zip;
cd /usr/src/;
unzip /usr/src/master.zip;
cd /usr/src/s3fs-fuse-master;
autoreconf --install;
CPPFLAGS=-I/usr/include/libxml2/ /usr/src/s3fs-fuse-master/configure;
make;
make install;
cd $CATALINA_HOME;
mkdir /opt/s3-files;
s3fs my-bucket /opt/s3-files;
catalina.sh run
When I build and run this Docker container using the command:
docker run --cap-add mknod --cap-add sys_admin --device=/dev/fuse -p 80:8080 -d username/mycontainer:latest
it works well. Yet, when I remove the --cap-add mknod --cap-add sys_admin --device=/dev/fuse, then s3fs fails to mount my S3 bucket.
Now, I would like to run this on AWS Elastic Beanstalk, and when I deploy the container (and run run_docker.sh), all the steps execute fine, except the step s3fs my-bucket /opt/s3-files in run_docker.sh fails to mount the bucket.
Presumably, this is because whatever Elastic Beanstalk does to run a Docker container, it doesn't add any additional flags like, --cap-add mknod --cap-add sys_admin --device=/dev/fuse.
My Dockerrun.aws.json file looks like:
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "tomcat:7.0"
  },
  "Ports": [
    {
      "ContainerPort": "8080"
    }
  ]
}
Is it possible to add additional docker run flags to an AWS EB Docker deployment?
An alternative option is to find another way to mount an S3 bucket, but I suspect I'd run into similar permission errors regardless. Has anyone seen any way to accomplish this???
UPDATE:
For people trying to use @Egor's answer below, it works when the EB configuration is set to use v1.4.0 running Docker 1.6.0. Anything past the v1.4.0 version fails. So to make it work, build your environment as normal (which should give you a failed build), then rebuild it with a v1.4.0 running Docker 1.6.0 configuration. That should do it!
If you are using the latest version of aws docker stack (docker 1.7.1 for example), you'll need to slightly modify the above answer. Try this:
commands:
  00001_add_privileged:
    cwd: /tmp
    command: 'sed -i "s/docker run -d/docker run --privileged -d/" /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh'
Notice the change of location and name of the run script.
Add file .ebextensions/01-commands.config
container_commands:
  00001-docker-privileged:
    command: 'sed -i "s/docker run -d/docker run --privileged -d/" /opt/elasticbeanstalk/hooks/appdeploy/pre/04run.sh'
I am also using s3fs
Thanks to elijahchancey for the answer; it was very helpful. I would just like to add a small comment:
Elastic Beanstalk now uses ECS tasks to deploy and manage the application cluster. There is a very important paragraph in the Multicontainer Docker Configuration docs (which I originally missed):
The following examples show a subset of parameters that are commonly used. More optional parameters are available. For more information on the task definition format and a full list of task definition parameters, see Amazon ECS Task Definitions in the Amazon ECS Developer Guide.
So the document is not a complete reference; it just shows typical entries, and you are supposed to find the rest elsewhere. This has quite a major impact, because now (2018) you can specify more options and no longer need to hack ebextensions. The only thing you need to do is use the task parameter in the containerDefinitions of your multi-container Dockerrun.aws.json.
This is not mentioned for single-container Docker, but one can try and verify...
Example of a multi-container Dockerrun.aws.json with an extra capability:
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "service1",
      "image": "myapp/service1:latest",
      "essential": true,
      "memoryReservation": 128,
      "portMappings": [
        {
          "hostPort": 8080,
          "containerPort": 8080
        }
      ],
      "linuxParameters": {
        "capabilities": {
          "add": [
            "SYS_PTRACE"
          ]
        }
      }
    }
  ]
}
You can now add capabilities using the task definition. Here are the docs:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
This is specifically what you would add to your task definition:
"linuxParameters": {
  "capabilities": {
    "add": [
      "SYS_PTRACE"
    ]
  }
},
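For the s3fs case in the question, the same mechanism might carry the flags that docker run needed. The Go sketch below generates a version 2 Dockerrun.aws.json with the SYS_ADMIN and MKNOD capabilities and a /dev/fuse device entry. It rests on several assumptions: the container name tomcat-s3fs and the helper dockerrunJSON are made up for illustration, the devices field is taken from the ECS task definition parameters, and whether Elastic Beanstalk passes every linuxParameters field through has not been verified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// device mirrors an entry of linuxParameters.devices in an ECS task definition.
type device struct {
	HostPath    string   `json:"hostPath"`
	Permissions []string `json:"permissions"`
}

type capabilities struct {
	Add []string `json:"add"`
}

type linuxParameters struct {
	Capabilities capabilities `json:"capabilities"`
	Devices      []device     `json:"devices,omitempty"`
}

type containerDefinition struct {
	Name              string          `json:"name"`
	Image             string          `json:"image"`
	Essential         bool            `json:"essential"`
	MemoryReservation int             `json:"memoryReservation"`
	LinuxParameters   linuxParameters `json:"linuxParameters"`
}

type dockerrun struct {
	Version              int                   `json:"AWSEBDockerrunVersion"`
	ContainerDefinitions []containerDefinition `json:"containerDefinitions"`
}

// dockerrunJSON builds a single-container, version 2 Dockerrun document
// that adds the given capabilities and exposes one host device.
func dockerrunJSON(name, image string, caps []string, hostDevice string) (string, error) {
	doc := dockerrun{
		Version: 2,
		ContainerDefinitions: []containerDefinition{{
			Name:              name,
			Image:             image,
			Essential:         true,
			MemoryReservation: 128,
			LinuxParameters: linuxParameters{
				Capabilities: capabilities{Add: caps},
				Devices: []device{{
					HostPath:    hostDevice,
					Permissions: []string{"read", "write", "mknod"},
				}},
			},
		}},
	}
	out, err := json.Marshal(doc)
	return string(out), err
}

func main() {
	// Values taken from the docker run command in the question;
	// the container name is a placeholder.
	out, err := dockerrunJSON("tomcat-s3fs", "username/mycontainer:latest",
		[]string{"SYS_ADMIN", "MKNOD"}, "/dev/fuse")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Port mappings are omitted to keep the sketch short; they would go in the container definition as in the example above.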
