Azure DSVM: Cannot connect to the Docker daemon

We have been using a Data Science Virtual Machine in combination with a Virtual Machine scale set for our CI, running a custom Docker image in the connected Azure Pipelines:
https://github.com/PyTorchLightning/metrics/blob/77e252ec6165ec94e23ce5c5cf9ffdad01bf54a1/azure-pipelines.yml#L29
Recently we have been observing the following failure message:
Starting: Initialize containers
/usr/bin/docker version --format '{{.Server.APIVersion}}'
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
'
##[error]Exit code 1 returned from process: file name '/usr/bin/docker', arguments 'version --format '{{.Server.APIVersion}}''.
see the full output here - https://dev.azure.com/PytorchLightning/Metrics/_build/results?buildId=9061&view=logs&j=fd70b5b8-241a-53bf-d137-3fd86cf9f066&t=a0ca1fe4-fde6-4a82-9888-52f5ae79d8fe
UPDATE: the issue was solved in the June 2021 release; see the Azure DSVM release notes.

Based on the discussion on the post above, the solution (for now) is to pin the version of the scale set image to a previous version:
az vmss update -g <resource group> -n <vmss name> --set virtualMachineProfile.storageProfile.imageReference.version=21.01.21
Docker appears to be disabled in the latest version of the DSVM. Until that is corrected, pin the version. In general, pinning the version is probably a good idea for stability; you can then be deliberate about when you change versions, so that you know what is going on.
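To confirm which image version the scale set model is set to before and after pinning, something like the following might help. This is only a sketch: the function name vmss_image_version and the AZ override variable are hypothetical, and it assumes the Azure CLI is installed and logged in.

```shell
# Hypothetical helper, not part of the Azure CLI.
# AZ can be overridden (for example for a dry run); it defaults to the az binary.
AZ="${AZ:-az}"

# vmss_image_version RESOURCE_GROUP VMSS_NAME
# Prints the image version stored in the scale set profile, i.e. the same
# property that the "az vmss update --set ..." command above changes.
vmss_image_version() {
  "$AZ" vmss show -g "$1" -n "$2" \
    --query "virtualMachineProfile.storageProfile.imageReference.version" \
    -o tsv
}

# Example (placeholders as in the command above):
#   vmss_image_version <resource group> <vmss name>
```

If the printed value is not the version you pinned, the update has probably not been applied to the scale set model.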

Docker is enabled by default in the latest image release (21.06.01) of the Data Science Virtual Machine - Ubuntu 18. This should probably resolve the issue.

The command below works on the latest Data Science Virtual Machine:
/usr/bin/docker --version
Docker version 20.10.6+azure, build 370c28948e3c12dce3d1df60b6f184990618553f
Although the command above works, we still need to start the Docker daemon using the commands below:
sudo systemctl unmask docker
sudo systemctl start docker
sudo chmod 777 /var/run/docker.sock
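In a CI agent it can be useful to wait briefly for the daemon, and fail with a clear message, before any container step runs. A minimal sketch, assuming a POSIX shell; wait_for_daemon is a hypothetical helper name, and docker info is used only as a probe that fails while the daemon is unreachable.

```shell
# wait_for_daemon TRIES DELAY_SECONDS PROBE_COMMAND...
# Hypothetical helper: runs the probe command up to TRIES times, sleeping
# DELAY_SECONDS after each failure. Returns 0 as soon as the probe succeeds,
# or 1 if it never does.
wait_for_daemon() {
  tries="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: probe the daemon for up to ~30 seconds after starting it.
#   sudo systemctl start docker
#   wait_for_daemon 10 3 docker info || echo "Docker daemon is not reachable" >&2
```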

Related

Podman-docker container has not started while executing the run command in linux ( Rhel 8 ) server

While trying to run a Podman container on a Linux server (RHEL 8), I am facing the issue below.
WARN[0000] error mounting subscriptions, skipping entry in /usr/share/containers/mounts.conf: getting host subscription data: failed to read subscriptions from "/usr/share/rhel/secrets": open /usr/share/rhel/secrets/redhat.repo: permission denied
Execution command: podman run -d --name redis_server -p 6377:6377 redis
I have followed these steps to run the container
Could you please suggest a solution to this issue?
Giving this as a reference because it solved my issue. Quoting the answer:
I solved my specific problem. The original user account I was using had an empty mounts.conf file (copy the one in /usr/share/containers).
Use touch ~/.config/containers/mounts.conf
1874621 – Rootless Podman Unable to Use Host Subscriptions

Unused Docker containers stuck in Removal in Progress state. Device or Resource Busy

I'm running Docker version 20.10.5 on a CentOS 7 box. I stopped my project with docker-compose down and every container had the same message:
Error response from daemon: container <container ID>: driver "overlay2" failed to remove root filesystem: unlinkat /var/lib/docker/overlay2/<long number>/merged: device or resource busy
I've stopped the daemon, I've reinstalled Docker, I've tried umount, lsof, kill, and all the docker go-away commands including system prune but still they hang on.
(After re-installing Docker the status changes to Dead. When I try to delete the zombie containers their status changes to Removal In Progress)
How can I get rid of these containers?
For people who have similar issues: I had a similar problem where docker rm -f <docker name> was hanging. The only thing that helped me was:
service docker restart
On Ubuntu 22.04 with Docker Engine 23.0.0, neither stopping and starting the Docker service nor docker system prune removed the containers (they still showed Removal In Progress). The solution was to manually remove the directories of the affected container(s):
sudo service docker stop
sudo -i
cd /var/lib/docker/containers
rm -rf <container id>
sudo service docker start
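Before deleting directories by hand, it may be worth checking whether some other process still has the container's overlay mount in its mount namespace, which is one possible cause of "device or resource busy". The sketch below searches each process's mountinfo file for an ID. find_mount_holders is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a different directory than /proc.

```shell
# find_mount_holders ID [PROC_ROOT]
# Hypothetical helper: prints the path of every <PROC_ROOT>/<pid>/mountinfo
# file that mentions ID (for example the <long number> from the overlay2 path
# in the error message). Prints nothing if no process references it.
find_mount_holders() {
  id="$1"
  proc_root="${2:-/proc}"
  grep -l -- "$id" "$proc_root"/[0-9]*/mountinfo 2>/dev/null || true
}

# Example (run as root so that all processes are readable):
#   find_mount_holders <long number>
# A result such as /proc/1234/mountinfo points at PID 1234 as a candidate holder.
```

If a process shows up, restarting that service instead of removing files may be enough to release the mount.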

Docker cannot login to azurecr.io

On Docker version 17.09.0-ce, build afdb6d4 (running on Mac OS 10.12.5) I'm having the following error when I run docker login <proj>.azurecr.io:
Warning: failed to get default registry endpoint from daemon (Cannot
connect to the Docker daemon at unix:///var/run/docker.sock. Is the
docker daemon running?). Using system default:
https://index.docker.io/v1/
This is after I input the username and password that I retrieved using az acr. I've done this same process in the past and now it's not working anymore.
How can I debug this and login/pull images again?
Summary: I believe I just needed to restart Docker.
First, I turned on debugging by adding { "debug": true } to /etc/docker/daemon.json. This probably wasn't needed.
Second, I restarted Docker from the Mac terminal with osascript -e 'quit app "Docker"' followed by open -a Docker.
I suggest restarting your Docker service; please refer to this similar issue:
https://github.com/yegor256/rultor/issues/1041

docker build fails on a cloud VM

I have an Ubuntu 16.04 (Xenial) running inside an Azure VM. I have followed the instructions to install Docker and all seems fine and dandy.
One of the things that I need to do when I trigger docker run is to pass --net=host, which allows me to run apt-get update and other internet-dependent commands within the container.
The problem comes in when I try to trigger docker build based on an existing Ubuntu image. It fails:
The problem here is that there is no way to pass --net=host to the build command. I see that there are issues open on the Docker GitHub (#20987, #10324) but no clear resolution.
There is an existing answer on Stack Overflow that covers the scenario I want, but that doesn't work within a cloud VM.
Any thoughts on what might be happening?
UPDATE 1:
Here is the docker version output:
Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:11:10 2016
 OS/Arch:      linux/amd64
Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:11:10 2016
 OS/Arch:      linux/amd64
UPDATE 2:
Here is the output from docker network ls:
NETWORK ID          NAME                DRIVER              SCOPE
aa69fa066700        bridge              bridge              local
1bd082a62ab3        host                host                local
629eacc3b77e        none                null                local
Another approach would be to let docker-machine provision the VM for you and see if that works. There is a provider for Azure, so you should be able to set your subscription ID on a local Docker client (Windows or Linux) and follow the instructions to get a new VM provisioned with Docker; it will also set up your local environment variables to communicate with the Docker VM instance remotely. Once it is set up, running docker ps or docker run locally runs the commands as if you were running them on the VM. Example:
#Name at end should be all lower case or it will fail.
docker-machine create --driver azure --azure-subscription-id <omitted> --azure-image canonical:ubuntuserver:16.04.0-LTS:16.04.201608150 --azure-size Standard_A0 azureubuntu
#Partial output, see docker-machine resource group in Azure portal
Running pre-create checks...
(azureubuntu) Completed machine pre-create checks.
Creating machine...
(azureubuntu) Querying existing resource group. name="docker-machine"
(azureubuntu) Resource group "docker-machine" already exists.
(azureubuntu) Configuring availability set. name="docker-machine"
(azureubuntu) Configuring network security group. location="westus" name="azureubuntu-firewall"
(azureubuntu) Querying if virtual network already exists. name="docker-machine-vnet" location="westus"
(azureubuntu) Configuring subnet. vnet="docker-machine-vnet" cidr="192.168.0.0/16" name="docker-machine"
(azureubuntu) Creating public IP address. name="azureubuntu-ip" static=false
(azureubuntu) Creating network interface. name="azureubuntu-nic"
(azureubuntu) Creating virtual machine. osImage="canonical:ubuntuserver:16.04.0-LTS:16.04.201608150" name="azureubuntu" location="westus" size="Standard_A0" username="docker-user"
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env azureubuntu
#Set environment using PowerShell (or login to the new VM) and see containers on remote host
docker-machine env azureubuntu | Invoke-Expression
docker info
docker network inspect bridge
#Build a local docker project using the remote VM
docker build MyProject
docker images
#To clean up the Azure resources for a machine (you can create multiple, also check docker-machine resource group in Azure portal)
docker-machine rm azureubuntu
As best I can tell, that works fine. I was able to build a debian:wheezy Dockerfile that uses apt-get on the Azure VM without any issues. This should also allow the containers to run using the default bridge network instead of the host network.
According to "I can't get Docker containers to access the internet?", running sudo systemctl restart docker might help, as might setting net.ipv4.ip_forward = 1 or disabling the firewall.
You may also need to update the DNS servers in /etc/resolv.conf on the VM.
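Two of these suggestions can be checked quickly before changing anything. The following is a rough sketch with hypothetical helper names (ip_forward_enabled, list_nameservers); the file arguments default to the usual Linux locations and are parameters only so that the functions can be tried against other files.

```shell
# ip_forward_enabled [FILE]
# Succeeds (exit status 0) if IPv4 forwarding is switched on.
ip_forward_enabled() {
  file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

# list_nameservers [FILE]
# Prints the DNS servers configured in a resolv.conf style file, one per line.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example, run on the VM:
#   ip_forward_enabled || echo "net.ipv4.ip_forward is off"
#   list_nameservers
```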

How to make an Azure VM & configure containers to use Azure File Storage via docker CLI / quickstart terminal?

I'm using the latest Docker Toolbox and I would like to launch docker containers on Azure that connect to an Azure File Store. What should one run to achieve this from the docker quick start terminal?
The easiest way to do this is to create an Ubuntu VM with Docker preinstalled on Azure:
https://azure.microsoft.com/en-us/blog/introducing-docker-in-microsoft-azure-marketplace/
Then follow the Azure File System Docker Volume Driver install instructions here:
https://github.com/Azure/azurefile-dockervolumedriver/blob/master/contrib/init/systemd/README.md
Once you can successfully create volumes on that VM, you can make them shared volumes or Data Volume Containers to share them between your Docker containers:
https://docs.docker.com/engine/tutorials/dockervolumes/
For more generic instructions, please use @rbj325's answer.
Create docker-machine
First things first, we need an Azure VM that we can use. We can use the docker-machine CLI to create it. This set of instructions creates it with Ubuntu 16.04 LTS to simplify (somewhat) the installation steps.
docker-machine create --driver azure --azure-subscription-id XXXX \
--azure-location westeurope --azure-resource-group XXX \
--azure-image canonical:UbuntuServer:16.04.0-LTS:latest XXXXXX
This sets up everything we need on Azure.
Install azure file storage docker plugin
(Based on my knowledge of SSH) We then need to SSH into the docker-machine to be able to install the plugin.
docker-machine ssh XXXXXX
Once in, the following steps can be taken to install the plugin:
sudo -s
wget -qO /usr/bin/azurefile-dockervolumedriver https://github.com/Azure/azurefile-dockervolumedriver/releases/download/[VERSION]/azurefile-dockervolumedriver
chmod +x /usr/bin/azurefile-dockervolumedriver
wget -qO /etc/systemd/system/azurefile-dockervolumedriver.service https://raw.githubusercontent.com/Azure/azurefile-dockervolumedriver/master/contrib/init/systemd/azurefile-dockervolumedriver.service
cp [myconfigfile] /etc/default/
systemctl daemon-reload
systemctl enable azurefile-dockervolumedriver
systemctl start azurefile-dockervolumedriver
systemctl status azurefile-dockervolumedriver
Note that there are two things required here:
the latest version number for the driver from github
a file containing some azure storage credentials
For my installation process, I made a script that I could use, and I put my config file in a secure store from which it could be retrieved at install time. Please note that it gets driver version 0.2.1.
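The author's script is not shown, so the following is only a sketch of how the install steps above could be wrapped with the version as a parameter. The function names driver_url and install_driver are hypothetical; the URLs and paths are the ones from the commands above, and the exact release tag format must be taken from the GitHub releases page.

```shell
# driver_url VERSION
# Prints the release download URL for the given release tag.
driver_url() {
  echo "https://github.com/Azure/azurefile-dockervolumedriver/releases/download/$1/azurefile-dockervolumedriver"
}

# install_driver VERSION CONFIG_FILE
# Repeats the manual steps above; must be run as root on the VM.
# Stops at the first step that fails.
install_driver() {
  version="$1"
  config="$2"
  wget -qO /usr/bin/azurefile-dockervolumedriver "$(driver_url "$version")" &&
    chmod +x /usr/bin/azurefile-dockervolumedriver &&
    wget -qO /etc/systemd/system/azurefile-dockervolumedriver.service \
      https://raw.githubusercontent.com/Azure/azurefile-dockervolumedriver/master/contrib/init/systemd/azurefile-dockervolumedriver.service &&
    cp "$config" /etc/default/ &&
    systemctl daemon-reload &&
    systemctl enable azurefile-dockervolumedriver &&
    systemctl start azurefile-dockervolumedriver
}

# Example:
#   install_driver [VERSION] [myconfigfile]
```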
Once this has completed, exit the ssh connection.
Create volumes
You should now be able to create docker volumes
docker volume create --name filestore -d azurefile -o share=filestore
Create docker containers
You can now use this volume with docker containers
docker run -it --name=example -v filestore:/filestore ubuntu /bin/bash
