One or more errors occurred. (Permission denied /var/run/iotedge/mgmt.sock) caused by: docker returned exit code: - linux

I just installed IoT Edge on a Raspberry Pi (Raspbian Stretch) following this guide: Azure/InternetofThings/IoTEdge/Install or uninstall the Azure IoT Edge runtime
However, I get the errors below.
Three weeks ago I installed another one with the same instructions and it worked perfectly.
pi@raspberrypi:/etc/iotedge $ sudo iotedge check --verbose
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ config.yaml has correct hostname - OK
× config.yaml has correct URIs for daemon mgmt endpoint - Error
One or more errors occurred. (Permission denied /var/run/iotedge/mgmt.sock)
caused by: docker returned exit code: 1, stderr = One or more errors occurred. (Permission denied /var/run/iotedge/mgmt.sock)
√ latest security daemon - OK
√ host time is close to real time - OK
√ container time is close to host time - OK
‼ DNS server - Warning
Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
You can ignore this warning if you are setting DNS server per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
‼ production readiness: certificates - Warning
The Edge device is using self-signed automatically-generated development certificates.
They will expire in 89 days (at 2021-02-22 07:24:52 UTC) causing module-to-module and downstream device communication to fail on an active deployment.
After the certs have expired, restarting the IoT Edge daemon will trigger it to generate new development certs.
Please consider using production certificates instead. See https://aka.ms/iotedge-prod-checklist-certs for best practices.
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
Container engine is not configured to rotate module logs which may cause it run out of disk space.
Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
You can ignore this warning if you are setting log policy per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
Data might be lost if the module is deleted or updated.
Please see https://aka.ms/iotedge-storage-host for best practices.
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeHub container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeHub
Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK
17 check(s) succeeded.
4 check(s) raised warnings.
2 check(s) raised errors.
iotedge list
pi@raspberrypi:~ $ sudo iotedge list
A module runtime error occurred
caused by: Could not list modules
caused by: connection error: Connection reset by peer (os error 104)
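Since the failing check names the management socket, one low-risk first step is to look at who owns that socket and what its mode is. The following is only a minimal sketch, assuming Python 3 is available on the device; `describe_path` is a hypothetical helper name, and the path is copied from the error message above.

```python
import grp
import os
import pwd
import stat


def describe_path(path):
    """Return owner, group, mode string, and socket flag for a filesystem entry.

    Hypothetical helper for illustration; it only reads metadata.
    """
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid has no passwd entry
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # gid has no group entry
    return {
        "owner": owner,
        "group": group,
        "mode": stat.filemode(info.st_mode),
        "is_socket": stat.S_ISSOCK(info.st_mode),
    }


if __name__ == "__main__":
    # Path taken from the error message; adjust if your install differs.
    socket_path = "/var/run/iotedge/mgmt.sock"
    try:
        print(describe_path(socket_path))
    except OSError as exc:
        print(f"cannot stat {socket_path}: {exc}")
```

Comparing the output with the device where the same instructions worked may show whether ownership or mode differs between the two installs.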

Related

rafthttp: dial tcp timeout on etcd 3-node cluster creation

I don't have access to the etcd part of the project's source code, but I do have access to /var/log/syslog.
The goal is to set up a 3-node cluster.
(1) The very first etcd error that comes up is:
rafthttp: failed to dial 76e7ffhh20007a98 on stream MsgApp v2 (dial tcp 10.0.0.134:2380: i/o timeout)
Before continuing, I should say that I can ping all three nodes from each of the nodes. I have also tried opening TCP port 2380, with no success: same error.
(2) Before that error, I got the following messages from etcd, which in my opinion confirm that the cluster is set up correctly:
etcdserver/membership: added member 76e7ffhh20007a98 [https://server2:2380]
etcdserver/membership: added member 222e88db3803e816 [https://server1:2380]
etcdserver/membership: added member 999115e00e17123d [https://server3:2380]
In the /etc/hosts file, these names resolve as:
server2 10.0.0.135
server1 10.0.0.134
server3 10.0.0.136
(3) The initial setup on each node, however, looks like this:
embed: listening for peers on https://127.0.0.1:2380
embed: listening for client requests on 127.0.0.1:2379
So, to sum up, each node logs this initial setup (3), then adds members (2), and once these steps are done it fails with (1). As far as I know, the etcd cluster creation follows this pattern: https://etcd.io/docs/v3.5/tutorials/how-to-setup-cluster/
Without the source code this is really hard to debug, but does anyone have ideas about the error and what could cause it?
UPD: etcdctl cluster-health output (ETCDCTL_ENDPOINT is exported):
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout ; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: connect: connection refused
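One thing the log lines in (3) allow checking mechanically is whether the peer listener is bound to a loopback address, since a listener on 127.0.0.1 only accepts connections that originate on the same host. This is a minimal sketch, assuming Python 3; `is_loopback_url` is a hypothetical helper name, and the sample URLs are copied from the logs above. It does not prove that this is the cause of the timeout, it only flags the condition.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url):
    """Return True if the URL's host is a loopback IP literal or 'localhost'.

    Hypothetical helper; expects a URL with a scheme, e.g. https://host:port.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in {url!r}")
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A DNS name such as server1: not an IP literal, so not flagged here.
        return False


if __name__ == "__main__":
    for url in ("https://127.0.0.1:2380", "https://server1:2380"):
        print(url, "loopback-only" if is_loopback_url(url) else "not loopback")
```

If the peer URL is flagged, other members dialing port 2380 on that node's 10.0.0.x address would not reach the listener, regardless of firewall rules.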

Istio Multicluster primary-remote on Azure AKS

I am trying to create a multi-cluster Istio primary-remote setup.
First I created two Azure AKS clusters. I used Azure CNI for the network configuration; the cluster settings are as follows.
First cluster
vnet istioclusterone - 10.10.0.0/20
subnet default - 10.10.0.0/20
k8s service address range 10.100.0.0/16
DNS service ip - 10.100.0.10
Docker Bridge address - 172.17.0.1/16
DNS-prefix - app-cluster-dns
Second cluster
vnet istioclusterone - 10.11.0.0/20
subnet default - 10.11.0.0/20
k8s service address range 10.101.0.0/16
DNS service ip - 10.101.0.10
Docker Bridge address - 172.18.0.1/16
DNS-prefix - processing-cluster-dns
Other than this, I went with the default settings.
Next, I followed the articles below to set up the multi-cluster Istio installation.
Before you begin
Primary-remote
The last step in the second article, setting up cluster2 as a remote, failed.
I found the errors below in the logs of the istio-ingressgateway pod.
2022-04-11T07:51:00.352057Z warning envoy config StreamAggregatedResources gRPC config stream closed since 431s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:08.514428Z warning envoy config StreamAggregatedResources gRPC config stream closed since 439s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:12.462140Z warning envoy config StreamAggregatedResources gRPC config stream closed since 443s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:39.950935Z warning envoy config StreamAggregatedResources gRPC config stream closed since 471s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
If anyone has tried this scenario, please share your insights.
Thanks.
Update:
I used custom certs for both clusters, and the previous error was solved.
Then I created a gateway in both clusters.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-aware-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS
    tls:
      mode: AUTO_PASSTHROUGH
    hosts:
    - "*.local"
Now I am getting a new error. See the logs below from pod istio-ingressgateway-575ccb4d79 in cluster2.
2022-04-13T09:14:04.650502Z warning envoy config StreamAggregatedResources gRPC config stream closed since 60s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
2022-04-13T09:14:27.026016Z warning envoy config StreamAggregatedResources gRPC config stream closed since 83s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
What I understood here: I have an east-west gateway installed in cluster1, as in the documentation (linkToDoc).
cluster2 is trying to reach cluster1 using the public IP of the east-west gateway on port 15012, and that is failing.
I checked the security groups and the port is open. I tried telnet from a test pod inside the cluster to check, and it fails.
Can anyone help me here?
Thanks
It looks like a firewall issue. I'm not sure it will help, but try opening ports 15012 and 15443 for outbound traffic from the remote cluster to the east-west gateway's ELB IP (primary cluster).
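To tell a silently dropped connection (typical of a firewall) apart from a port that is simply closed, a plain TCP connect with a timeout can be run from a pod or VM in the remote cluster. This is a minimal sketch, assuming Python 3 is available where it runs; `probe_tcp` is a hypothetical helper name, not part of Istio or any Azure tooling.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connect and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout: consistent with dropped packets.
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failure, no route to host, and similar errors.
        return f"error: {exc}"
```

Calling it for ports 15012 and 15443 with the east-west gateway's public IP as `host` shows which of the two cases applies: a "timeout" result matches the `i/o timeout` in the logs above and points toward filtering on the path, while "refused" would point toward the gateway itself.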

Unable to push image to OpenShift internal registry with i/o timeout

Pushing image docker-registry.default.svc:5000/th/th:source ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: After retrying 6 times, Push image still failed due to error: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp <ip>:5000: i/o timeout
Manually pushing an image from the CLI to the internal registry works fine.
I have deployed OpenShift 3.11 on a couple of Azure VMs, and while deploying I took care to add the external IP to them.
All other images are also present in the Docker registry, and the curl command against the registry returns exit code 0.
What seemed curious was that, while deploying my app, I tried pinging the registry from the build pod's terminal. The connection hung and got no response.
Any ideas on how to fix this?
The SDN was causing this networking issue.
Does Azure support Calico networking?
Calico in VXLAN mode is supported on Azure. However, IPIP packets are blocked by the Azure network fabric.
The quote above, from the Calico reference, explains what caused this issue. It can be resolved by switching to VXLAN mode in the Calico config. More details on how to switch can be found here.
For my solution, I just switched from Calico to the default OpenShift SDN, 'ovs-subnet', in the inventory file.

Configuration Error in Azure IoT Edge installation - "configuration has correct URIs for daemon mgmt endpoint - Error"

Version details
OS: Ubuntu 18.04.5 LTS
aziot-edge: bionic,now 1.2.3-1 amd64
aziot-identity-service: bionic,now 1.2.2-1 amd64
docker: Docker version 20.10.8+azure, build 3967b7d28e15a020e4ee344283128ead633b3e0c
Verifying the installation shows that aziot-identityd is in the "Down - activating" state:
# sudo iotedge system status
System services:
aziot-edged Running
aziot-identityd Down - activating
aziot-keyd Running
aziot-certd Running
aziot-tpmd Ready
aziot-identityd is in a bad state because:
aziot-identityd.service: Down - activating : Printing the last 10 log lines.
-- Logs begin at Fri 2020-11-06 12:29:56 IST, end at Fri 2021-09-10 19:07:13 IST. --
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [INFO] - Could not reconcile Identities with current device data. Reprovisioning.
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [INFO] - Updated device info for Edge1.
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - Failed to provision with IoT Hub, and no valid device backup was found: Hub client error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - service encountered an error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - caused by: Hub client error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - caused by: internal error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - 0: <unknown>
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 1: <unknown>
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN systemd[1]: aziot-identityd.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN systemd[1]: aziot-identityd.service: Failed with result 'exit-code'.
iotedge check shows 2 configuration related errors:
# iotedge check --verbose
Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
× read all preloaded certificates from the Certificates Service - Error
could not load cert with ID "aziot-edged-trust-bundle"
Caused by:
parameter "id" has an invalid value
caused by: not found
√ read all preloaded key pairs from the Keys Service - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK
Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK
Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
caused by: docker returned exit code: 1, stderr = SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
√ aziot-edge package is up-to-date - OK
√ container time is close to host time - OK
‼ DNS server - Warning
Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
You can ignore this warning if you are setting DNS server per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
Container engine is not configured to rotate module logs which may cause it run out of disk space.
Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
You can ignore this warning if you are setting log policy per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
× production readiness: Edge Agent's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeAgent container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeAgent
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeHub container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeHub
√ Agent image is valid and can be pulled from upstream - OK
Connectivity checks
-------------------
√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the default network can connect to upstream MQTT port - OK
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream MQTT port - OK
30 check(s) succeeded.
2 check(s) raised warnings.
4 check(s) raised errors.
The TOML file contains only manual provisioning with a connection string.
I had this error because my IoT Hub's "Public network access" setting was set to "Disabled".
You can correct this by doing the following:
Go to the Azure portal and open the IoT Hub resource in question.
Go to the Networking menu option.
Change "Public network access" to either "All Networks" or "Selected IP ranges", depending on your use case. Remember that if you choose "Selected IP ranges", you must add the IP address of the VM or IoT device to the list of allowed IP addresses.
I came across this problem many times while working in an enterprise environment. My findings relate mostly to the environment and the security aspects of the whole system.
In my case, the working environment was Red Hat Linux, with Azure hosted on-prem behind an added proxy server layer. My one piece of advice for the most common issues in such an environment is to grant all the necessary rwx (read, write, execute) permissions.
To pinpoint the problem asked about: the identity daemon is failing because the aziot trust bundle is not loading properly.
read all preloaded certificates from the Certificates Service - Error
could not load cert with ID "aziot-edged-trust-bundle"
Check that the certificate is set up properly for use with the device identity certificate.
The second error is related to the daemon management socket:
× configuration has correct URIs for daemon mgmt endpoint - Error
SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
caused by: docker returned exit code: 1, stderr = SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
This can be resolved by manually granting ownership of mgmt.sock in the /var/lib/iotedge location.
Nevertheless, there may be a variety of reasons why iotedge DPS does not work and why, in turn, iotAgent and iotHub do not start. It is better to find the root of the issue and resolve it from there.

Fabric: Issue connection refused 7050

I am trying to create a network from the Hyperledger Fabric tutorial. I get the following error:
Error: failed to create deliver client for orderer: orderer client failed to connect to localhost:7050: failed to create new connection: connection error: desc = "transport: error while dialing: dial tcp [::1]:7050: connect: connection refused"
I opened up the port on the CentOS 7 virtual machine and still had no luck. The Docker container is exposing the port to the host.
I removed all Docker containers, images, and volumes. I even rebuilt the VM from scratch.
Any help would be great.
Thanks,
This happens because you made a gRPC call to the orderer server, but the call never reached the server. There can be many reasons, but in most cases either the server is down (the orderer exited or failed to start because of a misconfiguration) or the call could not reach the server because of a client-side misconfiguration.
I ran into this problem before, and the port was open. It turned out to be my mistake: I forgot to pass '-a' in the command (launch certificate authorities). Hope it helps.
You might also refer to this: https://hyperledger-fabric.readthedocs.io/en/release-2.0/build_network.html
