Unable to push image to OpenShift internal registry with i/o timeout - Azure

Pushing image docker-registry.default.svc:5000/th/th:source ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: After retrying 6 times, Push image still failed due to error: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp <ip>:5000: i/o timeout
Manually pushing an image from the CLI to the internal registry is working fine.
I have deployed OpenShift 3.11 on a couple of Azure VMs; while deploying, I made sure to assign an external IP to each of them.
All other images are present in the Docker registry, and a curl command against the registry returns with exit code 0.
What seemed curious was that, while deploying my app, I tried pinging the registry from the build pod's terminal; the connection simply hung with no response.
Any ideas on how to fix this?

The SDN was causing this networking issue.
Does Azure support Calico networking?
Calico in VXLAN mode is supported on Azure. However, IPIP packets are
blocked by the Azure network fabric.
The above quote from the Calico reference was the reason this issue occurred. It could be resolved by switching Calico to VXLAN mode in the Calico config. More details on how to switch can be found here.
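For reference, a minimal sketch of switching an existing cluster to VXLAN mode, assuming the default IP pool name from a standard Calico install:
calicoctl patch ippool default-ipv4-ippool -p '{"spec": {"ipipMode": "Never", "vxlanMode": "Always"}}'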
For my solution, I just switched from Calico to the default OpenShift SDN plugin, 'ovs-subnet', in the inventory file.
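A sketch of the corresponding openshift-ansible inventory change (variable names as documented for 3.11; treat this as a starting point, not a verified config):
openshift_use_calico=false
os_sdn_network_plugin_name='redhat/openshift-ovs-subnet'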


Configuration Error in Azure IoT Edge installation - "configuration has correct URIs for daemon mgmt endpoint - Error"

Version details
OS: Ubuntu 18.04.5 LTS
aziot-edge: bionic,now 1.2.3-1 amd64
aziot-identity-service: bionic,now 1.2.2-1 amd64
docker: Docker version 20.10.8+azure, build 3967b7d28e15a020e4ee344283128ead633b3e0c
Verifying the installation shows that aziot-identityd is in the "Down - activating" state:
# sudo iotedge system status
System services:
aziot-edged Running
aziot-identityd Down - activating
aziot-keyd Running
aziot-certd Running
aziot-tpmd Ready
aziot-identityd is in a bad state because:
aziot-identityd.service: Down - activating : Printing the last 10 log lines.
-- Logs begin at Fri 2020-11-06 12:29:56 IST, end at Fri 2021-09-10 19:07:13 IST. --
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [INFO] - Could not reconcile Identities with current device data. Reprovisioning.
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [INFO] - Updated device info for Edge1.
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - Failed to provision with IoT Hub, and no valid device backup was found: Hub client error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - service encountered an error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - caused by: Hub client error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - caused by: internal error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - 0: <unknown>
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 1: <unknown>
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN systemd[1]: aziot-identityd.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN systemd[1]: aziot-identityd.service: Failed with result 'exit-code'.
iotedge check shows two configuration-related errors:
# iotedge check --verbose
Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
× read all preloaded certificates from the Certificates Service - Error
could not load cert with ID "aziot-edged-trust-bundle"
Caused by:
parameter "id" has an invalid value
caused by: not found
√ read all preloaded key pairs from the Keys Service - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK
Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK
Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
caused by: docker returned exit code: 1, stderr = SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
√ aziot-edge package is up-to-date - OK
√ container time is close to host time - OK
‼ DNS server - Warning
Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
You can ignore this warning if you are setting DNS server per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
Container engine is not configured to rotate module logs which may cause it run out of disk space.
Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
You can ignore this warning if you are setting log policy per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
× production readiness: Edge Agent's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeAgent container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeAgent
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeHub container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeHub
√ Agent image is valid and can be pulled from upstream - OK
Connectivity checks
-------------------
√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the default network can connect to upstream MQTT port - OK
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream MQTT port - OK
30 check(s) succeeded.
2 check(s) raised warnings.
4 check(s) raised errors.
The config.toml file has only the manual provisioning with a connection string.
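For reference, the manual-provisioning section of config.toml looks roughly like this, with placeholders instead of real values:
[provisioning]
source = "manual"
connection_string = "HostName=<hub-name>.azure-devices.net;DeviceId=<device-id>;SharedAccessKey=<key>"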
I had this error because my IoT Hub's "Public network access" was set to "Disabled".
You can correct this by doing the following:
Go to the Azure portal, and go to the IoT Hub resource in question.
Go to the Networking menu option.
Change the "Public network access" to either "All Networks" or "Selected IP ranges", depending on your use case. Remember, if you select "Selected IP ranges", you must add the VM/IoT device's IP address to the list of allowed IP addresses.
I came across this question many times while working in an enterprise environment. My finding is more related to the environment and security aspects of the whole system.
In my case, the working environment was Red Hat Linux, and Azure was hosted on-premises behind an additional proxy server layer. One piece of advice that solves the most common issues in such an environment is to grant all the necessary rwx (read, write, execute) permissions.
Pinpointing the problem asked here: the identity daemon is failing because the aziot trust bundle is not loading properly.
read all preloaded certificates from the Certificates Service - Error
could not load cert with ID "aziot-edged-trust-bundle"
Check that the trust-bundle certificate is set up properly alongside the device identity certificate.
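A minimal sketch of the relevant config.toml line, assuming the hypothetical bundle path /etc/aziot/trust-bundle.pem:
trust_bundle_cert = "file:///etc/aziot/trust-bundle.pem"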
The second error is related to the daemon management socket:
× configuration has correct URIs for daemon mgmt endpoint - Error
SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
caused by: docker returned exit code: 1, stderr = SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
This can be resolved by manually fixing the ownership permissions of mgmt.sock in the /var/lib/iotedge location.
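A minimal sketch of that fix, using the socket path the answer mentions; the owning user and group vary by install, so iotedge:iotedge is an assumption here:
sudo chown iotedge:iotedge /var/lib/iotedge/mgmt.sock
sudo chmod 660 /var/lib/iotedge/mgmt.sock
sudo iotedge system restart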
Nevertheless, there may be a variety of reasons for IoT Edge provisioning (DPS) not to work, and in turn for edgeAgent and edgeHub not to start. It is better to get to the root of the issue and resolve that.

One or more errors occurred. (Permission denied /var/run/iotedge/mgmt.sock) caused by: docker returned exit code:

I just installed IoT Edge on Raspbian Stretch following this: Azure / Internet of Things / IoT Edge / Install or uninstall the Azure IoT Edge runtime.
However, I get the errors below.
Three weeks ago I installed another one and it worked perfectly with the same instructions.
pi@raspberrypi:/etc/iotedge $ sudo iotedge check --verbose
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ config.yaml has correct hostname - OK
× config.yaml has correct URIs for daemon mgmt endpoint - Error
One or more errors occurred. (Permission denied /var/run/iotedge/mgmt.sock)
caused by: docker returned exit code: 1, stderr = One or more errors occurred. (Permission denied /var/run/iotedge/mgmt.sock)
√ latest security daemon - OK
√ host time is close to real time - OK
√ container time is close to host time - OK
‼ DNS server - Warning
Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
You can ignore this warning if you are setting DNS server per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
‼ production readiness: certificates - Warning
The Edge device is using self-signed automatically-generated development certificates.
They will expire in 89 days (at 2021-02-22 07:24:52 UTC) causing module-to-module and downstream device communication to fail on an active deployment.
After the certs have expired, restarting the IoT Edge daemon will trigger it to generate new development certs.
Please consider using production certificates instead. See https://aka.ms/iotedge-prod-checklist-certs for best practices.
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
Container engine is not configured to rotate module logs which may cause it run out of disk space.
Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
You can ignore this warning if you are setting log policy per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
Data might be lost if the module is deleted or updated.
Please see https://aka.ms/iotedge-storage-host for best practices.
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeHub container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeHub
Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK
17 check(s) succeeded.
4 check(s) raised warnings.
2 check(s) raised errors.
iotedge list
pi@raspberrypi:~ $ sudo iotedge list
A module runtime error occurred
caused by: Could not list modules
caused by: connection error: Connection reset by peer (os error 104)

Fabric: Issue connection refused 7050

I am trying to create a network from the Hyperledger Fabric tutorial. I get the following error:
Error: failed to create deliver client for orderer: orderer client failed to connect to localhost:7050: failed to create new connection: connection error: desc = "transport: error while dialing: dial tcp [::1]:7050: connect: connection refused"
I opened up the port on the CentOS 7 virtual machine and still no luck. The Docker container is exposing the port to the host.
I removed all docker containers, images and volumes. I even rebuilt the VM from scratch.
Any help would be great.
Thanks,
This happens because you made a gRPC call to the orderer server but the call failed to reach it. That can occur for many reasons, but in most cases it is because the server is down (the orderer exited or failed due to misconfiguration) or because your call was misconfigured and never hit the server.
I encountered this problem before even though the port was open. It turned out to be a mistake where I forgot to put '-a' in the command (to launch the certificate authorities). Hope it helps.
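For example, with the byfn.sh script from fabric-samples (assuming that is the script in question), the certificate authorities are brought up with:
./byfn.sh up -a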
You might also refer this : https://hyperledger-fabric.readthedocs.io/en/release-2.0/build_network.html

Kubernetes on Azure : connectex

I followed the steps from the link to create a K8s cluster using the Azure Portal. I tried using kubectl on a remote machine to check if it's working and got this error.
Unable to connect to the server: dial tcp 13.90.35.157:443: connectex:
A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection
failed because connected host has failed to respond.
I can SSH to the K8s master. Tried kubectl get nodes from the master and got similar error.
It is really hard to say from such a description what went wrong, but since this is a new cluster (and I'm saying this because sometimes a k8s cluster gets deployed but doesn't really work), I would suggest deleting it and creating a new one, and/or creating it using the Azure CLI / Azure Cloud Shell.
Basically it's as simple as:
az acs create -n acs-cluster -g acsrg1 -d applink789 --generate-ssh-keys
if you already have the resource group created; if not, you can create it with:
az group create -n acsrg1 -l "westus"
According to your description, it seems you have not configured the Service Principal correctly. I used a wrong service principal to deploy K8s in Azure and got the same error:
C:\Users>kubectl get nodes
Unable to connect to the server: dial tcp 13.90.27.73:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
You may need to check to ensure the credentials were provided accurately, and that the configured Service Principal has read and write permissions to the target Subscription.
If your Service Principal is misconfigured, none of the Kubernetes components will come up in a healthy manner. We can check to see if this is the problem:
root@k8s-master-6FEE48E1-0:~# journalctl -u kubelet | grep --text autorest
If you see output that looks like the following, it means you have not configured the Service Principal correctly.
root@k8s-master-6FEE48E1-0:~# journalctl -u kubelet | grep --text autorest
Jun 01 01:58:47 k8s-master-6FEE48E1-0 docker[5522]: E0601 01:58:47.447321 6028 kubelet.go:1186] Cannot get Node info: failed to get external ID from cloud provider: autorest#WithErrorUnlessStatusCode: POST https://login.microsoftonline.com/1fcf418e-66ed-4c99-9449-d8e18bf8737a/oauth2/token?api-version=1.0 failed with 400 Bad Request: StatusCode=400
Jun 01 01:58:47 k8s-master-6FEE48E1-0 docker[5522]: E0601 01:58:47.627128 6028 kubelet_node_status.go:70] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: autorest#WithErrorUnlessStatusCode: POST https://login.microsoftonline.com/1fcf418e-66ed-4c99-9449-d8e18bf8737a/oauth2/token?api-version=1.0 failed with 400 Bad Request: StatusCode=400
Jun 01 01:58:47 k8s-master-6FEE48E1-0 docker[5522]: E0601 01:58:47.885092 6028 kubelet_node_status.go:70] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: autorest#WithErrorUnlessStatusCode: POST https://login.microsoftonline.com/1fcf418e-66ed-4c99-9449-d8e18bf8737a/oauth2/token?api-version=1.0 failed with 400 Bad Request: StatusCode=400
For more information about how to create/configure a service principal for an acs-engine Kubernetes cluster, please refer to this link.
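A sketch of creating such a service principal with the Azure CLI, with <subscription-id> left as a placeholder:
az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/<subscription-id>"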

PuppetDB configuration not working

I'm trying to configure PuppetDB on the same Puppet master server. I followed the Puppet documentation, installed the database, and configured Puppet to use it.
When I run the puppet agent --test command, it gives the error message below.
I don't see any process running on port 8081; I do see the Puppet Java process running on port 8140.
How can I resolve this error?
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: Error 500 on SERVER: Server Error: Could not retrieve facts for webserver: Failed to find facts from PuppetDB at puppet:8140: Failed to execute '/pdb/query/v4/nodes/webserver/facts' on at least 1 of the following 'server_urls': https://puppetdb:8081
Info: Retrieving pluginfacts
Info: Retrieving plugin
Warning: Error connecting to puppetdb on 8081 at route /pdb/query/v4/nodes/webserver/facts, error message received was 'Connection refused - connect(2) for "puppetdb" port 8081'. Failing over to the next PuppetDB server_url in the 'server_urls' list
Error: Cached facts for webserver failed: Failed to find facts from PuppetDB at puppet:8140: Failed to execute '/pdb/query/v4/nodes/webserver/facts' on at least 1 of the following 'server_urls': https://puppetdb:8081
Info: Loading facts
Info: Caching facts for webserver
Warning: Error connecting to puppetdb on 8081 at route /pdb/cmd/v1?checksum=039e22c7bf98e9cbf2f08169047d288c9b451c73&version=5&certname=webserver&command=replace_facts, error message received was 'Connection refused - connect(2) for "puppetdb" port 8081'. Failing over to the next PuppetDB server_url in the 'server_urls' list
Error: Failed to execute '/pdb/cmd/v1?checksum=039e22c7bf98e9cbf2f08169047d288c9b451c73&version=5&certname=webserver&command=replace_facts' on at least 1 of the following 'server_urls': https://puppetdb:8081
Error: Could not retrieve local facts: Failed to execute '/pdb/cmd/v1?checksum=039e22c7bf98e9cbf2f08169047d288c9b451c73&version=5&certname=webserver&command=replace_facts' on at least 1 of the following 'server_urls': https://puppetdb:8081
Error: Failed to apply catalog: Could not retrieve local facts: Failed to execute '/pdb/cmd/v1?checksum=039e22c7bf98e9cbf2f08169047d288c9b451c73&version=5&certname=webserver&command=replace_facts' on at least 1 of the following 'server_urls': https://puppetdb:8081
I hope you checked that the SSL certs stored in /etc/puppetlabs/puppetdb/ssl match /etc/puppetlabs/puppet/ssl/certs/<certname of your puppetserver FQDN>.
This can be verified by
puppetdb ssl-setup
Sample run:
puppetdb ssl-setup
PEM files in /etc/puppetlabs/puppetdb/ssl already exists, checking integrity.
Setting ssl-host in /etc/puppetlabs/puppetdb/conf.d/jetty.ini already correct.
Setting ssl-port in /etc/puppetlabs/puppetdb/conf.d/jetty.ini already correct.
Setting ssl-key in /etc/puppetlabs/puppetdb/conf.d/jetty.ini already correct.
Setting ssl-cert in /etc/puppetlabs/puppetdb/conf.d/jetty.ini already correct.
Let me know if you have further issues. I have had the same issue and rectified it by removing the /etc/puppetlabs/puppetdb/ssl directory and rerunning the "puppetdb ssl-setup" command.
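A minimal sketch of that recovery sequence, assuming systemd manages the service:
sudo rm -rf /etc/puppetlabs/puppetdb/ssl
sudo puppetdb ssl-setup
sudo systemctl restart puppetdb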
For some reason the PuppetDB process went down; that's why no process was running on port 8081. I restarted the PuppetDB process, and then the puppet agent --test command started connecting to the webserver.
Here is the output of the puppetdb service on CentOS 7.
# systemctl status puppetdb
● puppetdb.service - puppetdb Service
Loaded: loaded (/usr/lib/systemd/system/puppetdb.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2017-03-28 18:26:58 EDT; 1h 20min ago
Main PID: 5503 (java)
CGroup: /system.slice/puppetdb.service
└─5503 /usr/bin/java -Xmx192m -Djava.security.egd=/dev/urandom -XX:OnOutOfMemoryError=kill -9 %p -cp /opt/puppetlabs/...
