Openshift 3 App Deployment Failed: Took longer than 600 seconds to become ready - node.js

I have a problem with my OpenShift 3 setup, based on the Node.js + MongoDB (Persistent) template: https://github.com/openshift/nodejs-ex.git
Latest App Deployment: nodejs-mongo-persistent-7: Failed
--> Scaling nodejs-mongo-persistent-7 to 1
--> Waiting up to 10m0s for pods in rc nodejs-mongo-persistent-7 to become ready
error: update acceptor rejected nodejs-mongo-persistent-7: pods for rc "nodejs-mongo-persistent-7" took longer than 600 seconds to become ready
Latest Build: Complete
Pushing image 172.30.254.23:5000/husk/nodejs-mongo-persistent:latest ...
Pushed 5/6 layers, 84% complete
Pushed 6/6 layers, 100% complete
Push successful
I have no idea how to debug this. Can you help, please?

Check what went wrong in the console: oc get events
Failed to pull the image? Make sure you included a proper secret.
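A few follow-up commands that usually narrow this down (the <pod-name> placeholder must be replaced with the pod created by the deployment):
oc get events --sort-by='.lastTimestamp'   # recent warnings: image pull failures, failed probes, quota errors
oc get pods                                # find the pod(s) created by nodejs-mongo-persistent-7
oc describe pod <pod-name>                 # the Events section explains why the pod never became ready
oc logs <pod-name>                         # application output, e.g. a crashing Node.js process
oc logs dc/nodejs-mongo-persistent         # logs from the latest deployment of the deployment config
If the events show a failing readiness probe, the 600-second timeout is simply the deployment waiting for that probe to pass.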

Related

How to pause azure pipeline execution for several seconds?

I am using a MariaDB Docker image for integration tests. I start it in an Azure pipeline via the following commands:
docker pull <some_azure_repository>/databasedump:<tag_number>
docker run -d --publish 3306:3306 <some_azure_repository>/databasedump:<tag_number>
And after that, the integration tests, written in Python, are started.
But when the code tries to connect to the MariaDB database, a MySQL error is returned.
+ 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
Maybe the reason is that the MariaDB database is big and needs some seconds to start.
So my question is: is there a way to add a sleep of several seconds to the pipeline execution, in a script or Bash section?
You can build a delay step into your pipeline's YAML file between the setup of your Docker image and your test execution.
# Delay v1
# Delay further execution of a workflow by a fixed time.
- task: Delay@1
  inputs:
    delayForMinutes: '0' # string. Required. Delay Time (minutes). Default: 0.
https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/delay-v1?view=azure-pipelines
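If a fixed delay turns out to be unreliable, a Bash step can poll the published port instead; this is a rough sketch assuming the container from the docker run above publishes 3306 on the agent's localhost:
- bash: |
    # Poll until MariaDB accepts TCP connections (up to ~60 seconds) instead of sleeping a fixed time
    for i in $(seq 1 30); do
      if (echo > /dev/tcp/127.0.0.1/3306) 2>/dev/null; then
        echo "MariaDB is up"
        exit 0
      fi
      echo "Waiting for MariaDB..."
      sleep 2
    done
    echo "MariaDB did not become ready in time"
    exit 1
  displayName: Wait for MariaDB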

Intermittent terminated: Application failed to start: not available

I am using Google Cloud Run to run my Node.js application.
Most of the time it works without any issues, but once in a while I get the error "terminated: Application failed to start: not available" and start receiving 503 errors.
I have gone through the other answers, but they did not help in my case, since they do not deal with intermittent issues like this one.
Could you please help me understand the cause of the error and how we can fix it?
Configuration of the micro-service is as given below:
CPU - 1
Memory - 1 GiB
Concurrency - 40
Request timeout - 300 seconds
Execution environment - Second generation
Min instances - 1
Max instances - 100
Thank you,
KK

how to make azure external.metrics.k8s adapter work?

I've set up the Azure external metrics adapter following this document: https://github.com/Azure/azure-k8s-metrics-adapter/tree/master/samples/servicebus-queue
After the Helm installation using a service principal, executing the command kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq should give the output suggested by the document, but instead I'm facing an error stating Error from server (ServiceUnavailable): the server is currently unable to handle the request.
The Helm installation went through successfully and below are the logs:
I0116 12:49:36.216094 1 controller.go:40] Setting up external metric event handlers
I0116 12:49:36.216148 1 controller.go:52] Setting up custom metric event handlers
I0116 12:49:36.216528 1 controller.go:69] initializing controller
I0116 12:49:36.353905 1 main.go:104] Looking up subscription ID via instance metadata
I0116 12:49:36.359887 1 instancemetadata.go:40] connected to sub: *********************
I0116 12:49:36.416858 1 controller.go:77] starting 2 workers with 1000000000 interval
I0116 12:49:36.417062 1 controller.go:88] Worker starting
I0116 12:49:36.417068 1 controller.go:88] Worker starting
I0116 12:49:36.417074 1 controller.go:98] processing item
I0116 12:49:36.417078 1 controller.go:98] processing item
I0116 12:49:36.680065 1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I0116 12:49:37.197936 1 secure_serving.go:116] Serving securely on [::]:6443
When I execute the command kubectl api-versions, external.metrics.k8s.io/v1beta1 is displayed in the list. So this proves that the installation was successful. But why am I not able to hit the API?
Solved it. Initially I was installing into my own custom namespace. It looks like the Azure metrics adapter will only work if it is installed in the "custom-metrics" namespace. They should probably mention that somewhere in the document. It cost me two days of troubleshooting to figure this out :-(
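For anyone hitting the same thing, a rough sketch of the reinstall and verification steps (the Helm command itself is whatever the linked sample prescribes, only the namespace changes):
kubectl create namespace custom-metrics
# re-run the Helm installation from the sample, adding: --namespace custom-metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq   # should now return the metric list instead of ServiceUnavailable
kubectl get apiservice v1beta1.external.metrics.k8s.io           # the APIService should report Available=True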

Cypress UI tests throwing time out for waiting for browser

I am running Cypress UI tests in Azure DevOps CI/CD and somehow most of the UI tests are failing. All of the tests were running fine a few days ago.
It is throwing a "Timed out waiting for the browser to connect. Retrying." error. Any advice on how to resolve the issue?
Environment Details:
Cypress version: 3.4.1,
Node: 10.x,
Azure DevOps CI/CD
Running: report/send-report.spec.js... (12 of 14)
2019-10-10T00:47:31.0294852Z
2019-10-10T00:47:31.0295427Z Warning: Cypress can only record videos when using the built in 'electron' browser.
2019-10-10T00:47:31.0295707Z
2019-10-10T00:47:31.0296579Z You have set the browser to: 'chrome'
2019-10-10T00:47:31.0296837Z
2019-10-10T00:47:31.0297613Z A video will not be recorded when using this browser.
2019-10-10T00:47:31.0313740Z (node:4030) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 end listeners added. Use emitter.setMaxListeners() to increase limit
2019-10-10T00:48:01.0316223Z
2019-10-10T00:48:01.0592004Z Timed out waiting for the browser to connect. Retrying...
2019-10-10T00:48:31.0587550Z
2019-10-10T00:48:31.0839142Z Timed out waiting for the browser to connect. Retrying again...
2019-10-10T00:49:01.0877330Z
2019-10-10T00:49:01.1241198Z The browser never connected. Something is wrong. The tests cannot run. Aborting...
I have noticed that you have set the retries value to 2 to retry immediately on failure instead of moving on to the next test. I recommend changing that value and checking whether the error still occurs.
You can also try another workaround: lower numTestsKeptInMemory from 50 to something small like 1 or 0. Here is the official documentation: https://docs.cypress.io/guides/references/configuration.html#Global
In addition, it seems to be an occasional error, because some users failed on the first pipeline run but succeeded on the second. This could be a problem with Cypress itself or with your system's memory, so you can report it to Cypress directly.
Here is the link to cypress-io/cypress issues: https://github.com/cypress-io/cypress/issues/
And here is the link to the same error message: https://github.com/cypress-io/cypress/issues/1305
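For reference, a minimal cypress.json sketch of the numTestsKeptInMemory change suggested above (the value 0 is only an example):
{
  "numTestsKeptInMemory": 0
}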

App Engine Flex deployment health check fails

I've made a Python 3 Flask app to serve as an API proxy with gunicorn. I've deployed the openapi to Cloud Endpoints and filled in the endpoints service in the app.yaml file.
When I try to deploy to App Engine flex, the health check fails because it took too long. I've tried to alter the readiness_check's app_start_timeout_sec as suggested, but to no avail. When checking the logs in Stackdriver I can only see gunicorn booting a couple of workers and eventually terminating everything a couple of times in a row, with no further explanation of what goes wrong. I've also tried specifying resources in app.yaml and scaling the workers in gunicorn.conf.py, but to no avail.
Then I tried switching to uwsgi, but it acted in the same way: starting up and terminating a couple of times in a row, followed by the health check timeout.
error:
ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
app.yaml
runtime: python
env: flex
entrypoint: gunicorn -c gunicorn.conf.py -b :$PORT main:app
runtime_config:
  python_version: 3
endpoints_api_service:
  name: 2019-09-27r0
  rollout_strategy: managed
resources:
  cpu: 1
  memory_gb: 2
  disk_size_gb: 10
gunicorn.conf.py:
import multiprocessing
bind = "127.0.0.1:8000"
workers = multiprocessing.cpu_count() * 2 + 1
requirements.txt:
aniso8601==8.0.0
certifi==2019.9.11
chardet==3.0.4
Click==7.0
Flask==1.1.1
Flask-Jsonpify==1.5.0
Flask-RESTful==0.3.7
gunicorn==19.9.0
idna==2.8
itsdangerous==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
pytz==2019.2
requests==2.22.0
six==1.12.0
urllib3==1.25.5
Werkzeug==0.16.0
pyyaml==5.1.2
Is there anyone who can spot a conflict or something I forgot here? I'm out of ideas and really need help. It would also definitely help if someone could point me in the right direction on where to find more info in the logs (I also ran gcloud app deploy with --verbosity=debug, but this only shows "Updating service [default]... ...Waiting to retry."). I would really like to know what causes the health checks to time out!
Thanks in advance!
You can either disable health checks or customize them.
To disable them, add the following to your app.yaml:
health_check:
  enable_health_check: False
To customize them, you can take a look at split health checks.
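If the app is still on the legacy health check mechanism, split health checks can be enabled first with gcloud (YOUR_PROJECT_ID is a placeholder):
gcloud app update --split-health-checks --project YOUR_PROJECT_ID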
You can customize liveness check requests by adding an optional liveness_check section to your app.yaml file, for example:
liveness_check:
  path: "/liveness_check"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
In the documentation you can check the settings available for liveness checks.
In addition, there are readiness checks. In the same way, you can customize their settings, for example:
readiness_check:
  path: "/readiness_check"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
The values mentioned above can be changed according to your needs. Check these values carefully, since the App Engine flexible environment takes several minutes to start an instance; this is a notable difference from App Engine standard and should not be taken lightly.
If you examine the nginx.health_check logs for your application, you might see health check polling happening more frequently than you have configured, due to the redundant health checkers that are also following your settings. These redundant health checkers are created automatically and you cannot configure them.
