I've generated a project with the Yeoman Angular-Fullstack generator (https://github.com/angular-fullstack/generator-angular-fullstack).
I created an app.yaml and tried to deploy the project to GAE with the command:
gcloud app deploy
But I'm getting an error:
ERROR: (gcloud.app.deploy) Error Response: [13] Timed out when starting VMs. It's possible that the application code is unhealthy. (0/2 ready, 2 still deploying).
Any tips on how to debug the gcloud deploy? I'm running the latest gcloud SDK.
--
Here's a longer debug trace:
Updating service [default]...-DEBUG: Operation [apps/<MY-PROJECT>/operations/63e50c89-da5f-4697-aeea-447865a82cc4] not complete. Waiting 5s.
Updating service [default]...|DEBUG: Operation [apps/<MY-PROJECT>/operations/63e50c89-da5f-4697-aeea-447865a82cc4] complete. Result: {
"metadata": {
"target": "apps/<MY-PROJECT>/services/default/versions/20160804t151734",
"method": "google.appengine.v1beta5.Versions.CreateVersion",
"user": "<MY-EMAIL>#gmail.com",
"insertTime": "2016-08-04T12:16:31.905Z",
"endTime": "2016-08-04T12:24:03.526Z",
"#type": "type.googleapis.com/google.appengine.v1beta5.OperationMetadataV1Beta5"
},
"done": true,
"name": "apps/<MY-PROJECT>/operations/63e50c89-da5f-4697-aeea-447865a82cc4",
"error": {
"message": "Timed out when starting VMs. It's possible that the application code is unhealthy. (0/2 ready, 2 still deploying).",
"code": 13
}
}
Updating service [default]...failed.
DEBUG: (gcloud.app.deploy) Error Response: [13] Timed out when starting VMs. It's possible that the application code is unhealthy. (0/2 ready, 2 still deploying).
Traceback (most recent call last):
File "/Users/jp/softaa/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 719, in Execute
result = args.calliope_command.Run(cli=self, args=args)
File "/Users/jp/softaa/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 1404, in Run
resources = command_instance.Run(args)
File "/Users/jp/softaa/google-cloud-sdk/lib/surface/app/deploy.py", line 57, in Run
return deploy_util.RunDeploy(self, args)
File "/Users/jp/softaa/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 215, in RunDeploy
api_client.DeployService(name, version, service, manifest, image)
File "/Users/jp/softaa/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/appengine_api_client.py", line 89, in DeployService
return operations.WaitForOperation(self.client.apps_operations, operation)
File "/Users/jp/softaa/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/api/operations.py", line 70, in WaitForOperation
encoding.MessageToPyValue(completed_operation.error)))
OperationError: Error Response: [13] Timed out when starting VMs. It's possible that the application code is unhealthy. (0/2 ready, 2 still deploying).
ERROR: (gcloud.app.deploy) Error Response: [13] Timed out when starting VMs. It's possible that the application code is unhealthy. (0/2 ready, 2 still deploying).
GAE does not come with MongoDB. You have two options:
1) Use GAE flex, build your own Mongo container, and use it.
2) Use a hosted MongoDB provider like https://mlab.com; they even have a free tier to test with (a connection-string sketch follows below).
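If you go the hosted-Mongo route, the generator's production config typically reads the connection string from an environment variable, which you can supply through app.yaml. A minimal sketch, assuming the app reads MONGODB_URI (check your config/environment/production.js) and using a placeholder mLab connection string:
# Append the connection string to app.yaml as an environment variable.
# MONGODB_URI and the mLab host/credentials are placeholders; adjust to your setup.
cat >> app.yaml <<'EOF'
env_variables:
  MONGODB_URI: mongodb://dbuser:dbpassword@ds012345.mlab.com:12345/mydb
EOF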
If your question is just about finding the cause of this error, I would recommend running the app with the Development Server on your localhost to find the error.
Without taking a deeper look, my guess is that the Node part does not fulfil the requirements for GAE (for example, listening on the port App Engine expects), or that you are trying to deploy to a region where Node applications are not supported.
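A quick way to check that locally, since it is the flex health check that times out: make sure the production build actually starts and answers on the port App Engine provides (it sets PORT, 8080 by default). A rough sketch; the build command and entry point are assumptions from a typical angular-fullstack layout, so adjust them to your project:
npm install && gulp build                           # older generator versions use `grunt build`
cd dist && npm install --production
PORT=8080 NODE_ENV=production node server/app.js    # entry point is an assumption; adjust to your build output
# in a second shell, confirm it answers on the port the health checker will hit:
curl -i http://localhost:8080/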
Related
I'm encountering a problem when deploying a Node/React application to App Engine using the command: gcloud app deploy --quiet --no-promote --no-stop-previous-version --version={VERSION_NUMBER}. This didn't happen with previous deployments until today.
I've also checked the deployment and application logs in Cloud Console and there are no errors. The code works locally and there are no changes to the configuration, so I don't know where the problem is coming from. I also deployed the reverted code, but the error is still the same.
app.yaml file:
runtime: custom
env: flex
resources:
  cpu: 1
  memory_gb: 2
  disk_size_gb: 15
manual_scaling:
  instances: 1
Here are the logs when running in debug mode. Result:
{ "done": true, "error": { "code": 13, "message": "The system encountered a fatal error" }, "metadata": { "#type": "type.googleapis.com/google.appengine.v1.OperationMetadataV1", "createVersionMetadata": { "cloudBuildId": "4e46d51e-f408-479e-ad06-81cd794d9028" }, "endTime": "2022-05-25T15:41:26.285Z", "insertTime": "2022-05-25T15:10:10.933Z", "method": "google.appengine.v1.Versions.CreateVersion", "target": "{OMITTED}", "user": "{OMITTED}" }, "name": "apps/{OMITTED}/operations/e9d94daa-055b-4903-8cbf-372df1ce5c3c" } Updating service [default] (this may take several minutes)...failed. DEBUG: (gcloud.app.deploy) Error Response: [13] The system encountered a fatal error Traceback (most recent call last): File "C:\Users\USER\AppData\Local\Google\Cloud SDK\google-cloud-sdk\lib\googlecloudsdk\calliope\cli.py", line 987, in Execute resources = calliope_command.Run(cli=self, args=args) File "C:\Users\USER\AppData\Local\Google\Cloud SDK\google-cloud-sdk\lib\googlecloudsdk\calliope\backend.py", line 809, in Run resources = command_instance.Run(args) File "C:\Users\USER\AppData\Local\Google\Cloud SDK\google-cloud-sdk\lib\surface\app\deploy.py", line 127, in Run return deploy_util.RunDeploy( File "C:\Users\USER\AppData\Local\Google\Cloud SDK\google-cloud-sdk\lib\googlecloudsdk\command_lib\app\deploy_util.py", line 692, in RunDeploy aiter.py", line 320, in _IsNotDone return not poller.IsDone(operation) File "C:\Users\USER\AppData\Local\Google\Cloud SDK\google-cloud-sdk\lib\googlecloudsdk\api_lib\app\operations_util.py", line 182, in IsDone raise OperationError(requests.ExtractErrorMessage(googlecloudsdk.api_lib.app.operations_util.OperationError: Error Response: [13] The system encountered a fatal error ERROR: (gcloud.app.deploy) Error Response: [13] The system encountered a fatal error
I've submitted a ticket to GCP support but they haven't responded yet. I'm wondering if someone here has encountered this error before and knows the resolution.
As per this public issue tracker, the Google Support Team suggests using the commands below to fix the issue:
gcloud config set interactive/hidden true
then:
gcloud app update --service-account=your_project_id@appspot.gserviceaccount.com
I raised a ticket with the Google Support team, and they came back saying the issue was caused by the latest GAE release. They said they have reverted the change and it is now working as expected.
I have a Node.js app deployed on Google App Engine.
When I trigger the deployment through GitLab CD, everything goes OK.
Recently the CD pipeline started returning errors. I added the --verbosity=debug flag and got the following report (from GitLab CI):
╔════════════════════════════════════════════════════════════╗
╠═ Uploading 2 files to Google Cloud Storage ═╣
╚INFO: Uploading [/builds/***/***/api/package.json] to [staging.***.appspot.com/***]
INFO: Uploading [/tmp/tmp2sf554dn/source-context.json] to [staging.***.appspot.com/***]
════════════════════════════════════════════════════════════╝
File upload done.
DEBUG: Converted YAML to JSON: "{
"entrypoint": {
"shell": ""
},
"runtime": "nodejs12"
}"
DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
Updating service [api]...
......DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
................................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
.................................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
...............................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
................................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
...............................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
............................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
.............................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
.............................DEBUG: Operation [apps/***/operations/***] not complete. Waiting to retry.
............................DEBUG: Operation [apps/***/operations/***] complete. Result: {
"error": {
"message": "Cloud build *** status: FAILURE\nError type: OK\nFull build logs: https://console.cloud.google.com/cloud-build/builds/***?project=***",
"code": 9
},
"metadata": {
"target": "***",
"method": "google.appengine.v1.Versions.CreateVersion",
"user": "***",
"endTime": "2020-09-10T09:46:28.638Z",
"#type": "type.googleapis.com/google.appengine.v1.OperationMetadataV1",
"insertTime": "2020-09-10T09:45:34.795Z"
},
"done": true,
"name": "apps/MY_PROJECT/operations/***"
}
failed.
DEBUG: (gcloud.app.deploy) Error Response: [9] Cloud build *** status: FAILURE
Error type: OK
Full build logs: https://console.cloud.google.com/cloud-build/builds/***?project=***
Traceback (most recent call last):
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 983, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 808, in Run
resources = command_instance.Run(args)
File "/usr/lib/google-cloud-sdk/lib/surface/app/deploy.py", line 121, in Run
default_strategy=flex_image_build_option_default))
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 644, in RunDeploy
ignore_file=args.ignore_file)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 430, in Deploy
extra_config_settings)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/appengine_api_client.py", line 208, in DeployService
poller=done_poller)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/operations_util.py", line 314, in WaitForOperation
sleep_ms=retry_interval)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 264, in WaitFor
sleep_ms, _StatusUpdate)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 326, in PollUntilDone
sleep_ms=sleep_ms)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 229, in RetryOnResult
if not should_retry(result, state):
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 320, in _IsNotDone
return not poller.IsDone(operation)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/operations_util.py", line 183, in IsDone
encoding.MessageToPyValue(operation.error)))
googlecloudsdk.api_lib.app.operations_util.OperationError: Error Response: [9] Cloud build *** status: FAILURE
Error type: OK
Full build logs: https://console.cloud.google.com/cloud-build/builds/***?project=***
ERROR: (gcloud.app.deploy) Error Response: [9] Cloud build *** status: FAILURE
Error type: OK
Full build logs: https://console.cloud.google.com/cloud-build/builds/***
I checked the report on Cloud Build and got the following:
......
Setting CNB_DEPRECATION_MODE=quiet
Setting ENTRYPOINT: '/cnb/lifecycle/launcher'
*** Images (sha256:***):
eu.gcr.io/***/app-engine-tmp/app/ttl-2h/api/buildpack-app:latest - GET https://storage.googleapis.com/eu.artifacts.***.appspot.com/containers/images/sha256:***?access_token=REDACTED: unsupported status code 404; body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: eu.artifacts.***.appspot.com/containers/images/sha256:***</Details></Error>
*** Digest: sha256:***
ERROR: failed to export: failed to write image to the following tags: [eu.gcr.io/***/app-engine-tmp/app/ttl-2h/api/buildpack-app:latest: GET https://storage.googleapis.com/eu.artifacts.***.appspot.com/containers/images/sha256:***?access_token=REDACTED: unsupported status code 404; body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: eu.artifacts.***.appspot.com/containers/images/sha256:***</Details></Error>]
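For what it's worth, the object that the failing GET points at can be checked directly in the bucket named in the error (a sketch; PROJECT_ID and DIGEST stand in for the redacted values above):
gsutil ls gs://eu.artifacts.PROJECT_ID.appspot.com/containers/images/ | head
gsutil stat gs://eu.artifacts.PROJECT_ID.appspot.com/containers/images/sha256:DIGEST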
I haven't altered the service account used for deployment to Google App Engine.
An interesting detail: when I trigger the deployment locally, or from our self-hosted GitLab instance, everything goes OK, but deployments triggered from gitlab.com have recently started to fail.
Has anyone had a similar experience fixing problems like this?
Thanks.
We are trying to scan our Docker images using the Anchore Engine Jenkins plugin.
Currently we create our application Docker images, push them to our own private local registry, and then deploy them to our test environments.
Now we want to set up Docker image scanning in our CI/CD process to check for any vulnerabilities.
We have installed Anchore Engine using the recommended Docker Compose YAML method given in the documentation link:
https://anchore.freshdesk.com/support/solutions/articles/36000020729-install-on-docker-swarm
Post-installation, we installed the Anchore Container Image Scanner Plugin in Jenkins.
We configured the plugin as described in the documentation link:
https://wiki.jenkins.io/display/JENKINS/Anchore+Container+Image+Scanner+Plugin
However, the scanning fails with the following error message:
2018-10-11T07:01:44.647 INFO AnchoreWorker Analysis request accepted, received image digest sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8
2018-10-11T07:01:44.647 INFO AnchoreWorker Waiting for analysis of 10.180.25.2:5000/hello-world:latest, polling status periodically
2018-10-11T07:01:44.647 DEBUG AnchoreWorker anchore-engine get policy evaluation URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true
2018-10-11T07:01:44.648 DEBUG AnchoreWorker Attempting anchore-engine get policy evaluation (1/300)
2018-10-11T07:01:44.675 DEBUG AnchoreWorker anchore-engine get policy evaluation failed. URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true, status: HTTP/1.1 404 NOT FOUND, error: {
"detail": {},
"httpcode": 404,
"message": "image is not analyzed - analysis_status: not_analyzed"
}
NOTE:
In the image tag 10.180.25.2:5000/hello-world:latest, 10.180.25.2:5000 is our local private registry and hello-world:latest is the latest hello-world image from Docker Hub, which we pulled and pushed into our registry to try out image scanning with Anchore Engine.
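For context, the way we staged the test image (and the manual anchore-cli calls that the plugin effectively automates) looks roughly like this; the engine URL is the one from the plugin log above and the credentials are from the status check later in this post:
docker pull hello-world:latest
docker tag hello-world:latest 10.180.25.2:5000/hello-world:latest
docker push 10.180.25.2:5000/hello-world:latest
# roughly what the plugin asks the engine to do for that tag:
anchore-cli --u admin --p admin123 --url http://10.180.25.2:8228/v1 image add 10.180.25.2:5000/hello-world:latest
anchore-cli --u admin --p admin123 --url http://10.180.25.2:8228/v1 image wait 10.180.25.2:5000/hello-world:latest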
Unfortunately, we have not been able to find much material online to help resolve this issue.
If anyone has worked with Anchore Engine, please take a look and help us resolve it.
Also, any suggestions or alternatives to Anchore Engine, or detailed steps in case we have missed anything, would be really appreciated.
The end of the output is as follows:
2018-10-15T00:48:43.880 WARN AnchoreWorker anchore-engine get policy evaluation failed. HTTP method: GET, URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true, status: 404, error: {
"detail": {},
"httpcode": 404,
"message": "image is not analyzed - analysis_status: not_analyzed"
}
2018-10-15T00:48:43.880 WARN AnchoreWorker Exhausted all attempts polling anchore-engine. Analysis is incomplete for sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8
2018-10-15T00:48:43.880 ERROR AnchorePlugin Failing Anchore Container Image Scanner Plugin step due to errors in plugin execution
hudson.AbortException: Timed out waiting for anchore-engine analysis to complete (increasing engineRetries might help). Check above logs for errors from anchore-engine
at com.anchore.jenkins.plugins.anchore.BuildWorker.runGatesEngine(BuildWorker.java:480)
at com.anchore.jenkins.plugins.anchore.BuildWorker.runGates(BuildWorker.java:343)
at com.anchore.jenkins.plugins.anchore.AnchoreBuilder.perform(AnchoreBuilder.java:338)
at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
at hudson.model.Build$BuildExecution.build(Build.java:206)
at hudson.model.Build$BuildExecution.doRun(Build.java:163)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
I also checked the engine status and found the following:
docker run anchore/engine-cli:latest anchore-cli --u admin --p admin123 --url http://172.18.0.1:8228/v1 system status
Service analyzer (dockerhostid-anchore-engine, http://anchore-engine:8084): up
Service catalog (dockerhostid-anchore-engine, http://anchore-engine:8082): up
Service policy_engine (dockerhostid-anchore-engine, http://anchore-engine:8087): down (unavailable)
Service simplequeue (dockerhostid-anchore-engine, http://anchore-engine:8083): up
Service apiext (dockerhostid-anchore-engine, http://anchore-engine:8228): up
Service kubernetes_webhook (dockerhostid-anchore-engine, http://anchore-engine:8338): up
Engine DB Version: 0.0.7
Engine Code Version: 0.2.4
It seems the policy_engine service is down:
Service policy_engine (dockerhostid-anchore-engine, http://anchore-engine:8087): down (unavailable)
I also checked the Docker logs and found the error below:
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [DEBUG] service (policy_engine) starting in: 4
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [INFO] Registration complete.
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [INFO] Checking feeds client credentials
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [DEBUG] Initializing a feeds client
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [bootstrap] [DEBUG] init values: [None, None, None, (), None, None]
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [bootstrap] [DEBUG] using values: ['https://ancho.re/v1/service/feeds', 'https://ancho.re/oauth/token', 'https://ancho.re/v1/account/users', 'anon@ancho.re', 3, 60]
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [urllib3.connectionpool] [DEBUG] Starting new HTTPS connection (1): ancho.re
[service:policy_engine] 2018-10-15 09:37:50+0000 [-] [bootstrap] [ERROR] Preflight checks failed with error: HTTPSConnectionPool(host='ancho.re', port=443): Max retries exceeded with url: /v1/account/users/anon@ancho.re (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ffa905f0b90>: Failed to establish a new connection: [Errno 113] No route to host',)). Aborting service startup
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/anchore_manager/cli/service.py", line 158, in startup_service
raise Exception("process exited: " + str(rc))
Exception: process exited: 1
[anchore-policy-engine] [anchore_manager.cli.service/startup_service()] [INFO] service process exited at (Mon Oct 15 09:37:50 2018): process exited: 1
[anchore-policy-engine] [anchore_manager.cli.service/startup_service()] [INFO] exiting service thread
Thanks and Regards,
Rohan Shetty
When images are added to anchore-engine, they are queued for analysis which moves them through a simple state machine that starts with ‘not_analyzed’, goes to ‘analyzing’ and finally ends in either ‘analyzed’ or ‘analysis_failed’. Only when an image has reached ‘analyzed’ will a policy evaluation be possible.
The anchore Jenkins plugin will add an image, then poll the engine for image status/evaluation for the configured number of tries (default 300). Once the image goes to ‘analyzed’ (where policy evaluation is possible), the plugin will then receive a policy evaluation result from the engine.
The plugin will fail the build (by default) if the max retries have been performed and the image has not reached ‘analyzed’, or if the image does reach ‘analyzed’ but the policy evaluation produces a ‘fail’ result (meaning the image didn’t pass your configured policy checks). Note that all build-failure behavior can be controlled in the plugin (i.e. there are options to allow the plugin to succeed even if the analysis or image eval fails).
You’ll need to look at the end of the output from your build run (instead of just the beginning from your post); combined with the information above, it should then be clear which scenario is causing the plugin to fail the build.
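For example, you can query that state directly with anchore-cli and only ask for the evaluation once the image reports analyzed; a sketch using the engine endpoint and credentials from the question:
# What state is the image in? (look at analysis_status in the output)
anchore-cli --u admin --p admin123 --url http://10.180.25.2:8228/v1 image get 10.180.25.2:5000/hello-world:latest
# Block until analysis finishes (or fails)
anchore-cli --u admin --p admin123 --url http://10.180.25.2:8228/v1 image wait 10.180.25.2:5000/hello-world:latest
# Only meaningful once the image is 'analyzed'
anchore-cli --u admin --p admin123 --url http://10.180.25.2:8228/v1 evaluate check 10.180.25.2:5000/hello-world:latest --detail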
We have resolved the issue.
Root Cause:
We were not able to establish a successful HTTPS connection to https://ancho.re from within the anchore-engine Docker container.
As a result, the policy_engine service was not able to start.
https://ancho.re is required to download the policy feeds and sync them periodically. Without these feeds, anchore-engine won't be able to analyse Docker images.
Solution:
1) We passed an HTTPS_PROXY URL as an environment variable in the docker-compose.yaml of anchore-engine (see the sketch below).
We used this proxy URL to bypass restrictions in our environment and establish a connection to https://ancho.re.
2) Restarted the docker containers.
Finally we got all services up and running including Anchore policy-engine.
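For reference, a minimal sketch of the change from step 1; the proxy URL is a placeholder, and the service name must match whatever your docker-compose.yaml calls the engine service:
# Add the proxy to the engine service's environment via a compose override file,
# then recreate the containers so the policy_engine preflight check can reach https://ancho.re
cat > docker-compose.override.yml <<'EOF'
version: '2.1'
services:
  anchore-engine:                                  # assumption: the engine service is named anchore-engine
    environment:
      - HTTPS_PROXY=http://proxy.example.com:3128  # placeholder proxy URL
EOF
docker-compose up -d --force-recreate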
FYI:
It takes a while to download all the required Feeds depending on your internet speed.
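The sync progress can be watched from the CLI while it runs, e.g.:
anchore-cli --u admin --p admin123 --url http://172.18.0.1:8228/v1 system feeds list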
Lastly, thanks to the Anchore community for quick responses and support over Slack.
Hope this helps.
Warm Regards,
Rohan Shetty
I have a few subscriptions in Azure, with at least 35 resource groups and at least 100 virtual machines in each subscription.
So that's 35 resource groups and 100 VMs, and I want to delete an Azure extension on every VM.
Currently I am using this script:
#!/bin/bash
now=$(date +"%T")
USER="user"
RESOURCEGROUPLIST="/home/$USER/resourcegroupsdev"
VMLIST="/home/$USER/vmlistdev"

echo "################## DELETE EXTENSION ##################"
echo "Current time : $now"

# For every resource group in the list, delete the extension from every VM in the list
while read -r LINER
do
    while read -r LINE
    do
        az vm extension delete -g "$LINER" --vm-name "$LINE" -n LinuxDiagnostic --verbose
    done < "$VMLIST"
    echo "Current time : $(date +"%T")"   # re-read the clock; $now always holds the start time
done < "$RESOURCEGROUPLIST"
Frequently I get this error:
VM 'dev-vm-test-001' has not reported status for VM agent or extensions. Please verify the VM has a running VM agent, and can establish outbound connections to Azure storage.
and sometimes this error:
Error occurred in request., SSLError: ("bad handshake: Error([('SSL routines', 'SSL3_GET_SERVER_CERTIFICATE', 'certificate verify failed')],)",)
Traceback (most recent call last):
File "/usr/bin/azure-cli/lib/python2.7/site-packages/azure/cli/main.py", line 36, in main
cmd_result = APPLICATION.execute(args)
File "/usr/bin/azure-cli/lib/python2.7/site-packages/azure/cli/core/application.py", line 210, in execute
result = expanded_arg.func(params)
File "/usr/bin/azure-cli/lib/python2.7/site-packages/azure/cli/core/commands/__init__.py", line 289, in __call__
return self.handler(*args, **kwargs)
File "/usr/bin/azure-cli/lib/python2.7/site-packages/azure/cli/core/commands/__init__.py", line 498, in _execute_command
raise client_exception
ClientRequestError: Error occurred in request., SSLError: ("bad handshake: Error([('SSL routines', 'SSL3_GET_SERVER_CERTIFICATE', 'certificate verify failed')],)",)
The deletion process takes forever - I mean, a lot of errors and no clear output showing which VMs had the extension deleted.
Does anyone have an idea how to speed up the extension deletion process?
For your own routine: since you are doing the looping, not the CLI, it is your responsibility to print out or keep a list of which VM extension operations errored out and which completed.
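A minimal sketch of what that could look like, wrapping the loop from your script (reusing your list files; the log file name is arbitrary):
# Record the outcome of every az call so you end up with a clear per-VM report
while read -r RG; do
    while read -r VM; do
        if az vm extension delete -g "$RG" --vm-name "$VM" -n LinuxDiagnostic --verbose; then
            echo "$(date +"%T") OK   $RG/$VM" >> extension-delete.log
        else
            echo "$(date +"%T") FAIL $RG/$VM" >> extension-delete.log
        fi
    done < "$VMLIST"
done < "$RESOURCEGROUPLIST"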
For the SSL handshake - no ideas. For the storage error - do you have Network Security Groups (or iptables or whatever) blocking outbound connections? They might interfere with the VM extensions, so the extensions cannot report status, which essentially leads to this error. You can easily verify this by logging in to the portal and checking the VM in question; under the extensions property it should tell you something like: "vm agent failed to report status bla-bla-bla".
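The same check can be done from the CLI instead of the portal; a sketch (the resource group name is a placeholder, the VM name is the one from the error above):
# Shows the VM agent / extension handler status that the portal displays
az vm get-instance-view -g RESOURCE_GROUP -n dev-vm-test-001 --query "instanceView.vmAgent"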
I would suggest raising this at the Azure CLI 2.0 repo. I don't think there's anything SO users can help you with here.
My OpsCenter always gets stuck at "Loading OpsCenter...". BTW, this is my first installation and so far I have not gotten OpsCenter to run.
All three of these run normally:
nodetool status
service dse
service datastax-agent
I can reproduce it in both Google Chrome and Mozilla Firefox, both remotely and running on localhost.
opscenterd.log:
2017-04-12 15:20:15,877 [myclustername] WARN: These nodes reported this message, Nodes: ['10.35.21.207'] Message: HTTP request http://10.35.21.207:61621/connection-status? failed:
An error occurred while connecting: 107: Transport endpoint is not connected. (MainThread)
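For anyone hitting the same thing, the agent endpoint from that log line can be probed directly from the OpsCenter host (the agent log path below is the default install location; adjust if yours differs):
# Is the datastax-agent HTTP port reachable from the opscenterd machine?
curl -v http://10.35.21.207:61621/connection-status
# Does the agent itself log errors when opscenterd tries to connect?
sudo tail -n 100 /var/log/datastax-agent/agent.log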
When using Lifecycle Manager, it sees the cluster name I picked but cannot connect. Here's what the log looks like when I attempt to start managing the unmanaged cluster:
[opscenterd] ERROR: Problem while calling ImportClusterIntoLifecycleManagerController (AgentCommunicationFailure): Cluster Import Failure: Unable to determine the DSE version for the specified cluster. Please verify that the Agents for this cluster are properly communicating with Opscenter.
File "/usr/share/opscenter/lib/py/twisted/internet/defer.py", line 1122, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/usr/share/opscenter/lib/py/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/usr/share/opscenter/jython/Lib/site-packages/opscenterd/WebServer.py", line 2598, in ImportClusterIntoLifecycleManagerController