KYPO Deployment failure openstack retuning no valid host found - security

I have deployed devstack for my OpenStack using the default configuration and trying to deploy kypo. I am running ./create-base.sh and getting the following error
[kypo-proxy-jump-stack]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.kypo-proxy-jump: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
[kypo-proxy-jump-stack.kypo-proxy-jump]: CREATE_FAILED ResourceInError: resources.kypo-proxy-jump: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
My devstack config:
content of local.conf
[[local|localrc]]
#Enable heat services
enable_service h-eng h-api h-api-cfn h-api-cw
[[local|localrc]]
#Enable heat plugin
enable_plugin heat https://opendev.org/openstack/heat
IMAGE_URL_SITE="https://download.fedoraproject.org"
IMAGE_URL_PATH="/pub/fedora/linux/releases/33/Cloud/x86_64/images/"
IMAGE_URL_FILE="Fedora-Cloud-Base-33-1.2.x86_64.qcow2"
IMAGE_URLS+=","$IMAGE_URL_SITE$IMAGE_URL_PATH$IMAGE_URL_FILE

There is a workaround: you need to reduce the kypo-proxy-jump's flavor.
Something like this:
openstack flavor create --ram 2048 --disk 10 --vcpus 1 standard.medium
However, check your Openstack resources and logs, there is probably lack of resource (disk, mem or cpu).

Related

Error: Rotate certificates in Azure Kubernetes Service (AKS)

I used https://learn.microsoft.com/en-us/azure/aks/certificate-rotation this link to rotate certificates in AKS. Certificate got updated but my cluster is in failed state. Because of this my application is down.
I am getting below mentioned error when I am running this command az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME
ERROR: "error": { "code": "ErrorCodeRotateClusterCertificates", "message": "VMASAgentPoolReconciler retry failed: Category: ClientError; SubCode: OutboundConnFailVMExtensionError; Dependency: Microsoft.Compute/virtualMachines/extensions; OrginalError: Code=\"VMExtensionProvisioningError\" Message=\"VM has reported a failure when processing extension 'cse-agent-0'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=50\\n[stdout]\\n\\n[stderr]\\ncurl: option --proxy-insecure: is unknown\\ncurl: try 'curl --help' or 'curl --manual' for more information\\nCommand exited with non-zero status 2\\n0.00user 0.00system 0:00.00elapsed 100%!!(MISSING)C(string=VMAS agent pools reconciling)PU (0avgtext+0avgdata 7044maxresident)k\\n0inputs+8outputs (0major+372minor)pagefaults 0swaps\\n\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \"; AKSTeam: NodeProvisioning, Retriable: false" } }
Kubernetes version: 1.14.8
Please help to resolved this issue.
What version of Ubuntu are you running on your nodes? From that error, guessing Ubuntu 16.04 or older.
I'm not sure if it will work, but instead of trying to rotate certificates, can you try upgrading the nodes?
You might also want to consider just creating a new cluster, and using VMSS instead of VMAS.

AKS nodes failed provisioning

So I have an AKS cluster in DEV env which was working fine. Today I have noticed that some pods due being removed/uninstalled via helm were stuck in Terminating state.
I found out that none of the 3 nodes are ready. When I stopped the cluster and started again, VMs failed to create in VMMS with associated message:
VM has reported a failure when processing extension 'vmssCSE'. Error message: "Enable failed: failed to execute command: command terminated with exit status=50
According to what I have found might look like the VMs in scale set are missing outbound internet connectivity, however the associated NSG has only the defaults:
When inspecting the VMSS status, it says the following:
VM has reported a failure when processing extension 'vmssCSE'. Error message: "Enable failed: failed to execute command: command terminated with exit status=50 [stdout] [stderr] nc: connect to mcr.microsoft.com port 443 (tcp) failed: Connection timed out Command exited with non-zero status 1 0.00user 0.00system 2:10.07elapsed 0%CPU (0avgtext+0avgdata 2360maxresident)k 0inputs+8outputs (0major+113minor)pagefaults 0swaps " More information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot
This troubleshooting doesn't seem to be helpful as it states:
When restricting egress traffic from an AKS cluster, there are required and optional recommended outbound ports / network rules and FQDN / application rules for AKS. If your settings are in conflict with any of these rules, certain kubectl commands won't work correctly. You may also see errors when creating an AKS cluster.
Verify that your settings aren't conflicting with any of the required or optional recommended outbound ports / network rules and FQDN / application rules.
But the default rules have not changed, therefore I'm lost at that point.

Azure VM Scaleset custom script extension not working - possibly failing to get VM identity?

I'm attempting to deploy to my Virtual machine scale set using the custom script extension as below.
az vmss extension set --debug --name 'CustomScriptExtension' `
--resource-group 'my-rg' `
--publisher 'Microsoft.Compute' `
--version '1.9.5' `
--vmss-name 'myvmss' `
--settings '{\"commandToExecute\": \"powershell.exe ./download-package.ps1\", \"fileUris\": [\"https://[REDACTED].blob.core.windows.net/upload/download-package.ps1\"]}' `
--protected-settings '{\"managedIdentity\": {\"objectId\": \"[REDACTED]\"}}'
When running I get the following error:
cli.azure.cli.core.azclierror : Deployment failed. Correlation ID: 73f4d16b-afe0-4373-8773-1d7dd7d26940. VM has reported a failure when processing extension 'CustomScriptExtension'. Error message: "Failed to download all specified files. Exiting. Error Message: Exception of type 'Microsoft.WindowsAzure.GuestAgent.Plugins.CustomScriptHandler.Downloader.MsiNotFoundException' was thrown."
More information on troubleshooting is available at https://aka.ms/VMExtensionCSEWindowsTroubleshoot
Deployment failed. Correlation ID: 73f4d16b-afe0-4373-8773-1d7dd7d26940. VM has reported a failure when processing extension 'CustomScriptExtension'. Error message: "Failed to download all specified files. Exiting. Error Message: Exception of type 'Microsoft.WindowsAzure.GuestAgent.Plugins.CustomScriptHandler.Downloader.MsiNotFoundException' was thrown."
The file to be downloaded requires authentication so I have given the scale set a system assigned identity and granted it the Storage Blob Data Reader role on the storage account hosting the powershell file.
The custom extension logs on the VM suggest that it was unable to get the identity of the vm:
[7108+00000001] [11/20/2020 09:12:28.79] [INFO] Handler successfully enabled
[7108+00000001] [11/20/2020 09:12:28.80] [INFO] Loading configuration for sequence number 1
[7108+00000001] [11/20/2020 09:12:28.84] [INFO] HandlerSettings = ProtectedSettingsCertThumbprint: [REDACTED], ProtectedSettings: {[REDACTED]}, PublicSettings: {FileUris: [https://[REDACTED].blob.core.windows.net/upload/download-package.ps1], CommandToExecute: powershell.exe ./download-package.ps1}
[7108+00000001] [11/20/2020 09:12:29.26] [INFO] Downloading files specified in configuration...
[7108+00000001] [11/20/2020 09:12:30.90] [INFO] Attempting to get MSI from IMDS
[7108+00000001] [11/20/2020 09:12:31.04] [WARN] WebClient: non retryable error occurred System.Net.WebException: The remote server returned an error: (400) Bad Request.
at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
at System.Net.WebClient.DownloadString(Uri address)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClient.<>c__DisplayClass3_0.<DownloadStringWithRetries>b__0()
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClientWithRetryAbstract.ActionWithRetries(Action action)
[7108+00000001] [11/20/2020 09:12:31.14] [ERROR] Unknown exception occurred while attempting to get MSI token System.Net.WebException: The remote server returned an error: (400) Bad Request.
at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
at System.Net.WebClient.DownloadString(Uri address)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClient.<>c__DisplayClass3_0.<DownloadStringWithRetries>b__0()
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClientWithRetryAbstract.ActionWithRetries(Action action)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClient.DownloadStringWithRetries(Uri address)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.MsiProvider.GetMsiHelper(NameValueCollection queries)
[7108+00000001] [11/20/2020 09:12:31.14] [INFO] Msi was not obtained
I can retrieve the identity token from the metadata endpoint via Invoke-WebRequest -Method Get -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/' so that appears to be set up correctly.
Any advice on what the problem could be or how to further diagnose this issue would be greatly appreciated.
Here are the few fixes you can try
The object ID of the managed identity might be incorrect.
Please also move commandToExecute and FileUris into protected settings with managed identities.
If want to use system assigned managed identity, you don't need to pass a clientId or objectID, more info here https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-windows#property-managedidentity
edit: please explicitly pass an empty json object as settings when you add commandToExecute and fileUris to protected settings. Extensions would fail otherwise due to duplicated settings.

Anchore Engine - Jenkins CI plugin

We are trying to scan our docker images using Anchore Engine Jenkins plugin.
Currently we create our application docker images, push it in our own private local registry and then deploy it in our test environments.
Now, we want to setup docker image scanning in our CI/CD process to check for any vulnerabilities.
We have installed Anchore Engine using the recommended Docker-Compose yaml method given in the Documentation link:
https://anchore.freshdesk.com/support/solutions/articles/36000020729-install-on-docker-swarm
Post installation, we installed the
Anchore Container Image Scanner Plugin in Jenkins.
We configured the plugin as mentioned in the document link:
https://wiki.jenkins.io/display/JENKINS/Anchore+Container+Image+Scanner+Plugin
However, the scanning fails. Error Message as follows:
2018-10-11T07:01:44.647 INFO AnchoreWorker Analysis request accepted, received image digest sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8
2018-10-11T07:01:44.647 INFO AnchoreWorker Waiting for analysis of 10.180.25.2:5000/hello-world:latest, polling status periodically
2018-10-11T07:01:44.647 DEBUG AnchoreWorker anchore-engine get policy evaluation URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true
2018-10-11T07:01:44.648 DEBUG AnchoreWorker Attempting anchore-engine get policy evaluation (1/300)
2018-10-11T07:01:44.675 DEBUG AnchoreWorker anchore-engine get policy evaluation failed. URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true, status: HTTP/1.1 404 NOT FOUND, error: {
"detail": {},
"httpcode": 404,
"message": "image is not analyzed - analysis_status: not_analyzed"
}
NOTE:
In Image TAG 10.180.25.2:5000/hello-world:latest, 10.180.25.2:5000 is our local private registry and hello-world:latest is latest hello-world image available in docker hub which we pulled and pushed in our registry to try out image scanning using Anchore-Engine.
Unfortunately we are not able to find much resource online to try and resolve the above mentioned issue.
Anyone who might have worked on Anchore-Engine, please may I request to have a look and help us resolve this issue.
Also, any suggestions or alternatives to anchore-engine or detailed steps in case we might have missed anything would be really appreciated.
End of the output is as follows:
2018-10-15T00:48:43.880 WARN AnchoreWorker anchore-engine get policy evaluation failed. HTTP method: GET, URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true, status: 404, error: {
"detail": {},
"httpcode": 404,
"message": "image is not analyzed - analysis_status: not_analyzed"
}
2018-10-15T00:48:43.880 WARN AnchoreWorker Exhausted all attempts polling anchore-engine. Analysis is incomplete for sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8
2018-10-15T00:48:43.880 ERROR AnchorePlugin Failing Anchore Container Image Scanner Plugin step due to errors in plugin execution
hudson.AbortException: Timed out waiting for anchore-engine analysis to complete (increasing engineRetries might help). Check above logs for errors from anchore-engine
at com.anchore.jenkins.plugins.anchore.BuildWorker.runGatesEngine(BuildWorker.java:480)
at com.anchore.jenkins.plugins.anchore.BuildWorker.runGates(BuildWorker.java:343)
at com.anchore.jenkins.plugins.anchore.AnchoreBuilder.perform(AnchoreBuilder.java:338)
at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
at hudson.model.Build$BuildExecution.build(Build.java:206)
at hudson.model.Build$BuildExecution.doRun(Build.java:163)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
I also checked status and found below:
docker run anchore/engine-cli:latest anchore-cli --u admin --p admin123 --url http://172.18.0.1:8228/v1 system status
Service analyzer (dockerhostid-anchore-engine, http://anchore-engine:8084): up
Service catalog (dockerhostid-anchore-engine, http://anchore-engine:8082): up
Service policy_engine (dockerhostid-anchore-engine, http://anchore-engine:8087): down (unavailable)
Service simplequeue (dockerhostid-anchore-engine, http://anchore-engine:8083): up
Service apiext (dockerhostid-anchore-engine, http://anchore-engine:8228): up
Service kubernetes_webhook (dockerhostid-anchore-engine, http://anchore-engine:8338): up
Engine DB Version: 0.0.7
Engine Code Version: 0.2.4
It seems service policy engine is down
Service policy_engine (dockerhostid-anchore-engine, http://anchore-engine:8087): down (unavailable)
I also checked the docker logs . I found below error:
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [DEBUG] service (policy_engine) starting in: 4
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [INFO] Registration complete.
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [INFO] Checking feeds client credentials
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [DEBUG] Initializing a feeds client
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [bootstrap] [DEBUG] init values: [None, None, None, (), None, None]
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [bootstrap] [DEBUG] using values: ['https://ancho.re/v1/service/feeds', 'https://ancho.re/oauth/token', 'https://ancho.re/v1/account/users', 'anon#ancho.re', 3, 60]
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [urllib3.connectionpool] [DEBUG] Starting new HTTPS connection (1): ancho.re
[service:policy_engine] 2018-10-15 09:37:50+0000 [-] [bootstrap] [ERROR] Preflight checks failed with error: HTTPSConnectionPool(host='ancho.re', port=443): Max retries exceeded with url: /v1/account/users/anon#ancho.re (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ffa905f0b90>: Failed to establish a new connection: [Errno 113] No route to host',)). Aborting service startup
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/anchore_manager/cli/service.py", line 158, in startup_service
raise Exception("process exited: " + str(rc))
Exception: process exited: 1
[anchore-policy-engine] [anchore_manager.cli.service/startup_service()] [INFO] service process exited at (Mon Oct 15 09:37:50 2018): process exited: 1
[anchore-policy-engine] [anchore_manager.cli.service/startup_service()] [INFO] exiting service thread
Thanks and Regards,
Rohan Shetty
When images are added to anchore-engine, they are queued for analysis which moves them through a simple state machine that starts with ‘not_analyzed’, goes to ‘analyzing’ and finally ends in either ‘analyzed’ or ‘analysis_failed’. Only when an image has reached ‘analyzed’ will a policy evaluation be possible.
The anchore Jenkins plugin will add an image, then poll the engine for image status/evaluation for the configured number of tries (default 300). Once the image goes to ‘analyzed’ (where policy evaluation is possible), the plugin will then receive a policy evaluation result from the engine.
The plugin will fail the build (by default) if the max retries has been performed and the image has not reached ‘analyzed’, if the image does reach ‘analyzed’ but the policy evaluation is producing a ‘fail’ result (meaning the image didn’t pass your configured policy checks). Note that all build failure behavior can be controlled in the plugin (I.e. there are options to allow the plugin to succeed even if the analysis or image eval fails).
You’ll need to look at the end of the output from your build run (instead of just the beginning from your post), and combined with the information above, it should be clear which scenario is causing the plugin to fail the build.
We have resolved the issue.
Root Cause:
We were not able to establish a successful https connection to URL : https://ancho.re from within the anchore-engine docker container.
As a result the service:policy_engine was not able to start.
https://ancho.re is required to download policy feeds and sync-up periodically. Without these policy anchore-engine won't be able to analyse the docker images.
Solution:
1) We passed a HTTPS_PROXY URL as an environment variable in the docker-compose.yaml of anchore-engine.
We used this proxy URL to bypass restrictions in our environment and establish a connection with https://ancho.re url.
2) Restarted the docker containers.
Finally we got all services up and running including Anchore policy-engine.
FYI:
It takes a while to download all the required Feeds depending on your internet speed.
Lastly, Thanks to the Anchore community for quick responses and support over slack.
Hope this helps.
Warm Regards,
Rohan Shetty

Bosh init gives error, on openstack, while creating bosh VM through script

I have created an OpenStack environemt and I want to deploy BOSH, after which I will be deploying CloudFoundry on that VM for our office's Test environment. I am following these links as guides:
guide for Configuring the OpenStack Icehouse on Ubuntu LTS 14.04
guides for BOSH
After I have confiugured the script according to my environment, I ran the script and got the following errors:
Error:
Started deploying
Creating VM for instance 'bosh/0' from stemcell '20e80643-76a0-4b28-8993-ceafd1ecfdaf'... Failed (00:00:04)
Failed deploying (00:00:04)
Stopping registry... Finished (00:00:00)
Command 'deploy' failed:
Deploying:
Creating instance 'bosh/0':
Creating VM:
Creating vm with stemcell cid '20e80643-76a0-4b28-8993-ceafd1ecfdaf':
CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Expected([200, 202]) \u003c=\u003e Actual(404 Not Found)\nexcon.error.response\n :body =\u003e \"{\\\"itemNotFound\\\": {\\\"message\\\": \\\"The resource could not be found.\\\", \\\"code\\\": 404}}\"\n :headers =\u003e {\n \"Content-Length\" =\u003e \"78\"\n \"Content-Type\" =\u003e \"applic ation/json; charset=UTF-8\"\n \"Date\" =\u003e \"Mon, 01 Jun 2015 07:59:11 GMT\"\n \"X-Compute-Request-Id\" =\u003e \"r eq-88aa57cd-b29f-49c0-ba77-a75292451367\"\n }\n :local_address =\u003e \"10.110.82.11\"\n :local_port =\u003e 52722\n :reason_phrase = \u003e \"Not Found\"\n :remote_ip =\u003e \"10.110.82.11\"\n :status =\u003e 404\n","ok_to_retry":false}
(Note: The script mentioned in above BOSH link referred to Stemcell 2950 but I have replaced it with the latest one i.e 2977)
Also, I am sort of new to the Linux, Openstack and cloud foundry so I apologize if it takes some time for me to understand and provide you with more diagnostic details.
Any ideas on what I am doing wrong. Thank you.
This seems like the version of OpenStack you are using does not support neutron networking used by the BOSH OpenStack CPI v28+ , so you can try enabling Nova-Networking in the BOSH cloud config properties use the older nova networking api.
properties:
openstack: &openstack
use_nova_networking: true
See http://bosh.io/docs/openstack-nova-networking.html for details.
Similar error found/addressed in https://github.com/cloudfoundry-incubator/bosh-openstack-cpi-release/issues/59

Resources