Prometheus metric not found on server - node.js

I am using the following YAML for my prometheus-adapter installation.
prometheus:
  url: http://prometheus-server.prometheus.svc.cluster.local
  port: 80
rules:
  custom:
  - seriesQuery: 'http_duration{kubernetes_namespace!="",kubernetes_pod_name!=""}'
    resources:
      overrides:
        kubernetes_namespace: { resource: "namespace" }
        kubernetes_pod_name: { resource: "pod" }
    name:
      matches: "^(.*)_sum"
      as: "${1}_avg"
    metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
This YAML is installed with the following command.
helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter --values=./prometheus-adapter-values.yaml --namespace prometheus
After generating some load with hey, I tried looking for the _avg metric with the following command.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[] | select (.name | contains ("pods/hello_http"))'
This is the output.
{
  "name": "pods/hello_http_duration_sum",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
{
  "name": "pods/hello_http_duration_count",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
{
  "name": "pods/hello_http_duration_bucket",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
Why is the _avg metric not seen? Note that, at the moment, the accuracy of the metricsQuery is not important; I just want to know why the _avg metric does not appear.
Where do I look for logs? The prometheus-adapter or the prometheus-server logs didn't show anything obvious.
Do I need additionalScrapeConfigs as described here?
This post is similar to mine; however, my configuration matches that of the OP. What am I missing?
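For reference, what the name rule in the config above does can be illustrated with Python's re module (a sketch only; the adapter itself uses Go's regexp engine, but these particular patterns behave the same way):

```python
import re

# The adapter's name rule from the values file:
# rewrite series whose names match "^(.*)_sum" to "${1}_avg".
matches = r"^(.*)_sum"
as_template = r"\1_avg"

# The three series the custom metrics API actually lists:
series = [
    "hello_http_duration_sum",
    "hello_http_duration_count",
    "hello_http_duration_bucket",
]

for name in series:
    if re.match(matches, name):
        # Only names containing "_sum" are rewritten.
        print(f"{name} -> {re.sub(matches, as_template, name)}")
    else:
        print(f"{name} -> (name rule does not match)")
```

This doesn't answer the question by itself, but it shows which of the listed series the rename rule could apply to.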

Related

Databricks API - Instance Pool - How to update an existing job to use instance pool instead?

I am trying to update a batch of jobs to use some instance pools with the Databricks API, and when I try to use the update endpoint, the job just does not update. It says it executed without errors, but when I check the job, it was not updated.
What am I doing wrong?
What I used to update the job:
I used the get endpoint with the job_id to fetch my job settings.
I updated the resulting data with the values I needed and executed the call to update the job.
'custom_tags': {'ResourceClass': 'Serverless'},
'driver_instance_pool_id': 'my-pool-id',
'driver_node_type_id': None,
'instance_pool_id': 'my-other-pool-id',
'node_type_id': None
I used this documentation: https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsUpdate
Here is my payload:
{
  "created_time": 1672165913242,
  "creator_user_name": "email#email.com",
  "job_id": 123123123123,
  "run_as_owner": true,
  "run_as_user_name": "email#email.com",
  "settings": {
    "email_notifications": {
      "no_alert_for_skipped_runs": false,
      "on_failure": [
        "email1#email.com",
        "email2#email.com"
      ]
    },
    "format": "MULTI_TASK",
    "job_clusters": [
      {
        "job_cluster_key": "the_cluster_key",
        "new_cluster": {
          "autoscale": {
            "max_workers": 4,
            "min_workers": 2
          },
          "aws_attributes": {
            "availability": "SPOT_WITH_FALLBACK",
            "ebs_volume_count": 0,
            "first_on_demand": 1,
            "instance_profile_arn": "arn:aws:iam::XXXXXXXXXX:instance-profile/instance-profile",
            "spot_bid_price_percent": 100,
            "zone_id": "us-east-1a"
          },
          "cluster_log_conf": {
            "s3": {
              "canned_acl": "bucket-owner-full-control",
              "destination": "s3://some-bucket/log/log_123123123/",
              "enable_encryption": true,
              "region": "us-east-1"
            }
          },
          "cluster_name": "",
          "custom_tags": {
            "ResourceClass": "Serverless"
          },
          "data_security_mode": "SINGLE_USER",
          "driver_instance_pool_id": "my-driver-pool-id",
          "enable_elastic_disk": true,
          "instance_pool_id": "my-worker-pool-id",
          "runtime_engine": "PHOTON",
          "spark_conf": {...},
          "spark_env_vars": {...},
          "spark_version": "..."
        }
      }
    ],
    "max_concurrent_runs": 1,
    "name": "my_job",
    "schedule": {...},
    "tags": {...},
    "tasks": [{...},{...},{...}],
    "timeout_seconds": 79200,
    "webhook_notifications": {}
  }
}
I tried to use the update endpoint and read the docs for information, but I found nothing related to the issue.
I finally got it.
I was using the partial update endpoint and found that it does not work for the whole job payload.
So I changed the endpoint to use the full update (reset) and it worked.
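A minimal sketch of the fix described above: fetch the job, modify its settings, and send the entire settings object back through the reset endpoint (which, per the Databricks Jobs API docs, takes a job_id and a new_settings object). The helper name and the trimmed-down job data here are hypothetical; only the request shape follows the documentation:

```python
import json

def build_reset_payload(job: dict) -> dict:
    """Build the body for POST /api/2.1/jobs/reset from a jobs/get response.

    Unlike jobs/update (a partial update), jobs/reset replaces the whole
    settings object, so the complete (modified) settings must be sent back.
    """
    return {
        "job_id": job["job_id"],
        "new_settings": job["settings"],
    }

# Hypothetical, trimmed-down jobs/get response after editing the pool IDs:
job = {
    "job_id": 123123123123,
    "settings": {
        "name": "my_job",
        "job_clusters": [
            {
                "job_cluster_key": "the_cluster_key",
                "new_cluster": {
                    "instance_pool_id": "my-worker-pool-id",
                    "driver_instance_pool_id": "my-driver-pool-id",
                    "node_type_id": None,
                },
            }
        ],
    },
}

payload = build_reset_payload(job)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the reset endpoint with your usual authentication; that part is omitted here.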

Gitlab secret detection, how to test it works

I have GitLab secret detection set up, and I wanted to check that it works. I have a Spring project and the job configured. What kind of secret pattern would it pick up?
Does anyone know how I can check that it actually picks something up?
I have tried adding the following to the code (it's made up), but it doesn't get flagged:
aws_secret=AKIAIMNOJVGFDXXXE4OA
If the secrets detector finds a secret, it doesn't fail the job (i.e., it doesn't have a non-zero exit code). In the analyzer output, it will show how many leaks were found, but not what they were. The full details are written to a file called gl-secret-detection-report.json. You can either cat the file in the job so you can see the results in the job output, or upload it as an artifact so it gets recognized as a SAST report.
Here's the secrets detection job from one of my pipelines that both cats the file and uploads it as a SAST report artifact. Note: for my purposes, I wasn't able to use the template directly, so I run the analyzer manually:
Secrets Detector:
  stage: sast
  image:
    name: "registry.gitlab.com/gitlab-org/security-products/analyzers/secrets"
  needs: []
  only:
    - branches
  except:
    - main
  before_script:
    - apk add jq
  script:
    - /analyzer run
    - cat gl-secret-detection-report.json | jq '.'
  artifacts:
    reports:
      sast: gl-secret-detection-report.json
The gl-secret-detection-report.json file looks like this for a test repository I set up and added a GitLab Runner registration token to a file called TESTING:
{
  "version": "14.0.4",
  "vulnerabilities": [
    {
      "id": "138bf52be327e2fc3d1934e45c93a83436c267e45aa84f5b55f2db87085cb205",
      "category": "secret_detection",
      "name": "GitLab Runner Registration Token",
      "message": "GitLab Runner Registration Token detected; please remove and revoke it if this is a leak.",
      "description": "Historic GitLab Runner Registration Token secret has been found in commit 0a4623336ac54174647e151186c796cf7987702a.",
      "cve": "TESTING:5432b14f2bdaa01f041f6eeadc53fe68c96ef12231b168d86c71b95aca838f3c:gitlab_runner_registration_token",
      "severity": "Critical",
      "confidence": "Unknown",
      "raw_source_code_extract": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "scanner": {
        "id": "gitleaks",
        "name": "Gitleaks"
      },
      "location": {
        "file": "TESTING",
        "commit": {
          "author": "author",
          "date": "2022-09-12T17:30:33Z",
          "message": "a commit message",
          "sha": "0a4623336ac54174647e151186c796cf7987702a"
        },
        "start_line": 1
      },
      "identifiers": [
        {
          "type": "gitleaks_rule_id",
          "name": "Gitleaks rule ID gitlab_runner_registration_token",
          "value": "gitlab_runner_registration_token"
        }
      ]
    }
  ],
  "scan": {
    "analyzer": {
      "id": "secrets",
      "name": "secrets",
      "url": "https://gitlab.com/gitlab-org/security-products/analyzers/secrets",
      "vendor": {
        "name": "GitLab"
      },
      "version": "4.3.2"
    },
    "scanner": {
      "id": "gitleaks",
      "name": "Gitleaks",
      "url": "https://github.com/zricethezav/gitleaks",
      "vendor": {
        "name": "GitLab"
      },
      "version": "8.10.3"
    },
    "type": "secret_detection",
    "start_time": "2022-09-12T17:30:54",
    "end_time": "2022-09-12T17:30:55",
    "status": "success"
  }
}
This includes the type of secret found, what file it was in and what line(s), and information from the commit where the secret was added.
If you wanted to force the job to fail if any secrets were found, you can do that with jq (note: I install jq in the before_script of this job, it's not available in the image by default.):
Secrets Detector:
  stage: sast
  image:
    name: "registry.gitlab.com/gitlab-org/security-products/analyzers/secrets"
  needs: []
  only:
    - branches
  except:
    - main
  before_script:
    - apk add jq
  script:
    - /analyzer run
    - cat gl-secret-detection-report.json | jq '.'
    - if [[ "$(jq '.vulnerabilities | length' gl-secret-detection-report.json)" -gt 0 ]]; then echo "secrets found" && exit 1; fi
  artifacts:
    reports:
      sast: gl-secret-detection-report.json
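The same "fail if any leaks" check can be done without jq by parsing the report directly; a small Python sketch of the idea (the sample report data here is made up, only the vulnerabilities-array structure follows the report format shown above):

```python
import json

def secrets_found(report_text: str) -> int:
    """Count the leaks listed in a gl-secret-detection-report.json document."""
    report = json.loads(report_text)
    return len(report.get("vulnerabilities", []))

# Minimal stand-in for a real report with one finding (hypothetical data):
sample = json.dumps(
    {"vulnerabilities": [{"name": "GitLab Runner Registration Token"}]}
)

count = secrets_found(sample)
print(f"leaks found: {count}")  # leaks found: 1
```

In a CI job you would read the real file and call sys.exit(1) when the count is greater than zero, which fails the job the same way the jq one-liner does.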

Adding allowVolumeExpansion: true to default storage classes in AKS

The documentation says the following:
These default storage classes don't allow you to update the volume size once created. To enable this ability, add the allowVolumeExpansion: true line to one of the default storage classes, or create your own custom storage class. You can edit an existing storage class using the kubectl edit sc command. For more information on storage classes and creating your own, see Storage options for applications in AKS.
I've tried editing the default YAML (which just looks like JSON and not YAML) in the Kubernetes dashboard:
{
  "kind": "StorageClass",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "name": "default",
    "selfLink": "/apis/storage.k8s.io/v1/storageclasses/default",
    "uid": "<uid>",
    "resourceVersion": "3891497",
    "creationTimestamp": "2020-02-14T01:34:03Z",
    "labels": {
      "kubernetes.io/cluster-service": "true"
    },
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"storage.k8s.io/v1beta1\",\"kind\":\"StorageClass\",\"metadata\":{\"annotations\":{\"storageclass.beta.kubernetes.io/is-default-class\":\"true\"},\"labels\":{\"kubernetes.io/cluster-service\":\"true\"},\"name\":\"default\"},\"parameters\":{\"cachingmode\":\"ReadOnly\",\"kind\":\"Managed\",\"storageaccounttype\":\"Standard_LRS\"},\"provisioner\":\"kubernetes.io/azure-disk\"}\n",
      "storageclass.beta.kubernetes.io/is-default-class": "true"
    }
  },
  "provisioner": "kubernetes.io/azure-disk",
  "parameters": {
    "cachingmode": "ReadOnly",
    "kind": "Managed",
    "storageaccounttype": "Standard_LRS"
  },
  "reclaimPolicy": "Delete",
  "volumeBindingMode": "Immediate",
  "allowVolumeExpansion": "true"
}
Which results in:
StorageClass in version "v1" cannot be handled as a StorageClass: v1.StorageClass.AllowVolumeExpansion: ReadBool: expect t or f, but found ", error found in #10 byte of ...|ansion": "true" }|..., bigger context ...|ingMode": "Immediate", "allowVolumeExpansion": "true" }|...
Also:
{
  "kind": "StorageClass",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "name": "default",
    "selfLink": "/apis/storage.k8s.io/v1/storageclasses/default",
    "uid": "<uid>",
    "resourceVersion": "3891497",
    "creationTimestamp": "2020-02-14T01:34:03Z",
    "labels": {
      "kubernetes.io/cluster-service": "true"
    },
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"storage.k8s.io/v1beta1\",\"kind\":\"StorageClass\",\"metadata\":{\"annotations\":{\"storageclass.beta.kubernetes.io/is-default-class\":\"true\"},\"labels\":{\"kubernetes.io/cluster-service\":\"true\"},\"name\":\"default\"},\"parameters\":{\"cachingmode\":\"ReadOnly\",\"kind\":\"Managed\",\"storageaccounttype\":\"Standard_LRS\"},\"provisioner\":\"kubernetes.io/azure-disk\"}\n",
      "storageclass.beta.kubernetes.io/is-default-class": "true"
    }
  },
  "provisioner": "kubernetes.io/azure-disk",
  "parameters": {
    "cachingmode": "ReadOnly",
    "kind": "Managed",
    "storageaccounttype": "Standard_LRS",
    "allowVolumeExpansion": "true"
  },
  "reclaimPolicy": "Delete",
  "volumeBindingMode": "Immediate"
}
Which results in:
StorageClass.storage.k8s.io "default" is invalid: parameters: Forbidden: updates to parameters are forbidden.
I also tried all of the following with kubectl edit sc:
$ kubectl edit sc default allowVolumeExpansion: true
Error from server (NotFound): storageclasses.storage.k8s.io "allowVolumeExpansion:" not found
Error from server (NotFound): storageclasses.storage.k8s.io "true" not found
$ kubectl edit sc default "allowVolumeExpansion: true"
Error from server (NotFound): storageclasses.storage.k8s.io "allowVolumeExpansion: true" not found
$ kubectl edit sc/default allowVolumeExpansion: true
error: there is no need to specify a resource type as a separate argument when passing arguments in resource/name form (e.g. 'kubectl get resource/<resource_name>' instead of 'kubectl get resource resource/<resource_name>'
$ kubectl edit sc/default "allowVolumeExpansion: true"
error: there is no need to specify a resource type as a separate argument when passing arguments in resource/name form (e.g. 'kubectl get resource/<resource_name>' instead of 'kubectl get resource resource/<resource_name>'
What is the correct way of accomplishing this? Would be helpful if an example was in the documentation.
I cannot reproduce the issue you got. allowVolumeExpansion is a property of the storage class, not a parameter, and it requires a boolean value. You can see it in StorageClass.
I think you set its value incorrectly. In my test, I add the property in the YAML file like this:
allowVolumeExpansion: true
Not:
allowVolumeExpansion: "true"
So I think you need to change the line into this:
"allowVolumeExpansion": true
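The "ReadBool: expect t or f" error above is exactly this typing issue: the API server decodes allowVolumeExpansion into a Go bool, so the JSON value must be an unquoted literal, not a string. A quick illustration of the difference (illustrative only, using Python's json module):

```python
import json

# Unquoted true decodes to a real boolean; quoted "true" is just a string.
good = json.loads('{"allowVolumeExpansion": true}')
bad = json.loads('{"allowVolumeExpansion": "true"}')

print(type(good["allowVolumeExpansion"]))  # <class 'bool'>  - what the API expects
print(type(bad["allowVolumeExpansion"]))   # <class 'str'>   - what triggers ReadBool
```

For completeness, instead of kubectl edit you can also set the field non-interactively with something along the lines of kubectl patch sc default -p '{"allowVolumeExpansion": true}' (not verified here against AKS).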

Sending .Net Core application settings to kubernetes pods as environment variables

I'm hosting some stuff as an AppService in Azure and use environment variables to differentiate settings for different slots (test, dev etc).
If the AppSettings.json file contains a structure like:
{
  "ConnectionString": {
    "MyDb": "SomeConnectionString"
  }
}
I can set the environment variable "ConnectionString:MyDb" to "SomeConnectionString" and .Net Core will understand that the : means child level.
But in Kubernetes I cannot use : as part of the environment key. Is there another way to handle hierarchy or do I need to switch to flat settings?
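The ":" hierarchy mapping described above can be sketched in Python (illustrative only; .NET Core's ConfigurationBuilder does this internally, and the set_nested helper here is made up for the example):

```python
def set_nested(config: dict, key: str, value: str, sep: str = ":") -> None:
    """Mimic how .NET Core maps an 'A:B' environment key to nested sections."""
    parts = key.split(sep)
    node = config
    for part in parts[:-1]:
        # Descend into (or create) each intermediate section.
        node = node.setdefault(part, {})
    node[parts[-1]] = value

config = {}
set_nested(config, "ConnectionString:MyDb", "SomeConnectionString")
print(config)  # {'ConnectionString': {'MyDb': 'SomeConnectionString'}}
```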
I believe you are referring to the env in the container definition for a Pod. From the YAML/JSON perspective, I don't see a problem with specifying a : in a key for an environment variable. You can also put it within quotes, and it should be valid JSON/YAML:
# convert.yaml
apiVersion: v1
kind: Pod
metadata:
  name: envar-demo
  labels:
    purpose: demonstrate-envars
spec:
  containers:
  - name: envar-demo-container
    image: dotnetapp
    env:
    - name: ConnectionString:Mydb
      value: ConnectionString
Same in JSON:
$ kubectl convert -f convert.yaml -o=json
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "envar-demo",
    "creationTimestamp": null,
    "labels": {
      "purpose": "demonstrate-envars"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "envar-demo-container",
        "image": "dotnetapp",
        "env": [
          {
            "name": "ConnectionString:Mydb",
            "value": "ConnectionString"
          }
        ],
        "resources": {},
        "terminationMessagePath": "/dev/termination-log",
        "terminationMessagePolicy": "File",
        "imagePullPolicy": "Always"
      }
    ],
    "restartPolicy": "Always",
    "terminationGracePeriodSeconds": 30,
    "dnsPolicy": "ClusterFirst",
    "securityContext": {},
    "schedulerName": "default-scheduler"
  },
  "status": {}
}
However, it looks like this was a known issue with Windows/.NET applications. A fix was attempted and ditched because such names are not valid in Bash. But it looks like they settled on the __ instead of : workaround.
Yes. Example:
In the appsettings.json:
"ConnectionStrings": {
  "Azure": "Server=tcp:uw2qdisa
In the manifest.yml:
env:
- name: ConnectionStrings__Azure
  valueFrom:
    configMapKeyRef:
      name: config-disa
      key: ConnectionStrings
Explanation on Kubernetes
Some .Net Core applications expect environment variables with a colon (:) in the name. Kubernetes currently does not allow this. Replace colon (:) with double underscore (__) as documented here.
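The renaming rule above is a straight textual substitution; a tiny sketch of it (the helper name is made up for illustration):

```python
def to_env_var_name(config_key: str) -> str:
    """Translate a .NET Core hierarchical config key into a Kubernetes-safe
    environment variable name by replacing the ':' separator with '__'."""
    return config_key.replace(":", "__")

print(to_env_var_name("ConnectionString:MyDb"))     # ConnectionString__MyDb
print(to_env_var_name("Logging:LogLevel:Default"))  # Logging__LogLevel__Default
```

.NET Core's configuration system reads __ in environment variable names as the section separator, so no application-side change is needed.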

creating my first image for rocket (serviio with java dependency)

I have CoreOS stable (1068.10.0) installed and I want to create a Serviio streaming media server image for rocket (rkt).
This is my manifest file:
{
  "acVersion": "1.0.0",
  "acKind": "ImageManifest",
  "name": "tux-in.com/serviio",
  "app": {
    "exec": [
      "/opt/serviio/bin/serviio.sh"
    ],
    "user": "serviio",
    "group": "serviio"
  },
  "labels": [
    {
      "name": "version",
      "value": "1.0.0"
    },
    {
      "name": "arch",
      "value": "amd64"
    },
    {
      "name": "os",
      "value": "linux"
    }
  ],
  "ports": [
    {
      "name": "serviio",
      "protocol": "tcp",
      "port": 8895
    }
  ],
  "mountPoints": [
    {
      "name": "serviio-config",
      "path": "/config/serviio",
      "kind": "host",
      "readOnly": false
    }
  ],
  "environment": {
    "JAVA_HOME": "/opt/jre1.8.0_102"
  }
}
I couldn't find on Google how to add a Java package dependency, so I just downloaded the JRE, extracted it to /rootfs/opt, and set a JAVA_HOME environment variable. Is that the right way to go?
Welp... because I configured Serviio to run under a user and group called serviio, I created /etc/group with serviio:x:500:serviio and /etc/passwd with serviio:x:500:500:Serviio:/opt/serviio:/bin/bash. Is this OK? Should I have added and configured the users differently?
Then I created a rocket image with actool build serviio serviio-1.0-linux-amd64.aci, signed it, and ran it with rkt run serviio-1.0-linux-amd64.aci. Then with rkt list I see that the container started and exited immediately.
UUID APP IMAGE NAME STATE CREATED STARTED NETWORKS
bea402d9 serviio tux-in.com/serviio:1.0.0 exited 11 minutes ago 11 minutes ago
rkt status bea402d9 returns:
state=exited
created=2016-09-03 12:38:03.792 +0000 UTC
started=2016-09-03 12:38:03.909 +0000 UTC
pid=15904
exited=true
app-serviio=203
No idea how to debug this issue further. How can I see the output of the sh command that was executed? Is there any other error-related information?
Have I configured things properly? I'm pretty lost, so any information regarding the issue would be greatly appreciated.
Thanks!
