Unable to push metrics in prometheus

Unable to push metrics in prometheus - node.js

I am trying to push metrics in Prometheus using Pushgateway but not able to complete the task.
This is the code:
var client = require('prom-client');
var gateway = new client.Pushgateway('http://localhost:9091');
gateway.pushAdd({ jobName: 'test', group : "production" }, function(err, resp, body){
});
Prometheus config:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
monitor: 'codelab-monitor'
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
scrape_configs:
- job_name: 'example-random'
scrape_interval: 5s
static_configs:
- targets: ['localhost:8080', 'localhost:8081']
labels:
group: 'production'
- targets: ['localhost:8082']
labels:
group: 'canary'
scrape_configs:
- job_name: 'test '
static_configs:
- targets: ['localhost:9091']

You've a few problems with your prometheus config - check out the Prometheus github repo example and the docs for future reference.
One issue is that you have multiple scrape_configs.
You can only have one scrape_configs in your configuration for Prometheus.
Another issue is that each job can only have one static_configs.
The rest is mainly due to incorrect formatting.
The edited config below should work for you now:
global:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
monitor: 'codelab-monitor'
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'production'
static_configs:
- targets: ['localhost:8080', 'localhost:8081']
labels:
group: 'production'
- job_name: 'canary'
static_configs:
- targets: ['localhost:8082']
labels:
group: 'canary'
- job_name: 'test'
static_configs:
- targets: ['localhost:9091']
It's also important to note that the metrics from the Pushgateway are not pushed to Prometheus. Prometheus is pull based and will pull the metrics from the Pushgateway itself. The metrics the Pushgateway collects are pushed to it by ephemeral and batch jobs.

Related

AKS timeout container kubectl Rollout status check failed

I have a sporadic issue that I am struggling to understand - azure pipeline on promote fails due to kubectl rollout status Deployment/name --timeout 120s --namespace xyz
I have tried to increase the progressDeadlineSeconds, but I know it may not take, I have tried to update the replicaSets to 2 so it can take but it still does not apply. I am not fully understanding this error and there is a roll out issue.
Yaml file
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: #{KubeComponentName}#
namespace: #{Namespace}#
spec:
selector:
matchLabels:
app: #{KubeComponentName}#
progressDeadlineSeconds: 600
replicas: #{ReplicaCount}#
template:
metadata:
labels:
app: #{KubeComponentName}#
annotations:
spec:
securityContext:
runAsUser: 999
serviceAccountName: #{KubeComponentName}#
containers:
- name: #{KubeComponentName}#
image: #{ImageRegistry}#/datahub/#{KubeComponentName}#:latest
#command: ["/bin/bash", "-c", "--"]
#args: [ "while true; do sleep 30; done;" ]
volumeMounts:
ports:
env:
- name: NodeName
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PodName
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: PodNamespace
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: PodIp
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: PodServiceAccount
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: ComponentInfo__ComponentName
value: #{KubeComponentName}#
- name: ComponentInfo__ComponentHost
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: ComponentInfo__ServiceUser
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: MongoDbUserName
valueFrom:
secretKeyRef:
name: mongodb-xyz-username
key: secret-value
- name: MongoDbPassword
valueFrom:
secretKeyRef:
name: mongodb-xyz-password
key: secret-value
- name: MongoDbKubernetesHosts
value: #{MongoDbKubernetesHosts}#
- name: MongoDbScriptBasePath
value: #{MongoDbScriptBasePath}#
volumes:
My errors keep happening such that I get a timeout error waiting for rollout or exceeded progress deadline
----
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl rollout status Deployment/datahub-recon --timeout 120s --namespace xyz
Waiting for deployment "datahub-recon" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition
##[error]Error: error: timed out waiting for the condition
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl describe Deployment datahub-recon --namespace xyz
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: <none>
NewReplicaSet: datahub-recon-567c7d6958 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m1s deployment-controller Scaled up replica set datahub-recon-567c7d6958 to 1
For more information, go to https://dev.azure.com/pbc/Premera/_environments/23
##[error]Rollout status check failed.
----
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl rollout status Deployment/datahub-recon --timeout 120s --namespace xyz
error: deployment "datahub-recon" exceeded its progress deadline
##[error]Error: error: deployment "datahub-recon" exceeded its progress deadline
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl describe Deployment datahub-recon --namespace xyz
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: datahub-recon-6bc6f85fc6 (2/2 replicas created)
NewReplicaSet: datahub-recon-bd7d9754 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 13m deployment-controller Scaled up replica set datahub-recon-bd7d9754 to 1
For more information, go to https://dev.azure.com/pbc/Premera/_environments/23
##[error]Rollout status check failed.

Trying to send variables with terraform plan using concourse ci

I am trying to create a pipeline in concourse, which is going to trigger on github updates on a remote branch, and use that branch to plan, apply and destroy a terraform deployment.
- name: terraform-repo
type: git
icon: github
source:
uri: https://github.com/....
#docker image
- name: terraform-0-13-7
type: registry-image
source:
repository: hashicorp/terraform
tag: 0.13.7
jobs:
- name: terraform-deplyoment
plan:
- get: terraform-0-13-7
- get: terraform-repo
trigger: true
- task: terraform-init
image: terraform-0-13-7
config:
inputs:
- name: terraform-repo
outputs:
- name: terraform-repo
platform: linux
run:
path: terraform
dir: terraform-repo
args:
- init
- task: terraform-plan
image: terraform-0-13-7
config:
inputs:
- name: terraform-repo
outputs:
- name: terraform-repo
platform: linux
run:
path: terraform
dir: terraform-repo
args:
- plan
params:
variable1: "test"
variable2: "test2"
This is erroring out on the concourse GUI when triggering the pipeline mentioning that the vars are not available. Am I doing something wrong with the syntax?

The params are exposed to the task as environment variables so you should use them as input variables
- task: terraform-plan
image: terraform-0-13-7
config:
inputs:
- name: terraform-repo
outputs:
- name: terraform-repo
platform: linux
run:
path: terraform
dir: terraform-repo
args:
- plan
params:
TF_VAR_variable1: "test"
TF_VAR_variable2: "test2"

Grafana Loki does not trigger or push alert on alertmanager

I have configured PLG (Promtail, Grafana & Loki) on an AWS EC2 instance for log management. The Loki uses BoltDB shipper & AWS store.
Grafana - 7.4.5,
Loki - 2.2,
Prommtail - 2.2,
AlertManager - 0.21
The issue I am facing is that the Loki does not trigger or push alerts on alertmanager. I cannot see any alert on the AlertManager dashboard though I can run a LogQL query on Grafana which shows the condition was met for triggering an alert.
The following is a screenshot of my query on Grafana.
LogQL Query Screenshot
The following are my configs.
Docker Compose
$ cat docker-compose.yml
version: "3.4"
services:
alertmanager:
image: prom/alertmanager:v0.21.0
container_name: alertmanager
command:
- '--config.file=/etc/alertmanager/config.yml'
- '--storage.path=/alertmanager'
volumes:
- ./config/alertmanager/alertmanager.yml:/etc/alertmanager/config.yml
ports:
- 9093:9093
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
loki:
image: grafana/loki:2.2.0-amd64
container_name: loki
volumes:
- ./config/loki/loki.yml:/etc/config/loki.yml:ro
- ./config/loki/rules/rules.yml:/etc/loki/rules/rules.yml
entrypoint:
- /usr/bin/loki
- -config.file=/etc/config/loki.yml
ports:
- "3100:3100"
depends_on:
- alertmanager
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
grafana:
image: grafana/grafana:7.4.5
container_name: grafana
volumes:
- ./config/grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
- ./config/grafana/defaults.ini:/usr/share/grafana/conf/defaults.ini
- grafana:/var/lib/grafana
ports:
- "3000:3000"
depends_on:
- loki
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
promtail:
image: grafana/promtail:2.2.0-amd64
container_name: promtail
volumes:
- /var/lib/docker/containers:/var/lib/docker/containers
- /var/log:/var/log
- ./config/promtail/promtail.yml:/etc/promtail/promtail.yml:ro
command: -config.file=/etc/promtail/promtail.yml
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
nginx:
image: nginx:latest
container_name: nginx
volumes:
- ./config/nginx/nginx.conf:/etc/nginx/nginx.conf
- ./config/nginx/default.conf:/etc/nginx/conf.d/default.conf
- ./config/nginx/loki.conf:/etc/nginx/conf.d/loki.conf
- ./config/nginx/ssl:/etc/ssl
ports:
- "80:80"
- "443:443"
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
loki-url: http://localhost:3100/loki/api/v1/push
loki-external-labels: job=containerlogs
tag: "{{.Name}}"
depends_on:
- grafana
networks:
- loki-br
networks:
loki-br:
driver: bridge
ipam:
config:
- subnet: 192.168.0.0/24
volumes:
grafana: {}
Loki Config
$ cat config/loki/loki.yml
auth_enabled: false
server:
http_listen_port: 3100
ingester:
lifecycler:
address: 127.0.0.1
ring:
kvstore:
store: inmemory
replication_factor: 1
final_sleep: 0s
chunk_idle_period: 1h # Any chunk not receiving new logs in this time will be flushed
max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
chunk_target_size: 1048576 # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
max_transfer_retries: 0 # Chunk transfers disabled
schema_config:
configs:
- from: 2020-11-20
store: boltdb-shipper
#object_store: filesystem
object_store: s3 # Config for AWS S3 storage.
schema: v11
index:
prefix: index_loki_
period: 24h
storage_config:
boltdb_shipper:
active_index_directory: /tmp/loki/boltdb-shipper-active
cache_location: /tmp/loki/boltdb-shipper-cache
cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
shared_store: s3 # Config for AWS S3 storage.
#filesystem:
# directory: /tmp/loki/chunks
# Config for AWS S3 storage.
aws:
s3: s3://eu-west-1/loki #Uses AWS IAM roles on AWS EC2 instance.
region: eu-west-1
compactor:
working_directory: /tmp/loki/boltdb-shipper-compactor
shared_store: aws
limits_config:
reject_old_samples: true
reject_old_samples_max_age: 168h
chunk_store_config:
max_look_back_period: 0s
table_manager:
retention_deletes_enabled: true
retention_period: 720h
ruler:
storage:
type: local
local:
directory: /etc/loki/rules
rule_path: /tmp/loki/rules-temp
evaluation_interval: 1m
alertmanager_url: http://alertmanager:9093
ring:
kvstore:
store: inmemory
enable_api: true
enable_alertmanager_v2: true
Loki Rules
$ cat config/loki/rules/rules.yml
groups:
- name: rate-alerting
rules:
- alert: HighLogRate
expr: |
sum by (job, compose_service)
(rate({job="containerlogs"}[1m]))
> 60
for: 1m
labels:
severity: warning
team: devops
category: logs
annotations:
title: "High LogRate Alert"
description: "something is logging a lot"
impact: "impact"
action: "action"
dashboard: "https://grafana.com/service-dashboard"
runbook: "https://wiki.com"
logurl: "https://grafana.com/log-explorer"
AlertManager config
$ cat config/alertmanager/alertmanager.yml
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'severity', 'instance']
group_wait: 45s
group_interval: 10m
repeat_interval: 12h
receiver: 'email-notifications'
receivers:
- name: email-notifications
email_configs:
- to: me#example.com
from: 'alerts#example.com'
smarthost: smtp.gmail.com:587
auth_username: alerts#example.com
auth_identity: alerts#example.com
auth_password: PassW0rD
send_resolved: true
Let me know if I am missing something. I followed Ruan Bekker's blog to set things up

If Loki is running in single tenant mode, the required ID is fake (yes we know this might seem alarming but it’s totally fine, no it can’t be changed).
mkdir /etc/loki/rules/fake
mkdir /tmp/loki/rules-temp/fake
copy your rule files into /etc/loki/rules/fake
So you have to add a fake sub-directory to the rule directory in single tenant mode and everthing worked perfectly.
https://grafana.com/docs/loki/latest/alerting/#interacting-with-the-ruler

Why does my Selenium Grid in Azure Container Instances take the same time to execute tests regardless of number of nodes?

Because ACI doesn't support scaling, we deploy multiple container groups containing an Azure DevOps agent, a selenium grid hub and a selenium grid node. To try and speed things up I've tried to deploy the container groups with an additional node, identical to the first only being started on port 6666 instead of port 5555. I can see the two nodes register with the grid without issue but when I execute the same batch of tests with the additional node and without they take the exact same amount of time. How do I go about finding out what's going on here?
My ACI yaml:
apiVersion: 2018-10-01
location: australiaeast
properties:
containers:
- name: devops-agent
properties:
image: __AZUREDEVOPSAGENTIMAGE__
resources:
requests:
cpu: 0.5
memoryInGb: 1
environmentVariables:
- name: AZP_URL
value: __AZUREDEVOPSPROJECTURL__
- name: AZP_POOL
value: __AGENTPOOLNAME__
- name: AZP_TOKEN
secureValue: __AZUREDEVOPSAGENTTOKEN__
- name: SCREEN_WIDTH
value: "1920"
- name: SCREEN_HEIGHT
value: "1080"
volumeMounts:
- name: downloads
mountPath: /tmp/
- name: selenium-hub
properties:
image: selenium/hub:3.141.59-xenon
resources:
requests:
cpu: 1
memoryInGb: 1
ports:
- port: 4444
- name: chrome-node
properties:
image: selenium/node-chrome:3.141.59-xenon
resources:
requests:
cpu: 1
memoryInGb: 2
environmentVariables:
- name: HUB_HOST
value: localhost
- name: HUB_PORT
value: 4444
- name: SCREEN_WIDTH
value: "1920"
- name: SCREEN_HEIGHT
value: "1080"
volumeMounts:
- name: devshm
mountPath: /dev/shm
- name: downloads
mountPath: /home/seluser/downloads
- name: chrome-node-2
properties:
image: selenium/node-chrome:3.141.59-xenon
resources:
requests:
cpu: 1
memoryInGb: 2
environmentVariables:
- name: HUB_HOST
value: localhost
- name: HUB_PORT
value: 4444
- name: SCREEN_WIDTH
value: "1920"
- name: SCREEN_HEIGHT
value: "1080"
- name: SE_OPTS
value: "-port 6666"
volumeMounts:
- name: devshm
mountPath: /dev/shm
- name: downloads
mountPath: /home/seluser/downloads
osType: Linux
diagnostics:
logAnalytics:
workspaceId: __LOGANALYTICSWORKSPACEID__
workspaceKey: __LOGANALYTICSPRIMARYKEY__
volumes:
- name: devshm
emptyDir: {}
- name: downloads
emptyDir: {}
ipAddress:
type: Public
ports:
- protocol: tcp
port: '4444'
#==================== remove this section if not pulling images from private image registries ===============
imageRegistryCredentials:
- server: __IMAGEREGISTRYLOGINSERVER__
username: __IMAGEREGISTRYUSERNAME__
password: __IMAGEREGISTRYPASSWORD__
#========================================================================================================================
tags: null
type: Microsoft.ContainerInstance/containerGroups
When I run my tests locally against a docker selenium grid either from Visual Studio or via dotnet vstest, my tests run in parallel across all available nodes and complete in half the time.

Multi-broker Kafka on Kubernetes how to set KAFKA_ADVERTISED_HOST_NAME

My current Kafka deployment file with 3 Kafka brokers looks like this:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: kafka
spec:
selector:
matchLabels:
app: kafka
serviceName: kafka-headless
replicas: 3
updateStrategy:
type: RollingUpdate
podManagementPolicy: Parallel
template:
metadata:
labels:
app: kafka
spec:
containers:
- name: kafka-instance
image: wurstmeister/kafka
ports:
- containerPort: 9092
env:
- name: KAFKA_ADVERTISED_PORT
value: "9092"
- name: KAFKA_ADVERTISED_HOST_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: KAFKA_ZOOKEEPER_CONNECT
value: "zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181,\
zookeeper-1.zookeeper-headless.default.svc.cluster.local:2181,\
zookeeper-2.zookeeper-headless.default.svc.cluster.local:2181"
- name: BROKER_ID_COMMAND
value: "hostname | awk -F '-' '{print $2}'"
- name: KAFKA_CREATE_TOPICS
value: hello:2:1
volumeMounts:
- name: data
mountPath: /var/lib/kafka/data
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 50Gi
This creates 3 Kafka brokers as a Stateful Set and connects to the Zookeeper cluster using the Kubedns service with FQDN (Fully Qualified Domain Names) such as:
zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181
Broker IDs are generated based on the pod name:
- name: BROKER_ID_COMMAND
value: "hostname | awk -F '-' '{print $2}'"
Result:
kafka-0 = 0
kafka-1 = 1
kafka-2 = 2
However, In order to use the Kubedns names for the Kafka brokers:
kafka-0.kafka-headless.default.svc.cluster.local:9092
kafka-1.kafka-headless.default.svc.cluster.local:9092
kafka-2.kafka-headless.default.svc.cluster.local:9092
I need to be able to set the KAFKA_ADVERTISED_HOST_NAME variable to the above FQDN values based on the name of the pod.
Currently I have the variable set to the name of the pod:
- name: KAFKA_ADVERTISED_HOST_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
Result:
KAFKA_ADVERTISED_HOST_NAME=kafka-0
KAFKA_ADVERTISED_HOST_NAME=kafka-1
KAFKA_ADVERTISED_HOST_NAME=kafka-2
But somehow I would need to append the rest of the DNS name.
Is there a way I could set the DNS value directly?
Something like that:
- name: KAFKA_ADVERTISED_HOST_NAME
valueFrom:
fieldRef:
fieldPath: kubedns.name

I managed to solve the problem with a command field inside the pod definition:
command:
- sh
- -c
- "export KAFKA_ADVERTISED_HOST_NAME=$(hostname).kafka-headless.default.svc.cluster.local &&
start-kafka.sh"
This runs a shell command which exports the advertised hostname environment variable based on the hostname value.

- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: KAFKA_ZOOKEEPER_CONNECT
value: zook-zookeeper.zook.svc.cluster.local:2181
- name: KAFKA_PORT_NUMBER
value: "9092"
- name: KAFKA_LISTENERS
value: SASL_SSL://:$(KAFKA_PORT_NUMBER)
- name: KAFKA_ADVERTISED_LISTENERS
value: SASL_SSL://$(MY_POD_NAME).kafka-kafka-headless.kafka.svc.cluster.local:$(KAFKA_PORT_NUMBER)
The above config would create your FQDN.
You should be able to see those names in your Kafka logs when Kafka server starts.
NOTE: Kubernetes allows you to reference environment variables using the syntax $(VARIABLE)

None of the above worked for me; my setup it wurstmeister/kafka:2.12-2.5.0 and wurstmeister/zookeeper:3.4.6 in a single pod on Kubernetes 1.16 (don't ask); ClusterIp service on top which forwards 9092 to the Kafka container.
This set of environment variables works:
- name: KAFKA_LISTENERS
value: "INSIDE://:9094,OUTSIDE://:9092"
- name: KAFKA_ADVERTISED_LISTENERS
value: "INSIDE://:9094,OUTSIDE://my-service.my-namespace.svc.cluster.local:9092"
- name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
value: "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT" # not production-ready!
- name: KAFKA_INTER_BROKER_LISTENER_NAME
value: INSIDE
- name: KAFKA_ZOOKEEPER_CONNECT
value: "localhost:2181" # since it's in the same pod
Sources: wurstmeister/kafka doc, Kafka doc
The inherent problem seems to be that Kafka itself needs to be an IP-ish thing to bind to and to talk to itself via, while clients need a DNS-ish name to connect to from the outside. The latter one can't contain the pod name for some reason. (Might be a separate configuration issue on my end.)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Unable to push metrics in prometheus - node.js

Related

AKS timeout container kubectl Rollout status check failed

Trying to send variables with terraform plan using concourse ci

Grafana Loki does not trigger or push alert on alertmanager

Why does my Selenium Grid in Azure Container Instances take the same time to execute tests regardless of number of nodes?

Multi-broker Kafka on Kubernetes how to set KAFKA_ADVERTISED_HOST_NAME

Categories

Resources