K8S - using Prometheus to monitor another prometheus instance in secure way - security

I've installed Prometheus Operator 0.34 (which works as expected) on cluster A (the main Prometheus).
Now I want to use the federation option, that is, collect metrics from another Prometheus that runs on a different K8S cluster, B.
Scenario:
Cluster A has the MAIN Prometheus Operator v0.34 config
Cluster B has the SLAVE Prometheus 2.13.1 config
Both were installed successfully via Helm. I can reach each one on localhost via port-forwarding and see the scraping results on each cluster.
I did the following steps:
On the operator (main cluster A), use additionalScrapeConfigs
I added the following to the values.yaml file and updated the release via Helm.
additionalScrapeConfigs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 101.62.201.122:9090 # The External-IP and port from the target prometheus on Cluster B
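To see what this scrape job asks cluster B for, the sketch below builds the equivalent /federate URL in Python. It assumes plain HTTP, as in the config above; federate_url is a hypothetical helper name, not part of Prometheus.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def federate_url(target, matchers, scheme="http"):
    """Build the URL a federation scrape would request (hypothetical helper)."""
    # doseq=True repeats the match[] parameter once per selector.
    query = urlencode({"match[]": matchers}, doseq=True)
    return f"{scheme}://{target}/federate?{query}"


url = federate_url(
    "101.62.201.122:9090",
    ['{job="prometheus"}', '{__name__=~"job:.*"}'],
)
print(url)
```

Requesting that URL from a machine that can reach cluster B is one way to check the endpoint independently of the operator.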
I got the target as follows:
On the Prometheus inside cluster B (from which I want to collect the data) I ran:
kubectl get svc -n monitoring
And got the following entries:
I took the EXTERNAL-IP and put it inside the additionalScrapeConfigs config entry.
Now I switch to cluster A and run kubectl port-forward svc/mon-prometheus-operator-prometheus 9090:9090 -n monitoring
I open the browser at localhost:9090, see the graphs, click on Status and then on Targets,
and see the new target with the job federate.
Now my main questions/gaps (security and verification):
To get that target state to green (see the pic), I configured the Prometheus server in cluster B to use type: LoadBalancer instead of type: NodePort, which exposes the metrics outside the cluster. This is fine for testing, but I need to secure it. How can that be done?
How do I make the end-to-end flow work in a secure way?
tls
https://prometheus.io/docs/prometheus/1.8/configuration/configuration/#tls_config
Inside cluster A (main cluster) we use a certificate for our services with Istio, like the following, which works:
tls:
  mode: SIMPLE
  privateKey: /etc/istio/oss-tls/tls.key
  serverCertificate: /etc/istio/oss-tls/tls.crt
I see in the docs that there is an option to configure TLS:
additionalScrapeConfigs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 101.62.201.122:9090 # The External-IP and port from the target
    # tls_config:
    #   ca_file: /opt/certificate-authority-data.pem
    #   cert_file: /opt/client-certificate-data.pem
    #   key_file: /sfp4/client-key-data.pem
    #   insecure_skip_verify: true
But I'm not sure which certificate I need to use inside the Prometheus Operator config: the certificate of the main Prometheus (A) or of the slave (B)?
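As a rough illustration of what each tls_config field is for, here is a sketch that uses Python's ssl module rather than Prometheus itself. Per the Prometheus configuration docs, ca_file is used to validate the certificate presented by the scraped server (here the Prometheus in cluster B), while cert_file and key_file are the scraper's own client certificate and key, which matter only if the server requests client authentication. The function name and its defaults are hypothetical.

```python
import ssl


def scraper_tls_context(ca_file=None, cert_file=None, key_file=None,
                        insecure_skip_verify=False):
    """Client-side TLS context mirroring the tls_config fields (illustrative only)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies the server by default
    if insecure_skip_verify:
        # Same spirit as insecure_skip_verify: true; the server is not validated.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    elif ca_file:
        # CA that signed the certificate served by the scraped side (cluster B).
        context.load_verify_locations(cafile=ca_file)
    else:
        context.load_default_certs()  # fall back to the system trust store
    if cert_file and key_file:
        # The scraper's own identity (cluster A), used only for mutual TLS.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


strict = scraper_tls_context()
relaxed = scraper_tls_context(insecure_skip_verify=True)
print(strict.verify_mode, relaxed.verify_mode)
```

Under these assumptions, the scraping side needs at least the CA for cluster B's server certificate; a client certificate comes into play only when cluster B is set up to demand one.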

You should consider using Additional Scrape Configuration
AdditionalScrapeConfigs allows specifying a key of a Secret
containing additional Prometheus scrape configurations. Scrape
configurations specified are appended to the configurations generated
by the Prometheus Operator.
I am afraid this is not officially supported. However, you can update the prometheus.yml section within the Helm chart. If you want to learn more about it, check out this blog
I see two options here:
Connections to Prometheus and its exporters are not encrypted and
authenticated by default. This is one way of fixing that with TLS
certificates and stunnel.
Or specify Secrets which you can add to your scrape configuration.
Please let me know if that helped.

A couple of options spring to mind:
Put the two clusters in the same network space and put a firewall in front of them.
Set up a VPN tunnel between the clusters.
Use Istio multicluster routing (but this could get complicated): https://istio.io/docs/setup/install/multicluster

Related

Pushing metrics to prometheus server via prometheus remote write from netdata

I have netdata installed on one of my computers and I want to export data to my prometheus server (both run Ubuntu).
But I can't use prometheus' pull system, I need the metrics to be pushed from netdata to prometheus.
Netdata has prometheus remote write implemented in its exporting engine and I am able to configure it to send metrics to my server PC just fine.
But I can't see the metrics in prometheus at all, although I know the metrics are being sent to the server PC as I can see them by listening on the port I'm pushing to, via netcat.
So I think that my prometheus config is wrong.
This is my netdata exporting config:
[prometheus_remote_write:prometheus_receiver]
enabled = yes
destination = 192.168.5.45:9090
remote write URL path = /write
#username = admin
#password = admin
data source = average
prefix = netdata
# hostname = my_hostname
# update every = 10
# buffer on failures = 10
# timeout ms = 20000
# send names instead of ids = yes
# send charts matching = *
send hosts matching = *
And this is my prometheus config:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
remote_read:
  - url: http://localhost/api/v1/write
    remote_timeout: 30s
When I open the page localhost:9090/api/v1/write I expect to see the metrics pushed from netdata, but instead I get a blank page that says "Method Not Allowed".
I run prometheus with the flags --web.enable-admin-api --web.enable-remote-write-receiver.
Any clue on what I'm doing wrong?
Try running prometheus with the flag --enable-feature=remote-write-receiver.
You may have an older version of prometheus, in which case this flag should work.
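A note on the "Method Not Allowed" page: the remote write receiver path /api/v1/write is meant to be POSTed to, and a browser always sends GET, so that response by itself does not show the receiver is off. The sketch below reproduces the behaviour with a stand-in HTTP server written for this example (StandInReceiver is hypothetical and is not Prometheus). It also probes /write, the path set in the netdata config above, which this stand-in does not serve.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

WRITE_PATH = "/api/v1/write"  # path of the remote write receiver


class StandInReceiver(BaseHTTPRequestHandler):
    """Hypothetical POST-only endpoint used for illustration; not Prometheus."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # discard the body
        self.send_response(204 if self.path == WRITE_PATH else 404)
        self.end_headers()

    def do_GET(self):
        # A browser issues GET, which a POST-only route rejects.
        self.send_response(405 if self.path == WRITE_PATH else 404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def status_of(method, url, body=None):
    """Return the HTTP status code for a single request."""
    request = urllib.request.Request(url, data=body, method=method)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxies
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def demo():
    server = HTTPServer(("127.0.0.1", 0), StandInReceiver)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        return {
            "GET /api/v1/write": status_of("GET", base + WRITE_PATH),
            "POST /api/v1/write": status_of("POST", base + WRITE_PATH, b"payload"),
            "POST /write": status_of("POST", base + "/write", b"payload"),
        }
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

If the real server behaves like this stand-in, it would be worth checking that the path netdata pushes to matches the receiver path. Separately, remote_read tells prometheus where to read samples from; it does not configure the receiver.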

Wrong connection port despite Kubernetes deployments/services ports specified

It might take a while to explain what I'm trying to do but bear with me please.
I have the following infrastructure specified:
I have a job called questo-server-deployment (I know, confusing but this was the only way to access the deployment without using ingress on minikube)
This is how the parts should talk to one another:
And here you can find the entire Kubernetes/Terraform config file for the above setup
I have 2 endpoints exposed from the node.js app (questo-server-deployment)
I'm making the requests using 10.97.189.215 which is the questo-server-service external IP address (as you can see in the first picture)
So I have 2 endpoints:
health - which simply returns 200 OK from the node.js app - and this part is fine confirming the node app is working as expected.
dynamodb - which should be able to send a request to the questo-dynamodb-deployment (pod) and get a response back, but it can't.
When I print env vars I'm getting the following:
➜ kubectl -n minikube-local-ns exec questo-server-deployment--1-7ptnz -- printenv
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=questo-server-deployment--1-7ptnz
DB_DOCKER_URL=questo-dynamodb-service
DB_REGION=local
DB_SECRET_ACCESS_KEY=local
DB_TABLE_NAME=Questo
DB_ACCESS_KEY=local
QUESTO_SERVER_SERVICE_PORT_4000_TCP=tcp://10.97.189.215:4000
QUESTO_SERVER_SERVICE_PORT_4000_TCP_PORT=4000
QUESTO_DYNAMODB_SERVICE_SERVICE_PORT=8000
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP_PROTO=tcp
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP_PORT=8000
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
QUESTO_SERVER_SERVICE_SERVICE_HOST=10.97.189.215
QUESTO_SERVER_SERVICE_PORT=tcp://10.97.189.215:4000
QUESTO_SERVER_SERVICE_PORT_4000_TCP_PROTO=tcp
QUESTO_SERVER_SERVICE_PORT_4000_TCP_ADDR=10.97.189.215
KUBERNETES_PORT_443_TCP_PROTO=tcp
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP=tcp://10.107.45.125:8000
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP_ADDR=10.107.45.125
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
QUESTO_SERVER_SERVICE_SERVICE_PORT=4000
QUESTO_DYNAMODB_SERVICE_SERVICE_HOST=10.107.45.125
QUESTO_DYNAMODB_SERVICE_PORT=tcp://10.107.45.125:8000
KUBERNETES_SERVICE_PORT_HTTPS=443
NODE_VERSION=12.22.7
YARN_VERSION=1.22.15
HOME=/root
so it looks like the configuration is aware of the dynamodb address and port:
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP=tcp://10.107.45.125:8000
You'll also notice in the above env variables that I specified:
DB_DOCKER_URL=questo-dynamodb-service
Which is supposed to be the questo-dynamodb-service url:port which I'm assigning to the config here (in the configmap) which is then used here in the questo-server-deployment (job)
Also, when I log:
kubectl logs -f questo-server-deployment--1-7ptnz -n minikube-local-ns
I'm getting the following results:
Which indicates that the app (node.js) tried to connect to the db (dynamodb) but on the wrong port 443 instead of 8000?
The DB_DOCKER_URL should contain the full address (with port) to the questo-dynamodb-service
What am I doing wrong here?
Edit ----
I've explicitly assigned the port 8000 to the DB_DOCKER_URL as suggested in the answer but now I'm getting the following error:
It seems to me there is some kind of default behaviour in Kubernetes where it tries to communicate between pods using https?
Any ideas what needs to be done here?
How about specifying the port in the ConfigMap:
...
data = {
  DB_DOCKER_URL = "${kubernetes_service.questo_dynamodb_service.metadata.0.name}:8000"
...
Otherwise it may default to 443.
Answering my own question in case anyone has an equally brilliant idea of running a local DynamoDB in a minikube cluster.
The issue was not only with the port but also with the protocol, so the final answer to the question is to modify the ConfigMap as follows:
data = {
  DB_DOCKER_URL = "http://${kubernetes_service.questo_dynamodb_service.metadata.0.name}:8000"
  ...
}
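The three values tried in this thread differ only in whether the scheme and the port are spelled out. A small check like the one below (a sketch; endpoint_problems is a hypothetical helper, not part of any SDK) makes the difference explicit before the value reaches the client, which in this thread fell back to port 443 and HTTPS when those parts were missing.

```python
from urllib.parse import urlsplit


def endpoint_problems(value):
    """List what an endpoint string leaves implicit (hypothetical helper)."""
    has_scheme = "://" in value
    # Prefix "//" so urlsplit treats a bare "host:port" as a network location.
    parts = urlsplit(value if has_scheme else "//" + value)
    problems = []
    if not has_scheme:
        problems.append("no scheme")
    if parts.port is None:
        problems.append("no port")
    return problems


for candidate in (
    "questo-dynamodb-service",
    "questo-dynamodb-service:8000",
    "http://questo-dynamodb-service:8000",
):
    print(candidate, endpoint_problems(candidate))
```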
As a side note:
When you run scripts to create a DynamoDB table in your amazon/dynamodb-local container, make sure you use the same region when creating the table, like so:
#!/bin/bash
aws dynamodb create-table \
  --cli-input-json file://questo_db_definition.json \
  --endpoint-url http://questo-dynamodb-service:8000 \
  --region local
and the same region when querying the data.
Even though this is just a local copy, where you can type anything you want as the value of AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and even AWS_REGION, the regions have to match.
If you query the db with a different region than the one it was created with, you get the Cannot do operations on a non-existent table error.

Prometheus - Scraping metrics from different endpoints inside an Azure VM

I have Prometheus running inside a Kubernetes cluster in Azure, and I'm trying to use it to also monitor a few VMs inside the same resource-group.
I have set up Azure SD for the VMs, and it's scanning them correctly, but the point is that these VMs have more than one service exposing metrics on different ports.
Is there a way to tell Prometheus to scan multiple ports under the azure_service_discovery job?
Or at least have these metrics aggregated, so Prometheus can scrape them from one single port?
The job definition that I'm using is:
azure_sd_configs:
  - authentication_method: "OAuth"
    subscription_id: AZURE_SUBSCRIPTION_ID
    tenant_id: AZURE_TENANT_ID
    client_id: AZURE_CLIENT_ID
    client_secret: AZURE_CLIENT_SECRET
    port: 9100
    refresh_interval: 300s
You can't have two different ports in the same SD config.
However, you can:
Either have multiple jobs with different azure_sd_configs. This way you can have a different configuration for each job (drop some targets, customize the sample limit, etc.)
- job_name: azure_exporters_a
  sample_limit: 1000
  azure_sd_configs:
    - port: 9100
      ...
- job_name: azure_exporters_b
  sample_limit: 5000
  azure_sd_configs:
    - port: 9800
      ...
Or have multiple azure_sd_configs entries for a single job. In this second case, all of your exporters are grouped in the same job, so they share the same configuration (sample_limit, scrape_timeout, ...)
- job_name: azure_exporters
  sample_limit: 5000
  azure_sd_configs:
    - port: 9100
      ...
    - port: 9800
      ...
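If the list of ports grows, the second layout can be generated instead of copied by hand. The following is only a sketch: azure_job is a hypothetical helper, and the shared settings are the placeholder values from the question, not real credentials.

```python
import json


def azure_job(job_name, ports, shared_sd_settings, **job_settings):
    """Build one scrape job with one azure_sd_configs entry per port (sketch)."""
    sd_configs = [dict(shared_sd_settings, port=port) for port in ports]
    return dict(job_name=job_name, azure_sd_configs=sd_configs, **job_settings)


# Placeholder settings taken from the question; fill in the remaining fields yourself.
shared = {"subscription_id": "AZURE_SUBSCRIPTION_ID", "refresh_interval": "300s"}
job = azure_job("azure_exporters", [9100, 9800], shared, sample_limit=5000)
print(json.dumps([job], indent=2))
```

The printed structure mirrors the single-job layout; calling azure_job once per port with different job names would give the multi-job layout instead.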

Micronaut fail to connect to Keyspaces

I'm trying to integrate my service with AWS Cassandra (Keyspaces) with the following config:
cassandra:
  default:
    advanced:
      ssl: true
      ssl-engine-factory: DefaultSslEngineFactory
      metadata:
        schema:
          enabled: false
      auth-provider:
        class: PlainTextAuthProvider
        username: "XXXXXX"
        password: "XXXXXX"
    basic:
      contact-points:
        - ${CASSANDRA_HOST:"127.0.0.1"}:${CASSANDRA_PORT:"9042"}
      load-balancing-policy:
        local-datacenter: "${CASSANDRA_DATA_CENTER}:datacenter1"
      session-keyspace: "keyspace"
Whenever I'm running the service it fails to load with the following error:
Message: Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=cassandra.eu-west-1.amazonaws.com/3.248.244.41:9142, hostId=null, hashCode=7296b27b): [com.datastax.oss.driver.api.core.DriverTimeoutException: [s0|control|id: 0x1f1c50a1, L:/172.17.0.3:54802 - R:cassandra.eu-west-1.amazonaws.com/3.248.244.41:9142] Protocol initialization request, step 1 (OPTIONS): timed out after 5000 ms]
There's very little documentation about the cassandra-micronaut library, so I'm not sure what I'm doing wrong here.
UPDATE:
For clarity: the values of our environment variables are as follows:
export CASSANDRA_HOST=cassandra.eu-west-1.amazonaws.com
export CASSANDRA_PORT=9142
export CASSANDRA_DATA_CENTER=eu-west-1
Note that even when I've hard-coded the values into my application.yml the problem continued.
I think you need to adjust your variables in this example. The common syntax for Apache Cassandra or Amazon Keyspaces is host:port. For Amazon Keyspaces the port is always 9142.
Try the following:
contact-points:
  - ${CASSANDRA_HOST}:${CASSANDRA_PORT}
or simply hard-code them at first:
contact-points:
  - cassandra.eu-west-1.amazonaws.com:9142
So this:
contact-points:
  - ${CASSANDRA_HOST:"127.0.0.1"}:${CASSANDRA_PORT:"9042"}
doesn't match up with this:
Node(endPoint=cassandra.eu-west-1.amazonaws.com/3.248.244.41:9142,
Double-check which IP(s) and port Cassandra is broadcasting on (usually seen with nodetool status) and adjust the service so it does not look for it on 127.0.0.1.
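One more detail in the question's config that may be worth a look, although it is not necessarily related to the timeout: Micronaut placeholders take their default inside the braces, as in ${NAME:default}. The sketch below is a simplified emulation of that syntax written for this note (it is not Micronaut's resolver) and shows how the local-datacenter value from the question would resolve compared with the braces-inside form.

```python
import re

# ${NAME} or ${NAME:default}; a deliberately simplified pattern.
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(template, env):
    """Simplified emulation of ${NAME:default} placeholders; not Micronaut's code."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(name)
    return PLACEHOLDER.sub(substitute, template)


env = {"CASSANDRA_DATA_CENTER": "eu-west-1"}
as_written = resolve("${CASSANDRA_DATA_CENTER}:datacenter1", env)
with_default = resolve("${CASSANDRA_DATA_CENTER:datacenter1}", env)
unset = resolve("${CASSANDRA_DATA_CENTER:datacenter1}", {})
print(as_written, with_default, unset)
```

In this emulation the value as written keeps the literal ":datacenter1" suffix, while the second form yields the bare datacenter name and falls back to datacenter1 only when the variable is unset.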

collectd data not showing in influxdb container

I'm trying to put in place global resource monitoring for a small cluster. The chosen stack:
- collectd on the nodes for data gathering
- influxdb as the backend, using the official docker container
- grafana as the frontend, again using the official container
The containers are launched on a central server. Grafana is able to connect to the influxdb source, and I updated the collectd agent (network plugin in collectd.conf) and influxdb (influxdb.conf with the collectd plugin) so that they can talk to each other.
But no data is showing up... There is not much log output to check, but the influxdb data files are definitely empty and nothing comes back when querying.
Has anyone experienced this? Any idea where to dig?
collectd conf extract:
# /etc/collectd/collectd.conf
<Plugin network>
  Server "<public_IP_of_the_docker_host>" "25826"
</Plugin>
influxdb conf:
[input_plugins.collectd]
enabled = true
address = "public_IP_of_the_docker_host"
port = 25826
database = "collectd"
typesdb = "/usr/share/collectd/types.db"
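One place to dig is whether the datagrams arrive at all: collectd's network plugin sends over UDP, and a docker port published without /udp is forwarded for TCP only. The sketch below is a generic UDP listener for that check, demonstrated on loopback with an ephemeral port; the helper names are hypothetical and nothing here is specific to influxdb.

```python
import socket


def open_listener(host, port):
    """Bind a UDP socket; use port 25826 on the docker host for the real check."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, timeout):
    """Wait for one datagram; return (size, sender_ip) or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, sender = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return len(data), sender[0]


# Loopback self-test standing in for a collectd node sending to the server.
listener = open_listener("127.0.0.1", 0)  # port 0 picks a free port
port = listener.getsockname()[1]
probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
probe.sendto(b"ping", ("127.0.0.1", port))
result = receive_one(listener, 2.0)
probe.close()
listener.close()
print(result)
```

To use it for real, stop the influxdb container, bind to 0.0.0.0 on port 25826 on the docker host, and wait for a collectd interval. If nothing arrives, the problem is between collectd and the host rather than inside influxdb.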
