I'm trying to set up global resource monitoring for a small cluster. The chosen stack:
- collectd on the node for data gathering
- influxdb as backend using the official docker container
- grafana as frontend again using the official container
The containers are launched on a central server. Grafana is able to connect to the InfluxDB data source, and I updated the collectd agents (network plugin in collectd.conf) and InfluxDB (collectd input plugin in influxdb.conf) so they can talk to each other.
But no data is showing up... There isn't much log to check, but the InfluxDB data files are definitely empty and nothing comes up when querying.
Has anyone run into this? Any idea where to dig?
collectd conf extract:
# /etc/collectd/collectd.conf
<Plugin network>
Server "<public_IP_of_the_docker_host>" "25826"
</Plugin>
influxdb conf:
[input_plugins.collectd]
enabled = true
address = "public_IP_of_the_docker_host"
port = 25826
database = "collectd"
typesdb = "/usr/share/collectd/types.db"
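One thing worth double-checking with the official container: the collectd UDP port has to be published when the container is started, and inside the container the collectd listener usually needs to bind to 0.0.0.0 rather than the host's public IP, since that address does not exist in the container's network namespace. A minimal sketch of such a launch (image tag, paths and mounts are assumptions, not taken from the post):
docker run -d --name influxdb \
  -p 8086:8086 \
  -p 25826:25826/udp \
  -v /path/to/influxdb.conf:/etc/influxdb/influxdb.conf:ro \
  -v /path/to/types.db:/usr/share/collectd/types.db:ro \
  influxdb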
I have a Node.js project with Docker on ECS in AWS and I need to implement X-Ray to get traces, but I couldn't get it to work yet.
I installed 'aws-xray-sdk' (npm install aws-xray-sdk), then I added
const AWSXRay = require('aws-xray-sdk');
in app.js
Then, before the routes I added
app.use(AWSXRay.express.openSegment('Example'));
and after the routes:
app.use(AWSXRay.express.closeSegment());
I hit some endpoints but I can't see any traces or data in X-Ray. Do I need to set up something in AWS? I have a default group in X-Ray.
Thanks!
It sounds like you do not have the X-Ray daemon running in your ECS environment. The daemon works alongside the SDK: the SDK emits trace data over UDP port 2000, and the daemon forwards it to the AWS X-Ray service. Read more about the daemon here:
https://docs.aws.amazon.com/xray/latest/devguide/xray-daemon.html
See how to run the XRay Daemon on ECS via Docker here:
https://docs.aws.amazon.com/xray/latest/devguide/xray-daemon-ecs.html
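For reference, a minimal sketch of the daemon as a sidecar container in the ECS task definition (values are illustrative; the task role also needs permission to send trace data, e.g. the AWSXRayDaemonWriteAccess managed policy):
{
  "name": "xray-daemon",
  "image": "amazon/aws-xray-daemon",
  "cpu": 32,
  "memoryReservation": 256,
  "portMappings": [
    {
      "containerPort": 2000,
      "protocol": "udp"
    }
  ]
}
Depending on the task's network mode, the Node app may also need the AWS_XRAY_DAEMON_ADDRESS environment variable pointed at the sidecar.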
Alternatively, you would need to look at either the X-Ray SDK with the X-Ray daemon/agent, or the OpenTelemetry SDK with the Collector (AWS Distro for OpenTelemetry).
We are using Spark on Kubernetes (via the SparkOperator) and Prometheus to expose the application's metrics. The application is a Spark Streaming app (NOT Structured Streaming).
The application used to run on an image with Spark version 2.4.7 and was later migrated to Spark 3.1.2.
Because of this migration all spark_streaming_* metrics disappeared, e.g. spark_streaming_driver_totalreceivedrecords (as defined here: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala#L53).
In general the Prometheus setup seems to work, because when you curl the Prometheus port you can still see a bunch of other metrics - just none of the Spark Streaming metrics.
The Spark image contains the Prometheus JMX java agent, and in the Helm chart of the streaming app the monitoring spec is configured to use it:
monitoring:
  exposeDriverMetrics: true
  exposeExecutorMetrics: true
  prometheus:
    jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
    port: 8888
as described in the spark-operator documentation: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#monitoring.
This is also how the setup used to work with Spark 2.4.7.
Are these metrics gone in Spark 3? Or are we maybe just missing some configuration?
Another side note: when you check the metrics via <spark-driver>:<ui-port>/metrics/json, you can see the desired metrics.
OK, it seems the spark-operator currently does not work properly with the metrics as emitted by Spark 3.
The issue can be fixed by provisioning the Spark image with a corrected Prometheus (JMX exporter) configuration file and then referencing it in your Helm chart:
monitoring:
  exposeDriverMetrics: true
  exposeExecutorMetrics: true
  prometheus:
    jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
    port: 8888
    configFile: PATH_TO_THE_CONFIG
More info and a working Prometheus config can be found here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1117
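For illustration, such a JMX exporter config file roughly follows this shape; the rule pattern below is only a sketch of the idea, and the actual tested patterns are in the issue linked above:
lowercaseOutputName: true
attrNameSnakeCase: true
rules:
  # sketch only: map the driver's StreamingMetrics MBeans to stable Prometheus names
  - pattern: 'metrics<name=(\S+)\.driver\.(\S+)\.StreamingMetrics\.streaming\.(\S+)><>Value'
    name: spark_streaming_driver_$3
    labels:
      app_id: "$1"
  # keep all remaining MBeans visible under their default names
  - pattern: '.*'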
I have a Spark Streaming application (Spark 2.4.4) running on AWS EMR 5.28.0. In the driver application on the master node, besides setting up the Spark Streaming job, I am also running an HTTP server (Akka HTTP 10.1.6) through which the driver application can be queried for data. I bind to port 6161 like the following:
val bindingFuture: Future[ServerBinding] = Http().bindAndHandle(myapiroutes, "127.0.0.1", 6161)
try {
  bindingFuture.map { serverBinding =>
    log.info(s"AlertRestApi bound to ${serverBinding.localAddress}")
  }
} catch {
  case ex: Exception =>
    log.error(s"Failed to bind to 127.0.0.1:6161")
    system.terminate()
}
Then I start Spark Streaming:
ssc.start()
When I test this with local Spark, I am able to access http://localhost:6161/myapp/v1/data and get data from Spark Streaming; everything is good so far.
However, when I run this application on AWS EMR, I cannot access port 6161. I ssh into the driver node and try to curl my URL, and it gives me this error message:
[hadoop#ip-xxx-xx-xx-x ~]$ curl http://xxx.xx.xx.x:6161/myapp/v1/data
curl: (7) Failed to connect to xxx.xx.xx.x port 6161: Connection refused
When I look into the log on the driver node, I do see that the port is bound (why the host shows 0:0:0:0:0:0:0:0 I don't know; it is the same in my dev testing, where it works: I see the same log and am able to access the URL):
20/04/13 16:53:26 INFO MyApp: MyRestApi bound to /0:0:0:0:0:0:0:0:6161
So my question is: what should I do so that I can access the API at port 6161 on the driver node? I realize the YARN resource manager may be involved, but I know nothing about the YARN resource manager that would point me to where to investigate.
Please help. Thanks!
Are you using 127.0.0.1 as the host name, or 0.0.0.0?
127.0.0.1 will work on your local system but not in AWS, as it is the loopback address and only accepts connections from the same host. In that case you need to use 0.0.0.0 as the host name.
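For example, with the binding from the question, only the interface changes (a sketch, assuming the same myapiroutes and the implicit ActorSystem/materializer are in scope):
// bind on all interfaces instead of the loopback address
val bindingFuture: Future[ServerBinding] =
  Http().bindAndHandle(myapiroutes, "0.0.0.0", 6161)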
Also make sure that the port is open and access is allowed from your IP. To do that, go to the inbound rules of your instance's security group and add port 6161 under a custom TCP rule if not done already.
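If you manage the security group from the CLI, a sketch of that rule (the group ID and source CIDR are placeholders):
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 6161 \
  --cidr 203.0.113.10/32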
Let me know if this makes any difference
I've installed Prometheus Operator 0.34 (which works as expected) on cluster A (the main Prometheus).
Now I want to use the federation option, i.e. collect metrics from another Prometheus which is located in another K8s cluster, B.
Scenario:
In cluster A I have the MAIN Prometheus Operator v0.34 config.
In cluster B I have the SLAVE Prometheus 2.13.1 config.
Both were installed successfully via Helm; I can access localhost via port-forwarding and see the scraping results on each cluster.
I did the following steps:
On the operator (main cluster A) I used additionalScrapeConfigs.
I added the following to the values.yaml file and updated it via Helm:
additionalScrapeConfigs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 101.62.201.122:9090 # The External-IP and port from the target prometheus on Cluster B
I obtained the target as follows:
On the Prometheus inside cluster B (from which I want to collect the data) I ran:
kubectl get svc -n monitoring
From the resulting service entries I took the EXTERNAL-IP and put it inside the additionalScrapeConfigs target entry.
Now I switch to cluster A and run kubectl port-forward svc/mon-prometheus-operator-prometheus 9090:9090 -n monitoring
I open the browser at localhost:9090, see the graphs, click on Status and then on Targets,
and see the new target with the job federate.
Now to my main questions/gaps (security & verification):
To get that target into the green state, I configured the Prometheus server in cluster B to use type: LoadBalancer instead of type: NodePort, which exposes the metrics externally. This is fine for testing, but I need to secure it. How can that be done?
How can I make the end-to-end setup work in a secure way?
TLS
https://prometheus.io/docs/prometheus/1.8/configuration/configuration/#tls_config
Inside cluster A (the main cluster) we use certificates for our services with Istio, like the following, which works:
tls:
  mode: SIMPLE
  privateKey: /etc/istio/oss-tls/tls.key
  serverCertificate: /etc/istio/oss-tls/tls.crt
I see that in the docs there is an option to configure:
additionalScrapeConfigs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 101.62.201.122:9090 # The External-IP and port from the target
    # tls_config:
    #   ca_file: /opt/certificate-authority-data.pem
    #   cert_file: /opt/client-certificate-data.pem
    #   key_file: /sfp4/client-key-data.pem
    #   insecure_skip_verify: true
But I'm not sure which certificate I need to use inside the Prometheus Operator config: the certificate of the main Prometheus A or of the slave B?
You should consider using the Additional Scrape Configuration option:
AdditionalScrapeConfigs allows specifying a key of a Secret containing additional Prometheus scrape configurations. Scrape configurations specified are appended to the configurations generated by the Prometheus Operator.
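A minimal sketch of that pattern (names are placeholders): put the federate job from the question into a file, create a Secret from it, and reference that Secret in the Prometheus custom resource (the Helm chart may expose this under a slightly different key):
kubectl create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml \
  -n monitoring
additionalScrapeConfigs:
  name: additional-scrape-configs
  key: prometheus-additional.yaml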
I am afraid this is not officially supported. However, you can update your prometheus.yml section within the Helm chart. If you want to learn more about it, check out this blog.
I see two options here:
Connections to Prometheus and its exporters are not encrypted or authenticated by default. One way of fixing that is with TLS certificates and stunnel.
Or specify Secrets which you can add to your scrape configuration.
Please let me know if that helped.
A couple of options spring to mind:
Put the two clusters in the same network space and put a firewall in front of them.
Set up a VPN tunnel between the clusters.
Use Istio multicluster routing (but this could get complicated): https://istio.io/docs/setup/install/multicluster
We have implemented Hazelcast as an embedded cache in our Spring Boot app, and need a way for Hazelcast members within a "cluster group" to discover each other dynamically, so that we don't have to provide the possible IP addresses/ports where Hazelcast might be running.
We came across this Hazelcast plugin on GitHub:
https://github.com/hazelcast/hazelcast-eureka which seems to provide exactly this feature, using Eureka as the discovery/registration tool.
As mentioned in the GitHub documentation, the hazelcast-eureka-one library is included in our Boot app's classpath; we also disabled TCP-IP & multicast discovery and added the discovery strategy below in hazelcast.xml:
<discovery-strategies>
  <discovery-strategy class="com.hazelcast.eureka.one.EurekaOneDiscoveryStrategy" enabled="true">
    <properties>
      <property name="self-registration">true</property>
      <property name="namespace">hazelcast</property>
    </properties>
  </discovery-strategy>
</discovery-strategies>
Our application also provides a configured EurekaClient, which we autowire and inject into this plugin implementation:
Config hazelcastConfig = new FileSystemXmlConfig(hazelcastConfigFilePath);
EurekaOneDiscoveryStrategyFactory.setEurekaClient(eurekaClient);
hazelcastInstance = Hazelcast.newHazelcastInstance(hazelcastConfig);
Problem:
We are able to start two instances of our Spring Boot app on the same machine, and we notice that each app starts its embedded Hazelcast instance on a separate port (5701, 5702). But they don't seem to recognize each other as running within one cluster; this is what we see in the app logs when the second instance is starting:
Members [1] {
Member [10.41.70.143]:5702 - 7c42eb24-3fa0-45cb-9394-17175cc92b9c this
}
17-12-13 12:22:44.480 WARN [main] c.h.i.Node.log(LoggingServiceImpl.java:168) - [10.41.70.143]:5702 [domain-services] [3.8.2] Config seed port is 5701 and cluster size is 1. Some of the ports seem occupied!
This seems to indicate that both Hazelcast instances are running independently and do not recognize the other running instance in a cluster/group.
Also, immediately after a restart we see this exception thrown frequently on both nodes:
java.lang.ClassCastException: com.hazelcast.nio.tcp.MemberWriteHandler cannot be cast to com.hazelcast.nio.ascii.TextWriteHandler
    at com.hazelcast.nio.ascii.TextReadHandler.<init>(TextReadHandler.java:109) ~[hazelcast-3.8.2.jar:3.8.2]
    at com.hazelcast.nio.tcp.SocketReaderInitializerImpl.init(SocketReaderInitializerImpl.java:89) ~[hazelcast-3.8.2.jar:3.8.2]
which seems to indicate an incompatibility between Hazelcast libraries in the classpath?
It seems like your Eureka service returns the wrong ports. Hazelcast tries to connect to 8080 and other ports in the same range, whereas Hazelcast itself listens on 5701. I'm not exactly sure why this happens, but it feels like you are requesting the wrong service name from Eureka, which ends up returning the HTTP (Tomcat?) ports instead of the separate Hazelcast service that should be registered.
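One thing that may be worth trying (a sketch, not a verified fix; property names differ between plugin versions, so check the hazelcast-eureka README): register Hazelcast under its own Eureka application instead of piggy-backing on the Spring Boot app's registration, by giving the hazelcast namespace its own entries in eureka-client.properties:
# illustrative only - names and URL are placeholders
hazelcast.shouldUseDns=false
hazelcast.name=hazelcast-cluster
hazelcast.serviceUrl.default=http://<eureka-host>:8761/eureka/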