I am experiencing really strange behavior with Prometheus and Grafana. When Prometheus comes under load from Grafana, e.g. when changing the date range to a larger one or refreshing the graphs after a longer time, it seems to generate false metrics. This directly affects the data Prometheus stores, since the changed values can also be retrieved directly from Prometheus.
The metrics I am using are custom ones created in Node.js with prom-client, and the PromQL I query them with is basic, nothing special, e.g.:

avg(rate(udp_uplink_receive_duration_seconds_bucket{success="true"}[1h]))
But as can be seen on all of the graphs, something seems to affect all of the metrics when Prometheus is under load.
Prometheus settings:

global:
  scrape_interval: 5s
  evaluation_interval: 30s

scrape_configs:
  - job_name: nodejs
    honor_labels: true
    static_configs:
      - targets: ['...']
There is definitely nothing on the application side generating this data.
Any idea what to check? Is there a config option I am missing?
In my cluster I have an nginx ingress, a Node.js server, and Prometheus. Prometheus is scraping all metrics from nginx with no problem, but it seems to be omitting some metrics from my Node.js server.
# HELP nodejs_version_info Node.js version info.
# TYPE nodejs_version_info gauge
nodejs_version_info{version="v16.15.0",major="16",minor="15",patch="0"} 1
This metric is indeed scraped by Prometheus because it has nodejs_ in its name. However, I also have some metrics which look like this:
# HELP http_request_duration_seconds duration histogram of http responses labeled with: status_code, method, path
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.003",status_code="200",method="GET",path="/"} 0
Metrics without nodejs in the name do not appear in the dashboard.
I should mention that I am using https://www.npmjs.com/package/express-prom-bundle for the response time metric. Does anybody know how to fix that?
I am using the Spark operator to run Spark on Kubernetes. (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
I am trying to run a Java agent in Spark driver and executor pods and send the metrics through a Kubernetes service to Prometheus operator.
I am using this example
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/examples/spark-pi-prometheus.yaml
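The relevant part of that example is the monitoring block of the SparkApplication spec, which tells the operator to attach the Prometheus JMX exporter agent to the driver and executor pods; roughly (the jar path and port here are placeholders, see the linked file for the exact values):

monitoring:
  exposeDriverMetrics: true
  exposeExecutorMetrics: true
  prometheus:
    jmxExporterJar: "/prometheus/jmx_prometheus_javaagent.jar"   # placeholder path inside the image
    port: 8090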
The Java agent is exposing the metrics on port 8090 for a short time (I can validate that with port forwarding: kubectl port-forward <spark-driver-pod-name> 8090:8090), and the service is also exposing the metrics for a few minutes (validated with kubectl port-forward svc/<spark-service-name> 8090:8090).
Prometheus is able to register these pods' URLs as targets, but by the time it tries to scrape the metrics (every 30 seconds), the pod's URL is already down.
How can I make the Java agent (JMX exporter) keep running until the driver and executors have completed the job? Could anyone who has come across this scenario before guide or help me here?
Either Prometheus needs to scrape the metrics every 5 seconds (chances are the metrics may still not be accurate), or you need to use the Pushgateway, as mentioned in this blog (https://banzaicloud.com/blog/spark-monitoring/), to push the metrics to Prometheus.
Pushing metrics (via the Pushgateway) is the recommended approach for batch jobs.
Letting Prometheus pull the metrics is the recommended approach for long-running services (e.g. REST services).
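As a rough sketch of the Pushgateway approach, assuming a Pushgateway service reachable at pushgateway.monitoring.svc:9091 (a placeholder address): the Spark job pushes its metrics there, and Prometheus scrapes the Pushgateway instead of the short-lived driver/executor pods:

scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true   # keep the job/instance labels set by the pushing batch job
    static_configs:
      - targets: ['pushgateway.monitoring.svc:9091']   # placeholder Pushgateway address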
I want to collect AWS CloudFront request-level metrics (count of requests per unique resource) in Prometheus.
I've seen how to use Logstash to forward the logs to ElasticSearch, and I thought of polling/querying ElasticSearch once a minute to get an aggregate, then exporting that result to Prometheus.
But it feels a little sloppy considering potential timing issues or missing/duplicate metric values.
I also saw a metrics filter for Logstash - so maybe I could create a meter for each unique url, then use the http output plugin to send the metrics to Prometheus.
One more thought -
I've never used CloudFront with CloudWatch. Maybe I could use the CloudWatch exporter for Prometheus, if it provides request counts at the resource level, or does it only give higher-level aggregates?
You can use cloudwatch_exporter to scrape CloudFront metrics from CloudWatch.
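Note that the built-in CloudFront metrics in CloudWatch (Requests, BytesDownloaded, and so on) are aggregated per distribution, not per URL/resource, so this gives request counts per distribution rather than per resource. A minimal cloudwatch_exporter config could look roughly like this; CloudFront metrics are reported in us-east-1 with DistributionId and Region dimensions, and the period value is just an example:

region: us-east-1
metrics:
  - aws_namespace: AWS/CloudFront
    aws_metric_name: Requests
    aws_dimensions: [DistributionId, Region]
    aws_statistics: [Sum]
    period_seconds: 60   # example period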
I am new to Cassandra and trying to set up a monitoring tool for a Cassandra production cluster. I have set up Graphite-Grafana on one of the Cassandra nodes and I am able to get metrics of that particular node in Grafana, but now I want to fetch metrics from all the Cassandra nodes and display them in Grafana.
Can anyone point me to the structure I should follow, or how to set up Graphite-Grafana for monitoring multiple nodes in production? What changes need to be made in the configuration files, etc.?
I think it is better for Graphite-Grafana to run on a separate machine or cluster.
You could send metrics from all your Cassandra nodes to that machine/cluster, and make sure the Cassandra node is identified in the metric key (for example, use the key cassandra.nodes.machine01.blahblahblah for a metric from machine01).
After that, you could use the Graphite API to fetch the metrics of all your Cassandra nodes from that Graphite machine/cluster.
I got my answer after trial and error. I edited metrics_reporter_graphite.yaml like below:
graphite:
  -
    period: 30
    timeunit: 'SECONDS'
    prefix: 'cassandra-clustername-node1'
    hosts:
      - host: 'localhost'
        port: 2003
    predicate:
      color: 'white'
      useQualifiedName: true
      patterns:
        - '^org.apache.cassandra.+'
        - '^jvm.+'
Replace localhost with the IP address of your Graphite-Grafana server/VM.
I have managed to collect JMX metrics data from a Java application and save it in a Cassandra database (my project leader told me to do so).
I know it is possible to collect metrics directly from JMX endpoints with JmxTrans and visualize them in Grafana/Graphite.
My question is: can I read the JMX metrics data back from Cassandra and visualize it in Grafana?
Grafana requires something else (i.e. Graphite, InfluxDB, Cyanite) to store the data. So to answer your question (what I think you're asking, at least) of whether Grafana can pull the metrics from JMX itself: no, it cannot.
That said, you can make the collection easier and faster. JMX isn't a very efficient medium. Instead, just have Cassandra send metrics directly to your Graphite (or whatever reporter) instances using its pluggable metrics reporter. See http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2 for details. The steps in the blog post are as follows:
Grab your favorite reporter jar (such as metrics-graphite) and add it to the server’s lib
Create a configuration file for the reporters following the sample.
Start the server with -Dcassandra.metricsReporterConfigFile=yourCoolFile.yaml by adding it to JVM_OPTS in cassandra-env.sh
Example config:
graphite:
  -
    period: 60
    timeunit: 'SECONDS'
    hosts:
      - host: 'graphite-server.domain.local'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.Cache.+"
        - "^org.apache.cassandra.metrics.ClientRequest.+"
        - "^org.apache.cassandra.metrics.Storage.+"
        - "^org.apache.cassandra.metrics.ThreadPools.+"
The question is old, but if you were to do it nowadays, I'd recommend using Prometheus as the data source for Grafana, along with its JMX exporter agent running on Cassandra.
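A rough sketch of that setup, assuming the standard prometheus jmx_exporter Java agent (the jar path, port, rule pattern, and node addresses below are illustrative, not a verified Cassandra config):

# jmx_exporter config, e.g. /etc/cassandra/jmx_exporter.yaml, attached in cassandra-env.sh with:
#   JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/etc/cassandra/jmx_exporter.yaml"
lowercaseOutputName: true
rules:
  - pattern: 'org.apache.cassandra.metrics<type=(\w+), name=(\w+)><>Value'
    name: cassandra_$1_$2

# Prometheus scrape config pointing at the agent port on each Cassandra node
scrape_configs:
  - job_name: 'cassandra'
    static_configs:
      - targets: ['cassandra-node1:7070', 'cassandra-node2:7070']   # placeholder node addresses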
It seems that you want to use Cassandra as the data store for the JMX metrics that you are collecting from other services; Grafana doesn't have that support yet (the available datastores are listed here).