How to send dynamic metrics from MQTT to Prometheus?

I am trying to send dynamic metrics from IoT devices in Sparkplug format over MQTT to Prometheus. My current setup looks something like this:
MQTT -> MQTT2Prometheus -> Prometheus -> Grafana
This setup only works for metrics that are pre-configured in the MQTT2Prometheus config, and I cannot find a way to send metrics dynamically without knowing the metric name beforehand.
Are there any other libraries that can do this, or is there a way to make MQTT2Prometheus work with dynamic metrics?
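For reference, this is roughly the kind of static mapping MQTT2Prometheus expects in its config (field names taken from the project's example config; verify against the version you run). Every metric has to be listed by name up front, which is exactly the limitation described above:

mqtt:
  server: tcp://127.0.0.1:1883
  topic_path: devices/+            # one topic pattern per config
metrics:
  - prom_name: temperature         # the Prometheus metric name must be known in advance
    mqtt_name: temperature         # the key looked up in the MQTT payload
    help: "Device temperature reading"
    type: gauge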

Related

How to get custom metrics in Elastic APM?

I have set up application performance monitoring in my Node.js application with Elastic. I can see my metrics coming through for CPU / memory etc., but can't seem to get things working for custom metrics.
I am following their docs on adding a metric for tracking the clients connected to our WebSocket server, but can't see the metric in the Elastic dashboard anywhere.
I have the following code:
apm.registerMetric('ws.connections', () => socketClients.length);
Under APM -> Services -> Metrics I am searching for ws.connections but no results are showing up.
I have released the code and know everything is working as expected for APM, but it seems I just can't track down the custom metrics.
Any advice on where to find this, or what I can do to try to debug it?
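For comparison, a minimal setup that registers the metric right after starting the agent might look like this (the service name, server URL and WebSocket client list below are placeholders):

// elastic-apm-node has to be started before anything else is required
const apm = require('elastic-apm-node').start({
  serviceName: 'ws-server',            // placeholder
  serverUrl: 'http://localhost:8200',  // placeholder APM Server URL
  metricsInterval: '30s'               // custom metrics are sampled on this interval
})

const socketClients = []               // stands in for your WebSocket client list

// the callback is evaluated every metricsInterval and shipped with the agent's metrics
apm.registerMetric('ws.connections', () => socketClients.length)

Custom metrics registered this way are shipped alongside the agent's other metricsets, so if they don't show up on the built-in Services -> Metrics tab it may be worth searching the APM metric indices directly (for example in Discover) to confirm they are being received at all.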

Prometheus scraping only some metrics

In my cluster I have an nginx ingress, a Node.js server and Prometheus. Prometheus is scraping all metrics from nginx with no problem, but it seems to be omitting some metrics from my Node.js server.
# HELP nodejs_version_info Node.js version info.
# TYPE nodejs_version_info gauge
nodejs_version_info{version="v16.15.0",major="16",minor="15",patch="0"} 1
This metric is indeed scraped by Prometheus because it has nodejs_ in its name. However, I also have some metrics that look like this:
# HELP http_request_duration_seconds duration histogram of http responses labeled with: status_code, method, path
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.003",status_code="200",method="GET",path="/"} 0
Metrics without nodejs in the name do not appear in the dashboard.
I should mention that I am using https://www.npmjs.com/package/express-prom-bundle for the response-time metric. Does anybody know how to fix that?
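For reference, a minimal express-prom-bundle setup looks roughly like this (option names from the package docs; the port is a placeholder). If the histogram is visible under /metrics but missing from the dashboard, it can help to query http_request_duration_seconds_bucket directly in the Prometheus UI to rule out relabelling or dashboard filters:

const express = require('express')
const promBundle = require('express-prom-bundle')

const app = express()

// register the metrics middleware before any routes so every request is timed;
// includeMethod/includePath/includeStatusCode add the labels shown above
app.use(promBundle({
  includeMethod: true,
  includePath: true,
  includeStatusCode: true
}))

app.get('/', (req, res) => res.send('ok'))

// the bundle exposes /metrics on this same app by default;
// make sure Prometheus scrapes this port and path
app.listen(3000)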

Read from multiple sources using Nifi, group topics in Kafka and subscribe with Spark

We use Apache NiFi to get data from multiple sources such as Twitter and Reddit at a fixed interval (for example 30s). We would then like to send it to Apache Kafka, and it should probably group both the Twitter and Reddit messages into one topic so that Spark always receives the data from both sources for a given interval at once.
Is there any way to do that?
#Sebastian What you describe is basic NiFi routing. You would just route both Twitter and Reddit to the same downstream Kafka producer and the same topic. After you get data into NiFi from each service, you should run it through UpdateAttribute and set the attribute topicName to whatever you want for each source. If there are additional steps per data source, do them after UpdateAttribute and before PublishKafka.
If you build all the upstream routes as above, you can route all the different data sources to a single PublishKafka processor and reference ${topicName} in its Topic Name property so the topic is resolved dynamically. Spark then only has to subscribe to that one topic, as in the sketch below.
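A minimal PySpark Structured Streaming sketch of the consuming side (the broker address and the topic name social_feed are placeholders for whatever ${topicName} resolves to in the NiFi flow):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nifi-kafka-consumer").getOrCreate()

# read the single merged topic, so each micro-batch can contain both
# Twitter and Reddit records published by NiFi
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
      .option("subscribe", "social_feed")                  # placeholder topic
      .load())

# Kafka key/value columns are binary; cast value to string before parsing
messages = df.selectExpr("CAST(value AS STRING) AS payload")

query = (messages.writeStream
         .format("console")
         .trigger(processingTime="30 seconds")   # matches the 30s ingest interval
         .start())
query.awaitTermination()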

Use ELK Stack to visualise metrics of Telegraf or StatsD

I am aggregating my logs using the ELK stack. Now I would like to show metrics and create alerts with it too, e.g. current CPU usage, number of requests handled, number of DB queries, etc.
I can collect the metrics using Telegraf or StatsD, but how do I plug them into Logstash? There is no Logstash input for either of these two.
Does this approach even make sense, or should I aggregate time-series data in a different system? I would like to have everything under one hood.
I can give you some insight on how to accomplish this with Telegraf:
Option 1: Telegraf output TCP into Logstash. This is what I do personally, because I like to have all of my data go through Logstash for tagging and mutations.
Telegraf output config:
[[outputs.socket_writer]]
  ## URL to connect to
  address = "tcp://$LOGSTASH_IP:8094"
Logstash input config:
input {
  tcp {
    port => 8094
  }
}
Option 2: Telegraf directly to Elasticsearch. The docs for this are good and should tell you what to do!
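For option 2, the Telegraf side is roughly the following (option names from Telegraf's elasticsearch output plugin docs; the URL and index pattern are placeholders to verify against your Telegraf version):

[[outputs.elasticsearch]]
  ## placeholder Elasticsearch endpoint
  urls = [ "http://localhost:9200" ]
  ## daily index, e.g. telegraf-2024.01.31
  index_name = "telegraf-%Y.%m.%d"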
From an ideological perspective, inserting metrics into the ELK stack may or may not be the right thing to do - it depends on your use case. I switched to using Telegraf/InfluxDB because I had a lot of metrics and my consumers preferred the Influx query syntax for time-series data and some other Influx features such as rollups.
But there is something to be said about reducing complexity by having all of your data "under one hood". Elastic is also making the push toward being more suitable for time-series data with Timelion and there were a few talks at Elasticon concerning storing time-series data in Elasticsearch. Here's one. I would say that storing your metrics in ELK is a completely reasonable thing to do. :)
Let me know if this helps.
Here are various options for storing metrics from StatsD in Elasticsearch:
Using the statsd module of Metricbeat. The metrics can be sent to Metricbeat in StatsD format; Metricbeat then ships them to Elasticsearch.
Example of metricbeat configuration:
metricbeat.modules:
- module: statsd
  host: "localhost"
  port: 8125
  enabled: true
  period: 2s
Elasticsearch as a StatsD backend. The following project allows you to save metrics from StatsD to Elasticsearch:
https://github.com/markkimsal/statsd-elasticsearch-backend
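A rough sketch of a StatsD config.js that loads that backend; the backends array is standard StatsD configuration, but the keys under elasticsearch are assumptions, so check the project's README for the exact names:

{
  port: 8125,
  backends: [ "statsd-elasticsearch-backend" ],
  elasticsearch: {
    host: "localhost",   // assumption
    port: 9200           // assumption
  }
}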

Pyspark Streaming - How to set up custom logging?

I have a PySpark streaming application that runs on YARN in a Hadoop cluster. The streaming application reads from a Kafka queue every n seconds and makes a REST call.
I have a logging service in place to provide an easy way to collect and store data, send data to Logstash and visualize data in Kibana. The data needs to conform to a template (JSON with specific keys) provided by this service.
I want to send logs from the streaming application to Logstash using this service. For this, I need to do three things:
- Collect some data while the streaming app is reading from Kafka and making the REST call.
- Format it according to the logging service template.
- Forward the log to the Logstash host.
Any guidance related to this would be very helpful.
Thanks!
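One way to approach the second and third steps is a custom logging handler that serializes each record as JSON and writes it to a Logstash TCP input (json_lines codec). This is only a rough sketch, not the logging service's actual API; the template keys, host and port below are placeholders:

import json
import logging
import socket
from datetime import datetime, timezone

class LogstashTcpHandler(logging.Handler):
    """Send each log record as one JSON line to a Logstash tcp input."""
    def __init__(self, host, port):
        super().__init__()
        self.host, self.port = host, port

    def emit(self, record):
        # the keys below stand in for the template required by the logging service
        doc = {
            "app": "pyspark-streaming-job",
            "level": record.levelname,
            "message": record.getMessage(),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        try:
            with socket.create_connection((self.host, self.port), timeout=5) as sock:
                sock.sendall((json.dumps(doc) + "\n").encode("utf-8"))
        except OSError:
            self.handleError(record)

logger = logging.getLogger("streaming")
logger.setLevel(logging.INFO)
logger.addHandler(LogstashTcpHandler("logstash-host", 5000))  # placeholder host/port

# log from driver-side code, e.g. after each micro-batch's REST call
logger.info("processed batch and sent REST call")

Note that code inside RDD/DataFrame transformations runs on the executors, so a handler configured on the driver will not capture those records; logging from driver-side code such as a foreachBatch callback keeps things simple.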
