Use ELK Stack to visualise metrics of Telegraf or StatsD - logstash

I am aggregating my logs using the ELK stack. Now I would like to show metrics and create alerts with it too, such as current CPU usage, number of requests handled, number of DB queries, etc.
I can collect the metrics using Telegraf or StatsD, but how do I plug them into Logstash? There is no Logstash input for either of these two.
Does this approach even make sense, or should I aggregate time-series data in a different system? I would like to have everything under one hood.

I can give you some insight on how to accomplish this with Telegraf:
Option 1: Telegraf output TCP into Logstash. This is what I do personally, because I like to have all of my data go through Logstash for tagging and mutations.
Telegraf output config:
[[outputs.socket_writer]]
  ## URL to connect to
  address = "tcp://$LOGSTASH_IP:8094"
Logstash input config:
tcp {
  port => 8094
}
Option 2: Telegraf directly to Elasticsearch. The docs for this are good and should tell you what to do!
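If it helps, a minimal sketch of that output (the URL and index pattern are placeholders; see the outputs.elasticsearch docs for the full option list):
[[outputs.elasticsearch]]
  ## Elasticsearch URL to write to (placeholder)
  urls = ["http://$ELASTICSEARCH_IP:9200"]
  ## daily index pattern, e.g. telegraf-2023.01.01
  index_name = "telegraf-%Y.%m.%d"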
From an ideological perspective, inserting metrics into the ELK stack may or may not be the right thing to do - it depends on your use case. I switched to using Telegraf/InfluxDB because I had a lot of metrics and my consumers preferred the Influx query syntax for time-series data and some other Influx features such as rollups.
But there is something to be said about reducing complexity by having all of your data "under one hood". Elastic is also making the push toward being more suitable for time-series data with Timelion and there were a few talks at Elasticon concerning storing time-series data in Elasticsearch. Here's one. I would say that storing your metrics in ELK is a completely reasonable thing to do. :)
Let me know if this helps.

Here are various options for getting metrics from StatsD into ES:
Using the statsd module of Metricbeat. The metrics can be sent to Metricbeat in StatsD format; Metricbeat then forwards them to ES.
Example of metricbeat configuration:
metricbeat.modules:
- module: statsd
  host: "localhost"
  port: 8125
  enabled: true
  period: 2s
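To verify that metrics arrive, you can push a test counter in StatsD's plain-text format over UDP (the metric name here is made up):
  # send a single counter increment to the Metricbeat statsd listener
  echo "myapp.requests.handled:1|c" | nc -u -w1 localhost 8125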
ElasticSearch as StatsD backend. The following project allows saving metrics from StatsD to ES:
https://github.com/markkimsal/statsd-elasticsearch-backend
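A minimal sketch of the StatsD config.js for this, assuming the backend was installed via npm under that name (the backend-specific options such as ES host, port, and index are documented in the project's README):
{
  port: 8125,
  backends: [ "statsd-elasticsearch-backend" ]
  // Elasticsearch-specific settings (host, port, index, ...) go in this
  // same file; see the project's README for the exact keys.
}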

Related

Query a Kafka Topic with nodejs

I'm a bit puzzled. Is there really no NodeJS API to query Kafka topics (e.g. as with Kafka Streams and the Java API)? Am I missing something?
Just to get this straight: only being notified of the latest event/record of a topic is not enough for me. I want to query and process the topics' records, and then maybe store an aggregate to another topic.
Thanks for your thoughts on whether this is possible with Node.js and a library only.
Here is what worked for me and what most people use.
Limited solution
If you are stubborn and want to insist on a node.js library to wrap things up for you: https://nodefluent.github.io/kafka-streams/docs/
As of today they offer:
easy access streams
merge streams
split streams
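For a rough idea of the API, a sketch based on the project's docs (the broker address, group id, and topic name are placeholders, and the config keys may differ between library versions):
const { KafkaStreams } = require("kafka-streams");

// connection config; keys depend on the library version (placeholders here)
const kafkaStreams = new KafkaStreams({
  noptions: {
    "metadata.broker.list": "localhost:9092",
    "group.id": "example-group"
  }
});

// consume a topic as a stream, log each record, then start consuming
const stream = kafkaStreams.getKStream("input-topic");
stream.forEach(record => console.log(record));
stream.start();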
Full-blown solution
The easiest way (as of today; there are rumors that Confluent is creating more libraries, including one for Node.js) to query Kafka is through the REST API. It is part of ksqlDB, and ksqlDB is part of the Confluent Platform, which also ships with ZooKeeper and a Kafka instance, which you probably already have. If you wonder how to install it:
It spins up in a minute with the Docker Compose yml file.
Run docker-compose up -d
See the ports and services running with docker ps
Start requesting the status from the REST API by simply sending a GET request to http://0.0.0.0:8088/. It will return service information.
{
  "KsqlServerInfo": {
    "version": "6.2.0",
    "kafkaClusterId": "uOXfee3zQ76vCKBbREw1yg",
    "ksqlServiceId": "default_",
    "serverStatus": "RUNNING"
  }
}
Hope this spares some of you the initial research. And, if we are lucky, there will be a wrapper library soon.
Then create a stream out of your topic and voilà: you are ready to query your topic (through the stream) with the REST API. Since the REST API supports HTTP/2, you can also get continuous updates on freshly arriving records in the stream; apply push queries for this. Pull queries cut the line after the result has been delivered.
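To illustrate, a rough sketch of those two REST calls (the topic, stream name, and columns are made up for the example):
# register a stream over an existing topic
curl -X POST http://0.0.0.0:8088/ksql \
  -H "Content-Type: application/vnd.ksql.v1+json" \
  -d "{\"ksql\": \"CREATE STREAM my_stream (id VARCHAR, amount DOUBLE) WITH (KAFKA_TOPIC='my_topic', VALUE_FORMAT='JSON');\"}"

# push query: the connection stays open and streams new records as they arrive
curl -X POST http://0.0.0.0:8088/query \
  -H "Content-Type: application/vnd.ksql.v1+json" \
  -d "{\"ksql\": \"SELECT * FROM my_stream EMIT CHANGES;\", \"streamsProperties\": {}}"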

Cloudfront Logs into Prometheus

I want to collect AWS CloudFront request-level metrics [count of requests by unique resource] into Prometheus.
I've seen how to use Logstash to forward the logs to ElasticSearch, and I thought of polling/querying ElasticSearch once a minute to get an aggregate, then exporting that result to Prometheus.
But it feels a little sloppy considering potential timing issues or missing/duplicate metric values.
I also saw a metrics filter for Logstash - so maybe I could create a meter for each unique url, then use the http output plugin to send the metrics to Prometheus.
One more thought -
I've never used CloudFront with CloudWatch. Maybe I could use the CloudWatch exporter for Prometheus, if it provides request counts at the resource level, or does it only give higher-level aggregates?
You can use cloudwatch_exporter to scrape CloudFront metrics from CloudWatch.
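A minimal cloudwatch_exporter config for that might look like the sketch below. Note that CloudFront's CloudWatch metrics are reported in us-east-1 and are aggregated per distribution, so this gives per-distribution request counts rather than per-URL counts:
region: us-east-1
metrics:
- aws_namespace: AWS/CloudFront
  aws_metric_name: Requests
  aws_dimensions: [DistributionId, Region]
  aws_statistics: [Sum]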

Jmx Monitoring: Possible to collect and visualize Jmx/Mbeans Data saved on Cassandra?

I have managed to collect JMX MetricsData from a Java Application and saved it on a Cassandra database (My project leader said to do so).
I know that it is possible to collect with JmxTrans directly from JMX- Endpoints and visualize it within Grafana/Graphite.
My Question is: can I collect the JMX metrics data from cassandra and visualize it in Grafana?
Grafana requires something else (i.e. Graphite, InfluxDB, Cyanite) to store the data. So to answer your question (what I think you're asking, at least) of whether Grafana can pull the metrics from JMX itself, the answer is "No".
That said, you can make the collection easier and faster. JMX isn't a very efficient medium. Instead, just have Cassandra send metrics directly to your Graphite (or whatever reporter) instances using its graphite reporter. See http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2 for details. The steps in the blog post are as follows:
Grab your favorite reporter jar (such as metrics-graphite) and add it to the server’s lib
Create a configuration file for the reporters following the sample.
Start the server with -Dcassandra.metricsReporterConfigFile=yourCoolFile.yaml by adding it to JVM_OPTS in cassandra-env.sh
Example config:
graphite:
  -
    period: 60
    timeunit: 'SECONDS'
    hosts:
      - host: 'graphite-server.domain.local'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.Cache.+"
        - "^org.apache.cassandra.metrics.ClientRequest.+"
        - "^org.apache.cassandra.metrics.Storage.+"
        - "^org.apache.cassandra.metrics.ThreadPools.+"
The question is old, but if you were to do it nowadays, I'd recommend using Prometheus as the data source for Grafana, along with its JMX exporter agent on Cassandra.
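In practice that means attaching the jmx_exporter Java agent to the Cassandra JVM; a rough sketch, with the jar path, port, and config file name as placeholders:
# in cassandra-env.sh: expose JMX MBeans as Prometheus metrics on port 7070
JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/opt/cassandra-jmx.yml"

# /opt/cassandra-jmx.yml: minimal config that exposes all MBeans
rules:
- pattern: ".*"
Prometheus then scrapes the agent's /metrics endpoint, and Grafana reads from Prometheus.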
It seems that you want to use Cassandra as the data store for the JMX metrics that you are collecting from other services; Grafana doesn't have that support yet (the available datastores are listed here).

logstash with redis sentinel is possible?

I want to build a high-availability ELK monitoring system with Redis, but I am a little confused about how to make Redis HA.
Redis Sentinel provides high availability for Redis.
But I do not find any configuration for this in the documentation: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-redis.html
So can I use it with Logstash as input and output? Does anyone have experience with this?
The Logstash Redis input plugin supports only one host in its host option.
I think you have 2 ways to get HA:
1) Continue using Redis. You can create a DNS A record (or edit your hosts file) that points to multiple Redis servers, then put that record in the host option.
2) Moving from Redis to Kafka:
https://www.elastic.co/blog/logstash-kafka-intro
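For option 2, a rough sketch of the Logstash Kafka plugins (broker addresses and topic name are placeholders, and option names vary slightly between plugin versions):
# shipper side: write events to a Kafka topic
output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topic_id => "logs"
  }
}

# indexer side: read the same topic and index it
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["logs"]
  }
}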

Suggestion on Logstash Setup

I've implemented Logstash (in testing) with the architecture described below.
Component Breakdown
Rsyslog client: By default, syslog is installed in all Linux distros; we just need to configure rsyslog to send logs to the remote server.
Logstash: Logstash will receive logs from the syslog client and store them in Redis.
Redis: Redis will work as the broker; the broker holds log data sent by agents before Logstash indexes it. Having a broker enhances the performance of the Logstash server: Redis acts like a buffer for log data until Logstash indexes and stores it. As it is in RAM, it is very fast.
Logstash: yes, two instances of Logstash; the 1st one acts as the syslog server, the 2nd reads data from Redis and sends it out to Elasticsearch.
Elasticsearch: The main objective of a central log server is to collect all logs in one place and to provide meaningful data for analysis; for example, you should be able to search all log data for a particular application over a specified time period. Hence there must be searching and proper indexing capability on our log server. To achieve this, we install another open-source tool called Elasticsearch. Elasticsearch builds an index and then searches that index to make queries faster; it is a kind of search engine for text data.
Kibana: Kibana is a user-friendly way to view, search, and visualize your log data.
But I'm a little bit confused about Redis: in this scenario I'll be running 3 Java processes plus one Redis process on the Logstash server, and this will take a huge amount of RAM.
Question
Can I use only one Logstash and Elasticsearch? Or what would be the best way?
I am actually in the midst of setting up logstash, redis, elasticsearch, kibana (aka ELK architecture) at my company.
I have the processes split between virtual machines. While you could put them on the same machine, what happens if a machine dies? Then you are left with your indexer and cluster down at the same time.
You also have the problem of not being able to properly replicate your shards in Elasticsearch. Since you only have one server, the shards won't be replicated and your cluster health will always be yellow. You need to add enough servers to avoid the split-brain scenario.
Why keep Redis?
Since Redis can talk to multiple Logstash indexers, one key point is that this makes the indexing transparent to your shippers: if one indexer goes down, the alternates will pick up the load. This makes your setup highly available.
It's not just a matter of shipping logs and having them indexed and searchable. While your setup will likely work in a very small, rare situation, the setups people are running with ELK span hundreds, even thousands, of servers, so the ELK architecture is meant to scale. All of these servers will also need to be remotely managed by something like Puppet.
Finally, if you have not read it yet, I suggest you read The Logstash Book by James Turnbull.
The following are some more recommended books that have helped me so far:
Pro Puppet, Second Edition
Elasticsearch Cookbook, Second Edition
Redis Cookbook
Redis in Action
Mastering Elasticsearch
ElasticSearch Server
Elasticsearch: The Definitive Guide
Puppet Types and Providers
Puppet 3 Cookbook
You can use only one Logstash and Elasticsearch if you put all the instances on one machine. Logstash can directly read the syslog file by using the file input plugin (see the sketch below).
Otherwise, you have to use two Logstash instances and Redis. This is because Logstash does not have any buffering mechanism, so it needs Redis as its broker to buffer the log events. Redis does not use much RAM: when Logstash reads a log event from it, the memory is released. If Redis does use a lot of RAM, you have to add Logstash workers to process the logs faster.
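A minimal file input for that single-instance case could look like this sketch (the log path is a placeholder):
input {
  file {
    # syslog file to tail (placeholder path)
    path => "/var/log/syslog"
    type => "syslog"
  }
}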
You should only be running one instance of Logstash. Logstash by design has the ability to have multiple input channels and output channels.
input {
  # input instances
  syslog {
    # add other settings accordingly
    type => "syslog"
  }
  redis {
    # add other settings accordingly
    type => "redis"
  }
}
filter {
  # add other settings accordingly
}
output {
  # output instances
  if [type] == "syslog" {
    redis {
      # add other settings accordingly
    }
  }
  else if [type] == "redis" {
    elasticsearch {
      # add other settings accordingly
    }
  }
}
