Process log for logstash pipelines - logstash

My goal is to create a process log for any process running in the ELK stack.
Initially, the processes to be logged are Logstash pipelines and various Java processes.
A sample index in Elasticsearch for the process logs could look something like this:
StartTime | EndTime | ProcessName | Outcome
Question: Since Logstash pipelines add each "event" to the configured output index,
such as
output {
  elasticsearch {
    index => "my-index"
    hosts => "localhost:8080"
  }
}
is there actually a way to create a "started" log entry before the pipeline starts and an "ended" entry after the pipeline finishes or fails?
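To make the record shape concrete, here is a minimal Python sketch of the StartTime | EndTime | ProcessName | Outcome entry for a process that can wrap its own work (for example, one of the non-Logstash processes). It does not show how a Logstash pipeline itself would emit such entries. The `process_log` helper, the "running", "success" and "failure" values, and the in-memory `records` list (standing in for an Elasticsearch index) are all assumptions made for illustration.

```python
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def process_log(process_name, sink):
    """Append one record with StartTime, EndTime, ProcessName and Outcome to sink."""
    record = {
        "StartTime": datetime.now(timezone.utc).isoformat(),
        "EndTime": None,
        "ProcessName": process_name,
        "Outcome": "running",  # hypothetical placeholder until the work ends
    }
    try:
        yield record
        record["Outcome"] = "success"
    except Exception:
        record["Outcome"] = "failure"
        raise
    finally:
        # Runs on both success and failure, so EndTime is always filled in.
        record["EndTime"] = datetime.now(timezone.utc).isoformat()
        sink.append(record)


records = []  # stand-in for the process-log index
with process_log("nightly-import", records):
    pass  # the wrapped work would run here
```

The record is written once, at the end, so a process that is killed hard would leave no entry; writing a "started" document first and updating it later would be a different design.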

Related

How to identify filebeat source in logstash logs

I have multiple filebeat services running on different applications, which send the logs to a central logstash server for parsing.
Sometimes a few application logs are not in the correct format, so a 'parse error' shows up in the 'logstash-plain.log' file. The problem is that I cannot tell from logstash-plain.log where those logs are coming from (there are a huge number of applications running Filebeat).
Is there a way to trace the filebeat source from logstash logs?
You can use Filebeat processors to add tags:
processors:
  - add_tags:
      tags: [my_app_1]
      target: "application_tags"
and then use a different filter plugin configuration in Logstash for each tag to parse the logs properly.
filter {
  if "my_app_1" in [application_tags] {
    grok {
      ....
    }
  }
}
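The conditional above amounts to a lookup from tag to parser. A minimal Python sketch of that idea follows; it is only an illustration, not how Logstash evaluates conditionals internally. The `pick_parser` function and the parser description string are hypothetical; only the `application_tags` field and the `my_app_1` tag come from the configuration above.

```python
def pick_parser(event, parsers, tag_field="application_tags"):
    """Return the parser registered for the first matching tag on the event, else None."""
    for tag in event.get(tag_field, []):
        if tag in parsers:
            return parsers[tag]
    return None


# Hypothetical registry: one parser description per Filebeat tag.
parsers = {"my_app_1": "grok pattern for my_app_1"}
event = {"message": "raw line", "application_tags": ["my_app_1"]}
chosen = pick_parser(event, parsers)
```

An event without any known tag returns `None`, which is the case worth logging when trying to find out which application sent a badly formatted line.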

Pipeline Transform Logging on Apache Beam in Dataproc

Recently, I deployed a very simple Apache Beam pipeline to get some insight into how it behaves when executing in Dataproc as opposed to on my local machine. After running it, I quickly realized that none of the DoFn or transform-level logging appeared in the job logs in the Google Cloud Console as I would have expected, and I'm not entirely sure what might be missing.
All of the high level logging messages are emitted as expected:
// This works
log.info("Testing logging operations...")
pipeline
    .apply(Create.of(...))
    .apply(ParDo.of(LoggingDoFn))
The LoggingDoFn class here is a very basic transform that emits each of the values that it encounters as seen below:
object LoggingDoFn : DoFn<String, ...>() {
    private val log = LoggerFactory.getLogger(LoggingDoFn::class.java)

    @ProcessElement
    fun processElement(c: ProcessContext) {
        // This is never emitted within the logs
        log.info("Attempting to parse ${c.element()}")
    }
}
As detailed in the comments, I can see logging messages outside of the processElement() calls (presumably because those are being executed by the Spark runner), but is there a way to easily expose the messages from inside the transform as well? When viewing the logs for this job, the higher-level logging is present, but there is no mention of an "Attempting to parse ..." message from the DoFn.
The job itself is being executed by the following gcloud command, which has the driver log levels explicitly defined, but perhaps there's another level of logging or configuration that needs to be added:
gcloud dataproc jobs submit spark --jar=gs://some_bucket/deployment/example.jar --project example-project --cluster example-cluster --region us-example --driver-log-levels com.example=DEBUG -- --runner=SparkRunner --output=gs://some_bucket/deployment/out
To summarize, log messages are not being emitted to the Google Cloud Console for tasks that would generally be assigned to the Spark runner itself (e.g. processElement()). I'm unsure if it's a configuration-related issue or something else entirely.

How to setup ELK with node.js

I want to log errors from my Node.js server to another server. I'm thinking of using Elasticsearch, Logstash and Kibana. I want to know how to set up ELK with my Node server.
I had exactly this use case at my previous organization. A basic tutorial to get started with Beats + ELK: https://www.elastic.co/guide/en/beats/libbeat/current/getting-started.html
Basically, this is how it works: your Node.js app writes its logs to files (you can use bunyan for this) at different levels such as error, warning and info. Filebeat tails these log files and sends the messages to the Logstash server. The Logstash input.conf has some input filters (in your case, error filters). When a log message passes these filters, Logstash forwards it to the endpoint defined in the output.conf file.
Here is what we did:
Initial architecture: install the Filebeat client (earlier we used logstash-forwarder) on the Node.js server to tail the logs and forward them to the Logstash machine. Logstash does some processing on the incoming logs and sends them to the ES cluster (which can be on the same machine as Logstash). Kibana is just a visualization layer on top of this ES.
Final architecture: the initial setup was fine for small traffic, but we realized that Logstash can be a single point of failure and may lose logs when traffic increases. So we integrated Kafka with Logstash so that the system scales smoothly. Here is an article: https://www.elastic.co/blog/logstash-kafka-intro
Hope this helps!
It is possible to use Logstash without running agents to collect logs from the application.
Logstash has input plugins (https://www.elastic.co/guide/en/logstash/current/input-plugins.html), which are configured in the pipeline. One basic setup is to configure the TCP (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-tcp.html) or UDP (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-udp.html) input plugin. Logstash then listens on the port configured in the plugin.
The application can then send its logs directly to the Logstash server, and the pipeline can transform them and forward them to ES.
Configuring the Logstash pipeline to be durable helps avoid data loss. This approach is better when the application servers are ephemeral (as in containers).
For Node.js, https://www.npmjs.com/package/winston-logstash is a package which is quite active. This gist https://gist.github.com/jgoodall/6323951 provides a good example of the overall approach in other languages.
This is a sample (minimal) TCP input plugin configuration:
input {
  tcp {
    port => 9563
  }
}
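On the application side, sending a log entry to such a TCP input comes down to opening a socket and writing one line. The following Python sketch shows the idea under a few assumptions: the `send_log` function and the `level` and `message` field names are hypothetical, and the entry is sent as one JSON object per line, which presumes the tcp input is given a JSON-aware codec such as `json_lines`. With the minimal configuration above, each line would presumably arrive as raw message text instead.

```python
import json
import socket


def send_log(host, port, level, message, timeout=5.0):
    """Send one log entry to a TCP listener as a single newline-terminated JSON line."""
    payload = json.dumps({"level": level, "message": message}) + "\n"
    # Open a short-lived connection per entry; a real client would likely reuse it.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload.encode("utf-8"))
    return payload
```

A call such as `send_log("localhost", 9563, "error", "something failed")` would target the port from the sample configuration. It raises an `OSError` if nothing is listening, so the caller has to decide whether to retry, buffer, or drop the entry.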
You can install Logstash on the Node.js server and then create a configuration file that takes the location of the log file(s) as input and outputs to your Elasticsearch host.
Below is a sample configuration file (custom.conf), which has to be created in your Logstash directory.
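input {
  file {
    path => "/path to log"
    start_position => "beginning"
  }
}
output {
  stdout { }
  # The elasticsearch options below (embedded, host, port, cluster, node_name)
  # are from Logstash 1.x; later releases use the hosts option instead.
  elasticsearch {
    type => "stdin-type"
    embedded => false
    host => "192.168.0.23"
    port => "9300"
    cluster => "logstash-cluster"
    node_name => "logstash"
  }
}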
Run Logstash:
logstash -f custom.conf
Reference: https://www.elastic.co/guide/en/logstash/current/config-examples.html
If you are planning to customize a Node.js application to send error logs, you can install an ELK stack Node.js wrapper and post error logs from within your application. ELK stack Node.js wrapper: https://www.npmjs.com/package/elksdk

logstash monitor spark-on-yarn application logs

I installed Logstash on my Spark node.
The problem is that it can't collect all the logs from the dynamically created container folders. It does collect part of the logs, but only from one of the containers. Here is the input part of my Logstash config file:
input {
  file {
    path => "/home/userXXX/hadoop-2.6.0/logs/userlogs/application_*/container_*/*"
  }
}
So the question is: does the path with * cover all the container folders, including those that will be created at runtime?
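The question hinges on when the wildcard is expanded. A glob only matches the folders that exist at the moment it is evaluated, so folders created later are found only if the pattern is expanded again. The Python sketch below demonstrates that behaviour with the standard `glob` module in a temporary directory. It is not Logstash's implementation; the `discover` and `make_log` helpers and the `stderr` file name are hypothetical. For comparison, the Logstash file input documents a `discover_interval` setting for re-expanding `path` patterns, though whether that explains the missing containers here is not established by the question.

```python
import glob
import os
import tempfile


def discover(pattern, already_seen):
    """Expand the glob again and return files not seen on earlier passes."""
    current = set(glob.glob(pattern))
    new_files = sorted(current - already_seen)
    already_seen |= current  # remember everything found so far
    return new_files


def make_log(root, app, container):
    """Create an empty log file under a hypothetical application/container folder."""
    folder = os.path.join(root, app, container)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "stderr")
    open(path, "w").close()
    return path


root = tempfile.mkdtemp()
pattern = os.path.join(root, "application_*", "container_*", "*")
seen = set()

first = make_log(root, "application_1", "container_1")
first_pass = discover(pattern, seen)   # finds only container_1

second = make_log(root, "application_1", "container_2")
second_pass = discover(pattern, seen)  # a new expansion picks up container_2
```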

logstash zabbix metric without tags in log

We have the following setup.
1 central logstash server (behind that we have an elasticsearch cluster based on two nodes)
1 central zabbix server
10 Servers with logstash-forwarder
On our Logstash server we receive syslogs and Apache/nginx access and error logs from the 10 servers mentioned above through logstash-forwarder.
Since we want to see the amount of error logs per server per minute in a nice graph in our zabbix system we are using the metrics plugin (http://logstash.net/docs/1.4.2/contrib-plugins)
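The quantity being graphed is a count of error events bucketed by server and minute. The following Python sketch shows that aggregation on its own, independent of the metrics plugin or Zabbix. The `errors_per_minute` function, the `host` and `timestamp` field names, and the sample events are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def errors_per_minute(events):
    """Count events per (host, minute) bucket."""
    counts = Counter()
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Truncate to the start of the minute so events in the same minute share a key.
        minute = ts.replace(second=0, microsecond=0)
        counts[(event["host"], minute.isoformat())] += 1
    return counts


# Hypothetical error events from two servers.
events = [
    {"host": "web01", "timestamp": "2015-01-01T10:00:05"},
    {"host": "web01", "timestamp": "2015-01-01T10:00:40"},
    {"host": "web02", "timestamp": "2015-01-01T10:01:10"},
]
counts = errors_per_minute(events)
```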
Here is the PROBLEM:
We are currently not able to get the logs with the correct tags from the plugin in order to send them to Zabbix.
For the logstash-forwarder config and the Logstash server config, see this link:
https://db.tt/4cn8DWi2
If anyone has an idea how we can get rid of this problem, we would be very thankful.
It looks like you are mixing up the top-level sections in your config files; check the Logstash config language.
Each file should look something like this:
# section input
input {
  # multiple plugin definitions *within* the input section
  file {}
  file {}
}
# section filter
filter {
  grok {}
  grok {}
  grok {}
}
# section output
output {
  elasticsearch {}
  stdout {}
}
Your config looks like this:
input {
  file {}
}
input {
  file {}
}
output {
  elasticsearch {}
}
output {
  stdout {}
}
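To compare a config file against the single-section layout recommended above, one could count how often each top-level section appears. The Python sketch below does this with a deliberately naive scan: the `count_top_level_sections` function is hypothetical, it assumes each section opens on its own line, and it ignores braces or `#` characters inside quoted strings. It only reports the counts and does not verify that repeated sections are the cause of the tagging problem.

```python
import re
from collections import Counter


def count_top_level_sections(config_text):
    """Count input/filter/output blocks that open at brace depth zero."""
    counts = Counter()
    depth = 0
    for line in config_text.splitlines():
        # Drop comments (naively: a '#' inside a quoted string is also cut).
        stripped = line.split("#", 1)[0].strip()
        match = re.match(r"(input|filter|output)\s*\{", stripped)
        if depth == 0 and match:
            counts[match.group(1)] += 1
        depth += stripped.count("{") - stripped.count("}")
    return counts


sample = """
input {
  file {}
}
input {
  file {}
}
output {
  elasticsearch {}
}
output {
  stdout {}
}
"""
section_counts = count_top_level_sections(sample)
```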

Resources