logstash monitor spark-on-yarn application logs - apache-spark

I installed Logstash on my Spark node.
The problem is that it does not collect all the logs from the dynamically created container folders. It only collects part of the logs, from a single container. Here is the input part of my Logstash config file:
input {
  file {
    path => "/home/userXXX/hadoop-2.6.0/logs/userlogs/application_*/container_*/*"
  }
}
So the question is: does the path with * cover all the container folders, including those that will be created at runtime?

Related

How to identify filebeat source in logstash logs

I have multiple Filebeat services running for different applications, which send their logs to a central Logstash server for parsing.
Sometimes a few application logs are not in the correct format, so there is a 'parse error' in the logstash-plain.log file. The problem I am having is that I cannot identify from logstash-plain.log where the logs are coming from (since there is a huge number of applications running Filebeat).
Is there a way to trace the filebeat source from logstash logs?
You can use Filebeat processors to add tags:
processors:
  - add_tags:
      tags: [my_app_1]
      target: "application_tags"
and then use different filter plugin configurations in Logstash to parse the logs properly:
filter {
  if "my_app_1" in [application_tags] {
    grok {
      ....
    }
  }
}
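As a minimal sketch of what that filter could contain (the grok pattern and field names below are purely illustrative, not taken from the question):
filter {
  if "my_app_1" in [application_tags] {
    grok {
      # hypothetical pattern for lines like "2023-01-01T12:00:00 ERROR something broke"
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
    }
  }
}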

IoTEdge sometimes re-creates the container

We're running IoT Edge modules. Inside our module, we update a bunch of files. We noticed that most of the time, if the host is restarted, the container is restarted and the files we updated still exist.
Occasionally, however, we noticed that when the host restarted, the container was re-created from the original image, so all data changes were lost.
Our understanding is that IoT Edge uses the Docker restart policy 'always', which should always keep the container's data.
I would suggest the following:
Do not store important data on the container's writable layer, i.e. do not rely on the restart policy.
The reason for the container being rebuilt could be that a new version of your module image was deployed, so the container was recreated from the new image.
Set up your module deployment manifest properly by using the module container createOptions: attach a local volume to the container (createOptions -> HostConfig -> Binds) and store your data there. This will survive any recreation of your module container, for example:
"createOptions": {
"HostConfig": {
"Binds": [
"/app/db:/app/db"
]
}
}
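As a hedged variant of the example above, Binds also accepts a named Docker volume instead of a host path (the volume name moduledata is just an illustration); Docker then manages where the data lives, and it still survives container recreation:
"createOptions": {
  "HostConfig": {
    "Binds": [
      "moduledata:/app/db"
    ]
  }
}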

How to setup ELK with node.js

I want to log errors from my Node.js server to another server. I'm thinking of using Elasticsearch, Logstash and Kibana. I want to know how to set up ELK with my Node server.
I had exactly this use case at my previous organization. A basic tutorial to get started with Beats + ELK: https://www.elastic.co/guide/en/beats/libbeat/current/getting-started.html
So basically this is how it works: your Node.js app writes to log files (you can use bunyan for this) at different levels like error/warning/info. Filebeat tails these log files and sends the messages to the Logstash server. The Logstash pipeline configuration will have some filters (in your case, error filters). When a log message passes these filters, Logstash forwards it to the endpoint defined in the output configuration.
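A minimal sketch of such a pipeline, assuming Filebeat ships to port 5044 and that error filtering is done by matching the raw message (the matching rule and the Elasticsearch address are assumptions, not part of the answer above):
input {
  beats {
    port => 5044
  }
}
filter {
  # drop everything that does not look like an error line (assumed rule)
  if "error" not in [message] {
    drop { }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}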
Here is what we did:
Initial architecture: install the Filebeat client (earlier we used logstash-forwarder) to tail the logs on the Node.js server and forward them to the Logstash machine. Logstash does some processing on the incoming logs and sends them to the ES cluster (which can be on the same machine as Logstash). Kibana is just a visualization layer on top of this ES.
Final architecture: the initial setup was fine for low traffic, but we realized that Logstash could be a single point of failure and could lose logs when traffic increased. So we integrated Kafka with Logstash so that the system scales smoothly. Here is an article: https://www.elastic.co/blog/logstash-kafka-intro
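A rough sketch of the Logstash side of that Kafka setup (the broker address and topic name are assumptions, and the option names differ in older plugin versions):
input {
  kafka {
    bootstrap_servers => "kafka-broker:9092"
    topics => ["app-logs"]
  }
}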
Hope this helps!
It is possible to use Logstash without any agent running on the application host to collect the logs.
Logstash has input plugins (https://www.elastic.co/guide/en/logstash/current/input-plugins.html), which are configured in the pipeline. One basic setup is to configure the TCP (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-tcp.html) or UDP (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-udp.html) input plugin; Logstash then listens on the port configured in the plugin.
The application can then send its logs directly to the Logstash server, and the pipeline can transform them and forward them to ES.
By configuring the Logstash pipeline to be durable, data loss can be avoided. This approach is better when the application servers are ephemeral (as in containers).
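The durability mentioned here refers to Logstash persistent queues, which are enabled in logstash.yml; a minimal sketch (the size limit is illustrative):
# logstash.yml
queue.type: persisted
queue.max_bytes: 1gb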
For Node.js, https://www.npmjs.com/package/winston-logstash is a package which is quite active. This gist https://gist.github.com/jgoodall/6323951 provides a good example of the overall approach in other languages.
This is a sample (minimal) TCP input plugin configuration:
input {
  tcp {
    port => 9563
  }
}
You can install Logstash on the Node.js server and then create a configuration file that takes the location of the log file(s) as input and outputs to your Elasticsearch host.
Below is a sample configuration file (custom.conf) which has to be created in your Logstash directory.
input {
  file {
    path => "/path to log"
    start_position => "beginning"
  }
}
output {
  stdout { }
  elasticsearch {
    type => "stdin-type"
    embedded => false
    host => "192.168.0.23"
    port => "9300"
    cluster => "logstash-cluster"
    node_name => "logstash"
  }
}
Execute Logstash with:
logstash -f custom.conf
Reference: https://www.elastic.co/guide/en/logstash/current/config-examples.html
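Note that options like embedded, host, port and cluster in the elasticsearch output above belong to old Logstash 1.x releases; on current versions the equivalent output is roughly as follows (the address is carried over from the example, the rest is a sketch):
output {
  elasticsearch {
    hosts => ["http://192.168.0.23:9200"]
  }
}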
If you are planning to customize a Node.js application to send error logs, you can install an ELK stack Node.js wrapper and post error logs from within your application. ELK stack Node.js wrapper: https://www.npmjs.com/package/elksdk

Read log file from a remote machine with file input plugin using logstash

Presently I have my logs and Logstash running on the same machine, so I read the logs placed on my local machine with this config (using a pull model):
input {
  file {
    path => "/home/Desktop/Logstash-Input/**/*_log"
    start_position => "beginning"
  }
}
Now we have Logstash running on a different machine and want to read the logs from the remote machine.
Is there a way to set the IP in the file input of the config file?
EDIT:
I managed to do this with logstash-forwarder, which is a push model (the log shipper/logstash-forwarder ships logs to the Logstash indexer), but I am still looking for a pull model without a shipper, where the Logstash indexer goes and contacts the remote host directly.
Take a look at Filebeat: https://www.elastic.co/products/beats/filebeat
It's not a pull model, but it seems a better choice than logstash-forwarder.
It monitors log files and forwards them to Logstash or Elasticsearch. It also keeps the state of the log files and guarantees that events will be delivered at least once (depending on log rotation speed). It's really easy to configure:
Input configuration:
input_type: log
paths:
  - /opt/app/logs
Output configuration:
output.logstash:
  hosts: ["remote_host:5044"]
  index: filebeat_logs
On the Logstash side you must install and configure the Beats input plugin:
input {
  beats {
    port => 5044
  }
}
Logstash doesn't contain any magic to read files from another computer's file system (and that's probably a good thing). You'll either have to mount the remote file system that contains the logs you're interested in, or you have to install a log shipper (e.g. Logstash) on the remote machine and configure it to send the data to your current Logstash instance (or to an intermediate broker like Redis, RabbitMQ, or Kafka).
You could also use the syslog daemon (which is probably already installed on the machine) to ship logs via the syslog protocol, but keep in mind that there is no guarantee about the maximum allowed length of each message.
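A rough sketch of that syslog approach, assuming rsyslog on the remote machine and TCP port 5514 on the Logstash side (both the port and the protocol choice are illustrative):
# /etc/rsyslog.d/forward.conf on the remote machine: forward everything over TCP
*.* @@logstash-host:5514

# Logstash pipeline on the central server
input {
  syslog {
    port => 5514
  }
}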
You can add the remote system's IP in the path and access the logs from the remote machine.
input {
  file {
    path => "\\IP address/home/Desktop/Logstash-Input/**/*_log"
    start_position => "beginning"
  }
}

logstash zabbix metric without tags in log

We have the following setup:
1 central Logstash server (behind that we have an Elasticsearch cluster based on two nodes)
1 central Zabbix server
10 servers with logstash-forwarder
On our Logstash server we receive syslogs and apache/nginx access and error logs from the 10 mentioned servers through logstash-forwarder.
Since we want to see the number of error logs per server per minute in a nice graph in our Zabbix system, we are using the metrics plugin (http://logstash.net/docs/1.4.2/contrib-plugins).
Here is the PROBLEM:
We are currently not able to get the logs with the correct tags from the plugin to send them to Zabbix.
The logstash-forwarder config and Logstash server config can be found at this link:
https://db.tt/4cn8DWi2
If anyone has an idea how we can get rid of this problem, we would be very thankful.
It looks like you are mixing up the top-level sections in your config files; check the Logstash config language documentation.
Each file should look something like this:
# section input
input {
  # multiple plugin definitions *within* the input section
  file {}
  file {}
}
# section filter
filter {
  grok {}
  grok {}
  grok {}
}
# section output
output {
  elasticsearch {}
  stdout {}
}
Your config looks like this:
input {
  file {}
}
input {
  file {}
}
output {
  elasticsearch {}
}
output {
  stdout {}
}
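A hedged sketch of how those duplicated sections would be merged into a single file (plugin bodies left empty, as above):
input {
  file {}
  file {}
}
output {
  elasticsearch {}
  stdout {}
}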
