Logstash does not close file descriptors - Linux

I am using Logstash 2.2.4.
My current configuration:
input {
  file {
    path => "/data/events/*/*.txt"
    start_position => "beginning"
    codec => "json"
    ignore_older => 0
    max_open_files => 46000
  }
}
filter {
  if [type] not in ["A", "B", "C"] {
    drop {}
  }
}
output {
  http {
    http_method => "post"
    workers => 3
    url => "http://xxx.amazonaws.com/event"
  }
}
In the input folder, I have about 25,000 static (never updated) txt files.
I configured --pipeline-workers to 16. With the configuration described above, the LS process runs 1255 threads and opens about 2,560,685 file descriptors.
After some investigation, I found that LS keeps an open file descriptor for every file in the input folder, and HTTP output traffic becomes very slow.
My question is: why doesn't LS close the file descriptors of already processed (transferred) files, or implement some kind of pagination over the input files?
Has anyone run into the same problem? If so, please share your solution.
Thanks.
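
If the goal is simply to stop LS from holding a descriptor for every file, the file input's close_older setting is worth testing: it closes any file that has not been read for the given number of seconds. A sketch, assuming the option is available in your exact plugin version (it shipped in the 2.x logstash-input-file series; verify with bin/plugin list --verbose):

input {
  file {
    path => "/data/events/*/*.txt"
    start_position => "beginning"
    codec => "json"
    max_open_files => 46000
    # Close descriptors of files not read within the last hour (seconds).
    # Verify the option exists in your installed plugin version first.
    close_older => 3600
  }
}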

Related

Logstash input painfully slow while fetching messages from ActiveMQ topic

I have configured a JMS input in Logstash to subscribe to a JMS topic and push the messages to Elasticsearch.
input {
  jms {
    id => "my_first_jms"
    yaml_file => "D:\softwares\logstash-6.4.0\config\jms-amq.yml"
    yaml_section => "dev"
    use_jms_timestamp => true
    pub_sub => true
    destination => "mytopic"
    # threads => 50
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  stdout { codec => json }
  elasticsearch {
    hosts => ['http://localhost:9401']
    index => "jmsindex"
  }
}
System specs:
RAM: 16 GB
Type: 64-bit
Processor: Intel i5-4570T CPU @ 2.9 GHz
This is extremely slow: about one message every 3-4 minutes. How should I debug this to figure out what is missing?
Note: Before this, I was doing the same thing with @JMSListener in Java, and that could easily process 200-300 records per second.
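
Since one message every few minutes points at the pipeline rather than the broker, a reasonable first step (a sketch with illustrative values, not a confirmed fix) is to drop the stdout output temporarily and experiment with the batch settings in logstash.yml, which exist in Logstash 6.4:

# logstash.yml -- illustrative values to experiment with
pipeline.workers: 4        # defaults to the number of CPU cores
pipeline.batch.size: 250   # events per worker per batch (default 125)

If throughput jumps without the stdout output, the console codec was the bottleneck; if not, the next suspect is the jms input itself (note the commented-out threads => 50).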

Logstash file input in read mode for gzip files is consuming very high memory

Currently I am processing gzip files in Logstash using the file input plugin. It consumes very high memory and keeps restarting even after being given a large heap. On average we currently process 50 files per minute, and we plan to process thousands of files per minute. With 100 files the RAM requirement reaches 10 GB. What is the best way to tune this config, or is there a better way to process such a huge volume of data in Logstash?
Is it advisable to write a processing engine in Node.js or another language instead?
Below is the Logstash conf.
input {
  file {
    id => "client-files"
    mode => "read"
    path => [ "/usr/share/logstash/plugins/message/*.gz" ]
    codec => "json"
    file_completed_action => "log_and_delete"
    file_completed_log_path => "/usr/share/logstash/logs/processed.log"
  }
}
filter {
  ruby {
    code => '
      monitor_name = event.get("path").split("/").last.split("_").first
      event.set("monitorName", monitor_name)
      split_field = []
      event.get(monitor_name).each do |x|
        split_field << Hash[event.get("Datapoints").zip(x)]
      end
      event.set("split_field", split_field)
    '
  }
  split {
    field => "split_field"
  }
  ruby {
    code => "event.get('split_field').each {|k,v| event.set(k,v)}"
    remove_field => ["split_field", "Datapoints", "%{monitorName}"]
  }
}
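
Nothing in the post pins down the exact leak, but two things usually dominate memory here: each gzip file is decompressed into events in memory, and the split filter multiplies every event. A sketch of settings to experiment with (the values are illustrative assumptions, not confirmed fixes):

# logstash.yml -- shrink what is held on the heap at once
pipeline.workers: 2       # fewer concurrent batches in flight
pipeline.batch.size: 50   # smaller in-flight batch (default 125)

# In the file input block: interleave files instead of draining one
# completely before starting the next (logstash-input-file 4.x);
# whether this limits gzip reads is an assumption to test:
# file_chunk_count => 32   # at most 32 x 32 KB chunks per file per sweep

If the heap still grows with file count, a persistent queue (queue.type: persisted) moves buffering from memory to disk at the cost of some throughput.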

Logstash worker dies for no reason

Using Logstash 2.3.4-1 on CentOS 7 with the Kafka input plugin, I sometimes get
{:timestamp=>"2016-09-07T13:41:46.437000+0000", :message=>#0, :events_consumed=>822, :worker_count=>1, :inflight_count=>0, :worker_states=>[{:status=>"dead", :alive=>false, :index=>0, :inflight_count=>0}], :output_info=>[{:type=>"http", :config=>{"http_method"=>"post", "url"=>"${APP_URL}/", "headers"=>["AUTHORIZATION", "Basic ${CREDS}"], "ALLOW_ENV"=>true}, :is_multi_worker=>false, :events_received=>0, :workers=>"", headers=>{..}, codec=>"UTF-8">, workers=>1, request_timeout=>60, socket_timeout=>10, connect_timeout=>10, follow_redirects=>true, pool_max=>50, pool_max_per_route=>25, keepalive=>true, automatic_retries=>1, retry_non_idempotent=>false, validate_after_inactivity=>200, ssl_certificate_validation=>true, keystore_type=>"JKS", truststore_type=>"JKS", cookies=>true, verify_ssl=>true, format=>"json">]>, :busy_workers=>1}, {:type=>"stdout", :config=>{"ALLOW_ENV"=>true}, :is_multi_worker=>false, :events_received=>0, :workers=>"\n">, workers=>1>]>, :busy_workers=>0}], :thread_info=>[], :stalling_threads_info=>[]}>, :level=>:warn}
This is the config:
input {
  kafka {
    bootstrap_servers => "${KAFKA_ADDRESS}"
    topics => ["${LOGSTASH_KAFKA_TOPIC}"]
  }
}
filter {
  ruby {
    code =>
      "require 'json'
       require 'base64'

       def good_event?(event_metadata)
         event_metadata['key1']['key2'].start_with?('good')
       rescue
         true
       end

       def has_url?(event_data)
         event_data['line'] && event_data['line'].any? { |i| i['url'] && !i['url'].blank? }
       rescue
         false
       end

       event_payload = JSON.parse(event.to_hash['message'])['payload']
       event.cancel unless good_event?(event_payload['event_metadata'])
       event.cancel unless has_url?(event_payload['event_data'])
      "
  }
}
output {
  http {
    http_method => 'post'
    url => '${APP_URL}/'
    headers => ["AUTHORIZATION", "Basic ${CREDS}"]
  }
  stdout { }
}
Which is odd, since it is written to logstash.log and not logstash.err.
What does this error mean, and how can I avoid it? (Only restarting Logstash solves it, until the next time it happens.)
According to this GitHub issue, your Ruby code could be causing the problem: any exception raised inside a ruby filter will kill the filter worker. Without seeing your Ruby code it is impossible to debug further, but you could try wrapping it in an exception handler and logging the exception somewhere (at least until Logstash is updated to log it for you).
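
A minimal sketch of that suggestion applied to the filter above, keeping the helper definitions unchanged (whether @logger is reachable from inside the code string on 2.3.x is an assumption to verify):

filter {
  ruby {
    code => "
      begin
        event_payload = JSON.parse(event.to_hash['message'])['payload']
        event.cancel unless good_event?(event_payload['event_metadata'])
        event.cancel unless has_url?(event_payload['event_data'])
      rescue => e
        # Tag the event instead of letting the exception kill the worker.
        # @logger availability inside the code string is an assumption.
        @logger.warn('ruby filter failed', :error => e.message) if @logger
        event.tag('_rubyexception')
      end
    "
  }
}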

Logstash UDP listener died (Docker)

I've got a problem running Logstash. My conf looks like this:
input {
  udp {
    port => 1514
    type => "docker"
  }
}
filter {
  grok {
    match => {
      "message" => "<%{NUMBER}>%{DATA}(?:\s+)%{DATA:hostname}(?:\s+)%{DATA:imageName}(?:\s+)%{DATA:containerName}(?:\s*\[%{NUMBER}\]:) (\s+(?<logDate>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\s+%{HOUR}:%{MINUTE}:%{SECOND}) %{LOGLEVEL:logLevel}(?:\s*);* %{DATA:logAdditionalInfo};*)?%{GREEDYDATA:logMsg}"
    }
    keep_empty_captures => false
    remove_field => ["message"]
  }
}
output {
  if [type] == "gelf" {
    elasticsearch {
      index => "phpteam-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch { }
  }
}
The configuration is correct, but after running it, /var/log/logstash/logstash.log shows the following output:
{:timestamp=>"2016-06-22T11:43:03.105000+0200", :message=>"SIGTERM
received. Shutting down the pipeline.", :level=>:warn}
{:timestamp=>"2016-06-22T11:43:03.532000+0200", :message=>"UDP
listener died", :exception=>#,
:backtrace=>["org/jruby/RubyIO.java:3682:in select'",
"/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-udp-2.0.3/lib/logstash/inputs/udp.rb:77:in
udp_listener'",
"/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-udp-2.0.3/lib/logstash/inputs/udp.rb:50:in
run'",
"/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:206:in
inputworker'",
"/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:199:in
`start_input'"], :level=>:warn}
The only workaround I found for this error is to edit those .rb files, but sadly I have no idea how to do that. Could you help me somehow?
Thanks in advance.
I found a solution that is not perfect, but it works, so maybe it will help somebody.
After installing the whole instance on a new server, everything works fine.
Everything crashed after upgrading logstash/elasticsearch/kibana, so maybe there is something wrong with the configuration files, but I couldn't figure out which ones.

Print messages conditionally on resource synchronization

Is there a way to print out a message based on a resource synchronization? Something like:
the required content of the file is the following, and if it is updated (synchronized), print a message, e.g. "Please restart the system".
I tried the following:
file { 'disableselinux':
  ensure => present,
  path   => '/etc/selinux/config',
  mode   => '0644',
  source => 'puppet:///modules/base/selinux',
}
notify { 'SElinuxChange':
  loglevel  => warning,
  message   => 'System needs restart',
  subscribe => File['disableselinux'],
}
But that message will be printed every time, I guess. Is there an elegant way of doing this that avoids if-then-else flags etc.?
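
You are right: a notify resource prints its message on every run, refresh or not. A common pattern (a sketch, not from the original post) is an exec that does nothing except echo the message, gated by refreshonly so it only fires when the subscribed file actually changes:

file { 'disableselinux':
  ensure => present,
  path   => '/etc/selinux/config',
  mode   => '0644',
  source => 'puppet:///modules/base/selinux',
}

# Runs only when File['disableselinux'] reports a change;
# logoutput forwards the echo to the agent's log.
exec { 'selinux-restart-warning':
  command     => '/bin/echo "System needs restart"',
  refreshonly => true,
  logoutput   => true,
  subscribe   => File['disableselinux'],
}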