Hi all, I have created a Logstash config file, scheduled every 5 minutes, which transports data from a MS SQL Server to Elasticsearch. I run Logstash from Windows PowerShell with the following command: .\logstash-7.2.0\bin\logstash -f logstash.conf.txt
Logstash Config
input {
jdbc {
jdbc_driver_library => ""
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://xxxxxx\SQLEXPRESS:1433;databaseName=xxxx;"
jdbc_user => "xxxxx"
jdbc_password => "xxxx"
jdbc_paging_enabled => true
tracking_column => "modified_date"
use_column_value => true
clean_run => true
tracking_column_type => "timestamp"
schedule => "*/5 * * * *"
statement => "SELECT * from [xxxxxxxx] where modified_date >:sql_last_value"
}
}
filter {
mutate {
remove_field => ["@version","@timestamp"]
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "employee"
document_type => "_doc"
document_id => "%{id}"
}
stdout { codec => rubydebug }
}
How do I deploy the same thing in a production environment? On my local machine I use Windows PowerShell to execute my commands; how do I achieve this in production?
Could anyone please guide me on how to deploy this as a service in a production environment?
Not sure I understand the question... Are you trying to deploy the same configuration on a Linux server in production?
If so, you should change the jdbc_driver_library path, and possibly also the jdbc_connection_string and hosts parameters, to match the production server.
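For example, the settings that typically change might end up looking like this (the host names and driver path below are hypothetical placeholders, just to illustrate which parameters are environment-specific; on Linux the jdbc_driver_library path in particular needs to point at the driver jar on that machine):
jdbc_driver_library => "/usr/share/logstash/drivers/mssql-jdbc.jar"   # hypothetical path on the production host
jdbc_connection_string => "jdbc:sqlserver://prod-db-host:1433;databaseName=xxxx;"
hosts => "http://prod-es-host:9200"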
Also check out the following question:
Set vm.max_map_count on cluster nodes
It may be of help to you, though as I said I'm not sure.
Good luck! :-)
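If the production host is a Linux server with a package-based install of Logstash (an assumption on my part), the usual pattern is to drop the pipeline file into /etc/logstash/conf.d/ and let systemd manage Logstash as a service, roughly like this (the target file name is just an example):
# copy the pipeline into the directory scanned by the default pipeline in pipelines.yml
sudo cp logstash.conf.txt /etc/logstash/conf.d/mssql-to-es.conf
# enable at boot and start now
sudo systemctl enable logstash
sudo systemctl start logstash
On Windows, the Logstash documentation suggests running bin\logstash.bat under a service wrapper such as NSSM, since Logstash does not ship its own Windows service installer.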
With a Python script I'm running Logstash via a command inside a Docker container. The normal behavior (with Logstash installed on the server) is that after the pipeline gets the data, the pipeline shuts down, but here the process never ends.
logstash = subprocess.call(["docker", "exec", "-it", "logstash-docker_logstash_1", "/usr/share/logstash/bin/logstash", "-f", "/usr/share/logstash/pipeline/site-canvas.conf", "--path.data", "/usr/share/logstash/config/min-data/"])
I'm using docker top to see the running processes inside the container.
What can I do to ensure that the process ends when it has finished getting the data?
This is my pipeline
input {
jdbc {
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://db-ip:1433;databaseName=omi"
jdbc_user => "my-user"
jdbc_password => "my-pass"
statement => "SELECT
TIME_CREATED, DESCRIPTION as problem, SEVERITY as severity_name, NODEHINTS_DNSNAME as source, CATEGORY
FROM [omi1062event].[dbo].[ALL_EVENTS]
WHERE STATE = 'OPEN'
AND NODEHINTS_DNSNAME LIKE 'mju%'
AND TIME_CREATED >= DATEADD(day, -1, GETDATE())
ORDER BY TIME_CREATED ASC
"
jdbc_default_timezone => "UTC"
}
}
filter {
date {
match => [ "time_created", "ISO8601", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'","yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss.SSSSSS" ]
timezone => "Chile/Continental"
}
}
output {
elasticsearch {
hosts => "my-ip:9200"
index => "canvas"
user => "my-user"
password => "my-pass"
}
}
With Logstash I am trying to extract some tables, transform them locally on the Logstash machine, and then load the result into Elasticsearch. The reason for this approach is the very limited computing power on the source server, a MariaDB.
I have tested the input {} separately and it works, so the connection to the MariaDB is sound.
I have tested the jdbc_static filter against a Microsoft SQL Server, so Logstash has write privileges in its current environment.
I have tested the SQL syntax on the MariaDB server directly.
I'm running Logstash 6.8 and Java 8 (java version "1.8.0_211").
I have tried earlier versions of the MariaDB JDBC client
(mariadb-java-client-2.4.2.jar, mariadb-java-client-2.2.6-sources,
mariadb-java-client-2.3.0-sources).
My config file
input {
jdbc {
jdbc_driver_library => "C:/Logstash/logstash-6.8.0/plugin/mariadb-java-client-2.4.2.jar"
jdbc_driver_class => "Java::org.mariadb.jdbc.Driver"
jdbc_connection_string => "jdbc:mariadb://xx.xx.xx"
jdbc_user => "me"
jdbc_password => "its secret"
schedule => "* * * * *"
statement => "SELECT unqualifiedversionid__ FROM AuditEventFHIR WHERE myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'"
}
}
filter {
jdbc_static {
loaders => [
{
id => "auditevent"
query => "SELECT
myUnqualifiedId
,unqualifiedversionid__
,type_
FROM AuditEventFHIR
where myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'
"
local_table => "l_ae"
}
]
local_db_objects => [
{
name => "l_ae"
index_columns => ["myUnqualifiedId"]
columns => [
["myUnqualifiedId", "varchar(256)"],
["unqualifiedversionid__", "varchar(24)"],
["type_", "varchar(256)"]
]
}
]
local_lookups => [
{
id => "rawlogfile"
query => "
select myUnqualifiedId from l_ae
"
target => "sql_output"
}
]
jdbc_driver_library => "C:/Logstash/logstash-6.8.0/plugin/mariadb-java-client-2.4.2.jar"
jdbc_driver_class => "Java::org.mariadb.jdbc.Driver"
jdbc_connection_string => "jdbc:mariadb://xx.xx.xx.xx"
jdbc_user => "me"
jdbc_password => "its secret"
}
}
output {
stdout { codec => rubydebug }
}
I am getting this and several other errors, but I suspect fixing the first will fix the rest. The key point is that nowhere in my code do the words "LIMIT 1" appear.
[ERROR][logstash.filters.jdbc.readonlydatabase] Exception occurred when executing loader Jdbc query count {:exception=>"Java::JavaSql::SQLSyntaxErrorException: (conn=1490) You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\"T1\" LIMIT 1' at line 8", :backtrace=>["org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(org/mariadb/jdbc/internal/util/exceptions/ExceptionMapper.java:242)", "org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(org/mariadb/jdbc/internal/util/exceptions/ExceptionMapper.java:171)", "org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(org/mariadb/jdbc/MariaDbStatement.java:248)", "org.mariadb.jdbc.MariaDbStatement.executeInternal(org/mariadb/jdbc/MariaDbStatement.java:338)", "org.mariadb.jdbc.MariaDbStatement.executeQuery(org/mariadb/jdbc/MariaDbStatement.java:512)", "java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)", "org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:425)", "org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:292)"]}
The jdbc_static loader makes a hidden SQL query, select count(*) from "table" limit 1, to get a count when downloading the table. This query wraps the table name in double quotes (") and MariaDB does not accept that by default,
unless you add 'ANSI_QUOTES' to the sql_mode.
Batch command
SET GLOBAL sql_mode = 'ANSI_QUOTES'
Another option is to set the session to allow ANSI_QUOTES via the connection string:
jdbc_connection_string => "jdbc:mariadb://xx.xx.xx/databasename?sessionVariables=sql_mode=ANSI_QUOTES"
I have a simple setup for capturing logs through HTTP and TCP.
I've created 2 conf files at /etc/logstash/conf.d/ (see below), but logs sent through HTTP are also being passed through the TCP pipeline and vice versa. For example, when I send a log through TCP it ends up both in the http-logger-* index and in tcp-logger-*. It makes no sense to me :(
http_logger.conf
input {
http {
port => 9884
}
}
filter {
grok {
match => ["[headers][request_path]", "\/(?<component>[\w-]*)(?:\/)?(?<env>[\w-]*)(?:\/)?"]
}
}
output {
amazon_es {
hosts => ['XXXXX']
region => 'us-west-2'
aws_access_key_id => 'XXXXX'
aws_secret_access_key => 'XXXXX'
index => 'http-logger-%{+YYYY.MM.dd}'
}
stdout { codec => rubydebug }
}
tcp_logger.conf
input {
tcp {
port => 9885
codec => json
}
}
filter {
}
output {
amazon_es {
hosts => ['XXXXX']
region => 'us-west-2'
aws_access_key_id => 'XXXXX'
aws_secret_access_key => 'XXXXX'
index => 'tcp-logger-%{+YYYY.MM.dd}'
}
stdout { codec => rubydebug }
}
Any ideas on what I am missing?
Thank you
Even when the input, filter, and output configuration is split across different files, Logstash processes them as one big configuration, as if all the inputs, filters, and outputs were specified in a single file.
That said, every event coming into Logstash passes through all the configured filter and output plugins. In your case, each event picked up by the TCP and HTTP input plugins passes through the filter and output plugins configured in both http_logger.conf and tcp_logger.conf, which is why you are seeing events stashed in both the http-logger-* and tcp-logger-* indices.
To fix this, we can set a unique type field on the events picked up by the TCP and HTTP input plugins, and then apply the filter and output plugins selectively based on that type, as shown below.
http_logger.conf
input {
http {
port => 9884
type => "http_log"
}
}
filter {
if [type] == "http_log"
{
grok {
match => ["[headers][request_path]", "\/(?<component>[\w-]*)(?:\/)?(?<env>[\w-]*)(?:\/)?"]
}
}
}
output {
if ([type] == "http_log")
{
amazon_es {
hosts => ['XXXXX']
region => 'us-west-2'
aws_access_key_id => 'XXXXX'
aws_secret_access_key => 'XXXXX'
index => 'http-logger-%{+YYYY.MM.dd}'
}
}
stdout { codec => rubydebug }
}
tcp_logger.conf
input {
tcp {
port => 9885
codec => json
type => "tcp_log"
}
}
output {
if ([type] == "tcp_log")
{
amazon_es {
hosts => ['XXXXX']
region => 'us-west-2'
aws_access_key_id => 'XXXXX'
aws_secret_access_key => 'XXXXX'
index => 'tcp-logger-%{+YYYY.MM.dd}'
}
}
stdout { codec => rubydebug }
}
The explanation provided by @Ram is spot on; however, there is a cleaner way of solving the issue: enter pipelines.yml.
By default it looks like this:
- pipeline.id: main
path.config: "/etc/logstash/conf.d/*.conf"
Basically, it loads and combines all *.conf files; in my case I had two.
To solve the issue just separate the pipelines like so:
- pipeline.id: httplogger
path.config: "/etc/logstash/conf.d/http_logger.conf"
- pipeline.id: tcplogger
path.config: "/etc/logstash/conf.d/tcp_logger.conf"
The pipelines are now running separately :)
P.S. Don't forget to reload Logstash after any changes here.
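For example, with a package-based install managed by systemd (an assumption based on the /etc/logstash paths above), that would be:
sudo systemctl restart logstash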
Data is missed a lot in Logstash version 5.0.
Is it a serious bug? I have adjusted the config file many times, but it is useless; data loss happens again and again. How can I use Logstash to collect log events properly?
Any reply will be appreciated.
Logstash is all about reading logs from a specific location and, based on the information you are interested in, creating an index in Elasticsearch; other outputs are also possible.
Example of a Logstash config:
input {
file {
# PLEASE SET APPROPRIATE PATH WHERE LOG FILE AVAILABLE
#type => "java"
type => "json-log"
path => "d:/vox/logs/logs/vox.json"
start_position => "beginning"
codec => json
}
}
filter {
if [type] == "json-log" {
grok {
match => { "message" => "UserName:%{JAVALOGMESSAGE:UserName} -DL_JobID:%{JAVALOGMESSAGE:DL_JobID} -DL_EntityID:%{JAVALOGMESSAGE:DL_EntityID} -BatchesPerJob:%{JAVALOGMESSAGE:BatchesPerJob} -RecordsInInputFile:%{JAVALOGMESSAGE:RecordsInInputFile} -TimeTakenToProcess:%{JAVALOGMESSAGE:TimeTakenToProcess} -DocsUpdatedInSOLR:%{JAVALOGMESSAGE:DocsUpdatedInSOLR} -Failed:%{JAVALOGMESSAGE:Failed} -RecordsSavedInDSE:%{JAVALOGMESSAGE:RecordsSavedInDSE} -FileLoadStartTime:%{JAVALOGMESSAGE:FileLoadStartTime} -FileLoadEndTime:%{JAVALOGMESSAGE:FileLoadEndTime}" }
add_field => ["STATS_TYPE", "FILE_LOADED"]
}
}
}
filter {
mutate {
# here converting data type
convert => { "FileLoadStartTime" => "integer" }
convert => { "RecordsInInputFile" => "integer" }
}
}
output {
elasticsearch {
# PLEASE CONFIGURE ES IP AND PORT WHERE LOG DOCs HAS TO PUSH
document_type => "json-log"
hosts => ["localhost:9200"]
# action => "index"
# host => "localhost"
index => "locallogstashdx_new"
# workers => 1
}
stdout { codec => rubydebug }
#stdout { debug => true }
}
To learn more, you can go through the many available resources, for example:
https://www.elastic.co/guide/en/logstash/current/first-event.html
I have set up the Filebeat -> Logstash -> Elasticsearch -> Kibana stack successfully. Now, in Logstash, I want to override the host with the beat.name. However, when I try to refer to the beat metadata, the variable is not resolved.
mutate {
add_field => {
"timestamp" => "%{year}-%{month}-%{day} %{time}"
}
replace_field => {
"host" => "%{[#metadata][beat][name]}"
}
}
I think I am missing some major configuration. Even when Logstash forwards it to Elasticsearch, these symbols are not resolved.
output {
elasticsearch {
hosts => "localhost:9200"
manage_template => false
index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
document_type => "%{[@metadata][type]}"
}
}
How do we refer to Filebeat meta information correctly in the Logstash config file?
The beat.name field is not carried in the @metadata object; beat is a top-level field in the event. So to refer to the value, use [beat][name], or in a string use "%{[beat][name]}".
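For example, a minimal sketch of the corrected mutate block (note that the mutate filter's option is replace; replace_field is not a valid mutate option):
filter {
  mutate {
    add_field => {
      "timestamp" => "%{year}-%{month}-%{day} %{time}"
    }
    # beat is a top-level field on the event, not under @metadata
    replace => {
      "host" => "%{[beat][name]}"
    }
  }
}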