Grok pattern for log with pipe separator - Logstash

I'm trying to figure out the grok pattern for the log line below.
01/02AVDC190001|00001|4483850000152971|DATAPREP|PREPERATION/ENRICHEMENT |020190201|20:51:52|SCHED
What I've tried so far is:
input {
  file {
    path => "C:/Elasitcity/Logbase/July10_Logs_SDC/*.*"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  mutate {
    gsub => ["message", "\|", " "]
  }
  grok {
    match => ["message", "%{NUMBER:LOGID} %{NUMBER:LOGPHASE} %{NUMBER:LOGID} %{WORD:LOGEVENT} %{WORD:LOGACTIVITY} %{DATE_US:DATE} %{TIME:LOGTIME}"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "grokcsv"
    document_type => "gxs"
  }
  stdout {}
}
I'm also wondering if it's possible to combine the date and time, since they're separated by a pipe character. But that's not the primary question.
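One way to handle this without the gsub (which is tripped up by the trailing space in the fifth column, and by the first column not being a NUMBER) is to split on the literal pipes with dissect, then join the date and time columns for the date filter. A minimal sketch; the field names are my own guesses at the column meanings, and the leading "0" on the date column is assumed to be literal padding:
filter {
  # Split the line on the literal pipe delimiters.
  dissect {
    mapping => {
      "message" => "%{log_id}|%{log_phase}|%{card_number}|%{log_event}|%{log_activity}|%{log_date}|%{log_time}|%{log_status}"
    }
  }
  mutate {
    # The fifth column carries a trailing space in the sample line.
    strip => ["log_activity"]
    # Combine the separate date and time columns into one field.
    add_field => { "log_timestamp" => "%{log_date} %{log_time}" }
  }
  # Parse the combined field into @timestamp. Joda treats a digit in the
  # pattern as a literal, so this assumes "020190201" is "0" + yyyyMMdd.
  date {
    match => ["log_timestamp", "0yyyyMMdd HH:mm:ss"]
  }
}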

Related

Logstash variable in pipeline config

I am setting up Logstash to ingest Airflow logs. The following config is giving me the output I need:
input {
  file {
    path => "/my_path/logs/**/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  if [path] =~ /\/my_path\/logs\/containers\/.*/ or [path] =~ /\/my_path\/logs\/scheduler\/.*/ {
    drop {}
  } else {
    grok {
      match => [ "message", "\[%{TIMESTAMP_ISO8601:log_task_execution_datetime}\]%{SPACE}\{%{DATA:log_file_line}\}%{SPACE}%{WORD:log_level}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ]
      remove_field => [ "message" ]
    }
    date {
      match => [ "log_task_execution_datetime", "ISO8601" ]
      target => "log_task_execution_datetime"
      timezone => "UTC"
    }
    dissect {
      mapping => { "path" => "/my_path/logs/%{dag_id}/%{task_id}/%{dag_execution_datetime}/%{try_number}.%{}" }
      add_field => { "log_id_template" => "{%{dag_id}}-{%{task_id}}-{%{dag_execution_datetime}}-{%{try_number}}" }
    }
  }
}
output {
  stdout { codec => rubydebug { metadata => true } }
}
But I do not like having to specify the path "/my_path/logs/" multiple times.
In my input section, I tried to use:
add_field => { "[#metadata][base_path]" => "/my_path/logs/" }
and then, in the filter section:
if [path] =~ /[@metadata][base_path].*/ or [path] =~ /[@metadata][base_path].*/ {
  drop {}
}
...
dissect {
  mapping => { "path" => "[@metadata][base_path]%{dag_id}/%{task_id}/%{dag_execution_datetime}/%{try_number}.%{}" }
But it doesn't seem to work for the regex in the filter or in the dissect mapping. I get a similar issue when trying to use an environment variable as described here.
I have the - maybe naïve - notion that I should be able to use one variable for all references to the base path. Is there a way?
Using an environment variable in a conditional is not supported. There has been a GitHub issue requesting it as an enhancement, open since 2016. The workaround is to use mutate+add_field to add a field to [@metadata], then test that.
"mapping" => { "path" => "${[@metadata][base_path]}%{dag_id}/%{task_id} ...
should work. The terms in a conditional are not sprintf'd, so you cannot use %{}, but you can do a substring match. If FOO is set to /home/user/dir then
mutate { add_field => { "[#metadata][base_path]" => "${FOO}" } }
mutate { add_field => { "[path]" => "/home/user/dir/file" } }
if [#metadata][base_path] in [path] { mutate { add_field => { "matched" => true } } }
results in the [matched] field getting added. I do not know of a way to anchor the string match, so if FOO were set to /dir/ then that would also match.
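Putting the workaround together with the original config, a minimal sketch, assuming an environment variable BASE_PATH set to /my_path/logs/ (the variable name is my own):
filter {
  # ${BASE_PATH} is substituted once, when the config is loaded.
  mutate {
    add_field => {
      "[@metadata][containers_path]" => "${BASE_PATH}containers/"
      "[@metadata][scheduler_path]" => "${BASE_PATH}scheduler/"
    }
  }
  # Conditionals are not sprintf'd, but a substring test against the
  # @metadata fields works.
  if [@metadata][containers_path] in [path] or [@metadata][scheduler_path] in [path] {
    drop {}
  }
  dissect {
    mapping => { "path" => "${BASE_PATH}%{dag_id}/%{task_id}/%{dag_execution_datetime}/%{try_number}.%{}" }
  }
}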

Can't create a field with a variable from a grok match regex

I am currently using Logstash, Elasticsearch, and Kibana 6.3.0.
My logs are generated at a unique-id path: /tmp/USER_DATA/FactoryContainer/images/(my unique id)/oar/oar_image_job(my unique id).stdout
What I want to do is match this unique id and create a field with it.
I'm a bit of a novice with Logstash filters, and I don't know why it won't use my uid: it either returns the literal %{uid} in my field or fails with the "Failed to execute action" error.
My config:
input {
  file {
    path => "/tmp/USER_DATA/FactoryContainer/images/*/oar/oar_image_job*.stdout"
    start_position => "beginning"
    add_field => { "data_source" => "oar-image-job" }
  }
}
filter {
  grok {
    match => ["path", "%{UNIXPATH}%{NUMBER:uid}%{UNIXPATH}"]
  }
  mutate {
    add_field => [ "unique_id" => "%{uid}" ]
  }
}
output {
  if [data_source] == "oar-image-job" {
    elasticsearch {
      index => "oar-image-job-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  }
}
The data_source field is there to avoid this issue: when you put multiple config files in a directory for Logstash to use, they all get concatenated.
In the grok debugger, %{UNIXPATH}%{NUMBER:uid}%{UNIXPATH} returns the right value for my path.
Link to the solution: https://discuss.elastic.co/t/cant-create-a-field-with-a-variable-from-a-grok-match-regex/142613/7?u=thesmartmonkey
The correct filter (matching the literal path prefix instead of %{UNIXPATH}, which is greedy and can swallow the digits at runtime):
input {
  file {
    path => "/tmp/USER_DATA/FactoryContainer/images/*/oar/oar_image_job*.stdout"
    start_position => "beginning"
    add_field => { "data_source" => "oar-image-job" }
  }
}
filter {
  grok {
    match => { "path" => [ "/tmp/USER_DATA/FactoryContainer/images/%{DATA:unique_id}/oar/oar_image_job%{DATA}.stdout" ] }
  }
}
output {
  if [data_source] == "oar-image-job" {
    elasticsearch {
      index => "oar-image-job-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  }
}
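If the directory prefix may vary, anchoring only on the stable parts of the path should also work, since grok patterns are unanchored by default. An untested sketch using a raw named capture for the numeric id:
filter {
  grok {
    # Matches the stable middle of the path, avoiding the greedy
    # %{UNIXPATH}, and captures the digits into unique_id.
    match => { "path" => "/images/(?<unique_id>[0-9]+)/oar/" }
  }
}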

Logstash input line by line

How can I read files line by line in Logstash using a codec?
I tried the configuration below, but it is not working:
input {
  file {
    path => "C:/DEV/Projects/data/*.csv"
    start_position => "beginning"
    codec => line {
      format => "%{[data]}"
    }
  }
}
An example configuration with Elasticsearch in the output:
input {
  file {
    path => "C:/DEV/Projects/data/*.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "COLUMN_1",
      "COLUMN_2",
      "COLUMN_3",
      .
      .
      "COLUMN_N"
    ]
    separator => ","
  }
  mutate {
    convert => {
      "COLUMN_1" => "float"
      "COLUMN_4" => "float"
      "COLUMN_6" => "float"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    action => "index"
    index => "test_index"
  }
}
For the csv filter documentation:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html
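Worth noting: the line codec's format option is an encoding setting, applied when events are written, not when they are read; the file input already emits one event per line without any codec. A sketch of the codec where format actually takes effect, on a file output (the output path here is my own):
output {
  file {
    path => "C:/DEV/Projects/data/out.txt"
    # format applies on encode: write only the [data] field per line.
    codec => line { format => "%{[data]}" }
  }
}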

Grok custom pattern filter in Logstash

How do I create a custom grok pattern filter in Logstash?
I want to create a pattern for the HTTP response status code.
Here is my pattern code:
STATUS_CODE __ %{NONNEGINT} __
What I really want is to capture all of my web server hits with the user IP, the request HTTP headers and payload, and also the web server's response.
and here is my logstash.conf
input {
  file {
    type => "kpi-success"
    path => "/var/log/kpi_success.log"
    start_position => "beginning"
  }
}
filter {
  if [type] == "kpi-success" {
    grok {
      patterns_dir => ["./patterns"]
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message} "}
    }
    multiline {
      pattern => "^\["
      what => "previous"
      negate => true
    }
    mutate {
      add_field => {
        "statusCode" => "[STATUS_CODE]"
      }
    }
  }
}
output {
  if [type] == "kpi-success" {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "kpi-success-%{+YYYY.MM.dd}"
    }
  }
}
You don't have to use a custom pattern file; you can define a new pattern directly in the filter.
grok {
  match => { "message" => "(?<STATUS_CODE>__ %{NONNEGINT} __)" }
}
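That captures the whole "__ 200 __" block, underscores included, into STATUS_CODE. If you also want just the number, naming the inner pattern should do it; a sketch:
grok {
  # The outer capture keeps the full block; the inner semantic keeps
  # only the digits in statusCode.
  match => { "message" => "(?<STATUS_CODE>__ %{NONNEGINT:statusCode} __)" }
}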

Logstash input filename as output Elasticsearch index

Is there a way to use the filename of the file being read by Logstash as the index name for the output into Elasticsearch?
I am using the following config for Logstash.
input {
  file {
    path => "/logstashInput/*"
  }
}
output {
  elasticsearch {
    index => "FromfileX"
  }
}
I would like to be able to drop in a file, e.g. log-from-20.10.2016.log, and have it indexed into the index log-from-20.10.2016. Does the Logstash input plugin "file" produce any variables for use in the filter or output?
Yes, you can use the path field for that and grok it to extract the filename into the index field:
input {
  file {
    path => "/logstashInput/*"
  }
}
filter {
  grok {
    match => ["path", "(?<index>log-from-\d{2}\.\d{2}\.\d{4})\.log$"]
  }
}
output {
  elasticsearch {
    index => "%{index}"
  }
}
Another approach is to derive the index name from the file path with a ruby filter:
input {
  file {
    path => "/home/ubuntu/data/gunicorn.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => {
      "message" => "%{USERNAME:u1} %{USERNAME:u2} \[%{HTTPDATE:http_date}\] \"%{DATA:http_verb} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:android_client}\""
    }
    remove_field => ["message"]
  }
  date {
    match => ["http_date", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  ruby {
    code => "event.set('index_name', event.get('path').split('/')[-1].gsub('.log',''))"
  }
}
output {
  elasticsearch {
    hosts => ["0.0.0.0:9200"]
    index => "%{index_name}-%{+yyyy-MM-dd}"
    user => "*********************"
    password => "*****************"
  }
  stdout { codec => rubydebug }
}
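A small variation on the first answer keeps the derived name out of the stored document by moving it into @metadata, which is never sent to Elasticsearch; a sketch:
filter {
  grok {
    match => ["path", "(?<index_name>log-from-\d{2}\.\d{2}\.\d{4})\.log$"]
  }
  # @metadata fields are usable in the pipeline but are not indexed.
  mutate {
    rename => { "index_name" => "[@metadata][index]" }
  }
}
output {
  elasticsearch {
    index => "%{[@metadata][index]}"
  }
}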
