Failing to index CSV file-based data in OpenDistro Elasticsearch - Logstash

I am trying to index sample CSV-based data into OpenDistro Elasticsearch, but the index is never created. Could you please let me know what I am missing here?
CSV file to index:
[admin@fedser32 logstashoss-docker]$ cat /tmp/student.csv
"aaa","bbb",27,"Day Street"
"xxx","yyy",33,"Web Street"
"sss","mmm",29,"Adam Street"
logstash.conf
[admin@fedser32 logstashoss-docker]$ cat logstash.conf
input {
  file {
    path => "/tmp/student.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["firstname", "lastname", "age", "address"]
  }
}
output {
  elasticsearch {
    hosts => ["https://fedser32.stack.com:9200"]
    index => "sampledata"
    ssl => true
    ssl_certificate_verification => false
    user => "admin"
    password => "admin@1234"
  }
}
My Opendistro cluster is listening on 9200 as shown below.
[admin@fedser32 logstashoss-docker]$ curl -X GET -u admin:admin@1234 -k https://fedser32.stack.com:9200
{
  "name" : "odfe-node1",
  "cluster_name" : "odfe-cluster",
  "cluster_uuid" : "5GOEtg12S6qM5eaBkmzUXg",
  "version" : {
    "number" : "7.10.0",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
    "build_date" : "2020-11-09T21:30:33.964949Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
As per the logs, Logstash does appear to find the CSV file, as shown below.
logstash_1 | [2022-03-03T12:11:44,716][INFO ][logstash.outputs.elasticsearch][main] Index Lifecycle Management is set to 'auto', but will be disabled - Index Lifecycle management is not installed on your Elasticsearch cluster
logstash_1 | [2022-03-03T12:11:44,716][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
logstash_1 | [2022-03-03T12:11:44,725][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, "pipeline.sources"=>["/usr/share/logstash/pipeline/logstash.conf"], :thread=>"#<Thread:0x5c537d14 run>"}
logstash_1 | [2022-03-03T12:11:45,439][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.71}
logstash_1 | [2022-03-03T12:11:45,676][INFO ][logstash.inputs.file ][main] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_20d37e3ca625c7debb90eb1c70f994d6", :path=>["/tmp/student.csv"]}
logstash_1 | [2022-03-03T12:11:45,697][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
logstash_1 | [2022-03-03T12:11:45,738][INFO ][filewatch.observingtail ][main][2f140d63e9cab8ddc711daddee17a77865645a8de3d2be55737aa0da8790511c] START, creating Discoverer, Watch with file and sincedb collections
logstash_1 | [2022-03-03T12:11:45,761][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
logstash_1 | [2022-03-03T12:11:45,921][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

Could you check the access rights on the /tmp/student.csv file? It must be readable by the user running Logstash.
Check with this command:
ls -l /tmp
Alternatively, if you have already indexed the file once, you have to clean up the sincedb so the file is read again.
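For reference, a minimal sketch of the usual testing shortcut, assuming the same file path as the question: pointing sincedb_path at /dev/null stops Logstash from persisting the read position, so the file is re-read on every restart.
input {
  file {
    path => "/tmp/student.csv"
    start_position => "beginning"
    # discard the recorded read position so the file is re-read on each run (testing only)
    sincedb_path => "/dev/null"
  }
}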

The thing I was missing is that I had to volume-mount my CSV file into the Logstash container, as shown below, after which I was able to index my CSV data.
[admin@fedser opensearch-logstash-docker]$ cat docker-compose.yml
version: '2.1'
services:
  logstash:
    image: opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2
    ports:
      - "5044:5044"
    volumes:
      - $PWD/logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - $PWD/student.csv:/tmp/student.csv
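To confirm the data actually landed, a quick check against the cluster (same host and credentials as above; _cat/indices and _search are standard Elasticsearch endpoints):
docker-compose up -d
# list indices and confirm "sampledata" exists
curl -k -u admin:admin@1234 'https://fedser32.stack.com:9200/_cat/indices?v'
# fetch the indexed documents
curl -k -u admin:admin@1234 'https://fedser32.stack.com:9200/sampledata/_search?pretty'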

Related

I'm having issues sending logs from a file to a destination using Logstash

I'm very new to Logstash configuration. Below is the Logstash pipeline configuration I'm using to try to send a log file to Coralogix, without success.
input {
  file {
    path => "/etc/logstash/conf.d/CSE-Candidate-Exercise-LOG.log"
    type => "log" # a type to identify those logs (will need this later)
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
output {
  http {
    url => "https://api.coralogix.com/logs/rest/singles"
    http_method => "post"
    headers => ["private_key", "xxxxxxxx-cccc-7777-ffff-000000000000"]
    format => "json_batch"
    codec => "line"
    mapping => {
      "applicationName" => "%{[@metadata][application]}"
      "subsystemName" => "%{[@metadata][subsystem]}"
      "computerName" => "%{host}"
      "text" => "%{[@metadata][event]}"
    }
    http_compression => true
    automatic_retries => 5
    retry_non_idempotent => true
    connect_timeout => 30
    keepalive => false
  }
}
The input should be OK, but I think my issue is with the output. I'm not sure which "codec" plugin to use; do I need the "format" field?
Each time I run Logstash to ship the file specified in the path, all I get is the log snippet below:
[INFO ] 2022-08-02 13:25:02.659 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
[INFO ] 2022-08-02 13:25:03.588 [Converge PipelineAction::Create<main>] Reflections - Reflections took 135 ms to scan 1 urls, producing 124 keys and 408 values
[INFO ] 2022-08-02 13:25:04.158 [Converge PipelineAction::Create<main>] javapipeline - Pipeline `main` is configured with `pipeline.ecs_compatibility: v8` setting. All plugins in this pipeline will default to `ecs_compatibility => v8` unless explicitly configured otherwise.
[INFO ] 2022-08-02 13:25:04.278 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>500, "pipeline.sources"=>["/etc/logstash/conf.d/logstash-config.conf"], :thread=>"#<Thread:0x5a5e32c1 run>"}
[INFO ] 2022-08-02 13:25:05.154 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>0.87}
[INFO ] 2022-08-02 13:25:05.235 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2022-08-02 13:25:05.316 [[main]<file] observingtail - START, creating Discoverer, Watch with file and sincedb collections
[INFO ] 2022-08-02 13:25:05.344 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
I see no logs in the Coralogix platform. I simply want to send the raw log lines in the file, without any filtering or manipulation, to the destination. Any help would be appreciated.
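One thing worth noting about the config above: the mapping interpolates [@metadata] fields that nothing in the pipeline ever sets, so they would render as literal unresolved text in the payload. A minimal, hypothetical sketch of populating them with a mutate filter; the application and subsystem values are placeholders, and copying the raw line into [@metadata][event] is an assumption:
filter {
  mutate {
    add_field => {
      "[@metadata][application]" => "my-app"       # placeholder value
      "[@metadata][subsystem]" => "my-subsystem"   # placeholder value
      "[@metadata][event]" => "%{message}"         # assumes the raw line is the event text
    }
  }
}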

Tags in Logstash

How do I direct the postfix logs to the postfix index?
In the Logstash config:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
  }
}
output {
  if "postfix" in [tags] {
    elasticsearch {
      hosts => "localhost:9200"
      index => "postfix-%{+YYYY.MM.dd}"
    }
  }
}
In filebeat.yml:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/maillog*
  exclude_files: [".gz$"]
  tags: ["postfix"]
output.logstash:
  hosts: ["10.50.11.8:5044"]
The Logstash log contains a lot of warnings like:
[WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"newrelicdata", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x4ceb504a>], :response=>{"index"=>{"_index"=>"newrelicdata", "_type"=>"_doc", "_id"=>"V7x2z20Bp3jq-MOqpNbt", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [host] of type [text] in document with id 'V7x2z20Bp3jq-MOqpNbt'. Preview of field's value: '{name=mail.domain.com}'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:521"}}}}}
Why is data from mail.domain.com going somewhere other than the postfix index? And why is the data trying to get into all the indexes? Any help would be appreciated.
I think the logs contain a field named "host" that arrives as an object (the error preview shows '{name=mail.domain.com}'), while the index mapping expects text, so it throws "failed to parse field [host] of type [text]". Renaming the field avoids the conflict:
mutate {
  rename => { "host" => "server" }
  convert => { "server" => "string" }
}
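The warning also hints at a second issue: the rejected event was headed for a newrelicdata index, which suggests several files in conf.d are being concatenated into a single pipeline; without conditionals, every event reaches every output. A hypothetical sketch of guarding the other file's output the same way (the file name and tag are assumptions):
# e.g. in the other pipeline file (say, newrelic.conf)
output {
  if "newrelic" in [tags] {
    elasticsearch {
      hosts => "localhost:9200"
      index => "newrelicdata"
    }
  }
}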
I tried sending the logs with the below filebeat.yml configuration:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/varsha/ELK7.4/logs/*.log
  tags: ["postfix"]
output.logstash:
  hosts: ["localhost:5044"]
With the above configuration and the files you have shared, everything works fine and I got the expected result; please check it at https://pastebin.com/f6e0E52S

Logstash: Nothing displayed on console (Mac)

I am trying to set up a very simple Logstash config:
input {
  file {
    path => "/path/to/my/log/file"
    start_position => "beginning"
    ignore_older => 0
  }
}
filter {
}
output {
  stdout {
    codec => rubydebug
  }
}
and here is how I start Logstash:
[logstash-7.1.1]$ bin/logstash -r -f log.conf
but here is all I see on the console:
Sending Logstash logs to path/to/logstash-7.1.1/logs which is now configured via log4j2.properties
[2019-05-28T13:22:57,294][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-05-28T13:22:57,313][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.1.1"}
[2019-05-28T13:23:02,904][INFO ][logstash.javapipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, :thread=>"#<Thread:0x7ad3cf30 run>"}
[2019-05-28T13:23:03,254][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"path/to/logstash-7.1.1/data/plugins/inputs/file/.sincedb_8164b23a475b43f1b0c9aba125f7f5cf", :path=>["/path/to/my/log/file"]}
[2019-05-28T13:23:03,284][INFO ][logstash.javapipeline ] Pipeline started {"pipeline.id"=>"main"}
[2019-05-28T13:23:03,355][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2019-05-28T13:23:03,360][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-05-28T13:23:03,703][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
I can see that
No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"path/to/logstash-7.1.1/data/plugins/inputs/file/.sincedb_8164b23a475b43f1b0c9aba125f7f5cf", :path=>["/path/to/my/log/file"]}
so the path seems correct. Also, my log file is not empty.
What am I doing wrong? Why can't I see the content of my log file on the console?
input {
  file {
    path => "/salaries.csv"
    start_position => "beginning"
    type => "data"
  }
}
filter {
  csv {
    separator => ","
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
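Note that this config also drops the question's ignore_older => 0. A separate gotcha worth knowing: start_position => "beginning" only applies to files Logstash has never seen before; once a sincedb entry exists, restarts resume from the recorded offset. A minimal sketch of the usual testing workaround, assuming the question's path:
input {
  file {
    path => "/path/to/my/log/file"
    start_position => "beginning"
    # discard the recorded read position so the file is re-read on every run (testing only)
    sincedb_path => "/dev/null"
  }
}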
This link may be helpful to you.

Parsing JSON using Logstash (ELK stack)

I have created a simple JSON file like the one below:
[
  {
    "Name": "vishnu",
    "ID": 1
  },
  {
    "Name": "vishnu",
    "ID": 1
  }
]
I am holding these values in a file named simple.txt. Then I used Filebeat to watch the file and send new updates to port 5043; on the other side I started the Logstash service, which listens on this port in order to parse the JSON and pass it to Elasticsearch.
Logstash is not processing the JSON values; it hangs in the middle.
Logstash config:
input {
  beats {
    port => 5043
    host => "0.0.0.0"
    client_inactivity_timeout => 3600
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  stdout { codec => rubydebug }
}
Filebeat config:
filebeat.prospectors:
- input_type: log
  paths:
    - filepath
output.logstash:
  hosts: ["localhost:5043"]
Logstash output:
Sending Logstash's logs to D:/elasticdb/logstash-5.6.3/logstash-5.6.3/logs which is now configured via log4j2.properties
[2017-10-31T19:01:17,574][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"D:/elasticdb/logstash-5.6.3/logstash-5.6.3/modules/fb_apache/configuration"}
[2017-10-31T19:01:17,578][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"D:/elasticdb/logstash-5.6.3/logstash-5.6.3/modules/netflow/configuration"}
[2017-10-31T19:01:18,301][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
[2017-10-31T19:01:18,388][INFO ][logstash.inputs.beats ] Beats inputs: Starting input listener {:address=>"0.0.0.0:5043"}
[2017-10-31T19:01:18,573][INFO ][logstash.pipeline ] Pipeline main started
[2017-10-31T19:01:18,591][INFO ][org.logstash.beats.Server] Starting server on port: 5043
[2017-10-31T19:01:18,697][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
Every time, I run Logstash using the command
logstash -f logstash.conf
and since there is no processing of the JSON, I stop the service by pressing Ctrl+C.
Please help me find the solution. Thanks in advance.
I finally ended up with a config like this. It works for me.
input {
  file {
    codec => multiline {
      pattern => '^\{'
      negate => true
      what => previous
    }
    path => "D:\elasticdb\logstash-tutorial.log\Test.txt"
    start_position => "beginning"
    sincedb_path => "D:\elasticdb\logstash-tutorial.log\null"
    exclude => "*.gz"
  }
}
filter {
  json {
    source => "message"
    remove_field => ["path","@timestamp","@version","host","message"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "logs"
    document_type => "json_from_logstash_attempt3"
  }
  stdout {}
}
JSON format:
{"name":"sachin","ID":"1","TS":1351146569}
{"name":"sachin","ID":"1","TS":1351146569}
{"name":"sachin","ID":"1","TS":1351146569}

Logstash stopping {:plugin=>"LogStash::Inputs::Http"}

I'm trying to run Logstash in an EC2 Ubuntu instance,
but when I run:
logstash-5.2.0/bin/logstash -f logstash.conf --debug
I get:
Starting puma
Trying to start WebServer {:port=>9600}
start
Trying to start WebServer {:port=>9601}
[api-service] start
Successfully started Logstash API endpoint {:port=>9601}
PeriodicPoller: Stopping
stopping pipeline {:id=>"main"}
Closing inputs
stopping {:plugin=>"LogStash::Inputs::Http"}
Closed inputs
This is logstash.conf:
input {
  http {
    host => "127.0.0.1"
    port => 31311
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout {
    codec => rubydebug
  }
}
When I run
curl 'http://localhost:9200/?pretty'
I get:
{
  "name" : "QrRfI_U",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "LLdvAaAsQSCULTfl_b4xIA",
  "version" : {
    "number" : "5.2.0",
    "build_hash" : "24e05b9",
    "build_date" : "2017-01-24T19:52:35.800Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.0"
  },
  "tagline" : "You Know, for Search"
}
So elasticsearch is running fine.
What if you set your hosts as:
hosts => "localhost"
Also make sure that the HTTP port you've mentioned above is not bound to any other process. If that's not it, just to make sure, run bin/logstash-plugin list and check whether the http input plugin exists.
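A sketch of both checks from this answer as shell commands. Note also that in the question's debug output the API endpoint fell back from port 9600 to 9601, which often means another Logstash instance was already running:
# confirm the http input plugin is installed
logstash-5.2.0/bin/logstash-plugin list | grep http
# confirm nothing else is bound to the pipeline's input port
sudo lsof -i :31311
# confirm no other Logstash instance is holding the API port
sudo lsof -i :9600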
