Exception. Not retrying {:exception=>#<Java::OrgPostgresqlUtil::PSQLException: ERROR: syntax error at or near "%" - logstash

I want to load my CSV file into PostgreSQL, but whenever I try to connect, it shows this error:
[ERROR][logstash.outputs.jdbc
][main][a4f714a30e2d2cae8e83b3c2d215c3537fe40dca0495ca92cc2f50a93ba8088a]
JDBC - Exception. Not retrying
{:exception=>#<Java::OrgPostgresqlUtil::PSQLException: ERROR: syntax
error at or near "%"
This is my Config.conf file:
input {
file {
path => "C:/Users/Desktop/InputData12.csv"
start_position => "beginning"
codec => plain
}
}
filter {
csv {
separator => ","
columns => ["inputdata","metric","source_table","output_column_alias","method"]
}
}
output {
jdbc {
connection_string => "jdbc:postgresql://hostname:5432/database"
username => "username"
password => "password"
driver_jar_path => "C:/Users/Downloads/lib/postgresql-42.5.1.jar"
driver_class => "org.postgresql.Driver"
statement => "INSERT INTO csv_to_postgresql (inputdata,metric,source_table,output_column_alias,method) VALUES (%{inputdata},%{metric},%{source_table},%{output_column_alias},%{method})"
}
}

I would expect that to get a different error (JDBC - Statement has no parameters). There are two types of statement. If you set "unsafe_statement => true" then the output will sprintf the SQL statement. If you do not set that then you should be using
statement => [
"INSERT INTO csv_to_postgresql (inputdata,metric,source_table,output_column_alias,method) VALUES (?, ?, ?, ?, ?)",
"%{inputdata}",
"%{metric}",
"%{source_table}",
"%{output_column_alias}",
"%{method}"
]
in which case the output will sprintf all the parameters to the statement. Setting unsafe_statement can be more expensive than the second approach.
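For comparison, the first approach (unsafe_statement) might look like the sketch below. It reuses the connection settings from the question. The single quotes around each placeholder are an assumption that all five columns are text types: because the values are interpolated straight into the SQL text, they are not escaped, so a value containing a quote would break the statement.

```
output {
  jdbc {
    connection_string => "jdbc:postgresql://hostname:5432/database"
    username => "username"
    password => "password"
    driver_jar_path => "C:/Users/Downloads/lib/postgresql-42.5.1.jar"
    driver_class => "org.postgresql.Driver"
    # The SQL text itself is sprintf'd for every event, so string values
    # need their own quotes (assumed here: all columns are text).
    unsafe_statement => true
    statement => [ "INSERT INTO csv_to_postgresql (inputdata,metric,source_table,output_column_alias,method) VALUES ('%{inputdata}','%{metric}','%{source_table}','%{output_column_alias}','%{method}')" ]
  }
}
```

The parameterized form with ? placeholders avoids the quoting problem altogether, which is one more reason to prefer it.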

Related

Logstash aggregation return empty message

I have a testing environment to test some logstash plugin before to move to production.
For now, I am using kiwi syslog generator, to generate some syslog for testing.
The field I have are as follow:
@timestamp
message
+ elastic metadata
Starting from these basic fields, I start filtering my data.
The first thing is to add a new field based on the timestamp and message as follow:
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield", "message_count"]
}
mutate {
add_field => {"newfield" => "%{@timestamp}%{message}"}
}
}
The prune is only there to avoid processing unwanted data.
And this works just fine as I am getting a new field with those 2 values.
The next step was to run some aggregation based on specific content of the message, such as whether the message contains logged in or logged out,
and to do this I used the aggregate filter:
grok {
match => {
"message" => [
"(?<[@metadata][event_type]>logged out)",
"(?<[@metadata][event_type]>logged in)",
"(?<[@metadata][event_type]>workstation locked)"
]
}
}
aggregate {
task_id => "%{message}"
code => "
map['message_count'] ||= 0; map['message_count'] += 1;
"
push_map_as_event_on_timeout => true
timeout_timestamp_field => "@timestamp"
timeout => 60
inactivity_timeout => 50
timeout_tags => ['_aggregatetimeout']
}
}
This worked as expected, but I have a problem: when the aggregation times out, the only field populated for that aggregation is message_count.
As you can see in the above screenshot, newfield and message (the one on the far left; sorry, it didn't fit in the screenshot) are both empty.
For demonstration and testing purposes that is fine, but it will become unmanageable if I get hundreds of syslog messages per second without knowing which message each message_count refers to.
I am struggling here and don't know how to solve this. Can somebody please help me understand how I can fill newfield with the content of the message it refers to?
This is my whole logstash configuration to make it easier.
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield", "message_count"]
}
mutate {
add_field => {"newfield" => "%{@timestamp}%{message}"}
}
grok {
match => {
"message" => [
"(?<[@metadata][event_type]>logged out)",
"(?<[@metadata][event_type]>logged in)",
"(?<[@metadata][event_type]>workstation locked)"
]
}
}
aggregate {
task_id => "%{message}"
code => "
map['message_count'] ||= 0; map['message_count'] += 1;
"
push_map_as_event_on_timeout => true
timeout_timestamp_field => "@timestamp"
timeout => 60
inactivity_timeout => 50
timeout_tags => ['_aggregatetimeout']
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
csv {
path => "C:\Users\adminuser\Desktop\syslog\syslogs-%{+yyyy.MM.dd}.csv"
fields => ["timestamp", "message", "message_count", "newfield"]
}
}
push_map_as_event_on_timeout => true
When you use this and a timeout occurs, it creates a new event using the contents of the map. If you want fields from the original messages to be in the new event, you have to add them to the map. For the task_id there is a shorthand notation to do this using the timeout_task_id_field option on the filter; otherwise you have to add them explicitly:
map['newfield'] ||= event.get('newfield');
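Put together, the aggregate block from the question might end up looking like the sketch below. It assumes the task_id stays "%{message}", so timeout_task_id_field can write the message text back into the message field of the timeout event; newfield is copied into the map by hand.

```
aggregate {
  task_id => "%{message}"
  code => "
    map['message_count'] ||= 0; map['message_count'] += 1;
    # keep the value from the first event seen for this task
    map['newfield'] ||= event.get('newfield');
  "
  push_map_as_event_on_timeout => true
  # store the task_id (here the message text) in the 'message' field of the new event
  timeout_task_id_field => "message"
  timeout_timestamp_field => "@timestamp"
  timeout => 60
  inactivity_timeout => 50
  timeout_tags => ['_aggregatetimeout']
}
```

Using ||= means the map keeps the newfield of the first event in each task; a plain assignment would keep the last one instead.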

TypeError: no implicit conversion of Integer into String

This error occurs in my pipeline: TypeError: no implicit conversion of Integer into String
I am using CentOS 7 and installed Logstash with yum.
# Sample Logstash configuration for creating a simple
input {
jdbc {
jdbc_connection_string => "jdbc:oracle:thin:@0.0.0.0:1521:DB"
# The user we wish to execute our statement as
jdbc_user => "******"
jdbc_password => "*******"
# The path to our downloaded jdbc driver
jdbc_driver_library => "\etc\logstash\conf.d\jdbc\ojdbc7.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
# our query
statement => "select * from test"
}
}
output {
stdout { codec => json_lines }
elasticsearch { }
}
I expected it to insert data into Elasticsearch.
I had the same problem and solved it with the following steps:
Stop your Logstash service.
Add clean_run => true to your jdbc input.
Start your Logstash service again.
After this, Logstash discards its saved run state and works again.
Solution found thanks to chu (https://discuss.elastic.co/t/error-registering-plugin-pipeline-aborted-due-to-error-typeerror-cant-dup-fixnum-failed-to-execute-action/128987/4)
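The second step might look like the sketch below. Only the clean_run line is new; the other settings are elided here and stay as they are in the question.

```
input {
  jdbc {
    # ... connection, driver and statement settings as in the question ...
    # do not reuse the run state saved by previous runs
    clean_run => true
  }
}
```

If the pipeline relies on incremental tracking of the last run, it may make sense to remove the option again once Logstash starts cleanly, since it resets that state on every start.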

Logstash multiline input works on local but not on EC2 instance

I've tried to parse it using the json, json_lines and even the multiline input plugin, yet to no avail. The multiline works well on my local machine but doesn't seem to work on my s3 and ec2 instance.
How would I write the grok filter to parse this?
This is what my JSON file looks like
{
"sourceId":"94:54:93:3B:81:6F1",
"machineId":"c1VR21A0GoCBgU6EMJ78d3CL",
"columnsCSV":"timestamp,state,0001,0002,0003,0004",
"tenantId":"iugcp",
"valuesCSV":"1557920277890,1,98.66,0.07,0.1,0.17 ",
"timestamp":"2019-05-15T11:37:57.890Z"
}
This is my config -
input {
file{
codec => multiline
{
pattern => '^\{'
negate => true
what => previous
}
path => "/home/*myusername*/Desktop/data/*.json"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
mutate
{
replace => [ "message", "%{message}}" ]
gsub => [ 'message','\n','']
}
if [message] =~ /^{.*}$/
{
json { source => message }
}
}
//Output tag is correct, haven't included it here
The result I get is just the JSON file content in the "message" field.
What I want is a separate field in the document for every JSON key.

how to write the logstash output config to get ABC,AD_EF,123?

my log format is:
XXX: 03-20 17:52:28: XXX. * 0 XXX [XXX] [X XX: X]:XXX\tABC:AD_EF:123\t0\tXXXXXXXXXXXXXXXX\tXXXXXXXXXXXXXXXXXXX
how to write the logstash output config to get ABC, AD_EF, 123 ?
output example:
good,ABC,DEF,123
output {
file {
path => "/xxx/xxx/xxx/output.txt"
codec => plain {
format => "good,ABC,DEF,123" # how to write this regular expression????
}
flush_interval => 0
}
}
Your log output seems to have embedded tabs in it, and those tabs bracket your data. This is good, as it means the csv filter can pull that out for you.
filter {
csv {
separator => " "
columns => [ 'garbage1', 'good', 'garbage2', 'garbage3', 'garbage4' ]
source => "message"
}
}
Note, that is the actual tab character in there, which is hard to represent here.
You would then output the content of the good field to your file.
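A minimal sketch of that last step, assuming the csv filter above has put ABC:AD_EF:123 into the good field. The mutate gsub step and the line codec are additions for illustration and are not part of the answer above; the literal good prefix comes from the asker's example output.

```
filter {
  mutate {
    # turn ABC:AD_EF:123 into ABC,AD_EF,123
    gsub => [ "good", ":", "," ]
  }
}
output {
  file {
    path => "/xxx/xxx/xxx/output.txt"
    # the line codec writes one formatted line per event
    codec => line {
      format => "good,%{good}"
    }
  }
}
```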
Thanks for all the help, but maybe I described the problem badly.
In the end, I found the answer to my question:
filter {
grok {
match => {
"message" => "XXX\t(?<field1>\w+?):(?<field2>\w+?):(?<field3>\d+?)\t"
}
}}
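With those three captures in place, the output section from the question could be written along these lines. This is a sketch: only the format string is new, and it assumes the grok pattern matched so that field1, field2 and field3 all exist on the event.

```
output {
  file {
    path => "/xxx/xxx/xxx/output.txt"
    codec => line {
      # field1..field3 are the named captures from the grok pattern
      format => "good,%{field1},%{field2},%{field3}"
    }
  }
}
```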

How to call another filter from within a ruby filter in logstash.

I'm building out logstash and would like to build functionality to anonymize fields as specified in the message.
Given the message below, the field fta is a list of fields to anonymize. I would like to just use %{fta} and pass it through to the anonymize filter, but that doesn't seem to work.
{ "containsPII":"True", "fta":["f1","f2"], "f1":"test", "f2":"5551212" }
My config is as follows
input {
stdin { codec => json }
}
filter {
if [containsPII] {
anonymize {
algorithm => "SHA1"
key => "123456789"
fields => %{fta}
}
}
}
output {
stdout {
codec => rubydebug
}
}
The output is
{
"containsPII" => "True",
"fta" => [
[0] "f1",
[1] "f2"
],
"f1" => "test",
"f2" => "5551212",
"@version" => "1",
"@timestamp" => "2016-07-13T22:07:04.036Z",
"host" => "..."
}
Does anyone have any thoughts? I have tried several permutations at this point with no luck.
Thanks,
-D
EDIT:
After posting in the Elastic forums, I found out that this is not possible using base Logstash functionality. I will try using the ruby filter instead. So, to amend my question: how do I call another filter from within the ruby filter? I tried the following with no luck and honestly can't even figure out where to look. I'm very new to Ruby.
filter {
if [containsPII] {
ruby {
code => "event['fta'].each { |item| event[item] = LogStash::Filters::Anonymize.execute(event[item],'12345','SHA1') }"
add_tag => ["Rubyrun"]
}
}
}
You can execute other filters from a ruby script. The steps are:
Create the required filter instance in the init block of the inline ruby script.
For every event, call the filter method of the filter instance.
The following example addresses the problem above. It replaces the my_ip field in the event with its SHA1.
The same can be achieved using a ruby script file.
The following is a sample config file.
input { stdin { codec => json_lines } }
filter {
ruby {
init => "
require 'logstash/filters/anonymize'
# Create instance of filter with applicable parameters
@anonymize = LogStash::Filters::Anonymize.new({'algorithm' => 'SHA1',
'key' => '123456789',
'fields' => ['my_ip']})
# Make sure to call register
@anonymize.register
"
code => "
# Invoke the filter
@anonymize.filter(event)
"
}
}
output { stdout { codec => rubydebug {metadata => true} } }
Well, I wasn't able to figure out how to call another filter from within a ruby filter, but I did get to the functional goal.
filter {
if [fta] {
ruby {
init => "require 'openssl'"
code => "event['fta'].each { |item| event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item] ) }"
}
}
}
If the field fta exists, each field listed in that array is replaced with its HMAC-SHA256 hex digest.
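The event['field'] syntax used above was removed in Logstash 5.0 in favour of event.get and event.set. On a current version, the same idea might be sketched as follows. The nil check and the to_s conversion are additions, assuming some listed fields may be missing or non-string.

```
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "
        event.get('fta').each do |item|
          value = event.get(item)
          # skip fields that are listed in fta but absent from the event
          next if value.nil?
          event.set(item, OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), '123456789', value.to_s))
        end
      "
    }
  }
}
```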
