Database
Given the following PostgreSQL table test (some columns omitted, e.g. data which is used in the pipeline):
id (uuid) | updated_at (timestamp with time zone)
652d88d3-e978-48b1-bd0f-b8188054a920 | 2018-08-08 11:02:00.000000
50cf7942-cd18-4730-a65e-fc06f11cfd1d | 2018-08-07 15:30:00.000000
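For context, a minimal DDL sketch matching the columns shown (the type of the omitted data column is an assumption):

-- hypothetical definition; only id and updated_at are confirmed above
CREATE TABLE test (
    id         uuid PRIMARY KEY,
    data       text,                       -- omitted above, used in the pipeline
    updated_at timestamp with time zone
);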
Logstash
Given Logstash 6.3.2 (via Docker) with the following pipeline (jdbc_* omitted):
input {
  jdbc {
    statement => "SELECT id, data, updated_at FROM test WHERE updated_at > :sql_last_value"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}
filter {
  mutate { remove_field => "updated_at" }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "test"
    document_id => "%{id}"
  }
}
Problem
When this pipeline runs for the very first time (or with clean_run => true), I'd expect it to process both database rows (because sql_last_value is 1970-01-01 00:00:00.000000) and to set the tracking-column value stored in .logstash_jdbc_last_run to 2018-08-08 11:02:00.000000000 Z (the latest of all found updated_at timestamps). However, it is set to 2018-08-07 15:30:00.000000000 Z, the earlier of the two timestamps. This means that in the second run the other of the two rows is processed again, even though it hasn't changed.
Is this the expected behaviour? Am I missing some other configuration option that controls this aspect?
Edit
It seems that the updated_at of the very last row returned is used (I just tried it with more rows). So I'd have to add an ORDER BY updated_at ASC, which I believe isn't great in terms of DB query performance.
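For reference, a minimal sketch of what that would look like (the jdbc_* connection settings are still omitted, and an index on updated_at is assumed to keep the sort cheap):

input {
  jdbc {
    # identical to the pipeline above; only the statement gains the ORDER BY
    statement => "SELECT id, data, updated_at FROM test WHERE updated_at > :sql_last_value ORDER BY updated_at ASC"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}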
Logs, etc.
sh-4.2$ cat .logstash_jdbc_last_run
cat: .logstash_jdbc_last_run: No such file or directory
[2018-08-09T14:38:01,540][INFO ][logstash.inputs.jdbc ] (0.001254s) SELECT id, data, updated_at FROM test WHERE updated_at > '1970-01-01 00:00:00.000000+0000'
sh-4.2$ cat .logstash_jdbc_last_run
--- 2018-08-07 15:30:00.000000000 Z
[2018-08-09T14:39:00,335][INFO ][logstash.inputs.jdbc ] (0.001143s) SELECT id, data, updated_at FROM test WHERE updated_at > '2018-08-07 15:30:00.000000+0000'
sh-4.2$ cat .logstash_jdbc_last_run
--- 2018-08-08 11:02:00.000000000 Z
[2018-08-09T14:40:00,104][INFO ][logstash.inputs.jdbc ] (0.000734s) SELECT id, data, updated_at FROM test WHERE updated_at > '2018-08-08 11:02:00.000000+0000'
sh-4.2$ cat .logstash_jdbc_last_run
--- 2018-08-08 11:02:00.000000000 Z
I have been experiencing the same problem since last month, going from MySQL to ES, but in the end it was solved. The file .logstash_jdbc_last_run is created in your home directory by default. You can change the path of this file with the last_run_metadata_path config option. I am using the UTC date format.
The first time, sql_last_value is 1970-01-01 00:00:00.000000. The date written to the .logstash_jdbc_last_run file is taken from the first record returned by MySQL, which is why I use ORDER BY updated_at DESC. The following code worked for me.
input {
  jdbc {
    jdbc_default_timezone => "UTC"
    statement => "SELECT id, data, DATE_FORMAT(updated_at, '%Y-%m-%d %T') as updated_at FROM test WHERE updated_at > :sql_last_value ORDER BY updated_at DESC"
    schedule => "* * * * * *"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/home/logstash_track_date/.logstash_user_jdbc_last_run"
  }
}
filter {
  mutate { remove_field => "updated_at" }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "test"
    document_id => "%{id}"
  }
}
Related
Below are the timestamps in my logfiles, which live in my S3 bucket.
[2019-10-17 10:23:02.021 GMT] ***** ImpEx process 'CQExport' FINISHED (status: OK Details: error=OK, id: 1571307782013). *****
[2019-11-27 00:15:01.799 GMT] DEBUG []Starting DR Backup
I want @timestamp on the Kibana dashboard to reflect the logfile timestamp.
For example, I want to replace/visualise the time Dec 16, 2019 @ 20:04:57.524 with the logfile timestamp [2019-10-17 14:21:05.301 GMT] on the Kibana dashboard.
Below is the snippet I have configured, but I am unable to see the logfile timestamp.
filter {
  grok {
    match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp}" }
  }
  date {
    match => [ "timestamp" , "ISO8601" ]
    target => "@logtimestamp"
    locale => "en"
    timezone => "UTC"
  }
}
What Time Filter field name did you choose when creating your index?
Try the conf below, where the target is @timestamp:
filter {
  grok {
    match => { "message" => "\[(?<timestamp>%{TIMESTAMP_ISO8601}) (?<TZ>GMT)\]" }
  }
  date {
    match => [ "timestamp" , "ISO8601" ]
    target => "@timestamp"
    locale => "en"
    timezone => "UTC"
  }
}
I have a table, e.g. QueryConfigTable, that holds a query in one column, e.g. select * from customertable. I want the query stored in that column to be executed as the JDBC input statement in Logstash.
Instead, it is taking the column's query as a value and storing it into Elasticsearch.
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/MYDB"
    # MYDB will be set dynamically.
    jdbc_user => "mysql"
    parameters => { "favorite_artist" => "Beethoven" }
    schedule => "* * * * *"
    statement => "SELECT * from QueryConfigTable "
  }
}
# output to Elasticsearch
output {
  elasticsearch {
    hosts => ["http://my-host.com:9200"]
    index => "test"
  }
}
The final output is:
{
  "_index": "test",
  "_type": "doc",
  "_source": {
    "product": "PL SALARIED AND SELF EMPLOYED",
    "@version": "1",
    "query": "select * from customertable cust where cust.isdeleted !=0"
  }
}
But I want the query value, i.e. "select * from customertable cust where cust.isdeleted !=0", to be executed as the JDBC input statement in Logstash.
The jdbc input will not do this kind of indirection for you. You could write a stored procedure that fetches and executes the SQL and call that from the jdbc input.
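For illustration only, a rough MySQL sketch of that idea; the procedure name run_config_query and the single-row assumption are made up here, not part of the original setup:

DELIMITER $$
CREATE PROCEDURE run_config_query()
BEGIN
  -- assumes QueryConfigTable holds the SQL text in a column named query
  SELECT query INTO @dyn_sql FROM QueryConfigTable LIMIT 1;
  PREPARE stmt FROM @dyn_sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

The jdbc input could then use something like statement => "CALL run_config_query()", so that the rows produced by the stored query are what get indexed.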
I am using logstash 6.2.4 with the following config:
input {
  stdin { }
}
filter {
  date {
    match => [ "message","HH:mm:ss" ]
  }
}
output {
  stdout { }
}
With the following input:
10:15:20
I get this output:
{
  "message" => "10:15:20",
  "@version" => "1",
  "host" => "DESKTOP-65E12L2",
  "@timestamp" => 2019-01-01T09:15:20.000Z
}
I have just the time information, but would like to parse it as of the current date.
Note that the current date is 1 March 2019, so I guess that 2019-01-01 is some sort of default?
How can I parse the time information and add the current date to it?
I am not really interested in any replace or other blocks since, according to the documentation, parsing the time should default to the current date.
You need to add a new field that merges the current date with the field containing your time information (in your example, the message field). Your date filter then needs to match against this new field. You can do this with the following configuration.
filter {
  mutate {
    add_field => { "current_date" => "%{+YYYY-MM-dd} %{message}" }
  }
  date {
    match => ["current_date", "YYYY-MM-dd HH:mm:ss" ]
  }
}
The result will be something like this:
{
  "current_date" => "2019-03-03 10:15:20",
  "@timestamp" => 2019-03-03T13:15:20.000Z,
  "host" => "elk",
  "message" => "10:15:20",
  "@version" => "1"
}
I am working with Elasticsearch and Logstash, catching updates from an Oracle database into Elasticsearch.
My problem: how do I configure the sql_last_start UTC time parameter to work with an Oracle timestamp?
This is my configuration:
input {
  jdbc {
    .
    .
    .
    statement => "select * from cm.ELSAYED WHERE 'TIMESTAMP' > ':sql_last_start'"
  }
}
filter {
  date {
    match => [ "TIMESTAMP", "YYYY-MM-dd HH:mm:ss.ssssssssssssss Z" ]
    target => "TIMESTAMP"
    timezone => "UTC"
  }
}
I think jdbc_default_timezone for the input may help you.
Add jdbc_default_timezone => "UTC" to the input, as sketched below.
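A minimal sketch of how that could look (the jdbc_* connection settings are still omitted; it also assumes a recent jdbc input, where the built-in parameter is :sql_last_value and must not be wrapped in quotes):

input {
  jdbc {
    .
    .
    .
    jdbc_default_timezone => "UTC"
    # the column is double-quoted here because TIMESTAMP is also an Oracle keyword
    statement => 'SELECT * FROM cm.ELSAYED WHERE "TIMESTAMP" > :sql_last_value'
  }
}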
I want to create ES indices based on the dates found in the logfile. I am using the Logstash csv filter to process the logs. For instance, the log data appears like below:
2016-02-21 00:02:32.238,123.abc.com,data
2016-02-22 00:04:40.145,345.abc.com,data
Below is the Logstash configuration file. Obviously the index will be created as testlog; however, I want the indices to be created as testlog-2016.02.21 and testlog-2016.02.22, given that YYYY.MM.dd is the Logstash-preferred format for index dates. I have done this with grok filters, and I am trying to achieve the same with csv, but this doesn't seem to work.
filter {
  csv {
    columns => [ "timestamp", "host", "data" ]
    separator => ","
    remove_field => ["message"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "testlog"
  }
}
We are on Logstash 2.1.0, ES 2.1.0 and Kibana 4.3.0.
Any input is appreciated.
You need to set the @timestamp field in the filter, and you also need to specify your index name, as below:
filter {
  csv {
    columns => [ "timestamp", "host", "data" ]
    separator => ","
    remove_field => ["message"]
  }
  date {
    match => [ "timestamp", "ISO8601" ] # specify timestamp format
    timezone => "UTC"                   # specify timezone
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "testlog-%{+YYYY.MM.dd}"
  }
}
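For the sample rows above, the events would then land in testlog-2016.02.21 and testlog-2016.02.22 respectively.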