I have a table, e.g. QueryConfigTable, that holds a query in one of its columns, e.g. select * from customertable. I want the query held in that column to be executed as the JDBC input statement in Logstash.
Instead, Logstash takes the column value as plain text and stores it into Elasticsearch.
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/MYDB"
    # MYDB will be set dynamically.
    jdbc_user => "mysql"
    parameters => { "favorite_artist" => "Beethoven" }
    schedule => "* * * * *"
    statement => "SELECT * from QueryConfigTable"
  }
}
# output to Elasticsearch
output {
  elasticsearch {
    hosts => ["http://my-host.com:9200"]
    index => "test"
  }
}
The final output is:
"_index": "test",
"_type": "doc",
"_source": {
  "product": "PL SALARIED AND SELF EMPLOYED",
  "@version": "1",
  "query": "select * from customertable cust where cust.isdeleted !=0"
}
But I want the query value, i.e. "select * from customertable cust where cust.isdeleted !=0", to be executed as the JDBC input statement in Logstash.
The jdbc input will not do this kind of indirection for you. You could write a stored procedure that fetches and executes the SQL and call that from the jdbc input.
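For example, a rough sketch of that stored-procedure approach on MySQL. The procedure name run_config_query and the assumption that the query text lives in a column named query are mine, not taken from your schema:

DELIMITER $$
CREATE PROCEDURE run_config_query()
BEGIN
  -- fetch the stored SQL text into a user variable
  SELECT query INTO @dyn_sql FROM QueryConfigTable LIMIT 1;
  -- run it as a prepared statement; its result set is what the procedure returns
  PREPARE stmt FROM @dyn_sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END $$
DELIMITER ;

The jdbc input would then use statement => "CALL run_config_query()", so the rows produced by the stored query, rather than the query text itself, are what ends up in Elasticsearch.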
I inherited a logstash config as follows. I do not want to make major changes to it because I do not want to break anything that is working. The metrics are sent as logs with JSON in the format "metric": "metricname", "value": "int". This has been working great. However, there is a requirement to have a string in value for a new metric. It is not really a metric, but a string indicating the state of the processing. With the following filter, everything is converted to an integer, and any string in value will be converted to 0. The requirement is that if the value is a string, it shouldn't attempt to convert it. Thank you!
input {
beats {
port => 5044
}
}
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:ts} - M_%{DATA:task}_%{NUMBER:thread} - INFO - %{GREEDYDATA:jmetric}"}
remove_field => [ "message", "ecs", "original", "agent", "log", "host", "path" ]
break_on_match => false
}
if "_grokparsefailure" in [tags] {
drop {}
}
date {
match => ["ts", "ISO8601"]
target => "#timestamp"
}
json {
source => "jmetric"
remove_field => "jmetric"
}
split {
field => "points"
add_field => {
"metric" => "%{[points][metric]}"
"value" => "%{[points][value]}"
}
remove_field => [ "points", "event", "tags", "ts", "stream", "input" ]
}
mutate {
convert => { "value" => "integer" }
convert => { "thread" => "integer" }
}
}
You should use index mappings for this mainly.
Even if you handle things in logstash, elasticsearch will - if configured with the defaults - do dynamic mapping, which may work against any configuration you do in logstash.
See Elasticsearch index templates
An index template is a way to tell Elasticsearch how to configure an index when it is created.
...
Index templates can contain a collection of component templates, as well as directly specify settings, mappings, and aliases.
Mappings are per index! This means that when you apply a new mapping, you will have to create a new index. You can "rollover" to a new index, or delete and re-import your data. What you do depends on your data, how you receive it, etc. YMMV...
No matter what, if your index has the wrong mapping you will need to create a new index to get the new mapping.
PS: If you have a lot of legacy data, take a look at the reindex API for Elasticsearch.
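As a minimal sketch of such a template (this assumes Elasticsearch 7.8+ composable index templates; the template name, index pattern, and the choice to map value as keyword are assumptions on my part), mapping value explicitly as keyword stores both the numeric metrics and the new state strings as-is, instead of letting dynamic mapping guess a numeric type from the first document:

PUT _index_template/metrics
{
  "index_patterns": ["metrics-*"],
  "template": {
    "mappings": {
      "properties": {
        "metric": { "type": "keyword" },
        "value":  { "type": "keyword" },
        "thread": { "type": "integer" }
      }
    }
  }
}

Note that the mutate convert in the pipeline above would still turn a string value into 0 before it is indexed, so that convert would also need to be removed or guarded on the Logstash side.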
With Logstash I am trying to Extract some tables, Transform them locally on the Logstash machine, and then Load the result into Elasticsearch. The reason for this solution is the very limited computing power on the source server, a MariaDB.
I have tested the input{} separately, it works, so the connection to the mariadb is sound.
I have tested the jdbc_static filter against a Microsoft SQL server, so Logstash has write privileges in its current environment.
I have tested the SQL syntax on the MariaDB server directly
I'm running logstash 6.8 and java 8 (java version "1.8.0_211")
I have tried earlier versions of the MariaDB JDBC client
(mariadb-java-client-2.4.2.jar, mariadb-java-client-2.2.6-sources,
mariadb-java-client-2.3.0-sources)
My config file
input {
jdbc {
jdbc_driver_library => "C:/Logstash/logstash-6.8.0/plugin/mariadb-java-client-2.4.2.jar"
jdbc_driver_class => "Java::org.mariadb.jdbc.Driver"
jdbc_connection_string => "jdbc:mariadb://xx.xx.xx
jdbc_user => "me"
jdbc_password => "its secret"
schedule => "* * * * *"
statement => "SELECT unqualifiedversionid__ FROM AuditEventFHIR WHERE myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'"
}
}
filter {
jdbc_static {
loaders => [
{
id => "auditevent"
query => "SELECT
myUnqualifiedId
,unqualifiedversionid__
,type_
FROM AuditEventFHIR
where myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'
"
local_table => "l_ae"
}
]
local_db_objects => [
{
name => "l_ae"
index_columns => ["myUnqualifiedId"]
columns => [
["myUnqualifiedId", "varchar(256)"],
["unqualifiedversionid__", "varchar(24)"],
["type_", "varchar(256)"]
]
}
]
local_lookups => [
{
id => "rawlogfile"
query => "
select myUnqualifiedId from l_ae
"
target => "sql_output"
}
]
jdbc_driver_library => "C:/Logstash/logstash-6.8.0/plugin/mariadb-java-client-2.4.2.jar"
jdbc_driver_class => "Java::org.mariadb.jdbc.Driver"
jdbc_connection_string => "jdbc:mariadb://xx.xx.xx.xx"
jdbc_user => "me"
jdbc_password => "its secret"
}
}
output {
stdout { codec => rubydebug }
}
I am getting this and several other errors, but I suspect fixing the first will fix the rest. The key point is that the words "LIMIT 1" appear nowhere in my code.
[ERROR][logstash.filters.jdbc.readonlydatabase] Exception occurred when executing loader Jdbc query count {:exception=>"Java::JavaSql::SQLSyntaxErrorException: (conn=1490) You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '\"T1\" LIMIT 1' at line 8", :backtrace=>["org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(org/mariadb/jdbc/internal/util/exceptions/ExceptionMapper.java:242)", "org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(org/mariadb/jdbc/internal/util/exceptions/ExceptionMapper.java:171)", "org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(org/mariadb/jdbc/MariaDbStatement.java:248)", "org.mariadb.jdbc.MariaDbStatement.executeInternal(org/mariadb/jdbc/MariaDbStatement.java:338)", "org.mariadb.jdbc.MariaDbStatement.executeQuery(org/mariadb/jdbc/MariaDbStatement.java:512)", "java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)", "org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:425)", "org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:292)"]}
The jdbc_static loader makes a hidden SQL query, select count(*) from table limit 1, to get a checksum when downloading the table. This query contains double quotes (") around the table alias (the "T1" in the error), and MariaDB does not accept that by default,
UNLESS you add 'ANSI_QUOTES' to the sql_mode.
Batch command
SET GLOBAL sql_mode = 'ANSI_QUOTES'
Another option is to set the session to allow ansi_quotes
jdbc_connection_string => "jdbc:mariadb://xx.xx.xx/databasename?sessionVariables=sql_mode=ANSI_QUOTES"
I need to collect metrics from a URL. The format of the metrics is like this:
# HELP base:classloader_total_loaded_class_count Displays the total number of classes that have been loaded since the Java virtual machine has started execution.
# TYPE base:classloader_total_loaded_class_count counter
base:classloader_total_loaded_class_count 23003.0
I'd need to exclude, from the events collected, all lines which begin with a '#' character.
So I have arranged for the following configuration file:
input {
http_poller {
urls => {
pool_metrics => {
method => "get"
url => "http://localhost:10090/metrics"
headers => {
"Content-Type" => "text/plain"
}
}
}
request_timeout => 30
schedule => { cron => "* * * * * UTC"}
codec => multiline {
pattern => "^#"
negate => "true"
what => previous
}
type => "server_metrics"
}
}
output {
elasticsearch {
# An index is created for each type of metrics input
index => "logstash-%{type}"
}
}
Unfortunately, when I check the collected data through Elasticsearch, I see it's not really what I was expecting. For example:
{
"_index" : "logstash-server_metrics",
"_type" : "doc",
"_id" : "2egAvWcBwbQ9kTetvX2o",
"_score" : 1.0,
"_source" : {
"type" : "server_metrics",
"tags" : [
"multiline"
],
"message" : "# TYPE base:gc_ps_scavenge_count counter\nbase:gc_ps_scavenge_count 24.0",
"#version" : "1",
"#timestamp" : "2018-12-17T16:30:01.009Z"
}
},
So it seems that the lines with '#' aren't skipped but appended to the next line from the metrics.
Can you recommend any way to fix it?
The multiline codec doesn't work this way. It merges the events into a single event, appending the lines that don't match ^# as you have observed.
I don't think it's possible to drop messages with a codec; you'll have to use the drop filter instead.
First remove the codec from your input configuration, then add this filter part to your configuration:
filter {
if [message] =~ "^#" {
drop {}
}
}
Using conditionals, if the message matches ^#, the event will be dropped by the drop filter, as you wanted.
Database
Given the following PostgreSQL table test (some columns omitted, e.g. data which is used in the pipeline):
id (uuid) | updated_at (timestamp with time zone)
652d88d3-e978-48b1-bd0f-b8188054a920 | 2018-08-08 11:02:00.000000
50cf7942-cd18-4730-a65e-fc06f11cfd1d | 2018-08-07 15:30:00.000000
Logstash
Given Logstash 6.3.2 (via Docker) with the following pipeline (jdbc_* omitted):
input {
jdbc {
statement => "SELECT id, data, updated_at FROM test WHERE updated_at > :sql_last_value"
schedule => "* * * * *"
use_column_value => true
tracking_column => "updated_at"
tracking_column_type => "timestamp"
}
}
filter {
mutate { remove_field => "updated_at" }
}
output {
elasticsearch {
hosts => "elasticsearch:9200"
index => "test"
document_id => "%{id}"
}
}
Problem
When this pipeline runs the very first time (or with clean_run => true) I'd expect it to process both database rows (because sql_last_value is 1970-01-01 00:00:00.000000) and set the value of the tracking column stored in .logstash_jdbc_last_run to 2018-08-08 11:02:00.000000000 Z (= the latest of all found updated_at timestamps). It'll be set to 2018-08-07 15:30:00.000000000 Z though, which is the earlier of the two given timestamps. This means that in the 2nd run the other of the two rows will be processed again, even if it hasn't changed.
Is this the expected behaviour? Do I miss some other configuration which controls this aspect?
Edit
It seems that the updated_at of the very last row returned is used (I just tried it with more rows). So I'd have to add an ORDER BY updated_at ASC, which I believe isn't that great in terms of DB query performance.
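For reference, a sketch of the input with that ordering added (everything except the ORDER BY is taken from the pipeline above; jdbc_* connection options still omitted):

input {
  jdbc {
    statement => "SELECT id, data, updated_at FROM test WHERE updated_at > :sql_last_value ORDER BY updated_at ASC"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}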
Logs, etc.
sh-4.2$ cat .logstash_jdbc_last_run
cat: .logstash_jdbc_last_run: No such file or directory
[2018-08-09T14:38:01,540][INFO ][logstash.inputs.jdbc ] (0.001254s) SELECT id, data, updated_at FROM test WHERE updated_at > '1970-01-01 00:00:00.000000+0000'
sh-4.2$ cat .logstash_jdbc_last_run
--- 2018-08-07 15:30:00.000000000 Z
[2018-08-09T14:39:00,335][INFO ][logstash.inputs.jdbc ] (0.001143s) SELECT id, data, updated_at FROM test WHERE updated_at > '2018-08-07 15:30:00.000000+0000'
sh-4.2$ cat .logstash_jdbc_last_run
--- 2018-08-08 11:02:00.000000000 Z
[2018-08-09T14:40:00,104][INFO ][logstash.inputs.jdbc ] (0.000734s) SELECT id, data, updated_at FROM test WHERE updated_at > '2018-08-08 11:02:00.000000+0000'
sh-4.2$ cat .logstash_jdbc_last_run
--- 2018-08-08 11:02:00.000000000 Z
I experienced the same problem last month, going from MySQL to ES, but in the end it was solved. The file .logstash_jdbc_last_run is created in your home directory by default. You can change the path of this file by setting the last_run_metadata_path config option. I am using the UTC date format.
The first time, sql_last_value is 1970-01-01 00:00:00.000000. The date written to the .logstash_jdbc_last_run file is taken from the first record returned by MySQL; that is why I use order by updated_at DESC. The following code worked for me.
input {
jdbc {
jdbc_default_timezone => "UTC"
statement => "SELECT id, data, DATE_FORMAT(updated_at, '%Y-%m-%d %T') as updated_at, FROM test WHERE updated_at > :sql_last_value order by update_at DESC"
schedule => "* * * * * *"
use_column_value => true
tracking_column => "updated_at"
tracking_column_type => "timestamp"
last_run_metadata_path => "/home/logstash_track_date/.logstash_user_jdbc_last_run"
}
}
filter {
mutate { remove_field => "updated_at" }
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => "elasticsearch:9200"
index => "test"
document_id => "%{id}"
}
}
I want to use Elasticsearch within my project. I am using Node.js and PostgreSQL.
I want to connect PostgreSQL with Elasticsearch. For this I am using jdbc-importer. I followed the steps written in their docs to connect it with PostgreSQL and I succeeded, but only through the command line.
I want to use jdbc-importer within my project through Node.js.
Command-line code to run the JDBC importer:
bin=/Users/mac/Documents/elasticsearch-jdbc-2.3.4.1/bin
lib=/Users/mac/Documents/elasticsearch-jdbc-2.3.4.1/lib
echo '{
"type" : "jdbc",
"jdbc" : {
"url" :
"jdbc:postgresql://localhost:5432/development",
"sql" : "select * from \"Products\"",
"index" : "product",
"type" : "product",
"elasticsearch" : {
"cluster" : "elasticsearch",
"host" : "localhost",
"port" : 9300
}
}
}' | java \
-cp "${lib}/*" \
org.xbib.tools.Runner \
org.xbib.tools.JDBCImporter
The above command has created the index product in Elasticsearch, and it also holds the data from the Products table in PostgreSQL.
Now I want to use that JDBC importer through Node.js. If anyone knows a more efficient way to manage my PostgreSQL data in Elasticsearch, they are also welcome to give an answer.
You can use Logstash:
https://www.elastic.co/blog/logstash-jdbc-input-plugin
With it you can transfer data from Postgres into ElasticSearch. This is my config file:
input {
jdbc {
# Postgres jdbc connection string to our database, mydb
jdbc_connection_string => "jdbc:postgresql://localhost:5432/MyDB"
# The user we wish to execute our statement as
jdbc_user => "postgres"
jdbc_password => ""
# The path to our downloaded jdbc driver
jdbc_driver_library => "postgresql-42.1.1.jar"
# The name of the driver class for Postgresql
jdbc_driver_class => "org.postgresql.Driver"
# our query
statement => "SELECT * from contacts"
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "contacts"
document_type => "contact"
document_id => "%{uid}"
}
}
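To try this out, save the pipeline to a file (the name postgres-to-es.conf here is just an example) together with the PostgreSQL JDBC driver jar referenced by jdbc_driver_library, and run it from the Logstash installation directory:

bin/logstash -f postgres-to-es.conf

Since no schedule is set, the statement runs exactly once and the rows are indexed into contacts.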