I am using the logstash-output-influxdb plugin to send events from Logstash to InfluxDB. The data_points configuration of the plugin looks like this:
data_points => {
"visitor" => 1
"lead" => 0
"category" => "%{[category]}"
"host" => "%{[host]}"
}
The problem is that the visitor and lead fields in InfluxDB are integers, and the above configuration results in the following error:
input field \\"visitor\\" on measurement \\"visitors_new\\" is type float, already exists as type integer.
The InfluxDB line protocol says that you have to append i to a number to mark it as an integer, so I changed my configuration to
data_points => {
"visitor" => "1i"
"lead" => "0i"
"category" => "%{[category]}"
"host" => "%{[host]}"
}
Now the error becomes
input field \\"visitor\\" on measurement \\"visitors_new\\" is type string, already exists as type integer
If I change the configuration to
data_points => {
"visitor" => 1i
"lead" => 0i
"category" => "%{[category]}"
"host" => "%{[host]}"
}
Logstash no longer accepts it as a valid configuration.
How can I send integer fields to InfluxDB using the logstash-output-influxdb plugin?
I suggest using the coerce_values => { } parameter to set your data types, rather than embedding line-protocol details in the number.
data_points => {
"visitor" => 1
"lead" => 0
"category" => "%{[category]}"
"host" => "%{[host]}"
}
coerce_values => {
"visitor" => "integer"
"lead" => "integer"
}
This tells the plugin that these fields are integers, which is more likely to succeed.
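The three error messages in the question follow from how the line protocol types a field value by its spelling. The sketch below is only an illustration of that rule, not the plugin's code; classify_field_value is a hypothetical helper, and it covers just the common spellings.

```ruby
# Hypothetical helper (not part of any plugin): reports which type InfluxDB
# line protocol would assign to a field value, judged by how it is written.
def classify_field_value(raw)
  case raw
  when /\A".*"\z/m                           then :string   # double-quoted
  when /\A-?\d+i\z/                          then :integer  # digits plus trailing i
  when /\A-?\d+(\.\d+)?([eE][+-]?\d+)?\z/    then :float    # bare number
  when /\A(t|T|true|True|TRUE|f|F|false|False|FALSE)\z/ then :boolean
  else :invalid
  end
end

bare    = classify_field_value('1')     # what a plain 1 ends up as
suffixed = classify_field_value('1i')   # what the measurement expects
quoted  = classify_field_value('"1i"')  # what "1i" in data_points produces
```

A bare 1 is read as a float and a quoted "1i" as a string, which matches the two type-conflict errors above; only an unquoted 1i is an integer, and producing that is what the coercion setting is for.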
I am currently working on a log monitoring system built on the Elastic stack. The logs I have to load are in a specific format, so I have to write my own Logstash configuration to read them. In one type of log, the date appears only at the start of the file and the timestamps on the remaining lines carry no date. My goal is to extract the date from the first line and add it to every following line. After some research I found that the aggregate filter might help, but I can't get it to work. Here is my config file:
input
{
file {
path => "F:/ELK/data/testFile.txt"
#path => "F:/ELK/data/*/request/*"
start_position => "beginning"
sincedb_path => "NUL"
}
}
filter
{
mutate {
add_field => { "taskId" => "all" }
}
grok
{
match => {"message" => "-- %{NOTSPACE} %{NOTSPACE}: %{DAY}, %{MONTH:month} %{MONTHDAY:day}, %{YEAR:year}%{GREEDYDATA}"}
tag_on_failure => ["not_date_line"]
}
if "not_date_line" not in [tags]
{
mutate{
replace => {'taskId' => "%{day}/%{month}/%{year}"}
remove_field => ["day","month","year"]
}
aggregate
{
task_id => "%{taskId}"
code => "map['taskId'] = event.get('taskId')"
map_action => "create"
}
}
else
{
dissect
{
mapping => { message => "%{sequence_index} %{time} %{pid} %{puid} %{stack_level} %{operation} %{params} %{op_type} %{form_event} %{op_duration}"}
}
aggregate {
task_id => "%{taskId}"
code => "event.set('taskId', map['taskId'])"
map_action => "update"
timeout => 0
}
mutate
{
strip => ["op_duration"]
replace => {"time" => "%{taskId}-%{time}"}
}
}
mutate
{
remove_field => ['@timestamp','host','@version','path','message','tags']
}
}
output
{
stdout{}
}
The config reads the date correctly, but it fails to replace the value in the other events:
{
"taskId" => "22/October/2020"
}
{
"pid" => "45",
"sequence_index" => "10853799",
"op_type" => "1",
"time" => "all-16:23:29:629",
"params" => "90",
"stack_level" => "0",
"op_duration" => "",
"operation" => "10",
"form_event" => "0",
"taskId" => "all",
"puid" => "1724"
}
I am using only one worker to ensure that the order of events is preserved. If you know of any other way to achieve this, I'm open to suggestions. Thank you!
For the lines that have a date you are setting the taskId to "%{day}/%{month}/%{year}", while for the rest of the lines you are leaving it as "all". The aggregate filter will not aggregate across events with different task ids.
I suggest you use a constant taskId and store the date in some other field, then in a single aggregate filter you can use something like
code => '
    date = event.get("date")
    if date
        @date = date
    else
        event.set("date", @date)
    end
'
@date is an instance variable, so its scope is limited to that aggregate filter, but it is preserved across events. It is not shared with other aggregate filters (that would require a class variable or a global variable).
Note that you require event order to be preserved, so you should set pipeline.workers to 1.
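The carry-forward idea can be tried outside Logstash. This is a minimal sketch under stated assumptions: DateCarrier is a hypothetical class standing in for one filter instance, and plain hashes stand in for events.

```ruby
# Hypothetical stand-in for a single filter instance: @date lives as long
# as the object does, so it survives from one event to the next.
class DateCarrier
  def initialize
    @date = nil
  end

  # event is a plain Hash here, standing in for a Logstash event.
  def filter(event)
    date = event["date"]
    if date
      @date = date            # date line: remember the date
    else
      event["date"] = @date   # any other line: stamp the remembered date
    end
    event
  end
end

carrier = DateCarrier.new
events = [
  { "date" => "22/October/2020" },
  { "time" => "16:23:29:629" },
  { "time" => "16:23:30:101" }
]
# Events must be processed in file order, one at a time.
stamped = events.map { |e| carrier.filter(e) }
```

The result depends on the date line being processed before the lines that follow it, which is why a single pipeline worker is needed.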
Thanks to @Badger and another post he answered on the Elastic forum, I found a solution using a single ruby filter and an instance variable. I couldn't get it to work with the aggregate filter, but that is not an issue for me.
ruby
{
    init => '@date = ""'
    code => "
        date = event.get('date').to_s
        if date.empty?
            event.set('date', @date) unless @date.empty?
        else
            @date = date
        end
    "
}
I'm new to grok and I have run into this issue that I just don't know how to solve.
Below is my grok match:
grok {
match => { "source" => "/var/log/nginx/sites/\b\w+\b/\b\w+\b/\b\w+\b/%{DATA:uuid}/" }
}
mutate {
add_field => {
"read_timestamp" => "%{@timestamp}"
"token" => "%{[fields][token]}"
"logzio_codec" => "%{[fields][logzio_codec]}"
"uuid" => "%{uuid}"
"type" => "%{[fields][type]}"
"category" => "%{[fields][category]}"
}
}
For some reason the uuid is matched but ends up as an array of two duplicated values. Instead of uuid_string I get [uuid_string, uuid_string].
I tried on https://grokdebug.herokuapp.com/ and got what I expected so I wonder what is wrong?
So once again I misunderstood how grok works. Once the match is done, all the captured fields are already added to the event. The additional add_field for uuid in the mutate therefore adds the field a second time, and Logstash turns it into an array.
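The effect is easy to reproduce with a small model. This is only a sketch of the behaviour described above, not Logstash code; add_field here is a hypothetical function and the event is a plain hash.

```ruby
# Hypothetical model: adding a value to a field that already exists
# turns the field into an array instead of overwriting it.
def add_field(event, name, value)
  if event.key?(name)
    event[name] = Array(event[name]) + [value]
  else
    event[name] = value
  end
  event
end

event = { "uuid" => "abc-123" }            # already set by the grok capture
add_field(event, "uuid", event["uuid"])    # the extra "uuid" => "%{uuid}"
add_field(event, "type", "nginx")          # a genuinely new field stays scalar
```

Dropping the "uuid" entry from add_field in the mutate leaves the single value that grok captured.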
In my Logstash I have below configuration:
filter {
mutate {
add_field => {
"doclength" => "%{size}"
}
convert => {"doclength" => "integer"}
remove_field => ["size"]
}
}
I intend to store the field "doclength" in Elasticsearch as an integer, but in ES its mapping shows as "string".
Not sure what I am missing in here, the expected behavior is not matching up with the actual one.
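Try this instead. In your version, add_field and remove_field are common options that are applied only after the mutate operations themselves have succeeded, so doclength does not exist yet when convert runs. Within a single mutate, rename is executed before convert, so convert should target the new field name.
filter {
mutate {
rename => { "size" => "doclength" }
convert => { "doclength" => "integer" }
}
}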
I'm using Elasticsearch with Logstash.
I want to update the index when the database changes, so I decided to use the Logstash schedule option. But every minute the table's records are appended to the index again.
Example: the contracts table has 2 rows.
After the first minute the total is 2; one minute later the total is 4.
How can I solve this?
Here is my config file. The command is bin/logstash -f contract.conf
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://localhost:5432/resource"
jdbc_user => "postgres"
jdbc_validate_connection => true
jdbc_driver_library => "/var/www/html/iltodgeree/logstash/postgres.jar"
jdbc_driver_class => "org.postgresql.Driver"
statement => "SELECT * FROM contracts;"
schedule => "* * * * *"
codec => "json"
}
}
output {
elasticsearch {
index => "resource_contracts"
document_type => "metadata"
hosts => "localhost:9200"
}
}
You need to modify your output by specifying the document_id setting and use the ID field from your contracts table. That way, you'll never get duplicates.
output {
elasticsearch {
index => "resource_contracts"
document_type => "metadata"
document_id => "%{ID_FIELD}"
hosts => "localhost:9200"
}
}
Also if you have an update timestamp in your contracts table, you can modify the SQL statement in your input like below in order to only copy the records that changed recently:
statement => "SELECT * FROM contracts WHERE timestamp > :sql_last_value;"
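Why a fixed document id stops the duplicates can be shown with a toy model. This is a sketch only: FakeIndex is a hypothetical in-memory stand-in for an index, and the rows and the "id" column name are made up for illustration.

```ruby
# Hypothetical in-memory stand-in for an index (not an Elasticsearch client).
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
    @auto_id = 0
  end

  # Without an id, every write gets a fresh id and becomes a new document.
  # With an id, a write replaces whatever was stored under that id.
  def index(doc, id: nil)
    id ||= "auto-#{@auto_id += 1}"
    @docs[id] = doc
  end
end

# Made-up rows standing in for the two rows of the contracts table.
rows = [{ "id" => 1, "name" => "a" }, { "id" => 2, "name" => "b" }]

without_id = FakeIndex.new
2.times { rows.each { |r| without_id.index(r) } }               # two scheduled runs

with_id = FakeIndex.new
2.times { rows.each { |r| with_id.index(r, id: r["id"].to_s) } } # same two runs
```

After two runs the first index holds 4 documents, as in the question, while the second still holds 2 because each run overwrites the same ids.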
I have a file named "Job Code.txt"
job_id=0001,description=Ship data from server to elknode1,result=OK
job_id=0002,description=Ship data from server to elknode2,result=Error: Msg...
job_id=0003,description=Ship data from server to elknode3,result=OK
job_id=0004,description=Ship data from server to elknode4,result=OK
Here is the filter part of my .conf file, but it doesn't work. How can I create new fields, i.e. jobID, description and result, so that they can be seen in Kibana?
filter{
grok{ match => {"message" => ["JobID: %{NOTSPACE:job_id}","description: %{NOTSPACE:description}","result: %{NOTSPACE:message}"]}
add_field => {
"JobID" => "%{job_id}"
"Description" => "%{description}"
"Message" => "%{message}"
}
}
if [job_id] == "0001" {
aggregate {
task_id => "%{job_id}"
code => "map['time_elasped']=0"
map_action => "create"
}
}
if [job_id] == "0003" {
aggregate {
task_id => "%{job_id}"
code => "map['time_elasped']=0"
map_action => "update"
}
}
if [job_id] == "0002" {
aggregate {
task_id => "%{job_id}"
code => "map['time_elasped']=0"
map_action => "update"
}
}
I know this is a couple of days old, but perhaps you still need an answer. Change your grok statement to:
grok {
match => { "message" => "job_id=%{DATA:job_id},description=%{DATA:description},result=%{GREEDYDATA:message}" }
}
You won't need the add_field option, grok will create them for you. The add_field option is to add arbitrary fields. Check the pattern at https://grokdebug.herokuapp.com
Also, unless there are other messages you want to match, I don't think the aggregate statements you have will do what you want.
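To check the shape of the pattern against the sample lines without a running pipeline, it can be approximated with a plain Ruby regular expression. This is a rough equivalent under stated assumptions: DATA is written as a lazy .*? and GREEDYDATA as .*, the anchors are added here and are not part of the grok pattern, and the last capture is named result purely for illustration. JOB_LINE and parse_job_line are hypothetical names.

```ruby
# Rough Ruby equivalent of the grok pattern above (an approximation, not grok).
JOB_LINE = /\Ajob_id=(?<job_id>.*?),description=(?<description>.*?),result=(?<result>.*)\z/

# Returns a Hash of captured fields, or nil when the line does not match.
def parse_job_line(line)
  m = JOB_LINE.match(line)
  m && m.named_captures
end

parsed = parse_job_line(
  "job_id=0002,description=Ship data from server to elknode2,result=Error: Msg..."
)
unmatched = parse_job_line("JobID: 0002")  # the shape the original pattern expected
```

The second call returns nil, which mirrors why the original "JobID: ..." patterns never matched lines written as job_id=...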