Logstash Dynamically assign template - logstash

I have read that it is possible to assign dynamic names to the indexes like this:
elasticsearch {
cluster => "logstash"
index => "logstash-%{clientid}-%{+YYYY.MM.dd}"
}
What I am wondering is if it is possible to assign the template dynamically as well:
elasticsearch {
cluster => "logstash"
template => "/etc/logstash/conf.d/%{clientid}-template.json"
}
Also where does the variable %{clientid} come from?
Thanks!

After some testing and feedback from other users, thanks Ben Lim, it seems this is not possible to do so far.
The closest thing would be to do something like this:
if [type] == "redis-input" {
elasticsearch {
cluster => "logstash"
index => "%{type}-logstash-%{+YYYY.MM.dd}"
template => "/etc/logstash/conf.d/elasticsearch-template.json"
template_name => "redis"
}
} else if [type] == "syslog" {
elasticsearch {
cluster => "logstash"
index => "%{type}-logstash-%{+YYYY.MM.dd}"
template => "/etc/logstash/conf.d/syslog-template.json"
template_name => "syslog"
}
}

Full disclosure: I am a Logstash developer at Elastic
You cannot dynamically assign a template because templates are uploaded only once, at Logstash initialization. Without the flow of traffic, deterministic variable completion does not happen. Since there is no traffic flow during initialization, there is nothing there which can "fill in the blank" for %{clientid}.
It is also important to remember that Elasticsearch index templates are only used when a new index is created, and so it is that templates are not uploaded every time a document reached the Elasticsearch output block in Logstash--can you imagine how much slower it would be if Logstash had to do that? If you intend to have multiple templates, they need to be uploaded to Elasticsearch before any data gets sent there. You can do this with a script of your own making using curl and Elasticsearch API calls. This also permits you to update templates without having to restart Logstash. You could run the script any time before index rollover, and when the new indices get created, they'll have the new template settings.
Logstash can send data to a dynamically configured index name, just as you have above. If there is no template present, Elasticsearch will create a best-guess mapping, rather than what you wanted. Templates can and ought to be completely independent of Logstash. This functionality was added for an improved out-of-the-box experience for brand new users. The default template is less than ideal for advanced use cases, and Logstash is not a good tool for template management if you have more than one index template.

Related

How to add hash the whole content of an event in Logstash for OpenSearch?

the problem is the following: I'm investigating how to add some anti-tampering protection to events stored in OpenSearch that are parsed and sent there by Logstash. Info is composed of application logs collected from several hosts. The idea is to add a hashed field that's linked to the original content so that any modification of the fields break the hash result and can be detected.
Currently, we have in place some grok filters that extract information from the received log lines and store it into different fields using several patterns. To make it more difficult for an attacker who modifies these logs to cover their tracks, I'm thinking of adding an extra field where the whole line is hashed and salted before splitting.
Initial part of my filter config is like this. It was used primarily with ELK, but our project will be switching to OpenSearch:
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:mytimestamp} (\[)*%{LOGLEVEL:loglevel}(\])* %{JAVACLASS:javaclass}(.)*(\[/])* %{DATA:component} %{DATA:version} - %{GREEDYDATA:message}"}
overwrite => [ "message" ]
overwrite => [ "version" ]
break_on_match => false
keep_empty_captures => true
}
// do more stuff
}
OpenSearch has some info on Field masking, but this is not exactly what I am after.
If any of you could help me with a pointer or an idea on how to do this. I don't know whether the hash fields available in ELK are also available in OpenSearch, or whether the Logstash plugin that does the hashing of fields would be usable without licensing issues. But maybe there are other and better options that I am not aware. I was looking for info on how to call an external script to do this during the filter execution, but I don't even know whether that is possible (apparently not, at least I couldn't find anything).
Any ideas? Thank you!

How to use properly exported resources with Puppet

I've been thinking for hours on how to solve a problem with Puppet 4.
Here is my case :
I have a module "Cassandra", and I have 3 machines
Cluster1 (Hostname : CassandraCluster1)
Cluster2 (Hostname : CassandraCluster1)
Cluster3 (Hostname : CassandraCluster1)
I want to collect the hostnames of three clusters in an array, so I can pass them to the cassandra configuration file (which I'm using as a template epp) :
cassandra.yaml.epp
- seeds: "<%= $cluster::hostnames %>"
So the solution is Exported Resources I came up with :)
I've been playing with all day, but no idea how to make up this work. here is what I've tried :
on each cluster I add this code to collect the hostnames :
##file {"${hostname}":
content => 'epp(puppet://modules/cassandra/cassandra.yaml.epp)',
}
# Collect:
File <<| |>>
But I'm not sure if this is a good idea ?!
I've found a temporary solution that I'm using now :
I have a role fact which I'm adding to each machine. and I'm retrieving the hosts from puppetdb using puppetdbquery module.
and then on puppet code I use this query to find the nodes with the role attached
$hosts = query_nodes("role=apache")
To the extent that the question is how to create the wanted file with use of exported resources, it's important to understand that collecting exported resources causes them to be added to the target node's catalog. Exporting and collecting resources is not merely a data-transfer excercise. Your example exports several File resources; collecting these will result in the same number of separate files being managed on the target (or else a duplicate resource error if you happen to have a resource title collision).
For a long time, the usual way to build a file based on pieces contributed by multiple nodes was to use the puppetlabs/concat module, with each contributing node exporting a concat::fragment resource. For example:
##concat::fragment { "${hostname} Cassandra seed":
target => '/path/to/file',
content => " ${hostname}",
order => '10',
tag => 'cassandra'
}
Those would be collected into the catalog for the target node, to be used in conjunction with a corresponding concat resource declared there:
concat { '/path/to/file':
ensure => 'present',
# ...
}
concat::fragment { "Cassandra seed start":
target => '/path/to/file',
content => ' - seeds: "'
order => '05',
}
Concat::Fragment <<| tag == 'cassandra' |>>
concat::fragment { "Cassandra seed tail":
target => '/path/to/file',
content => '"'
order => '15',
}
That still works just fine, but it is becoming more common to rely instead on querying puppetdb, as you demonstrate in your own answer.

logstash - add only first time value

Here's what I want, it's a bit the opposite of incremental data.
some data's are logs with a specific token, and I want to be able to keep (or to show in Elasticsearch) only the first submitted data, the oldiest information of each token.
I want to ignore any new log of the same token ?
How can I do that ? is it in logstash or elasticsearch ?
Thanks
Updates 2016-05-31
I think we can see that in different perspective. but globally what I want is the table like in the picture, but without the red lines, I want them to be ignored by logstash, or not display in ES queries.
I know it can be done, if I was able to add any flag in those lines I want to delete, but it's not possible, the only fact that tell us they can be removed is because we already have a key first-AAA that has been logged before.
At the logging process, we don't have this information.
You can achieve this using the elasticsearch filter. The filter would check in ES if the record already exists and if it is the case, we ask Logstash to just drop the line.
Note that I'm making the assumption that the Id field (AAA) is used as the document _id and is also present in the document as the Id field. Feel free to change whatever needs to, but this will work.
input {
...
}
filter {
elasticsearch {
hosts => ["localhost:9200"]
query => "_type:your_type AND _id:%{[Id]}"
fields => {"Id" => "found"}
}
if [found] {
drop {}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
...
}
}

Integrate Elasticsearch with PostgreSQL while using Sails.js with Waterline ORM

I am trying to integrate Elasticsearch with Sails.js and my database isn't MongoDB: I use PostgreSQL, so this post doesn't help.
I have installed Elasticsearch on my Ubuntu box and now it's running successfully. I also installed this package on my Sails project, but I cannot create indexes on my existing models.
How can I define indexes on my models, and how can I search using Elasticsearch inside my Models?
What are the hooks which I need to define it inside models?
Here you could find a pretty straightforward package (sails-elastic). It operates by configs directly from elasticsearch itself.
Elasticsearch docs and index creation in particular
There are lots of approach to solve this issue. The recommended way is to use logstash by elasticsearch which I have given in detail.
I would list most of the approaches that I know here:
Using Logstash
curl https://download.elastic.co/logstash/logstash/logstash-2.3.2.tar.gz > logstash.tar.gz
tar -xzf logstash.tar.gz
cd logstash-2.3.2
Install the jdbc input plugin:
bin/logstash-plugin install logstash-input-jdbc
Then download postgresql jdbc driver.
curl https://jdbc.postgresql.org/download/postgresql-9.4.1208.jre7.jar > postgresql-9.4.1208.jre7.jar
Now create a configuration file for logstash to use jdbc input as input.conf:
input {
jdbc {
jdbc_driver_library => "/Users/khurrambaig/Downloads/logstash-2.3.2/postgresql-9.4.1208.jre7.jar"
jdbc_driver_class => "org.postgresql.Driver"
jdbc_connection_string => "jdbc:postgresql://localhost:5432/khurrambaig"
jdbc_user => "khurrambaig"
jdbc_password => ""
schedule => "* * * * *"
statement => 'SELECT * FROM customer WHERE "updatedAt" > :sql_last_value'
type => "customer"
}
jdbc {
jdbc_driver_library => "/Users/khurrambaig/Downloads/logstash-2.3.2/postgresql-9.4.1208.jre7.jar"
jdbc_driver_class => "org.postgresql.Driver"
jdbc_connection_string => "jdbc:postgresql://localhost:5432/khurrambaig"
jdbc_user => "khurrambaig"
jdbc_password => ""
schedule => "* * * * *"
statement => 'SELECT * FROM employee WHERE "updatedAt" > :sql_last_value'
type => "employee"
}
# add more jdbc inputs to suit your needs
}
output {
elasticsearch {
index => "khurrambaig"
document_type => "%{type}" # <- use the type from each input
document_id => "%{id}" # <- To avoid duplicates
hosts => "localhost:9200"
}
}
Now run logstash using the above file:
bin/logstash -f input.conf
For every model that you want to insert as a document(table) type in a index(database, khurrambaig here), use appropriate SQL statement ( SELECT * FROM employee WHERE "updatedAt" > :sql_last_value here). Here I have use sql_last_value to put only updated data only. You can do scheduling also and many stuff in logstash. Here I am using every minute. For more details refer this.
To see the documents which has been inserted into index for a particular type:
curl -XGET 'http://localhost:9200/khrm/user/_search?pretty=true'
This will list all the documents under customer models for my case. Look into elastic search api. Use that. Or use nodejs official client.
Using jdbc input
https://github.com/jprante/elasticsearch-jdbc
You can read its readme. It's quite straightforward. But this doesn't provide scheduling and many of the things that are provided by logstash.
Using sails-elastic
You need to use multiple adapters as given in README.
But this isn't recommended because it will slow down your requests. For every creation, updation and deletion, you will be calling two dbs : elastic search and postgresql.
In logstash, indexing of documents is independent of requests. This approach is used by many including wikipedia. Also you remain independent of framework. Today you are using sails, tomorrow you might use something else but you don't need to change anything in case of logstash if you still use postgresql. (If you change db, even then many of the db's input are available and in case of change from any sql rdbms to another, you just need to change to jdbc driver)
There's zombodb also but it work for pre 2.0 elastic only currently (Support for > ES 2.0 coming also).

Conditional creating fields depends on filtering results in logstash influxdb output

I'm using the logstash for collecting sar metrics from the server and store its in influxdb.
Metrics from different sources (CPU, Memory, Network) should be inserted to the different series in influxdb. Of course amount and names of fields in those series depends type of metric source.
This is my config file: https://github.com/evgygor/test/blob/master/logstash.conf
For each [type] of metrics I should configure separate influxdb output. In this example, I configured two types of metrics, but I'm planning to use it for SAR metrics, JMX metrics, csv from Jmeter metrics, that mean - I need configure the appropriate output for each of them (tens).
Questions:
How can I elaborate desired configuration?
I there any option to use conditions inside plugin. Example:
if [type]=="system.cpu" {
data_points => {
"time" => "%{time}"
"user" => "%{user}"
}
}
else {
data_points => {
"time" => "%{time}"
"kbtotalmemory" => "%{kbtotalmemory}"
"kbmemfree" => "%{kbmemfree}"
"kbmemused" => "%{kbmemused}"
}
}
Is there any flag to define to influxdb plugin to use by default fields names/data types from input?
Is there any flag/ability to define default datatype?
Is there any ability to set field name "time" reserved with datatype integer?
Thank a lot.
I cooked some nice solution.
This fork permits to create fields on the fly, accrding to fields names and datatypes that arrives to that output plugin.
I added 2 configuration paramters:
This settings revokes the needs to use data_points and coerce_values configuration # to create appropriate insert to influxedb. Should be used with fields_to_skip configuration # This setting sets data points (column) names as field name from arrived to plugin event, # value for data points config :use_event_fields_for_data_points, :validate => :boolean, :default => true
The array with keys to delete from future processing. # By the default event that arrived to the output plugin contains keys "#version", "#timestamp" # and can contains another fields like, for example, "command" that added by input plugin EXEC. # Of course we doesn't needs those fields to be processed and inserted to influxdb when configuration # use_event_fields_for_data_points is true. # We doesn't deletes the keys from event, we creates new Hash from event and after that, we deletes unwanted # keys.
config :fields_to_skip, :validate => :array, :default => []
This is my example config file: I'm retrieving different number of fields with differnt names from CPU, memory, disks, but I doesn't need defferent configuration per data type as in master branch. I'm creating relevant fields names and datatypes on filter stage and just skips the unwanted fields in outputv plugin.
https://github.com/evgygor/logstash-output-influxdb

Resources