I have a file with the following format:
10302\t<document>.....</document>
12303\t<document>.....</document>
10054\t<document>.....</document>
10034\t<document>.....</document>
as you can see there are two values separated by a tab char. I need to
index the first token (e.g. 10302, 12303...) as ID
extract (and then index) some information from the second token (the XML document). In other words, the second token would be used with the xml filter for extracting some information
Is it possibile to do that separating the two values using the kv filter? Ideally I should end, for each line, with a document like this:
id:10302
msg:<document>....</document>
I could use a grok filter but I'd like to avoid any regex as the field detection is very easy and can be accomplished with a simple key-value logic. However, using a plain kv detection I'm ending with the following:
"10302": <document>.....</document>
"12303": <document>.....</document>
"10054": <document>.....</document>
"10034": <document>.....</document>
and this is not want I need.
It is not possible to use kv for the job you want to do, as far as I know, since there are no possible key for the id (10302, 10303, 10304...). There are no possible key since there is nothing before the id.
This grok configuration would work, assuming each id + document is on the same line :
grok {
match => { "message" => "^%{INT:ID}\t%{GREEDYDATA:msg}"}
}
Related
I'm trying to filter an influx DB query (using the nodeJS influxdb-client library).
As far as I can tell, it only works with "flux" queries.
I would like to filter out all records that share a specific attribute with any record that matches a particular condition. I'm filtering using the filter-function, but I'm not sure how I can continue from there. Is this possible in a single query?
My filter looks something like this:
|> filter(fn:(r) => r["_value"] == 1 and r["button"] == "1" ) and I would like to leave out all the record that have the same r["session"] as any that match this filter.
Do I need two queries; one to get those r["session"]s and one to filter on those, or is it possible in one?
Update:
Trying the two-step process. Got the list of r["session"]s into an array, and attempting to use the contains() flux function now to filter values included in that array called sessionsExclude.
Flux query section:
|> filter(fn:(r) => contains(value: r["session"], set: ${sessionsExclude}))
Getting an error unexpected token for property key: INT ("102")'. Not sure why. Looks like flux tries to turn the values into Integers? The r["session"] is also a String (and the example in the docs also uses an array of Strings)...
Ended up doing it in two queries. Still confused about the Strings vs Integers, but casting the value as an Int and printing out the array of r["session"] within the query seems to work like this:
'|> filter(fn:(r) => not contains(value: int(v: r["session"]), set: [${sessionsExclude.join(",")}]))'
Added the "not" to exclude instead of retain the values matching the array...
the problem is the following: I'm investigating how to add some anti-tampering protection to events stored in OpenSearch that are parsed and sent there by Logstash. Info is composed of application logs collected from several hosts. The idea is to add a hashed field that's linked to the original content so that any modification of the fields break the hash result and can be detected.
Currently, we have in place some grok filters that extract information from the received log lines and store it into different fields using several patterns. To make it more difficult for an attacker who modifies these logs to cover their tracks, I'm thinking of adding an extra field where the whole line is hashed and salted before splitting.
Initial part of my filter config is like this. It was used primarily with ELK, but our project will be switching to OpenSearch:
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:mytimestamp} (\[)*%{LOGLEVEL:loglevel}(\])* %{JAVACLASS:javaclass}(.)*(\[/])* %{DATA:component} %{DATA:version} - %{GREEDYDATA:message}"}
overwrite => [ "message" ]
overwrite => [ "version" ]
break_on_match => false
keep_empty_captures => true
}
// do more stuff
}
OpenSearch has some info on Field masking, but this is not exactly what I am after.
If any of you could help me with a pointer or an idea on how to do this. I don't know whether the hash fields available in ELK are also available in OpenSearch, or whether the Logstash plugin that does the hashing of fields would be usable without licensing issues. But maybe there are other and better options that I am not aware. I was looking for info on how to call an external script to do this during the filter execution, but I don't even know whether that is possible (apparently not, at least I couldn't find anything).
Any ideas? Thank you!
I try to convert a field "tmp_reponse" in integer in the file "conf" with logstash as follows :
mutate {
convert => {"TMP_REPONSE" => "integer"}
}
,but on Kibana it shows me that he is still string. I do not understand how I can make a convertion to use my fields "tmp_response" to use it like as a metric fields on kibana
thank you help me please and if there is anyone who can explain to me how I can master the metrics on Kibana and use fields as being of metrics fields
mutate{} will change the type of the field in logstash. If you added a stdout{} output stanza, you would see that it's an integer at that point.
How elasticsearch treats it is another problem entirely. Elasticsearch usually sets the type of a field based on the first input received, so if you sent documents in before you added the mutate to your logstash config, they would have been strings and the elasticsearch index will always consider that field to be a string.
The type may also have been defined in an elasticsearch template or mapping.
The good news is that your mutate will probably set the type when a new index is created. If you're using daily indexes (the default in logstash), you can just wait a day. Or you can delete the index (losing any data so far) and let a new one be created. Or you could rebuild the index.
Good luck.
I'm new at ELK-stack and want to add a field in kibana(discover) interface that matches a specific part of the message text (one word or a sentence).
for example:
I want to have a field in the left side that matches the word 'installed' in the message text.
Which filter in logstash should I use and how does it look like?
How about grok{}, which applies a regular expression to your input message and can make new fields?
Thanks for the answer. I used grok as following to match how many users created new accounts.
grok {
match => [ "message", "(?<user_created>(user_created))"]
break_on_match => false
}
Anyway I found out the problem is that Kibana is showing old logs and doesn't care what I do in the logstash config file! still can't find out why!
I have many messages like this:
Error GetMilesFromLocationService(Eastvale, CA,Yorkshire, NY,1561517,19406,True.)
The problem is that they are unique because of the city names. In a Kibana Visualization, is it possible group these into "Error GetMilesFromLocationService" messages? Here's an example of my metrics visual. Ideally, they would all be in one row.
These could be easily grouped by a regex match.
Of course, I could add a new field with Logstash, but if Kibana is able to do this, I'll be happy.
Thanks!
Use a grok filter to parse the message and extract fields from it. At the very least you'll want to extract "Error GetMilesFromLocationService" into a separate field (perhaps error_type?) to allow aggregation. Or perhaps it would be better to extract "GetMilesFromLocationService" into a function field? Without knowing the structure of your log messages giving firm advice is hard.
This grok filter extracts an error_type field:
filter {
grok {
match => [
"message",
"^(?<error_type>Error %{WORD})"
]
}
}