Paring a variable length dot separated string in grok

Paring a variable length dot separated string in grok - logstash

I am new with logstash and grok filters. I am trying to parse a string from an Apache Access Log, with a grok filter in logstash, where the username is part of the access log in the following format:
name1.name2.name3.namex.id
I want to build a new field called USERNAME where it is name1.name2.name3.namex with the id stripped off. I have it working, but the problem is that the number of names are variable. Sometimes there are 3 names (lastname.firstname.middlename) and sometimes there are 4 names (lastname.firstname.middlename.suffix - SMITH.GEORGE.ALLEN.JR
%{WORD:lastname}.%{WORD:firstname}.%{WORD:middle}.%{WORD:id}
When there are 4 names or more it does not parse correctly. I was hoping someone can help me out with the right grok filter. I know I am missing something probably pretty simple.

You could use two patterns, adding another one that matches when there are 4 fields:
%{WORD:lastname}.%{WORD:firstname}.%{WORD:middle}.%{WORD:suffix}.%{WORD:id}
But in this case, you're creating fields that it sounds like you don't even want.
How about a pattern that splits off the ID, leaving everything in front of it, perhaps:
%{DATA:name}.%{INT}

Related

How to get a substring with Regex in Python

I am trying to formnulate a regex to get the ids from the below two strings examples:
/drugs/2/drug-19904-5106/magnesium-oxide-tablet/details
/drugs/2/drug-19906/magnesium-moxide-tablet/details
In the first case, I should get 19904-5106 and in the second case 19906.
So far I tried several, the closes I could get is [drugs/2/drug]-.*\d but would return g-19904-5106 and g-19907.
Please any help to get ride of the "g-"?
Thank you in advance.

When writing a regex expression, consider the patterns you see so that you can align it correctly. For example, if you know that your desired IDs always appear in something resembling ABCD-1234-5678 where 1234-5678 is the ID you want, then you can use that. If you also know that your IDs are always digits, then you can refine the search even more
For your example, using a regex string like
.+?-(\d+(?:-\d+)*)
should do the trick. In a python script that would look something like the following:
match = re.search(r'.+?-(\d+(?:-\d+)*)', my_string)
if match:
my_id = match.group(1)
The pattern may vary depending on the depth and complexity of your examples, but that works for both of the ones you provided

This is the closest I could find: \d+|.\d+-.\d+

Terraform string manipulation to remove X elements

I have several strings that I just want to get a subset of like so:
my-bucket-customer-staging-ie-app-data
my-bucket-customer2-longname-prod-uk-app-data
and I just need to get the customers name from the string, so with the above examples I'd be left with
customer
customer2-longname
There's probably a simple way of doing this with regex although I've failed miserably in my attempts.
I'm able to strip the first part of the string using
trimprefix("my-bucket-customer-staging-ie-app-data", "my-bucket-")
trimprefix("my-bucket-customer-longname-prod-uk-app-data", "my-bucket-")
resulting in
customer-staging-ie-app-data
customer-longname-prod-uk-app-data
However Terraform's trimsuffix won't work as there can be several different regions/environments used.
What I'd like to do is slice the string and ignore the last 4 elements, which should then return the customer name regardless of whether it contains an additional delimiter in it.
Something like this captures the customer, however fails for long customer names:
element(split("-",trimprefix("my-bucket-customer-staging-uk-app-data", "my-bucket-")), length(split("-",trimprefix("my-bucket-customer-staging-uk-app-data", "my-bucket-")))-5)
customer
and is also quite messy.
Is there a more obvious solution I'm missing

I believe it's what regex is for.
> try(one(regex("\\w*-\\w*-(\\w*(?:-\\w*)*)-\\w*-\\w*-\\w*-\\w*","my-bucket-customer-staging-ie-app-data")),"")
"customer"
> try(one(regex("\\w*-\\w*-(\\w*(?:-\\w*)*)-\\w*-\\w*-\\w*-\\w*","my-bucket-customer2-longname-prod-uk-app-data")),"")
"customer2-longname"
> try(one(regex("\\w*-\\w*-(\\w*(?:-\\w*)*)-\\w*-\\w*-\\w*-\\w*","my-bucket-customer2-longname-even-longer-prod-uk-app-data")),"")
"customer2-longname-even-longer"
Reference: https://www.terraform.io/language/functions/regex

Logstash Grok pattern to cut split a string and remove last part

Below is the field that is filebeat log path, that I need to split with delimiter '/' and remove the log file name in the text.
"source" : "/var/log/test/testapp/c.log"
I need only this part
"newfield" : "/var/log/test/testapp"

If you do a little of research you can find that this is a trivial question and it has not much complexity. You can use grok-patterns to match the interesting parts and differentiate the one you want to retrieve from the one you don't.
A pattern like this will match as you expected, having the newfield as you desire:
%{GREEDYDATA:newfield}(/%{DATA}.log)
Anyway, you can test your Grok patterns with this tool, and here you have some usefull grok-patterns. I recommend you to take a look to those resources.

logstash custom patterns not parsing

i am facing an issue in parsing the below pattern
the log file will have log importance in the form of == or <= or >= or << or >>
I am trying the below custom pattern. Some of the log msgs may not have this pattern, so I am using *
(?(=<>)*)
But the log mesages are not parsing and give 'grokparsefailure'
kindly check and suggest if the above pattern is wrong.. Thanks much

below pattern is working fine.
(?[=<>]*)
the one which I used earlier and was erroring is
(?(=<>)*)

One thing to note, there is a better way to handle the "some do, some don't" aspect of your log-data.
(?<Importance>(=<>)*)
That will match more than you want. To get the sense of 'sometimes':
((?<Importance>(=<>)*)|^)
This says, match these three characters and define the field Importance, or leave the field unset.
Second, you're matching specifically two characters, in combinations:
((?<Importance>(<|>|=){2})|^)
This should match two instances of any of the trio of characters you're looking for.

Logstash - Convert Field Names to Lowercase

I am processing http logs and converting querystring parameters to fields.
kv
{
source => "uriQuerystring"
field_split => "&"
target => "uriQuerystringKeys"
}
However because callers are using mixed case parameters, I end up with numerous duplicates.
eg: uriQuerystringKeys.apiKey, uriQuerystringKeys.ApiKey, uriQuerystringKeys.APIKey
What do I need to do in my logstash configuration to convert all these field names to lowercase?
I see there's an open issue for this feature to be implemented in Logstash, but it's incomplete. There's a suggestion for some ruby code to be directly executed, but it looks like this converts all fields (not just ones of a certain prefix).

Here's a prior answer that contains the basic code you would need.
You can see a conditional inside the loop, which you could use to enforce the prefix limitations on the fields.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Paring a variable length dot separated string in grok - logstash

Related

How to get a substring with Regex in Python

Terraform string manipulation to remove X elements

Logstash Grok pattern to cut split a string and remove last part

logstash custom patterns not parsing

Logstash - Convert Field Names to Lowercase

Categories

Resources