Replace regex pattern in array of strings - Logstash

I'm trying to remove the URL prefix from each element of an array of URLs in Logstash, using Ruby.
The array looks like this: urlArray = ['https://www.google.com','https://www.bcc.com']
I tried pushing the results into a second array, like this:
urlArray.each{ |url| newUrlArray.push(url.gsub("(https?://)?(www\.)?","")) }
I also tried:
newUrlArray = urlArray.map{ |url| url.gsub("(https?://)?(www\.)?","") }
I think I'm missing something with the gsub.
Thanks

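I suggest using slice with a regex and extracting the capture group you're interested in. Note that gsub treats a String argument as literal text, so the pattern has to be a Regexp literal (/.../) rather than a quoted string.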
urlArray = [
"https://www.google.com",
"https://www.bcc.com",
"http://hello.com",
"www.world.com"
]
newUrlArray = []
pattern = /(https?:\/\/)?(www\.)?(.*)/
urlArray.each{ |url| newUrlArray.push(url.slice(pattern, 3)) }
puts newUrlArray
# google.com
# bcc.com
# hello.com
# world.com

Related

How to get a value of variable in groovy

String elkEndpoint = 'https://elastic.beta.tower.am.health.ge.com/'
I need to use value of elkEndpoint somewhere in code like below:
// Some code
'metrics': [
'elasticEndpoint': elkEndpoint,
'esConnection': ''
],
// Some code
I tried the following, but none of them work:
1 'elasticEndpoint': elkEndpoint,
2 'elasticEndpoint': ${elkEndpoint},
3 'elasticEndpoint': $elkEndpoint,
What is the way to use the value of a variable?
Reference the variable directly as the map value; no $ is needed outside a double-quoted string:
String elkEndpoint = 'https://elastic.beta.tower.am.health.ge.com/'
Map metrics = [ elasticEndpoint: elkEndpoint, esConnection: '' ]
println metrics
That will output the following:
[elasticEndpoint:https://elastic.beta.tower.am.health.ge.com/, esConnection:]

Python Error: Cannot extract data from dictionary

I have JSON output (y) like the one below.
{
"WebACL":{
"Name":"aBlockKnownBadInputs-WebAcl",
"Id":"4312a5d0-9878-4feb-a083-09d7a9cfcfbb",
"ARN":"arn:aws:wafv2:us-east-1:100467320728:regional/webacl/aBlockKnownBadInputs-WebAcl/4312a5d0-9878-4feb-a083-09d7a9cfcfbb",
"DefaultAction":{
"Allow":{
}
},
"Description":"",
"Rules":[
{
"Name":"AWS-AWSManagedRulesKnownBadInputsRuleSet",
"Priority":500,
"Statement":{
"ManagedRuleGroupStatement":{
"VendorName":"AWS",
"Name":"AWSManagedRulesKnownBadInputsRuleSet"
}
},
"OverrideAction":{
"None":{
}
},
"VisibilityConfig":{
"SampledRequestsEnabled":true,
"CloudWatchMetricsEnabled":true,
"MetricName":"AWS-AWSManagedRulesKnownBadInputsRuleSet"
}
}
]
}
}
I want to extract "AWS-AWSManagedRulesKnownBadInputsRuleSet" from this section:
"Name":"AWS-AWSManagedRulesKnownBadInputsRuleSet",
"Priority":500,
"Statement":{
"ManagedRuleGroupStatement":{
"VendorName":"AWS",
"Name":"AWSManagedRulesKnownBadInputsRuleSet"
At the moment my code raises a KeyError:
KeyError: 'Rules[].Statement[].ManagedRuleGroupStatement[].Name'
The format of this line is clearly wrong, but I don't know why.
ruleset = y['Rules[].Statement[].ManagedRuleGroupStatement[].Name']
My code block:
response = client.get_web_acl(Name=(acl),Scope='REGIONAL',Id=(ids))
for y in response['WebACLs']:
    ruleset = y['Rules[].Statement[].ManagedRuleGroupStatement[].Name']
Does anyone know what I'm doing wrong here?
UPDATE: In case anyone else comes up against this problem, I fixed it by doing things a slightly different way.
import json

# Request the info from AWS via get_web_acl
response = client.get_web_acl(Name=(acl),Scope='REGIONAL',Id=(ids))
# Convert the dict output to a string to make it searchable
result = json.dumps(response)
# Define what to search for
fullstring = "AWS-AWSManagedRulesKnownBadInputsRuleSet"
# Search the output and print the result
if fullstring in result:
    print("Found WAF ruleset: AWS-AWSManagedRulesKnownBadInputsRuleSet!")
else:
    print("WAF ruleset not found!")
Also, as part of my research I found a cool thing called jello.
(https://github.com/kellyjonbrazil/jello).
jello is similar to jq in that it processes JSON and JSON Lines data except it uses standard python dict and list syntax.
So, I copied my JSON into a file called file.json.
I ran cat file.json | jello -s to print a grep-able schema (that is what the -s option does).
I found the part I was interested in (in my case, the name of the ManagedRuleGroupStatement) and ran the following:
cat file.json | jello -s _.WebACL.Rules[0].Statement.ManagedRuleGroupStatement.Name
This gave me exactly what I wanted!
It doesn't work inside a python script but was something new and interesting to learn!
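For anyone who wants to stay inside the Python script: the KeyError happens because a dict treats the whole string 'Rules[].Statement[]...' as a single key. Nested values are reached one key (or list index) at a time. Below is a minimal sketch, assuming the response has the shape of the JSON shown in the question; the response dict here is a trimmed hand-written copy for illustration, not a live get_web_acl call.

```python
# Trimmed copy of the JSON from the question. In a real script this would be
# the dict returned by client.get_web_acl(...) (assumed to have this shape).
response = {
    "WebACL": {
        "Name": "aBlockKnownBadInputs-WebAcl",
        "Rules": [
            {
                "Name": "AWS-AWSManagedRulesKnownBadInputsRuleSet",
                "Priority": 500,
                "Statement": {
                    "ManagedRuleGroupStatement": {
                        "VendorName": "AWS",
                        "Name": "AWSManagedRulesKnownBadInputsRuleSet",
                    }
                },
            }
        ],
    }
}

# "Rules" is a list, so iterate over it; every other level is a dict.
rule_names = [rule["Name"] for rule in response["WebACL"]["Rules"]]

# A rule may lack a ManagedRuleGroupStatement, so use .get() with empty
# dict defaults; such rules yield None instead of raising KeyError.
managed_group_names = [
    rule.get("Statement", {}).get("ManagedRuleGroupStatement", {}).get("Name")
    for rule in response["WebACL"]["Rules"]
]

print(rule_names)
print(managed_group_names)
```

Note that the rule-level Name carries the "AWS-" prefix, while the Name inside ManagedRuleGroupStatement does not, so pick whichever level you actually need.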

How to get a unique pattern from a list

I have a list like this:
[ '0D',
'0A,0C',
'0C,0A',
'0C,0E,0D,0F',
'0C,0D,0E,0F',
'0B,0G',
'0B,0F'
]
In this list, '0A,0C' and '0C,0A' are equivalent, as are '0C,0E,0D,0F' and '0C,0D,0E,0F'. How can I get the unique items from a list like this? I tried set, but it treats these strings as different items.
set works if you split and sort each item first:
l = ['0D', '0A,0C', '0C,0A', '0C,0E,0D,0F', '0C,0D,0E,0F', '0B,0G', '0B,0F']
for i in range(len(l)):
    l[i] = ','.join(sorted(l[i].split(',')))
l = set(l)
# {'0A,0C', '0B,0F', '0B,0G', '0C,0D,0E,0F', '0D'} (set ordering may vary)
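If you would rather keep the original spelling and order of the first occurrence of each item, a variant is to use a frozenset of the split parts as the comparison key. This is only a sketch; the names items, seen and unique are made up for illustration, and it assumes that repeated codes inside one item (such as '0A,0A') should count the same as a single '0A'.

```python
items = ['0D', '0A,0C', '0C,0A', '0C,0E,0D,0F', '0C,0D,0E,0F', '0B,0G', '0B,0F']

seen = set()    # frozensets of codes already encountered
unique = []     # first occurrence of each distinct combination, in input order

for item in items:
    # frozenset ignores ordering, so '0A,0C' and '0C,0A' produce the same key
    key = frozenset(item.split(','))
    if key not in seen:
        seen.add(key)
        unique.append(item)

print(unique)
# ['0D', '0A,0C', '0C,0E,0D,0F', '0B,0G', '0B,0F']
```

Unlike the sort-and-join approach, this leaves the surviving strings untouched and does not modify the input list.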

node uri regex not capturing capture groups

I know there are a billion regex questions on stackoverflow, but I can't understand why my uri matcher isn't working in node.
I have the following:
var uri = "file:tmp.db?mode=ro"
function parseuri2db(uri){
  var regex = new RegExp("(?:file:)(.*)(?:\\?.*)");
  let dbname = uri.match(regex)
  return dbname
}
I'm trying to identify only the database name, which I expect to be:
After an uncaptured file: group
Before an optional ? + parameters to end of string.
While I'm using:
var regex1 = new RegExp("(?:file:)(.*)(?:\\?.*)");
I thought the answer was actually more like:
var regex2 = new RegExp("(?:file:)(.*)(?:\\??.*)");
With a 0 or 1 ? quantifier on the \\? literal. But the latter fails.
Anyway, my result is:
console.log(parseuri2db(conf.db_in.filename))
[ 'file:tmp.db?mode=ro',
'tmp.db',
index: 0,
input: 'file:tmp.db?mode=ro' ]
Which seems to be capturing the whole string in the first argument, rather than just the single capture group I asked for.
My questions are:
What am I doing wrong that I'm getting multiple captures?
How can I rephrase this to capture my capture groups with names?
I expected something like the following to work for (2):
function parseuri2db(uri){
  // var regex = new RegExp("(?:file:)(.*)(?:\\?.*)");
  // let dbname = uri.match(regex)
  var regex = new RegExp("(?<protocol>file:)(?<fname>.*)(<params>\\?.*)");
  let [, protocol, fname, params] = uri.match(regex)
  return dbname
}
console.log(parseuri2db(conf.db_in.filename))
But:
SyntaxError: Invalid regular expression: /(?<protocol>file:)(?<fname>.*)(<params>\?.*)/: Invalid group
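Update 1
The answer to my first question: element 0 of the array returned by match() is always the full match, and the capture groups start at index 1. I also needed to keep the ? literal out of the capture group:
"(?:file:)([^?]*)(?:\\??.*)"
As for the second question, the regex engine in this version of Node does not support named groups.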

Why doesn't this String→List→Map conversion work in Groovy

I have input data of type
abc 12d
uy 76d
ce 12a
with the lines being separated by \n and the values by \t.
The data comes from a shell command:
brlist = 'mycommand'.execute().text
Then I want to get this into a map:
brmap = brlist.split("\n").collectEntries {
    tkns = it.tokenize("\t")
    [ (tkns[0]): tkns[1] ]
}
I also tried
brmap = brlist.split("\n").collectEntries {
    it.tokenize("\t").with { [ (it[0]): it[1] ] }
}
Both ways gave the same result, which is a map with a single entry:
brmap.toString()
# prints "[abc:12d]"
Why does only the first line of the input data end up being in the map?
Your code works, which means the input String brlist isn't what you say it is.
Are you sure that's what you have? Try printing brlist, and then printing it inside collectEntries.
As an aside, this does the same thing as your code:
brlist.split('\n')*.split('\t')*.toList().collectEntries()
Or, in case the separators are spaces rather than tabs, this version accepts both:
brlist.split('\n')*.split(/\s+/)*.toList().collectEntries()
This code works:
// I use 4 spaces as a tab.
def text = 'sh abc.sh'.execute().text.replaceAll(" " * 4, "\t")
brmap = text.split("\n").collectEntries {
    tkns = it.tokenize("\t")
    [(tkns[0]) : tkns[1]]
}
assert [abc: "12d", uy: "76d", ce: "12a"] == brmap
abc.sh:
#!/bin/sh
echo "abc    12d"
echo "uy    76d"
echo "ce    12a"
Also, I think your Groovy code is correct; maybe your mycommand has a problem.
OK, thanks for the hints. It is a bug in Jenkins: https://issues.jenkins-ci.org/browse/JENKINS-26481.
It has also been mentioned here before: "Groovy .each only iterates one time".
