Currently I'm using this command on Linux:
grep Ban /var/log/fail2ban.log | grep -v 'Restore Ban' | sed 's/\s\s*/ /g' | cut -d" " -f8 | sort | uniq -c | sort -t ' ' -n -b
The log file looks like this:
2019-03-04 07:14:45,778 fail2ban.filter [19052]: INFO [sshd] Found 2*8.1*7.1*9.2*9
2019-03-04 07:14:46,412 fail2ban.actions [19052]: NOTICE [sshd] Ban 2*8.1*7.1*9.2*9
2019-03-04 07:15:04,708 fail2ban.actions [19052]: NOTICE [sshd] Unban 1*9.2*.2*4.1*6
...
The output looks like this:
8 1*2.2*6.1*1.1*5
12 3*.1*.*4.*6
18 1*5.2*8.2*5.4
19 1*2.2*6.1*1.1*4
72 3*.1*6.2*.9*
I already tried it with Get-Content but I don't understand all of the PowerShell syntax.
Your Linux command packs a lot of functionality into a single pipeline.
Constructing an equivalent PowerShell command is an interesting exercise in contrasting a Unix-utilities solution with a PowerShell solution:
To set the scene, let me explain what your command does:
grep Ban /var/log/fail2ban.log case-sensitively finds lines that contain the word Ban in file /var/log/fail2ban.log and passes only those on.
grep -v 'Restore Ban' further (case-sensitively) filters out (-v) lines that contain the phrase 'Restore Ban'.
sed 's/\s\s*/ /g' replaces all (g) runs of 1 or more whitespace characters (\s\s*; in a modern regex dialect you'd use \s+) with a single space ...
... which then allows cut -d" " -f8 to reliably extract the 8th field from each line from the resulting space-separated list (e.g., 2*8.1*7.1*9.2*9).
sort then lexically sorts the resulting lines, and uniq -c weeds out duplicates, while prepending each unique line with the count of duplicates (-c), with 1 indicating a unique line.
Finally, sort -t ' ' -n -b sorts the resulting lines numerically by duplicate count.
In short: your command filters a log file via regex matching, extracts the 8th field from each line, eliminates duplicates, and prints unique fields prefixed with their duplicate count, sorted by duplicate count in ascending order.
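To see the original pipeline in action, here is a self-contained run against a hypothetical mini log (the log lines and IPs below are made up for illustration; the real log's IPs are masked in the question):

```shell
# Build a tiny fail2ban-style log (hypothetical lines; IPs are made up)
printf '%s\n' \
  '2019-03-04 07:14:46,412 fail2ban.actions [19052]: NOTICE [sshd] Ban 10.0.0.1' \
  '2019-03-04 07:15:00,000 fail2ban.actions [19052]: NOTICE [sshd] Ban 10.0.0.2' \
  '2019-03-04 07:15:01,000 fail2ban.actions [19052]: NOTICE [sshd] Ban 10.0.0.1' \
  '2019-03-04 07:15:04,708 fail2ban.actions [19052]: NOTICE [sshd] Restore Ban 10.0.0.3' \
  > /tmp/f2b-demo.log

# The pipeline: keep Ban lines, drop Restore Ban, extract field 8, count duplicates
grep Ban /tmp/f2b-demo.log | grep -v 'Restore Ban' | sed 's/\s\s*/ /g' \
  | cut -d" " -f8 | sort | uniq -c | sort -t ' ' -n -b
```

10.0.0.1 was banned twice in this sample, so it appears with count 2 at the bottom of the ascending sort.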
Below is a near-equivalent PowerShell command, which:
is more readable (and therefore, of necessity, more verbose)
involves fewer steps
ultimately offers much more flexibility, due to:
sending objects through the pipeline, not just text that must often be (re)parsed - it is this feature that constitutes PowerShell's evolutionary quantum leap from traditional shells.
far superior language features (compared to POSIX-like shells such as bash) that can easily be woven into a pipeline.
That said, the price you pay for the increased power is performance:
Directly comparable commands perform much better using Unix utilities, though the usually higher level of abstraction and flexibility provided by PowerShell cmdlets may make up for that.
Here's the command, with the roughly corresponding Unix-utility calls in comments:
Select-String -CaseSensitive '(?<!Restore )Ban' /var/log/fail2ban.log | #grep,grep -v
ForEach-Object { (-split $_.Line)[7] } | # sed, cut -f8
Group-Object | # uniq -c
Select-Object Count, Name | # construction of output *objects*
Sort-Object Count, Name # sort, sort -n
The command outputs objects with a .Count (duplicate count) and .Name property (the 8th field from the log file), which:
allow for robust additional processing (no parsing of textual output needed).
render in a friendly manner to the console (see below).
Example output:
Count Name
----- ----
8 1*2.2*6.1*1.1*5
12 3*.1*.*4.*6
18 1*5.2*8.2*5.4
19 1*2.2*6.1*1.1*4
72 3*.1*6.2*.9*
For an explanation of the command, consult the following help topics, which are also available locally, as part of a PowerShell installation, via the Get-Help cmdlet:
Select-String
about_Regular_Expressions
ForEach-Object
about_Split (the -split operator)
Group-Object
Select-Object
To learn about renaming properties or creating calculated properties, see this answer.
Sort-Object
((Get-Content "fail2ban.log") -cmatch "(?<!Restore )Ban" | Select-String -Pattern "[0-9.*]+$" -AllMatches).matches.value | Group-Object | foreach {"$($_.count) $($_.name)"}
Get-Content here is grabbing each line of the fail2ban.log file. The -cmatch operator is performing a case-sensitive regex match. The regex pattern looks for the string Ban with a negative look behind of string Restore . The Select-String is looking for a regex pattern at the end of each line that has characters in the set (0123456789.*). The matches.value property outputs only the matched strings from the regex. Group-Object groups each identically matched value as property Name and adds a count property. Since the OP was capturing count, I decided to use Group-Object to easily get that. The foreach is simply doing formatting to match the output presentation of the OP.
Related
Could someone clarify how sls (Select-String) works compared to grep and findstr?
grep: grep <pattern> files.txt
sls: sls <pattern> files.txt
(default parameter position for sls is pattern then file)
grep examples: grep "search text" *.log ; cat *.log | grep "search text"
sls examples: sls "search text" *.log ; cat *.log | sls "search text"
As an aside, PowerShell cmdlets are case-insensitive by default, unlike Linux tools, which are generally case-sensitive (as are older Windows tools like findstr). findstr can, however, be used in PowerShell, and it works in situations where sls does not. For example, Get-Service | findstr "Sec" works without a problem, but the similar Get-Service | sls "Sec" returns nothing (presumably this fails because sls works with strings while Get-Service returns objects, which is understandable - but what is findstr doing, then, that it can see the output as a string?).
So, my thinking is "ok, I need to make the output from Get-Service into a string to work with PowerShell Cmdlets", but that doesn't work (or not in a way that I would expect):
Get-Service | Out-String | sls "Sec" (gives results, but odd)
(Get-Service).ToString() | sls "Sec" (.ToString() just returns "System.Object[]")
How in general should I turn an object into a string so that it can manipulate the information (in the same way that Get-Service | findstr "Sec" can do so easily)?
I'd appreciate it if someone could clarify how things fit together above so that I can make more use of sls. In particular, Get-Service | Out-String | sls "Sec" does return something, just not what I was expecting - is it searching for each of the characters "s", "e", and "c" individually and therefore returning lots of matches? If so, that would not be very intuitive, in my opinion.
By default, Out-String turns the piped input (an array of service objects in this case) into a single string. Luckily, the -Stream switch allows each line to be output as a separate string instead. Regarding case-sensitivity, Select-String supports the -CaseSensitive switch.
# For case-insensitive regex match
Get-Service | Out-String -Stream | Select-String "Sec"
# For case-sensitive regex match
Get-Service | Out-String -Stream | Select-String "Sec" -CaseSensitive
# For case-sensitive non-regex match
Get-Service | Out-String -Stream | Select-String "Sec" -CaseSensitive -SimpleMatch
In each case, Select-String uses a regex (use the -SimpleMatch switch to do a literal string match) to pattern-match against each input string and outputs the entire string that matched the pattern. So if you pipe into it only a single string with many lines, then all lines will be returned on a successful match.
To complement AdminOfThings' helpful answer:
To find strings among the lines of the for-display string representations of non-string input objects (as they would print to the console), you do indeed have to pipe to Out-String -Stream; by default, simple .ToString() stringification is applied instead[1].
You shouldn't have to do this manually, however: Select-String should do it implicitly, as suggested in GitHub issue #10726.
Curiously, when piping to external programs such as findstr.exe, PowerShell already applies Out-String -Stream implicitly; e.g.:
Get-Date 1/1/2019 | findstr January works (in en-based cultures), because it is implicitly the same as Get-Date 1/1/2019 | Out-String -Stream | findstr January
By contrast, Get-Date 1/1/2019 | Select-String January is the equivalent of (Get-Date 1/1/2019).ToString([cultureinfo]::InvariantCulture) | Select-String January, and therefore does not work, because the input evaluates to 01/01/2019 00:00:00.
[1] More accurately, .psobject.ToString() is called, either as-is, or - if the object's ToString method supports an IFormatProvider-typed argument - as .psobject.ToString([cultureinfo]::InvariantCulture) so as to obtain a culture-invariant representation - see this answer for more information.
I want to change my command:
anzahl=`cat $1 | grep -i "error" | wc -l`
This command also counts messages which are like this:
2017-07-15 03:07:02,746 [INFO] blabla:123 #blabla:123 - rhsmd started. Error.
But that line contains the word INFO, so I don't want it to be counted.
I just want messages like this:
2017-07-15 06:12:45,362 [ERROR] blabla:123 #blabla:123- Either the consumer is not registered or the certificates are corrupted. Certificate update using daemon failed.
Any tips on how I can do this?
Generally you want:
anzahl=$(grep -c '\[ERROR\]' "$1")
This would search for the literal string [ERROR] in the logfile, -c returns the number of matches which makes wc -l superfluous.
Anyhow this would still match [ERROR] at any position of the strings. While this should be good enough in most cases, more precise would be this awk command:
anzahl=$(awk '$3=="[ERROR]"{c++}END{print c}' "$1")
This command would check if [ERROR] appears exactly in the third column of a line and counts those lines. At the end of input it prints the count.
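A quick way to see the difference between the two approaches (the sample log lines below are invented for illustration):

```shell
# Hypothetical sample log: [ERROR] as the level vs. [ERROR] merely mentioned in the message
printf '%s\n' \
  '2017-07-15 06:12:45,362 [ERROR] Certificate update using daemon failed.' \
  '2017-07-15 03:07:02,746 [INFO] rhsmd started; saw [ERROR] in a message body' \
  > /tmp/errdemo.log

grep -c '\[ERROR\]' /tmp/errdemo.log                    # matches anywhere on the line: 2
awk '$3=="[ERROR]"{c++}END{print c}' /tmp/errdemo.log   # only field 3 counts: 1
```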
By the way, German variable names don't suit an international audience such as Stack Overflow. I recommend using English variable names, e.g. count.
If you don't actually want a regular expression but really just want to count a string, there are grep options for that:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by new-
lines, any of which is to be matched.
So your command should be:
anzahl=$(grep -c -F '[ERROR]' "$1")
Of course, even that string might appear some place other than the third whitespace-delimited field of the line. If you want to stick with grep rather than switching to a tool like awk for your counting, you can do so by going back to what is perhaps an awkward regular expression:
anzahl=$(grep -c -E '^[^ ]+ [^ ]+ [[]ERROR[]]' "$1")
This uses grep's -E option to specify that you're using an Extended regular expression. The expression consists of two strings of not-space, each followed by a space, all of which is followed by your error tag.
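For instance, on the same kind of input (made-up sample lines), the anchored pattern skips a line where [ERROR] only appears later in the message:

```shell
# The anchored ERE only matches when [ERROR] is the third whitespace-delimited field
printf '%s\n' \
  '2017-07-15 06:12:45,362 [ERROR] update failed' \
  '2017-07-15 03:07:02,746 [INFO] started; message mentions [ERROR] later' \
  | grep -c -E '^[^ ]+ [^ ]+ [[]ERROR[]]'    # prints 1
```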
echo "DISPLAY QL($Queue) CURDEPTH" \
| runmqsc Queue_Managr \
| grep 'CURDEPTH(' \
| sed 's/.*CURDEPTH//' \
| tr -d '()'
Can anyone suggest how this script works? Actually this command displays the current depth value for a particular Q_Manager for a particular queue.
I understand the echo "DISPLAY QL($Queue) CURDEPTH" | runmqsc Queue_Managr part - this command displays the queue name and curdepth{value}.
But I don't understand grep 'CURDEPTH(' | sed 's/.*CURDEPTH//' | tr -d '()'. How does this command work?
It's a pipeline. It contains five stages, separated by the pipe character |. The output of one stage is used as the input to the next stage.
echo "DISPLAY blatti blatti" - this just outputs some text.
runmqsc Queue_Managr - Uses the text as input to the runmqsc-command, which does some MQ magic and outputs data.
grep 'CURDEPTH(' - Grep is a standard unix utility. It filters its input. In this case, only lines containing the text CURDEPTH( are allowed through to the next stage.
sed 's/.*CURDEPTH//' - Sed is another standard utility. It's short for "stream editor", and allows you to edit the input as it passes through. In this case, the expression 's/.*CURDEPTH//' means to delete everything from the start of each line, up to and including the text CURDEPTH. (Remember, only lines containing that text were passed through from the previous stage.)
tr -d '()' - Finally, another standard utility, tr, which also allows editing the text that flows through from input to output. -d '()' means delete the characters ( and ) from the text.
The output from the final stage is shown in the terminal (if you ran your script in a terminal).
It's a fairly common way of building scripts in a unix shell. Generate the input data somehow, push it to a command, and massage the output data through a couple of stages each doing its little bit.
Long dissertations can be (and probably have been) written about all of grep, sed and tr. Look them up if you're interested.
CURDEPTH(3) DEFBIND(OPEN)
Notice that there are two attribute-value pairs in this output. We need to handle only the appropriate pair.
We might be tempted to use the "cut" command to do simple trimming of the first pair to get the value.
However, the output from runmqsc for queues that have very long names (such as 48 characters) shows CURDEPTH as the 2nd pair (as shown below). Thus, a simple use of "cut" is no longer possible:
CRTIME(09.08.08) CURDEPTH(3)
The use of "sed" (the stream editor) can help us get the value. Notice that the parentheses are included.
$ echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | grep 'CURDEPTH(' | sed 's/.*CURDEPTH//'
(3)
Notice that the answer is: (3)
Finally, it is necessary to remove the open and close parenthesis. This can be done using "tr" as follows:
$ echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | grep 'CURDEPTH(' | sed 's/.*CURDEPTH//' | tr -d '()'
3
Notice that the answer is: 3
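If you don't have runmqsc at hand, the tail of the pipeline can be exercised with a printf standing in for it (the CRTIME/CURDEPTH line below is a made-up sample in the format shown above):

```shell
# Stand-in for runmqsc output; sed deletes through "CURDEPTH", tr strips the parentheses
printf 'CRTIME(09.08.08)   CURDEPTH(3)\n' \
  | grep 'CURDEPTH(' \
  | sed 's/.*CURDEPTH//' \
  | tr -d '()'    # prints 3
```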
I'm trying to make the output of ps -ef more readable on Red Hat Linux. I know this has been asked many times, but I have several Java processes that I regularly need to monitor, and the line length for each process is at least 500 characters, with each line a different length. I need the first 14 characters so I get the PID, and around the last 40 characters of the same line to get the name.
What I've got so far is:
ps -ef | grep -v 'eclipse' | grep java | cut -c1-14
which strips out my copies of Eclipse that are running and then gets the other java processes and then cuts in the 1st part of the line.
I know how to get the last part by using rev both sides of the cut, but I can't work out how to combine the 2 together.
You can give cut several regions to cut but it can't cut from the end, so to cut the last 40 characters, you need to know the line length in advance.
I suggest to use a more powerful tool like gawk:
ps -ef | gawk '
  /eclipse/ {next}
  /java/ {
    printf("%-10s %8s ...%s\n", $1, $2, substr($0, length()-40));
  }'
which also allows you to format the output nicely.
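If you'd rather stay with cut, the two halves can also be combined via command substitution, using the rev trick mentioned in the question (the sample line below is made up):

```shell
# Hypothetical long ps -ef line
line='user1    12345     1  0 09:00 ?  00:00:01 /usr/bin/java -Xmx2g ... com.example.MyMainClass'
first=$(printf '%s' "$line" | cut -c1-14)              # first 14 characters
last=$(printf '%s' "$line" | rev | cut -c1-40 | rev)   # last 40 characters
printf '%s ...%s\n' "$first" "$last"
```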
Okay so what I am trying to figure out is how do I count the number of periods in a string and then cut everything up to that point but minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods - 2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way I was thinking of doing it was sending the string to a text file and then just grepping for the number of occurrences, like this:
grep -c "." infile
The reason I don't want to do that is that I want to avoid creating another text file, since I don't have permission to do so. It would also keep the code I'm trying to build simpler.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just want to remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by helpful comment from #steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5