Parsing multi-line variables using grep

I am trying to figure out why this does not work and how to fix it. I have a long list of dates in a variable and would like to count the number of occurrences using grep, but splitting the variable onto separate lines does not seem to work as expected. For example:
$ list="2015-a 2015-b 2016-a" ; count=`echo $list | tr " " \\n | grep 2015 | wc -l` ; echo $count
1
$ list="2015-a,2015-b,2016-a" ; count=`echo $list | tr , \\n | grep 2015 | wc -l` ; echo $count
1
$ list="2015-a,2015-b,2016-a" ; count=`echo $list | sed s/,/\\n/g | grep 2015 | wc -l` ; echo $count
1
Any ideas?

The problem is with the way backticks interpret \\:
Backslashes (\) inside backticks are handled in a non-obvious manner:
$ echo "`echo \\a`" "$(echo \\a)"
a \a
$ echo "`echo \\\\a`" "$(echo \\\\a)"
\a \\a
# Note that this is true for *single quotes* too!
$ foo=`echo '\\'`; bar=$(echo '\\'); echo "foo is $foo, bar is $bar"
foo is \, bar is \\
So instead of saying:
$ echo "`echo $list | tr " " \\n`"
2015-an2015-bn2016-a
You have to say:
$ echo "`echo $list | tr " " \\\\n`"
2015-a
2015-b
2016-a
Even so, it is best to use $(), because backticks are the legacy syntax and are discouraged:
$ echo "$(echo $list | tr " " '\n')"
2015-a
2015-b
2016-a
If you still want to use backticks, the cleanest solution is to wrap the escape sequence in double quotes ("\n") instead of escaping it as \\\\n:
$ echo "`echo $list | tr " " "\n"`"
2015-a
2015-b
2016-a
All of this can be read in Why is $(...) preferred over `...` (backticks)?.
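The difference is easy to check in a script. The following is a minimal sketch that reuses the list from the question; the variable names old and new are only illustrative:

```shell
list="2015-a 2015-b 2016-a"

# Backticks: \\n is reduced to \n before the inner command is parsed,
# and the unquoted \n then becomes a plain n, so tr replaces spaces with "n".
old=`printf '%s\n' "$list" | tr ' ' \\n`

# $(...): the quoted '\n' reaches tr unchanged, so spaces become newlines.
new=$(printf '%s\n' "$list" | tr ' ' '\n')

printf 'backticks: %s\n' "$old"
printf 'dollar-paren:\n%s\n' "$new"
```

The first printf shows 2015-an2015-bn2016-a, while the second shows the three entries on separate lines.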
All in all, if you just want to count how many words contain 2015, consider using grep -o as suggested in the comments, or something more robust like this awk command:
awk '{for (i=1;i<=NF;i++) if ($i~2015) count++; print count}'
See some examples:
$ awk '{for (i=1;i<=NF;i++) if ($i~2015) s++; print s}' <<< "2015-a 2015-b 2016-a"
2
$ awk '{for (i=1;i<=NF;i++) if ($i~2015) s++; print s}' <<< "2015-a 2015-b 2016-a 20152015-c"
3
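For the original goal of counting list entries, a grep-based version along these lines should also work. This is only a sketch: count_words_matching is a hypothetical helper name, and it assumes the entries are separated by spaces or commas as in the question.

```shell
# count_words_matching LIST PATTERN  (hypothetical helper)
# Puts each entry on its own line, then counts the lines that match.
count_words_matching() {
    printf '%s\n' "$1" | tr ' ,' '\n\n' | grep -c -- "$2"
}

list="2015-a 2015-b 2016-a"
count=$(count_words_matching "$list" 2015)
echo "$count"
```

Because grep -c counts matching lines, an entry such as 20152015-c is counted once, like in the awk version; grep -o piped to wc -l would count it twice.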

Related

Difficulty to create .txt file from loop in bash

I have this data:
cat >data1.txt <<'EOF'
2020-01-27-06-00;/dev/hd1;100;/
2020-01-27-12-00;/dev/hd1;100;/
2020-01-27-18-00;/dev/hd1;100;/
2020-01-27-06-00;/dev/hd2;200;/usr
2020-01-27-12-00;/dev/hd2;200;/usr
2020-01-27-18-00;/dev/hd2;200;/usr
EOF
cat >data2.txt <<'EOF'
2020-02-27-06-00;/dev/hd1;120;/
2020-02-27-12-00;/dev/hd1;120;/
2020-02-27-18-00;/dev/hd1;120;/
2020-02-27-06-00;/dev/hd2;230;/usr
2020-02-27-12-00;/dev/hd2;230;/usr
2020-02-27-18-00;/dev/hd2;230;/usr
EOF
cat >data3.txt <<'EOF'
2020-03-27-06-00;/dev/hd1;130;/
2020-03-27-12-00;/dev/hd1;130;/
2020-03-27-18-00;/dev/hd1;130;/
2020-03-27-06-00;/dev/hd2;240;/usr
2020-03-27-12-00;/dev/hd2;240;/usr
2020-03-27-18-00;/dev/hd2;240;/usr
EOF
I would like to create a .txt file for each filesystem (so hd1.txt, hd2.txt, hd3.txt and hd4.txt) and put in each file the sum of the values for that filesystem from each dataX.txt. It is hard for me to explain in English what I want, so here is an example of the desired result.
Expected content for the output file hd1.txt:
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
Expected content for the file hd2.txt:
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
The implementation I've currently tried:
for i in $(cat *.txt | awk -F';' '{print $2}' | cut -d '/' -f3| uniq)
do
cat *.txt | grep -w $i | awk -F';' -v date="$(cat *.txt | awk -F';' '{print $1}' | cut -d'-' -f-2 | uniq )" '{sum+=$3} END {print date";"$2";"sum}' >> $i
done
But it doesn't work.
Can you show me how to do that?

Because the format seems to be so constant, you can delimit the input with multiple separators and parse it easily in awk:
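awk -v FS='[;/-]' '
# Fields: $1=year $2=month $7=dev $8=device name $9=value $11=mount point (no leading slash)
# A new group starts whenever the input file or the device changes.
{ key = FILENAME SUBSEP $8 }
key != prev {
    if (length(output)) {
        print output >> fileoutput
    }
    prev = key
    sum = 0
}
{
    sum += $9
    output = sprintf("%s-%s;/%s/%s;%d;/%s", $1, $2, $7, $8, sum, $11)
    fileoutput = $8 ".txt"
}
END {
    print output >> fileoutput
}
' *.txt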
Tested on repl generates:
+ cat hd1.txt
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
+ cat hd2.txt
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
Alternatively, you could set -v FS=';' and use split() on the first and second columns to extract the year and month and the hdX name.
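The split-based alternative could look roughly like the sketch below. The function name sum_per_device and its output-directory parameter are hypothetical, and it assumes every input line has the four ;-separated columns shown in the question.

```shell
# sum_per_device OUTDIR FILE...  (hypothetical helper)
# Appends one "year-month;device;sum;mountpoint" line per month and device
# to OUTDIR/<device name>.txt.
sum_per_device() {
    outdir=$1
    shift
    awk -F';' -v outdir="$outdir" '
    {
        split($1, d, "-")          # d[1] = year, d[2] = month
        n = split($2, p, "/")      # p[n] = device name, e.g. hd1
        key = d[1] "-" d[2] ";" $2
        if (!(key in sum)) {       # remember first-seen order and metadata
            order[++count] = key
            mount[key] = $4
            dev[key] = p[n]
        }
        sum[key] += $3
    }
    END {
        for (i = 1; i <= count; i++) {
            key = order[i]
            print key ";" sum[key] ";" mount[key] >> (outdir "/" dev[key] ".txt")
        }
    }' "$@"
}

# Example call (the directory "out" must already exist):
#   sum_per_device out data1.txt data2.txt data3.txt
```

Unlike the streaming version, this keeps all sums in memory until the end, so the input lines do not need to be grouped by device.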
If you seek a bash solution, I suggest you invert the loops - first iterate over files, then over identifiers in second column.
for file in *.txt; do
    prev=
    output=
    while IFS=';' read -r date dev num path; do
        hd=$(basename "$dev")
        if [[ "$hd" != "${prev:-}" ]]; then
            if ((${#output})); then
                printf "%s\n" "$output" >> "$fileoutput"
            fi
            sum=0
            prev="$hd"
        fi
        sum=$((sum + num))
        output=$(
            printf "%s;%s;%d;%s" \
                "$(cut -d'-' -f1-2 <<<"$date")" \
                "$dev" "$sum" "$path"
        )
        fileoutput="${hd}.txt"
    done < "$file"
    printf "%s\n" "$output" >> "$fileoutput"
done
You could also translate the awk script to bash almost 1:1 by setting IFS='-;/' in the while read loop.

Increment variable when matched awk from tail

I'm monitoring a file that is actively being written to.
My current solution is:
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
do
    echo $LINE | grep -q -e "enterpriseID:"
    if [ $? = 0 ]
    then
        ((ws_trans++))
    fi
    echo $LINE | grep -q -e "sc_ID:"
    if [ $? = 0 ]
    then
        ((sc_trans++))
    fi
    printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
However, when attempting to do this with AWK I don't get the output: $ws_trans and $sc_trans remain 0.
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
do
    echo $LINE | awk '/enterpriseID:/ {++ws_trans} END {print | ws_trans}'
    echo $LINE | awk '/sc_ID:/ {++sc_trans} END {print | sc_trans}'
    printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
Attempting to do this to reduce load. I understand that AWK doesn't deal with bash variables, and it can get quite confusing, but the only reference I found is a non tail application of AWK.
How can I assign the AWK Variable to the bash ws_trans and sc_trans? Is there a better solution? (There are other search terms being monitored.)

You need to pass the variables using the option -v, for example:
$ var=0
$ printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} {print awk_var}'
To set the variable "back" you could use declare, for example:
$ declare $(printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} END {print "var=" awk_var}')
$ echo $var
10
Your script could be rewritten like this:
ws_trans=0
sc_trans=0
tail -F /var/log/system.log |
while read LINE
do
    declare $(echo $LINE | awk -v ws=${ws_trans} '/enterpriseID:/ {++ws} END {print "ws_trans="ws}')
    declare $(echo $LINE | awk -v sc=${sc_trans} '/sc_ID:/ {++sc} END {print "sc_trans="sc}')
    printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
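If the counters are only needed for the status line, a single long-running awk process can keep both of them, which avoids starting new processes for every log line. A minimal sketch under that assumption (count_trans is a hypothetical name; fflush() is available in gawk, mawk and BSD awk):

```shell
# count_trans: reads log lines on stdin and rewrites a one-line status
# after every input line (hypothetical helper).
count_trans() {
    awk '
    /enterpriseID:/ { ws++ }
    /sc_ID:/        { sc++ }
    {
        printf "\r WSTRANS: %d \t\t SCTRANS: %d", ws, sc
        fflush()    # push the status out immediately instead of buffering it
    }
    END { print "" }
    '
}

# Intended use:
#   tail -F /var/log/file.log | count_trans
```

The counts live inside awk here, so this does not set the shell variables ws_trans and sc_trans; further search terms can be added as extra pattern lines.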

Count number of patterns with a single command

I'd like to count the number of occurrences in a string. For example, in this string :
'apache2|ntpd'
there are 2 different strings separated by | character.
Another example :
'apache2|ntpd|authd|freeradius'
In this case there are 4 different strings separated by | character.
Would you know a shell or perl command that could simply count this for me?

You can use awk as below:
echo "apache2|ntpd" | awk -F'|' '{print NF}'
-F'|' sets the field separator to |
NF is the number of fields
Example:
user@host:/tmp$ echo 'apache2|ntpd|authd|freeradius' | awk -F'|' '{print NF}'
4
You can also use this:
user@host:/tmp$ echo "apache2|ntpd" | tr '|' ' ' | wc -w
2
user@host:/tmp$ echo 'apache2|ntpd|authd|freeradius' | tr '|' ' ' | wc -w
4
tr '|' ' ' : translate | to space
wc -w : print the word count
If the string contains spaces, wc -w gives the wrong result, so count lines instead:
echo 'apac he2|ntpd' | tr '|' '\n' | wc -l
user@host:/tmp$ echo 'apac he2|ntpd' | tr '|' ' ' | wc -w
3 --> not correct
user@host:/tmp$ echo 'apac he2|ntpd' | tr '|' '\n' | wc -l
2
tr '|' '\n' : translate | to newline
wc -l : number of lines

You can do this within bash itself, without calling external languages like awk or external programs like grep and tr.
data='apache2|ntpd|authd|freeradius'
res=${data//[!|]/}
num_strings=$(( ${#res} + 1 ))
echo $num_strings
Let me explain.
res=${data//[!|]/} removes all characters that are not (that's the !) pipes (|).
${#res} gives the length of the resulting string.
num_strings=$(( ${#res} + 1 )) adds one to the number of pipes to get the number of fields.
It's that simple.

Another pure bash technique using positional-parameters
$ userString="apache2|ntpd|authd|freeradius"
$ printf "%s\n" $(IFS=\|; set -- $userString; printf "%s\n" "$#")
4
Thanks to cdarke's suggestion in the comments, the count can be stored directly in a variable:
$ printf -v count "%d" $(IFS=\|; set -- $userString; printf "%s\n" "$#")
$ printf "%d\n" "$count"
4
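The positional-parameter technique can be wrapped in a small function. This is a sketch only: count_fields is a hypothetical name, the delimiter is assumed to be a single character, and globbing is switched off so that characters such as * in the data are not expanded.

```shell
# count_fields STRING DELIMITER  (hypothetical helper)
count_fields() {
    (
        # The subshell keeps the IFS and set -f changes local.
        IFS=$2
        set -f
        set -- $1      # unquoted on purpose: split STRING on DELIMITER
        echo "$#"
    )
}

count=$(count_fields 'apache2|ntpd|authd|freeradius' '|')
echo "$count"
```

Spaces inside a field are harmless here, because only the delimiter is used for splitting.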

With wc and parameter expansion:
$ data='apache2|ntpd|authd|freeradius'
$ wc -w <<< ${data//|/ }
4
Using parameter expansion, all pipes are replaced with spaces. The result string is passed to wc -w for word count.
As @gniourf_gniourf mentioned, this works for strings that look like process names, but it will fail if the strings contain spaces.

You can do this with grep as well-
echo "apache2|ntpd|authd|freeradius" | grep -o "|" | wc -l
Output-
3
That output is the number of pipes.
To get the number of commands-
var=$(echo "apache2|ntpd|authd|freeradius" | grep -o "|" | wc -l)
echo $((var + 1))
Output -
4

You could use awk to count the occurrences of the delimiter and add 1:
$ awk '{print gsub(/\|/,"")+1}' <(echo "apache2|ntpd|authd|freeradius")
4

Maybe this will help you:
IN="apache2|ntpd"
mails=$(echo $IN | tr "|" "\n")
for addr in $mails
do
echo "> [$addr]"
done

sorting a "key/value pair" array in bash

How do I sort a "python dictionary-style" array e.g. ( "A: 2" "B: 3" "C: 1" ) in bash by the value? I think, this code snippet will make it bit more clear about my question.
State="Total 4 0 1 1 2 0 0"
W=$(echo $State | awk '{print $3}')
C=$(echo $State | awk '{print $4}')
U=$(echo $State | awk '{print $5}')
M=$(echo $State | awk '{print $6}')
WCUM=( "Owner: $W;" "Claimed: $C;" "Unclaimed: $U;" "Matched: $M" )
echo ${WCUM[@]}
This will simply print the array: Owner: 0; Claimed: 1; Unclaimed: 1; Matched: 2
How do I sort the array (or the output), eliminating any pair with "0" value, so that the result like this:
Matched: 2; Claimed: 1; Unclaimed: 1
Thanks in advance for any help or suggestions. Cheers!!

Quick and dirty idea would be (this just sorts the output, not the array):
echo ${WCUM[@]} | sed -e 's/; /;\n/g' | awk -F: '!/ 0;?/ {print $0}' | sort -t: -k 2 -r | xargs

echo -e ${WCUM[@]} | tr ';' '\n' | sort -r -k2 | egrep -v ": 0$"
Sorting and filtering are independent steps, so if you only want to filter out the 0 values, it is much easier.
Append an
| tr '\n' ';'
to get it to a single line again in the end.
nonull=$(for n in ${!WCUM[@]}; do echo ${WCUM[n]} | egrep -v ": 0;"; done | tr -d "\n")
I don't see a good reason to end $W, $C and $U with a semicolon but not $M, so instead of adapting my code to this distinction I would eliminate the special case. If that is not possible, I would temporarily append a semicolon to $M and remove it at the end.
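Putting the filter and sort steps together, a pipeline along the following lines should produce the output asked for in the question. It is a sketch that assumes the pairs arrive as one ;-separated line of "Label: number" items; sort_pairs is a hypothetical name.

```shell
# sort_pairs: reads "Label: N; Label: N; ..." on stdin, drops pairs whose
# value is 0 and prints the rest sorted by value, highest first.
sort_pairs() {
    tr ';' '\n' |
    sed 's/^ *//' |                        # strip the space left after each ;
    awk -F': *' '$2 + 0 != 0' |            # keep only non-zero values
    sort -t: -k2,2nr -k1,1 |               # by value descending, then by label
    awk 'NR > 1 { printf "; " } { printf "%s", $0 } END { print "" }'
}

echo "Owner: 0; Claimed: 1; Unclaimed: 1; Matched: 2" | sort_pairs
```

The second sort key only makes the order of equal values deterministic; it can be dropped if that does not matter.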

Another attempt, using some of the bash features, but still needs sort, that is crucial:
#! /bin/bash
State="Total 4 1 0 4 2 0 0"
string=$State
for i in 1 2 ; do # remove unnecessary fields
string=${string#* }
string=${string% *}
done
# Insert labels
string=Owner:${string/ /;Claimed:}
string=${string/ /;Unclaimed:}
string=${string/ /;Matched:}
# Remove zeros
string=(${string[@]//;/; })
string=(${string[@]/*:0;/})
string=${string[@]}
# Format
string=${string//;/$'\n'}
string=${string//:/: }
# Sort
string=$(sort -t: -nk2 <<< "$string")
string=${string//$'\n'/;}
echo "$string"

Need to grab data inbetween tilde character

Can anyone advise how to search on Linux for data between tilde characters? I need to get the IP address, but the data is formatted as shown below.
Details:
20110906000418~118.221.246.17~DATA~DATA~DATA

One more:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | sed -r 's/[^~]*~([^~]+)~.*/\1/'

echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d'~' -f2
This uses the cut command with the delimiter set to ~. The -f2 switch then outputs just the 2nd field.
If the text you give is in a file (called filename), try:
grep "[0-9]*~" filename | cut -d'~' -f2

With cut:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d~ -f2
With awk:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | awk -F~ '{ print $2 }'

In awk:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | awk -F~ '{print $2}'

Just use bash
$ string="20110906000418~118.221.246.17~DATA~DATA~DATA"
$ echo ${string#*~}
118.221.246.17~DATA~DATA~DATA
$ string=${string#*~}
$ echo ${string%%~*}
118.221.246.17

one more, using perl:
$ perl -F~ -lane 'print $F[1]' <<< '20110906000418~118.221.246.17~DATA~DATA~DATA'
118.221.246.17
bash:
#!/bin/bash
IFS='~'
while read -a array;
do
echo ${array[1]}
done < ip
If the string has a fixed layout, the following parameter expansion extracts the substring:
$ a=20110906000418~118.221.246.17~DATA~DATA~DATA
$ echo ${a:15:14}
118.221.246.17
or using a regular expression with expr:
$ echo $(expr "$a" : '[^~]*~\([^~]*\)~.*')
118.221.246.17
last one, again using pure bash methods:
$ tmp=${a#*~}
$ echo $tmp
118.221.246.17~DATA~DATA~DATA
$ echo ${tmp%%~*}
118.221.246.17
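The two expansions can be combined into a reusable function. This is a small sketch; second_field is a hypothetical name, and the tilde is quoted inside the patterns so that it is always taken literally.

```shell
# second_field STRING  (hypothetical helper)
# Prints the text between the first and second ~ of STRING.
second_field() {
    rest=${1#*"~"}          # drop everything up to and including the first ~
    field=${rest%%"~"*}     # drop everything from the next ~ onwards
    printf '%s\n' "$field"
}

second_field '20110906000418~118.221.246.17~DATA~DATA~DATA'
```

If the input contains no tilde at all, both expansions leave it unchanged and the whole string is printed.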
