Simple aggregation using linux scripting - linux

Let's say I have a text file with lines like these:
foo 10
bar 15
bar 5
foo 30
...
What's the simplest way to generate the following output:
foo 40
bar 20
?

This will do:
awk '{arr[$1]+=$2;} END { for (i in arr) print i, arr[i]}' file
For more information, read up on awk's associative arrays.
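As a minimal sketch of how the associative array accumulates the totals, the snippet below feeds the sample lines from the question through the same kind of awk program. The helper name sum_by_key is hypothetical, and the trailing sort is an addition (the order of a for-in loop over an awk array is unspecified), so treat this as an illustration rather than the answer's exact method.

```shell
# sum_by_key: hypothetical helper; reads "key value" lines on stdin
# and prints one "key total" line per distinct key, sorted by key.
sum_by_key() {
    awk '{ total[$1] += $2 }    # add the value to the running total of its key
         END { for (k in total) print k, total[k] }' | sort
}

result=$(printf '%s\n' 'foo 10' 'bar 15' 'bar 5' 'foo 30' | sum_by_key)
echo "$result"
```

Without the sort, the two output lines may appear in either order depending on the awk implementation.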

Use this awk script:
awk '{sums[$1] += $2} END {for (a in sums) print a, sums[a]}' infile
OUTPUT:
foo 40
bar 20
For background, see an awk tutorial on associative arrays.

If you are interested in perl:
perl -lane '$X{$F[0]} += $F[1]; END { print "$_ $X{$_}" for keys %X }' your_file

Here's one way with sort, GNU sed and bc:
sort infile |
sed -r ':a; N; s/([^ ]+) +([^\n]+)\n\1/\1 \2 +/; ta; P; D' |
sed -r 'h; s/[^ ]+/echo/; s/$/ | bc/e; G; s/([^\n]+)\n([^ ]+).*/\2 \1/'
Output:
bar 20
foo 40
The first sed joins adjacent lines with the same key adding a + between the numbers, the second passes the sums to bc.

Related

Select subdomains using print command

cat a.txt
a.b.c.d.e.google.com
x.y.z.google.com
rev a.txt | awk -F. '{print $2,$3}' | rev
This is showing:
e google
x google
But I want this output
a.b.c.d.e.google
b.c.d.e.google
c.d.e.google
e.google
x.y.z.google
y.z.google
z.google
With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.
awk '
BEGIN{
  FS=OFS="."
}
{
  nf=NF
  for(i=1;i<(nf-1);i++){
    print
    $1=""
    sub(/^[[:space:]]*\./,"")
  }
}
' Input_file
Here is one more awk solution:
awk -F. '{while (!/^[^.]+\.[^.]+$/) {print; sub(/^[^.]+\./, "")}}' file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using sed
$ sed -En 'p;:a;s/[^.]+\.(.*([^.]+\.){2}[[:alpha:]]+$)/\1/p;ta' input_file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using bash:
IFS=.
while read -ra a; do
  for ((i=${#a[@]}; i>2; i--)); do
    echo "${a[*]: -i}"
  done
done < a.txt
Gives:
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
(I assume the lack of d.e.google.com in your expected output is a typo?)
For a shorter and arguably simpler solution, you could use Perl.
This auto-splits each line on the dot character into the @F array and then prints the range you want:
perl -F'\.' -lane 'print join(".", @F[0..$#F-1])' a.txt
-F'\.' will auto-split each input line into the @F array. It splits on the given regular expression, so the dot needs to be escaped to be taken literally.
$#F is the index of the last element of the array, so @F[0..$#F-1] is the slice from the first element ($F[0]) to the penultimate one. If you wanted to leave out both "google" and "com", you would use @F[0..$#F-2], and so on.
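The Perl one-liner prints each name once without its last label. As a sketch of the output the question actually asks for (every suffix that still has at least two labels once the final label is dropped), the same idea can be written with plain shell parameter expansion. The helper name list_subdomains is hypothetical, and this swaps in a different technique from the Perl slice.

```shell
# list_subdomains: hypothetical helper; for each host name on stdin,
# drop the last label and print every remaining suffix that still
# contains at least one dot.
list_subdomains() {
    while IFS= read -r host; do
        name=${host%.*}                         # strip the final label, e.g. ".com"
        while [ "${name#*.}" != "$name" ]; do   # loop while a dot remains
            printf '%s\n' "$name"
            name=${name#*.}                     # drop the leading label
        done
    done
}

result=$(printf '%s\n' 'x.y.z.google.com' | list_subdomains)
echo "$result"
```

For x.y.z.google.com this prints x.y.z.google, y.z.google and z.google, one per line.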

Arithmetic in Shell script

After executing this
ta=`zcat abc.log.2019071814.gz |grep "R_MT"|grep "A:1234"|grep "ID:413"|awk -F"|" '{print $20}'|sort|uniq -c|awk '{$1=$1};1'`
Here $20 is the "S:" entry in each row (I am taking the unique count of all S values). The result I get is
93070 S:1 11666 S:8 230 S:9
What I need is the sum of the counts of all S values, i.e. 93070+11666+230,
so the result should be total=104966.
$ echo 93070 S:1 11666 S:8 230 S:9 | sed -E 's, S:[0-9]+,,g; s, ,+,g' | bc
104966
Append to your last awk:
| awk '{sum+=$1} END {print sum}'
or use this (awk treats the non-numeric fields S:1, S:8 and S:9 as 0, so they do not affect the sum):
echo $ta | awk '{for(i=1;i<=NF;i++) t+=$i; print t; t=0}'
or use every second column:
echo $ta | awk '{for(i=1;i<=NF;i=i+2) t+=$i; print t; t=0}'
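As a small sketch of why both awk variants give the same total, the snippet below runs them on the sample string from the question. The variable names counts, total_all and total_odd are illustrative only.

```shell
counts='93070 S:1 11666 S:8 230 S:9'

# Add every field; "S:1" and similar fields do not start with a number,
# so awk converts them to 0 and they leave the sum unchanged.
total_all=$(echo "$counts" | awk '{ for (i = 1; i <= NF; i++) t += $i; print t }')

# Add only the odd-numbered fields, which hold the counts themselves.
total_odd=$(echo "$counts" | awk '{ for (i = 1; i <= NF; i += 2) t += $i; print t }')

echo "$total_all $total_odd"
```

Both variables end up holding 104966.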
I won't help you all the way, but know that you can use bc to perform arithmetic.
echo "93070 + 11666 + 230" | bc
would give you:
104966

Reverse file using tac and sed

I have a usecase where I need to search and replace the last occurrence of a string in a file and write the changes back to the file. The case below is a simplified version of that usecase:
I'm attempting to reverse the file, make some changes reverse it back again and write to the file. I've tried the following snippet for this:
tac test | sed s/a/b/ | sed -i '1!G;h;$!d' test
test is a text file with contents:
a
1
2
3
4
5
I was expecting this command to make no changes to the order of the file, but it has actually reversed the contents to:
5
4
3
2
1
b
How can i make the substitution as well as retain the order of the file?
You can tac your file, apply the substitution to the first occurrence of the desired pattern, tac again, and write the result to a temporary file before renaming it to the original name (the 0,/a/ address form requires GNU sed):
tac file | sed '0,/a/{s//b/}' | tac > tmp && mv tmp file
Another way is to use grep to get the number of the last line that contains the text you want to change, then use sed to change that line:
$ linno=$( grep -n 'abc' file | tail -1 | cut -d: -f1 )
$ sed -i "${linno}s/abc/def/" file
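A minimal sketch of the grep-then-sed idea, wrapped in a function with a guard for the case where the text does not occur at all. The helper name replace_last_line_match and the demo file are hypothetical; the sketch writes through a temporary file instead of using sed -i, and it assumes the search and replacement texts contain no characters that are special to grep or sed.

```shell
# replace_last_line_match FILE OLD NEW: hypothetical helper; on the last
# line of FILE that contains OLD, replace the first occurrence with NEW.
replace_last_line_match() {
    file=$1 old=$2 new=$3
    linno=$(grep -n -- "$old" "$file" | tail -n 1 | cut -d: -f1)
    [ -n "$linno" ] || return 1       # no matching line, nothing to do
    tmp=$(mktemp) || return 1
    sed "${linno}s/$old/$new/" "$file" > "$tmp" && mv "$tmp" "$file"
}

demo=$(mktemp)                        # scratch file for the demonstration
printf '%s\n' abc 1 abc 2 > "$demo"
replace_last_line_match "$demo" abc def
result=$(cat "$demo")
rm -f "$demo"
echo "$result"
```

Only the second abc line is changed to def; the first one is left alone.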
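Try tac test | sed 's/a/b/' | tac. Note that rev reverses the characters of each line rather than the order of the lines, and that sed -i edits a named file in place, so it cannot be used in the middle of a pipeline.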
Or you can use only the sed command.
For example, to replace ABC with DEF:
sed -e 's/\(.*\)ABC/\1DEF/'
Because .* is greedy, this replaces the last occurrence of ABC on each line; the g flag is not needed here.
You can also add a $ if you want ABC to be replaced only when it is at the very end of the line:
sed -e 's/\(.*\)ABC$/\1DEF/'
EDIT
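Or keep your second sed, which reverses the lines just as tac does, but drop -i (sed -i edits the named file and ignores the pipe) and write the result back through a temporary file:
tac test | sed s/a/b/ | sed '1!G;h;$!d' > tmp && mv tmp test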
Here is a way to do this in a single command using awk.
First input file:
cat file
a
1
2
3
4
a
5
Now this awk command:
awk '{a[i++]=$0} END{p=i; while(i--) if (sub(/a/, "b", a[i])) break;
for(i=0; i<p; i++) print a[i]}' file
a
1
2
3
4
b
5
To save output back into original file use:
awk '{a[i++]=$0} END{p=i; while(i--) if (sub(/a/, "b", a[i])) break;
for(i=0; i<p; i++) print a[i]}' file > $$.tmp && mv $$.tmp file
Another in awk. First a test file:
$ cat file
a
1
a
2
a
and solution:
$ awk '
$0=="a" && NR>1 {          # when we meet "a"
    print b; b=""          # output and clear buffer b
}
{
    b=b (b==""?"":ORS) $0  # gather the buffer
}
END {                      # in the end
    sub(/^a/,"b",b)        # replace the leading "a" in buffer b with "b"
    print b                # output buffer
}' file
a
1
a
2
b
Writing back happens by redirecting the output to a temp file which then replaces the original file (awk ... file > tmp && mv tmp file), or, if you are using GNU awk 4.1.0 or later, you can use in-place editing (awk -i inplace ...).

Write the contents of the variable to a file

How can I save the contents of the variable sum in this operation?
$ seq 1 5 | awk '{sum+=$1} end {print sum; echo "$sum" > test_file}'
It looks like you're confusing BASH syntax and Awk. Awk is a programming language, and it has very different syntax from BASH.
$ seq 1 5 | awk '{ sum += $1 } END { print sum }'
15
You want to capture that 15 into a file:
$ seq 1 5 | awk '{ sum += $1 } END { print sum }' > test_file
That is using the shell's redirection. The > appears outside of the Awk program where the shell has control, and redirects standard out into the file test_file.
You can also redirect inside of Awk, using Awk's own redirection, which uses the same > syntax as BASH:
$ seq 1 5 | awk '{ sum += $1 } END { print sum > "test_file" }'
Note that the file name has to be quoted, or Awk will assume that test_file is a variable, and you'll get some error about redirecting to a null file name.
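As a minimal sketch of the two kinds of redirection described above, the snippet below writes the sum once through the shell and once through awk. The scratch file from mktemp stands in for test_file, and the awk variable name target (passed with -v) is an illustrative choice, not something from the answer.

```shell
out=$(mktemp)    # scratch file standing in for test_file

# Shell redirection: awk prints to standard output and the shell
# sends that output to the file.
seq 1 5 | awk '{ sum += $1 } END { print sum }' > "$out"
shell_result=$(cat "$out")

# awk redirection: the target after > is an awk string expression.
# Here it is a variable set with -v, so it must not be quoted inside awk.
seq 1 5 | awk -v target="$out" '{ sum += $1 } END { print sum > target }'
awk_result=$(cat "$out")

rm -f "$out"
echo "$shell_result $awk_result"
```

A literal file name inside the awk program would need double quotes, while a variable that holds the name must be written without them.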
To write your output into a file, you have to redirect to "test_file" like this:
$ seq 5 | awk '{sum+=$1} END{print sum > "test_file"}'
$ cat test_file
15
Your version was not working because you were not quoting test_file, so for awk it was considered a variable. And as you have not defined it beforehand, awk couldn't redirect properly. David W's answer explains it pretty well.
Note also that seq 5 is equivalent to seq 1 5.
In case you want to save the result into a variable, you can use the var=$(command) syntax:
$ sum=$(seq 5 | awk '{sum+=$1} END{print sum}')
$ echo $sum
15
echo won't work in the awk command. Try this:
seq 1 5 | awk '{sum+=$1} END {print sum > "test_file"}'
You don't need awk for this. You can say:
$ seq 5 | paste -sd+ | bc > test_file
$ cat test_file
15
This question is tagged with bash so here is a pure bash solution:
for ((i=1; i<=5; i++)); do ((sum+=i)); done; echo "$sum" > 'test_file'
Or this one:
for i in {1..5}; do ((sum+=i)); done; echo "$sum" > 'test_file'
http://sed.sourceforge.net/grabbag/scripts/add_decs.sed
#! /bin/sed -f
# This is an alternative approach to summing numbers,
# which works a digit at a time and hence has unlimited
# precision. This time it is done with lookup tables,
# and uses only 10 commands.
G
s/\n/-/
s/$/-/
s/$/;9aaaaaaaaa98aaaaaaaa87aaaaaaa76aaaaaa65aaaaa54aaaa43aaa32aa21a100/
:loop
/^--[^a]/!{
# Convert next digit from both terms into analog form
# and put the two groups next to each other
s/^\([0-9a]*\)\([0-9]\)-\([^-]*\)-\(.*;.*\2\(a*\)\2.*\)/\1-\3-\5\4/
s/^\([^-]*\)-\([0-9a]*\)\([0-9]\)-\(.*;.*\3\(a*\)\3.*\)/\1-\2-\5\4/
# Back to decimal, but keeping the carry in analog form
# \2 matches an `a' if there are at least ten a's, else nothing
#
# 1------------- 3- 4----------------------
# 2 5----
s/-\(aaaaaaaaa\(a\)\)\{0,1\}\(a*\)\([0-9b]*;.*\([0-9]\)\3\5\)/-\2\5\4/
b loop
}
s/^--\([^;]*\);.*/\1/
h

how to insert a newline \n after x numbers of words, with AWK or Sed

I have this in one line:
We were born in the earth beyond the land
I want it in 3 words lines, to be like this:
We were born
in the earth
beyond the land
$ xargs -n3 < file
We were born
in the earth
beyond the land
$ egrep -o '\S+\s+\S+\s+\S+' file
We were born
in the earth
beyond the land
Using awk:
awk '{
    for ( i = 1; i <= NF; i++ ) {
        printf "%s%c", $i, (i % 3 == 0) ? ORS : OFS
    }
}' infile
It yields:
We were born
in the earth
beyond the land
With GNU sed:
sed 's/\(\w\w*\W*\w\w*\W*\w\w*\W*\)/\1\n/g' input
and short version:
sed 's/\(\(\w\w*\W*\)\{3\}\)/\1\n/g' input
Here's one sed solution:
sed -e 's/\s/\n/3;s/\s/\n/6;s/\s/\n/9'
It replaces the third, sixth and ninth spaces with newlines.
This one will handle longer lines, even if they aren't multiples of three words:
sed -e 's/\(\(\S\{1,\}\s*\)\{3\}\)/\1\n/g'
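For lines whose word count is not a multiple of three, a small awk sketch can make the handling of the leftover words explicit. The helper name wrap_words and its optional group-size argument are hypothetical; this is an awk variant for illustration, not the sed command above.

```shell
# wrap_words [N]: hypothetical helper; prints the words of each input
# line N per output line (default 3), including a shorter final line.
wrap_words() {
    awk -v n="${1:-3}" '{
        for (i = 1; i <= NF; i++)
            printf "%s%s", $i, (i % n == 0 || i == NF) ? "\n" : " "
    }'
}

result=$(echo 'We were born in the earth beyond the land today' | wrap_words 3)
echo "$result"
```

The ten-word sample produces three full lines followed by a final line that holds only the word today.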
Any system that has awk and sed will almost certainly also have Perl:
cat myfile.txt | perl -lane 'while(($a,$b,$c) = splice(@F,0,3)){print "$a $b $c"}'
