bash Sort uniq list of numbers and strings

I would like to sort and merge a list by its second column, summing the numbers in the first column, so that
123 ABC
1 ABC
345 BGF
3 BGF
becomes
124 ABC
348 BGF
How can I do this in bash? Thank you.

Using awk you can do this:
awk '{a[$2]+=$1} END{for (i in a) print a[i], i}' file
124 ABC
348 BGF
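Note that awk does not guarantee any particular order for `for (i in a)`, so the two lines may come out in either order. A minimal sketch, assuming a stable order is wanted, pipes the result through sort. The function name sum_by_label is made up for illustration and is not part of the original answer.

```shell
# sum_by_label: hypothetical helper name, not from the original answer.
# Reads "NUMBER LABEL" lines on stdin, sums the numbers per label,
# and sorts by label so the output order is predictable.
sum_by_label() {
    awk '{ sum[$2] += $1 } END { for (label in sum) print sum[label], label }' |
        sort -k2,2
}

# Sample input taken from the question.
printf '123 ABC\n1 ABC\n345 BGF\n3 BGF\n' | sum_by_label
```

Sorting after the aggregation keeps the awk part unchanged; only the final ordering is made deterministic.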

Related

Appending the line even though there is no match with awk

I am trying to compare two files and append another column when a certain condition is satisfied.
file1.txt
1 101 111 . BCX 123
1 298 306 . CCC 234
1 299 305 . DDD 345
file2.txt
1 101 111 BCX P1#QQQ
1 299 305 DDD P2#WWW
The output should be:
1 101 111 . BCX 123;P1#QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2#WWW
What I can do so far is handle only the lines that have a match:
awk 'NR==FNR{ a[$1,$2,$3,$4]=$5; next }{ s=SUBSEP; k=$1 s $2 s $3 s $5 }k in a{ print $0,a[k] }' file2.txt file1.txt
1 101 111 . BCX 123 P1#QQQ
1 299 305 . DDD 345 P2#WWW
But then, I am missing the second line in file1.
How can I still keep it even though there is no match with file2 regions?

If you want to print every line, you need your print command not to be limited by your condition.
awk '
NR==FNR {
a[$1,$2,$3,$4]=$5; next
}
{
s=SUBSEP; k=$1 s $2 s $3 s $5
}
k in a {
$6=$6 ";" a[k]
}
1' file2.txt file1.txt
The 1 is shorthand that says "print every line". It's a condition (without command statements) that always evaluates "true".
The k in a condition simply replaces your existing 6th field with the concatenated one. If the condition is not met, the replacement doesn't happen, but we still print because of the 1.
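One side effect worth knowing: assigning to a field makes awk rebuild the whole line, joining the fields with the output field separator (a single space by default), so the original spacing or tabs are not preserved. A small illustration with made-up input that is not from the question:

```shell
# Made-up input with irregular spacing and a tab; not from the question.
# Assigning to $3 forces awk to rebuild $0 using OFS (a single space).
printf 'a   b\tc\n' | awk '{ $3 = $3 ";x" } 1'
# prints: a b c;x
```

If the files are tab-separated and the tabs must survive, running awk with -F'\t' -v OFS='\t' is one way to keep them.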

The following awk command may also help.
awk 'FNR==NR{a[$1,$2,$3,$4]=$NF;next} (($1,$2,$3,$5) in a){print $0";"a[$1,$2,$3,$5];next} 1' file2.txt file1.txt
Output will be as follows.
1 101 111 . BCX 123;P1#QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2#WWW

Another awk solution:
$ awk ' {t=5-(NR==FNR); k=$1 FS $2 FS $3 FS $t}
NR==FNR {a[k]=$NF; next}
k in a {$0=$0 ";" a[k]}1' file2 file1
1 101 111 . BCX 123;P1#QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2#WWW
The last component of the key is the 4th field for the first input file (file2) and the 5th field for the second (file1); t is set accordingly so that a single k variable can be used throughout the script. Note that
t=5-(NR==FNR)
can be written more conventionally as
t=NR==FNR?4:5

accessing text between specific words in UNIX multiple times

If the file is like this:
ram_file
abc
123
end_file
tony_file
xyz
456
end_file
bravo_file
uvw
789
end_file
Now I want to access the text between ram_file and end_file, tony_file and end_file, and bravo_file and end_file simultaneously. I tried the sed command, but I don't know how to specify *_file in it.
Thanks in advance.

This awk command should do the job for you.
This solution treats end_file as the end of a block and every other xxxx_file line as the start of a block.
It does not print any text that falls between blocks, such as the line "do not print this" in my example.
awk '/end_file/{f=0} f; /_file/ && !/end_file/ {f=1}' file
abc
123
xyz
456
uvw
789
cat file
ram_file
abc
123
end_file
do not print this
tony_file
xyz
456
end_file
nor this data
bravo_file
uvw
789
end_file
If you would like some formatting, that is easy to do with awk:
awk -F_ '/end_file/{printf (f?RS:"");f=0} f; /file/ && !/end_file/ {f=1;print "-Block-"++c"--> "$1}' file
-Block-1--> ram
abc
123
-Block-2--> tony
xyz
456
-Block-3--> bravo
uvw
789
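If only one block is needed, the same flag technique can be narrowed to a single start marker. This is a minimal sketch, assuming the start marker is passed in as an awk variable and that marker lines contain nothing else; print_block and the variable name start are invented for this example.

```shell
# print_block: hypothetical helper. $1 is the start marker (e.g. tony_file),
# input is read from stdin. Lines between the marker and the next end_file
# are printed; the marker lines themselves are not.
print_block() {
    awk -v start="$1" '
        $0 == "end_file" { inside = 0 }   # block ends: stop printing
        inside                            # print while inside the block
        $0 == start      { inside = 1 }   # block starts after this line
    '
}

printf 'ram_file\nabc\n123\nend_file\ntony_file\nxyz\n456\nend_file\n' |
    print_block tony_file
```

The order of the three rules matters: the end check comes first so end_file is never printed, and the start check comes last so the marker line is skipped too.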

reformatting report file using linux shell commands combining multiple lines output into one

I have a file that contains the following input:
name:ted
position:11.11.11.11
applicationKey:88
channel:45
protocol:4
total:350
name:janet
position:170.198.80.209
applicationKey:256
channel:44
protocol:4
total:1
I would like the output to look like this:
ted 11.11.11.11 88 45 4 350
janet 170.198.80.209 256 44 4 1
Can someone help with this, please?

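This should work, provided the records are separated by a blank line (the NF test switches ORS to a newline only on an empty line):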
awk -F':' '{printf "%s %s",$2,ORS=NF?"":"\n"}END{print "\n"}' file
$ cat file
name:ted
position:11.11.11.11
applicationKey:88
channel:45
protocol:4
total:350
name:janet
position:170.198.80.209
applicationKey:256
channel:44
protocol:4
total:1
$ awk -F':' '{printf "%s %s",$2,ORS=NF?"":"\n"}END{print "\n"}' file
ted 11.11.11.11 88 45 4 350
janet 170.198.80.209 256 44 4 1
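For input that has no blank line between records, a different end-of-record signal is needed. The sketch below starts a new output line whenever the key is name. It assumes every record begins with a name: line and that values contain no colon; join_records is a made-up function name.

```shell
# join_records: hypothetical helper, reads "key:value" lines on stdin.
# Assumption: each record starts with a "name:" line.
join_records() {
    awk -F':' '
        NF < 2 { next }                      # ignore blank or malformed lines
        $1 == "name" && out != "" {          # a new record begins: flush the previous one
            print out
            out = ""
        }
        {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim stray spaces around the value
            out = (out == "" ? v : out " " v)
        }
        END { if (out != "") print out }     # flush the last record
    '
}

printf 'name:ted\nposition:11.11.11.11\ntotal:350\nname:janet\nposition:170.198.80.209\ntotal:1\n' |
    join_records
```

Because blank lines are skipped rather than used as separators, the same function handles input with or without them.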

Combine results of column one Then sum column 2 to list total for each entry in column one

I am a bit of a Bash newbie, so please bear with me here.
I have a text file dumped by another piece of software (that I have no control over) listing each user with the number of times they accessed a certain resource. It looks like this:
Jim 109
Bob 94
John 92
Sean 91
Mark 85
Richard 84
Jim 79
Bob 70
John 67
Sean 62
Mark 59
Richard 58
Jim 57
Bob 55
John 49
Sean 48
Mark 46
.
.
.
My goal here is to get an output like this.
Jim [Total for Jim]
Bob [Total for Bob]
John [Total for John]
And so on.
Names change each time I run the query in the software, so static search on each name and then piping through wc does not help.

This sounds like a job for awk :) Pipe the output of your program to the following awk script:
your_program | awk '{a[$1]+=$2}END{for(name in a)print name " " a[name]}'
Output:
Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208
The awk script itself can be explained better in this format:
# executed on each line
{
# 'a' is an array. It will be initialized
# as an empty array by awk on its first usage
# '$1' contains the first column - the name
# '$2' contains the second column - the amount
#
# on every line the total score of 'name'
# will be incremented by 'amount'
a[$1]+=$2
}
# executed at the end of input
END{
# print every name and its score
for(name in a)print name " " a[name]
}
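Note: to get the output sorted by score, you can add another pipe to sort -k2,2nr, which sorts numerically on the second column in reverse order (a plain -r -k2 compares the values as text, which goes wrong once the totals have different numbers of digits):
your_program | awk '{a[$1]+=$2}END{for(n in a)print n" "a[n]}' | sort -k2,2nr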
Output:
Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
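The desired output in the question lists the names in the order they first appear (Jim, Bob, John, ...). A sketch under that assumption records each name the first time it is seen and prints the totals in that order. The names total_in_order and order are invented here.

```shell
# total_in_order: hypothetical helper, reads "NAME COUNT" lines on stdin
# and prints one total per name, in order of first appearance.
total_in_order() {
    awk '
        NF < 2 { next }                      # skip blank or incomplete lines
        !($1 in total) { order[++n] = $1 }   # remember first appearance
        { total[$1] += $2 }
        END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }
    '
}

printf 'Jim 109\nBob 94\nJim 79\nBob 70\n' | total_in_order
# prints:
# Jim 188
# Bob 164
```

The numeric index in order replaces the unordered `for (name in a)` loop, so no external sort is needed.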

Pure Bash:
declare -A result            # an associative array (requires Bash 4 or later)
while read -r name value; do
    [[ $value =~ ^[0-9]+$ ]] || continue   # skip blank or non-numeric lines
    ((result[$name] += value))
done < "$infile"             # infile: path to the input file, set by the caller
for name in "${!result[@]}"; do
    printf "%-10s%10d\n" "$name" "${result[$name]}"
done
If the redirection is removed from the first done, the script reads standard input and can be used in a pipe:
your_program | ./script.sh
and, to sort the output:
your_program | ./script.sh | sort
The output:
Bob 219
Richard 142
Jim 245
Mark 190
John 208
Sean 201

GNU datamash:
datamash -W -s -g1 sum 2 < input.txt
Output:
Bob 219
Jim 245
John 208
Mark 190
Richard 142
Sean 201

shell script for log analysis

I am getting the logs in a specific format on my linux server as
id \t IP \t login-id \t login-error Code \t attempts
I want to know all possible login-error codes which a user might have encountered.
The sample file is:
123 10.12.34.234 anshul 11 1
432 10.12.34.234 ra 11 2
342 10.12.34.234 anshul 12 1
445 10.12.34.234 yahoo 3 1
and the output should be:
anshul: 11,12
I have tried:
cat aaa | sort +2 -3 | grep anshul | awk -F"\t" {' print $4'}
This would print
11
12
But I want the output in the format as anshul: 11,12
Can we store the values in a variable and display them as required?
Another problem with this command is that it catches every login containing anshul, such as anshulg, anshuln or anshulp. Any suggestions for solving this?
I have done the sorting on login just to verify if the data I am getting is correct or not, since all the unique names would be sorted to single chunk.

1) A simple solution, but you will get an extra , at the end:
cat aaa | grep "anshul" | awk '{print $4}' | tr '\n' ','
output: 11,12,
2) without extra ,:
tmp=`cat aaa | grep "anshul" | awk '{print $4}' | tr '\n' ','`
echo ${tmp%?}
output: 11,12
Of course you can easily convert this to a script that takes the username as a parameter and outputs something like "user: anshul error(s): 11,12"
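As a sketch of the parameterised version, the function below takes the login as an argument and compares it against the third field exactly, so anshulg or anshuln are not picked up the way they are with grep. The name errors_for is hypothetical, and the input is assumed to be whitespace- or tab-separated as in the sample.

```shell
# errors_for: hypothetical helper. $1 is the login to look up,
# log lines (id, IP, login, error code, attempts) are read from stdin.
errors_for() {
    awk -v user="$1" '
        $3 == user { codes = (codes == "" ? $4 : codes "," $4) }   # exact match on the login field
        END { if (codes != "") print user ": " codes }
    '
}

printf '123\t10.12.34.234\tanshul\t11\t1\n432\t10.12.34.234\tra\t11\t2\n342\t10.12.34.234\tanshul\t12\t1\n445\t10.12.34.234\tanshulg\t3\t1\n' |
    errors_for anshul
# prints: anshul: 11,12
```

Building the comma-separated string inside awk also avoids the trailing comma, so no post-processing with tr or parameter expansion is required.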

% cat t
123 10.12.34.234 anshul 11 1
432 10.12.34.234 ra 11 2
342 10.12.34.234 anshul 12 1
445 10.12.34.234 yahoo 3 1
A Perl one-liner:
perl -ane 'BEGIN{$x="anshul"} push @{$X{$F[2]}}, $F[3]; END{print "$x: ", join(",", @{$X{$x}}), "\n"}' < t
Gives:
anshul: 11,12

awk '{a[$3]=a[$3]","$4;next}END{for(i in a)print i,substr(a[i],2)}' <your_file>|grep anshul
Or you can directly use awk without a grep.
awk '{a[$3]=a[$3]","$4;next}END{print "anshul",substr(a["anshul"],2)}' <your_file>
Tested below:
> cat temp
123 10.12.34.234 anshul 11 1
432 10.12.34.234 ra 11 2
342 10.12.34.234 anshul 12 1
445 10.12.34.234 yahoo 3 1
> awk '{a[$3]=a[$3]","$4;next}END{for(i in a)print i,substr(a[i],2)}' temp
anshul 11,12
ra 11
yahoo 3
> awk '{a[$3]=a[$3]","$4;next}END{for(i in a)print i,substr(a[i],2)}' temp|grep anshul
anshul 11,12
>
> awk '{a[$3]=a[$3]","$4;next}END{print "anshul",substr(a["anshul"],2)}' temp
anshul 11,12
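The commands above list a code once per log line, so a user who hit error 11 twice would get 11,11. If each code should appear only once, a seen array can filter out the repeats. This is a sketch with made-up sample lines; unique_errors is a hypothetical name, and the final sort is only there to make the order of users predictable.

```shell
# unique_errors: hypothetical helper, reads log lines on stdin
# (id, IP, login, error code, attempts) and prints "login: code,code,..."
# with each code listed once per login.
unique_errors() {
    awk '
        !seen[$3, $4]++ {                    # first time this login/code pair appears
            codes[$3] = (codes[$3] == "" ? $4 : codes[$3] "," $4)
        }
        END { for (user in codes) print user ": " codes[user] }
    ' | sort
}

# Made-up sample: anshul hits error 11 twice.
printf '1 10.0.0.1 anshul 11 1\n2 10.0.0.1 ra 11 2\n3 10.0.0.1 anshul 12 1\n4 10.0.0.1 anshul 11 3\n' |
    unique_errors
# prints:
# anshul: 11,12
# ra: 11
```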
