Combine results of column one, then sum column two to list a total for each entry in column one - linux

I am a bit of a Bash newbie, so please bear with me here.
I have a text file dumped by another piece of software (that I have no control over) listing each user with the number of times they accessed a certain resource. It looks like this:
Jim 109
Bob 94
John 92
Sean 91
Mark 85
Richard 84
Jim 79
Bob 70
John 67
Sean 62
Mark 59
Richard 58
Jim 57
Bob 55
John 49
Sean 48
Mark 46
.
.
.
My goal here is to get an output like this:
Jim [Total for Jim]
Bob [Total for Bob]
John [Total for John]
And so on.
The names change each time I run the query in the software, so a static search on each name piped through wc does not help.

This sounds like a job for awk :) Pipe the output of your program to the following awk script:
your_program | awk '{a[$1]+=$2}END{for(name in a)print name " " a[name]}'
Output:
Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208
The awk script itself can be explained better in this format:
# executed on each line
{
    # 'a' is an associative array; awk initializes it
    # as an empty array on its first use.
    # '$1' contains the first column - the name
    # '$2' contains the second column - the amount
    #
    # on every line the total score of 'name'
    # is incremented by 'amount'
    a[$1] += $2
}
# executed at the end of input
END {
    # print every name and its score
    for (name in a) print name " " a[name]
}
Note: to get the output sorted by score, you can add another pipe to sort -r -k2, which sorts by the second column in reverse order (for general data, add -n for a numeric rather than lexical comparison):
your_program | awk '{a[$1]+=$2}END{for(n in a)print n" "a[n]}' | sort -r -k2
Output:
Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
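If you happen to be using GNU awk, the sort step can also be done inside the script itself by setting the array traversal order. A minimal sketch, assuming gawk 4.0 or later (which supports PROCINFO["sorted_in"]):
your_program | gawk '{a[$1]+=$2} END{PROCINFO["sorted_in"]="@val_num_desc"; for(n in a) print n, a[n]}'
Here "@val_num_desc" makes the for-in loop visit the array entries by numeric value in descending order, so no external sort is needed.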

Pure Bash:
declare -A result        # an associative array
while read -r name value; do
    (( result[$name] += value ))
done < "$infile"
for name in "${!result[@]}"; do
    printf "%-10s%10d\n" "$name" "${result[$name]}"
done
If the first 'done' has no redirection from an input file, this script can be used with a pipe:
your_program | ./script.sh
and to sort the output:
your_program | ./script.sh | sort
The output:
Bob 219
Richard 142
Jim 245
Mark 190
John 208
Sean 201
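To get the totals in descending numeric order rather than alphabetically by name, the plain sort above can be given a key and numeric flags; this assumes a sort with the usual -k syntax, as in GNU coreutils:
your_program | ./script.sh | sort -k2,2nr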

GNU datamash can do the grouping and summing in a single step (-W splits fields on whitespace, -s sorts the input, -g1 groups by the first field, and 'sum 2' totals the second):
datamash -W -s -g1 sum 2 < input.txt
Output:
Bob 219
Jim 245
John 208
Mark 190
Richard 142
Sean 201

Related

How to delete lines in file1 based on column match with file2

I have 2 files; file1 and file2. File1 has many lines/rows and columns. File2 has just one column, with several lines/rows. All of the strings in file2 are found in file1. I want to create a new file (file3), such that the lines in file1 that contain any of the strings in file2 are deleted.
For example,
File1:
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
File2:
083
121
Desired file3:
Rick has 241 cars
John won 505 dollars
Note that I do not want to enter the strings in file 2 into a command manually (the actual files are much larger than in the example).
Thanks!
awk approach:
awk 'BEGIN{p=""}FNR==NR{if(!/^$/){p=p$0"|"} next} $0!~substr(p, 1, length(p)-1)' file2 file1 > file3
p="" - the variable treated as a pattern containing all the column values from file2
FNR==NR ensures that the next expression is performed only for the first input file, i.e. file2
if(!/^$/){p=p$0"|"} means: if the line is not empty (!/^$/, as it could be according to your input), append it to the pattern followed by |, so the pattern eventually looks like 083|121|
$0!~substr(p, 1, length(p)-1) checks whether a line from the second input file (file1) does not match the pattern (i.e. the file2 column values); the substr call strips the trailing |
The file3 contents:
Rick has 241 cars
John won 505 dollars
grep suits your purpose better than a line editor:
grep -v -f File2 File1 >File3
Try this -
#cat f1
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
#cat f2
083
121
#grep -vwf f2 f1
Rick has 241 cars
John won 505 dollars
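If the strings in file2 could themselves contain regex metacharacters (dots, brackets and so on), adding -F, a suggestion beyond the answers above, makes grep treat them as fixed strings rather than patterns:
grep -vwFf f2 f1 > file3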

bash Sort uniq list of numbers and strings

I would like to sort and merge a list in the following format
123 ABC
1 ABC
345 BGF
3 BGF
to
124 ABC
348 BGF
In bash, please. Thank you.
Using awk you can do this:
awk '{a[$2]+=$1} END{for (i in a) print a[i], i}' file
124 ABC
348 BGF
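The order in which awk's for (i in a) loop visits the array is unspecified, so if the result should come back sorted by the name column, a sort on the second field can be appended (a small addition to the answer above):
awk '{a[$2]+=$1} END{for (i in a) print a[i], i}' file | sort -k2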

reformatting report file using linux shell commands, combining multiple lines of output into one

I have a file that contains the following input:
name:ted
position:11.11.11.11
applicationKey:88
channel:45
protocol:4
total:350
name:janet
position:170.198.80.209
applicationKey:256
channel:44
protocol:4
total:1
I'd like the output to look like this:
ted 11.11.11.11 88 45 4 350
janet 170.198.80.209 256 44 4 1
Can someone help with this, please?
This should work; it splits each line on ':' and starts a new output row after each 'total' line:
awk -F':' '{printf "%s%s", $2, (/^total/ ? "\n" : " ")}' file
$ cat file
name:ted
position:11.11.11.11
applicationKey:88
channel:45
protocol:4
total:350
name:janet
position:170.198.80.209
applicationKey:256
channel:44
protocol:4
total:1
$ awk -F':' '{printf "%s%s", $2, (/^total/ ? "\n" : " ")}' file
ted 11.11.11.11 88 45 4 350
janet 170.198.80.209 256 44 4 1
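For this particular layout, a cut/paste pipeline is another option (a sketch that assumes every record consists of exactly six 'key:value' lines with no blank lines in between):
cut -d: -f2 file | paste -d' ' - - - - - -
cut strips everything up to the colon, and paste with six '-' operands joins every six lines into one space-separated row.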

Cut command not extracting fields properly with the default delimiter

I have a text file from which I must cut fields 3, 4, 5 and 8:
219 432 4567 Harrison Joel M 4540 Accountant 09-12-1985
219 433 4587 Mitchell Barbara C 4541 Admin Asst 12-14-1995
219 433 3589 Olson Timothy H 4544 Supervisor 06-30-1983
219 433 4591 Moore Sarah H 4500 Dept Manager 08-01-1978
219 431 4527 Polk John S 4520 Accountant 09-22-1998
219 432 4567 Harrison Joel M 4540 Accountant 09-12-1985
219 432 1557 Harrison James M 4544 Supervisor 01-07-2000
Since the default delimiter is tab, the command to extract the fields would be:
cut -f 3,4,5,8 filename
The thing is that the output is the same as the original file content. What is happening here? Why doesn't this work?
Your file doesn't actually contain any tab characters.
By default, cut prints any lines which don't contain the delimiter, unless you specify the -s option.
Since your records are aligned in fixed-width columns rather than separated by tabs, you should use the -c option to specify which character positions to cut. For example:
cut -c 9-12,14-25,43-57 file
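If the fields are in fact separated by runs of spaces rather than sitting at fixed character positions, squeezing the repeated spaces first, or simply reaching for awk, is a common alternative (a sketch, assuming the lines have no leading spaces):
tr -s ' ' < file | cut -d ' ' -f 3,4,5,8
awk '{print $3, $4, $5, $8}' file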

merging two files based on two columns

I have a question very similar to a previous post:
Merging two files by a single column in unix
but I want to merge my data based on two columns (the orders are the same, so there is no need to sort).
Example,
subjectid subID2 name age
12 121 Jane 16
24 241 Kristen 90
15 151 Clarke 78
23 231 Joann 31
subjectid subID2 prob_disease
12 121 0.009
24 241 0.738
15 151 0.392
23 231 1.2E-5
And the output to look like
subjectid subID2 prob_disease name age
12 121 0.009 Jane 16
24 241 0.738 Kristen 90
15 151 0.392 Clarke 78
23 231 1.2E-5 Joann 31
When I use join it only considers the first column (subjectid) and repeats the subID2 column.
Is there a way of doing this with join or some other way, please? Thank you.
The join command doesn't have an option to use more than one field as the joining key, so you will have to add some intelligence into the mix. Assuming your files have a FIXED number of fields on each line, you can use something like this:
join f1 f2 | awk '{print $1" "$2" "$3" "$4" "$6}'
provided that the field counts are as given in your examples. Otherwise, you need to adjust the scope of print in the awk command by adding or removing fields.
If the orders are identical, you could still merge by a single column and specify the format of which columns to output, like:
join -o '1.1 1.2 2.3 1.3 1.4' file_a file_b
as described in join(1).
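For completeness, awk alone can also join on both columns by using them together as the array key. A sketch that assumes the layout shown in the example (name/age in the first file, prob_disease in the second) and that every line in the second file has a match:
awk 'FNR==NR{rest[$1,$2]=$3" "$4; next} {print $1, $2, $3, rest[$1,$2]}' file_a file_b
The first block stores name and age keyed by the (subjectid, subID2) pair; the second block looks up that pair for each line of file_b and appends the stored columns.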
