Using a pipe to input in an awk statement - linux

So I'm dealing with a file named cars; here are its contents:
toyota corolla 1970 2500
chevy malibu 1999 3000
ford mustang 1965 10000
volvo s80 1998 9850
ford thundbd 2003 10500
chevy malibu 2000 3500
honda civic 1985 450
honda accord 2001 6000
ford taurus 2004 17000
toyota rav4 2002 750
chevy impala 1985 1550
ford explor 2003 9500
I'm using grep to filter for lines containing a specific automaker, piping that to my awk statement, and finally writing the result to a file with tee.
Here's the line of code I'm having trouble with:
grep "$model" cars |
awk '($3+0) >= ("'$max_year'"+0) && ($4+0) <= ("'$max_price'"+0)' |
tee last_search
I previously defined the variables max_year and max_price from user input earlier in my script.
The file last_search is created, but it's always empty.

You almost certainly have something wrong with your variables; print them out, and build the pipeline up one command at a time to debug it.
As it stands, it works fine for the following values:
$ max_year=2000
$ max_price=10000
$ model=a
$ grep "$model" cars
toyota corolla 1970 2500
chevy malibu 1999 3000
ford mustang 1965 10000
chevy malibu 2000 3500
honda civic 1985 450
honda accord 2001 6000
ford taurus 2004 17000
toyota rav4 2002 750
chevy impala 1985 1550
$ grep "$model" cars | awk '($3+0) >= ("'$max_year'"+0) && ($4+0) <= ("'$max_price'"+0)'
chevy malibu 2000 3500
honda accord 2001 6000
toyota rav4 2002 750
There are also better ways of doing this that avoid splicing shell variables into the awk program string, which is error-prone. You can use:
grep "$model" cars |
awk -v Y="$max_year" -v P="$max_price" '$3 >= Y && $4 <= P'
(You'll note I'm not using the +0 trick there: GNU awk, which you're almost certainly using under Linux, handles this fine and compares numerically when both operands are numeric in nature.)
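For instance, the whole pipeline with -v variable passing might look like this (a sketch assuming the cars file and the variable names from the question):

```shell
# Sketch of the full pipeline using -v to pass shell variables into awk.
# The "cars" data file is recreated here so the example is self-contained.
cat > cars <<'EOF'
toyota corolla 1970 2500
chevy malibu 1999 3000
ford mustang 1965 10000
volvo s80 1998 9850
ford thundbd 2003 10500
chevy malibu 2000 3500
honda civic 1985 450
honda accord 2001 6000
ford taurus 2004 17000
toyota rav4 2002 750
chevy impala 1985 1550
ford explor 2003 9500
EOF

model=a
max_year=2000
max_price=10000

grep "$model" cars |
awk -v Y="$max_year" -v P="$max_price" '$3 >= Y && $4 <= P' |
tee last_search
```

With the values above, this writes the same three lines shown earlier to both stdout and last_search.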

Another option: export the variables (set -a marks every subsequent assignment for export) and read them in awk via ENVIRON, which avoids the quoting problem entirely:
set -a
model=malibu
max_year=2000
max_price=4000
awk '
$2 == ENVIRON["model"] &&
$3 >= ENVIRON["max_year"] &&
$4 <= ENVIRON["max_price"]
' cars |
tee last_search
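Run standalone against the question's data, this prints just the one malibu row in range. A self-contained sketch (the +0 coercions are added here so the ENVIRON strings compare numerically in any awk, since ENVIRON values are plain strings):

```shell
# Self-contained version of the ENVIRON approach.
cat > cars <<'EOF'
toyota corolla 1970 2500
chevy malibu 1999 3000
ford mustang 1965 10000
volvo s80 1998 9850
ford thundbd 2003 10500
chevy malibu 2000 3500
honda civic 1985 450
honda accord 2001 6000
ford taurus 2004 17000
toyota rav4 2002 750
chevy impala 1985 1550
ford explor 2003 9500
EOF

export model=malibu max_year=2000 max_price=4000

# +0 forces numeric comparison of the string-valued ENVIRON entries.
awk '
$2 == ENVIRON["model"] &&
$3+0 >= ENVIRON["max_year"]+0 &&
$4+0 <= ENVIRON["max_price"]+0
' cars
```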

Related

Trying to sort two different columns of a text file, (one asc, one desc) in the same awk script

I have tried doing each sort separately and I get the right result, but I need help combining the two.
This is the file:
maruti swift 2007 50000 5
honda city 2005 60000 3
maruti dezire 2009 3100 6
chevy beat 2005 33000 2
honda city 2010 33000 6
chevy tavera 1999 10000 4
toyota corolla 1995 95000 2
maruti swift 2009 4100 5
maruti esteem 1997 98000 1
ford ikon 1995 80000 1
honda accord 2000 60000 2
fiat punto 2007 45000 3
I am using this script to sort by first field:
BEGIN { print "========Sorted Cars by Maker========" }
{ arr[$1]=$0 }
END {
  PROCINFO["sorted_in"]="#val_str_desc"
  for (i in arr) print arr[i]
}
I also want to run a sort on the year ($3), ascending, in the same script.
I have tried many ways but to no avail.
A little help to do that would be appreciated.
One in GNU awk:
$ gawk '
{
  a[$1][$3][++c[$1,$3]]=$0
}
END {
  PROCINFO["sorted_in"]="#ind_str_desc"
  for (i in a) {
    PROCINFO["sorted_in"]="#ind_str_asc"
    for (j in a[i]) {
      PROCINFO["sorted_in"]="#ind_num_asc"
      for (k in a[i][j])
        print a[i][j][k]
    }
  }
}' file
Output:
toyota corolla 1995 95000 2
maruti esteem 1997 98000 1
maruti swift 2007 50000 5
...
Assumptions:
individual fields do not contain white space
primary sort: 1st field in descending order
secondary sort: 3rd field in ascending order
no additional sorting requirements provided in case there's a duplicate of 1st + 3rd fields (eg, maruti + 2009) so we'll maintain the input ordering
One idea using sort:
sort -k1,1r -k3,3n auto.dat
Another idea using GNU awk (for arrays of arrays and PROCINFO["sorted_in"]):
awk '
{ cars[$1][$3][n++]=$0 }        # "n" used to distinguish between duplicates of $1+$3
END {
  PROCINFO["sorted_in"]="#ind_str_desc"
  for (make in cars) {
    PROCINFO["sorted_in"]="#ind_num_asc"
    for (yr in cars[make])
      for (n in cars[make][yr])
        print cars[make][yr][n]
  }
}
' auto.dat
Both of these generate:
toyota corolla 1995 95000 2
maruti esteem 1997 98000 1
maruti swift 2007 50000 5
maruti dezire 2009 3100 6
maruti swift 2009 4100 5
honda accord 2000 60000 2
honda city 2005 60000 3
honda city 2010 33000 6
ford ikon 1995 80000 1
fiat punto 2007 45000 3
chevy tavera 1999 10000 4
chevy beat 2005 33000 2
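The sort(1) flags are worth spelling out: -k1,1r restricts the first key to field 1 and reverses it, while -k3,3n compares field 3 numerically. A quick self-contained check, assuming the auto.dat name from the answer:

```shell
# Verify the two-key sort (field 1 descending, field 3 ascending numeric).
cat > auto.dat <<'EOF'
maruti swift 2007 50000 5
honda city 2005 60000 3
maruti dezire 2009 3100 6
chevy beat 2005 33000 2
honda city 2010 33000 6
chevy tavera 1999 10000 4
toyota corolla 1995 95000 2
maruti swift 2009 4100 5
maruti esteem 1997 98000 1
ford ikon 1995 80000 1
honda accord 2000 60000 2
fiat punto 2007 45000 3
EOF

sort -k1,1r -k3,3n auto.dat
```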

Printing columns and their headers using awk

I am trying to set titles for the columns in my table, using the code below:
$ awk 'BEGIN {print "Name\tDescription\tType\tPrice";}
> {print $1,"\t",$2,"\t",$3,"\t",$4,"\t",$NF;}
> END{print "Report Generated\n--------------";
> }' toys | column -s '/' -t
I have also tried this code, but I don't understand the syntax or what it does:
$ awk 'BEGIN {print "Name\tDescription\tType\tPrice";}
> {printf "%-24s %s\n", $1,$2,$3,$NF;}
> }' toys
Here are the data I have and the result I want.
The table contains:
Mini car Kids under 5 Wireless 2000
Barbie Kids 6 - 12 Electronic 3000
Horse Kids 6 -8 Electronic 4000
Gitar 12 -16 ELectronic 45000
When I run the above command it gives me this output:
Name Description Type Price
Mini car Kids under 5 Wireless 2000
Barbie Kids 6 - 12 Electronic 3000
Horse Kids 6 -8 Electronic 4000
Gitar 12 -16 ELectronic 45000
I want help to print them like this:
Name      Description   Type        Price
Mini car  Kids under 5  Wireless    2000
Barbie    Kids 6 - 12   Electronic  3000
Horse     Kids 6 -8     Electronic  4000
Gitar     12 -16        ELectronic  45000
You need a formatting operator for each value being printed. Since you're printing 4 columns, you need 4 formatting operators, each specifying the width of its column.
$ awk 'BEGIN {printf("%-10s %-15s %-10s %s\n", "Name", "Description", "Type", "Price")}
> {printf("%-10s %-15s %-10s %d\n", $1, $2, $3, $4)}
> ' toys
`%s` is for printing strings, `%d` is for printing integers.
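A tiny standalone illustration of the width specifiers (data inlined here rather than read from toys, because fields like "Mini car" contain spaces and would split across $1/$2):

```shell
# %-10s left-justifies a string in a 10-character column;
# the final %s / %d column is left unpadded.
printf '%-10s %-15s %-10s %s\n' Name Description Type Price
printf '%-10s %-15s %-10s %d\n' Barbie 'Kids 6 - 12' Electronic 3000
printf '%-10s %-15s %-10s %d\n' Horse 'Kids 6 -8' Electronic 4000
```

Every row gets the same column widths, so the values line up regardless of how long each string is.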

how to Aggregate files or merge

Can anyone help me merge different files on a common column? please =(
file1.txt
ID Kg Year
3454 1000 2010
3454 1200 2011
3323 1150 2009
2332 1000 2011
3454 1156 201
file2.txt
ID Place
3454 A1
3323 A2
2332 A6
5555 A9
file 1+2
ID Kg Year Place
3454 1000 2010 A1
3454 1200 2011 A1
3323 1150 2009 A2
2332 1000 2011 A6
3454 1156 2013 A1
So the second file should be joined to the first. As you can see, ID 5555 from file2 is simply not used.
How can I do this in Linux?
If you start with sorted files, the tool is join. In your case, you can sort on the fly.
join <(sort file1.txt) <(sort file2.txt)
The header lines will be joined as well, but the result won't appear on top; pipe to sort -r to move it back up.
If you don't care about maintaining the order of lines, use karakfa's join command.
To keep the original order of lines, use awk
awk '
NR==FNR {place[$1]=$2; next}
$1 in place {print $0, place[$1]}
' file2.txt file1.txt | column -t
ID Kg Year Place
3454 1000 2010 A1
3454 1200 2011 A1
3323 1150 2009 A2
2332 1000 2011 A6
3454 1156 201 A1
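The NR==FNR test is the key idiom here: NR counts records across all inputs while FNR resets for each file, so the condition is true only while the first file (file2.txt, the lookup table) is being read. Reproducing it end to end on the question's data:

```shell
# Build the place[] lookup from file2.txt, then annotate file1.txt.
cat > file2.txt <<'EOF'
ID Place
3454 A1
3323 A2
2332 A6
5555 A9
EOF
cat > file1.txt <<'EOF'
ID Kg Year
3454 1000 2010
3454 1200 2011
3323 1150 2009
2332 1000 2011
3454 1156 201
EOF

awk '
NR==FNR { place[$1]=$2; next }        # first file: remember ID -> Place
$1 in place { print $0, place[$1] }   # second file: append Place if known
' file2.txt file1.txt
```

ID 5555 never appears in file1.txt, so it contributes nothing to the output; the header line joins too because "ID" maps to "Place".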

Cut command not extracting fields properly by default delimiter

I have a text file from which I must cut fields 3, 4, 5 and 8:
219 432 4567 Harrison Joel M 4540 Accountant 09-12-1985
219 433 4587 Mitchell Barbara C 4541 Admin Asst 12-14-1995
219 433 3589 Olson Timothy H 4544 Supervisor 06-30-1983
219 433 4591 Moore Sarah H 4500 Dept Manager 08-01-1978
219 431 4527 Polk John S 4520 Accountant 09-22-1998
219 432 4567 Harrison Joel M 4540 Accountant 09-12-1985
219 432 1557 Harrison James M 4544 Supervisor 01-07-2000
Since the default delimiter is tab, the command to extract the fields would be:
cut -f 3,4,5,8 filename
The thing is that the output is the same as the original file content. What is happening here? Why doesn't this work?
Your file doesn't actually contain any tab characters.
By default, cut prints any lines which don't contain the delimiter, unless you specify the -s option.
Since your fields are aligned at fixed character positions rather than tab-separated, use the -c option to select character columns instead. For example:
cut -c 9-12,14-25,43-57 file
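Both behaviours are easy to demonstrate on one-liners (throwaway data here, not the file above):

```shell
# Without a tab, cut -f passes the line through untouched;
# -s suppresses such delimiter-less lines entirely.
printf 'no tabs here\n' | cut -f 2       # prints: no tabs here
printf 'no tabs here\n' | cut -s -f 2    # prints nothing

# -c selects by character position instead of delimited fields:
printf '219 432 4567 Harrison\n' | cut -c 9-12   # prints: 4567
```

Note that awk's default whitespace splitting is no help for this file: "Admin Asst" and "Dept Manager" are two words each, so a field like $8 would shift between lines, while character positions stay fixed.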

Combine results of column one Then sum column 2 to list total for each entry in column one

I am a bit of a Bash newbie, so please bear with me here.
I have a text file, dumped by another piece of software (that I have no control over), listing each user with the number of times they accessed a certain resource. It looks like this:
Jim 109
Bob 94
John 92
Sean 91
Mark 85
Richard 84
Jim 79
Bob 70
John 67
Sean 62
Mark 59
Richard 58
Jim 57
Bob 55
John 49
Sean 48
Mark 46
.
.
.
My goal here is to get an output like this.
Jim [Total for Jim]
Bob [Total for Bob]
John [Total for John]
And so on.
Names change each time I run the query in the software, so searching for each name statically and piping through wc does not help.
This sounds like a job for awk :) Pipe the output of your program to the following awk script:
your_program | awk '{a[$1]+=$2}END{for(name in a)print name " " a[name]}'
Output:
Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208
The awk script itself can be explained better in this format:
# executed on each line
{
# 'a' is an array. It will be initialized
# as an empty array by awk on its first use
# '$1' contains the first column - the name
# '$2' contains the second column - the amount
#
# on every line the total score of 'name'
# will be incremented by 'amount'
a[$1]+=$2
}
# executed at the end of input
END{
# print every name and its score
for(name in a)print name " " a[name]
}
Note: to get the output sorted by score, add another pipe to sort -rn -k2. The -n makes the comparison on the second column numeric (without it the sort is lexical), and -r reverses it:
your_program | awk '{a[$1]+=$2}END{for(n in a)print n" "a[n]}' | sort -rn -k2
Output:
Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
Pure Bash:
declare -A result # an associative array
while read name value; do
((result[$name]+=value))
done < "$infile"
for name in ${!result[*]}; do
printf "%-10s%10d\n" $name ${result[$name]}
done
If you remove the input redirection after the first 'done', the script reads standard input and can be used with a pipe:
your_program | ./script.sh
and you can sort the output:
your_program | ./script.sh | sort
The output:
Bob 219
Richard 142
Jim 245
Mark 190
John 208
Sean 201
GNU datamash:
datamash -W -s -g1 sum 2 < input.txt
Output:
Bob 219
Jim 245
John 208
Mark 190
Richard 142
Sean 201
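For reference, -W splits fields on runs of whitespace, -s sorts the input first (grouping needs sorted input), and -g1 groups by the first column; sum 2 totals the second. The same grouping logic can be cross-checked with the awk one-liner from above on a tiny input:

```shell
# awk equivalent of "group by column 1, sum column 2", sorted by name.
printf 'Jim 2\nBob 3\nJim 5\n' |
awk '{a[$1]+=$2} END{for(n in a) print n, a[n]}' |
sort
```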
