How to append a column to the result set in a shell script - linux

I need a script for the below scenario. I am very new to shell scripting.
wc file1 file2
The above command produces the following output:
  40   149   947 file1
2294 16638 97724 file2
Now I need the 1st, 3rd and 4th columns of that output, plus a new column with default values:
40 947 file1 DF.tx1
2294 97724 file2 DF.rb2
Here the last column's values are always known: for file1 it is DF.tx1 and for file2 it is DF.rb2.
Whatever order the filenames are given in, the default values should not change.
Please help me write this script. Thanks in advance!

You can use awk:
wc file1 file2 |
awk '$4 != "total"{if ($4 ~ /file1/) f="DF.tx1"; else if ($4 ~ /file2/) f="DF.rb2";
     else if ($4 ~ /file3/) f="foo.bar"; print $1, $3, $4, f}'
Sample output (from the answerer's own test files, with a file3 entry showing how the mapping extends):
1 12 file1 DF.tx1
9 105 file2 DF.rb2
5 15 file3 foo.bar
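If the list of known defaults grows, a lookup table filled in a BEGIN block can be easier to maintain than an if/else chain. A minimal sketch, assuming the same two mappings as in the question (a filename without an entry simply gets an empty last column, and the keys must match $4 exactly, i.e. the names as passed to wc):
wc file1 file2 |
awk 'BEGIN { def["file1"] = "DF.tx1"; def["file2"] = "DF.rb2" }   # filename -> default value
     $4 != "total" { print $1, $3, $4, def[$4] }'                 # skip the total line wc prints for multiple files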

Related

AWK to filter to files if their columns match

I am basically working with two files (file1 and file2). The goal is to write a script that pulls rows from file1 if columns 1, 2 and 3 match between file1 and file2. Here's the code I have been playing with:
awk -F'|' 'NR==FNR{c[$1$2$3]++;next};c[$1$2$3] > 0' file1 file2 > filtered.txt
file1 and file2 both look like this (but have many more columns):
name1 0 c
name1 1 c
name1 2 x
name2 3 x
name2 4 c
name2 5 c
The awk code I provided isn't producing any output. Any help would be appreciated!
Your delimiter isn't a pipe; try this:
$ awk 'NR==FNR {c[$1,$2,$3]++; next} c[$1,$2,$3]' file1 file2 > filtered.txt
or
$ awk 'NR==FNR {c[$0]++; next} c[$0]' file1 file2 > filtered.txt
However, if you're matching the whole line, it is perhaps easier with grep (-F for fixed strings, -x to match whole lines, -f to read the patterns from file1):
$ grep -xFf file1 file2 > filtered.txt
Another take, reading file2 first so the matching rows are pulled from file1 (as the question asked):
awk '{key=$1 FS $2 FS $3} NR==FNR{file2[key];next} key in file2' file2 file1
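For readers new to the NR==FNR idiom both answers rely on, here is the same logic spelled out with comments (a sketch; pass the files as file2 file1, as in the last answer, if you want the matching rows pulled from file1 instead):
awk '
    NR == FNR {               # overall record number == per-file record number,
        c[$1,$2,$3]++         # so this is still the first file: remember its keys
        next
    }
    ($1,$2,$3) in c           # second file: print the row if its key was seen
' file1 file2 > filtered.txt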

Count duplicates from several files

I have five files which contain some duplicate strings.
file1:
a
file2:
b
file3:
a
b
file4:
b
file5:
c
So I used awk 'NR==FNR{A[$0];next}$0 in A' file1 file2 file3 file4 file5
and it prints a, but as you can see the string b is repeated three times across the other files, yet only a is printed.
So how do I get all the repeated strings (a and b) by comparing every file against the others in one command? Also, how do I get the number of repeats for each element?
I suggest with GNU sort and uniq:
sort file[1-5] | uniq -dc
Output:
2 a
3 b
From man uniq:
-d: only print duplicate lines
-c: prefix lines by the number of occurrences
You can use one of these:
awk '{count[$0]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file1 file2 file3 file4 file5
or
awk 'seen[$0]++ == 1' file1 file2 file3 file4 file5
If you want to filter for specific counts instead, e.g. print a only when it occurs exactly 3 times and b only when it occurs exactly 4 times:
awk '{count[$0]++} END {for (line in count) if ( count[line] == 3 && line == "a" || count[line] == 4 && line == "b" ) {print line} }' file1 file2 file3 file4 file5
Test:
$ awk '{count[$0]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file1 file2 file3 file4 file5
a
b
$ awk 'seen[$0]++ == 1' file1 file2 file3 file4 file5
a
b
$ awk '{count[$0]++} END {for (line in count) if ( count[line] == 2 && line == "a" || count[line] == 3 && line == "b" ) {print line, count[line]} }' file1 file2 file3 file4 file5
a 2
b 3
In awk:
$ awk '{ a[$1]++ } END { for(i in a) if(a[i]>1) print i,a[i] }' file[1-5]
a 2
b 3
It counts the occurrences of each record (a single character in this case) and prints out the ones with a count of more than one.
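One caveat shared by all of the above: a string that repeats within a single file is also counted as a duplicate. If you instead want to know how many files contain each string, de-duplicate per file first. A sketch (deleting a whole array is a gawk/common extension):
gawk 'FNR == 1    { delete seen }     # starting a new file: reset the per-file set
      !seen[$0]++ { cnt[$0]++ }       # count each string at most once per file
      END { for (s in cnt) if (cnt[s] > 1) print s, cnt[s] }' file1 file2 file3 file4 file5
On the sample files this still prints a 2 and b 3, but it would not be fooled by a file that contains the same string twice.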

How can I adjust the length of a column field in bash using awk or sed?

I have an input.csv file in which columns 2 and 3 have variable length:
100,Short Column, 199
200,Meeedium Column,1254
300,Loooooooooooong Column,35
I'm trying to use the following command to achieve clean tabulation, but I need to pad the 2nd column with blank spaces to get a fixed-length column (let's say a total length of 30 is enough).
awk -F, '{print $1 "\t" $2 "\t" $3;}' input.csv
My current output is misaligned because the tab stops don't line up:
100     Short Column     199
200     Meeedium Column 1254
300     Loooooooooooong Column  35
And I would like to achieve the following output, with the 2nd and 3rd columns padded properly:
100    Short Column                   199
200    Meeedium Column               1254
300    Loooooooooooong Column          35
Any good ideas out there on how awk or sed should be used for this?
Thanks, everybody.
Use printf in awk:
$ awk -F, '{gsub(/ /, "", $3); printf "%-5s %-25s%5s\n", $1, $2, $3}' input.csv
100   Short Column               199
200   Meeedium Column           1254
300   Loooooooooooong Column      35
What I have done above is set the field separator to , with -F. Since the file has whitespace in the 3rd column alone, which would mangle how printf renders the strings, I remove it with gsub and then format with a C-style printf.
Rather than picking some arbitrary number as the width of each field, do a 2-pass approach where the first pass calculates the max length of each field and the 2nd prints the fields in a width that size plus a couple of spaces between fields:
$ cat tst.awk
BEGIN { FS=" *, *"; OFS="  " }
NR==FNR {
    for (i=1; i<=NF; i++) {
        w[i] = (length($i) > w[i] ? length($i) : w[i])
        if ($i ~ /[^0-9]/) {
            a[i] = "-"
        }
    }
    next
}
{
    for (i=1; i<=NF; i++) {
        printf "%"a[i]w[i]"s%s", $i, (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file file
100  Short Column             199
200  Meeedium Column          1254
300  Loooooooooooong Column    35
The above also uses left-alignment for non-digit fields and right-alignment for all-digit fields. It'll work no matter how long the input fields are and no matter how many fields you have:
$ cat file1
100000,Short Column, 199,a
100,Now is the Winter of our discontent with fixed width fields,20000,b
100,Short Column, 199,c
200,Meeedium Column,1254,d
300,Loooooooooooong Column,35,e
$ awk -f tst.awk file1 file1
100000  Short Column                                                     199  a
   100  Now is the Winter of our discontent with fixed width fields   20000  b
   100  Short Column                                                     199  c
   200  Meeedium Column                                                 1254  d
   300  Loooooooooooong Column                                            35  e
Solution using perl:
$ perl -pe 's/([^,]+),([^,]+),([^,]+)/sprintf "%-6s%-30s%5s", $1,$2,$3/e' input.csv
100   Short Column                    199
200   Meeedium Column                1254
300   Loooooooooooong Column           35
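If you only need readable alignment and don't care about right-justifying the numbers, the column utility from util-linux computes the widths for you; here sed first strips the stray blanks around the commas (spacing below is approximate):
$ sed 's/ *, */,/g' input.csv | column -s, -t
100  Short Column            199
200  Meeedium Column         1254
300  Loooooooooooong Column  35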

Difference between column sum in file 1 and column sum in file 2

I have this input
file 1 file 2
A 10 222 77.11 11
B 20 2222 1.215 22
C 30 22222 12.021 33
D 40 222222 145.00 44
The output I need is (11+22+33+44) - (10+20+30+40) = 110 - 100 = 10.
Thank you in advance for your help!
Here you go:
paste file1.txt file2.txt | awk '{s1+=$5; s2+=$2} END {print s1-s2}'
Or better yet (borrowing @falsetru's clever idea of doing the summing with a single variable):
paste file1.txt file2.txt | awk '{sum+=$5-$2} END {print sum}'
If you want to work with column N in file1.txt and column M in file2.txt, this might be "easier", but less efficient:
paste <(awk '{print $N}' file1.txt) <(awk '{print $M}' file2.txt) | awk '{sum+=$2-$1} END {print sum}'
Here N and M stand for the column numbers (in a real script, pass them in with awk -v). It's easier in the sense that you don't have to count the right position of the column in the second file, but less efficient because of the extra awk sub-processes.
Using awk:
$ cat test.txt
file 1         file 2
A 10 222       77.11  11
B 20 2222      1.215  22
C 30 22222     12.021 33
D 40 222222    145.00 44
$ tail -n +2 test.txt | awk '{s += $5 - $2} END {print s}'
10
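If the two columns really do live in two separate files, as the paste answers assume, a single awk can read both files and do the sum without paste. A sketch, assuming each file starts with the one-line header shown above:
awk 'FNR == 1 { next }                # skip the header line of each file
     NR == FNR { sum -= $2; next }    # first file: subtract column 2
     { sum += $2 }                    # second file: add column 2
     END { print sum }' file1.txt file2.txt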

deleting lines from a text file with bash

I have two sets of text files. The first set is in the AA folder and the second set is in the BB folder. The content of the ff.txt file from the first set (AA folder) is shown below.
Name number marks
john 1 60
maria 2 54
samuel 3 62
ben 4 63
I would like to print the second column (number) from this file if marks > 60; the output would be 3 and 4. Next, read the ff.txt file in the BB folder and delete the lines containing those numbers, 3 and 4. How can I do this with bash?
The files in the BB folder look like this; the second column is the number.
marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
awk 'FNR == NR { if ($3 > 60) skip[$2]; next } !($2 in skip)' AA/ff.txt BB/filename
(The first-file test must stand alone: writing FNR == NR && $3 > 60 would let the AA lines with marks <= 60 fall through into the printing block and leak into the output.)
This works, but it is not efficient (does that matter?):
gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read number; do gawk -v number=$number '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done
Of course, the second awk can be replaced with sed -i
For multiple files:
for file in AA/*.txt
do
    bn=$(basename "$file")
    gawk 'BEGIN {getline} $3>60 {print $2}' "AA/$bn" | while read -r number
    do
        gawk -v number="$number" '$2 != number' "BB/$bn" > "/tmp/$bn"
        mv "/tmp/$bn" "BB/$bn"
    done
done
I didn't test it, so if there is a problem, please comment.
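A sketch combining the two answers: one awk pass per pair of files and no inner loop, assuming AA and BB hold identically named .txt files:
for f in AA/*.txt
do
    bn=$(basename "$f")
    awk 'FNR == NR { if ($3 > 60) skip[$2]; next }   # AA file: collect numbers with marks > 60
         !($2 in skip)                               # BB file: keep lines whose number was not collected
        ' "AA/$bn" "BB/$bn" > "/tmp/$bn" && mv "/tmp/$bn" "BB/$bn"
done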
