deleting lines from a text file with bash - linux

I have two sets of text files. The first set is in the AA folder, the second set is in the BB folder. The content of the ff.txt file from the first set (AA folder) is shown below.
Name number marks
john 1 60
maria 2 54
samuel 3 62
ben 4 63
I would like to print the second column (number) from this file if marks > 60; the output would be 3 and 4. Next, read the ff.txt file in the BB folder and delete the lines containing the numbers 3 and 4. How can I do this with bash?
Files in the BB folder look like this; the second column is the number.
marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18

awk 'FNR == NR {if ($3 > 60) array[$2] = 1; next} {if ($2 in array) next; print}' AA/ff.txt BB/filename

This works, but it is not efficient (does that matter?):
gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read -r number; do gawk -v number="$number" '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done
Of course, the second awk could be replaced with sed -i.
For multi files:
for file in AA/*.txt
do
bn=$(basename "$file")
gawk 'BEGIN {getline} $3>60{print $2}' "AA/$bn" | while read -r number
do
gawk -v number="$number" '$2 != number' "BB/$bn" > "/tmp/$bn"
mv "/tmp/$bn" "BB/$bn"
done
done
I didn't test it, so if there is a problem, please comment.
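Along the same lines, here is an untested sketch that does one single-pass awk per file pair instead of re-running awk once per matched number. The sample data is copied from the question; the AA/BB layout under a temporary directory is an assumption made so the sketch is self-contained.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/AA" "$tmp/BB"
# sample data from the question
printf 'Name number marks\njohn 1 60\nmaria 2 54\nsamuel 3 62\nben 4 63\n' > "$tmp/AA/ff.txt"
printf 'marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
' > "$tmp/BB/ff.txt"

for file in "$tmp"/AA/*.txt
do
    bn=$(basename "$file")
    # first file: remember the numbers whose marks exceed 60;
    # second file: print only lines whose number was not remembered
    awk 'FNR == NR {if ($3 > 60) del[$2] = 1; next}
         !($2 in del)' "$tmp/AA/$bn" "$tmp/BB/$bn" > "$tmp/new_$bn"
    mv "$tmp/new_$bn" "$tmp/BB/$bn"
done

result=$(cat "$tmp/BB/ff.txt")
echo "$result"    # only the two "marks 1" lines remain
rm -r "$tmp"
```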

Related

Sum of two maximum patterns in linux file

I am a newbie in Linux and need help with a command.
I have a file with the following values:
2-1
2-10
2-11
2-12
2-2
2-3
1-1
1-10
1-11
1-2
1-3
1-9
The needed output is 23: the sum of the maxima from the 1- and 2- patterns, i.e. 11 from 1-11 and 12 from 2-12.
awk -F"-" 'BEGIN{a=0; b=0;} {if(int($1)==1 && int($2)>a){a=int($2)}; if(int($1)==2 && int($2)>b){b=int($2)}}END{print a+b}' file
output:
23
Another awk, using the ternary operator:
awk -v FS='-' '{m1=($1==1?(m1>$2?m1:$2):m1);m2=($1==2?(m2>$2?m2:$2):m2)}END{print m1+m2}' file
sort + awk pipeline:
sort -t- -k2 -n file | awk -F'-' '{a[$1]=$2}END{ print a[1]+a[2] }'
The output:
23
$ awk -F'-' '{max[$1] = ($2 > max[$1] ? $2 : max[$1])} END{for (key in max) sum+=max[key]; print sum}' file
23
$ awk -F- 'a[$1]<$2{a[$1]=$2}END{for(i in a)s+=a[i]; print s}' infile
23
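As a quick sanity check, any of the one-liners above can be tried on the sample input rebuilt inline with a here-document; the /tmp path is just an illustrative choice:

```shell
#!/bin/sh
# rebuild the sample file from the question
cat > /tmp/prefix_max.txt <<'EOF'
2-1
2-10
2-11
2-12
2-2
2-3
1-1
1-10
1-11
1-2
1-3
1-9
EOF
# track the maximum per prefix, then sum the maxima
sum=$(awk -F'-' '{max[$1] = ($2 > max[$1] ? $2 : max[$1])}
                 END {for (key in max) s += max[key]; print s}' /tmp/prefix_max.txt)
echo "$sum"    # 11 + 12 = 23
```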

compare if two columns of a file are identical in linux

I would like to check whether two columns (mid) in a file are identical to each other. I am not sure how to do it, since the original file I am working on is rather huge (gigabytes).
File 1 (columns 1 and 4 are to be checked for identity):
mid A1 A2 mid A3 A4 A5 A6
18 we gf 18 32 23 45 89
19 ew fg 19 33 24 46 90
21 ew fg 21 35 26 48 92
Thanks
M
If you just need to find the differing rows, awk will do:
awk '$1!=$4{print $1,$4}' data
You can use diff together with awk for a more advanced comparison:
diff <(awk '{print $1}' data) <(awk '{print $4}' data)
The exit status ($?) of this command will tell you whether they are the same (zero) or different (non-zero).
You can use that in a shell conditional like this, too:
if diff <(awk '{print $1}' data) <(awk '{print $4}' data) >& /dev/null;
then
echo same;
else
echo different;
fi
Something like this:
awk '{ if ($1 == $4) { print "same"; } else { print "different"; } }' < foo.txt
To extend Shiplu Mokaddim's answer a little: if the file uses another delimiter, for example in a CSV file, you can use:
awk -F';' '$1!=$4{print $1,$4}' data.csv | sed -r 's/ /;/g'
In this sample, the delimiter is ";". The sed command at the end replaces the delimiter back to the original one. Be sure that you don't have other spaces in your fields, e.g. in date-time values.
Question: Compare two columns value in the same file.
Answer:
cut -d, -f1 a.txt > b.txt ; cut -d, -f3 a.txt > c.txt ; cmp b.txt c.txt && echo "Column values are same"; rm -f b.txt c.txt
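Since the file is gigabytes, it may also be worth letting awk stop at the first mismatch instead of always scanning everything. A sketch, using the question's sample data written to an illustrative /tmp path:

```shell
#!/bin/sh
# sample data from the question
printf 'mid A1 A2 mid A3 A4 A5 A6
18 we gf 18 32 23 45 89
19 ew fg 19 33 24 46 90
21 ew fg 21 35 26 48 92
' > /tmp/data.txt
# awk exits with status 1 at the first row where columns 1 and 4 differ,
# so identical columns cost a full scan but a difference is found early
result=$(awk '$1 != $4 {exit 1}' /tmp/data.txt && echo same || echo different)
echo "$result"    # same
```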

Difference between column sum in file 1 and column sum in file 2

I have this input
file 1 file 2
A 10 222 77.11 11
B 20 2222 1.215 22
C 30 22222 12.021 33
D 40 222222 145.00 44
The output I need is (11+22+33+44) - (10+20+30+40) = 110 - 100 = 10.
Thank you in advance for your help.
Here you go:
paste file1.txt file2.txt | awk '{s1+=$5; s2+=$2} END {print s1-s2}'
Or better yet (it was clever of falsetru's answer to do the summing with a single variable):
paste file1.txt file2.txt | awk '{sum+=$5-$2} END {print sum}'
If you want to work with column N in file1 and column M in file2, this might be "easier", but less efficient:
paste <(awk '{print $N}' file1.txt) <(awk '{print $M}' file2.txt) | awk '{sum+=$2-$1} END {print sum}'
It's easier in the sense that you don't have to count the right position of the column in the second file, but less efficient because of the added extra awk sub-processes.
Using awk:
$ cat test.txt
file 1 file 2
A 10 222 77.11 11
B 20 2222 1.215 22
C 30 22222 12.021 33
D 40 222222 145.00 44
$ tail -n +2 test.txt | awk '{s += $5 - $2} END {print s}'
10
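A paste-free variant is also possible: awk can read both files itself, subtracting while in the first file and adding in the second. Note that in file2 on its own the value column is $3, not the $5 of the pasted layout; the inline sample data mirrors the question, and the /tmp filenames are illustrative.

```shell
#!/bin/sh
printf 'A 10\nB 20\nC 30\nD 40\n' > /tmp/file1.txt
printf '222 77.11 11\n2222 1.215 22\n22222 12.021 33\n222222 145.00 44\n' > /tmp/file2.txt
# FNR == NR is true only while reading the first file
diffsum=$(awk 'FNR == NR {sum -= $2; next} {sum += $3}
               END {print sum}' /tmp/file1.txt /tmp/file2.txt)
echo "$diffsum"    # (11+22+33+44) - (10+20+30+40) = 10
```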

linux command to get the last appearance of a string in a text file

I want to find the last appearance of a string in a text file with linux commands. For example
1 a 1
2 a 2
3 a 3
1 b 1
2 b 2
3 b 3
1 c 1
2 c 2
3 c 3
In such a text file, I want to find the line number of the last appearance of b, which is 6.
I can find the first appearance with
awk '/ b / {print NR;exit}' textFile.txt
but I have no idea how to do it for the last occurrence.
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
cat -n prints the file to STDOUT prepending line numbers.
grep greps out all lines containing "b" (you can use egrep for more advanced patterns or fgrep for faster grep of fixed strings)
tail -1 prints last line of those lines containing "b"
cut -f 1 prints first column, which is line # from cat -n
Or you can use Perl if you wish. It's very similar to what you'd do in awk, but frankly I never use awk when Perl is handy; Perl supports 100% of what awk can do, by design, as one-liners. YMMV:
perl -ne '{$n=$. if / b /} END {print "$n\n"}' textfile.txt
This can work:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
We check whether the second field contains "b" and record the line number. The variable is overwritten each time, so by the time we finish reading the file it holds the last occurrence.
Test:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
6
Update based on sudo_O's advice:
$ awk '{if ($2=="b") a=NR} END{print a}' your_file
to avoid matching a second field like abc that merely contains b.
This one is also valid (shorter; I keep the one above because it is the one I first thought of :D):
$ awk '$2=="b" {a=NR} END{print a}' your_file
Another approach, if $2 is always grouped (it may be more efficient than waiting until the end):
awk 'NR==1||$2=="b",$2=="b"{next} {print NR-1; exit}' file
or
awk '$2=="b"{f=1} f==1 && $2!="b" {print NR-1; exit}' file
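For what it's worth, the cat -n | grep | tail | cut pipeline above can be shortened, since grep -n prints line numbers itself. A sketch on the sample data (the /tmp filename is illustrative):

```shell
#!/bin/sh
printf '1 a 1\n2 a 2\n3 a 3\n1 b 1\n2 b 2\n3 b 3\n1 c 1\n2 c 2\n3 c 3\n' > /tmp/lines.txt
# grep -n prefixes each match with "NUM:", tail keeps the last match,
# cut extracts the number before the colon
last=$(grep -n ' b ' /tmp/lines.txt | tail -1 | cut -d: -f1)
echo "$last"    # 6
```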

select the second line to last line of a file

How can I select the lines from the second line to the line before the last line of a file by using head and tail in unix?
For example if my file has 15 lines I want to select lines from 2 to 14.
tail -n +2 /path/to/file | head -n -1
perl -ne 'print if($.!=1 and !(eof))' your_file
tested below:
> cat temp
1
2
3
4
5
6
7
> perl -ne 'print if($.!=1 and !(eof))' temp
2
3
4
5
6
>
Alternatively, in awk you can use:
awk '{a[count++]=$0}END{for(i=1;i<count-1;i++) print a[i]}' your_file
To print all lines but first and last ones you can use this awk as well:
awk 'NR==1 {next} {if (f) print f; f=$0}'
This always prints the previous line. To prevent the first one from being printed, we skip the line when NR is 1. Then, the last one won't be printed because when reading it we are printing the penultimate!
Test
$ seq 10 | awk 'NR==1 {next} {if (f) print f; f=$0}'
2
3
4
5
6
7
8
9
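For completeness, sed can do the same trim in a single command, though it was not among the tools asked about; a sketch:

```shell
#!/bin/sh
seq 7 > /tmp/seven.txt
# 1d deletes the first line, $d deletes the last; everything else passes through
middle=$(sed '1d;$d' /tmp/seven.txt)
echo "$middle"    # 2 through 6
```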
