add new column of a number - linux

I am writing the following code to extract data from an existing file using awk within a for loop.
for c in {1..300}
do
awk '{if($28==1) print $12,$26,$28}' file1.txt > file2.txt
done
This is ok and I have file2.txt
Now, I want to add a new column containing c generated from the for loop above.
I do this but it does not work.
awk '{if($28==1) print $12,$26,$28, paste c}' file1.txt > file2.txt
This only works when I replace c by a real number such as 1,2,...
Finally, I want to append file1.txt to file2.txt, i.e. every time the loop runs it will add new data to file2.txt.
I have this but it seems not the best:
awk <file2.txt>> file_final.txt
Can you please give me some advice? Thank you!
Phuong
Using the approach recommended by William below, I am able to produce the output that I want. Thank you very much!

rm file2.txt
for c in $(seq 1 300); do
awk '$28==1{print $12,$26,$28,c}' c=$c file1.txt >> file2.txt
done

You need to transfer the shell variable c to awk:
awk -v c="$c" '{if($28==1) print $12,$26,$28,c}' OFS=', ' file1.txt
-v c="$c" creates an awk variable, called c, which has the value of the shell variable $c.
Example
Using fake input data:
$ c=2; echo {1..27} 1 | awk -v c="$c" '{if($28==1) print $12,$26,$28,c}' OFS=', '
12, 26, 1, 2
Example in a loop
Let's start with this sample file:
$ cat file1.txt
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 1
Now, let's run the awk command in a loop and look at the output:
$ for c in {1..3}; do awk -v c="$c" '{if($28==1) print $12,$26,$28,c}' OFS=', ' file1.txt; done >file2.txt
$ cat file2.txt
12, 26, 1, 1
12, 26, 1, 2
12, 26, 1, 3
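The loop in the question's update passes the value as a trailing c=$c operand, while this answer uses -v. Both work inside the main rules, but they differ in timing: a -v assignment is made before BEGIN runs, whereas an operand assignment is made only when awk reaches it in the argument list. A minimal sketch of the difference, where the temporary file and the names with_v and as_operand are made up for illustration:

```shell
c=7
tmp=$(mktemp)               # hypothetical one-line input file
echo "one line" > "$tmp"

# -v: c is already set when BEGIN runs
with_v=$(awk -v c="$c" 'BEGIN { print "BEGIN c=" c }')

# operand: c is still empty in BEGIN and is set just before the file is read
as_operand=$(awk 'BEGIN { print "BEGIN c=" c } { print "main c=" c }' c="$c" "$tmp")

echo "$with_v"
echo "$as_operand"
rm -f "$tmp"
```

If the awk program never uses the variable in a BEGIN block, as in this thread, either form should behave the same.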

Related

How to remove lines based on another file? [duplicate]

Now I have two files as follows:
$ cat file1.txt
john 12 65 0
Nico 3 5 1
king 9 5 2
lee 9 15 0
$ cat file2.txt
Nico
king
Now I would like to remove every line whose first column contains a name from the second file.
Ideal result:
john 12 65 0
lee 9 15 0
Could anyone tell me how to do that? I have tried the code like this:
for i in 'less file2.txt'; do sed "/$i/d" file1.txt; done
But it does not work properly.
You don't need a loop. Use grep with -f to read the patterns from file2.txt, -v to invert the match, and -w to match whole words only. Note that this removes a line if a name appears anywhere in it, not only in the first column.
grep -wvf file2.txt file1.txt
This job suits awk:
awk 'NR == FNR {a[$1]; next} !($1 in a)' file2.txt file1.txt
john 12 65 0
lee 9 15 0
Details:
NR == FNR { # While processing the first file
a[$1] # store the first field in an array a
next # move to next line
}
!($1 in a) # while processing the second file
# if first field doesn't exist in array a then print
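The two answers differ when a name from file2.txt shows up outside the first column. The sketch below compares them on the question's data plus one extra line, 'lee king 15 0', which is a made-up row added only to expose the difference; the temporary directory is also an assumption of this example.

```shell
tmp=$(mktemp -d)
# Question's data, with a hypothetical row where "king" is in column 2
printf '%s\n' 'john 12 65 0' 'Nico 3 5 1' 'king 9 5 2' 'lee king 15 0' > "$tmp/file1.txt"
printf '%s\n' 'Nico' 'king' > "$tmp/file2.txt"

# grep drops every line containing a listed name as a whole word, in any column
grep_out=$(grep -wvf "$tmp/file2.txt" "$tmp/file1.txt")

# awk drops a line only when its first field is one of the listed names
awk_out=$(awk 'NR == FNR {a[$1]; next} !($1 in a)' "$tmp/file2.txt" "$tmp/file1.txt")

echo "grep: $grep_out"
echo "awk:  $awk_out"
rm -rf "$tmp"
```

Here grep keeps only the john line, while awk also keeps 'lee king 15 0', which matches the requirement of testing the first column only.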

Sum all the numbers in a file given by positional parameter

I want to sum all the numbers in a file (columns and lines) given by the first parameter, but my program shows sum=sum+$i instead of the numeric sum:
sum=0;
file=$1
for i in $file
do
sum=sum+$i;
done;
echo "The sum is: " $sum
Input file:
$ cat file.txt
10 20 10
40
50
Expected output:
The sum is: 130
Is there perhaps an awk method to solve this?
Try this -
$ cat file1.txt
10 20 10
40
50
$ awk '{for(i=1;i<=NF;i++) {sum+=$i}} END {print sum}' file1.txt
130
OR
$ xargs < file1.txt | tr ' ' + | bc
130
cat file.txt | xargs | sed -e 's/\ /+/g' | bc
You can also use a simple read loop and an array to sum the values, relying on word splitting with the default IFS (Internal Field Separator) to separate the values of each line into an array, e.g.
#!/bin/bash
declare -i sum=0
fn="${1:-/dev/stdin}" ## read from file as 1st argument (default stdin)
while read -r line; do ## read each line
a=( $line ) ## separate values into array
for i in "${a[@]}"; do ## for each value in array
((sum += i)) ## add to sum
done
done <"$fn"
echo "sum: $sum"
Example Input File
$ cat dat/numfile.txt
10 20 10
40
50
Example Use/Output
$ bash sumnumfile.sh dat/numfile.txt
sum: 130
Another for some awks (at least mawk and gawk):
$ awk -v RS="[^0-9]" '{s+=$1}END{print s}' file
130
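For reference, the script in the question fails for two reasons: for i in $file loops over the file name, not its contents, and sum=sum+$i is a plain string assignment, so nothing is ever added. A minimal sketch of a corrected version using arithmetic expansion follows; sum_file is a hypothetical helper name, and the sketch assumes the file contains only whitespace-separated integers.

```shell
#!/bin/bash
# sum_file is a hypothetical helper; it assumes the file holds only integers
sum_file() {
    local sum=0 n
    # $(cat "$1") expands to the file contents; word splitting yields one number per iteration
    for n in $(cat "$1"); do
        sum=$((sum + n))    # arithmetic expansion instead of a string assignment
    done
    echo "The sum is: $sum"
}

tmp=$(mktemp)               # stand-in for the file given as the first parameter
printf '10 20 10\n40\n50\n' > "$tmp"
result=$(sum_file "$tmp")
echo "$result"
rm -f "$tmp"
```

In a standalone script the last lines would simply become sum_file "$1".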

how to sort this in bash

Hello I have a file containing these lines:
apple
12
orange
4
rice
16
How can I use bash to sort it by the numbers?
Suppose each number is the price of the object above it.
I want them formatted like this:
12 apple
4 orange
16 rice
or
apple 12
orange 4
rice 16
Thanks
A solution using paste + sort to get each product sorted by its price:
$ paste - - < file|sort -k 2nr
rice 16
apple 12
orange 4
Explanation
From paste man:
Write lines consisting of the sequentially corresponding lines from
each FILE, separated by TABs, to standard output. With no FILE, or
when FILE is -, read standard input.
paste reads the stream from stdin (your < file) and treats each - as a separate input file that is read from that same stream, so with - - it takes two consecutive lines and joins them into two columns.
sort uses the key -k 2nr to sort paste's output by the second column in reverse numerical order.
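The two steps can also be run separately to see what each one contributes. This is a small sketch on the question's data; it uses paste -d ' ' to join with a space instead of a TAB and sorts in ascending order, both of which are choices of this example and not part of the answer above.

```shell
# Step 1: join every two consecutive lines into one, separated by a space
paired=$(printf '%s\n' apple 12 orange 4 rice 16 | paste -d ' ' - -)

# Step 2: sort numerically on the second field only (ascending price)
sorted=$(printf '%s\n' "$paired" | sort -k2,2n)

echo "$paired"
echo "---"
echo "$sorted"
```

Writing the key as -k2,2n limits the comparison to the second field; the answer's -k 2nr starts the key at field 2 and extends it to the end of the line, which gives the same order for this data.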
you can use awk:
awk '!(NR%2){printf "%s %s\n" ,$0 ,p}{p=$0}' inputfile
(slightly adapted from this answer)
If you want to sort the output afterwards, you can use sort (quite logically):
awk '!(NR%2){printf "%s %s\n" ,$0 ,p}{p=$0}' inputfile | sort -n
this would give:
4 orange
12 apple
16 rice
Another solution using awk
$ awk '/[0-9]+/{print prev, $0; next} {prev=$0}' input
apple 12
orange 4
rice 16
while read -r line1 && read -r line2;do
printf '%s %s\n' "$line1" "$line2"
done < input_file
If you want the lines sorted by price, pipe the result to sort -k2n (the n makes the comparison numeric, so 4 sorts before 12):
while read -r line1 && read -r line2;do
printf '%s %s\n' "$line1" "$line2"
done < input_file | sort -k2n
You can do this using paste and awk
$ paste - - <lines.txt | awk '{printf("%s %s\n",$2,$1)}'
12 apple
4 orange
16 rice
an awk-based solution without needing external paste / sort, using regex, calculating modulo % of anything, or awk/bash loops
{m,g}awk '(_*=--_) ? (__ = $!_)<__ : ($++NF = __)_' FS='\n'
12 apple
4 orange
16 rice

Difference between column sum in file 1 and column sum in file 2

I have this input
file 1 file 2
A 10 222 77.11 11
B 20 2222 1.215 22
C 30 22222 12.021 33
D 40 222222 145.00 44
The output I need is (11+22+33+44)- (10+20+30+40) = 110-100=10
thank you in advance for your help
Here you go:
paste file1.txt file2.txt | awk '{s1+=$5; s2+=$2} END {print s1-s2}'
Or better yet (borrowing the single-variable summation from @falsetru's answer):
paste file1.txt file2.txt | awk '{sum+=$5-$2} END {print sum}'
If you want to work with column N in file1.txt and column M in file2.txt (with N and M set as shell variables), this might be "easier", but less efficient:
paste <(awk -v n="$N" '{print $n}' file1.txt) <(awk -v m="$M" '{print $m}' file2.txt) | awk '{sum+=$2-$1} END {print sum}'
It's easier in the sense that you don't have to count the right position of the column in the second file, but less efficient because of the added extra awk sub-processes.
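The same per-file column choice can be made in a single awk process by reading the two files one after the other. The sketch below assumes the input splits into a three-column file1.txt and a two-column file2.txt, which is a guess consistent with the $2 and $5 used above; the temporary directory and the variable names are made up for illustration.

```shell
tmp=$(mktemp -d)
# Assumed split of the question's table into two files
printf '%s\n' 'A 10 222' 'B 20 2222' 'C 30 22222' 'D 40 222222' > "$tmp/file1.txt"
printf '%s\n' '77.11 11' '1.215 22' '12.021 33' '145.00 44' > "$tmp/file2.txt"

N=2   # column to sum in file1.txt
M=2   # column to sum in file2.txt

diff_sum=$(awk -v n="$N" -v m="$M" '
    NR == FNR { s1 += $n; next }   # first file: NR and FNR are still equal
    { s2 += $m }                   # second file
    END { print s2 - s1 }
' "$tmp/file1.txt" "$tmp/file2.txt")

echo "$diff_sum"
rm -rf "$tmp"
```

The NR == FNR test assumes file1.txt is not empty; with an empty first file the rows of file2.txt would be counted as the first file.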
Using awk:
$ cat test.txt
file 1 file 2
A 10 222 77.11 11
B 20 2222 1.215 22
C 30 22222 12.021 33
D 40 222222 145.00 44
$ tail -n +2 test.txt | awk '{s += $5 - $2} END {print s}'
10

select the second line to last line of a file

How can I select the lines from the second line to the line before the last line of a file by using head and tail in unix?
For example if my file has 15 lines I want to select lines from 2 to 14.
tail -n +2 /path/to/file | head -n -1
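A negative count for head -n is a GNU coreutils feature and may not be accepted by other head implementations. If portability matters, one possible alternative (not part of the answer above) is sed, which can delete the first and the last line in one pass. The sketch uses seq to generate a 15-line input, as in the question's example.

```shell
# 1d deletes the first line, $d deletes the last line
middle=$(seq 1 15 | sed '1d;$d')

echo "$middle"
```

This prints lines 2 through 14 of the generated input.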
perl -ne 'print if($.!=1 and !(eof))' your_file
tested below:
> cat temp
1
2
3
4
5
6
7
> perl -ne 'print if($.!=1 and !(eof))' temp
2
3
4
5
6
>
Alternatively, in awk you can use the following:
awk '{a[count++]=$0}END{for(i=1;i<count-1;i++) print a[i]}' your_file
To print all lines but first and last ones you can use this awk as well:
awk 'NR==1 {next} {if (f) print f; f=$0}'
This always prints the previous line. To prevent the first one from being printed, we skip the line when NR is 1. Then, the last one won't be printed because when reading it we are printing the penultimate!
Test
$ seq 10 | awk 'NR==1 {next} {if (f) print f; f=$0}'
2
3
4
5
6
7
8
9
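One caveat with the if (f) test: it treats an empty line, and a line containing only 0, as "nothing stored yet", so such lines in the middle of the file are likely to be dropped. A possible variant that keys on the line number instead is sketched below; the five-line input is made up to include both edge cases.

```shell
# Print the previous line from the third line onward, so line 1 and the last line never appear
out=$(printf '1\n0\n\n4\n5\n' | awk 'NR > 2 { print prev } { prev = $0 }')

printf '%s\n' "$out"
```

On this input the variant prints the three middle lines (0, an empty line, and 4).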
