write a two column file from two files using awk - linux

I have two files of one column each
1
2
3
and
4
5
6
I want to write a unique file with both elements as
1 4
2 5
3 6
It should be really simple I think with awk.

You could try paste -d ' ' <file1> <file2>. (Without -d ' ' the delimiter would be tab.)
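A quick sanity check of that command, assuming the two inputs are named file1 and file2 (names taken from the question):

```shell
# Recreate the sample inputs from the question.
printf '1\n2\n3\n' > file1
printf '4\n5\n6\n' > file2

paste -d ' ' file1 file2
# 1 4
# 2 5
# 3 6
```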

paste works okay for the example given, but it doesn't handle variable-length lines very well. A nice little-known coreutils tool, pr, provides a more flexible solution:
$ pr -mtw 4 file1 file2
1 4
2 5
3 6
A variable length example:
$ pr -mtw 22 file1 file2
10 4
200 5
300,000,00 6
And since you asked about awk, here is one way (note that length(a) on an array is a GNU awk extension):
$ awk '{a[FNR]=a[FNR]$0" "}END{for(i=1;i<=length(a);i++)print a[i]}' file1 file2
1 4
2 5
3 6

Using awk
awk 'NR==FNR { a[FNR]=$0;next } { print a[FNR],$0 }' file{1,2}
Explanation:
NR==FNR ensures our first action block runs for the first file only.
a[FNR]=$0 inserts each line of the first file into array a, indexed by line number.
Once the first file is complete, we move on to the second action block.
There we print each stored line of the first file along with the corresponding line of the second file.
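To see why NR==FNR singles out the first file, print both counters: NR keeps counting across all inputs, while FNR resets to 1 at the start of each file, so the two are equal only while the first file is being read. A small sketch (file1/file2 as in the question):

```shell
printf '1\n2\n3\n' > file1
printf '4\n5\n6\n' > file2

# NR counts records across all inputs; FNR restarts per file,
# so NR==FNR holds only while reading file1.
awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' file1 file2
# file1 NR=1 FNR=1
# file1 NR=2 FNR=2
# file1 NR=3 FNR=3
# file2 NR=4 FNR=1
# file2 NR=5 FNR=2
# file2 NR=6 FNR=3
```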

Related

awk key-value issue for array

I've run into an awk array issue; details below:
[~/temp]$ cat test.txt
1
2
3
4
1
2
3
Then I want to count the frequency of each number.
[~/temp]$ awk 'num[$1]++;END{for (i in num){printf("%s\t%-s\n", num[i],i)|"sort -r -n -k1"} }' test.txt
1
2
3
2 3
2 2
2 1
1 4
As you see, why do the first 3 output lines ('1', '2', '3') come with a blank count value?
Thanks for your answer.
An awk statement consists of a pattern and an associated action. An omitted pattern matches every record of input. An omitted action is an alias for {print $0}, i.e. output the current record, which is what you are getting. Looking at the first part of your program:
$ awk 'num[$1]++' file
1
2
3
Let's change that a bit to understand what happens there:
$ awk '{print "NR:",NR,"num["$1"]++:",num[$1]++}' file
NR: 1 num[1]++: 0
NR: 2 num[2]++: 0
NR: 3 num[3]++: 0
NR: 4 num[4]++: 0
NR: 5 num[1]++: 1
NR: 6 num[2]++: 1
NR: 7 num[3]++: 1
Since you are using the postfix operator num[$1]++ in the pattern, on records 1-4 it evaluates to 0 before its value is incremented. The output would be different with the prefix operator ++num[$1], which first increments the variable and then evaluates it; that would output every record of input, not just the last three that you were getting.
The correct way would've been to use num[$1]++ as an action, not as a pattern:
$ awk '{num[$1]++}' file
Put your "per line" part in {}, i.e. { num[$1]++; }
awk programs are a collection of [pattern] { action } rules (the pattern is optional; the {} is not). It seems that in your case your code is being treated as a pattern.
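Putting it together, a sketch of the corrected one-liner: the count goes in an action block, the report in END. (The relative order of equal counts may vary, since for..in iteration order is unspecified.)

```shell
printf '1\n2\n3\n4\n1\n2\n3\n' > test.txt

# num[$1]++ is now an action inside { }, so nothing is printed per record;
# the counts are reported once in END, sorted by frequency.
awk '{ num[$1]++ } END { for (i in num) printf "%s\t%s\n", num[i], i | "sort -r -n -k1" }' test.txt
```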

Compare 2 files using awk - if 2nd field is same, sum the 1st field & print it - if not, print it (true for non-matching entries in both files)

I have two files -
File 1:
2 923000026531
1 923000031178
2 923000050000
1 923000050278
1 923000051178
1 923000060000
File 2:
2 923000050000
3 923000050278
1 923000051178
1 923000060000
4 923000026531
1 923335980059
I want to achieve the following using awk:
1- If 2nd field is same, sum the 1st field and print it.
2- If 2nd field is not same, print the line as it is. This will have two cases.
2(a) If 2nd field is not same & record belongs to first file
2(b) If 2nd field is not same & record belongs to second file
I have achieved the following using this command:
Command: gawk 'FNR==NR{f1[$2]=$1;next}$2 in f1{print f1[$2]+$1,$2}!($2 in f1){print $0}' f1 f2
Result:
4 923000050000
4 923000050278
2 923000051178
2 923000060000
6 923000026531
1 923335980059
However, this doesn't contain the records that were in the first file and whose second field didn't match the second file, i.e. case 2(a). To be more specific, the following record is not present in the final output:
1 923000031178
I know there are workarounds using extra commands, but I am interested in whether this can somehow be done in a single command.
give this one-liner a try:
$ awk '{a[$2]+=$1}END{for(x in a)print a[x], x}' f1 f2
2 923000060000
2 923000051178
1 923000031178
6 923000026531
4 923000050278
4 923000050000
1 923335980059
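The same one-liner verified against the sample data (f1/f2 file names taken from the question). Since for..in iteration order is unspecified, piping through sort makes the output deterministic:

```shell
printf '2 923000026531\n1 923000031178\n2 923000050000\n1 923000050278\n1 923000051178\n1 923000060000\n' > f1
printf '2 923000050000\n3 923000050278\n1 923000051178\n1 923000060000\n4 923000026531\n1 923335980059\n' > f2

# Sum first fields keyed by the second field across both files; keys seen
# in only one file keep their single value, covering cases 2(a) and 2(b).
awk '{ a[$2] += $1 } END { for (x in a) print a[x], x }' f1 f2 | sort -k2
# 6 923000026531
# 1 923000031178
# 4 923000050000
# 4 923000050278
# 2 923000051178
# 2 923000060000
# 1 923335980059
```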

Extract substring from first column

I have a large text file with 2 columns. The first column is large and complicated, but contains a name="..." portion. The second column is just a number.
How can I produce a text file such that the first column contains ONLY the name, but the second column stays the same and shows the number? Basically, I want to extract a substring from the first column only AND have the 2nd column stay unaltered.
Sample data:
application{id="1821", name="app-name_01"} 0
application{id="1822", name="myapp-02", optionalFlag="false"} 1
application{id="1823", optionalFlag="false", name="app_name_public"} 3
...
So the result file would be something like this
app-name_01 0
myapp-02 1
app_name_public 3
...
If your actual Input_file is the same as the sample shown, then the following may help:
awk '{sub(/.*name=\"/,"");sub(/\".* /," ")} 1' Input_file
Output will be as follows.
app-name_01 0
myapp-02 1
app_name_public 3
Using GNU awk
$ awk 'match($0,/name="([^"]*)"/,a){print a[1],$NF}' infile
app-name_01 0
myapp-02 1
app_name_public 3
Non-Gawk
awk 'match($0,/name="([^"]*)"/){t=substr($0,RSTART,RLENGTH);gsub(/name=|"/,"",t);print t,$NF}' infile
app-name_01 0
myapp-02 1
app_name_public 3
Input:
$ cat infile
application{id="1821", name="app-name_01"} 0
application{id="1822", name="myapp-02", optionalFlag="false"} 1
application{id="1823", optionalFlag="false", name="app_name_public"} 3
...
Here's a sed solution:
sed -r 's/.*name="([^"]+).* ([0-9]+)$/\1 \2/g' Input_file
Explanation:
With the parentheses you capture what's in between into groups.
The first group is everything after name=" up to the next ". [^"] means "not a double quote".
The second group is simply "one or more digits at the end of the line, preceded by a space".
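A quick check of the sed command against the sample input (-r is GNU sed's extended-regex flag; on BSD/macOS sed use -E instead):

```shell
printf 'application{id="1821", name="app-name_01"} 0\napplication{id="1822", name="myapp-02", optionalFlag="false"} 1\napplication{id="1823", optionalFlag="false", name="app_name_public"} 3\n' > Input_file

sed -r 's/.*name="([^"]+).* ([0-9]+)$/\1 \2/g' Input_file
# app-name_01 0
# myapp-02 1
# app_name_public 3
```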

awk print number of row only in uniq column

I have data set like this:
1 A
1 B
1 C
2 A
2 B
2 C
3 B
3 C
And I have a script which calculates me:
Number of occurrences of the search string
Number of rows
awk -v search="A" \
'BEGIN{count=0} $2 == search {count++} END{print count "\n" NR}' input
That works perfectly fine.
I would like to add to my awk one-liner the number of unique values in the first column.
So the output should be separated by \n:
2
8
3
I can do this with a separate awk command, but I am not able to integrate it into my original awk code:
awk '{a[$1]++}END{for(i in a){print i}}' input | wc -l
Any idea how to integrate it into one awk solution, without piping?
Looks like you want this:
awk -v search="A" '{a[$1]++}
$2 == search {count++}
END{OFS="\n";print count+0, NR, length(a)}' file
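Worth noting: length(a) on an array is a GNU awk extension. A portable sketch keeps its own counter, bumping it the first time a first-column value is seen:

```shell
printf '1 A\n1 B\n1 C\n2 A\n2 B\n2 C\n3 B\n3 C\n' > input

# Count a key the first time it appears instead of calling length()
# on the array, which POSIX awk does not guarantee.
awk -v search="A" '!($1 in a) { uniq++ } { a[$1]++ }
    $2 == search { count++ }
    END { OFS="\n"; print count+0, NR, uniq+0 }' input
# 2
# 8
# 3
```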

Generate ranges from entries in a file

I have a txt file containing entries as below
1
2
3
4
7
8
9
12
14
15
I need to generate ranges as below
1-4
7-9
12-12
14-15
How do I achieve the above output?
This is what I tried:
awk '{q=$1}{f=$1}{print $q} $1!=p+1{print l"-"f}{l=p+1}{p=$1} END{print}' filename
I would say...
awk -v OFS=- 'prev+1<$0 {print first ? first : 1,prev; first=$0}
{prev=$0}
END {print first, prev}' file
For your given file it returns:
$ awk -v OFS=- 'prev+1<$0 {print first ? first : 1,prev; first=$0} {prev=$0} END {print first, prev}' file
1-4
7-9
12-12
14-15
I won't go through your attempt awk '{q=$1}{f=$1}{print $q} $1!=p+1{print l"-"f}{l=p+1}{p=$1} END{print}' filename, but I do suggest using more representative variable names and starting from a small piece that you then grow into a script. Otherwise, it becomes a jungle you want to throw away as soon as it does not work.
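In that spirit, a sketch of the same logic with descriptive names: start opens a range, prev remembers the previous value, and a gap closes the open range.

```shell
printf '1\n2\n3\n4\n7\n8\n9\n12\n14\n15\n' > file

# "start" is the first value of the current range, "prev" the last value
# seen; when the current value is not prev+1, the open range is printed.
awk 'NR==1 { start=$1 }
     NR>1 && $1 != prev+1 { print start "-" prev; start=$1 }
     { prev=$1 }
     END { print start "-" prev }' file
# 1-4
# 7-9
# 12-12
# 14-15
```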