how to read a file in awk command - linux

I have two files that look like:
**file1.txt**
"a","1","11","111"
"b","2","22","222"
"c","3","33","333"
"d","4","44","444"
"e","5","55","555"
"f","6","66","666"
**file2.txt**
"b"
"d"
"a"
"c"
"e"
"f"
I need to create a script that reorders the lines of file1 so that they follow the order given in file2, e.g.:
"b","2","22","222"
"d","4","44","444"
"a","1","11","111"
"c","3","33","333"
"e","5","55","555"
"f","6","66","666"
I created a command that looks like:
nawk '/^("b")/' file1 ; nawk '/^("d")/' file1 ; nawk '/^("a")/' file1 ; nawk '/^("c")/' file1 ; nawk '/^("e")/' file1 ; nawk '/^("f")/' file1
It does the trick, but I would like to automate it further and don't know how to proceed. How could I create a command or variable that looks at line 1 of file2 ("b") and puts it in the above command, then looks at line 2 of file2 ("d") and puts it in the above command, and so on? Basically, if possible, I would like the command to read file2 and fill in the blanks in the above command. Any more convenient commands you can suggest would be appreciated. Note that I currently have to insert the letters from file2 into the above command manually.
The actual file may contain well over 100 lines

awk -F, 'NR==FNR { a[$1]=$0; next }
($1 in a) { print a[$1] }' file1 file2
This reads all of file1 into memory, then prints in the order of file2. If file1 is very large, this may not be feasible.
This is a common Awk idiom; search the many near-duplicates if you need a more detailed explanation.
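For readers new to the idiom, here is a commented sketch of the same command. It recreates shortened copies of the two sample files in a temporary directory; the scratch directory, the array name row and the shell variable reordered are only for illustration and are not part of the original answer.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Shortened copies of the sample files from the question.
printf '%s\n' '"a","1","11","111"' '"b","2","22","222"' '"c","3","33","333"' > file1.txt
printf '%s\n' '"b"' '"c"' '"a"' > file2.txt

reordered=$(awk -F, '
    # NR counts records across all files, FNR restarts for each file,
    # so NR==FNR is true only while the first file (file1.txt) is read.
    NR == FNR { row[$1] = $0; next }   # remember each row under its first field
    # Only file2.txt reaches this rule: print the stored row for each key.
    ($1 in row) { print row[$1] }
' file1.txt file2.txt)

printf '%s\n' "$reordered"
```

Keys in file2 that never appear in file1 are silently skipped by the ($1 in row) test.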

Related

how to Merge 2 tables with awk

First of all, sorry for my English and I know there's a lot of various topics regarding AWK but it's a very difficult function to me...
I would like to merge two tables on a common column with awk. The tables have different numbers of rows. The first table is the one I want to modify, and the second is a reference table. I would like to compare column 1 of File1 with column 1 of File2 and, when they match, add column 2 of File2 to File1. But I need to keep all the lines of File1.
I give you an example:
File1
Num_id,Name,description1,description2,description3
?,atlanta_1,,,
RO_5,babeni_SW,,,
? ,Bib1,,,
RO_9,BoUba_456,,,
?,Castor,,,
File2
official_Num_id,official_Name
RO_1,America
RO_2,Andre
RO_3,Atlanta
RO_4,Axa
RO_5,Babeni
RO_6,Barba
RO_7,Bib
RO_8,Bilbao
RO_9,Bouba
RO_10,Castor
File3
Num_id,Name,description1,description2,description3,official_Name
?,atlanta_1,,,
RO_5,babeni_SW,,,Babeni
?,Bib1,,,
RO_9,BoUba_456,,,Bouba
?,Castor,,,
I read a lot of solution on Internet and it seems that awk could work ..
I tried awk 'NR==FNR {h[$1] = $2; next} {print $0,h[$1]}' $File1 $File2 > file3
But my command doesn't work; my File3 looks exactly like File1.
Second, I don't know if it's possible to compare the two second columns when the names differ, like atlanta_1 and Atlanta, and add the official_num_id and the official_name to my File1.
Any hero over there?
You had it, except for two small things. First, you need to set your field separators to a comma; second, reverse the order of your input files on the command line so that the reference file is processed first:
$ awk 'BEGIN {FS=OFS=","} NR==FNR {h[$1] = $2; next} {print $0,h[$1]}' File2 File1
Num_id,Name,description1,description2,description3,
?,atlanta_1,,,,
RO_5,babeni_SW,,,,Babeni
? ,Bib1,,,,
RO_9,BoUba_456,,,,Bouba
?,Castor,,,,
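The output above leaves the new header column empty. A possible variation, sketched here on a cut-down copy of the sample data, carries the reference file's column title over to the header line. The variable names header, name and merged are made up for this example.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Cut-down copies of the two tables from the question.
printf '%s\n' 'official_Num_id,official_Name' 'RO_5,Babeni' 'RO_9,Bouba' > File2
printf '%s\n' 'Num_id,Name,description1,description2,description3' \
              '?,atlanta_1,,,' 'RO_5,babeni_SW,,,' > File1

merged=$(awk '
    BEGIN { FS = OFS = "," }
    # First file (File2): keep the column title from line 1, index the rest by id.
    NR == FNR { if (FNR == 1) header = $2; else name[$1] = $2; next }
    # Second file (File1): extend the header line with the saved title.
    FNR == 1  { print $0, header; next }
    # Data lines: append the official name, or an empty field when the id is unknown.
    { print $0, name[$1] }
' File2 File1)

printf '%s\n' "$merged"
```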
You can also use the join command for this:
join --header --nocheck-order -t, -1 1 -2 1 -a 1 file1 file2
To answer your question if it's possible to compare my two second columns when names have difference like atlanta_1 and Atlanta and add the official_num_id and the official_name in my File1:
$ awk '
BEGIN { FS=OFS="," }
NR==FNR {                        # file2
    a[tolower($2)]=$0            # hash on lowercase city
    next
}
{                                # file1
    split($2,b,"[^[:alpha:]]")   # split on non-alphabet
    print $0 (tolower(b[1]) in a?OFS a[tolower(b[1])]:"")
}' file2 file1
Num_id,Name,description1,description2,description3
?,atlanta_1,,,,RO_3,Atlanta
RO_5,babeni_SW,,,,RO_5,Babeni
? ,Bib1,,,,RO_7,Bib
RO_9,BoUba_456,,,,RO_9,Bouba
?,Castor,,,,RO_10,Castor
split will split the Name field on non-alphabetic characters, i.e. the _ in atlanta_1, the 1 in Bib1, etc., so it might fail on cities with dashes and the like; edit the pattern [^[:alpha:]] in split accordingly. The header no longer matches the columns once those fields are added, so rethink the header names.

Arranging text in Linux

Let's say I have a file "example.txt" with the below content
user#computer ~ » cat example.txt
Output will be like
"aws"
"aws-ec2"
"aws-reinvent-2016"
Now I want to rearrange the output like below:
"aws", "aws-ec2", "aws-reinvent-2016"
How can I do it?
You can use awk:
awk '{$1=$1}1' ORS=', ' example.txt
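ORS=', ' sets the output record separator to a comma followed by a space. $1=$1 rebuilds the record using OFS (the output field separator), so it is not strictly needed here: ORS is appended whenever a record is printed, whether or not the record was rebuilt. 1 always evaluates to true and makes awk print the record. Note that the result ends with a trailing ", " and has no final newline.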
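Because ORS is written after every record, the command above also emits a separator after the last line. One way to avoid that, as a minimal sketch rather than the only option, is to print the separator before every line except the first. The variable names sep and joined are arbitrary.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# The sample file from the question.
printf '%s\n' '"aws"' '"aws-ec2"' '"aws-reinvent-2016"' > example.txt

joined=$(awk '
    # sep is empty for the first line and ", " afterwards.
    { printf "%s%s", sep, $0; sep = ", " }
    # Finish the single output line with a newline.
    END { print "" }
' example.txt)

printf '%s\n' "$joined"
```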

read different fields and pass on to awk to extract those fields

Probably this is answered somewhere, but the things I have explored do not match my need.
I would like to read different fields from one file (FILE1) and pass them on to an awk script, which can extract those fields from another file (FILE2).
FILE1
1 156202173 156702173
2 26915624 27415624
4 111714419 112214419
so read lines from this file and pass it on to the following script
awk ' BEGIN {FS=OFS="\t"};
{if ($1==$1 && $2>= $2 && $2<= $3 ) {print $0}}' FILE2 > extracted.file
The FILE2 looks like this;
1 156202182 rs7929618
16 8600861 rs7190157
4 111714800 rs12364336
12 3840048 rs4766166
7 20776538 rs35621824
so the awk script print only when there is a match with the first field and the value falls between the 2nd and 3rd field.
Expected output is
1 156202182 rs7929618
4 111714800 rs12364336
Thanks so much in advance for your response.
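There should be plenty of similar questions, but writing the script is faster than looking one up. Note that this keeps only one range per first-field value and uses strict comparisons; use <= if the bounds should be inclusive.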
$ awk 'NR==FNR{lower[$1]=$2; upper[$1]=$3; next}
lower[$1]<$2 && $2<upper[$1]' file1 file2
1 156202182 rs7929618
4 111714800 rs12364336
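If FILE1 can list several ranges for the same first field, or the bounds should be inclusive as in the question's own attempt, a variation along these lines might work. It is only a sketch: the second range for key 1 and the row labelled made_up_row are invented here to exercise those two cases and are not part of the original data.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Ranges from the question, plus one invented extra range for key 1.
printf '%s\n' '1 156202173 156702173' '1 100 200' '4 111714419 112214419' > file1
# Rows from the question, plus one invented row that sits exactly on a bound.
printf '%s\n' '1 156202182 rs7929618' '16 8600861 rs7190157' \
              '4 111714800 rs12364336' '1 200 made_up_row' > file2

hits=$(awk '
    # First file: store every range, numbered per key.
    NR == FNR { n = ++count[$1]; lower[$1, n] = $2; upper[$1, n] = $3; next }
    # Second file: print the row if it falls inside any range stored for its key.
    {
        for (i = 1; i <= count[$1]; i++)
            if ($2 + 0 >= lower[$1, i] + 0 && $2 + 0 <= upper[$1, i] + 0) { print; break }
    }
' file1 file2)

printf '%s\n' "$hits"
```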

Comparing two files using awk and printing contains which are matching from other files

I have two files:
file1.txt
919167,hutch,mumbai
919594,idea,mumbai
file2.txt
919167000000
919594000000
Output
919167000000,hutch,mumbai
919594000000,idea,mumbai
How can I achieve this using AWK? I've got a huge file of phone numbers which needs to be compared like this. I believe Awk can handle it; if not please let me know how can I do this.
Extra definitions
Is the common part always a 6-digit number? Yes always 6.
Are the two files already sorted? file1 is not sorted. file2 can be sorted.
Are the trailing digits in file 2 always zeros? No, these are phone numbers this can vary, purpose of this is to get series information of the phone number.
Is there any danger of file 1 containing three records for a given number while file 2 contains 2 records, or is it one-to-one? It's one-to-one.
Can there be records in file 1 with no match in file 2, or vice versa? Yes.
If so, do you want to see the unmatched records? Yes I want both records.
Extended data
file1.txt
919167,hutch,mumbai
919594,idea,mumbai
918888,airtel,karnataka
file2.txt
919167838888
919594998484
919212334323
Output Expected:
919167838888,hutch,mumbai
919594998484,idea,mumbai
919212334323,nomatch,nomatch
As I noted in a comment, there's a lot of unstated information needed to give a definitive answer. However, we can make some plausible guesses:
The common number is the first 6 digits of file 2 (we don't care about the trailing digits, but will simply copy them to the output).
The files are sorted in order.
If there are unmatched records in either file, those records will be ignored.
The tools of choice are probably sed and join:
sed 's/^\([0-9]\{6\}\)/\1,\1/' file2.txt |
join -t, -o 1.2,2.2,2.3 - file1.txt
This edits file2.txt to create a comma-separated first field with the 6-digit phone number followed by all the rest of the line. The input is fed to the join command, which joins on the first column, and outputs the 'rest of the line' (column 2) from file2.txt and columns 2 and 3 from file1.txt.
If the phone numbers are variable length, then the matching operation is horribly complex. For that, I'd drop into Perl (or Python) to do the work. If the data is unsorted, it can be sorted before being fed into the commands. If you want unmatched records, you can specify how to handle those in the options to join.
The extra information needed is now available. The key point is that the 6-digit prefix is fixed (phew!). Since you're on Linux, I'm assuming bash is available with 'process substitution':
sort file2.txt |
sed 's/^\([0-9]\{6\}\)/\1,\1/' |
join -t, -o 1.2,2.2,2.3 -a 1 -a 2 -e 'no-match' - <(sort file1.txt)
If process substitution is not available, simply sort file1.txt in situ:
sort -o file1.txt file1.txt
Then use file1.txt in place of <(sort file1.txt).
I think the comment might be asking for inputs such as:
file1.txt
919167,hutch,mumbai
919594,idea,mumbai
902130,airtel,karnataka
file2.txt
919167000000
919594000000
919342313242
Output
no-match,airtel,karnataka
919167000000,hutch,mumbai
919342313242,no-match,no-match
919594000000,idea,mumbai
If that's not what the comment is about, please clarify by editing the question to add the extra data and output in a more readable format than comments allow.
Working with the extended data, this mildly modified command:
sort file2.txt |
sed 's/^\([0-9]\{6\}\)/\1,\1/' |
join -t, -o 1.2,2.2,2.3 -a 1 -e 'no-match' - <(sort file1.txt)
produces the output:
919167838888,hutch,mumbai
919212334323,no-match,no-match
919594998484,idea,mumbai
which looks rather like a sorted version of the desired output. The -a n options control whether the unmatched records from file 1 or file 2 (or both) are printed; the -e option controls the value printed for the unmatched fields. All of this is readily available from the man pages for join, of course.
Here's one way using GNU awk. Run like:
awk -f script.awk file2.txt file1.txt
Contents of script.awk:
BEGIN {
    FS=OFS=","
}
FNR==NR {
    sub(/[ \t]+$/, "")
    line = substr($0, 1, 6)    # awk strings are 1-indexed
    array[line]=$0
    next
}
{
    # print rather than printf, so a % in the data is not read as a format
    print (($1 in array) ? $0 : "FILE1 no match --> " $0)
    dup[$1]++
}
END {
    for (i in array) {
        if (!(i in dup)) {
            printf "FILE2 no match --> %s\n", array[i]
        }
    }
}
Alternatively, here's the one-liner:
awk 'BEGIN { FS=OFS="," } FNR==NR { sub(/[ \t]+$/, ""); line = substr($0, 1, 6); array[line]=$0; next } { print (($1 in array) ? $0 : "FILE1 no match --> " $0); dup[$1]++ } END { for (i in array) if (!(i in dup)) printf "FILE2 no match --> %s\n", array[i] }' file2.txt file1.txt
awk -F, 'FNR==NR{a[$1]=$2","$3;next}{for(i in a){if($1~"^"i) print $1","a[i]}}' file1.txt file2.txt
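For comparison, here is a sketch in plain awk that reproduces the "Output Expected" block from the question, keeping the order of file2.txt. It assumes, as the question states, that the prefix is always the first six digits. The names info, prefix and matched are chosen for this example only.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# The extended data from the question.
printf '%s\n' '919167,hutch,mumbai' '919594,idea,mumbai' '918888,airtel,karnataka' > file1.txt
printf '%s\n' '919167838888' '919594998484' '919212334323' > file2.txt

matched=$(awk '
    BEGIN { FS = OFS = "," }
    # file1.txt: map each 6-digit prefix to "operator,region".
    NR == FNR { info[$1] = $2 OFS $3; next }
    # file2.txt: look up the first six digits of each phone number.
    {
        prefix = substr($0, 1, 6)
        print $0, ((prefix in info) ? info[prefix] : "nomatch" OFS "nomatch")
    }
' file1.txt file2.txt)

printf '%s\n' "$matched"
```

Rows of file1.txt whose prefix never occurs in file2.txt (918888 here) are not reported; an END block looping over info could list them if needed.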

extracting data from two list using a shell script

I am trying to create a shell script that pulls a line from a file and checks another file for an instance of the same. If it finds an entry, it adds it to another file, and it loops through the first list until it has gone through the whole file. The data in the first file looks like this -
email#address.com;
email2#address.com;
and so on
The other file in which I am looking for a match and placing the match in the blank file looks like this -
12334 email#address.com;
32213 email2#address.com;
I want it to retain the numbers as well as the matching data. I have an idea of how this should work but need to know how to implement it.
My Idea
#!/bin/bash
read -p "enter first file name:" file1
read -p "enter second file name:" file2
FILE_DATA=( $( /bin/cat $file1))
FILE_DATA1=( $( /bin/cat $file2))
for I in $((${#FILE_DATA[@]}))
do
echo $FILE_DATA[$i] | grep $FILE_DATA1[$i] >> output.txt
done
I want the output to look like this but only for addresses that match -
12334 email#address.com;
32213 email2#address.com;
Thank You
This is quite like manipulating text using SQL:
$ cat file1
b#address.com
a#address.com
c#address.com
d#address.com
$ cat file2
10712 e#address.com
11457 b#address.com
19985 f#address.com
22519 d#address.com
$ join -1 1 -2 2 <(sort file1) <(sort -k2 file2) | awk '{print $2,$1}'
11457 b#address.com
22519 d#address.com
1. Sort both inputs on the key (we use the emails as keys here).
2. Join on the keys (file1 column 1, file2 column 2).
3. Format the output (use awk to swap the columns).
As you've learned about diff and comm, now it's time to learn about another tool in the unix toolbox, join.
Join does just what the name indicates: it joins two files together, based on keys embedded in the files.
The number one constraint on using join is that each file must be sorted on its join field.
file1
a abc
b bcd
c cde
file2
a rec1
b rec2
c rec3
join file1 file2
a abc rec1
b bcd rec2
c cde rec3
You can consult the join man page for how to reduce and reorder the output columns. For example:
1>join -o 1.1 2.2 file1 file2
a rec1
b rec2
c rec3
You can use your code for file name input to turn this into a generalizable script.
Your solution using a pipeline inside a for loop will work for small sets of data, but as the size of data grows, the cost of starting a new process for each word you are searching for will drag down the run time.
I hope this helps.
Read file1.txt line by line and assign each line to the variable ADDR. grep file2.txt for the content of ADDR and append the output to file_result.txt.
(while read -r ADDR; do grep -F -- "${ADDR}" file2.txt >> file_result.txt; done) < file1.txt
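The loop starts one grep process per address. A single grep call can take all the addresses as fixed-string patterns via -f, which may scale better on long lists. The sketch below uses the sample lines from the question plus one invented non-matching row (99999 other#address.com;) to show the filtering.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Addresses to look for, as in the question.
printf '%s\n' 'email#address.com;' 'email2#address.com;' > file1.txt
# Numbered entries; the last row is invented and should be filtered out.
printf '%s\n' '12334 email#address.com;' '32213 email2#address.com;' \
              '99999 other#address.com;' > file2.txt

# -F treats every line of file1.txt as a literal string, -f reads the patterns from it.
grep -F -f file1.txt file2.txt > file_result.txt

cat file_result.txt
```

Since this is substring matching, an address that is a suffix of a longer one would match both; the awk and join answers compare whole fields and do not have that problem.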
This awk one-liner can help you do that -
awk 'NR==FNR{a[$1]++;next}($2 in a){print $0 > "f3.txt"}' f1.txt f2.txt
NR and FNR are awk's built-in variables that store record (line) numbers. NR is the running count across all input files and is never reset; FNR restarts at 1 for each new file. So NR==FNR is true only while the first file is being read, and during that time we add every address to the array a. Once the first file is finished, we check the second column of the second file. If it is present in the array, we write the entire line to the file f3.txt; if not, we ignore it.
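The two counters are easy to see by printing them side by side. This small demonstration uses two throwaway files, first.txt and second.txt, whose names and contents are made up for the purpose.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Two throwaway input files: two lines and one line.
printf '%s\n' 'x' 'y' > first.txt
printf '%s\n' 'z' > second.txt

# Print the current file name with both counters for every input line.
counters=$(awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' first.txt second.txt)

printf '%s\n' "$counters"
```

The two values are equal for every line of first.txt and differ from the first line of second.txt onwards, which is what makes NR==FNR a test for "still in the first file".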
Using data from Kev's solution:
[jaypal:~/Temp] cat f1.txt
b#address.com
a#address.com
c#address.com
d#address.com
[jaypal:~/Temp] cat f2.txt
10712 e#address.com
11457 b#address.com
19985 f#address.com
22519 d#address.com
[jaypal:~/Temp] awk 'NR==FNR{a[$1]++;next}($2 in a){print $0 > "f3.txt"}' f1.txt f2.txt
[jaypal:~/Temp] cat f3.txt
11457 b#address.com
22519 d#address.com

Resources