Using a script to organize a file - Linux

So I have two flat files in the following format:
File1.txt
Customer1 12345 12346 12347
Customer2 14444 14445
File2.txt
12345 aol.com
12347 gmail.com
12346 google.com
14444 yahoo.com
14445 outlook.com
I need to be able to translate the file into something like this:
Customer1 aol.com google.com gmail.com
Customer2 yahoo.com outlook.com
This is what I have so far
$ awk 'NR==FNR {a[$1]=$2; next} $2 in a {print $0, a[$2]}' OFS='\t' File2.txt File1.txt
However, this only looks at column 2 of File1. How can I expand it to look at all of File1's columns?

awk can loop through fields. Try something like this -
$: awk 'NR==FNR { a[$1]=$2; next }
    {
        printf "%s ", $1
        for (i=2; i<=NF; i++) printf "%s ", a[$i]
        printf "\n"
    }' File2.txt File1.txt
Customer1 aol.com google.com gmail.com
Customer2 yahoo.com outlook.com

You can also let awk go through both files in a single pass, separate the two-field lookup records and the customer records into two arrays, then link the two arrays and print the output (see the sketch below).
You can also go through the files twice to do the same logic.
I think this gives you a good start.
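The sketch below assumes the ID-to-domain lines always have exactly two fields and every customer line lists at least two IDs (otherwise a field-count test can't tell the two kinds of records apart); the file names are the ones from the question:
cat File2.txt File1.txt | awk '
    NF == 2 { map[$1] = $2; next }   # ID-to-domain records (File2.txt)
    { rows[++n] = $0 }               # customer records (File1.txt), saved for the end
    END {
        for (r = 1; r <= n; r++) {
            split(rows[r], f)
            out = f[1]
            for (i = 2; i in f; i++)
                out = out " " map[f[i]]
            print out
        }
    }'
Customer1 aol.com google.com gmail.com
Customer2 yahoo.com outlook.com
Reading the files twice (as in the answer above) avoids having to guess by field count.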

Related

Print out only the last 4 digits of MAC addresses from the 2nd column using awk in Linux

I have made a shell script that gets the list of MAC addresses using awk and the arp-scan command. I want to strip each MAC address down to only the last 4 digits, i.e. I want to print only the letters yy:
ac:1e:04:0e:yy:yy
ax:8d:5c:27:yy:yy
ax:ee:fb:55:yy:yy
dx:37:42:c9:yy:yy
cx:bf:9c:a4:yy:yy
Try cut -d: -f5-
(Options meaning: delimiter : and fields 5 and up.)
EDIT: Or in awk, as you requested:
awk -F: '{ print $5 ":" $6 }'
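For example, run against one of the sample addresses above:
$ echo 'ac:1e:04:0e:yy:yy' | cut -d: -f5-
yy:yy
$ echo 'ac:1e:04:0e:yy:yy' | awk -F: '{ print $5 ":" $6 }'
yy:yy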
Here are a few options:
line=cx:bf:9c:a4:yy:yy
echo ${line:(-5)}
line=cx:bf:9c:a4:yy:yy
echo $line | cut -d":" -f5-
I imagine you want to strip the trailing spaces, but it isn't clear whether you want yy:yy or yyyy.
Anyhow, there are multiple ways to do it, but you are already running awk and have the MAC in $2.
In the first case it would be:
awk '{match($2,/([^:]{2}:[^:]{2}) *$/,m); print m[0]}'
yy:yy
In the second (no colon :):
awk '{match($2,/([^:]{2}):([^:]{2}) *$/,m); print m[1] m[2]}'
yyyy
In case you don't have the three-argument match() available in your awk, you can resort to gensub() instead:
awk '{print gensub(/.*([^:]{2}:[^:]{2}) *$/,"\\1","g",$2)}'
yy:yy
or:
awk '{print gensub(/.*([^:]{2}):([^:]{2}) *$/,"\\1\\2","g",$2)}'
yyyy
Edit:
I now realized the trailing spaces were added by anubhava in his edit; they were not present in the original question! You can then simply keep the last n characters:
awk '{print substr($2,13,5)}'
yy:yy
or:
awk '{print substr($2,13,2)substr($2,16,2)}'
yyyy
Taking into account that a MAC address is always 6 octets, you could probably just do something like this to get the last 2 octets:
awk '{print substr($0,13)}' input.txt
While testing on the fly using arp -an, I noticed that the output did not always contain a MAC address; in some cases it returned something like:
(169.254.113.54) at (incomplete) on en4 [ethernet]
Therefore it is probably better to filter the input to guarantee a MAC address; this can be done by applying this regex:
^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
Applying the regex in awk and printing only the last 2 octets:
arp -an | awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) print substr($4,13)}'
This filters column $4 and verifies that it is a valid MAC address, then uses substr to return just the last "letters".
You could also split by : and print the output in multiple ways, for example:
awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) {split($4,a,":"); print a[5] ":" a[6]}}'
Notice the exp ~ /regexp/
This is true if the expression exp (taken as a string) is matched by regexp.
The following example matches, or selects, all input records with the upper-case letter `J' somewhere in the first field:
$ awk '$1 ~ /J/' inventory-shipped
-| Jan 13 25 15 115
-| Jun 31 42 75 492
-| Jul 24 34 67 436
-| Jan 21 36 64 620
So does this:
awk '{ if ($1 ~ /J/) print }' inventory-shipped

How to use awk/sed to deal with these two files to get the result I want

I want to use awk/sed to process the two files (a.txt and b.txt) below and get the result.
cat a.txt
a UK
b Japan
c China
d Korea
e US
And cat b.txt gives:
c Russia
e Canada
The result that I want is as below:
a UK
b Japan
c Russia
d Korea
e Canada
With awk:
First fill array/hash a with the complete row ($0), using the first column ($1) of the row as the index; rows from the second file overwrite rows from the first that share the same key. Finally, print all elements of array/hash a with a loop.
awk '{a[$1]=$0} END{for(i in a) print a[i]}' file1 file2
Output:
a UK
b Japan
c Russia
d Korea
e Canada
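One caveat: plain awk does not guarantee the order of a for (i in a) loop, so the keys may not come out sorted. In GNU awk you can force an order via PROCINFO["sorted_in"], for example:
awk '{a[$1]=$0} END{PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i]}' a.txt b.txt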
Try:
awk 'FNR==NR{A[$1]=$NF;next} {printf("%s %s\n",$1,$1 in A?A[$1]:$NF)}' b.txt a.txt
The condition FNR==NR is true only while the first file (b.txt) is being read; during that pass we create an array A indexed by $1 whose value is the last column. Then, for each line of a.txt, printf prints two strings: the first is $1, and the second is A[$1] if $1 is present in array A, otherwise the last column of a.txt itself.
EDIT: as the OP had carriage return characters in the input files, remove them first with the following:
tr -d '\r' < b.txt > temp_b.txt && mv temp_b.txt b.txt
You can use the below one-liner:
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt ) | sed 's/ --$//' | awk -F ' -- ' '{print $NF}'
We use awk to prefix each line in b.txt with a key and -- to give us a split point later:
<( awk '{print $1, "--", $0, "--"}' < b.txt )
Use the join command to join the files on common keys. The -a 1 and -a 2 options tell join to also print unpairable lines from file 1 and file 2, so entries that exist in only one file still appear in the output:
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt )
Use sed to remove the -- marker left at the end of some lines:
sed 's/ --$//'
Use awk to print the last item on each line:
awk -F ' -- ' '{print $NF}'
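With the sample a.txt and b.txt above, the joined intermediate (before the sed and final awk steps strip the markers and keep the text after the last ' -- ') looks like this:
$ join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt )
a UK
b Japan
c China -- c Russia --
d Korea
e US -- e Canada --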
$ awk 'NR==FNR{b[$1]=$2;next} {print $1, ($1 in b ? b[$1] : $2)}' b.txt a.txt
a UK
b Japan
c Russia
d Korea
e Canada

Print a file's content side by side - bash

I have a file with the contents below. I need to print the lines side by side:
hello
1223
man
2332
xyz
abc
Output desired:
hello 1223
man 2332
xyz abc
Is there any alternative other than the paste command?
You can use this awk:
awk '{ORS = (NR%2 ? FS : RS)} 1' file
hello 1223
man 2332
xyz abc
This sets ORS (the output record separator) equal to the input field separator (FS) for odd-numbered lines; for even-numbered lines it is set to the input record separator (RS).
To get tabular data use column -t:
awk '{ORS = (NR%2 ? FS : RS)} 1' file | column -t
hello 1223
man 2332
xyz abc
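The same trick generalizes to joining every n input lines by making n a variable (if the line count isn't a multiple of n, the last output line won't end in a newline):
awk -v n=3 '{ORS = (NR%n ? FS : RS)} 1' file
hello 1223 man
2332 xyz abc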
awk/gawk solution:
$ gawk 'BEGIN{ OFS="\t"} { COL1=$1; getline; COL2=$1; print(COL1,COL2)}' file
hello 1223
man 2332
xyz abc
Bash solution (no paste command):
$ echo $(cat file) | while read col1 col2; do printf "%s\t%s\n" $col1 $col2; done
hello 1223
man 2332
xyz abc
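Another paste-free option in plain bash is to read two lines at a time, which avoids relying on unquoted word splitting (this assumes an even number of lines; a trailing odd line would be dropped):
while read -r col1 && read -r col2; do
    printf '%s\t%s\n' "$col1" "$col2"
done < file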

Two text file comparison with grep

I have two files (a.txt, b.txt)
a.txt is a list of English words (one word in every row)
b.txt contains in every row: a number, a space character, and a 5-65 character long string
(for example b.txt can contain: 1234 dsafaaraehawada)
I would like to know which rows in b.txt contain words from a.txt, and how many of them.
Example input:
a.txt
green
apple
bar
b.txt
1212 greensdsdappleded
12124 dfsfsd
123 bardws
output:
2 1212 greensdsdappleded
1 123 bardws
First row contains 'green' and 'apple' (2)
Second row contains nothing.
Third row contains 'bar' (1)
That's all I would like to know.
The code (By Mr. Barmar):
grep -F -o -f a.txt b.txt | sort | uniq -c | sort -nr
But it needs to be modified.
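To see why: with the sample files, grep -F -o prints each individual match on its own line, so the rest of the pipeline counts matches per word across the whole file rather than per row of b.txt:
$ grep -F -o -f a.txt b.txt
green
apple
bar
The row numbers and row contents are lost, which is why the command needs modification.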
Try something like this:
awk 'NR==FNR{A[$1]; next} {t=0; for (i in A) t+=gsub(i,"&",$2)} t{print t, $0}' file1 file2
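Roughly the same logic spelled out with comments; gsub() returns the number of substitutions it made, so every occurrence of every word is counted, not just whether a word appears:
awk '
    NR == FNR { A[$1]; next }     # first file: remember each dictionary word
    {
        t = 0
        for (w in A)              # sum the occurrences of every word in column 2
            t += gsub(w, "&", $2)
        if (t) print t, $0
    }' a.txt b.txt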
Try something like this:
awk '
NR==FNR { list[$1]++; next }
{
cnt=0
for(word in list) {
if(index($2,word) > 0)
cnt++
}
if(cnt>0)
print cnt,$0
}' a.txt b.txt
Test:
$ cat a.txt
green
apple
bar
$ cat b.txt
1212 greensdsdappleded
12124 dfsfsd
123 bardws
$ awk '
NR==FNR { list[$1]++; next }
{
cnt=0
for(word in list) {
if(index($2,word) > 0)
cnt++
}
if(cnt>0)
print cnt,$0
}' a.txt b.txt
2 1212 greensdsdappleded
1 123 bardws

Awk coping when a column is occasionally empty

I have output from a foreach loop in the form:
ABC123603LP 44Bq AAAA
ABC123603P 3BU AAAA
ABC123603ZZP AAAA
ABC123604DP 3BU BBBB
ABC123604LP 44Bq BBBB
ABC123605AP 4q CCCC
ABC123605DP 33BGU CCCC
ABC123606AP 35Bjq DDDD
ABC123606DP 4B DDDD
From this I wish to print columns 1 and 2 to the terminal with
echo ... | awk '{print $1, $2}'
However, the third row (and others like it) prints ABC123603ZZP AAAA, as the second column is blank in this case. How do I get around this?
Check the number of fields before you print:
$ awk 'BEGIN {OFS="\t"}{ if (NF==2) print $1; else print $1, $2}' file
ABC123603LP 44Bq
ABC123603P 3BU
ABC123603ZZP
ABC123604DP 3BU
ABC123604LP 44Bq
ABC123605AP 4q
ABC123605DP 33BGU
ABC123606AP 35Bjq
ABC123606DP 4B
You can use sed instead:
echo ... | sed 's/\(A[^ ]*[\t ]*\)\([^ \t]*[0-9][^ \t]*\)*.*/\1 \2/'
One technique to remove the last column is to make awk think there is one fewer field:
awk -v OFS='\t' '{NF--; print}'
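For example, applied to the sample data saved in file (decrementing NF to drop the last field and rebuild the record works in gawk, though strictly speaking POSIX leaves the effect unspecified):
$ awk -v OFS='\t' '{NF--; print}' file
ABC123603LP	44Bq
ABC123603P	3BU
ABC123603ZZP
ABC123604DP	3BU
ABC123604LP	44Bq
ABC123605AP	4q
ABC123605DP	33BGU
ABC123606AP	35Bjq
ABC123606DP	4B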
