Matching emails from second column in one file against another file - linux

I have two files: one with emails in it (useremail.txt), and another with email:phonenumber pairs (emailnumber.txt).
useremail.txt contains:
John smith:blabla#hotmail.com
David smith:haha#gmail.com
emailnumber.txt contains:
blabla#hotmail.com:093748594
So the solution needs to grab the email from the second column of useremail.txt, search through emailnumber.txt for matches, and output John smith:093748594, i.e. just the name and phone number.
I'm on Windows, so I need a gawk or grep solution. I have spent a long time trying to get this to work with awk/grep and can't find the right approach; any help would be really appreciated.

Another solution in (GNU) awk:
$ awk '
BEGIN {
    # RS=ORS="\r\n"     # since you are using GNU awk, this replaces the sub()
    FS=OFS=":"          # input and output field separators
}
NR==FNR {               # processing the first file
    sub(/\r$/,"",$NF)   # remove the \r after the email OR uncomment RS above
    a[$2]=$1            # hash name, index on email
    next                # on to the next record
}
($1 in a) {             # if email in second file matches one in hash
    print a[$1],$2      # output; if ORS is uncommented above, output ends in \r
                        # if not, you may want to add it to the print ... "\r"
}' useremail emailnumber
Output:
John smith:093748594
Since you tried the accepted answer on both Linux and Windows and you use GNU awk, in the future you could set RS="\r?\n", which accepts both forms, \r\n and bare \n. However, I've recently run into a problem with that form under a specific condition (for which I've not yet filed a bug report).
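For example, a minimal sketch of the same program using that record separator (GNU awk only, since a regex RS is a gawk extension); the sub() is then unnecessary because the \r is consumed as part of the separator:

awk 'BEGIN { RS="\r?\n"; FS=OFS=":" }
     NR==FNR { a[$2]=$1; next }
     ($1 in a) { print a[$1], $2 }' useremail emailnumber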

You could try this:
awk -F":" '(FNR==NR){a[$2]=$1}(FNR!=NR){print a[$1]":"$2}' useremail.txt emailnumber.txt
If there are entries in emailnumber.txt with no matching entry in useremail.txt:
awk -F":" '(FNR==NR){a[$2]=$1}(FNR!=NR){if(a[$1]){print a[$1]":"$2}}' useremail.txt emailnumber.txt
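A small note on the difference: if(a[$1]) also drops matches whose stored name happens to be empty, while the in operator tests pure key membership, as in the answer above. A sketch of the membership-test form of the same one-liner:

awk -F":" '(FNR==NR){a[$2]=$1}(FNR!=NR)&&($1 in a){print a[$1]":"$2}' useremail.txt emailnumber.txt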

Related

Retrieve different information from several files to bring it together in one (BASH)

I have a problem with my bash script: I would like to retrieve information contained in several files and gather it in one.
I have a file (file1) in this form, which contains about 15000 lines:
1;1;A0200101C
2;2;A0200101C
3;3;A1160101A
4;4;A1160101A
5;5;A1130304G
6;6;A1110110U
7;7;A1110110U
8;8;A1030002V
9;9;A1030002V
10;10;A2120100C
11;11;A2120100C
12;12;A3410071A
13;13;A3400001A
14;14;A3385000G1
15;15;A3365070G1
I need to retrieve the first field (the id) of each row.
My second file (file2) is this; I just need to retrieve the 3rd line:
count
-------
131
(1 row)
I would therefore like to assemble the id from (file1) and the 3rd line of (file2) to achieve this result:
1;131
2;131
3;131
4;131
5;131
6;131
7;131
8;131
9;131
11;131
12;131
13;131
14;131
15;131
Thank you.
One possible way:
#!/usr/bin/env bash
count=$(awk 'NR == 3 { print $1 }' file2)
while IFS=';' read -r id _; do
    printf "%s;%s\n" "$id" "$count"
done < file1
First, read just the third line of file2 and save that in a variable.
Then read each line of file1 in a loop, extracting the first semicolon-separated field, and print it along with that saved value.
Using the same basic approach in a purely awk script instead of shell will be much faster and more efficient. Such a rewrite is left as an exercise for the reader (Hint: In awk, FNR == NR is true when reading the first file given, and false on any later ones. Alternatively, look up how to pass a shell variable to an awk script; there are Q&As here on SO about it.)
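For illustration, a minimal sketch of that awk-only rewrite, using the FNR == NR hint (file names as in the question):

awk -F';' 'NR == FNR { if (FNR == 3) count = $1; next }  # remember line 3 of file2
           { print $1 ";" count }                        # id from file1 + saved count
' file2 file1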

Adding new line before string matching regex in linux (Jenkins)

Hi, I'm trying to do some CSV manipulation before processing. Now I'm struggling with the following scenario.
Input file (no line breaks):
timeStamp,elapsed,label,responseCode,responseMessage,threadName,success,failureMessage,bytes,sentBytes,Latency,IdleTime 1611013105559,492,REST API,200,,REST API 1-1,true,,1221,32292,492,0 1611013107054,575,DB check,200,OK,REST API 1-1,true,,177,0,575,0 1611013251449,231,DB check,null 0,"java.sql.SQLException: Cannot create PoolableConnectionFactory (ORA-28040: No matching authentication protocol )",REST API 1-1,false,Row not inserted properly.,89,0,0,0
Desired output (new line before the timestamp):
timeStamp,elapsed,label,responseCode,responseMessage,threadName,success,failureMessage,bytes,sentBytes,Latency,IdleTime
1611013105559,492,REST API,200,,REST API 1-1,true,,1221,32292,492,0
1611013107054,575,DB check,200,OK,REST API 1-1,true,,177,0,575,0
1611013251449,231,DB check,null 0,"java.sql.SQLException: Cannot create PoolableConnectionFactory (ORA-28040: No matching authentication protocol )",REST API 1-1,false,Row not inserted properly.,89,0,0,0
Actual output:
timeStamp,elapsed,label,responseCode,responseMessage,threadName,success,failureMessage,bytes,sentBytes,Latency,IdleTime
[0-9]{13},492,REST API,200,,REST API 1-1,true,,1221,32292,492,0
[0-9]{13},575,DB check,200,OK,REST API 1-1,true,,177,0,575,0
[0-9]{13},231,DB check,null 0,"java.sql.SQLException: Cannot create PoolableConnectionFactory (ORA-28040: No matching authentication protocol )",REST API 1-1,false,Row not inserted properly.,89,0,0,0
Using this command:
awk -v patt=[0-9]{13} '$0 ~ patt {gsub(patt, "\n"patt)}1' < input.jtl > output.jtl
Can anyone help, please?
Regards, Jan
With awk, you could try the following; it was written and tested with the shown samples.
awk '{gsub(/[0-9]{13},[0-9]{3}/,ORS"&")} 1' Input_file > output.jtl
Explanation: gsub globally substitutes the matched regex [0-9]{13},[0-9]{3} with ORS (a newline) followed by the matched text itself (&). The 1 prints the edited/non-edited current line. This is also why your own attempt printed a literal [0-9]{13}: the replacement "\n"patt inserts the pattern text, whereas & refers to the matched text.
If you want to use a backreference, use gensub. In your case we might do:
awk '{print gensub(/([0-9]{13})/, "\n\\1", "g")}' input.jtl
Note that I enclosed [0-9]{13} in (), making it the first (and only) group, which I then reference as \\1; g means global replacement (all occurrences). gensub returns a new string rather than modifying the input, so I print it. If you want to know more about gensub, read the GNU awk String Functions docs.
You can use GNU sed like this:
sed -E 's/\<[0-9]{13}\>/\n&/g' input.jtl > output.jtl
Details:
-E - enables POSIX ERE syntax (less escaping required)
\<[0-9]{13}\> - matches a leading word boundary, thirteen digits and a trailing word boundary
\n& - replaces the match with a newline and the match itself
g - all occurrences on a line.

How to use awk for filtering (perl automation)

This is my txt file:
type=0
vcpu_count=10
maste=0
h=0
p=0
memory=23.59
num=2
I want to get the vcpu_count and memory values and store them in some array through perl (an automation script).
awk -F'=' '/vcpu_count/{printf "\n",$1}' .vmConfig.txt
I am using this command just to test in the terminal, but I am getting a blank line. How do I do this? I need to get these two values and check a condition.
If you are using Perl anyway, just use Perl for this too.
my %array;
open(my $config, "<", ".vmConfig.txt")
    or die "$0: Could not open .vmConfig.txt: $!\n";
while (<$config>) {
    next unless /^\s*(vcpu_count|memory)\s*=\s*(.*?)\s*\n/;
    $array{$1} = $2;
}
close($config);
If you don't want the result to be an associative array (aka hash), refactoring should be relatively easy.
The following awk solutions may help you with this.
First solution:
awk '/vcpu_count/{print;next} /memory/{print}' Input_file
Output will be as follows:
vcpu_count=10
memory=23.59
Second solution:
In case you want to print the values on a single line using printf, the following may help:
awk '/vcpu_count/{val=$0;next} /memory/{printf("%s AND %s\n",val,$0)}' Input_file
Output will be as follows:
vcpu_count=10 AND memory=23.59
When you use awk -F'=' '/vcpu_count/{printf "\n",$1}' .vmConfig.txt there are a couple of mistakes. Firstly, printf "\n" will only ever print a newline, as you have found. You need to add a format specifier: something like printf "%s\n", $2 will treat field 2 as a string and add it into the printed string. Checking out man printf at the command line will explain a bit more.
Secondly, as changed above, when you used $1 you were using the first field, which is the key in this case ($0 is the whole line).
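Putting both fixes together, a sketch based on your original command:

awk -F'=' '/vcpu_count/{printf "%s\n", $2}' .vmConfig.txt   # prints 10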
tripleee's solution is probably the most appropriate, but if there is a particular reason to run awk for this before Perl, the following may help.
As you have done, it splits on =, but then outputs as CSV, which you can change as appropriate. Even if the input lines are not always in the same order, it will output the values in a predictable order on a single line:
awk 'BEGIN {
    FS="="
    OFS=","                 # tabs, etc. if wanted; delete for spaces
}
/vcpu_count/ {cpu=$2}
/memory/     {mem=$2}
END { print cpu, mem }'
This gives
10,23.59
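If the surrounding automation is shell rather than Perl, the two values can also be captured directly into variables; a sketch, assuming bash:

# capture both values from the END-block output above (assumes bash)
read -r cpu mem < <(awk -F'=' '/vcpu_count/{c=$2} /memory/{m=$2} END{print c, m}' .vmConfig.txt)
echo "$cpu $mem"    # 10 23.59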

Awk can only print the whole line; cannot access the specific fields

I am currently working on my capstone project for Unix OS I. I'm very close to finishing, but I'm stuck on this part: basically, my assignment is to create a menu-based application wherein a user can enter a first and last name; I take that data, use it to create a user name, translate it from lowercase to uppercase, and finally store the data as firstname:lastname:username.
When asked to display the data, I must display it based on the username instead of the first name, and formatted with spaces instead of tabs. For example, it should look like: username firstname lastname. I've tried multiple commands, such as sort and awk, but I seem to be able to access the fields in the file only as one big field; e.g. when I do awk '{print NF}' users.txt to find the number of fields per row, it returns 1, clearly showing that my data is being treated as one field instead of the three I need. So my question is this: how do I go about changing the number of fields in the text file? Here is my code that adds firstname:lastname:username to the file users.txt:
userInfo=~/capstoneProject/users.txt

# make sure the strings are not empty before writing to disk
if [[ "$fname" != "" && "$lname" != "" ]]
then
    # write to userInfo (users.txt), changing to uppercase via the pipe to tr
    echo "$fname:$lname:$uname" | tr a-z A-Z >> "$userInfo"
fi
Is it because of the way I am entering the data into my file, using echo "$fname:$lname:$uname"? This is how my textbook showed me to do it, and it had no trouble later on when using sort with specific fields, as I am trying to do now. If more detail is necessary, please let me know; this is the last thing I need before I can submit my project, due tonight.
Your input file has :-separated fields so you need to tell awk that:
awk -F':' '{print NF}' users.txt
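Once awk knows the separator, the display requirement follows directly; for example, a sketch that prints username first, space-separated, sorted by username (field numbers assume the firstname:lastname:username layout above):

sort -t':' -k3 users.txt | awk -F':' '{print $3, $1, $2}'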

How to extract last name from /etc/passwd using awk?

For example, in /etc/passwd field 5 ($5) I have the name "Smith Simon". How can I extract the string "Simon" and store it using awk? Thank you in advance.
extract the string "Simon" and store it
gawk approach:
firstname=$(awk -F':' '{split($5, s, " "); print s[2]}' /etc/passwd)
echo $firstname
Simon
split($5, s, " ") - splits the 5th column into pieces separated by spaces
To print all last names from /etc/passwd:
awk -F':' '{sub(/.* /,"",$5);$0=$5}7' /etc/passwd
In the output (field 5 -> printed value):
(empty) -> ''(empty)
first middle last -> last
nickname -> nickname
Some more info, which may (or may not) relate to this question:
I don't think you want to get the last name of system-reserved users, like root or daemon.
Linux (I saw the tag) usually gives normal users a user id >= 1000 if the user was created with useradd. Thus you can just add a check like awk -F':' '$3>=1000{sub(.....' (the UID is field 3 of /etc/passwd) so that those system users won't be printed.
I said usually above because some systems define 500 instead. You can find the value in your /etc/login.defs.
It is also easy to let awk read that file, find the minimum user id, and then check it against /etc/passwd, if that is really required.
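A sketch of that combined filter, assuming the usual /etc/passwd layout (UID in field 3) and a minimum of 1000:

awk -F':' '$3 >= 1000 { sub(/.* /, "", $5); print $5 }' /etc/passwd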
awk -F: '/smith/{print substr($5,7,5)}' /etc/passwd
Simon
