How to extract the last name from /etc/passwd using awk? - linux

For example, in /etc/passwd field 5 ($5) I have the name "Smith Simon". How can I extract the string "Simon" and store it using awk? Thank you in advance.

To extract the string "Simon" and store it, a gawk approach:
firstname=$(awk -F':' '$5 ~ /Smith/ {split($5, s, " "); print s[2]}' /etc/passwd)
echo "$firstname"
Simon
$5 ~ /Smith/ selects the matching entry, and split($5, s, " ") splits the 5th field on spaces into the array s, so s[2] is the second word.

To print all last names from /etc/passwd:
awk -F':' '{sub(/.* /,"",$5);$0=$5}7' /etc/passwd
The sub(/.* /,"",$5) deletes everything up to the last space, and the trailing 7 is just an always-true pattern that makes awk print the modified record.
In the output:
(empty)           -> (empty)
first middle last -> last
nickname          -> nickname
Some more info that may (or may not) relate to this question:
You probably don't want to extract the last name of system reserved users, like root or daemon.
On Linux (I saw the tag), users created by useradd usually get a user id >= 1000. The user id is the 3rd field of /etc/passwd, so you can add a condition like awk -F':' '$3>=1000{sub(.....' so that those system users won't be printed out.
I said usually above because some systems define 500 instead. You can find the value (UID_MIN) in your /etc/login.defs.
It is also easy to let awk read that file, find the minimum user id, and then check it against /etc/passwd, if that is really required.
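For illustration, a sketch of that two-file approach (it assumes login.defs has an uncommented UID_MIN line; if it doesn't, min stays at 0 and every entry is printed):

awk '
    NR==FNR { if ($1 == "UID_MIN") min = $2; next }  # first file: pick up UID_MIN
    $3 >= min { sub(/.* /, "", $5); print $5 }       # second file: last word of GECOS
' /etc/login.defs FS=':' /etc/passwd

The FS=':' between the file names switches the field separator just before /etc/passwd is read, so login.defs is still split on whitespace.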

awk -F: '/Smith/{print substr($5,7,5)}' /etc/passwd
Simon
Note that substr($5,7,5) hardcodes the start position and length of "Simon", so this only works for that exact entry.

Related

Retrieve different information from several files to bring them together in one. BASH

I have a problem with my bash script: I would like to retrieve information contained in several files and gather it in one.
I have a file in this form which contains about 15000 lines: (file1)
1;1;A0200101C
2;2;A0200101C
3;3;A1160101A
4;4;A1160101A
5;5;A1130304G
6;6;A1110110U
7;7;A1110110U
8;8;A1030002V
9;9;A1030002V
10;10;A2120100C
11;11;A2120100C
12;12;A3410071A
13;13;A3400001A
14;14;A3385000G1
15;15;A3365070G1
I need to retrieve the first field of each row (the id).
My second file is this; I just need to retrieve the 3rd line: (file2)
count
-------
131
(1 row)
I would therefore like to be able to assemble the id of (file1) and the 3rd line of (file2) in order to achieve this result:
1;131
2;131
3;131
4;131
5;131
6;131
7;131
8;131
9;131
10;131
11;131
12;131
13;131
14;131
15;131
Thank you.
One possible way:
#!/usr/bin/env bash
# Grab the value on the third line of file2.
count=$(awk 'NR == 3 { print $1 }' file2)
# Read each line of file1, keeping only its first ;-separated field.
while IFS=';' read -r id _; do
    printf '%s;%s\n' "$id" "$count"
done < file1
First, read just the third line of file2 and save that in a variable.
Then read each line of file1 in a loop, extracting the first semicolon-separated field, and print it along with that saved value.
Using the same basic approach in a pure awk script instead of shell will be much faster and more efficient. Such a rewrite is left as an exercise for the reader, but a sketch follows. (Hint: in awk, FNR == NR is true while reading the first file given, and false on any later ones. Alternatively, look up how to pass a shell variable to an awk script; there are Q&As here on SO about it.)
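A minimal sketch of such a rewrite using the FNR == NR idiom (file names as in the question):

awk -F';' '
    NR == FNR { if (FNR == 3) count = $1; next }  # first file (file2): remember line 3
    { print $1 ";" count }                        # second file (file1): id;count
' file2 file1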

Matching emails from second column in one file against another file

I have two files, one with emails in it(useremail.txt), and another with email:phonenumber(emailnumber.txt).
useremail.txt contains:
John smith:blabla#hotmail.com
David smith:haha#gmail.com
emailnumber.txt contains:
blabla#hotmail.com:093748594
So the solution needs to grab the email from the second column of useremail.txt, search through emailnumber.txt for matches, and output John smith:093748594, i.e. just the name and phone number.
I'm on Windows, so I need a gawk or grep solution. I have tried for a long time to get this to work with awk/grep and can't find the right approach; any help would be really appreciated.
Another in (GNU) awk:
$ awk '
BEGIN {
    # RS=ORS="\r\n"      # since you are using GNU awk, this replaces the sub() below
    FS=OFS=":"           # input and output field separators
}
NR==FNR {                # processing the first file
    sub(/\r$/,"",$NF)    # remove the \r after the email, OR uncomment RS above
    a[$2]=$1             # hash the name, indexed on email
    next                 # on to the next record
}
($1 in a) {              # if the email in the second file matches one in the hash
    print a[$1],$2       # output; if ORS is uncommented above, output ends in \r,
                         # if not, you may want to add "\r" to the print
}' useremail emailnumber
Output:
John smith:093748594
Since you tried the accepted answer in both Linux and Windows and you use GNU awk, in the future you could set RS="\r?\n", which accepts both forms, \r\n and bare \n. However, I've recently run into a problem with that form under a specific condition (for which I've not yet filed a bug report).
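For instance, a sketch of the same program with that record separator (GNU awk only, since a regex RS is a gawk extension):

gawk 'BEGIN { RS="\r?\n"; FS=OFS=":" }
      NR==FNR { a[$2]=$1; next }      # first file: map email -> name
      ($1 in a) { print a[$1], $2 }   # second file: name:number
' useremail emailnumber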
You could try this:
awk -F":" '(FNR==NR){a[$2]=$1}(FNR!=NR){print a[$1]":"$2}' useremail.txt emailnumber.txt
If there may be entries in emailnumber.txt with no matching entry in useremail.txt, this variant prints only the matches:
awk -F":" '(FNR==NR){a[$2]=$1}(FNR!=NR){if(a[$1]){print a[$1]":"$2}}' useremail.txt emailnumber.txt

Awk can only print the whole line; cannot access the specific fields

I am currently working on my capstone project for Unix OS I. I'm very close to finishing, but I'm stuck on this part: basically, my assignment is to create a menu-based application in which a user can enter a first and last name. I take that data, use it to create a user name, translate it from lowercase to uppercase, and finally store the data as firstname:lastname:username.
When asked to display the data, I must display it based on the username instead of the first name, formatted with spaces instead of tabs. For example, it should look like: username firstname lastname. I've tried multiple commands, such as sort and awk, but I seem to be able to access the fields in the file only as one big field; e.g. when I run awk '{print NF}' users.txt to find the number of fields per row, it returns 1, clearly showing that my data is being read as one field instead of the necessary 3. So my question is this: how do I go about changing the number of fields in the text file? Here is my code that appends firstname:lastname:username to the file users.txt:
userInfo=~/capstoneProject/users.txt
# make sure the strings are not empty before writing to disk
if [[ "$fname" != "" && "$lname" != "" ]]
then
    # write to userInfo (users.txt), changing to uppercase via the pipe to tr
    echo "$fname:$lname:$uname" | tr a-z A-Z >> "$userInfo"
fi
Is it because of the way I am entering the data into the file, using echo "$fname:$lname:$uname"? This is the way my textbook showed me how to do it, and it had no trouble later on when using sort with specific fields, as I am trying to do now. If more detail is necessary, please let me know; this is the last thing I need before I can submit my project, due tonight.
Your input file has :-separated fields so you need to tell awk that:
awk -F':' '{print NF}' users.txt
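For the display step described in the question (sorted by username, space-separated), a minimal sketch along the same lines, assuming the firstname:lastname:username layout from the question:

# sort on the 3rd :-separated field (the username),
# then reorder the fields as: username firstname lastname
sort -t':' -k3,3 users.txt | awk -F':' '{print $3, $1, $2}'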

Please explain this awk script for taking Fixed Width to CSV

I'm learning some awk. I found an example online of taking a fixed width file and converting it to a csv file. There is just one part I do not understand, even after going through many man pages and online tutorials:
1: awk -v FIELDWIDTHS='1 10 4 2 2' -v OFS=',' '
2: { $1=$1 ""; print }
3: ' data.txt
That is verbatim from the sample online (found here).
What I don't understand is line 2. I get that there is no condition, so the 'program' (contained in braces) will always execute per record (line). I don't understand why it is doing the $1=$1, nor the empty string statement "";. However, removing these causes incorrect behavior.
$1=$1 assigns a value to $1 (it just happens to be the same value it already had). Assigning any value to a field causes awk to rebuild the current record using the OFS value between fields (effectively replacing all FS or FIELDWIDTHS spacings with OFS).
$ echo 'a,b,c' | awk -F, -v OFS="-" '{print; $1=$1; print}'
a,b,c
a-b-c
The "" is because whoever wrote the script doesn't fully understand awk and thinks that's necessary to ensure numbers retain their precision by converting them to a string before the assignment.

How to randomly sort one key while the other is kept in its original sort order with GNU "sort"

Given an input list like the following:
405:alice#level1
405:bob#level2
405:chuck#level1
405:don#level3
405:eric#level1
405:francis#level1
004:ac#jjj
004:la#jjj
004:za#zzz
101:amy#floor1
101:brian#floor3
101:christian#floor1
101:devon#floor1
101:eunuch#floor2
101:frank#floor3
005:artie#le2
005:bono#nuk1
005:bozo#nor2
(As you can see, the first field is randomly sorted; the original input had the first field in numerical order, with 004 coming first, then 005, 101, and 405. The second field, however, is in alphabetical order on its first character.)
What is desired is a randomized sort in which the first field, as separated by a colon ':', is randomly sorted so that all lines sharing the same first field stay grouped together but the groups are randomly distributed throughout the file, and in which the second field is also randomly sorted within each group. I am unable to get this desired result, as I am not too familiar with sort keys and whatnot.
The desired output would look similar to this:
405:francis#level1
405:don#level3
405:eric#level1
405:bob#level2
405:alice#level1
405:chuck#level1
004:za#zzz
004:ac#jjj
004:la#jjj
101:christian#floor1
101:amy#floor1
101:frank#floor3
101:eunuch#floor2
101:brian#floor3
101:devon#floor1
005:bono#nuk1
005:artie#le2
005:bozo#nor2
Does anyone know how to achieve this type of sort?
Thank you!
You can do this with awk pretty easily.
As a one-liner:
awk -F: 'BEGIN{cmd="sort -R"} $1 != key {close(cmd)} {key=$1; print | cmd}' input.txt
Or, broken apart for easier explanation:
-F: - Set awk's field separator to colon.
BEGIN{cmd="sort -R"} - before we start, set a variable that is a command to do the "randomized sort". This one works for me on FreeBSD. Should work with GNU sort as well.
$1 != key {close(cmd)} - If the current line has a different first field than the last one processed, close the output pipe...
{key=$1; print | cmd} - And finally, set the "key" var, and print the current line, piping output through the command stored in the cmd variable.
This usage takes advantage of a bit of awk awesomeness. When you print to a pipe whose command is a string (whether stored in a variable or not), that pipe is automatically created upon first use. You can close it at any time, and a subsequent use will reopen it as a new command.
The impact of this is that each time you close(cmd), you print the current set of randomly sorted lines. And awk closes cmd automatically once you come to the end of the file.
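A tiny demonstration of that open/close behavior, using a plain reverse sort so the batches are visible (made-up input):

$ printf '1\n2\n3\n4\n' | awk 'BEGIN{cmd="sort -r"} NR==3{close(cmd)} {print | cmd}'
2
1
4
3

The close() on the third record flushes the first batch (1 and 2) through sort -r; the remaining lines form a second batch that is flushed when awk exits.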
Of course, for this solution to work, it's vital that all lines with a shared first field are grouped together.
Not as elegant, but a different method: decorate each line with a group number, shuffle, stably sort the group numbers back together, then strip the decoration:
$ awk -F: '!($1 in a){a[$1]=c++} {print a[$1] "\t" $0}' file |
    sort -R -k2 |
    sort -nk1,1 -s |
    cut -f2-
or this alternative, which doesn't assume initial grouping:
$ sort -R file |
    awk -F: '!($1 in a){a[$1]=c++} {print a[$1] "\t" $0}' |
    sort -nk1,1 -s |
    cut -f2-
