Extract domain then paste into the same line using sed/awk/grep/perl - linux

I started my tech adventure not so long ago - as you will sense from this question - but now I'm stuck, because after almost a whole day of thinking and searching I still don't know the proper solution to my problem.
Briefly, I have a file with thousands of lines, each containing an email address and a first name. I really need another column with just the domain name, for example right next to the email address. Please take a look at the examples below.
This is how it looks now:
something#nothing.tld|:|george|-|
anything#another.tld|:|thomas|-|
third#address.tld|:|kelly|-|
How I want it to look:
something#nothing.tld|:|nothing.tld|--|george|-|
anything#another.tld|:|another.tld|--|thomas|-|
third#address.tld|:|address.tld|--|kelly|-|
My best guess was to use sed to extract the domain, but how to paste that extracted domain back into the same line is where I got stuck.
sed -e 's/.*#\(.*\)|:|*/\1/'
If you could also give a short explanation along with a solution that would be really helpful.
Any help is appreciated.

If you have the following data in a file named file1,
something#nothing.tld|:|george|-|
anything#another.tld|:|thomas|-|
third#address.tld|:|kelly|-|
you can use # and : as delimiters and insert the extra column using awk, then save the result to a new file:
awk -F '[#:]' '{ print $1"#"$2 ":|" $2"--" $3 }' file1 > file2
The above command saves the following data in a file called file2:
something#nothing.tld|:|nothing.tld|--|george|-|
anything#another.tld|:|another.tld|--|thomas|-|
third#address.tld|:|address.tld|--|kelly|-|

With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS="|"} {print $1, $2, gensub(/.*#/,"",1,$1), "--", $3, $4, $5}' file
something#nothing.tld|:|nothing.tld|--|george|-|
anything#another.tld|:|another.tld|--|thomas|-|
third#address.tld|:|address.tld|--|kelly|-|
With any awk:
$ awk 'BEGIN{FS=OFS="|"} {d=$1; sub(/.*#/,"",d); print $1, $2, d, "--", $3, $4, $5}' file
something#nothing.tld|:|nothing.tld|--|george|-|
anything#another.tld|:|another.tld|--|thomas|-|
third#address.tld|:|address.tld|--|kelly|-|

You can do it like this with sed:
sed -E 's/#([^|]+)\|:\|/&\1|--|/' infile
Note the use of a negated bracket expression ([^|]), i.e. it matches any character except |.
Output:
something#nothing.tld|:|nothing.tld|--|george|-|
anything#another.tld|:|another.tld|--|thomas|-|
third#address.tld|:|address.tld|--|kelly|-|

Related

Awk 3rd column if second column matches with a variable

I am new to Awk and Linux. I want to print the 3rd column if the 2nd column matches a variable.
file.txt
1;XYZ;123
2;ABC;987
3;ZZZ;999
So I want to print 987, after checking that the 2nd column is ABC:
name="ABC"
awk -F';' '$2==$name { print $3 }' file.txt
But this is not working. Please help. Note that I want to use awk only, to understand how this can be achieved with awk.
Do the following and it should fly. In awk, variables don't work like in the shell; you have to pass them in explicitly with -v var_name in the awk invocation.
name="ABC"
awk -F';' -v name="$name" '$2==name{ print $3 }' file.txt
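As a self-contained check of the -v approach, the snippet below recreates the sample file.txt from the question and runs the fixed command:

```shell
# Recreate the question's sample data, then pass the shell variable
# into awk with -v so '$2==name' compares against "ABC".
printf '1;XYZ;123\n2;ABC;987\n3;ZZZ;999\n' > file.txt
name="ABC"
awk -F';' -v name="$name" '$2==name{ print $3 }' file.txt
# prints: 987
```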

Reading a column from excel using shell script in Linux environment

I need to read the 1st column and the last column of an Excel file in a Linux environment.
Can someone help me with examples?
Your best bet would be to export the Excel sheet as a CSV, then manipulate it with awk or similar, such as:
awk -F"," '{print $1, $NF}' file.csv
For example:
# cat test.csv
hello, goodbye, seeya
# awk -F"," '{print $1, $NF}' test.csv
hello seeya
Edit - for info: NF is the number of fields, so $NF is essentially 'the last field'.
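The difference between NF and $NF is easy to see with a throwaway one-liner:

```shell
# NF is the field count; $NF is the value of the last field.
printf 'a,b,c\n' | awk -F, '{print NF, $NF}'
# -> 3 c
```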

Linux Awk help on code

I need to print the contents of a file, and give a title to each column, leaving enough space to be readable, and then I need to output this into a new file. I followed this tutorial for a good while but I've gotten stuck.
http://www.thegeekstuff.com/2010/01/awk-introduction-tutorial-7-awk-print-examples
This is the example code they use, which would give me exactly what I need to do with mine. But it will not work when I adjust it.
$ awk 'BEGIN {print "Name\tDesignation\tDepartment\tSalary";}
{print $2,"\t",$3,"\t",$4,"\t",$NF;}
END{print "Report Generated\n--------------";
}' employee.txt
This is mine. Unlike the example, I want the whole document printed, and I don't really want this "report generated" nonsense under it. I tried adding {print;}' at the end after END, and made sure to start a new line, and... nothing.
$ awk 'BEGIN {Print "Firstname\tLastname\tPoints";} END > awktest.txt > done
Where have I gone wrong? It keeps giving me the response Source line 2.
To remove the foot line, just drop everything from END up to (but not including) the closing ':
awk 'BEGIN {print "Name\tDesignation\tDepartment\tSalary";} {print $2,"\t",$3,"\t",$4,"\t",$NF;}' employee.txt
In your second example, you left out the closing ', and I suspect you put one more ">" than needed:
awk 'BEGIN {print "Firstname\tLastname\tPoints";}' awktest.txt > done
The latter example will however silently ignore everything read from "awktest.txt".
It looks like what you need is just to insert a header line, which can easily be done with sed (as well as awk) or with cat
$ sed '1iFirstname\tLastname\tPoints' file > output.file
or
$ awk 'BEGIN{print "Firstname\tLastname\tPoints"} 1' file > output.file
or
$ cat <(echo -e "Firstname\tLastname\tPoints") file > output.file
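Any of these can be checked end to end; here is the awk variant run against a made-up one-line data file (the file name and sample row are placeholders, not the asker's data):

```shell
printf 'John\tDoe\t10\n' > file    # placeholder data
awk 'BEGIN{print "Firstname\tLastname\tPoints"} 1' file > output.file
# output.file now contains the header line followed by the data line
cat output.file
```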
It seems you have missed out the actual printing of the columns and a source file; also, I read that you don't need any END actions...
awk 'BEGIN {Print "Firstname\tLastname\tPoints";} END > awktest.txt > done
Should be...
awk 'BEGIN {print "Firstname\tLastname\tPoints";}{print $1,"\t",$2,"\t",$3;}' source_file.txt > awktest.txt
Just remember to change the $1,$2,$3 to what columns on the source file you need.
FYI. I'm no expert, just reading the tuts :)
The awk print function is named print, not Print. I don't know why all the solutions include ,"\t", in their print statements - you don't want that. You want to set -v OFS='\t' at the start of the script and then just use , between fields. All you want is:
awk -v OFS='\t' '
BEGIN {print "Name", "Designation", "Department", "Salary"}
{print $2, $3, $4, $NF}
' employee.txt
assuming those are the correct field numbers you want to print from your data. Sample input/output in your question would be extremely useful to help us answer it.
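As a quick sanity check, the OFS approach can be run against a made-up employee.txt row in the tutorial's whitespace-separated layout (the sample data is an assumption, not the asker's file):

```shell
printf '100 Thomas Manager Sales 5000\n' > employee.txt   # made-up row
awk -v OFS='\t' '
BEGIN {print "Name", "Designation", "Department", "Salary"}
{print $2, $3, $4, $NF}
' employee.txt
# prints the header and data row, tab-separated
```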

How can I get the second column of a very large csv file using linux command?

I was given this question during an interview. I said I could do it with Java or Python, e.g. an xreadlines()-style loop to traverse the whole file and fetch the column, but the interviewer wanted me to use just a Linux command. How can I achieve that?
You can use the command awk. Below is an example of printing out the second column of a file:
awk -F, '{print $2}' file.txt
And to store it, you redirect it into a file:
awk -F, '{print $2}' file.txt > output.txt
You can use cut:
cut -d, -f2 /path/to/csv/file
I'd add to Andreas' answer, but I can't comment yet.
With CSV, you have to give awk a field separator argument, or it will define fields bounded by whitespace instead of commas. (Obviously, CSV that uses a different field separator will need a different character to be declared.)
awk -F, '{print $2}' file.txt
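Both the awk and cut answers give the same result on simple input; sample.csv below is invented for the demo. One caveat worth knowing: neither awk -F, nor cut -d, understands quoted fields that themselves contain commas, so these only work on simple CSV.

```shell
printf '1,alpha,x\n2,beta,y\n' > sample.csv   # made-up sample
awk -F, '{print $2}' sample.csv
cut -d, -f2 sample.csv
# both print: alpha, then beta
```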

Unix (ksh) script to read file, parse and output certain columns only

I have an input file that looks like this:
"LEVEL1","cn=APP_GROUP_ABC,ou=dept,dc=net","uid=A123456,ou=person,dc=net"
"LEVEL1","cn=APP_GROUP_DEF,ou=dept,dc=net","uid=A123456,ou=person,dc=net"
"LEVEL1","cn=APP_GROUP_ABC,ou=dept,dc=net","uid=A567890,ou=person,dc=net"
I want to read each line, parse and then output like this:
A123456,ABC
A123456,DEF
A567890,ABC
In other words, retrieve the user id from "uid=" and then the identifier from "cn=APP_GROUP_". Repeat for each input record, writing to a new output file.
Note that the column positions aren't fixed, so I can't rely on positions; I'm guessing I have to search for the "uid=" string and somehow use its position?
Any help much appreciated.
You can do this easily with sed:
sed 's/.*cn=APP_GROUP_\([^,]*\).*uid=\([^,]*\).*/\2,\1/'
The regex captures the two desired strings and outputs them in reverse order with a comma between them. You might need to change the context of the captures, depending on the precise nature of your data, because .*uid= will match the last uid= in the line if there is more than one.
You can use awk to split into columns: first split by ',', then split by '=' and grab the result. You can do it easily as awk -F, '{ print $5}' | awk -F= '{print $2}'
Take a look at this, using the example you provided:
cat file | awk -F, '{ print $5}' | awk -F= '{print $2}'
A123456
A123456
A567890
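For completeness, the sed answer above can be verified directly against one of the sample lines, and it returns both the uid and the group identifier:

```shell
# Pipe one of the question's sample lines through the sed answer.
printf '%s\n' '"LEVEL1","cn=APP_GROUP_ABC,ou=dept,dc=net","uid=A123456,ou=person,dc=net"' |
  sed 's/.*cn=APP_GROUP_\([^,]*\).*uid=\([^,]*\).*/\2,\1/'
# -> A123456,ABC
```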
