Unix (ksh) script to read a file, parse it, and output certain columns only

I have an input file that looks like this:
"LEVEL1","cn=APP_GROUP_ABC,ou=dept,dc=net","uid=A123456,ou=person,dc=net"
"LEVEL1","cn=APP_GROUP_DEF,ou=dept,dc=net","uid=A123456,ou=person,dc=net"
"LEVEL1","cn=APP_GROUP_ABC,ou=dept,dc=net","uid=A567890,ou=person,dc=net"
I want to read each line, parse and then output like this:
A123456,ABC
A123456,DEF
A567890,ABC
In other words, retrieve the user id from "uid=" and then the identifier from "cn=APP_GROUP_". Repeat for each input record, writing to a new output file.
Note that the column positions aren't fixed, so I can't rely on them; I'm guessing I have to search for the "uid=" string and somehow use its position?
Any help much appreciated.

You can do this easily with sed:
sed 's/.*cn=APP_GROUP_\([^,]*\).*uid=\([^,]*\).*/\2,\1/'
The regex captures the two desired strings and outputs them in reverse order with a comma between them. You might need to adjust the context of the captures depending on the precise nature of your data: because the .* is greedy, uid= will match the last uid= on the line if there is more than one.
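For example, assuming the sample records are in input.txt (a hypothetical name) and you want the results in a new file:
$ sed 's/.*cn=APP_GROUP_\([^,]*\).*uid=\([^,]*\).*/\2,\1/' input.txt > output.txt
$ cat output.txt
A123456,ABC
A123456,DEF
A567890,ABC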

You can use awk to split into columns: first split by ',', then split the result by '=', and grab what you need. You can do it easily as awk -F, '{print $5}' | awk -F= '{print $2}'. Applied to the example you provided, this extracts just the uid (see the sketch below for getting both columns):
awk -F, '{print $5}' file | awk -F= '{print $2}'
A123456
A123456
A567890
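If you need both columns in one pass, here is a sketch that scans every comma-separated field, so it does not rely on column positions (input.txt and output.txt are hypothetical names):
awk -F, '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /uid=/) { split($i, u, "="); uid = u[2] }                       # take the value after uid=
    if ($i ~ /cn=APP_GROUP_/) { grp = $i; sub(/.*cn=APP_GROUP_/, "", grp) }  # strip everything up to the prefix
  }
  print uid "," grp
}' input.txt > output.txt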

How can I isolate a single value from a list within an awk field?

Let's say I have a file called test that contains some data:
jon:TX:34:red,green,yellow,black,orange
I'm trying to make it print only the 4th field, up until the first comma and nothing else. But I need to leave the current FS in place because the fields are separated by ":". Hope this makes sense.
I have been running this command:
awk '{ FS=":"; print $4 }' /test
I want my output to look like this.
jon:TX:34:red
Or, if you could even just figure out how I could print only that first part of the 4th field, that would be a big help too:
red
It's overkill for your needs, but in general, to print the y-th ','-separated subfield of the x-th ':'-separated field of any input:
$ awk -F':' -v s=',' -v x=4 -v y=1 '{split($x,a,s); print a[y]}' file
red
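For instance, setting y=2 picks out the second ','-separated subfield of the same 4th field:
$ awk -F':' -v s=',' -v x=4 -v y=2 '{split($x,a,s); print a[y]}' file
green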
Or
awk -F '[:,]' '{print $4}' test
Output:
red
It sounds like you are trying to extract the first subfield of the fourth field. Top-level fields are delimited by ":" and the nested fields are delimited by ",".
Combining two cut processes achieves this easily:
<input.txt cut -d: -f4 | cut -d, -f1
If you want all fields until the first comma, extract the first comma-delimited field without first cutting on colon:
cut -d, -f1 input.txt
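A quick check of both approaches against the sample line:
$ echo 'jon:TX:34:red,green,yellow,black,orange' | cut -d: -f4 | cut -d, -f1
red
$ echo 'jon:TX:34:red,green,yellow,black,orange' | cut -d, -f1
jon:TX:34:red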
If you want a purely regex approach:
echo 'jon:TX:34:red,green,yellow,black,orange' |
mawk NF=NF FS='.+:|,.+' OFS=
red
If you only want "red" without the trailing newline ("\n"), use RS/ORS instead of FS/OFS (the 8 is just a truthy pattern that prints every record; the % after the output is the command prompt, i.e. no trailing \n):
mawk2 8 RS='.+:|,.+' ORS=
red%
If you want to hard-code the $4:
gawk '$_= $4' FS=,\|: # gawk or nawk
mawk '$!NF=$4' FS=,\|: # any mawk
red
And if you only want the non-numeric text:
nawk NF=NF FS='[!-<]+' OFS='\f\b'
jon
TX
red
green
yellow
black
orange
If you have
jon:TX:34:red,green,yellow,black,orange
and desired output is
jon:TX:34:red
then just treat input as comma-separated and get 1st field, which might be expressed in GNU AWK as
echo "jon:TX:34:red,green,yellow,black,orange" | awk 'BEGIN{FS=","}{print $1}'
gives output
jon:TX:34:red
Explanation: I inform GNU AWK that the , character is the field separator (FS); for each line, I print the 1st column ($1).
(tested in GNU Awk 5.0.1)

How Can I Perform Awk Commands Only On Certain Fields

I have CSV columns that I'm working with:
info,example-string,super-example-string,otherinfo
I would like to get:
example-string super example string
Right now, I'm running the following command:
awk -F ',' '{print $3}' | sed "s/-/ /g"
But, then I have to paste the lines together to combine $2 and $3.
Is there any way to do something like this?
awk -F ',' '{print $2" "$3}' | sed "s/-/ /g"
Except where the sed command is only performed on $3 and $2 stays in place? I'm just concerned that later on, if the lines don't match up, the data could be misaligned.
Please note: I need to keep the pipe to the sed command. I used a simple example here, but I end up running a lot of commands after that as well.
Try:
$ awk -F, '{gsub(/-/," ",$3); print $2,$3}' file
example-string super example string
How it works
-F,
This tells awk to use a comma as the field separator.
gsub(/-/," ",$3)
This replaces all - in field 3 with spaces.
print $2,$3
This prints fields 2 and 3.
Examples using pipelines
$ echo 'info,example-string,super-example-string,otherinfo' | awk -F, '{gsub(/-/," ",$3); print $2,$3}'
example-string super example string
In a pipeline with sed:
$ echo 'info,example-string,super-example-string,otherinfo' | awk -F, '{gsub(/-/," ",$3); print $2,$3}' | sed 's/string/String/g'
example-String super example String
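If you instead want to keep the whole line and modify only $3 in place (a sketch addressing the alignment concern; OFS must be set explicitly, since awk rebuilds $0 with OFS once a field is assigned):
$ echo 'info,example-string,super-example-string,otherinfo' | awk 'BEGIN{FS=OFS=","} {gsub(/-/," ",$3)} 1'
info,example-string,super example string,otherinfo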
The best solution would be a single sed or a single awk, but since you asked for an awk-and-sed solution, here is one. It assumes your actual data is the same as the shown sample Input_file.
awk -F, '{print $2,$3}' Input_file | sed 's/\([^ ]*\) \([^-]*\)-\([^-]*\)-\([^-]*\)/\1 \2 \3 \4/'
Output will be as follows.
example-string super example string

How To Substitute Piped Output of Awk Command With Variable

I'm trying to take a column and pipe it through an echo command. If possible, I would like to keep it in one line or do this as efficiently as possible. While researching, I found that I have to use single quotes to expand the variable and to escape the double quotes.
Here's what I was trying:
awk -F ',' '{print $2}' file1.txt | while read line; do echo "<href=\"'${i}'\">'${i}'</a>"; done
But I keep getting the number of lines rather than each single line's output. If you know how to capture each line's second field, that would be so helpful.
File1.txt:
Hello,http://example1.com
Hello,http://example2.com
Hello,http://example3.com
Desired output:
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>
$ awk -F, '{printf "<href=\"%s\">%s</a>\n", $2, $2}' file
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>
Or slightly briefer but less robustly:
$ sed 's/.*,\(.*\)/<href="\1">\1<\/a>/' file
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>
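For comparison, the loop from the question works once each line is read into a variable and that same variable is referenced (a sketch of the corrected shell loop; the original referenced an unset ${i}):
awk -F, '{print $2}' file1.txt | while IFS= read -r line; do
  printf '<href="%s">%s</a>\n' "$line" "$line"
done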

How can I get the second column of a very large csv file using linux command?

I was given this question during an interview. I said I could do it with Java or Python, e.g. using something like the xreadlines() function to traverse the whole file and fetch the column, but the interviewer wanted me to use just a Linux command. How can I achieve that?
You can use the command awk. Below is an example of printing out the second column of a file:
awk -F, '{print $2}' file.txt
And to store it, you redirect it into a file:
awk -F, '{print $2}' file.txt > output.txt
You can use cut:
cut -d, -f2 /path/to/csv/file
I'd add to Andreas' answer, but I can't comment yet.
With CSV, you have to give awk a field-separator argument, or it will delimit fields by whitespace instead of commas. (Obviously, a CSV that uses a different field separator will need a different character to be declared.)
awk -F, '{print $2}' file.txt
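A quick sanity check, using a hypothetical two-line sample.csv:
$ printf 'a,b,c\n1,2,3\n' > sample.csv
$ awk -F, '{print $2}' sample.csv
b
2
$ cut -d, -f2 sample.csv
b
2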

command to find words and display columns

I want to search for some words in a log file and display only the given column numbers from those lines in the file.
E.g., I want to search for "word" in abc.log and print columns 4 and 11:
grep "word" abc.log | awk '{print $4}' | awk '{print $4}'
but this doesn't work. Can someone please help?
You need to print $4 and $11 together rather than piping $4 into another awk.
Also, you don't need grep because awk can grep.
Try it like this:
awk '/word/{print $4,$11}' abc.log
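For example, with a hypothetical log line of eleven space-separated fields:
$ echo 'one two three four five word seven eight nine ten eleven' | awk '/word/{print $4,$11}'
four eleven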
