How can I isolate a single value from a list within an awk field? - linux

Let's say I have a file called test, and this file contains some data:
jon:TX:34:red,green,yellow,black,orange
I'm trying to make it print only the 4th field, up to the comma and nothing else. But I need to leave the current FS in place because the fields are separated by ":". Hope this makes sense.
I have been running this command:
awk '{ FS=":"; print $4 }' /test
I want my output to look like this.
jon:TX:34:red
Or, if you could even just figure out how I could print the 4th field, that would be a good help too:
red

It's overkill for your needs, but in general, to print the y-th comma-separated subfield of the x-th colon-separated field of any input:
$ awk -F':' -v s=',' -v x=4 -v y=1 '{split($x,a,s); print a[y]}' file
red
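For instance, changing y picks a different subfield; with the same sample line, y=2 should give the second comma-separated entry:
$ awk -F':' -v s=',' -v x=4 -v y=2 '{split($x,a,s); print a[y]}' file
green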

Or
awk -F '[:,]' '{print $4}' test
output
red
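Since [:,] makes both : and , field separators, the fields here become jon, TX, 34, red, green, and so on. If you also want the other output shown in the question (jon:TX:34:red), one small sketch along the same lines is to rejoin the first four fields with : as the output separator:
awk -F '[:,]' -v OFS=':' '{print $1,$2,$3,$4}' test
jon:TX:34:red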

It sounds like you are trying to extract the first subfield of the fourth field. Top-level fields are delimited by ":" and the nested fields are delimited by ",".
Combining two cut processes achieves this easily:
<input.txt cut -d: -f4 | cut -d, -f1
If you want all fields until the first comma, extract the first comma-delimited field without first cutting on colon:
cut -d, -f1 input.txt
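For example, with the sample line from the question, the two commands should behave like this:
$ echo 'jon:TX:34:red,green,yellow,black,orange' | cut -d: -f4 | cut -d, -f1
red
$ echo 'jon:TX:34:red,green,yellow,black,orange' | cut -d, -f1
jon:TX:34:red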

If you want a purely regex-based approach:
echo 'jon:TX:34:red,green,yellow,black,orange' |
mawk NF=NF FS='.+:|,.+' OFS=
red
If you only want "red" without the trailing newline ("\n"), use RS/ORS instead of FS/OFS (the % is the command prompt, i.e. there is no trailing \n):
mawk2 8 RS='.+:|,.+' ORS=
red%
If you want to hard-code the $4:
gawk '$_= $4' FS=,\|: # gawk or nawk
mawk '$!NF=$4' FS=,\|: # any mawk
red
And if you only want the non-numeric text:
nawk NF=NF FS='[!-<]+' OFS='\f\b'
jon
TX
red
green
yellow
black
orange
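If the one-liners above feel too terse, a more conventional sketch of the same idea for the original question is to strip everything up to the last colon and then everything from the first comma:
awk '{sub(/.*:/,""); sub(/,.*/,""); print}' test
red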

If you have
jon:TX:34:red,green,yellow,black,orange
and desired output is
jon:TX:34:red
then just treat the input as comma-separated and get the 1st field, which might be expressed in GNU AWK as
echo "jon:TX:34:red,green,yellow,black,orange" | awk 'BEGIN{FS=","}{print $1}'
gives output
jon:TX:34:red
Explanation: I inform GNU AWK that the , character is the field separator (FS), and for each line I print the 1st column ($1).
(tested in GNU Awk 5.0.1)
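The same thing can be written a bit more tersely with the -F option instead of a BEGIN block:
echo "jon:TX:34:red,green,yellow,black,orange" | awk -F, '{print $1}'
jon:TX:34:red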

Related

How Can I Perform Awk Commands Only On Certain Fields

I have CSV columns that I'm working with:
info,example-string,super-example-string,otherinfo
I would like to get:
example-string super example string
Right now, I'm running the following command:
awk -F ',' '{print $3}' | sed "s/-//g"
But, then I have to paste the lines together to combine $2 and $3.
Is there any way to do something like this?
awk -F ',' '{print $2" "$3}' | sed "s/-//g"
Except where the sed command is only performed on $3 and $2 stays in place? I'm just concerned that, later on, if the lines don't match up, the data could be misaligned.
Please note: I need to keep the pipe to the sed command. I just used a simple example, but I end up running a lot of commands after that as well.
Try:
$ awk -F, '{gsub(/-/," ",$3); print $2,$3}' file
example-string super example string
How it works
-F,
This tells awk to use a comma as the field separator.
gsub(/-/," ",$3)
This replaces all - in field 3 with spaces.
print $2,$3
This prints fields 2 and 3.
Examples using pipelines
$ echo 'info,example-string,super-example-string,otherinfo' | awk -F, '{gsub(/-/," ",$3); print $2,$3}'
example-string super example string
In a pipeline with sed:
$ echo 'info,example-string,super-example-string,otherinfo' | awk -F, '{gsub(/-/," ",$3); print $2,$3}' | sed 's/string/String/g'
example-String super example String
Though the best solution would be to use a single sed or a single awk, you asked for an awk-plus-sed solution, so here is one. It also assumes your actual data looks like the sample Input_file shown.
awk -F, '{print $2,$3}' Input_file | sed 's/\([^ ]*\)\([^-]*\)-\([^-]*\)-\([^-]*\)/\1\2 \3 \4/'
Output will be as follows.
example-string super example string
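For completeness, a single-sed sketch of the same idea (just one possible way, not the recommended solution) is to extract fields 2 and 3 first and then loop, replacing only the dashes that appear after the first space, i.e. only those inside what used to be field 3:
sed -e 's/^[^,]*,\([^,]*\),\([^,]*\).*/\1 \2/' -e ':a' -e 's/\( .*\)-/\1 /' -e 'ta' Input_file
example-string super example string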

variable assignment is not working in rhel6 linux

file1
ABY37499|ANK37528|DEL37508|SRILANKA|195203230000|445500759
ARJU7499|CHA38008|DEL37508|SRILANKA|195203230000|445500759
IB1704174|ANK37528|DEL37508|SRILANKA|195203230000|445500759
IB1704174|CHA38008|DEL37508|SRILANKA|195203230000|445500759
ABY37500|ANK37529|DEL37509|BRAZIL|195203240000|445500757
ARJU7500|CHA38009|DEL37509|BRAZIL|195203240000|445500757
IB1704175|ANK37529|DEL37509|BRAZIL|195203240000|445500757
I want to convert the date in the fifth column to another format; my script is below:
#!/bin/sh
dt="%Y-%m-%d %H:%M"
awk -F '|' '{print $5}' file1 | sed 's/.\{8\}/& /g'> f1.txt
aa=`(date -f f1.txt +"$dt")`
echo "$aa"
awk -F '|' '$5=$aa' file1
echo "$aa" got desired output but i cannot assign $aa to $5 please help me.
Thanks
I corrected my answer after the comment of Etan Reisner.
from AWK man:
The input is read in units called records, and processed by the rules
of your program one record at a time. By default, each record is one
line. Each record is automatically split into chunks called fields.
This makes it more convenient for programs to work on the parts of a
record.
Fields are stored in variables $1, $2, ...
And
The contents of a field, as seen by awk, can be changed within an awk
program; this changes what awk perceives as the current input record.
see the man page
Thus, this expression:
awk -F '|' '$5=$aa' file1
does not have the effect of substituting the fifth column of file1. (Note also that the shell does not expand $aa inside single quotes, so awk never receives your reformatted date.)
You have to write the modified output to a second file.
Maybe this could help you in sed:
echo 195203240000 | sed -n -e "s_\(....\)\(..\)\(..\)\(..\)\(..\)_\1-\2-\3 \4:\5_p"
1952-03-24 00:00
This awk script should do what you want.
It isn't exactly pretty but it works assuming the input format is consistent.
awk -F'|' -v OFS='|' '{$5=sprintf("%s-%s-%s %s:%s",
    substr($5,1,4), substr($5,5,2), substr($5,7,2),
    substr($5,9,2), substr($5,11,2))} 7' file1 > file1.new
It assigns the new value for the field to $5 and then uses 7 (as a truthy value) to trigger awk's default {print} action and print the modified line. Setting OFS to | keeps the output pipe-delimited once the field has been modified.
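If GNU awk is available, a gensub-based sketch (mirroring the sed pattern above; treat it as an illustration rather than the canonical answer) does the same reformatting in place. With the sample file1, the first output line should come out as shown:
gawk -F'|' -v OFS='|' '{$5=gensub(/^(....)(..)(..)(..)(..)$/,"\\1-\\2-\\3 \\4:\\5",1,$5)} 1' file1
ABY37499|ANK37528|DEL37508|SRILANKA|1952-03-23 00:00|445500759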

Empty string as an output field separator for cut

How can I use cut with --output-delimiter=""? I want to join two columns using cut.
I tried the following command. However, cat -v shows that there are non-printable characters, specifically "^#". Any suggestions on how I can overcome this?
cut -d, -f 3,6 --output-delimiter="" file1.csv | cat -v
This is the content of my file
011,IBM,Palmisano,t,t,t
012,INTC,Otellini,t,t,t
013,SAP,Snabe,t,t,t
014,VMW,Maritz,t,t,t
015,ORCL,Ellison,t,t,t
017,RHT,Whitehurst,t,t,t
When I run my command I'm seeing
Palmisano^#t
Otellini^#t
Snabe^#t
Maritz^#t
Ellison^#t
Whitehurst^#t
Expected output: Basically I want to exclude ^# in the output
Palmisanot
Otellinit
Snabet
Maritzt
Ellisont
Whitehurstt
Thank you.
The output delimiter is not an empty string, but probably the NULL character. You might want to try
cut -d, -f 3,6 --output-delimiter=$'\00' file1.csv
(Assuming your shell supports $'...'-quoting; bash and zsh are fine here, not sure about others).
Edit:
cut apparently emits the NULL character when the output delimiter is set to the empty string. I do not see a way around it.
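One possible workaround, assuming the stray byte really is a NUL, is to strip it afterwards with tr, which should leave exactly the output requested in the question:
cut -d, -f 3,6 --output-delimiter="" file1.csv | tr -d '\0'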
If awk is an acceptable solution, this will do the trick:
awk -F, '{print $3 $6}' file*
If you want to be more verbose and explicit:
awk 'BEGIN{FS=","; OFS=""}; {print $3,$6}' file*
FS="," sets the field separator to ,.
OFS="" sets the Output Field Separator to the empty string.
You probably don't want to cut by fields but instead by characters or perhaps bytes. See the description of -c and/or -b in the man page, instead of using -f.

Unix (ksh) script to read file, parse and output certain columns only

I have an input file that looks like this:
"LEVEL1","cn=APP_GROUP_ABC,ou=dept,dc=net","uid=A123456,ou=person,dc=net"
"LEVEL1","cn=APP_GROUP_DEF,ou=dept,dc=net","uid=A123456,ou=person,dc=net"
"LEVEL1","cn=APP_GROUP_ABC,ou=dept,dc=net","uid=A567890,ou=person,dc=net"
I want to read each line, parse and then output like this:
A123456,ABC
A123456,DEF
A567890,ABC
In other words, retrieve the user id from "uid=" and then the identifier from "cn=APP_GROUP_". Repeat for each input record, writing to a new output file.
Note that the column positions aren't fixed, so I can't rely on positions; I'm guessing I have to search for the "uid=" string and somehow use its position maybe?
Any help much appreciated.
You can do this easily with sed:
sed 's/.*cn=APP_GROUP_\([^,]*\).*uid=\([^,]*\).*/\2,\1/'
The regex captures the two desired strings and outputs them in reverse order with a comma between them. You might need to adjust the context of the captures, depending on the precise nature of your data, because uid= will match the last uid= on the line if there is more than one.
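Applied to the sample input (using input.txt as a stand-in for your file name), it should produce exactly the requested lines:
$ sed 's/.*cn=APP_GROUP_\([^,]*\).*uid=\([^,]*\).*/\2,\1/' input.txt
A123456,ABC
A123456,DEF
A567890,ABC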
You can use awk to split into columns: first split on ',', then split on '=', and grab the result. You can do it easily as awk -F, '{ print $5}' | awk -F= '{print $2}'
Take a look at this, using the example you provided:
cat file | awk -F, '{ print $5}' | awk -F= '{print $2}'
A123456
A123456
A567890
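That only yields the uid part. If you want the full requested output (uid plus the group suffix) from a single awk, one sketch, splitting on the double quotes first (again using input.txt as a stand-in for your file name), is:
awk -F'"' '{split($4,a,"[=,]"); split($6,b,"[=,]"); sub(/APP_GROUP_/,"",a[2]); print b[2] "," a[2]}' input.txt
A123456,ABC
A123456,DEF
A567890,ABC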

How to reverse order of fields using AWK?

I have a file with the following layout:
123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010
How can I convert it into the following by using AWK?
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
Didn't read the question properly the first time. You need a field separator that can be either a dash or a comma. Once you have that, you can use the dash as the output field separator (as it's the most common) and fake the comma using concatenation:
awk -F',|-' 'OFS="-" {print $1 "," $4,$3,$2}' file
Pure awk
awk -F"," '{ n=split($2,b,"-");$2=b[3]"-"b[2]"-"b[1];$i=$1","$2 } 1' file
sed
sed -r 's/(^.[^,]*,)([0-9]{2})-([0-9]{2})-([0-9]{4})/\1\4-\3-\2/' file
sed 's/\(^.[^,]*,\)\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9]\+\)/\1\4-\3-\2/' file
Bash
#!/bin/bash
while IFS="," read -r a b
do
IFS="-"
set -- $b
echo "$a,$3-$2-$1"
done <"file"
Unfortunately, some older awk implementations only allow a single field-separator character (POSIX awk treats a multi-character FS as a regular expression), so you may have to pre-process the data. You can do this with tr, but if you really want an awk-only solution, use:
pax> echo '123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010' | awk -F, '{print $1"-"$2}' | awk -F- '{print $1","$4"-"$3"-"$2}'
This outputs:
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
as desired.
The first awk changes the , characters to - so that you have four fields separated with the same character (this is the bit I'd usually use tr ',' '-' for).
The second awk prints them out in the order you specified, correcting the field separators at the same time.
If you're using an awk implementation that allows multiple FS characters, you can use something like:
gawk -F ',|-' '{print $1","$4"-"$3"-"$2}'
If it doesn't need to be awk, you could use Perl too:
$ perl -nle 'print "$1,$4-$3-$2" while (/(\d{3}),(\d{2})-(\d{2})-(\d{4})\s*/g)' < file.txt
