how to print 3rd field in 3rd column itself - linux

In my file I have 3 fields. I want to print the third field's comma-separated values one per line, aligned under the third column, but the output ends up in the first column. Please check my file and output:
cat filename
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1,2,3,4,5,5
q,w,e,r t,y,g,t,i 9,8,7,6,5,5
I'm using the following command to print the third field only in the third column
cat filename |awk '{print $3}' |tr ',' '\n'
OUTPUT: the 3rd field's values are printed in the 1st column's position; I want them printed only under the 3rd column:
first field :-
---------------
1
2
3
4
5
5
expected output
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
2
3
4
5
5
q,w,e,r t,y,g,t,i 9
8
7
6
5
5

Input
[akshay#localhost tmp]$ cat file
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1,2,3,4,5,5
q,w,e,r t,y,g,t,i 9,8,7,6,5,5
Script
[akshay#localhost tmp]$ cat test.awk
NR<3 || !NF { print; next }
{
    split($0, D, /[^[:space:]]*/)
    c1 = sprintf("%*s", length($1), "")
    c2 = sprintf("%*s", length($2), "")
    split($3, A, /,/)
    for (i = 1; i in A; i++) {
        if (i == 2) {
            $1 = c1
            $2 = c2
        }
        printf("%s%s%s%s%d\n", $1, D[2], $2, D[3], A[i])
    }
}
Output
[akshay#localhost tmp]$ awk -f test.awk file
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
2
3
4
5
5
q,w,e,r t,y,g,t,i 9
8
7
6
5
5
Explanation
NR<3 || !NF{ print; next}
NR holds the number of records read so far; for a single input file that is simply the current line number.
NF gives you the total number of fields in a record.
The next statement forces awk to immediately stop processing the
current record and go on to the next record.
If the line number is less than 3, or NF is zero (meaning a blank line with no fields), print the current record and go to the next record.
split($0,D,/[^[:space:]]*/)
Since we want to preserve the formatting, we save the separators between fields in array D here. If you have GNU awk, you can use the 4th argument of split(): it splits the line into two arrays, one of the fields and one of the separators between the fields, so you can operate on the fields array and then rebuild the original $0 by printing a separator element between each pair of field elements.
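As an aside, a minimal standalone sketch of that 4-argument split() (an illustration, not part of this answer's script; requires GNU awk):

```shell
echo 'a,b  c' | gawk '{
    # seps[i] is the separator that followed flds[i] in the record
    n = split($0, flds, /[, ]+/, seps)
    rebuilt = ""
    for (i = 1; i <= n; i++)
        rebuilt = rebuilt flds[i] seps[i]
    print ((rebuilt == $0) ? "rebuilt OK" : "mismatch")
}'
# prints: rebuilt OK
```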
c1=sprintf("%*s",length($1),"") and c2=sprintf("%*s",length($2),"")
Here sprintf() builds a string of spaces the same width as the field ($1 or $2).
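To see the `%*s` trick in isolation (a standalone sketch; the dynamic `*` width is supported by gawk and most modern awks, though not required by POSIX): the `*` takes the field width from an argument, and padding an empty string to that width yields a run of spaces.

```shell
awk 'BEGIN {
    pad = sprintf("%*s", 7, "")   # empty string padded to width 7 = 7 spaces
    printf "[%s]\n", pad
}'
# prints: [       ]
```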
split($3,A,/,/)
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces
in array and the separator strings in the seps array. The first piece
is stored in array[1], the second piece in array[2], and so forth. The
string value of the third argument, fieldsep, is a regexp describing
where to split string (much as FS can be a regexp describing where to
split input records). If fieldsep is omitted, the value of FS is used.
split() returns the number of elements created.
Loop as long as i in A is true; note that starting at i=1 and incrementing with i++ is what makes the traversal visit the array elements in order (thanks to Ed Morton for pointing that out).
if(i==2)
{
$1 = c1
$2 = c2
}
When i == 1 we print a,b,c,d and d,e,f,g,h; on the next iterations we replace $1 and $2 with the blank strings c1 and c2 created above, since you want the first two columns shown only once.
printf("%s%s%s%s%d\n",$1,D[2],$2,D[3],A[i])
Finally, print field 1 ($1), the separator between field 1 and field 2 that we saved above (D[2]), field 2 ($2), the separator between field 2 and field 3 (D[3]), and the elements of array A (built by split($3,A,/,/)) one by one.

$ cat tst.awk
NR<3 || !NF { print; next }
{
    front = gensub(/((\S+\s+){2}).*/, "\\1", 1)
    split($3, a, /,/)
    for (i = 1; i in a; i++) {
        print front a[i]
        gsub(/\S/, " ", front)
    }
}
$ awk -f tst.awk file
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
2
3
4
5
5
q,w,e,r t,y,g,t,i 9
8
7
6
5
5
The above uses GNU awk for gensub(), with other awks use match()+substr(). It also uses \S and \s shorthand for [^[:space:]] and [[:space:]].
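Here is a sketch of the portable variant hinted at above, using match()+substr() and POSIX character classes instead of gensub() and \S/\s (my own adaptation of the idea, not the original author's code):

```shell
# sample input from the question
cat > file <<'EOF'
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1,2,3,4,5,5
q,w,e,r t,y,g,t,i 9,8,7,6,5,5
EOF

awk '
NR<3 || !NF { print; next }
{
    # everything up to and including the 2nd run of whitespace
    match($0, /^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+/)
    front = substr($0, 1, RLENGTH)
    n = split($3, a, /,/)
    for (i = 1; i <= n; i++) {
        print front a[i]
        gsub(/[^[:space:]]/, " ", front)   # blank out the prefix after the first row
    }
}' file
```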

Considering the columns are tab separated, I would say:
awk 'BEGIN{FS=OFS="\t"}
NR<=2 || !NF {print; next}
NR>2{n=split($3,a,",")
for (i=1;i<=n; i++)
print (i==1?$1 OFS $2:"" OFS ""), a[i]
}' file
This prints the 1st, 2nd and empty lines unchanged.
Then it slices the 3rd field using the comma as separator.
Finally, it loops through the pieces, printing one per line; the first two columns are printed only on the first iteration, after that just the value.
Test
$ awk 'BEGIN{FS=OFS="\t"} NR<=2 || !NF {print; next} NR>2{n=split($3,a,","); for (i=1;i<=n; i++) print (i==1?$1 OFS $2:"" OFS ""), a[i]}' a
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
2
3
4
5
5
q,w,e,r t,y,g,t,i 9
8
7
6
5
5
Note the output is a bit ugly, since the tab-separated columns do not line up.


Awk - Count Each Unique Value and Match Values Between Two Files without printing all fields of matched line

I have two files. First I am trying to get the count of each unique value in column 4 of File1.
Then I need to match each of those unique values against column 2 of File2.
So essentially: take each unique value and its count from column 4 of File1, and report it if there is a match in column 2 of File2.
File1
1 2 3 6 5
2 3 4 5 1
3 5 7 6 1
2 3 4 6 2
4 6 6 5 1
File2
1 2 3 hello "6"
1 3 3 hi "5"
needed output
total count of hello,6 : 3
total count of hi,5 : 2
My test code
awk 'NR==FNR{a[$4]++}NR!=FNR{gsub(/"/,"",$2);b[$2]=$0}END{for( i in b){printf "Total count of %s,%d : %d\n",gensub(/^([^ ]+).*/,"\\1","1",b[i]),i,a[i]}}' File1 File2
I believe I should be able to do this with awk, but for some reason I am really struggling with this one.
Thanks
With your shown samples, you could try the following.
awk '
FNR==NR{
    count[$4]++
    next
}
{
    gsub(/"/,"",$NF)
}
($NF in count){
    print "total count of "$(NF-1)","$NF" : "count[$NF]
}
' file1 file2
Sample output will be as follows.
total count of hello,6 : 3
total count of hi,5 : 2
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when file1 is being read.
count[$4]++ ##Creating array count with index of 4th field and keep increasing its value with 1 here.
next ##next will skip all further statements from here.
}
{
gsub(/"/,"",$NF) ##Globally substituting " with NULL in last field of current line.
}
($NF in count){ ##Checking condition if last field is present in count then do following.
print "total count of "$(NF-1)","$NF" : "count[$NF]
##Printing string 2nd last field, last field : and count value here as per request.
}
' file1 file2 ##Mentioning Input_file names here.
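The FNR==NR two-pass idiom used above can also be seen in isolation with a toy sketch (the file names here are made up for illustration):

```shell
printf 'x\ny\n' > f1
printf 'y\nz\n' > f2
# while the first file is being read, FNR==NR holds, so we just remember
# its lines; for the second file the bare condition prints lines seen before
awk 'FNR==NR { seen[$0]; next } $0 in seen' f1 f2
# prints: y
```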

Sum of 2nd and 3rd column for same value in 1st column

I want to sum the values in the 2nd and 3rd columns for rows that share the same value in the 1st column:
1555971000 6 1
1555971000 0 2
1555971300 2 0
1555971300 3 0
Output would be like
1555971000 6 3
1555971300 5 0
I have tried the command below,
awk -F" " '{b[$2]+=$1} END { for (i in b) { print b[i],i } } '
but this seems to be for only one column.
Here is another way, reading the Input_file 2 times; it will produce output in the same sequence as the Input_file.
awk 'FNR==NR{a[$1]+=$2;b[$1]+=$3;next} ($1 in a){print $1,a[$1],b[$1];delete a[$1]}' Input_file Input_file
If the data in file 'd' is unsorted, tried on GNU awk:
awk 'BEGIN{f=1} {if($1==a||f){b+=$2;c+=$3;f=0} else{print a,b,c;b=$2;c=$3} a=$1} END{print a,b,c}' d
With sort, on GNU awk:
awk '{w[NR]=$0} END{asort(w);f=1;for(;i++<NR;){split(w[i],v);if(v[1]==a||f){f=0;b+=v[2];c+=v[3]} else{print a,b,c;b=v[2];c=v[3];} a=v[1]} print a,b,c;}' d
You can do it with awk by first saving the fields of the first record, then for each subsequent record comparing the first field: if it matches, add fields two and three to the running sums and continue; if it does not match, print the saved first field and the running sums, then start over, e.g.
awk '{
if ($1 == a) {
b+=$2; c+=$3;
}
else {
print a, b, c; a=$1; b=$2; c=$3;
}
} END { print a, b, c; }' file
With your input in file, you can copy and paste the foregoing into your terminal and obtain the following:
Example Use/Output
$ awk '{
> if ($1 == a) {
> b+=$2; c+=$3;
> }
> else {
> print a, b, c; a=$1; b=$2; c=$3;
> }
> } END { print a, b, c; }' file
1555971000 6 3
1555971300 5 0
Using awk Arrays
A shorter, more succinct alternative using arrays, which does not require your input to be in sorted order, would be:
awk '{a[$1]+=$2; b[$1]+=$3} END{ for (i in a) print i, a[i], b[i] }' file
(same output)
Using arrays allows the summing of columns for matching values of field 1 to work equally well if your data file contained the following lines in random order, e.g.
1555971300 2 0
1555971000 0 2
1555971000 6 1
1555971300 3 0
Another awk solution that works regardless of whether or not the records are sorted:
awk '{r[$1]++}
r[$1]==1{o[++c]=$1}
{f[$1]+=$2;s[$1]+=$3}
END{for(i=1;i<=c;i++){print o[i],f[o[i]],s[o[i]]}}' file
Assuming when you wrote:
awk -F" " '{b[$2]+=$1} END { for (i in b) { print b[i],i } } '
you meant to write:
awk '{ b[$1]+=$2 } END{ for (i in b) print i,b[i] }'
It shouldn't be a huge leap to figure out:
$ awk '{ b[$1]+=$2; c[$1]+=$3 } END{ for (i in b) print i,b[i],c[i] }' file
1555971000 6 3
1555971300 5 0
Please get the book "Effective Awk Programming", 4th Edition, by Arnold Robbins and just read a paragraph or 2 about fields and arrays.

Find duplicate lines based on column and print both lines and their numbers with awk

I have a following file:
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
Using AWK,
I need to find both original lines and their duplicates with row numbers,
so that I get output like:
NR $0
1 test 1234
2 admin 1234
6 root 1234
I have tried the following, but it does not print the correct row number with NR:
awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt
Any help would be appreciated!
$ awk '
($2 in a) { # look for duplicates in $2
if(a[$2]) { # if found
print a[$2] # output the first, stored one
a[$2]="" # mark it outputed
}
print NR,$0 # print the duplicated one
next # skip the storing part that follows
}
{
a[$2]=NR OFS $0 # store the first of each with NR and full record
}' file
Output (with the header in file):
2 test 1234
3 admin 1234
7 root 1234
Using GNU awk, you can do this with the construct below:
awk '
NR>1 {
    a[$2][NR-1 " " $0]
}
END {
    for (i in a)
        if (length(a[i]) > 1)
            for (j in a[i])
                print j
}
' Input_File.txt
Create a 2-dimensional array.
In the first dimension, store PWD_HASH, and in the second, store the line number (NR-1) concatenated with the whole line ($0).
To display only the duplicated ones, use the condition length(a[i]) > 1.
Could you please try the following.
awk '
FNR==NR{
    a[$2]++
    b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0
    next
}
a[$2]>1{
    print b[$2,FNR]
}
' Input_file Input_file
Output will be as follows.
1 test 1234
2 admin 1234
6 root 1234
Explanation: Following is the explanation for above code.
awk ' ##Starting awk program here.
FNR==NR{ ##Checking condition here FNR==NR which will be TRUE when first time Input_file is being read.
a[$2]++ ##Creating an array named a whose index is $2 and incrementing its value by 1 each time the same index is seen.
b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0 ##Creating array b whose index is $2,FNR; its value is the line number minus 1 followed by the whole line (the header line is special-cased).
next ##Using next for skipping all further statements from here.
}
a[$2]>1{ ##Checking condition where value of a[$2] is greater than 1, this will be executed when 2nd time Input_file read.
print b[$2,FNR] ##Printing value of array b whose index is $2,FNR here.
}
' Input_file Input_file ##Mentioning Input_file(s) names here 2 times.
Without using awk, but with GNU coreutils tools:
tail -n+2 file | nl | sort -k3n | uniq -D -f2
tail removes the first line.
nl adds line numbers.
sort sorts numerically on the 3rd field.
uniq prints only the duplicates, comparing from the 3rd field on.
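Run against the question's input (minus the header, which tail strips), the pipeline looks like this (a sketch; nl's exact spacing may vary between implementations):

```shell
cat > file <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF

# number the data lines, group by hash, keep only the duplicated groups
tail -n+2 file | nl | sort -k3n | uniq -D -f2
```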

Split and compare in awk

I want to split a value and compare the result in an awk command.
Input file (tab-delimited)
1 aaa 1|3
2 bbb 3|3
3 ccc 0|2
Filtration
First column value > 1
First value of third column value splitted by "|" > 2
Process
Compare first column value if bigger than 1
Split third column value by "|"
Compare first value of the third column if bigger than 2
Print if the first value bigger than 2 only
Command line (example)
awk -F "\t" '{if($1>1 && ....?) print}' file
Output
2 bbb 3|3
Please let me know a command line for the above processing.
You can set the field separator to either tab or pipe and check the 1st and 3rd values:
awk -F'\t|\\|' '$1>1 && $3>2' file
or
awk -F"\t|\\\\|" '$1>1 && $3>2' file
You can read about all this character escaping in this comprehensive answer by Ed Morton in awk: fatal: Invalid regular expression when setting multiple field separators.
Otherwise, you can split the 3rd field and check the value of the first slice:
awk -F"\t" '{split($3,a,"|")} $1>1 && a[1]>2' file

Separate comma delimited cells to new rows with shell script

I have a table with comma delimited columns and I want to separate the comma delimited values in my specified column to new rows. For example, the given table is
Name Start Name2
A 1,2 X,a
B 5 Y,b
C 6,7,8 Z,c
And I need to separate the comma delimited values in column 2 to get the table below
Name Start Name2
A 1 X,a
A 2 X,a
B 5 Y,b
C 6 Z,c
C 7 Z,c
C 8 Z,c
I am wondering if there is any solution with shell script, so that I can create a workflow pipe.
Note: the original table may contain more than 3 columns.
Assuming the format of your input and output does not change:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $NF; print $1, $3, $NF}' input_file
Input:
input_file:
A 1,2 X
B 5,6 Y
Output:
A 1 X
A 2 X
B 5 Y
B 6 Y
Explanation:
awk: invoke awk, a tool for manipulating lines (records) and fields
'...': content enclosed by single-quotes are supplied to awk as instructions
'BEGIN{FS="[ ,]"}: before reading any lines, tell awk to use both space and comma as delimiters; FS stands for Field Separator.
{print $1, $2, $NF; print $1, $3, $NF}: For each input line read, print the 1st, 2nd and last field on one line, and then print the 1st, 3rd, and last field on the next line. NF stands for Number of Fields, so $NF is the last field.
input_file: supply the name of the input file to awk as an argument.
In response to the updated input format:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $4","$5; print $1, $3, $4","$5}' input_file
After Runner's modification of the original question, another approach might look like this:
#!/bin/sh
# Usage: $0 <file> <column>
#
FILE="${1}"
COL="${2}"
# tokens separated by linebreaks
IFS="
"
for LINE in `cat ${FILE}`; do
    # get number of columns
    COLS="`echo ${LINE} | awk '{print NF}'`"
    # get the actual field by COL; this contains the keys to be split into individual lines
    # replace comma with newline to "reuse" the newline field separator in IFS
    KEYS="`echo ${LINE} | cut -d' ' -f${COL}-${COL} | tr ',' '\n'`"
    COLB=$(( ${COL} - 1 ))
    COLA=$(( ${COL} + 1 ))
    # get text from the columns before and after the actual field
    if [ ${COLB} -gt 0 ]; then
        BEFORE="`echo ${LINE} | cut -d' ' -f1-${COLB}` "
    else
        BEFORE=""
    fi
    AFTER=" `echo ${LINE} | cut -d' ' -f${COLA}-`"
    # echo "-A: $COLA ($AFTER) | B: $COLB ($BEFORE)-"
    # iterate over the keys and re-build the original line
    for KEY in ${KEYS}; do
        echo "${BEFORE}${KEY}${AFTER}"
    done
done
With this shell file you might do what you want. This will split column 2 into multiple lines.
./script.sh input.txt 2
If you'd like to pass input through standard input using pipes (e.g. to split multiple columns in one go), you could change the FILE="${1}" line to:
if [ "${1}" = "-" ]; then
    FILE="/dev/stdin"
else
    FILE="${1}"
fi
And run it this way:
./script.sh input.txt 1 | ./script.sh - 2 | ./script.sh - 3
Note that cut is very sensitive about the field separators. So if the line starts with a space character, column 1 would be "" (empty). If the fields were separated by a mixture of spaces and tabs, this script would have other issues too. In this case (as explained above) filtering the input resource (so that fields are only separated by one space character) should do it. If this is not possible, or the data in each column contains space characters too, the script might get more complicated.
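For comparison, the whole job can also be sketched in awk alone (my own generic variant, not from the answers above; it assumes single-space-separated fields and takes the column to split as col):

```shell
cat > input.txt <<'EOF'
Name Start Name2
A 1,2 X,a
B 5 Y,b
C 6,7,8 Z,c
EOF

awk -v col=2 '
NR==1 { print; next }              # keep the header as-is
{
    n = split($col, parts, ",")
    for (i = 1; i <= n; i++) {
        $col = parts[i]            # replace the field with one piece
        print                      # awk rebuilds the record with OFS
    }
}' input.txt
```

This handles any number of columns and any number of comma-separated values in the chosen field, at the cost of normalizing runs of whitespace to single spaces in the output.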
