I have a data file in the following format:
Program1, Program2, Program3, Program4
0, 1, 1, 0
1, 1, 1, 0
Columns are program names, and rows are features of programs. I need to write an awk loop that will go through every row, check if a value is equal to one, and then return the column names and put them into a "results.csv" file. The desired output should be this:
Program2, Program3
Program1, Program2, Program3
I was trying this code, but it wouldn't work:
awk -F, '{for(i=1; i<=NF; i++) if ($i==1) {FNR==1 print$i>>results}; }'
Help would be very much appreciated!
awk -F', *' '
NR==1 {for(i=1;i<=NF;i++) h[i]=$i; next}
{
sep="";
for(x=1;x<=NF;x++) {
if($x) {
printf "%s%s", sep, h[x];
sep=", ";
}
}
print ""
}' file
outputs:
Program2, Program3
Program1, Program2, Program3
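As for why the original attempt failed: `FNR==1 print $i` is a syntax error (a pattern cannot sit inside an action like that), and the unquoted `results` is an empty awk variable, not the filename. A minimal corrected sketch that also writes to results.csv as the question asks (data.csv is a hypothetical input name):

```shell
# Sample input matching the question (data.csv is a hypothetical name)
cat > data.csv <<'EOF'
Program1, Program2, Program3, Program4
0, 1, 1, 0
1, 1, 1, 0
EOF

# Save the header names from line 1, then for each later row print the
# names of the columns holding a 1; note the quoted "results.csv".
awk -F', *' '
NR==1 { for (i=1; i<=NF; i++) h[i]=$i; next }
{
  sep = ""
  for (i=1; i<=NF; i++)
    if ($i == 1) { printf "%s%s", sep, h[i] > "results.csv"; sep = ", " }
  print "" > "results.csv"
}' data.csv

cat results.csv
```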
$ cat tst.awk
BEGIN { FS=", *" }
NR==1 { split($0,a); next }
{
out = ""
for (i=1; i<=NF; i++)
out = out ($i ? (out?", ":"") a[i] : "")
print out
}
$ awk -f tst.awk file
Program2, Program3
Program1, Program2, Program3
My take on things is more verbose, but should handle the trailing comma. Not really a one-liner, though.
BEGIN {
# Formatting for the input and output files.
FS = ", *"
OFS = ", "
}
FNR == 1 {
# First line in the file
# Read the headers into a list for later use.
for (i = 1; i <= NF; i++) {
headers[i] = $i
}
}
FNR > 1 {
# Print the header for each column containing a 1.
stop = 0
for (i = 1; i <= NF; i++) {
# Gather the results from this line.
if ($i > 0) {
stop += 1
results[stop] = headers[i]
}
}
if (stop > 0) {
# If this input line had no results, the output line is blank
for (i = 1; i <= stop; i++) {
# Print the appropriate headers for this result.
if (i < stop) {
# Results other than the last
printf("%s%s", results[i], OFS)
} else {
# The last result
printf("%s", results[i])
}
}
}
printf("%s", ORS)
}
Save this as something like script.awk, and then run it as something like:
awk -f script.awk infile.txt > results
I want to make all duplicate names in a .csv unique, but after finding a duplicate I cannot reach the previous line, because it has already been printed. I've tried saving all lines in an array and printing them in the END section, but it doesn't work, and I don't understand how to access a specific field in that array (are two-dimensional arrays not supported in awk?).
sample input
...,9,phone,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone,...
desired output
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
My attempt ($2 - id field, $3 - name field)
BEGIN{
FS=","
OFS=","
marker=777
}
{
if (names[$3] == marker) {
$3 = $3 $2
#Attempt to change previous duplicate
results[nameLines[$3]]=$3 id[$3]
}
names[$3] = marker
id[$3] = $2
nameLines[$3] = NR
results[NR] = $0
}
END{
#it prints some numbers, not saved lines
for(result in results)
print result
}
Here is a single-pass awk that stores all records in a buffer:
awk -F, '
{
rec[NR] = $0
++fq[$3]
}
END {
for (i=1; i<=NR; ++i) {
n = split(rec[i], a, /,/)
if (fq[a[3]] > 1)
a[3] = a[3] a[2]
for (k=1; k<=n; ++k)
printf "%s", a[k] (k < n ? FS : ORS)
}
}' file
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
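As an aside, the reason the original END loop printed numbers is that for (result in results) iterates over the array *indices* (here the NR values), not the stored lines; you need print results[result], or a counted 1..NR loop for stable order. A quick illustration:

```shell
# for-in yields indices, not values, and in no guaranteed order,
# hence the sort for a stable demonstration.
awk 'BEGIN { a[1]="x"; a[2]="y"; for (i in a) print i, a[i] }' | sort
```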
This can easily be done with a two-pass read of the input file in awk, with no need for two-dimensional arrays. Written and tested in GNU awk with your shown samples.
awk '
BEGIN{FS=OFS=","}
FNR==NR{
arr1[$3]++
next
}
{
$3=(arr1[$3]>1?$3 $2:$3)
}
1
' Input_file Input_file
Output will be as follows:
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
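To try this standalone, here is a self-contained run with placeholder fields x and y standing in for the elided `...` columns (the real files keep their own surrounding fields):

```shell
cat > Input_file <<'EOF'
x,9,phone,y
x,43,book,y
x,27,apple,y
x,85,hook,y
x,43,phone,y
EOF

# Pass 1 counts each name; pass 2 appends the id to names seen more than once.
awk '
BEGIN{ FS=OFS="," }
FNR==NR{ arr1[$3]++; next }
{ $3=(arr1[$3]>1 ? $3 $2 : $3) }
1
' Input_file Input_file
```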
I have an input file with billions of records and a header.
The header consists of meta info, the total number of rows, and the sum of the sixth column. I am splitting the file into smaller pieces, so each piece's header record must be updated, since its row count and sixth-column sum change.
This is the sample record
filename: testFile.text
00|STMT|08-09-2022 13:24:56||5|13.10|SHA2
10|000047290|8ddcf4b2356dfa7f326ca8004a9bdb6096330fc4f3b842a971deaf660a395f65|18-01-2020|12:36:57|3.10|00004729018-01-20201|APP
10|000052736|cce280392023b23df2a00ace4b82db8eb61c112bb14509fb273c523550059317|07-02-2017|16:27:49|2.00|00005273607-02-20171|APP
10|000070355|f2e86d2731d32f9ce960a0f5883e9b688c7e57ab9c2ead86057f98426407d87a|17-07-2019|20:25:02|1.00|00007035517-07-20192|APP
10|000070355|54c1fc2667e160a11ae1dbf54d3ba993475cd33d6ececdd555fb5c07e64a241b|17-07-2019|20:25:02|5.00|00007035517-07-20192|APP
10|000072420|f5dac143082631a1693e0fb5429d3a185abcf3c47b091be2f30cd50b5cf4be11|14-06-2021|20:52:21|2.00|00007242014-06-20212|APP
Expected:
filename: testFile_1.text
00|STMT|08-09-2022 13:24:56||3|6.10|SHA2
10|000047290|8ddcf4b2356dfa7f326ca8004a9bdb6096330fc4f3b842a971deaf660a395f65|18-01-2020|12:36:57|3.10|00004729018-01-20201|APP
10|000052736|cce280392023b23df2a00ace4b82db8eb61c112bb14509fb273c523550059317|07-02-2017|16:27:49|2.00|00005273607-02-20171|APP
10|000070355|f2e86d2731d32f9ce960a0f5883e9b688c7e57ab9c2ead86057f98426407d87a|17-07-2019|20:25:02|1.00|00007035517-07-20192|APP
filename: testFile_2.text
00|STMT|08-09-2022 13:24:56||2|7.00|SHA2
10|000070355|54c1fc2667e160a11ae1dbf54d3ba993475cd33d6ececdd555fb5c07e64a241b|17-07-2019|20:25:02|5.00|00007035517-07-20192|APP
10|000072420|f5dac143082631a1693e0fb5429d3a185abcf3c47b091be2f30cd50b5cf4be11|14-06-2021|20:52:21|2.00|00007242014-06-20212|APP
I am able to split the file and calculate the sum, but I am unable to replace the values in the header.
This is the script I have made
#!/bin/bash
splitRowCount=$1
transactionColumn=$2
filename=$(basename -- "$3")
extension="${filename##*.}"
nameWithoutExt="${filename%.*}"
echo "splitRowCount: $splitRowCount"
echo "transactionColumn: $transactionColumn"
awk 'NR == 1 { head = $0 } NR % '$splitRowCount' == 2 { filename = "'$nameWithoutExt'_" int((NR-1)/'$splitRowCount')+1 ".'$extension'"; print head > filename } NR != 1 { print >> filename }' $filename
ls *.txt | while read line
do
firstLine=$(head -n 1 $line);
awk -F '|' 'NR !=1 {sum += '$transactionColumn'}END {print sum} ' $line
done
Here's an awk solution for splitting the original file into files of n records. The idea is to accumulate the records until the given count is reached then generate a file with the updated header and the accumulated records:
n=3
file=./testFile.text
awk -v numRecords="$n" '
BEGIN {
FS = OFS = "|"
if ( match(ARGV[1],/[^\/]\.[^\/]*$/) ) {
filePrefix = substr(ARGV[1],1,RSTART)
fileSuffix = substr(ARGV[1],RSTART+1)
} else {
filePrefix = ARGV[1]
fileSuffix = ""
}
if ( (getline headerStr) <= 0 )
exit 1
split(headerStr, headerArr)
}
(NR-2) % numRecords == 0 && recordsCount {
outfile = filePrefix "_" ++filesCount fileSuffix
print headerArr[1],headerArr[2],headerArr[3],headerArr[4],recordsCount,recordsSum,headerArr[7] > outfile
printf("%s", records) > outfile
close(outfile)
records = ""
recordsCount = recordsSum = 0
}
{
records = records $0 ORS
recordsCount++
recordsSum += $6
}
END {
if (recordsCount) {
outfile = filePrefix "_" ++filesCount fileSuffix
print headerArr[1],headerArr[2],headerArr[3],headerArr[4],recordsCount,recordsSum,headerArr[7] > outfile
printf("%s", records) > outfile
close(outfile)
}
}
' "$file"
With the given sample you'll get:
testFile_1.text
00|STMT|08-09-2022 13:24:56||3|6.1|SHA2
10|000047290|8ddcf4b2356dfa7f326ca8004a9bdb6096330fc4f3b842a971deaf660a395f65|18-01-2020|12:36:57|3.10|00004729018-01-20201|APP
10|000052736|cce280392023b23df2a00ace4b82db8eb61c112bb14509fb273c523550059317|07-02-2017|16:27:49|2.00|00005273607-02-20171|APP
10|000070355|f2e86d2731d32f9ce960a0f5883e9b688c7e57ab9c2ead86057f98426407d87a|17-07-2019|20:25:02|1.00|00007035517-07-20192|APP
testFile_2.text
00|STMT|08-09-2022 13:24:56||2|7|SHA2
10|000070355|54c1fc2667e160a11ae1dbf54d3ba993475cd33d6ececdd555fb5c07e64a241b|17-07-2019|20:25:02|5.00|00007035517-07-20192|APP
10|000072420|f5dac143082631a1693e0fb5429d3a185abcf3c47b091be2f30cd50b5cf4be11|14-06-2021|20:52:21|2.00|00007242014-06-20212|APP
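If you want to sanity-check a generated file, a small verifier can compare the header's count and sum fields against the records. A sketch, with the hash and key columns abbreviated to h and k, comparing the sums to two decimals to avoid floating-point noise:

```shell
# testFile_1.text stands in for one generated split file.
cat > testFile_1.text <<'EOF'
00|STMT|08-09-2022 13:24:56||3|6.1|SHA2
10|000047290|h|18-01-2020|12:36:57|3.10|k|APP
10|000052736|h|07-02-2017|16:27:49|2.00|k|APP
10|000070355|h|17-07-2019|20:25:02|1.00|k|APP
EOF

awk -F'|' '
NR==1 { hcount=$5; hsum=$6; next }        # remember header count and sum
{ n++; s+=$6 }                            # tally the data records
END {
  printf "count %s, sum %s\n",
    (n==hcount ? "OK" : "BAD"),
    (sprintf("%.2f",s)==sprintf("%.2f",hsum+0) ? "OK" : "BAD")
}' testFile_1.text
```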
With your shown samples, please try the following awk code (written and tested in GNU awk). Here I have defined awk variables: fileInitials holds the output files' base name (e.g. testFile), extension holds the output files' extension (e.g. .txt), and lines is how many lines you want in each output file.
There is no need to combine shell and awk; this can be done in a single awk program as shown below.
awk -v count="1" -v fileInitials="testFile" -v extension=".txt" -v lines="3" '
BEGIN { FS=OFS="|" }
FNR==1{
match($0,/^([^|]*\|[^|]*\|[^|]*\|[^|]*\|[^|]*)\|[^|]*(.*)/,arr)
header1=arr[1]
header2=arr[2]
outputFile=(fileInitials count extension)
next
}
{
if(prev!=count){
print (header1,sum header2 ORS val) > (outputFile)
close(outputFile)
outputFile=(fileInitials count extension)
sum=0
val=""
}
sum+=$6
val=(val?val ORS:"") $0
prev=count
count=(++countline%lines==0?++count:count)
}
END{
if(count && val){
print (header1,sum header2 ORS val) > (outputFile)
close(outputFile)
}
}
' Input_file
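Alternatively, if the files are already split, a two-pass read of each piece can rewrite the header afterwards: compute the count and sum on the first pass, then replace fields 5 and 6 of the header on the second. A sketch on a shortened sample (hash and key columns abbreviated to h and k):

```shell
cat > part.txt <<'EOF'
00|STMT|08-09-2022 13:24:56||5|13.10|SHA2
10|000047290|h|18-01-2020|12:36:57|3.10|k|APP
10|000052736|h|07-02-2017|16:27:49|2.00|k|APP
EOF

awk -F'|' -v OFS='|' '
NR==FNR { if (FNR>1) { n++; s+=$6 } next }  # pass 1: count and sum the records
FNR==1  { $5=n; $6=s }                      # pass 2: rewrite the header fields
1
' part.txt part.txt
```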
I have a .txt file with these records:
field_1 value01a value01b value01c
field_2 value02
field_3 value03a value03b value03c
field_1 value11
field_2 value12a value12b
field_3 value13
field_1 value21
field_2 value22
field_3 value23
...
field_1 valuen1
field_2 valuen2
field_3 valuen3
I would like to convert them like that:
field1 field2 field3
value01a value01b value01c value02 value03a value03b value03c
value11 value12a value12b value13
value21 value22 value23
...
valuen1 valuen2 valuen3
I have tried something like:
awk '{for (i = 1; i <NR; i ++) FNR == i {print i, $ (i + 1)}}' filename
or like
awk '
{
for (i=1; i<=NF; i++) {
a[NR,i] = $i
}
}
NF>p { p = NF }
END {
for(j=1; j<=p; j++) {
str=a[1,j]
for(i=2; i<=NR; i++){
str=str" "a[i,j]
}
print str
}
}'
but I can't get it to work.
I would like the values to be transposed so that each tuple of values associated with a specific field is aligned with the others.
Any suggestions?
Thank you in advance
I have downloaded your bigger sample file, and here is what I have come up with:
awk -v OFS='\t' -v RS= '
((n = split($0, a, / {2,}| *\n/)) % 2) == 0 {
# print header
if (NR==1)
for (i=1; i<=n; i+=2)
printf "%s", a[i] (i < n-1 ? OFS : ORS)
# print all records
for (i=2; i<=n; i+=2)
printf "%s", a[i] (i < n ? OFS : ORS)
}' reclamiTestFile.txt | column -t -s $'\t'
Could you please try the following, written and tested with your shown samples in GNU awk.
awk '
{
first=$1
$1=""
sub(/^ +/,"")
if(!arr[first]++){
++indArr
counter[indArr]=first
}
++count[first]
arr[first OFS count[first]]=$0
}
END{
for(j=1;j<=indArr;j++){
printf("%s\t%s",counter[j],j==indArr?ORS:"\t")
}
for(i=1;i<=FNR;i++){
for(j=1;j<=indArr;j++){
if(arr[counter[j] OFS i]){
printf("%s\t%s",arr[counter[j] OFS i],j==indArr?ORS:"\t")
}
}
}
}' Input_file | column -t -s $'\t'
The column command is taken from anubhava's answer above.
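Another sketch that avoids the per-field counters: start a new output row whenever the first field name repeats, strip the field name from each line, and tab-join the remaining values (records.txt is a hypothetical name; only the first two groups of the sample are shown):

```shell
cat > records.txt <<'EOF'
field_1 value01a value01b value01c
field_2 value02
field_3 value03a value03b value03c
field_1 value11
field_2 value12a value12b
field_3 value13
EOF

awk '
NR==1 { first=$1 }                          # the field name that starts each group
$1==first && NR>1 { rows[++r]=row; row="" } # a repeat of it starts a new row
{
  if (!($1 in seen)) { seen[$1]=1; hdr = hdr (hdr ? "\t" : "") $1 }
  sub(/^[^ ]+ +/, "")                       # drop the field name, keep its values
  row = row (row ? "\t" : "") $0
}
END {
  rows[++r] = row
  print hdr
  for (i=1; i<=r; i++) print rows[i]
}' records.txt
```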
I'm struggling to reformat a comma-separated file using awk. The file contains a day of per-minute data for multiple servers and multiple metrics,
i.e. 2 records per minute, per server, for 24 hours.
Example input file:
server01,00:01:00,AckDelayAverage,9999
server01,00:01:00,AckDelayMax,8888
server01,00:02:00,AckDelayAverage,666
server01,00:02:00,AckDelayMax,5555
.....
server01,23:58:00,AckDelayAverage,4545
server01,23:58:00,AckDelayMax,8777
server01,23:59:00,AckDelayAverage,4686
server01,23:59:00,AckDelayMax,7820
server02,00:01:00,AckDelayAverage,1231
server02,00:01:00,AckDelayMax,4185
server02,00:02:00,AckDelayAverage,1843
server02,00:02:00,AckDelayMax,9982
.....
server02,23:58:00,AckDelayAverage,1022
server02,23:58:00,AckDelayMax,1772
server02,23:59:00,AckDelayAverage,1813
server02,23:59:00,AckDelayMax,9891
I'm trying to re-format the file to have a single row for each minute with a unique concatenation of fields 1 & 3 as the column headers
e.g. the expected output file would look like:
Minute, server01-AckDelayAverage,server01-AckDelayMax, server02-AckDelayAverage,server02-AckDelayMax
00:01:00,9999,8888,1231,4185
00:02:00,666,5555,1843,9982
...
...
23:58:00,4545,8777,1022,1772
23:59:00,4686,7820,1813,9891
A solution using GNU awk. Call this as awk -F, -f script input_file:
/Average/ { average[$2, $1] = $4; }
/Max/ { maximum[$2, $1] = $4; }
{
if (!($2 in minutes)) {
minutes[$2] = 1;
}
if (!($1 in servers)) {
servers[$1] = 1;
}
}
END {
mcount = asorti(minutes, smin);
scount = asorti(servers, sserv);
printf "minutes";
for (col = 1; col <= scount; col++) {
printf "," sserv[col] "-average," sserv[col] "-maximum";
}
print "";
for (row = 1; row <= mcount; row++) {
key = smin[row];
printf key;
for (col = 1; col <= scount; col++) {
printf "," average[key, sserv[col]] "," maximum[key, sserv[col]];
}
print "";
}
}
Make the script executable (chmod +x script.awk), then run it as: ./script.awk file
#! /bin/awk -f
BEGIN{
FS=",";
OFS=","
}
$1 ~ /server01/ && $3 ~ /Average/{
a[$2]["Avg01"] = $4;
}
$1 ~ /server01/ && $3 ~ /Max/{
a[$2]["Max01"] = $4;
}
$1 ~ /server02/ && $3 ~ /Average/{
a[$2]["Avg02"] = $4;
}
$1 ~ /server02/ && $3 ~ /Max/{
a[$2]["Max02"] = $4;
}
END{
print "Minute","server01-AckDelayAverage","server01-AckDelayMax","server02-AckDelayAverage","server02-AckDelayMax"
for(i in a){
print i,a[i]["Avg01"],a[i]["Max01"],a[i]["Avg02"],a[i]["Max02"] | "sort"
}
}
With awk and sort:
awk -F, -v OFS=, '{
a[$2]=(a[$2]?a[$2]","$4:$4)
}
END{
for ( i in a ) print i,a[i]
}' File | sort
If $4 has 0 values:
awk -F, -v OFS=, '!a[$2]{a[$2]=$2} {a[$2]=a[$2]","$4} END{for ( i in a ) print a[i]}' File | sort
!a[$2]{a[$2]=$2}: if array a has no entry at index $2 (the minute), create one whose value is $2 itself. This fires the first time a given minute occurs in a line.
{a[$2]=a[$2]","$4}: concatenate $4 onto that entry.
END: print all the values in array a.
Finally, pipe the awk output to sort.
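A minimal run of the first variant on a four-line sample (the per-row value order simply follows the order the records appear in the input, so the input must be grouped consistently; File is a hypothetical name):

```shell
cat > File <<'EOF'
server01,00:02:00,AckDelayAverage,666
server01,00:01:00,AckDelayAverage,9999
server01,00:01:00,AckDelayMax,8888
server01,00:02:00,AckDelayMax,5555
EOF

# Accumulate each minute's values, then print one line per minute,
# sorted by the minute key.
awk -F, -v OFS=, '
{ a[$2] = (a[$2] ? a[$2] "," $4 : $4) }
END { for (i in a) print i, a[i] }
' File | sort
```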
This is what I have to do:
Display the content of the files given as arguments as follows: the
files on the positions 1, 3, 5, ... will be displayed normally. The
files on the positions 2, 4, 6, ... print each line with the words in
reverse order (last word is displayed first, next to last is displayed
second, etc).
I tried many ways, but I can't figure out how to determine the position of the filename in the awk argument list.
if(j%2!=0){
for(i=1;i<=NF;i++)
print $i
}
else
for(i=NF;i=1;i--)
print $i
}
This is how I can print the lines from a file.
BEGIN{
for(j=1;j<ARGC;j++)
a[j]=j
}
Here I tried to make a list with the number of arguments.
But how can I use the list with the if? Or how can I do this in a different way?
$ awk -f 2.awk 1.txt 2.txt 3.txt
This is the command I used, where 2.awk is the source file.
Text file example:
1.txt
1 2 3 4
a b c b
With GNU awk for ARGIND:
gawk '!(ARGIND%2){for (i=NF;i>1;i--) printf "%s ",$i; print $1; next} 1' file1 file2 ...
With other awks, just create your own "ARGIND" by incrementing a variable in an FNR==1 block, assuming none of the files are empty.
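That portable variant might look like this (f1 and f2 are hypothetical one-line sample files):

```shell
printf 'words in file one\n' > f1
printf 'words in file two\n' > f2

awk '
FNR==1 { argind++ }   # portable stand-in for gawk ARGIND
!(argind%2) { for (i=NF; i>1; i--) printf "%s ", $i; print $1; next }
1
' f1 f2
```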
Okay, you can massage this to your needs. Here's an awk executable file that you would run like:
awkex 1.txt 2.txt 3.txt
The contents of the executable awkex file are:
awk '
BEGIN {
for( i = 1; i < ARGC; i++ )
{
if( i % 2 == 0 ) { even_args[ ARGV[ i ] ] = ARGV[ i ]; }
else { odd_args[ ARGV[ i ] ] = ARGV[ i ]; }
}
}
{
if( odd_args[ FILENAME ] != "" )
{
for( i = 1; i <= NF; i++ )
printf( "%s ", $i );
printf( "\n" );
}
else
{
for( j = NF; j > 0; j-- )
printf( "%s ", $j );
printf( "\n" );
}
}
' $*
It assumes every argument is a filename. The odd ones go into one map, the evens into another. If the currently handled FILENAME is in the odd map, do one thing; otherwise do the other. It also assumes the default field separator; you could change that with the -F flag in the awkex file.
This is a perfect time to use the FNR variable. If FNR==1 then you're at the first line of a file:
awk '
FNR==1 {filenum++}
filenum%2==0 {
# reverse the words of this line
n = NF
for (i=n; i>=1; i--) $(NF+1) = $i
for (i=1; i<=n; i++) $i = $(n+i)
NF = n
}
1
' one two three four five six
Testing:
# here's the contents of my files:
$ for f in one two three four five six ; do printf "%s: %s\n" $f "$(<$f)"; done
one: words in file one
two: words in file two
three: words in file three
four: words in file four
five: words in file five
six: words in file six
$ awk '
FNR==1 {filenum++}
filenum%2==0 {
n = NF
for (i=n; i>=1; i--) $(NF+1) = $i
for (i=1; i<=n; i++) $i = $(n+i)
NF = n
}
1
' one two three four five six
outputs
words in file one
two file in words
words in file three
four file in words
words in file five
six file in words