How can I adjust the length of a column field in bash using awk or sed? - linux

I have an input.csv file in which columns 2 and 3 have variable length.
100,Short Column, 199
200,Meeedium Column,1254
300,Loooooooooooong Column,35
I'm trying to use the following command to achieve a clean tabulation, but I need to pad the 2nd column with a certain number of blank spaces in order to get a fixed-length column (let's say that a total length of 30 is enough).
awk -F, '{print $1 "\t" $2 "\t" $3;}' input.csv
My current output looks like this:
100	Short Column	 199
200	Meeedium Column	1254
300	Loooooooooooong Column	35
And I would like to achieve the following output, by padding the 2nd and 3rd columns properly:
100 Short Column                  199
200 Meeedium Column               1254
300 Loooooooooooong Column        35
Any good ideas out there about which awk or sed command should be used?
Thanks everybody.

Use printf in awk
$ awk -F, '{gsub(/ /, "", $3); printf "%-5s %-25s%5s\n", $1, $2, $3}' input.csv
100   Short Column               199
200   Meeedium Column           1254
300   Loooooooooooong Column      35
What I have done above is set the field separator (FS) to , with -F. Since the file has some whitespace in the 3rd column, it interferes with how printf formats the strings, so I remove it with gsub and then format with a C-style printf.
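If you'd rather avoid the gsub, the stray spaces can also be absorbed into the field separator itself. A minimal sketch of that variant, using the same printf widths as above:
$ awk -F' *, *' '{printf "%-5s %-25s%5s\n", $1, $2, $3}' input.csv
Here the separator is the regular expression " *, *", so any blanks around the commas never become part of the fields.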

Rather than picking some arbitrary number as the width of each field, do a 2-pass approach where the first pass calculates the max length of each field and the 2nd prints each field in a width of that size, plus a couple of spaces between fields:
$ cat tst.awk
BEGIN { FS=" *, *"; OFS="  " }
NR==FNR {
    for (i=1;i<=NF;i++) {
        w[i] = (length($i) > w[i] ? length($i) : w[i])
        if ($i ~ /[^0-9]/) {
            a[i] = "-"
        }
    }
    next
}
{
    for (i=1;i<=NF;i++) {
        printf "%"a[i]w[i]"s%s", $i, (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file file
100  Short Column             199
200  Meeedium Column         1254
300  Loooooooooooong Column    35
The above also uses left-alignment for non-digit fields, right alignment for all-digits fields. It'll work no matter how long the input fields are and no matter how many fields you have:
$ cat file1
100000,Short Column, 199,a
100,Now is the Winter of our discontent with fixed width fields,20000,b
100,Short Column, 199,c
200,Meeedium Column,1254,d
300,Loooooooooooong Column,35,e
$ awk -f tst.awk file1 file1
100000  Short Column                                                     199  a
   100  Now is the Winter of our discontent with fixed width fields  20000  b
   100  Short Column                                                     199  c
   200  Meeedium Column                                                 1254  d
   300  Loooooooooooong Column                                            35  e

Solution using perl
$ perl -pe 's/([^,]+),([^,]+),([^,]+)/sprintf "%-6s%-30s%5s", $1,$2,$3/e' input.csv
100   Short Column                    199
200   Meeedium Column                1254
300   Loooooooooooong Column           35

Related

linux/unix convert delimited file to fixed width

I have a requirement to convert a delimited file to a fixed-width file; details as follows.
Input file sample:
AAA|BBB|C|1234|56
AA1|BB2|DD|12345|890
Output file sample:
AAA  BBB   C   1234  56
AA1  BB2   DD  12345 890
Details of field positions:
Field 1 starts at position 1 and its length should be 5
Field 2 starts at position 6 and its length should be 6
Field 3 starts at position 12 and its length should be 4
Field 4 starts at position 16 and its length should be 6
Field 5 starts at position 22 and its length should be 3
Another awk solution:
echo -e "AAA|BBB|C|1234|56\nAA1|BB2|DD|12345|890" |
awk -F '|' '{printf "%-5s%-6s%-4s%-6s%-3s\n",$1,$2,$3,$4,$5}'
Note the - in each format specifier (e.g. %-3s) in the printf statement, which will left-align the fields, as required in the question. Output:
AAA  BBB   C   1234  56
AA1  BB2   DD  12345 890
With the following awk command you can achieve your goal:
awk 'BEGIN { RS=" "; FS="|" } { printf "%5s%6s%4s%6s%3s\n",$1,$2,$3,$4,$5 }' your_input_file
Your record separator (RS) is a space and your field separator (FS) is a pipe (|) character. In order to parse your data correctly we set them in the BEGIN statement (before any data is read). Then using printf and the desired format characters we output the data in the desired format.
Output:
  AAA   BBB   C  1234 56
  AA1   BB2  DD 12345890
Update:
I just saw your edits on the input file format (previously it seemed different). If your input data records are separated by newlines then simply remove the RS=" "; part from the above one-liner and add the - modifier to the format specifiers to left-align your fields:
awk 'BEGIN { FS="|" } { printf "%-5s%-6s%-4s%-6s%-3s\n",$1,$2,$3,$4,$5 }' your_input_file
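If you'd rather not hard-code the whole format string, the widths can be kept in one list and the per-field format built on the fly. A sketch along those lines, assuming the same widths 5 6 4 6 3 from the question:
awk -F'|' 'BEGIN { split("5 6 4 6 3", w, " ") }
  { line = ""
    for (i = 1; i <= NF; i++) line = line sprintf("%-" w[i] "s", $i)
    print line
  }' your_input_file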

how can I make awk match up lines in file 1 with the lines in file 2 based on some number ranges in file 2

I have the following two files:
file 1:
22
2
42
32
file 2:
1 10 valuea
11 20 valueb
21 30 valuec
31 40 valued
41 50 valuee
51 60 valuef
How can I make awk take each value from file 1, match it against file 2 based on whether it falls within the number range in columns 1 and 2 of file 2, and then print column 3 from the matching row of file 2? The output would resemble the following:
valuec
valuea
valuee
valued
I tried using the following AWK command (based on what I found in this post: How to check value of a column lies between values of two columns in other file and print corresponding value from column in Unix?), but it does not seem to be working correctly.
#!/bin/bash
awk 'FNR == NR { val[$1] = $1 }
FNR != NR { if (val[$1] >= $1 && val[$1] <= $2)
print $3
}' file1 file2
Also, I did not include it here for obvious reasons, but for the actual application of this script, file 1 would include around 7,000 entries while file 2 would include 68,000 entries.
An alternative awk script:
$ awk 'FNR == NR {a[$1]=$2; v[$1]=$3; next}
{for(k in a)
if(k+0<=$1 && $1+0<=a[k]) print v[k]}' file2 file1
valuec
valuea
valuee
valued
Note that file2 is the first file given. This will cover multiple range matches as well. The +0 is to force a numeric comparison.
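To see why the +0 matters: array subscripts in awk are always strings, so comparing k directly against a field falls back to a string comparison. A quick illustration:
$ awk 'BEGIN { print ("9" < "10"), (9 < 10) }'
0 1
Compared as strings, "9" sorts after "10"; with the +0 the values are compared as numbers.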

Select rows in one file based on specific values in the second file (Linux)

I have two files:
One is "total.txt". It has two columns: the first column is natural numbers (indicator) ranging from 1 to 20, the second column contains random numbers.
1 321
1 423
1 2342
1 7542
2 789
2 809
2 5332
2 6762
2 8976
3 42
3 545
... ...
20 432
20 758
The other one is "index.txt". It has three columns (1: indicator, 2: low value, 3: high value):
1 400 5000
2 600 800
11 300 4000
I want to output the rows of "total.txt" whose first column matches the first column of "index.txt". At the same time, the second column of the output rows must be larger than (>) the second column of "index.txt" and smaller than (<) the third column of "index.txt".
The expected result is as follows:
1 423
1 2342
2 809
2 5332
2 6762
11 ...
11 ...
I have tried this:
awk '$1==(awk 'print($1)' index.txt) && $2 > (awk 'print($2)' index.txt) && $1 < (awk 'print($2)' index.txt)' total.txt > result.txt
But it failed!
Can you help me with this? Thank you!
You need to read both files in the same awk script. When you read index.txt, store the other columns in an array.
awk 'FNR == NR { low[$1] = $2; high[$1] = $3; next }
$2 > low[$1] && $2 < high[$1] { print }' index.txt total.txt
FNR == NR is the common awk idiom to detect when you're processing the first file.
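A quick way to see the idiom at work, using the two file names from this question:
$ awk '{ print FILENAME, NR, FNR }' index.txt total.txt
NR keeps counting across both files while FNR restarts at 1 for each file, so FNR == NR holds only while the first file (index.txt) is being read.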
Use join like Barmar said:
# To join on the first columns
join -11 -21 total.txt index.txt
And if the files aren't sorted in lexical order by the first column then:
join -11 -21 <(sort -k1,1 total.txt) <(sort -k1,1 index.txt)

How to append a column for the result set in shell script

I need a script for the below scenario. I am very new to shell scripting.
wc file1 file2
The above command gives the following result:
40 149 947 file1
2294 16638 97724 file2
Now I need to get the result as follows: the 1st, 3rd, and 4th columns of the above result set, plus a new column with default values:
40 947 file1 DF.tx1
2294 97724 file2 DF.rb2
Here the last column is always a known value, i.e. DF.tx1 for file1 and DF.rb2 for file2.
If the filenames are given in any order, the default values should not change.
Please help me to write this script. Thanks in advance!!
You can use awk:
wc file1 file2 |
awk '$4 != "total"{if ($4 ~ /file1/) f="DF.tx1"; else if ($4 ~ /file2/) f="DF.rb2";
else if ($4 ~ /file3/) f="foo.bar"; print $1, $3, $4, f}'
1 12 file1 DF.tx1
9 105 file2 DF.rb2
5 15 file3 foo.bar
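A variation on the same idea, shown only as a sketch: keep the filename-to-default mapping in an array instead of an if/else chain (this assumes $4 is exactly the file name passed to wc, not a longer path):
wc file1 file2 |
awk 'BEGIN { def["file1"] = "DF.tx1"; def["file2"] = "DF.rb2" }
     $4 != "total" { print $1, $3, $4, def[$4] }'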

Find the maximum values in 2nd column for each distinct values in 1st column using Linux

I have two columns as follows
ifile.dat
1 10
3 34
1 4
3 32
5 3
2 2
4 20
3 13
4 50
1 40
2 20
What I am looking for is the maximum value in the 2nd column for each of 1, 2, 3, 4, 5 in the 1st column.
ofile.dat
1 40
2 20
3 34
4 50
5 3
I found that someone has done this using another program, e.g. Get the maximum values of column B per each distinct value of column A.
awk seems a prime candidate for this task. Simply traverse your input file and keep an array indexed by the first-column value, storing the column-2 value whenever it is larger than the currently stored value. At the end of the traversal, iterate over the array to print indices and corresponding values:
awk '{
    if (a[$1] < $2) {
        a[$1] = $2
    }
} END {
    for (i in a) {
        print i, a[i]
    }
}' ifile.dat
Now the result will not be sorted numerically on the first column, but that is easy to fix if required.
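For instance, piping the output through sort restores numeric order on the first column:
awk '{ if (a[$1] < $2) a[$1] = $2 } END { for (i in a) print i, a[i] }' ifile.dat | sort -n -k1,1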
Another way is using sort.
First sort numerically on column 2 in decreasing order and then remove non-unique values of column 1, as a one-liner:
sort -n -r -k 2 ifile.dat| sort -u -n -k 1
The easiest command to find the overall maximum value in the second column (across all groups) is something like this:
sort -nrk2 data.txt | awk 'NR==1{print $2}'
When doing min/max calculations, always seed the min/max variable using the first value read:
$ cat tst.awk
!($1 in max) || $2>max[$1] { max[$1] = $2 }
END {
    PROCINFO["sorted_in"] = "#ind_num_asc"
    for (key in max) {
        print key, max[key]
    }
}
$ awk -f tst.awk file
1 40
2 20
3 34
4 50
5 3
The above uses GNU awk 4.* for PROCINFO["sorted_in"] to control output order, see http://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Array-Traversal.
Assuming that your 1st field values start from 1 and are consecutive, try one more solution in awk as well.
awk '{a[$1]=$2>a[$1]?$2:(a[$1]?a[$1]:$2)} END{for(j=1;j<=length(a);j++){if(a[j]){print j,a[j]}}}' Input_file
Adding one more way to do the same here:
sort -n -k1 Input_file | awk 'prev != $1 && prev{print prev, val;val=prev=""} {val=val>$2?val:$2;prev=$1} END{print prev,val}'
