How to format decimal space using awk in linux - linux

original file :
a|||a 2 0.111111
a|||book 1 0.0555556
a|||is 2 0.111111
now i need to control third columns with 6 decimal space
after i tried awk {'print $1,$2; printf "%.6f\t",$3'}
but the output is not what I want
result :
a|||a 2
0.111111 a|||book 1
0.055556 a|||is 2
that's weird , how can I do that will just modify third columns

Your print() is adding a newline character. Include your third field inside it, but formatted. Try with sprintf() function, like:
awk '{print $1,$2, sprintf("%.6f", $3)}' infile
That yields:
a|||a 2 0.111111
a|||book 1 0.055556
a|||is 2 0.111111

Print adds a newline on the end of printed strings, whereas printf by default doesn't. This means a newline is added after every second field and none is added after the third.
You can use printf for the whole string and manually add a newline.
Also I'm not sure why you are adding a tab to the end of the lines, so i removed that
awk '{printf "%s %d %.6f\n",$1,$2,$3}' file
a|||a 2 0.111111
a|||book 1 0.055556
a|||is 2 0.111111

Related

How can a Linux cpuset be iterated over with shell?

Linux's sys filesystem represents sets of CPU ids with the syntax:
0,2,8: Set of CPUs containing 0, 2 and 8.
4-6: Set of CPUs containing 4, 5 and 6.
Both syntaxes can be mixed and matched, for example: 0,2,4-6,8
For example, running cat /sys/devices/system/cpu/online prints 0-3 on my machine which means CPUs 0, 1, 2 and 3 are online.
The problem is the above syntax is difficult to iterate over using a for loop in a shell script. How can the above syntax be converted to one more conventional such as 0 2 4 5 6 8?
Try:
$ echo 0,2,4-6,8 | awk '/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next} 1' ORS=' ' RS=, FS=-
0 2 4 5 6 8
This can be used in a loop as follows:
for n in $(echo 0,2,4-6,8 | awk '/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next} 1' RS=, FS=-)
do
echo cpu="$n"
done
Which produces the output:
cpu=0
cpu=2
cpu=4
cpu=5
cpu=6
cpu=8
Or like:
printf "%s" 0,2,4-6,8 | awk '/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next} 1' RS=, FS=- | while read n
do
echo cpu="$n"
done
Which also produces:
cpu=0
cpu=2
cpu=4
cpu=5
cpu=6
cpu=8
How it works
The awk command works as follows:
RS=,
This tells awk to use , as the record separator.
If, for example, the input is 0,2,4-6,8, then awk will see four records: 0 and 2 and 4-6 and 8.
FS=-
This tells awk to use - as the field separator.
With FS set this way and if, for example, the input record consists of 2-4, then awk will see 2 as the first field and 4 as the second field.
/-/{for (i=$1; i<=$2; i++)printf "%s%s",i,ORS;next}
For any record that contains -, we print out each number starting with the value of the first field, $1, and ending with the value of the second field, $2. Each such number is followed by the Output Record Separator, ORS. By default, ORS is a newline character. For some of the examples above, we set ORS to a blank.
After we have printed these numbers, we skip the rest of the commands and jump to the next record.
1
If we get here, then the record did not contain - and we print it out as is. 1 is awk's shorthand for print-the-line.
A Perl one:
echo "0,2,4-6,8" | perl -lpe 's/(\d+)-(\d+)/{$1..$2}/g; $_="echo {$_}"' | bash
Just convert the original string into echo {0,2,{4..6},8} and let bash 'brace expansion' to interpolate it.
eval echo $(cat /sys/devices/system/cpu/online | sed 's/\([[:digit:]]\+\)-\([[:digit:]]\+\)/$(seq \1 \2)/g' | tr , ' ')
Explanation:
cat /sys/devices/system/cpu/online reads the file from sysfs. This can be changed to any other file such as offline.
The output is piped through the substitution s/\([[:digit:]]\+\)-\([[:digit:]]\+\)/$(seq \1 \2)/g. This matches something like 4-6 and replaces it with $(seq 4 6).
tr , ' ' replaces all commas with spaces.
At this point, the input 0,2,4-6,8 is transformed to 0 2 $(seq 4 6) 8. The final step is to eval this sequence to get 0 2 4 5 6 8.
The example echo's the output. Alternatively, it can be written to a variable or used in a for loop.

Extract substring from first column

I have a large text file with 2 columns. The first column is large and complicated, but contains a name="..." portion. The second column is just a number.
How can I produce a text file such that the first column contains ONLY the name, but the second column stays the same and shows the number? Basically, I want to extract a substring from the first column only AND have the 2nd column stay unaltered.
Sample data:
application{id="1821", name="app-name_01"} 0
application{id="1822", name="myapp-02", optionalFlag="false"} 1
application{id="1823", optionalFlag="false", name="app_name_public"} 3
...
So the result file would be something like this
app-name_01 0
myapp-02 1
app_name_public 3
...
If your actual Input_file is same as the shown sample then following code may help you in same.
awk '{sub(/.*name=\"/,"");sub(/\".* /," ")} 1' Input_file
Output will be as follows.
app-name_01 0
myapp-02 1
app_name_public 3
Using GNU awk
$ awk 'match($0,/name="([^"]*)"/,a){print a[1],$NF}' infile
app-name_01 0
myapp-02 1
app_name_public 3
Non-Gawk
awk 'match($0,/name="([^"]*)"/){t=substr($0,RSTART,RLENGTH);gsub(/name=|"/,"",t);print t,$NF}' infile
app-name_01 0
myapp-02 1
app_name_public 3
Input:
$ cat infile
application{id="1821", name="app-name_01"} 0
application{id="1822", name="myapp-02", optionalFlag="false"} 1
application{id="1823", optionalFlag="false", name="app_name_public"} 3
...
Here's a sed solution:
sed -r 's/.*name="([^"]+).* ([0-9]+)$/\1 \2/g' Input_file
Explanation:
With the parantheses your store in groups what's inbetween.
First group is everything after name=" till the first ". [^"] means "not a double-quote".
Second group is simply "one or more numbers at the end of the line preceeded with a space".

Replace every n'th occurrence in huge line in a loop

I have this line for example:
1,2,3,4,5,6,7,8,9,10
I want to insert a newline (\n) every 2nd occurrence of "," (replace the 2nd, with newline) .
This might work for you (GNU sed):
sed 's/,/\n/2;P;D' file
If I understand what you're trying to do correctly, then
echo '1,2,3,4,5,6,7,8,9,10' | sed 's/\([^,]*,[^,]*\),/\1\n/g'
seems like the most straightforward way. \([^,]*,[^,]*\) will capture 1,2, 3,4, and so forth, and the commas between them are replaced with newlines through the usual s///g. This will print
1,2
3,4
5,6
7,8
9,10
Can't add comment to wintermutes answer but it doesn't need the first , section as it will have to have had a previous field to be comma separated.
sed 's/\(,[^,]*\),/\1\n/g'
Will work the same
Also I'll add another alternative( albeit worse and leaves a trailing newline)
echo "1,2,3,4,5,6,7,8,9,10" | xargs -d"," -n2 | tr ' ' ','
I would use awk to do this:
$ awk -F, '{ for (i=1; i<=NF; ++i) printf "%s%s", $i, (i%2?FS:RS) }' file
1,2
3,4
5,6
7,8
9,10
It loops through each field, printing each one followed by either the field separator (defined as a comma) or the record separator (a newline) depending on the value of i%2.
It's slightly longer than the sed versions presented by others, although one nice thing about it is that you can alter the number of columns per line easily by changing the 2 to whatever value you like.
To avoid a trailing comma after the last field in the case where the number of fields isn't evenly divisible, you can change the ternary to i<NF&&i%2?FS:RS.

Removing last column from rows that have three columns using bash

I have a file that contains several lines of data. Some lines contain three columns, but most contain only two. All lines are single-tab separated. For those that contain three columns, the third column is typically redundant and contains the same data as the second so I'd like to remove it.
I imagine awk or cut would be appropriate, but I'm drawing a blank on how to test the row for three columns so my script will only work on those rows. I know awk is a very powerful language with logic and whatnot built into it, I'm just not that strong with it.
I looked at a similar question, but I'm not sure what is going on with the awk answer. Should the -4 be -1 since I only want to remove one column? What about if the row has two columns; will it remove the second even though I don't want to do anything?
I modified it to what I think it would be:
awk -F"\t" -v OFS="\t" '{ for (i=1;i<=NF-4;i++){ print $i }}'
But when I run it (with the file) nothing happens. If I change NF-1 or NF-2 I get some output, but it only a handful of lines and only the first column.
Can anyone clue me into what I should be doing?
If you just want to remove the third column, you could just print the first and the second:
awk -F '\t' '{print $1 "\t" $2}'
And it's similar to cut:
cut -f 1,2
The awk variable NF gives you the number for fields. So an expression like this should work for you.
awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
Running it on an input file like so
a,b,c
x,y
u,v,w
l,m
gives me
$ cat test | awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
a,b
x,y
u,v
l,m
This might work for you (GNU sed):
sed 's/\t[^\t]*//2g' file
Restricts the file to two columns.
awk 'NF==3{print $1"\t"$2}NF==2{print}' your_file
Testde below:
> cat temp
1 2
3 4 5
6 7
8 9 10
>
> awk 'NF==3{print $1"\t"$2}NF==2{print}' temp
1 2
3 4
6 7
8 9
>
or in a much more simplere way in awk:
awk 'NF==3{print $1"\t"$2}NF==2' your_file
Or you can also go with perl:
perl -lane 'print "$F[0]\t$F[1]"' your_file

Separate comma delimited cells to new rows with shell script

I have a table with comma delimited columns and I want to separate the comma delimited values in my specified column to new rows. For example, the given table is
Name Start Name2
A 1,2 X,a
B 5 Y,b
C 6,7,8 Z,c
And I need to separate the comma delimited values in column 2 to get the table below
Name Start Name2
A 1 X,a
A 2 X,a
B 5 Y,b
C 6 Z,c
C 7 Z,c
C 8 Z,c
I am wondering if there is any solution with shell script, so that I can create a workflow pipe.
Note: the original table may contain more than 3 columns.
Assuming the format of your input and output does not change:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $NF; print $1, $3, $NF}' input_file
Input:
input_file:
A 1,2 X
B 5,6 Y
Output:
A 1 X
A 2 X
B 5 Y
B 6 Y
Explanation:
awk: invoke awk, a tool for manipulating lines (records) and fields
'...': content enclosed by single-quotes are supplied to awk as instructions
'BEGIN{FS="[ ,]"}: before reading any lines, tell awk to use both space and comma as delimiters; FS stands for Field Separator.
{print $1, $2, $NF; print $1, $3, $NF}: For each input line read, print the 1st, 2nd and last field on one line, and then print the 1st, 3rd, and last field on the next line. NF stands for Number of Fields, so $NF is the last field.
input_file: supply the name of the input file to awk as an argument.
In response to updated input format:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $4","$5; print $1, $3, $4","$5}' input_file
After Runner's modification of the original question another approach might look like this:
#!/bin/sh
# Usage $0 <file> <column>
#
FILE="${1}"
COL="${2}"
# tokens separated by linebreaks
IFS="
"
for LINE in `cat ${FILE}`; do
# get number of columns
COLS="`echo ${LINE} | awk '{print NF}'`"
# get actual field by COL, this contains the keys to be splitted into individual lines
# replace comma with newline to "reuse" newline field separator in IFS
KEYS="`echo ${LINE} | cut -d' ' -f${COL}-${COL} | tr ',' '\n'`"
COLB=$(( ${COL} - 1 ))
COLA=$(( ${COL} + 1 ))
# get text from columns before and after actual field
if [ ${COLB} -gt 0 ]; then
BEFORE="`echo ${LINE} | cut -d' ' -f1-${COLB}` "
else
BEFORE=""
fi
AFTER=" `echo ${LINE} | cut -d' ' -f${COLA}-`"
# echo "-A: $COLA ($AFTER) | B: $COLB ($BEFORE)-"
# iterate keys and re-build original line
for KEY in ${KEYS}; do
echo "${BEFORE}${KEY}${AFTER}"
done
done
With this shell file you might do what you want. This will split column 2 into multiple lines.
./script.sh input.txt 2
If you'd like to pass inputs though standard input using pipes (e.g. to split multiple columns in one go) you could change the 6. line to:
if [ "${1}" == "-" ]; then
FILE="/dev/stdin"
else
FILE="${1}"
fi
And run it this way:
./script.sh input.txt 1 | ./script.sh - 2 | ./script.sh - 3
Note that cut is very sensitiv about the field separators. Soif the line starts with a space character, column 1 would be "" (empty). If the fields were separated by amixture of spaces and tabs this script would have other issues too. In this case (as explained above) filtering the input resource (so that fields are only separated by one space character) should do it. If this is not possible or the data in each column contains space characters too, the script might get more complicated.

Resources