Do some calculation in a text file in shell - linux

I have a text file:
$cat ifile.txt
this is a text file
assign x to 9 and y to 10.0702
define f(x)=x+y
I would like to disable the original line and divide the x-value by 2 and multiply the y-value by 2
My desired output is
$cat ofile.txt
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
Here 5 is calculated from 9/2, rounded up to the next integer, and 20.1404 is calculated from 10.0702 x 2, with no rounding.
I am thinking along the following lines, but can't turn it into a script:
if [ line contains "assign x to" ]; then new_x_value=[next word]/2
if [ line contains "and y to" ]; then new_y_value=[next word]x2
if [ line contains "assign x to" ];
then disable it and add a line "assign x to new_x_value and y to new_y_value"

Would you please try the following:
#!/bin/bash
pat="(assign x to )([[:digit:]]+)( and y to )([[:digit:].]+)"
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        echo "#$line"
        x2=$(echo "(${BASH_REMATCH[2]} + 1) / 2" | bc)
        y2=$(echo "${BASH_REMATCH[4]} * 2" | bc)
        echo "${BASH_REMATCH[1]}$x2${BASH_REMATCH[3]}$y2"
    else
        echo "$line"
    fi
done < ifile.txt > ofile.txt
Output:
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
The regex (assign x to )([[:digit:]]+)( and y to )([[:digit:].]+) matches
a literal string, followed by digits, followed by another literal string,
followed by digits possibly including a decimal point.
The bc command (${BASH_REMATCH[2]} + 1) / 2 calculates the ceiling of the
input divided by 2 (with bc's default scale of 0, the division is an integer division).
The next bc command ${BASH_REMATCH[4]} * 2 multiplies the input by 2.
I have picked bash simply because it supports back references in the regex, which makes it easier to parse and reuse the input parameters than with awk. As often pointed out, bash is not well suited to processing large files for performance reasons. If you plan to process large or multiple files, another language such as perl is recommended.
With perl you can say:
perl -pe 's|(assign x to )([0-9]+)( and y to )([0-9.]+)|
"#$&\n" . $1 . int(($2 + 1) / 2) . $3 . $4 * 2|ge' ifile.txt > ofile.txt
[EDIT]
If your ifile.txt looks like:
this is a text file
assign x to 9 and y to 10.0702 45
define f(x)=x+y
There is more than one space before the numbers.
One more value exists at the end (after whitespace).
Then please try the following instead:
pat="(assign x to +)([[:digit:]]+)( and y to +)([[:digit:].]+)( +)([[:digit:].]+)"
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        echo "#$line"
        x2=$(echo "(${BASH_REMATCH[2]} + 1) / 2" | bc)
        y2=$(echo "${BASH_REMATCH[4]} * 2" | bc)
        y3=$(echo "${BASH_REMATCH[6]} * 2" | bc)
        echo "${BASH_REMATCH[1]}$x2${BASH_REMATCH[3]}$y2${BASH_REMATCH[5]}$y3"
    else
        echo "$line"
    fi
done < ifile.txt > ofile.txt
Result:
this is a text file
#assign x to 9 and y to 10.0702 45
assign x to 5 and y to 20.1404 90
define f(x)=x+y
The plus sign after the whitespace is a regex quantifier that defines the number of repetitions. In this case it matches one or more whitespace characters.
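To see the + quantifier at work with bash's =~ operator, here is a toy check (not from the original post):

```shell
pat="(assign x to +)([[:digit:]]+)"
line="assign x to      9"
if [[ $line =~ $pat ]]; then
    echo "captured x value: ${BASH_REMATCH[2]}"    # captured x value: 9
fi
```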

One in awk:
awk '
/assign/ {                                        # when "assign" appears in the record
    for(i=1;i<=NF-1;i++)                          # iterate from the beginning
        if($i=="to" && $(i-1)=="x")               # if preceded by "x to"
            $(i+1)=((v=$(i+1)/2)>(u=int(v))?u+1:u)   # ceiling of the division
        else if($i=="to" && $(i-1)=="y")          # if preceded by "y to"
            $(i+1)*=2                             # multiply by 2
}1' file                                          # output
Output:
this is a text file
assign x to 5 and y to 20.1404
define f(x)=x+y
Sanity checking of the ceiling calculation is left as homework...
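If you do want to do the homework, the ceiling idiom can be exercised in isolation, outside the main script (a quick check with made-up values):

```shell
echo "9 8 7 1" | awk '{
    for (i = 1; i <= NF; i++) {
        v = $i / 2; u = int(v)                               # same idiom as above
        printf "%s%s", (v > u ? u + 1 : u), (i < NF ? " " : "\n")
    }
}'
# 5 4 4 1
```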

awk '{if(match($0,/^assign/)){b=$0;split($0,a," ");a[8]=a[8]/2;a[4]=a[4]/2; for (x in a) {c = a[x] " " c; $0 = "#" b "\n" c } } { print }}'
Demo :
:>awk ' { if(match ($0, /^assign/)) {b=$0;split($0,a," ");a[8]=a[8]/2; a[4]=a[4]/2; for (x in a) {c = a[x] " " c; $0 = "#" b "\n" c } } { print }}' <ifile
this is a text file
#assign x to 9 and y to 10.0702
to x assign 5.0351 to y and 4.5
define f(x)=x+y
:>
Explanation:
awk ' {
if(match ($0, /^assign/)) <--- $0 is whole input record. ^ is start of line.
We are checking if record is starting with "assign"
{b=$0; <-- Save the input record in variable b
split($0,a," "); <-- Create an array by splitting the input record with space as separator
a[8]=a[8]/2; a[4]=a[4]/2; <-- Divide the values stored at indexes 8 and 4
for (x in a) <-- Loop over all values of the array (note: the order of "for (x in a)" is unspecified, which is why the demo output above is scrambled)
{c = a[x] " " c; <-- Build a variable by concatenating the values of a
$0 = "#" b "\n" c <-- Update the current record; "\n" is the newline character
} }
{ print }}'


How to transpose values and output results in new file

My data :
"1,2,3,4,5,64,3,9",,,,,1,aine
"2,3,4,5",,,,,3,bb
"3,4,5,6,6,2",,,,,2,ff
I have to transpose values inside "...." delimiter like this : how to transpose values two by two using shell?
and Output the result (2 columns) in a new file with the filename = (last-1) columns digits. I have to transpose for each lines of my input file.
What I would like :
$ ls
1 2 3 4 5 6 7 8
example : cat 1
1 2
3 4
5 64
3 9
cat 2 :
3 4
5 6
6 2
cat 3 :
2 3
4 5
Bonus : If I can get every last words (last columns) as title of new files It would be perfect.
OK, it took some time, but I finally solved your problem with the code below:
#!/bin/bash
while read -r LINE; do
    FILE_NAME=$(echo "${LINE##*,,,,,}" | cut -d ',' -f 1 | tr -d "\"")
    DATA=$(echo "${LINE%%,,,,,*}" | tr -d "\"" | tr "," " ")
    touch "$FILE_NAME"
    i=1
    for num in $DATA; do
        echo -n "$num"
        if [[ $(($i % 2)) == 0 ]]; then
            echo ""
        else
            echo -n " "
        fi
        i=$((i+1))
    done > "$FILE_NAME"
done < input.txt
In my solution I assume that your input is placed in a file called input.txt and that all of your input lines use ,,,,, as the separator. It works like a charm with your sample input.
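The two parameter expansions the script relies on can be previewed with one of your sample lines:

```shell
LINE='"2,3,4,5",,,,,3,bb'
echo "${LINE##*,,,,,}"     # strips the longest prefix through ,,,,, -> 3,bb
echo "${LINE%%,,,,,*}"     # strips the suffix starting at ,,,,,    -> "2,3,4,5"
```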
Assuming there are no colons in the input (choose a different temporary delimiter if necessary) the first part can be done with:
awk '{s = ""; n = split($2,k,","); for(i = 1; i <= n; i+=2 ) { s = sprintf( "%s%c%s:%s", s, s ? ":" : "", k[i+1], k[i])} $2 = s}1' FS=\" OFS=\" input | sort -t , -k6n | tr : ,
eg:
$ cat input
"1,2,3,4,5,64,3,9",,,,,1,aine
"2,3,4,5",,,,,3,bb
"3,4,5,6,6,2",,,,,2,ff
$ awk '{s = ""; n = split($2,k,","); for(i = 1; i <= n; i+=2 ) { s = sprintf( "%s%c%s:%s", s, s ? ":" : "", k[i+1], k[i])} $2 = s}1' FS=\" OFS=\" input | sort -t , -k6n | tr : ,
"2,1,4,3,64,5,9,3",,,,,1,aine
"4,3,6,5,2,6",,,,,2,ff
"3,2,5,4",,,,,3,bb
But it's not clear why you want to do the first part at all when you can just skip straight to part 2 with:
awk '{n = split($2,k,","); m = split($3, j, ","); fname = j[6];
for( i = 1; i <= n; i+=2 ) printf("%d %d\n", k[i+1], k[i]) > fname}' FS=\" input
My answer can't keep up with the changes to the question! If you are outputting the lines into files, then there is no need to sort on the penultimate column. If you want the filenames to be the final column, it's not clear why you ever mentioned using the penultimate column at all. Just change fname in the above to j[7] to get the final column.
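For instance, the j[7] variant sketched below (the same awk code as above, with only fname changed) writes one file per input line, named after the final column; note the pairs come out swapped because the code prints k[i+1] before k[i]:

```shell
# create a one-line sample input (a hypothetical copy of the question's second line)
cat > input <<'EOF'
"2,3,4,5",,,,,3,bb
EOF
awk '{n = split($2,k,","); m = split($3, j, ","); fname = j[7];
for( i = 1; i <= n; i+=2 ) printf("%d %d\n", k[i+1], k[i]) > fname}' FS=\" input
cat bb
# 3 2
# 5 4
```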

AWK print every other column, starting from the last column (and next to last column) for N interations (print from right to left)

Hopefully someone out there in the world can help me, and anyone else with a similar problem, find a simple solution to capturing data. I have spent hours trying a one liner to solve something I thought was a simple problem involving awk, a csv file, and saving the output as a bash variable. In short here's the nut...
The Missions:
1) To output every other column, starting from the LAST COLUMN, with a specific iteration count.
2) To output every other column, starting from NEXT TO LAST COLUMN, with a specific iteration count.
The Data (file.csv):
#12#SayWhat#2#4#2.25#3#1.5#1#1#1#3.25
#7#Smarty#9#6#5.25#5#4#4#3#2#3.25
#4#IfYouLike#4#1#.2#1#.5#2#1#3#3.75
#3#LaughingHard#8#8#13.75#8#13#6#8.5#4#6
#10#AtFunny#1#3#.2#2#.5#3#3#5#6.5
#8#PunchLines#7#7#10.25#7#10.5#8#11#6#12.75
Desired results for Mission 1:
2#2.25#1.5#1#3.25
9#5.25#4#3#3.25
4#.2#.5#1#3.75
8#13.75#13#8.5#6
1#.2#.5#3#6.5
7#10.25#10.5#11#12.75
Desired results for Mission 2:
SayWhat#4#3#1#1
Smarty#6#5#4#2
IfYouLike#1#1#2#3
LaughingHard#8#8#6#4
AtFunny#3#2#3#5
PunchLines#7#7#8#6
My Attempts:
The closest I have come to solving any of the above problems is an ugly pipe (which is OK for skinning a cat) for Mission 1. However, it doesn't use any declared iterations (which should be 5). Also, I'm completely lost on solving Mission 2.
Any help to simplify the below and solving Mission 2 will be HELLA appreciated!
outcome=$( awk 'BEGIN {FS = "#"} {for (i = 0; i <= NF; i += 2) printf ("%s%c", $(NF-i), i + 2 <= NF ? "#" : "\n");}' file.csv | sed 's/##.*//g' | awk -F# '{for (i=NF;i>0;i--){printf $i"#"};printf "\n"}' | sed 's/#$//g' | awk -F# '{$1="";print $0}' OFS=# | sed 's/^#//g' );
Also, if doing a loop for a specific number of iterations is helpful in solving this problem, the magic number is 5. Maybe a solution could be a for loop that counts from right to left, skipping every other column per iteration, with the starting column declared as an awk variable (just a thought; I have no way of knowing how to do it).
Thank you for looking over this problem.
There are certainly more elegant ways to do this, but I am not really an awk person:
Part 1:
awk -F# '{ x = ""; for (f = NF; f > (NF - 5 * 2); f -= 2) { x = x ? $f "#" x : $f ; } print x }' file.csv
Output:
2#2.25#1.5#1#3.25
9#5.25#4#3#3.25
4#.2#.5#1#3.75
8#13.75#13#8.5#6
1#.2#.5#3#6.5
7#10.25#10.5#11#12.75
Part 2:
awk -F# '{ x = ""; for (f = NF - 1; f > (NF - 5 * 2); f -= 2) { x = x ? $f "#" x : $f ; } print x }' file.csv
Output:
SayWhat#4#3#1#1
Smarty#6#5#4#2
IfYouLike#1#1#2#3
LaughingHard#8#8#6#4
AtFunny#3#2#3#5
PunchLines#7#7#8#6
The literal 5 in each of those is your "number of iterations."
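If you would rather pass the iteration count as a parameter than edit the literal, awk's -v works (a small variation on the answer above, not the answerer's exact code):

```shell
# one sample line from the question's file.csv
cat > file.csv <<'EOF'
#12#SayWhat#2#4#2.25#3#1.5#1#1#1#3.25
EOF
awk -F# -v n=5 '{ x = ""; for (f = NF; f > NF - n * 2; f -= 2) { x = x ? $f "#" x : $f ; } print x }' file.csv
# 2#2.25#1.5#1#3.25
```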
Sample data:
$ cat mission.dat
#12#SayWhat#2#4#2.25#3#1.5#1#1#1#3.25
#7#Smarty#9#6#5.25#5#4#4#3#2#3.25
#4#IfYouLike#4#1#.2#1#.5#2#1#3#3.75
#3#LaughingHard#8#8#13.75#8#13#6#8.5#4#6
#10#AtFunny#1#3#.2#2#.5#3#3#5#6.5
#8#PunchLines#7#7#10.25#7#10.5#8#11#6#12.75
One awk solution:
NOTE: OP can add logic to validate the input parameters.
$ cat mission
#!/bin/bash
# format: mission { 1 | 2 } { number_of_fields_to_display }
mission=${1} # assumes user inputs "1" or "2"
offset=$(( mission - 1 )) # subtract one to determine awk/NF offset
iteration_count=${2} # assume for now this is a positive integer
awk -F"#" -v offset=${offset} -v itcnt=${iteration_count} 'BEGIN { OFS=FS }
{ # we will start by counting fields backwards until we run out of fields
# or we hit "itcnt==iteration_count" fields
loopcnt=0
for (i=NF-offset ; i>=0; i-=2) # offset=0 for mission=1; offset=1 for mission=2
{ loopcnt++
if (loopcnt > itcnt)
break
fstart=i # keep track of the field we want to start with
}
# now printing our fields starting with field # "fstart";
# prefix the first printf with an empty string, then each successive
# field is prefixed with OFS=#
pfx = ""
for (i=fstart; i<= NF-offset; i+=2)
{ printf "%s%s",pfx,$i
pfx=OFS
}
# terminate a line of output with a linefeed
printf "\n"
}
' mission.dat
Some test runs:
###### mission #1
# with offset/iteration = 4
$ mission 1 4
2.25#1.5#1#3.25
5.25#4#3#3.25
.2#.5#1#3.75
13.75#13#8.5#6
.2#.5#3#6.5
10.25#10.5#11#12.75
#with offset/iteration = 5
$ mission 1 5
2#2.25#1.5#1#3.25
9#5.25#4#3#3.25
4#.2#.5#1#3.75
8#13.75#13#8.5#6
1#.2#.5#3#6.5
7#10.25#10.5#11#12.75
# with offset/iteration = 6
$ mission 1 6
12#2#2.25#1.5#1#3.25
7#9#5.25#4#3#3.25
4#4#.2#.5#1#3.75
3#8#13.75#13#8.5#6
10#1#.2#.5#3#6.5
8#7#10.25#10.5#11#12.75
###### mission #2
# with offset/iteration = 4
$ mission 2 4
4#3#1#1
6#5#4#2
1#1#2#3
8#8#6#4
3#2#3#5
7#7#8#6
# with offset/iteration = 5
$ mission 2 5
SayWhat#4#3#1#1
Smarty#6#5#4#2
IfYouLike#1#1#2#3
LaughingHard#8#8#6#4
AtFunny#3#2#3#5
PunchLines#7#7#8#6
# with offset/iteration = 6;
# notice we pick up field #1 = empty string so output starts with a '#'
$ mission 2 6
#SayWhat#4#3#1#1
#Smarty#6#5#4#2
#IfYouLike#1#1#2#3
#LaughingHard#8#8#6#4
#AtFunny#3#2#3#5
#PunchLines#7#7#8#6
This is probably not what you're asking, but perhaps it will give you an idea.
$ awk -F_ -v skip=4 -v endoff=0 '
BEGIN {OFS=FS}
{offset=(NF-endoff)%skip;
for(i=offset;i<=NF-endoff;i+=skip) printf "%s",$i (i>=(NF-endoff)?ORS:OFS)}' file
112_116_120
122_126_130
132_136_140
142_146_150
You specify the number of columns to skip and the end offset as input variables. Here the end offset is set to zero (the last column) and the skip is 4.
For clarity I used the input file
$ cat file
_111_112_113_114_115_116_117_118_119_120
_121_122_123_124_125_126_127_128_129_130
_131_132_133_134_135_136_137_138_139_140
_141_142_143_144_145_146_147_148_149_150
changing FS for your format should work.

Awk: How do I count occurrences of a string across columns and find the maximum across rows?

I have a problem with my bash script on Linux.
My input looks like this:
input
Karydhs y n y y y n n y n n n y n y n
Markopoulos y y n n n y n y n y y n n n y
name3 y n y n n n n n y y n y n y n
etc...
where y = yes and n = no; these are the results of voting. Now, using awk, I want to display the name and the total yes votes of each person, and the person that wins (gets the most y). Any ideas?
I do something like this:
awk '{count=0 for (I=1;i<=15;i++) if (a[I]="y") count++} {print $1,count}' filename
Here is a fast (no sort required, no explicit "for" loop), one-pass solution that takes into account the possibility of ties:
awk 'NF==0{next}
{name=$1; $1=""; gsub(/[^y]/,"",$0); l=length($0);
print name, l;
if (mx=="" || mx < l) { mx=l; tie=""; winner=name; }
else if (mx == l) {
tie = 1; winner = winner", "name;
}
}
END {fmt = tie ? "The winners have won %d votes each:\n" :
"The winner has won %d votes:\n";
printf fmt, mx;
print winner;
}'
Output:
Karydhs 7
Markopoulos 7
name3 6
The winners have won 7 votes each:
Karydhs, Markopoulos
NOTE: The program above is presented for readability, but is accepted with the line breaks shown by GNU awk. Certain awks disallow splitting the ternary conditional.
What about this?
awk '{ for (i=2;i<NF;i++) { if ($i=="y") { a[$1" "$i]++} } } END { print "Yes tally"; l=0; for (i in a) { print i,a[i]; if (l>a[i]) { l=l } else { l=a[i];name=i } } split(name,a," "); print "Winner is ",a[1],"with ",l,"votes" } ' f
Yes tally
name3 y 6
Markopoulos y 6
Karydhs y 7
Winner is Karydhs with 7 votes
Here's yet another approach.
{ name=$1; $1=""; votes[name]=length(gensub("[^y]","","g")); }
END {asorti(votes,rank); for (r in rank) print rank[r], votes[rank[r]]; }
It is similar to the answer from @mklement0, but it uses asorti()¹ to sort inside of awk.
name=$1 saves the name from token 1
$1=""; clears token 1, which has the side effect of removing it from $0
votes[name] is an array indexed by the candidate's name
gensub("[^y]","","g") removes everything but 'y's from what's left of $0
and length() counts them
asorti(votes,rank) sorts votes by index into rank; at this point the arrays look like this:
votes                     rank
[name3]       = 6         [1] = Karydhs
[Markopoulos] = 7         [2] = Markopoulos
[Karydhs]     = 7         [3] = name3
for (r in rank) print rank[r], votes[rank[r]]; prints the results:
Karydhs 7
Markopoulos 7
name3 6
¹ the asorti() function may not be available in some versions of awk
Alternative two-pass awk
$ awk '{print $1; $1=""}1' votes |
awk -Fy 'NR%2{printf "%s ",$0; next} {print NF-1}' |
sort -k2nr
Karydhs 7
Markopoulos 7
name3 6
A simpler - and POSIX-compliant - awk solution, assisted by sort; note that no winner information (which may apply to multiple lines) is explicitly printed, but the sorting by votes in descending order should make the winner(s) obvious.
awk '{
printf "%s", $1
$1=""
yesCount=gsub("y", "")
printf " %s\n", yesCount
}' file |
sort -t ' ' -k2,2nr
printf "%s", $1 prints the name field only, without a trailing newline.
$1="" clears the 1st field, causing $0, the input line, to be rebuilt so that it contains the vote columns only.
yesCount=gsub("y", "") performs a dummy substitution that takes advantage of the fact that Awk's gsub() function returns the count of replacements performed; in effect, the return value is the number of y values on the line.
printf " %s\n", yesCount then prints the number of yes votes as the second output field and terminates the line.
sort -t ' ' -k2,2nr then sorts the resulting lines by the second (-k2,2) space-separated (-t ' ') field, numerically (n), in reverse order (r) so that the highest yes-vote counts appear first.
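The counting trick is easy to check on its own; gsub() returns the number of substitutions it performed:

```shell
echo "y n y y n" | awk '{ print gsub("y", "") }'
# 3
```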

Insert new line if different number appears in a column

I have a column
1
1
1
2
2
2
I would like to insert a blank line when the value in the column changes:
1
1
1
<- blank line
2
2
2
I would recommend using awk:
awk -v i=1 'NR>1 && $i!=p { print "" }{ p=$i } 1' file
On any line after the first, if value of the "i"th column is different to the previous value, print a blank line. Always set the value of p. The 1 at the end evaluates to true, which means that awk prints the line. i can be set to the column number of your choice.
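A quick run on a tiny stream (the same one-liner; sample data made up here):

```shell
printf '1\n1\n2\n2\n3\n' | awk -v i=1 'NR>1 && $i!=p { print "" }{ p=$i } 1'
# 1
# 1
#
# 2
# 2
#
# 3
```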
while read L; do [[ "$L" != "$PL" && "$PL" != "" ]] && echo; echo "$L"; PL="$L"; done < file
awk(1) seems like the obvious answer to this problem:
#!/usr/bin/awk -f
BEGIN { prev = "" }
/./ {
if (prev != "" && prev != $1) print ""
print
prev = $1
}
You can also do this with SED:
sed '{N;s/^\(.*\)\n\1$/\1\n\1/;tx;P;s/^.*\n/\n/;P;D;:x;P;D}'
The long version with explanations is:
sed '{
N # read second line; (terminate if there are no more lines)
s/^\(.*\)\n\1$/\1\n\1/ # try to replace two identical lines with themselves
tx # if replacement succeeded then goto label x
P # print the first line
s/^.*\n/\n/ # replace first line by empty line
P # print this empty line
D # delete empty line and proceed with input
:x # label x
P # print first line
D # delete first line and proceed with input
}'
One thing I like about using (GNU) sed (though it's not clear from your question whether this matters to you) is that you can easily apply the changes in place with the -i switch, e.g.
sed -i '{N;s/^\(.*\)\n\1$/\1\n\1/;tx;P;s/^.*\n/\n/;P;D;:x;P;D}' FILE
You could use the getline function in awk to match the current line against the following one:
awk '{f=$1; print; getline}f != $1{print ""}1' file

Is there such commands to merge multiple files in shell?

For example, there're 5 numbers => [1,2,3,4,5] and 3 groups
File1(Group1):
1
3
5
File2(Group2):
3
4
File3(Group3):
1
5
Output (column1: whether in Group1, column2: whether in Group2, column3: whether in Group3 [NA means not..]):
1 NA 1
3 3 NA
NA 4 NA
5 NA 5
Or something like this (+ means in, - means not):
1 + - +
3 + + -
4 - + -
5 + - +
I tried join and merge, but it looks like neither of them works well for multiple files (for example, 8 files).
You say there are numbers 1-5, but as far as I can see this is irrelevant to the output you want: you only use numbers found in your files. This code will do what you want:
use strict;
use warnings;
use feature 'say';
my @hashes;
my %seen;
local $/;                                       # read entire file at once
while (<>) {
    my @nums = split;                           # split file into elements
    $seen{$_}++ for @nums;                      # dedupe elements
    push @hashes, { map { $_ => $_ } @nums };   # map into hash
}
my @all = sort { $a <=> $b } keys %seen;        # sort deduped elements
# my @all = 1 .. 5;                             # OR: provide hard-coded list
for my $num (@all) {                            # for all unique numbers
    my @fields;
    for my $href (@hashes) {                    # check each hash
        push @fields, $href->{$num} // "NA";    # enter "NA" if not found
    }
    say join "\t", @fields;                     # print the fields
}
You may replace the sorted deduped list in @all with just my @all = 1 .. 5 or any other valid list. It will then add lines for those numbers and print extra "NA" fields for the missing values.
You should also be aware that this relies on your file contents being numbers, but only as far as the sorting of the @all array is concerned, so if you replace it with your own list or your own sorting routine, you can use any value.
This script will take an arbitrary number of files and process them. For example:
$ perl script.pl f1.txt f2.txt f3.txt
1 NA 1
3 3 NA
NA 4 NA
5 NA 5
Credit to Brent Stewart for figuring out what the OP meant.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

my @lines;
my $filecount = 0;

# parse
for my $filename (@ARGV){
    open my $fh, '<', $filename;
    while( my $line = <$fh> ){
        chomp($line);
        next unless length $line;
        $lines[$line][$filecount]++;
    }
    close $fh;
}continue{
    $filecount++;
}

# print
for my $linenum ( 1..$#lines ){
    my $line = $lines[$linenum];
    next unless $line;
    print ' ' x (5-length $linenum), $linenum, ' ';
    for my $elem( @$line ){
        print $elem ? 'X' : ' ';
    }
    print "\n";
}
    1 X X
    3 XX
    4  X
    5 X X
For two files, you can easily use join as shown below (assuming file1 and file2 are sorted):
$ join -e NA -o 1.1,2.1 -a 1 -a 2 file1 file2
1 NA
3 3
NA 4
5 NA
It gets more complicated if you have more than two files though.
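One way it can still be done with join alone is to chain the joins, carrying the key along with -o 0 and dropping it at the end. A sketch for three files (assuming all inputs are sorted, as join requires):

```shell
# sorted sample inputs, as in the question
printf '1\n3\n5\n' > file1
printf '3\n4\n' > file2
printf '1\n5\n' > file3

join -a1 -a2 -e NA -o 0,1.1,2.1 file1 file2 |   # key + file1 col + file2 col
join -a1 -a2 -e NA -o 0,1.2,1.3,2.1 - file3 |   # key + previous cols + file3 col
cut -d' ' -f2-                                  # drop the join key
# 1 NA 1
# 3 3 NA
# NA 4 NA
# 5 NA 5
```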
Here is a brute force grep solution:
#!/bin/bash
files=(file1 file2 file3)
sort -nu "${files[@]}" | while read line; do
    for f in "${files[@]}"; do
        if grep -qFx "$line" "$f"; then
            printf "${line}\t"
        else
            printf "NA\t"
        fi
    done
    printf "\n"
done
Output:
1 NA 1
3 3 NA
NA 4 NA
5 NA 5
If your input files are monotonically increasing and just consist of a single integer on each line as your input samples suggest, you could simply pre-process the input files and use paste:
for i in file{1,2,3}; do # List input files
awk '{ a += 1; while( $1 > a ) { print "NA"; a += 1 }} 1' $i > $i.out
done
paste file{1,2,3}.out
This leaves the trailing entries in some columns empty. Fixing that is left as an exercise for the reader.