AWK make a simple subtract and find the minimum value of that - linux

I have this matrix:
{{1,4},{6,8}}
and I want to subtract the second value from the first value like: 4-1 and 8-6
and then compare both results and show the minimum of the two, in this case: 8-6=2
All of this using AWK in terminal

You seem a little confused about whether you want to subtract the first from the second or the second from the first. Also, about whether your data is in a file or a variable. However, this should get you started...
If we replace any mad braces or commas with spaces:
echo "{{1,4},{6,8}}" | awk '{gsub(/[{},]/," "); print}'
1 4 6 8
Now we can access the fields as $1 through $4 and do what you want:
echo "{{1,4},{6,8}}" | awk '{gsub(/[{},]/," "); x=$2-$1; y=$4-$3; if(x<y)print x; else print y}'
2
As a perhaps more elegant alternative, suggested by @3161993 in the comments, you could set the field separator to be one or more open or close braces or commas, like this:
awk -F '[,{}]+' '{x=$3-$2; y=$5-$4; if(x<y) print x; else print y}' <<< "{{1,4},{6,8}}"
2
And, as @EdMorton kindly pointed out, it can be made a bit more succinct with a ternary operator like this:
awk -F '[,{}]+' '{x=$3-$2; y=$5-$4; print (x<y ? x : y)}' <<< "{{1,4},{6,8}}"
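If the matrix can hold more than two pairs, the same field-separator idea extends to a loop. This is only a sketch: the function name min_pair_diff is made up for illustration, and it assumes all pairs sit on one input line in the {{a,b},{c,d},...} form shown in the question.

```shell
# Print the minimum of (second - first) over every {a,b} pair on each line.
# min_pair_diff is a hypothetical helper name, not part of the answers above.
min_pair_diff() {
  awk -F '[,{}]+' '{
    min = ""
    # With this separator $1 is empty, so the pairs start at $2.
    for (i = 2; i + 1 <= NF; i += 2) {
      d = $(i + 1) - $i
      if (min == "" || d < min) min = d
    }
    print min
  }'
}

printf '%s\n' '{{1,4},{6,8}}' | min_pair_diff
printf '%s\n' '{{1,4},{6,8},{10,11}}' | min_pair_diff
```

The first call prints 2 and the second prints 1.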

Related

AWK - string containing required fields

I thought it would be easy to define a string such as "1 2 3" and use it within AWK (GAWK) to extract the required fields, how wrong I have been.
I have tried creating AWK arrays, BASH arrays, splitting, string substitution etc, but could not find any method to use the resulting 'chunks' (ie the column/field numbers) in a print statement.
I believe Akshay Hegde has provided an excellent solution with the get_cols function, here
but it was over 8 years ago, and I am really struggling to work out 'how it works', namely, what this is doing;
s = length(s) ? s OFS $(C[i]) : $(C[i])
I am unable to post a comment asking for clarification due to my lack of reputation (and it is an old post).
Is someone able to explain how the solution works?
NB I don't think I need the sub, as I am using the following to clean up (replace all non-numeric characters with a comma, i.e. a separator, and sort numerically):
Columns=$(echo $Input_string | sed 's/[^0-9]\+/,/g')
Columns=$(echo $Columns | xargs -n1 | sort -n | xargs)
(using this string, the awk would be Executed using awk -v cols=$Columns -f test.awk infile in the given solution)
Given the informative answer from @Ed Morton, with a nice worked example, I have attempted to remove the need for a function (and also an additional awk program file). The intention is to have this within a shell script, and I would rather it be self-contained, but I also wanted to investigate further 'how it works'.
Fields="1 2 3"
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print "s="s " arr1="Column[1]" arr2="Column[2]" arr3="Column[3]}'
The results have surprised me (taking note of my Comment to Ed)
s=1 2 3 arr1=1 arr2=2 arr3=3
The above clearly shows the split has worked into the array, but I thought s would include $ for each ternary operator concatenation, ie "$1 $2 $3"
Moreso, I was hoping to append the actual file to the above command, which I have found allows me to use echo $string | awk '{program}' file.name
NB it is a little insulting that my question has been marked as -1 indicating little research effort, as I have spent days trying to work this out.
Taking all the information above, I think s results in "1 2 3", but the print doesn't accept this in the same way as it does as it is called from a function, simply trying to 'print 1 2 3' in relation to the file, which seems to be how all my efforts have ended up.
This really confuses me, as Ed's 'diagonal' example works from command line, indicating that concept of 'print s' is absolutely fine when used with a file name input.
Can anyone suggest how this (example below) can work?
I don't know if using echo pipe and appending the file name is strictly allowed, but it appears to work (?!?!?!)
(failed result)
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print s}' myfile.txt
This appears to go through myfile.txt and output all lines containing many comma separated values, ie the whole file (I haven't included the values, just for illustration only)
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
what this is doing; s = length(s) ? s OFS $(C[i]) : $(C[i])
You have encountered the ternary operator, which has the following syntax:
condition ? valueiftrue : valueiffalse
The length function, when given a single argument, returns the number of characters in it. In GNU AWK the integer 0 is considered false and any other integer is considered true, so in this case it is a "not empty" check. When s is not empty, it is concatenated with the output field separator (OFS, a space by default) and the value of the C[i]-th field, and the result is assigned to s. When s is empty (which includes not yet initialized, as GNU AWK treats an unset variable as an empty string), s is simply assigned the value of the C[i]-th field. Used repeatedly, this builds a string of values separated by OFS. Consider the following simple example: say you want to get the diagonal of a 2D matrix stored in file.txt with the following content:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
then you might do
awk '{s = length(s) ? s OFS $(NR) : $(NR)}END{print s}' file.txt
which will get output
1 7 13 19 25
Explanation: NR is the number of the current row (record), so for the 1st row $(NR) is the 1st field, for the 2nd row it is the 2nd field, for the 3rd row it is the 3rd field, and so on.
(tested in GNU Awk 5.0.1)
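The same idiom can pick out a list of fields from every line of a file. The following is a minimal sketch rather than the original get_cols function: the helper name pick_fields is invented here, and it assumes the field list is passed in with -v instead of being piped in, because awk treats anything arriving on its input as data to be processed.

```shell
# Print the fields named in a space-separated list, in that order.
# pick_fields is a hypothetical helper name used only for this sketch.
pick_fields() {
  awk -v cols="$1" '
    BEGIN { n = split(cols, C, " ") }   # C[1..n] hold the field numbers
    {
      s = ""                            # start a fresh string for each line
      for (i = 1; i <= n; i++)
        s = length(s) ? s OFS $(C[i]) : $(C[i])
      print s
    }'
}

printf '%s\n' 'a b c d' 'e f g h' | pick_fields "1 3"
```

This prints "a c" and then "e g". Note that $(C[i]) is the content of field number C[i], not the text "$1", which is why s never contains a literal dollar sign.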

BASH - Extract Data from String

I have a log that returns thousands of lines of data, I want to extract a few values from that.
In the log there is only one line containing the unique unit reference, so I can grep for that using:
grep "unit=Central-C152" logfile.txt
That produces a line of output similar to the following:
a3cd23e,85d58f5,53f534abef7e7,unit=Central-C152,locale=32325687-8595-9856-1236-12546975,11="School",1="Mr Green",2="Qual",3="SWE",8="report",5="channel",7="reset",6="velum"
The format of the line may change in that the order of the values won't always be in the same position.
I'm trying to work out how to get the value of 2 and 7 in to separate variables.
I had thought about cut on , or = but as the values aren't in a set order I couldn't work out that best way to do it.
I'm trying to get:
var state=value of 2 without quotes
var mode=value of 7 without quotes
Can anyone advise on the best way to do this ?
Thanks
Could you please try the following to set the variables' values:
state=$(awk '/unit=Central-C152/ && match($0,/2=\"[^"]*/){print substr($0,RSTART+3,RLENGTH-3)}' Input_file)
mode=$(awk '/unit=Central-C152/ && match($0,/7=\"[^"]*/){print substr($0,RSTART+3,RLENGTH-3)}' Input_file)
You could print them too by doing the following:
echo "$state"
echo "$mode"
Explanation of the command:
awk ' ##Start the awk program.
/unit=Central-C152/ && match($0,/2=\"[^"]*/){ ##If the line contains unit=Central-C152 and the regex matches from 2=" up to (not including) the closing quote,
  print substr($0,RSTART+3,RLENGTH-3) ##print the match without its 3-character prefix 2=" (start at RSTART+3, length RLENGTH-3).
}
' Input_file ##Name of the input file.
You are probably better off doing all of the processing in Awk.
awk -F, '/unit=Central-C152/ {
  for(i=1;i<=NF;++i)
    if($i ~ /^[27]="/) {
      b[++k] = $i
      sub(/^[27]="/, "", b[k])
      sub(/"$/, "", b[k])
      gsub(/\\/, "", b[k])
    }
  print "state " b[1] ", mode " b[2]
}' logfile.txt
This presupposes that the fields always occur in the same order (2 before 7). Maybe you need to change or disable the gsub to remove backslashes in the values.
If you want to do more than print the values, refactoring whatever Bash code you have into Awk is often a better approach than doing this processing in Bash.
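Since the question says the order of the values can change, here is a hedged sketch of a variant that stores each value under its key instead of relying on 2 appearing before 7. The function name extract_state_mode is made up for illustration, and the sketch assumes that no value contains a comma.

```shell
# Look up the values of keys 2 and 7 regardless of where they appear.
# extract_state_mode is a hypothetical helper; it assumes comma-free values.
extract_state_mode() {
  awk -F, '/unit=Central-C152/ {
    split("", v)                      # clear values from any earlier line
    for (i = 1; i <= NF; ++i) {
      eq = index($i, "=")             # position of the first "=" in the field
      if (eq == 0) continue           # field has no key=value form
      key = substr($i, 1, eq - 1)
      val = substr($i, eq + 1)
      gsub(/^"|"$/, "", val)          # drop the surrounding quotes
      v[key] = val
    }
    print "state " v["2"] ", mode " v["7"]
  }' "$1"
}

# Demo with key 7 placed before key 2.
tmp=$(mktemp)
printf '%s\n' 'a3cd23e,unit=Central-C152,7="reset",2="Qual"' > "$tmp"
extract_state_mode "$tmp"
rm -f "$tmp"
```

The demo prints "state Qual, mode reset" even though 7 comes first on the line.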
Assuming you already have the line in a variable such as with:
line="$(grep 'unit=Central-C152' logfile.txt | head -1)"
You can then simply use the built-in parameter substitution features of bash:
f2=${line#*2=\"} ; f2=${f2%%\"*} ; echo ${f2}
f7=${line#*7=\"} ; f7=${f7%%\"*} ; echo ${f7}
The first command on each line strips off the first part of the line up to and including the <field-number>=". The second command then strips off everything from the first remaining quote onwards. The third, of course, simply echoes the value.
When I run those commands against your input line, I see:
Qual
reset
which is, from what I can see, what you were after.

How to extract specific value using grep and awk?

I am facing a problem to extract a specific value in a .txt file using grep and awk.
I show below an excerpt from the .txt file:
"-
bravais-lattice index = 2
lattice parameter (alat) = 10.0000 a.u.
unit-cell volume = 250.0000 (a.u.)^3
number of atoms/cell = 2
number of atomic types = 1
number of electrons = 28.00
number of Kohn-Sham states= 18
kinetic-energy cutoff = 60.0000 Ry
charge density cutoff = 300.0000 Ry
convergence threshold = 1.0E-09
mixing beta = 0.7000"
I also defined some variable: ELEMENT and lat.
I want to extract the "unit-cell volume" value which is equal to 250.00.
I tried the following to extract the value using grep and awk:
volume=`grep "unit-cell volume" ./latt.10/$ELEMENT.scf.latt_$lat.out | awk '{printf "%15.12f\n",$5}'`
However, when I run the bash file I always get 00.000000 as a result instead of the correct value of 250.00.
Can anyone help, please?
Thanks in advance.
awk '{printf "%15.12f\n",$5}'
You're asking awk to print out the fifth field of the line ($5).
unit-cell  volume  =  250.0000  (a.u.)^3
1          2       3  4         5
The fifth field is (a.u.)^3, which you are then asking awk to interpret as a number via the %f format code. It's not a number, though (or actually, doesn't start with a number), and when awk is asked to treat a non-numeric string as a number, it uses 0 instead. Thus it prints 0.
Solution: use $4 instead.
By the way, you can skip invoking grep by using awk itself to select the line, e.g.
awk '/^ unit-cell/ {...}'
The /^ unit-cell/ is a regular expression that matches "unit-cell" (with a leading space) at the beginning of the line. Adjust as necessary if you have other lines that start with unit-cell which you don't want to select.
You never need grep when you're using awk since awk can do anything useful that grep can do. It sounds like this is all you need:
$ awk -F'=' '/unit-cell volume/{printf "%.2f\n",$2}' file
250.00
The above works because when FS is = that means $2 is <spaces>250.0000 (a.u.)^3, and when awk is asked to convert a string to a number it strips off leading spaces and anything after the numeric part, so that leaves 250.0000 to be converted to a number by %.2f.
In the script you posted $5 was failing because the 5th space-separated field in:
$1 $2 $3 $4 $5
<unit-cell> <volume> <=> <250.0000> <(a.u.)^3>
is (a.u.)^3 - you could have just added print $5 to see that.
Since you are processing key-value pairs where the key can contain a variable number of spaces, you need to tune the field number ($4, $5 etc.) separately for each record you want to process, unless you set the field separator (FS) appropriately with FS=" *= *". Then the key will always be in $1 and the value in $2.
Then use split to separate the value from its unit.
Also, you can lose that grep by defining in awk a pattern (or condition, /unit-cell volume/) for the print action:
$ awk 'BEGIN{FS=" *= *"} /unit-cell volume/{split($2,a," +");print a[1]}' file
250.0000
Explained:
$ awk '
BEGIN { FS=" *= *" }      # set appropriate field separator
/unit-cell volume/ {      # pattern or condition
  split($2,a," +")        # split value part to value and possible unit parts
  print a[1]              # output value part
}' file
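To get the value into a shell variable, as the question does with backticks, the same program can be wrapped in a small function. This is a sketch under assumptions: get_value is a hypothetical name, the key is matched as a plain substring of $1, and only the first matching line is used.

```shell
# get_value KEY FILE: print the value (without its unit) for the first
# line whose key contains KEY. get_value is a hypothetical helper name.
get_value() {
  awk -v key="$1" '
    BEGIN { FS = " *= *" }            # key ends up in $1, value in $2
    index($1, key) {
      split($2, a, " +")              # a[1] is the value, a[2] the unit if any
      print a[1]
      exit                            # stop after the first match
    }' "$2"
}

tmp=$(mktemp)
printf '%s\n' \
  'unit-cell volume = 250.0000 (a.u.)^3' \
  'number of Kohn-Sham states= 18' > "$tmp"
volume=$(get_value 'unit-cell volume' "$tmp")
echo "$volume"
get_value 'Kohn-Sham states' "$tmp"
rm -f "$tmp"
```

This prints 250.0000 and then 18, including for the line that has no space before the = sign.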

Retaining one member of a pair

Good afternoon to all,
I have a file containing two fields, each representing a member of a pair.
I want to retain one member of each pair and it does not matter which member as these are codes for duplicate samples in a study.
Each pair appears twice in my file, with each member of the pair appearing once in either column.
An example of an input file is:
XXX1 XXX7
XXX2 XXX4
abc2 dcb3
XXX7 XXX1
dcb3 abc2
XXX4 XXX2
And an example of the desired output would be
XXX1
XXX2
abc2
How might this be accomplished in bash? Thank you.
Here is a combination of GNU awk, cut and sort. Store the script as duplicatePairs.awk:
{ if ( $1 < $2) print $1, $2
  else print $2, $1
}
and run it like this: awk -f duplicatePairs.awk your_file | sort -u | cut -d" " -f1
The if sorts the pairs such that a line with x,y and a line with y,x will be printed the same. Then sort -u can remove the duplicate lines. And the cut selects the first column.
With a slightly larger awk script, we can solve the requirements "awk-only":
{
  smallest = $1;
  if ( $1 > $2) {
    smallest = $2
  }
  if( !(smallest in seen) ) {
    seen [ smallest ] = 1
    print smallest
  }
}
Run it like this: awk -f duplicatePairs.awk your_file
While the answer posted by Lars above works very well I would like to suggest an alternative, just in case someone stumbles upon this problem.
I had previously used awk '!seen[$2,$1]++ {print $1}' to the same result. I didn't realize it had worked since the number of lines in my file wasn't halved. This turned out to be because of some wrong assumptions I made about my data.
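One thing to keep in mind with one-liners of this kind is that a key built from ($2,$1) on one line is never the same string as the key built from the reversed line, so the two orderings have to be mapped onto one key before testing seen. A minimal sketch of that idea follows. The function name one_per_pair is made up here, and it assumes every line holds exactly two whitespace-separated codes.

```shell
# Print the first column of the first occurrence of each unordered pair.
# one_per_pair is a hypothetical helper name used only for this sketch.
one_per_pair() {
  awk '{
    # Build the same key for "x y" and "y x" by ordering the two members.
    k = ($1 < $2) ? $1 SUBSEP $2 : $2 SUBSEP $1
  }
  !seen[k]++ { print $1 }' "$@"
}

printf '%s\n' 'XXX1 XXX7' 'XXX2 XXX4' 'abc2 dcb3' \
              'XXX7 XXX1' 'dcb3 abc2' 'XXX4 XXX2' | one_per_pair
```

On the sample input from the question this prints XXX1, XXX2 and abc2, one per line.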

sed - pass match to external command

I have written a little script using sed to transform this:
kaefert@Ultrablech ~ $ cat /sys/class/power_supply/BAT0/uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000
POWER_SUPPLY_VOLTAGE_NOW=8370000
POWER_SUPPLY_POWER_NOW=0
POWER_SUPPLY_ENERGY_FULL_DESIGN=45640000
POWER_SUPPLY_ENERGY_FULL=44541000
POWER_SUPPLY_ENERGY_NOW=44541000
POWER_SUPPLY_MODEL_NAME=UX32-65
POWER_SUPPLY_MANUFACTURER=ASUSTeK
POWER_SUPPLY_SERIAL_NUMBER=
into a csv file format like this:
kaefert@Ultrablech ~ $ Documents/Asus\ Zenbook\ UX32VD/power_to_csv.sh
"date";"status";"voltage µV";"power µW";"energy full µWh";"energy now µWh"
2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:02;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:04;"Full";8369000;0;44541000;44541000
... (in a loop)
What I would like now is to divide each of those numbers by 1,000,000 so that they represent V instead of µV and W instead of µW, making them easy to interpret at a quick glance. Of course I could do this manually afterwards once I've opened this csv in LibreOffice Calc, but I would like to automate it.
So what I found is, that I can call external programs in between sed, like this:
...
s/\nPOWER_SUPPLY_PRESENT=1\nPOWER_SUPPLY_TECHNOLOGY=Li-ion\nPOWER_SUPPLY_CYCLE_COUNT=0\nPOWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000\nPOWER_SUPPLY_VOLTAGE_NOW=\([0-9]\{1,\}\)/";'`echo 0`'\1/
and that I could get values like I want by something like this:
echo "scale=6;3094030/1000000" | bc | sed 's/0\{1,\}$//'
But the problem now is, how do I pass my match "\1" into the external command?
If you are interested in looking at the full script, you'll find it there:
http://koega.no-ip.org/mediawiki/index.php/Battery_info_to_csv
If your sed is GNU sed, you can use the 'e' flag to pass a matched group to an external command or tool from within the sed command.
An example might help make it clear.
Say you have the string "100+20foobar", and you want to compute the result of the 100+20 part and replace "oo" with "xx" in the "foobar" part.
Note that this example does not solve the problem above; it just shows how the sed 'e' flag is used.
You could capture 100+20 in the first match group and the rest in the second group, then pass the two groups to different commands or tools and get the result, like this:
kent$ echo "100+20foobar"|sed -r 's#([0-9+]*)(.*)#echo \1 \|bc\;echo \2 \| sed "s/oo/xx/g"#ge'
120
fxxbar
in this way, you could nest many seds one in another one, till you get lost. :D
As sed doesn't do arithmetic on its own I would recommend using awk for something like this, e.g. to divide 3rd, 5th and 6th field by a million do something like this:
awk -F';' -v OFS=';' '
NR == 1
NR != 1 {
  $3 /= 1e6
  $5 /= 1e6
  $6 /= 1e6
  print
}'
Explanation
-F';' and -v OFS=';' specify the input and output field separator.
NR == 1 pass first line through without change.
NR != 1 if it is not the first line, divide and print.
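As a quick check of the approach on data shaped like the CSV in the question, the sketch below wraps the program in a function and loops over columns 3 to 6. The name scale_csv is made up for this example, and scaling all four numeric columns (including the power column) is an assumption about what is wanted.

```shell
# Divide columns 3..6 of a ;-separated file by a million, keeping the header.
# scale_csv is a hypothetical helper name; it reads stdin or the given files.
scale_csv() {
  awk -F';' -v OFS=';' '
    NR == 1 { print; next }           # header line passes through unchanged
    {
      for (i = 3; i <= 6; i++) $i /= 1e6
      print                           # record is rebuilt with OFS
    }' "$@"
}

printf '%s\n' \
  '"date";"status";"voltage";"power";"energy full";"energy now"' \
  '2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000' | scale_csv
```

The data line comes out as 2012-07-30 11:29:01;"Full";8.369;0;44.541;44.541, with the header untouched. The column titles would still need their units changed by hand.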
To divide by 1,000,000 directly in sed, you can do it like this:
Q='3094030/1000000'
sed ':r /^[[:digit:]]\{7\}/{s$\([[:digit:]]*\)\([[:digit:]]\{6\}\)/1000000$\1.\2$;p;d};s:^:0:;br;d'

Resources