# mkfifo inp out
# bc -ql <inp >out &
[1] 6766
#
# exec 3>inp 4<out
# echo "scale=3; 4/5;" >&3
# read a <&4; echo $a
.800
#
# awk ' BEGIN { printf("4/5\n") >"/dev/fd/3"; exit 1;} '
# read a <&4; echo $a
.800
#
# awk ' BEGIN { printf("4/5\n") >"/dev/fd/3"; exit 1;} '
# awk ' BEGIN { getline a <"/dev/fd/4"; printf("%s\n", a); } '
^C
In a Bash environment I can communicate with the bc program through a fifo.
But from awk I can write to the fifo, yet not read from it with the getline function.
How can I read from "/dev/fd/4" in awk?
My awk version is: mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
Thanks
Laci
Continued:
I did some further experiments and here is a summary of my results.
The awk script language suits my task best,
and I need to use "bc" because I have to compute with very long numbers (about 100 digits).
The next two scripts show that using a named pipe is about 83 times faster than an unnamed one.
1) With unnamed pipe:
# time for((i=6000;i;i--)); do a=`echo "$i/1"|bc -ql`; done
real 0m13.936s
2) With named pipe:
# mkfifo in out
# bc -ql <in >out &
# exec 3>in 4<out
#
# time for((i=500000;i;i--)); do echo "$i/1" >&3; read a <&4; done
real 0m14.391s
3) In the awk environment, using bc is about 18 times slower than in bash, but it works this way:
# time awk ' BEGIN {
# for(i=30000;i;i--){
# printf("%d/1\n",i) >"/dev/fd/3";
# system("read a </dev/fd/4; echo $a >tmp_1");
# getline a <"tmp_1"; close("tmp_1");}
# } '
real 0m14.178s
4) What can be the problem when I try to do it according to "man awk"?:
# awk ' BEGIN {
# for(i=4;i;i--){
# printf("%d/1\n",i) >"/dev/fd/3"; system("sleep .1");
# "read a </dev/fd/4; echo $a" | getline a ;print a;}
# } '
4.000
4.000
4.000
4.000
The above "awk" script was able to pick up only the first number from the pipe.
The other three numbers remained in the pipe.
These will be visible when I'm reading the pipe after the above awk script.
# for((;;)); do read a </dev/fd/4; echo $a; done
3.000
2.000
1.000
Thanks for gawk.
It sounds like you're looking for gawk's co-process ability; see http://www.gnu.org/software/gawk/manual/gawk.html#Getline_002fCoprocess. Given awk's support of math functions, though, I wonder why you'd want to use bc...
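For what it's worth, the failing loop in the question behaves that way because awk opens the pipe "read a </dev/fd/4; echo $a" | getline a only once: the shell command prints a single line and exits, so only the first getline succeeds, and every later call fails and leaves a unchanged (hence the repeated 4.000). With gawk's coprocess operator |& no fifos are needed at all; here is a minimal sketch, assuming gawk is available and that bc flushes its output after each line (the fifo experiments above suggest GNU bc does):
gawk 'BEGIN {
    cmd = "bc -ql"                 # start bc as a two-way co-process
    for (i = 4; i; i--) {
        printf "%d/1\n", i |& cmd  # write to the co-process
        cmd |& getline a           # read its reply
        print a                    # 4.000, 3.000, 2.000, 1.000
    }
    close(cmd)
}'
gawk wires up both pipes itself, so no named pipes or temporary files are required.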
Try letting the shell set up the fifo redirections, with awk writing to its stdout and reading from its stdin:
mkfifo inp out
bc -l <inp >out &
awk ' BEGIN { printf("4/5\n"); exit 0;} ' > inp
read a < out; echo $a
awk ' BEGIN { printf("4/5\n"); exit 0;} ' > inp
awk ' BEGIN { getline a; printf("%s\n", a); exit 0 } ' < out
rm inp
rm out
Related
I have hundreds of tsv files with the following structure (example):
GH1 123 family1
GH2 23 family2
.
.
.
GH4 45 family4
GH6 34 family6
And I have a text file with a list of words (thousands):
GH1
GH2
GH3
.
.
.
GH1000
I want output that contains the number of times each word occurred in each file, like this:
GH1 GH2 GH3 ... GH1000
filename1 1 1 0... 4
.
.
.
filename2 2 3 1... 0
I tried this code but it only gives me zeros:
for file in *.tsv; do
    echo $file >> output.tsv
    cat fore.txt | while read line; do
        awk -F "\\t" '{print $1}' $file | grep -wc $line >>output.tsv
        echo "\\t">>output.tsv;
    done ;
done
Use the following script, and just redirect its stdout to an output.txt file.
#!/bin/bash
# header row: one column per word
while read -r p; do
    echo -n "$p "
done <words.txt
echo ""
for file in *.tsv; do
    echo -n "$file = "
    while read -r p; do
        # put each occurrence on its own line, then count the matching lines
        # (double quotes so $p is expanded inside the sed program)
        COUNT=$(sed "s/$p/$p\n/g" "$file" | grep -c "$p")
        echo -n "$COUNT "
    done <words.txt
    echo ""
done
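For example, assuming the script above is saved as count.sh (a name picked here for illustration) in the directory containing words.txt and the .tsv files:
./count.sh > output.txt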
Here is a simple Awk script which collects a list like the one you describe.
awk 'BEGIN { printf "\t" }
NR==FNR { a[$1] = n = FNR;           # first file: number each word
          printf "\t%s", $1; next }  # and add it to the header row
FNR==1 {                             # at the start of each later file,
    if(f) { printf "%s", f           # report the previous file's counts
        for (i=1; i<=n; i++)
            printf "\t%s", 0+b[i] }
    printf "\n"
    delete b
    f = FILENAME }
$1 in a { b[a[$1]]++ }' fore.txt *.tsv /etc/motd
To avoid repeating the big reporting block in an END rule, we add a short sentinel file at the end (any readable file with at least one line will do; /etc/motd is used here) whose only purpose is to supply a file after the last one, so that the last real file's counts get reported; the sentinel's own counts are never printed.
The shell's while read loop is slow, inefficient, and somewhat error-prone (you basically always want read -r, and handling incomplete text files is hairy); in addition, the brute-force method requires reading the word file once per iteration, which incurs a heavy I/O penalty.
I have a file example.txt with about 3000 lines, each containing a string. A small example file would be:
>cat example.txt
saudifh
sometestPOIFJEJ
sometextASLKJND
saudifh
sometextASLKJND
IHFEW
foo
bar
I want to find all repeated lines in this file and output them. The desired output would be:
>checkRepetitions.sh
found two equal lines: index1=1 , index2=4 , value=saudifh
found two equal lines: index1=3 , index2=5 , value=sometextASLKJND
I made a script checkRepetitions.sh:
#!/bin/bash
size=$(cat example.txt | wc -l)
for i in $(seq 1 $size); do
    i_next=$((i+1))
    line1=$(cat example.txt | head -n$i | tail -n1)
    for j in $(seq $i_next $size); do
        line2=$(cat example.txt | head -n$j | tail -n1)
        if [ "$line1" = "$line2" ]; then
            echo "found two equal lines: index1=$i , index2=$j , value=$line1"
        fi
    done
done
However this script is very slow; it takes more than 10 minutes to run, while in Python it takes less than 5 seconds... I tried to store the file in memory by doing lines=$(cat example.txt) and then line1=$(cat $lines | cut -d',' -f$i), but this is still very slow...
When you do not want to use awk (a good tool for the job, parsing the input only once),
you can run through the lines several times. Sorting is expensive, but this solution avoids the nested loops you tried.
grep -Fnxf <(uniq -d <(sort example.txt)) example.txt
With uniq -d <(sort example.txt) you find all lines that occur more than once. Next, grep searches for these (option -f) complete (-x) lines as fixed strings rather than regular expressions (-F), and shows the line numbers where they occur (-n).
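With the example file above, this prints:
1:saudifh
3:sometextASLKJND
4:saudifh
5:sometextASLKJND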
See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why your script is so slow.
$ cat tst.awk
{ val2hits[$0] = val2hits[$0] FS NR }
END {
    for (val in val2hits) {
        numHits = split(val2hits[val],hits)
        if ( numHits > 1 ) {
            printf "found %d equal lines:", numHits
            for ( hitNr=1; hitNr<=numHits; hitNr++ ) {
                printf " index%d=%d ,", hitNr, hits[hitNr]
            }
            print " value=" val
        }
    }
}
$ awk -f tst.awk file
found 2 equal lines: index1=1 , index2=4 , value=saudifh
found 2 equal lines: index1=3 , index2=5 , value=sometextASLKJND
To give you an idea of the performance difference using a bash script that's written to be as efficient as possible and an equivalent awk script:
bash:
$ cat tst.sh
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    (( ++lineNum ))
    if [[ ${lines[$line]} ]]; then
        printf 'Content previously seen on line %s also seen on line %s: %s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done < "$1"
$ time ./tst.sh file100k > ou.sh
real 0m15.631s
user 0m13.806s
sys 0m1.029s
awk:
$ cat tst.awk
lines[$0] {
    printf "Content previously seen on line %s also seen on line %s: %s\n", \
        lines[$0], NR, $0
}
{ lines[$0]=NR }
$ time awk -f tst.awk file100k > ou.awk
real 0m0.234s
user 0m0.218s
sys 0m0.016s
There are no differences between the outputs of the two scripts:
$ diff ou.sh ou.awk
$
The above is using 3rd-run timing to avoid caching issues and being tested against a file generated by the following awk script:
awk 'BEGIN{for (i=1; i<=10000; i++) for (j=1; j<=10; j++) print j}' > file100k
When the input file had zero duplicate lines (generated by seq 100000 > nodups100k) the bash script executed in about the same amount of time as it did above while the awk script executed much faster than it did above:
$ time ./tst.sh nodups100k > ou.sh
real 0m15.179s
user 0m13.322s
sys 0m1.278s
$ time awk -f tst.awk nodups100k > ou.awk
real 0m0.078s
user 0m0.046s
sys 0m0.015s
To demonstrate a relatively efficient (within the limits of the language and runtime) native-bash approach, which you can see running in an online interpreter at https://ideone.com/iFpJr7:
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    lineNum=$(( lineNum + 1 ))
    if [[ ${lines[$line]} ]]; then
        printf 'found two equal lines: index1=%s, index2=%s, value=%s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done <example.txt
Note the use of while read to iterate line-by-line, as described in BashFAQ #1: How can I read a file line-by-line (or field-by-field)?; this permits us to open the file only once and read through it without needing any command substitutions (which fork off subshells) or external commands (which need to be individually started up by the operating system every time they're invoked, and are likewise expensive).
The other part of the improvement here is that we're reading the whole file only once -- implementing an O(n) algorithm -- as opposed to running O(n^2) comparisons as the original code did.
How can I get the average CPU temperature from bash on Linux? Preferably in degrees Fahrenheit. The script should be able to handle different numbers of CPUs.
You do it like so:
Installation
sudo apt install lm-sensors
sudo sensors-detect --auto
get_cpu_temp.sh
#!/bin/bash
# 1. get temperature
## a. read one line per core from sensors, e.g.:
##    Core 0: +143.6°F (high = +186.8°F, crit = +212.0°F)
readarray -t core_temp_arr < <(sensors -f | grep '^Core\s[[:digit:]]\+:')
## b. sum the per-core temperatures
total_cpu_temp=0
index=0
for i in "${core_temp_arr[@]}"; do
    # keep only the first numeric reading on the line
    temp=$(echo "$i" | sed -n 's/°F.*//; s/.*[+-]//; p; q')
    let index++
    total_cpu_temp=$(echo "$total_cpu_temp + $temp" | bc)
done
avg_cpu_temp=$(echo "scale=2; $total_cpu_temp / $index" | bc)
## c. build entry
temp_status="CPU: $avg_cpu_temp F"
echo "$temp_status"
exit 0
output
CPU: 135.50 F
You can also read CPU temperatures directly from sysfs (though the path may differ from machine to machine and from OS to OS):
Bash:
temp_file=$(mktemp -t "temp-"$(date +'%Y%m%d#%H:%M:%S')"-XXXXXX")
ls "$temp_file"
while true; do
    cat /sys/class/thermal/thermal_zone*/temp | tr '\n' ' ' >> "$temp_file"
    printf "\n" >> "$temp_file"
    sleep 2
done
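On most Linux systems those sysfs files hold integer millidegrees Celsius, so a single awk call can average them and convert to the Fahrenheit value the question asks for. A minimal sketch, assuming the same thermal_zone path as above (note that some zones may be sensors other than the CPU):
awk '{ sum += $1; n++ } END { if (n) printf "CPU: %.2f F\n", sum / n / 1000 * 9 / 5 + 32 }' /sys/class/thermal/thermal_zone*/temp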
If you're a fish user, you may add a function to your config dir, let's say: ~/.config/fish/functions/temp.fish
Fish
function temp
    set temp_file (mktemp -t "temp-"(date +'%Y%m%d#%H:%M:%S')"-XXXXXX")
    ls $temp_file
    while true
        cat /sys/class/thermal/thermal_zone*/temp | tr '\n' ' ' >> "$temp_file"
        printf "\n" >> $temp_file
        sleep 2
    end
end
Here is my initial input data to be extracted:
david ex1=10 ex2=12 quiz1=5 quiz2=9 exam=99
judith ex1=8 ex2=16 quiz1=4 quiz2=10 exam=90
sam ex1=8 quiz1=5 quiz2=11 exam=85
song ex1=8 ex2=20 quiz2=11 exam=87
How do I extract each word so that it is formatted this way:
david
ex1=10
ex2=12
etc...
As I eventually want to have output like this:
david 12 99
judith 16 90
sam 0 85
song 20 87
when I run my program with the command:
./marks ex2 exam < file
Supposing your input file is named input.txt, just replace each space character with a newline using the tr command-line tool:
tr ' ' '\n' < input.txt
For your second request, you have to extract specific fields from each line, so the cut and awk commands are useful. Note that this example is certainly improvable: it assumes ex2 is always the third field, so it prints the wrong value for lines where ex2 is missing (such as sam's):
while read p; do
    echo -n "$(echo $p | cut -d ' ' -f1) "                  # name
    echo -n "$(echo $p | cut -d ' ' -f3 | cut -d '=' -f2) " # ex2 val
    echo -n $(echo $p | awk -F"exam=" '{ print $2 }')       # exam val
    echo
done < input.txt
This script does what you want:
#!/bin/bash
a="$*"                         # the field names passed as arguments, e.g. "ex2 exam"
awk -v a="$a" -F'[[:space:]=]+' '
BEGIN {
    n = split(a, b)            # split field names into array b
}
{
    printf "%s ", $1           # print first field (the name)
    for (i=1; i<=n; i++) {     # loop through fields to search for, in argument order
        f = 0                  # unset "found" flag
        for (j=2; j<=NF; j+=2) # loop through remaining fields, 2 at a time
            if ($j == b[i]) {  # if field matches value in array
                printf "%s ", $(j+1)
                f = 1          # set "found" flag
            }
        if (!f) printf "0 "    # add 0 if field not found
    }
    print ""                   # add newline
}' file
Testing it out
$ ./script.sh ex2 exam
david 12 99
judith 16 90
sam 0 85
song 20 87
I'm using the following to output the result of an upload speed test:
wput 10MB.zip ftp://user:pass@host 2>&1 | grep '\([0-9.]\+[KM]/s\)'
which returns
18:14:38 (10MB.zip) - '10.49M/s' [10485760]
Transfered 10,485,760 bytes in 1 file at 10.23M/s
I'd like to have the result 10.23M/s (i.e. the speed) echoed, together with a comparison result:
if speed >= 5 M/s then echo "pass" else echo "fail"
So, the final output would be:
PASS 7 M/s
23/01/2013
Ideally I'd like it all done on a single line. So far I've got:
wput 100M.bin ftp://test:test@0.0.0.0 2>&1 | grep -o '\([0-9.]\+[KM]/s\)$' | awk ' { if (($1 > 5) && ($2 == "M/s")) { printf("FAST %s\n ", $0); }}'
however, it doesn't output anything. If I remove
&& ($2 == "M/s")
it works, but I obviously want it to report only speeds above 5 M/s, and as it is it would still echo FAST if the speed were just over 5 K/s. Can someone tell me what I've missed?
Using awk:
# Over 5M/s
$ cat pass
18:14:38 (10MB.zip) - '10.49M/s' [10485760]
Transfered 10,485,760 bytes in 1 file at 10.23M/s
$ awk 'END{f="FAIL "$NF;p="PASS "$NF;if($NF~/K\/s/){print f;exit};gsub(/M\/s/,"");print(int($NF)>5?p:f)}' pass
PASS 10.23M/s
# Under 5M/s
$ cat fail
18:14:38 (10MB.zip) - '3.49M/s' [10485760]
Transfered 10,485,760 bytes in 1 file at 3.23M/s
$ awk 'END{f="FAIL "$NF;p="PASS "$NF;if($NF~/K\/s/){print f;exit};gsub(/M\/s/,"");print(int($NF)>5?p:f)}' fail
FAIL 3.23M/s
# Also Handle K/s
$ cat slow
18:14:38 (10MB.zip) - '3.49M/s' [10485760]
Transfered 10,485,760 bytes in 1 file at 8.23K/s
$ awk 'END{f="FAIL "$NF;p="PASS "$NF;if($NF~/K\/s/){print f;exit};gsub(/M\/s/,"");print(int($NF)>5?p:f)}' slow
FAIL 8.23K/s
Not sure where you get 7 M/s from?
According to @Rubens, you can use grep -o with your regex to show the speed; just append $ for end of line:
wput 10MB.zip ftp://user:pass@host 2>&1 | grep -o '\([0-9.]\+[KM]/s\)$'
With perl you can easily do the remaining stuff:
use strict;
use warnings;

while (<>) {
    if (m!\s+((\d+\.\d+)([KM])/s)$!) {
        if ($2 > 5 && $3 eq 'M') {
            print "PASS $1\n";
        } else {
            print "FAIL $1\n";
        }
    }
}
and then call it:
wput 10MB.zip ftp://user:pass@host 2>&1 | perl script.pl
This is an answer to the question update.
In your awk program, you haven't split the speed into a numeric value and a unit; it is just one string.
Because a fast speed is greater than 5 M/s, you can ignore K/s results and extract the number by splitting at the character M. Then you have the numeric speed in $1 and can compare it:
wput 100M.bin ftp://test:test@0.0.0.0 2>&1 | grep -o '[0-9.]\+M/s$' | awk -F 'M' '{ if ($1 > 5) { printf("FAST %s\n", $0); } }'
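If you also want an explicit FAIL line (and to handle K/s results, which never pass), here is a sketch along the same lines, matching the pass/fail wording requested in the question:
wput 100M.bin ftp://test:test@0.0.0.0 2>&1 | grep -o '[0-9.]\+[KM]/s$' | awk -F 'M' '{ s = (/M\/s$/ && $1 > 5) ? "PASS" : "FAIL"; print s, $0 }'
For the sample transcript above this prints "PASS 10.23M/s"; a K/s result falls through to FAIL because the M/s test on the whole line fails.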