Grabbing data between two commas - linux

I am in the process of writing a simple script to grab battery information from acpi so I can format and output it.
Currently, I am using cut to grab this information, but as the battery state changes, cut does not grab the correct data and instead will grab a portion of a string instead of the battery percentage.
When running acpi -b, I get the following output:
Battery 0: Unknown, 100
Occasionally, acpi -b will also return the following, or something similar if it is charging or discharging:
Battery 0: Discharging, 98%, 02:14:14 remaining
So, without using cut, I'd like to be able to grab the data after the first comma, and, on occasion when present, grab the information between both commas. Right now, I am using sed to strip whitespace and the percentage sign from the output. Here is that command:
acpi -b | cut -c20-24 | sed 's/ //g;s/%//g'

You can use this simple awk command:
s='Battery 0: Unknown, 100'
awk -F',[[:blank:]]*' '{sub(/%/, "", $2); print $2}' <<< "$s"
100
s='Battery 0: Discharging, 98%, 02:14:14 remaining'
awk -F',[[:blank:]]*' '{sub(/%/, "", $2); print $2}' <<< "$s"
98
awk breakdown:
-F',[[:blank:]]*' # use a comma followed by zero or more blanks (spaces or tabs) as the field separator
sub(/%/, "", $2) # remove trailing %
print $2 # print 2nd field

With sed:
$ str1='Battery 0: Unknown, 100'
$ str2='Battery 0: Discharging, 98%, 02:14:14 remaining'
$ sed 's/^[^,]*, \([0-9]*\).*$/\1/' <<< "$str1"
100
$ sed 's/^[^,]*, \([0-9]*\).*$/\1/' <<< "$str2"
98
The substitution matches everything up to and including the first comma and the space after it (^[^,]*, ), then captures the run of digits that follows (\([0-9]*\)) and matches the rest of the line (.*$). It then replaces the whole line with the captured digits.
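The same extraction can also be sketched with shell parameter expansion alone, without starting awk or sed. This is a minimal sketch, assuming the line always has the ", " separator shown in the question; battery_percent is a hypothetical helper name, not part of acpi.

```shell
# Hypothetical helper: pull the percentage out of one line of `acpi -b` output.
battery_percent() {
    local rest="${1#*, }"    # drop everything up to the first ", "
    rest="${rest%%,*}"       # drop the second comma and what follows, if present
    rest=${rest%\%}          # strip a trailing % sign, if present
    printf '%s\n' "$rest"
}

battery_percent 'Battery 0: Unknown, 100'
battery_percent 'Battery 0: Discharging, 98%, 02:14:14 remaining'
```

In a script this would be called as battery_percent "$(acpi -b)", assuming acpi reports a single battery.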

Related

Filtering a file with values over 0.70 using AWK

I have a file of targets predicted by Diana and I would like to extract those with values over 0.70
>AAGACAACGUUUAAACCA|ENST00000367816|0.999999999975474
UTR3 693-701 0.00499294596715397
UTR3 1045-1053 0.405016433077734
>AAGACAACGUUUAAACCA|ENST00000392971|0.996695852735028
CDS 87-95 0.0112208345874892
I don't know why this script doesn't want to work if it seems to be correct
for file in SC*
do
grep ">" $file | awk 'BEGIN{FS="|"}{if($3 >= 0.70)}{print $2, $3}' > 70/$file.tab
done
The issue is it doesn't filter, can you help me to find out the error?
For a start, that's not a valid awk script since you have a misplaced } character:
BEGIN{FS="|"}{if($3 >= 0.70)}{print $2, $3}
# |
# +-------------+
# move here |
# V
BEGIN{FS="|"}{if($3 >= 0.70){print $2, $3}}
You also don't need grep because awk can do that itself, and you can also set the field separator without a BEGIN block. For example, here's a command that will output field 3 values greater than 0.997, on lines starting with > (using | as a field separator):
pax> awk -F\| '/^>/ && $3 > 0.997 { print $3 }' prog.in
0.999999999975474
I chose 0.997 to ensure one of the lines in your input file was filtered out for being too low (as proof that it works). For your desired behaviour, the command would be:
pax> awk -F\| '/^>/ && $3 > 0.7 { print $2, $3 }' prog.in
ENST00000367816 0.999999999975474
ENST00000392971 0.996695852735028
Keep in mind I've used > 0.7 as per your "values over 0.70" in the heading and text of your question. If you really mean "values 0.70 and above" as per the code in your question, simply change > into >=.
It looks like you are running a for loop that starts a new awk process for every file. You don't need to do that: awk can read all the matching files itself. So, apart from fixing the typo in your awk program, pass all the files to a single awk invocation, like this:
awk -F\| 'FNR==1{close(out); out="70/"FILENAME".tab"} /^>/ && $3 > 0.7 { print $2,$3 > out }' SC*
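One detail worth checking when trying the single-pass command: the 70/ output directory has to exist before awk can write into it. The sketch below runs the same idea in a scratch directory. The file names SC1 and SC2 and the 0.5 score are made up for illustration; the other lines come from the question.

```shell
# Scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Two small input files; SC2 holds a made-up score below the cutoff.
printf '%s\n' '>AAGACAACGUUUAAACCA|ENST00000367816|0.999999999975474' \
              'UTR3 693-701 0.00499294596715397' > "$workdir/SC1"
printf '%s\n' '>AAGACAACGUUUAAACCA|ENST00000392971|0.5' \
              'CDS 87-95 0.0112208345874892' > "$workdir/SC2"

(
    cd "$workdir" || exit 1
    mkdir -p 70    # awk cannot create the directory itself
    awk -F'|' 'FNR == 1 { if (out) close(out); out = "70/" FILENAME ".tab" }
               /^>/ && $3 > 0.7 { print $2, $3 > out }' SC*
)

cat "$workdir/70/SC1.tab"
```

Because the output file is only opened by the first print, an input file with no qualifying lines (SC2 here) produces no .tab file at all, unlike the original loop, which always creates one.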
I think it is perhaps safer to filter with a regex in string mode instead of numerically:
$3 !~/0[.][0-6]/
If awk interprets the input as a number and does a numeric comparison, the result is subject to the rounding errors of floating-point math. With a string-based filter, you avoid values above
~0.69999999999999995559107901… (the approximate IEEE 754 double-precision value of 7E-1)
being rounded up to 0.70. Note that this pattern keeps 0.70 itself, so it behaves like >= rather than >.
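A small experiment can make the rounding concern concrete. This is a sketch under the assumption that awk stores numbers as IEEE 754 doubles (the usual case); the score is made up, and the regex is anchored with ^ here, which the answer above does not do.

```shell
value='0.69999999999999999'   # made-up score, slightly below 0.70

# Numeric test: the text is converted to a double before comparing.
numeric=$(printf 'id|%s\n' "$value" | awk -F'|' '$2 >= 0.70 { print "kept" }')

# String test: only the leading characters are inspected.
textual=$(printf 'id|%s\n' "$value" | awk -F'|' '$2 !~ /^0[.][0-6]/ { print "kept" }')

printf 'numeric: %s\ntextual: %s\n' "${numeric:-dropped}" "${textual:-dropped}"
```

With double-precision arithmetic the made-up value and 0.70 convert to the same double, so the numeric >= test keeps the line while the string test drops it. A strict > comparison would drop it as well.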

awk, sed, grep specific strings from a file in Linux

Here is part of the complete file that I am trying to filter:
Hashmode: 13761 - VeraCrypt PBKDF2-HMAC-SHA256 + XTS 512 bit + boot-mode (Iterations: 200000)
Speed.#2.........: 2038 H/s (56.41ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#3.........: 2149 H/s (53.51ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#*.........: 4187 H/s
The aim is to print the following:
13761 VeraCrypt PBKDF2-HMAC-SHA256 4187 H/s
Here is what I tried.
The complete file is called complete.txt
cat complete.txt | grep Hashmode | awk '{print $2,$4,$5}' > mode.txt
Output:
13761 VeraCrypt PBKDF2-HMAC-SHA256
Then:
cat complete.txt | grep Speed.# | awk '{print $2,$3}' > speed.txt
Output:
4187 H/s
Then:
paste mode.txt speed.txt
The issue is that the lines do not match. There are approx 200 types of modes to filter within the file 'complete.txt'
I also have a feeling that this can be done using a much simpler command with sed or awk.
I am guessing you are looking for something like the following.
awk '/Hashmode:/ { if(label) print label, speed; label = $2 " " $4 " " $5 }
/Speed\.#/ { speed = $2 " " $3 }
END { if (label) print label, speed }' complete.txt
We match up the Hashmode line with the last Speed.# line which follows, then print when we see a new Hashmode, or reach end of file. (Failing to print the last one is a common beginner bug.)
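If every mode in the file ends with the aggregate Speed.#* line, as in the excerpt, a shorter variant can print as soon as that line is seen. This is a sketch under that assumption; summarize_benchmarks is a hypothetical name, and a mode without a Speed.#* line would be silently skipped.

```shell
# Hypothetical helper: pair each Hashmode line with its aggregate Speed.#* line.
summarize_benchmarks() {
    awk '/^Hashmode:/  { mode = $2 " " $4 " " $5 }   # remember id and name
         /^Speed\.#\*/ { print mode, $2, $3 }        # aggregate speed for that mode
        ' "$@"
}

# Usage with the lines from the question fed on standard input.
summarize_benchmarks <<'EOF'
Hashmode: 13761 - VeraCrypt PBKDF2-HMAC-SHA256 + XTS 512 bit + boot-mode (Iterations: 200000)
Speed.#2.........: 2038 H/s (56.41ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#3.........: 2149 H/s (53.51ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#*.........: 4187 H/s
EOF
```

It can equally be called with a file name, for example summarize_benchmarks complete.txt.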
This might work for you (GNU sed):
sed -E '/Hashmode:/{:a;x;s/^[^:]*: (\S+) -( \S+ \S+ ).*\nSpeed.*:\s*(\S+ \S+).*/\1\2\3/p;x;h;d};H;$!d;ba' file
If a line contains Hashmode, swap to the hold space and using pattern matching, manipulate its contents to the desired format and print, swap back to the pattern space, copy the current line to the hold space and delete the current line.
Otherwise, append the current line to the hold space and delete the current line, unless the current line is the last line in the file, in which case process the line as if it contained Hashmode.
N.B. The first time Hashmode is encountered, nothing is output. Subsequent matches and the end-of-file condition will be the only times printing occurs.

How to filter the required content from a string in Linux?

I have a string like this:
sometext sometext BASEDIR=/someword/someword/someword/1342.32 sometext sometext.
Could someone tell me how to extract the number 1342.32 from the above string in Linux?
$ echo "sometext BASEDIR=/someword/1342.32 sometext." |
sed "s/[^0-9.]//g"
> 1342.32.
The sed command searches for anything not in the set "0123456789" or ".", and replaces it with nothing (deletes it). It does this in global mode, so it doesn't stop on the first match.
This is enough if you're just trying to read it. If you're trying to feed the number into another command and need a real number, you will need to clean it up:
$ ... | cut -f 1-2 -d "."
> 1342.32
cut splits the input on the delimiter, then selects fields 1 and 2 (numbered from one). So "1.2.3.4" would return "1.2".
If sometext is always separated from the surrounding fields by whitespace, try this:
awk '{for (i=1;i<=NF;i++) {if ($i ~ /BASEDIR/) {print $i}}}' log.txt |
awk -F/ '{for (i=1;i<=NF;i++) {if ($i ~ /^[0-9][0-9.]*$/) {print $i}}}'
The snippet above assumes that your data is in a file called log.txt and organised in records (in the awk sense, that is, one record per line).
This works also if digits appear in sometext before BASEDIR as well as if the input has additional lines:
sed -n 's,.*BASEDIR=\(/\w*\)*/\([0-9.]*\).*,\2,p'
-n do not output lines without BASEDIR…
\(/\w*\)* group of / and someword, repeated
\([0-9.]*\) group of repeated digit or decimal point
\2 replacement of everything matched (the entire line) with the 2nd group
p print the result
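When the string is already in a shell variable, the same result can be sketched with parameter expansion only. This assumes the BASEDIR value ends at the next space and that the number is the last path component; extract_basedir_number is a hypothetical helper name.

```shell
# Hypothetical helper: print the last path component of the BASEDIR= value.
extract_basedir_number() {
    case $1 in
        *BASEDIR=*) ;;          # continue only if the marker is present
        *) return 1 ;;
    esac
    local rest="${1#*BASEDIR=}"   # text after the first BASEDIR=
    local path="${rest%% *}"      # cut at the first space
    printf '%s\n' "${path##*/}"   # keep what follows the last /
}

extract_basedir_number 'sometext sometext BASEDIR=/someword/someword/someword/1342.32 sometext sometext.'
```

Unlike the character-stripping approach, digits elsewhere in the line do not affect the result, because only the BASEDIR value is inspected.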

Filtering Linux command output

I need to get a row based on column value just like querying a database. I have a command output like this,
Name ID Mem VCPUs State Time(s)
Domain-0 0 15485 16 r----- 1779042.1
prime95-01 512 1 -b---- 61.9
Here I need to list only those rows where state is "r". Something like this,
Domain-0 0 15485 16 r----- 1779042.1
I have tried using "grep" and "awk" but still I am not able to succeed.
Any help is much appreciated.
Regards,
Raaj
There is a variety of tools available for filtering.
If you only want lines with "r-----" grep is more than enough:
command | grep "r-----"
Or
cat filename | grep "r-----"
grep can handle this for you:
yourcommand | grep -- 'r-----'
It's often useful to save the (full) output to a file to analyse later. For this I use tee.
yourcommand | tee somefile | grep 'r-----'
If you want to find the line containing "-b----" a little later on without re-running yourcommand, you can just use:
grep -- '-b----' somefile
No need for cat here!
I recommend putting -- after your call to grep since your patterns contain minus-signs and if the minus-sign is at the beginning of the pattern, this would look like an option argument to grep rather than a part of the pattern.
try:
awk '$5 ~ /^r.*/ { print }'
Like this:
cat file | awk '$5 ~ /^r.*/ { print }'
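A small variation keeps the header row as well, which is closer to a database query result. This is a sketch that assumes every row has all six columns on one line; the ID 3 for prime95-01 is a made-up placeholder, and filter_running is a hypothetical name.

```shell
# Keep the header (line 1) plus every row whose 5th column starts with "r".
filter_running() {
    awk 'NR == 1 || $5 ~ /^r/'
}

# Sample rows; the ID 3 for prime95-01 is a made-up placeholder.
printf '%s\n' \
    'Name ID Mem VCPUs State Time(s)' \
    'Domain-0 0 15485 16 r----- 1779042.1' \
    'prime95-01 3 512 1 -b---- 61.9' | filter_running
```

Since awk splits on runs of whitespace by default, this also works when the real command pads its columns with several spaces.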
grep solution:
command | grep -E "^([^ ]+ ){4}r"
What this does (-E switches on extended regexp):
The first caret (^) matches the beginning of the line.
[^ ] matches exactly one occurrence of a non-space character; the following modifier (+) allows it to match more occurrences as well.
Grouped together with the trailing space in ([^ ]+ ), it matches any sequence of non-space characters followed by a single space. The modifier {4} requires this construct to be matched exactly four times.
The single "r" is then the literal character you are searching for.
In plain words, this could be written as "If the line starts <^> with four strings that are each followed by a space <([^ ]+ ){4}> and the next character is an r <r>, then the line matches."
A very good introduction into regular expressions has been written by Jan Goyvaerts (http://www.regular-expressions.info/quickstart.html).
Filtering with awk in Linux:
First, find the matching row and store it in file2:
awk '/Domain-0 0 15485 /' file1 >file2
Output:-
Domain-0 0 15485 16
r----- 1779042.1
After that, run this awk command on file2:
awk '{print $1,$2,$3,$4,"\n",$5,$6}' file2
Final Output:-
Domain-0 0 15485 16
r----- 1779042.1

Getting n-th line of text output

I have a script that generates two lines as output each time. I'm really just interested in the second line. Moreover I'm only interested in the text that appears between a pair of #'s on the second line. Additionally, between the hashes, another delimiter is used: ^A. It would be great if I can also break apart each part of text that is ^A-delimited (Note that ^A is SOH special character and can be typed by using Ctrl-A)
output | sed -n '1p' #prints the 1st line of output
output | sed -n '1,3p' #prints the 1st, 2nd and 3rd line of output
your.program | tail -n +2 | cut -d# -f2
should get you 2/3 of the way.
Improving Grumdrig's answer:
your.program | head -n 2| tail -1 | cut -d# -f2
I'd probably use awk for that.
your_script | awk -F# 'NR == 2 && NF == 3 {
num_tokens=split($2, tokens, "\001") # "\001" is the SOH (Ctrl-A) character
for (i = 1; i <= num_tokens; ++i) {
print tokens[i]
}
}'
This says
1. Set the field separator to #
2. On lines that are the 2nd line, and also have 3 fields (text#text#text)
3. Split the middle (2nd) field using "^A" as the delimiter into the array named tokens
4. Print each token
Obviously this makes a lot of assumptions. You might need to tweak it if, for example, # or ^A can appear legitimately in the data, without being separators. But something like that should get you started. You might need to use nawk or gawk or something, I'm not entirely sure if plain awk can handle splitting on a control character.
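To try the awk approach without the real script, test input containing the SOH character can be generated with printf. This is a minimal sketch: split_hash_field is a hypothetical name, the sample text is made up, and the "\001" octal escape is assumed to be supported by the awk in use (it is part of POSIX awk).

```shell
# Hypothetical helper: print each SOH-delimited token between the two #'s on line 2.
split_hash_field() {
    awk -F'#' 'NR == 2 && NF == 3 {
        n = split($2, tokens, "\001")        # "\001" is SOH (Ctrl-A)
        for (i = 1; i <= n; ++i) print tokens[i]
    }'
}

# Made-up two-line input; \001 in the printf format emits the SOH byte.
printf 'first line\nprefix#alpha\001beta\001gamma#suffix\n' | split_hash_field
```

Looping from 1 to the count returned by split keeps the tokens in their original order.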
bash:
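read -r # skip the first line
read -r line # read the second line
result="${line#*#}" # drop everything up to and including the first #
result="${result%#*}" # drop everything from the last # onward
IFS=$'\001' read -r -a result <<< "$result" # split on SOH into an array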
$result is now an array that contains the elements you're interested in. Just pipe the output of the script to this one.
here's a possible awk solution
awk -F"#" 'NR==2{
for(i=2;i<=NF;i+=2){
n=split($i,a,"\001") # split on SOH
for(o=1;o<=n;o++) print a[o] # print each piece of the split field in order
}
}' file
