Extract multiple floating numbers from a line - linux

I want to extract the timeTaken values from the following line:
<some other log data> Exception, Curl1-Time: 0.258315s. Curl2-Time: 3.9092588424683s Exiting.
I am using the following command with grep and awk:
grep -Po "Exception, Curl1-Time: \K(\d+.\d*)s. Curl2-Time: (\d+.\d+)" app.log | awk '{print $1 + $3}'
This outputs: 4.167565
Can this be done in a smarter way, maybe using sed or another shell tool?
Is it OK to ignore the trailing "s." in the time values, given that the result of the addition is correct?

You already use PCRE. Why not use Perl itself?
perl -lne 'print $1 + $2
if /Exception, Curl1-Time: ([\d.]+)s\. Curl2-Time: ([\d.]+)/
' < input

If you have GNU's grep, then you can execute:
var="<some other log data> Exception, Curl1-Time: 0.258315s. Curl2-Time: 3.9092588424683s Exiting."
grep -Eo '[[:digit:]]+\.[[:digit:]]+s?' <<< "$var"
Or you can use awk, which is specified by POSIX (the <<< here-string used to feed it is a shell extension):
var="<some other log data> Exception, Curl1-Time: 0.258315s. Curl2-Time: 3.9092588424683s Exiting."
awk '{ while (match($0, /[[:digit:]]+\.[[:digit:]]+s?/)) { print substr($0, RSTART, RLENGTH); $0 = substr($0, RSTART + RLENGTH) } }' <<< "$var"
As you can see, both commands use the regex [[:digit:]]+\.[[:digit:]]+s? to match a pattern of one or more digits, a dot, one or more digits and an optional 's'.
GNU's grep uses the -o option to extract the matching regex pattern.
The awk version uses its match and substr functions to match and extract the relevant data.
After a regex match, RSTART and RLENGTH are set, and we can use them to calculate the start position and length for substr.
RLENGTH is the length of the substring matched by the match function.
RSTART is the start index, in characters, of the substring matched by the match function.
See the section "Built-in Functions for String Manipulation".
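Building on the match/RSTART/RLENGTH loop above, the extracted values could also be summed inside the same awk program, without a second process. This is only a sketch: the function name sum_times is made up for illustration, and it assumes that every decimal number on the line is a time value (a decimal in the "other log data" part would be added too). It uses [0-9] instead of [[:digit:]] because some older awk builds lack POSIX character classes.

```shell
# Hypothetical helper: sum every decimal number found on each input line.
sum_times() {
  awk '{
    rest = $0; total = 0
    # Repeatedly match the next decimal number in what is left of the line
    while (match(rest, /[0-9]+\.[0-9]+/)) {
      total += substr(rest, RSTART, RLENGTH)   # add the matched text as a number
      rest = substr(rest, RSTART + RLENGTH)    # continue after the match
    }
    printf "%.4f\n", total
  }'
}

line='<some other log data> Exception, Curl1-Time: 0.258315s. Curl2-Time: 3.9092588424683s Exiting.'
result=$(printf '%s\n' "$line" | sum_times)
printf '%s\n' "$result"   # 4.1676
```

printf with an explicit format is used here because a plain print would round the sum to six significant digits (awk's default output format).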

sed -n 's/.*Curl1-Time: \([0-9]*\.[0-9]*\)s.*Curl2-Time: \([0-9]*\.[0-9]*\)s.*$/\1 \2/p' filename | awk '{print $1 + $2}'
The regex .*Curl1-Time: \([0-9]*\.[0-9]*\)s.*Curl2-Time: \([0-9]*\.[0-9]*\)s.*$ matches the whole line; each pattern within the escaped parentheses \( \) captures one number.
The entire line is replaced with the two captured numbers, so the output of sed is two numbers separated by a space, e.g. 0.258315 3.9092588424683. The -n option together with the p flag prints only the lines where the substitution succeeded, and prints them once.
awk splits the sed output on the default whitespace delimiter, adds the two numbers, and prints the result.
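On the question of whether the trailing "s." can be ignored: when awk uses a string in arithmetic, it converts the leading numeric part and drops the rest, which is why the original grep/awk pipeline still adds up correctly. A small sketch to check this behavior (the helper name to_number is made up for illustration):

```shell
# Hypothetical helper: show the number awk derives from an arbitrary string.
to_number() {
  # Adding 0 forces awk to convert the string to a number
  awk -v s="$1" 'BEGIN { printf "%.6f\n", s + 0 }'
}

to_number '0.258315s.'   # 0.258315
to_number 'abc'          # 0.000000 (no leading number at all)
```

The second call shows the flip side: a field with no leading number silently becomes 0, so a malformed line would not raise an error.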

Related

sed: Getting Two Extra Characters in the Results

How can I display the next two characters from sed results (wildcard characters and then stop the results)?
echo 'this is a test line' | sed 's/^.*te*/te../'
Expected:
test
Actual result:
te.. line
You can use
sed -n 's/.*\(te..\).*/\1/p' <<< 'this is a test line'
Here,
-n - suppresses the default line output
.*\(te..\).* - matches any zero or more chars, then captures te and any two chars into Group 1, and then matches the rest of the string
\1 - replaces the whole match with the value of Group 1
p - prints the result only when a substitution was made.
GNU AWK solution
echo 'this is a test line' | awk 'BEGIN{FPAT="te.."}{print $1}'
output
test
Explanation: FPAT (Field PATtern) tells GNU AWK to treat every match of te.. as a field; the program then prints the 1st field.
(tested in GNU Awk 5.0.1)
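One detail of the sed answer above that is easy to miss: the leading .* is greedy, so when a line contains several te.. candidates only the last one is kept. The sketch below uses a made-up sample line to show this, next to grep -o (a GNU/BSD extension, not POSIX), which prints every match.

```shell
line='a tent and a test'   # sample input invented for this illustration

# sed: the greedy leading .* leaves only the last "te.." on the line
last=$(printf '%s\n' "$line" | sed -n 's/.*\(te..\).*/\1/p')

# grep -o: each non-overlapping match is printed on its own line
all=$(printf '%s\n' "$line" | grep -o 'te..')

printf 'sed keeps:\n%s\n' "$last"
printf 'grep -o prints:\n%s\n' "$all"
```

If the first occurrence is wanted instead, piping grep -o through head -n 1 is one option.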

Replace substring of a string of characters with other characters in awk

I have a file which contains a very long string of characters and I would like to replace a substring of it with Ns. Example:
test
ABCDABCDABCD
I would like to replace part of it with the letter N, using awk and sed: all the characters from index 5 to 8, so the run of Ns has a total length of 4.
Output
ABCDNNNNABCD
I tried something like this:
awk '{ v=substr($0,5,4); sed -i "s/$v/N/g";print substr($0,1,4)""v""substr($0,9,12)}' test
however, this command seems to give this output:
ABCDABCDABC
and no substitution was made.
I would like the code to take the index at which the substitution starts (here, for example, 4) and the length of the substitution (here also 4), so that I can simply change these numbers to start at another position or use a different length. In reality, I have a string with thousands of letters and I want to replace hundreds of characters, so substituting by pattern does not work in my case.
You want to use awk and sed? Seems like you actually want one of:
$ echo ABCDABCDABCD | perl -pe 'substr($_,4,4)="NNNN"'
ABCDNNNNABCD
$ echo ABCDABCDABCD | perl -pe 'substr($_,4,4)="N"x4'
ABCDNNNNABCD
$ echo 'ABCDABCDABCD' |
awk -v b=5 -v e=8 '{
t=substr($0,b,e-b+1); gsub(/./,"N",t); print substr($0,1,b-1) t substr($0,e+1)
}'
ABCDNNNNABCD
echo ABCDABCDABCD | awk '{$0=gensub(/ABCD/,"NNNN",2)}1'
ABCDNNNNABCD
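Since the question asks for a start position and a length rather than a start and an end, the awk approach above could be wrapped as follows. This is a sketch, not a definitive implementation: mask_range is a hypothetical name, the start is 1-based as in awk's substr, and the fill string is built in a loop so that no pattern matching is involved.

```shell
# Hypothetical helper: replace <len> characters starting at 1-based <start> with N.
# Usage: mask_range <start> <len> < input
mask_range() {
  awk -v start="$1" -v len="$2" '{
    fill = ""
    for (i = 0; i < len; i++) fill = fill "N"   # build a run of len Ns
    # keep the text before the range, insert the Ns, keep the text after it
    print substr($0, 1, start - 1) fill substr($0, start + len)
  }'
}

printf 'ABCDABCDABCD\n' | mask_range 5 4   # ABCDNNNNABCD
```

Note that this sketch always emits len Ns, even if the range runs past the end of the line.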

Find words containing 20 vowels grep

I found many similar questions but most of them ask for vowels in a row which is easy. I want to find words that contain 20 vowels not in a row using grep.
I originally thought grep -Ei [aeiou]{20} would do it but that seems to search only for 20 vowels in a row
Use a regular expression that searches for 20 vowels separated by any quantity of consonants.
grep -Ei "[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*\
[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*\
[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*\
[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*\
[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*\
[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*\
[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*[aeiou][b-df-hj-np-tv-z]*"
The backslash is just informing the shell that the expression continues on the next line. It is not part of the regex itself.
If you understand that part, you can shorten it considerably using groups. This regexp is the same as above, but uses a group in parentheses with repetition.
grep -Ei "([aeiou][b-df-hj-np-tv-z]*){20}"
I don't believe that's a problem that calls for just a regex. Here's a programmatic approach. We redefine the field separator to the empty string; each character is a field. We iterate over the line; if a character is a vowel we increment a counter. If, at the end of the string, the count is 20, we print it:
cat nicks.awk
BEGIN{
FS=""
}
{
c=0;
for( i=1;i<=NF;i=i+1 ){
if ($i ~ /[aeiou]/ ){
c=c+1;
}
};
if(c==20){
print $0
}
}
This is what it does: of the three test strings, it prints back only the one that has exactly 20 vowels.
echo "contributorNickSequestionsfoundcontainingvowelsgrcep" | awk -f nicks.awk
echo "contributorNickSeoquestionsfoundcontainingvowelsgrcep" | awk -f nicks.awk
contributorNickSeoquestionsfoundcontainingvowelsgrcep
echo "contributorNickSaeoquestionsfoundcontainingvowelsgrcep" | awk -f nicks.awk
If all you really need is to find 20 vowels in a line then that's just:
awk '{x=tolower($0)} gsub(/[aeiou]/,"&",x)==20' file
or with grep:
grep -Ei '^[^aeiou]*([aeiou][^aeiou]*){20}$' file
To find words (assuming each is space separated) there are many options, including this with GNU awk:
awk -v RS='\\s+' -v IGNORECASE=1 'gsub(/[aeiou]/,"&")==20' file
or this with any awk:
awk '{for (i=1;i<=NF;i++) {x=tolower($i); if (gsub(/[aeiou]/,"&",x)==20) print $i} }' file
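The patterns above differ in one respect worth checking: the unanchored ([aeiou][b-df-hj-np-tv-z]*){20} matches a line with 20 or more vowels, while the anchored ^[^aeiou]*([aeiou][^aeiou]*){20}$ matches exactly 20. A short sketch that reuses the three test strings from the awk answer (19, 20 and 21 vowels) and counts matching lines with grep -c; LC_ALL=C is set only as a precaution so the bracket ranges are interpreted as plain ASCII.

```shell
words='contributorNickSequestionsfoundcontainingvowelsgrcep
contributorNickSeoquestionsfoundcontainingvowelsgrcep
contributorNickSaeoquestionsfoundcontainingvowelsgrcep'

# Unanchored: any line with 20 or more vowels (the 20- and 21-vowel strings)
at_least=$(printf '%s\n' "$words" | LC_ALL=C grep -Eic '([aeiou][b-df-hj-np-tv-z]*){20}')

# Anchored: only lines with exactly 20 vowels
exactly=$(printf '%s\n' "$words" | LC_ALL=C grep -Eic '^[^aeiou]*([aeiou][^aeiou]*){20}$')

printf 'at least 20: %s\nexactly 20: %s\n' "$at_least" "$exactly"
```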

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst, whose path I have stored in a variable. It contains a string of 3 words. I want to grep the second word and, from it, cut out the part from expdp to .dmp and store that in a variable.
Example:
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired output:
expdp_TEST_P119_*_18112017.dmp
I have tried the command below:
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save the result in a variable:
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
The following awk command may help:
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: this makes both space and / field separators (as per your sample Input_file) and then prints the 6th field of the line, which is the part required.
To see each field number and its value, you could also use the following command:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
Using sed with one of these regexes:
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst
Captures the non-space characters after the last /, and prints only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst
Same as above, but with a different separator, which avoids having to escape the /. -r enables extended regex, so ( and ) can be used unescaped.
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst
In two steps: remove everything up to the last /, then remove from the first space to the end. (May be the easiest to read and understand.)
If you want to avoid an external command (sed):
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"
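If the list file can hold more than one line, the same parameter expansion can be applied line by line. The following is a sketch under that assumption: dump_names is a hypothetical helper, and it assumes the path is always the second whitespace-separated field. The expansions are quoted because the file name contains a *, which an unquoted expansion would let the shell glob.

```shell
# Hypothetical helper: print the file-name part of the 2nd field of every line in $1.
dump_names() {
  while read -r count path rest; do
    printf '%s\n' "${path##*/}"   # strip everything up to the last /
  done < "$1"
}

tmp=$(mktemp)
printf '%s\n' '34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM' > "$tmp"
name=$(dump_names "$tmp")
rm -f "$tmp"
printf '%s\n' "$name"   # expdp_TEST_P119_*_18112017.dmp
```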

Extract substring of string if position is known

First, I need to extract a substring at a known position from file.txt in bash, but starting from the second line:
>header
cgatgcgctctgtgcgtgcgtgcg
So let's assume I want position 10 of the second line; the output should be:
c
Second, I want to include the surrounding ±5 characters, resulting in:
gcgctctgtgc
{ read -r; read -r; echo "${REPLY:9:1}"; echo "${REPLY:4:11}"; } < file.txt
Output:
c
gcgctctgtgc
The ${parameter:offset:length} syntax for substrings is explained in https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion.
The read command is explained in https://www.gnu.org/software/bash/manual/bashref.html#index-read.
Input redirection: https://www.gnu.org/software/bash/manual/bashref.html#Redirections.
With awk:
To get the character at position 10, 1-indexed:
awk 'NR==2 {print substr($0, 10, 1)}'
NR==2 checks whether the record is the second one; if so, the statements inside {} are executed.
substr($0, 10, 1) extracts 1 character starting at position 10 of $0 (the whole record), i.e. only the 10th character. The format of substr() is substr(string, start, length).
Similarly, to get the ±5 characters around the 10th:
awk 'NR==2 {print substr($0, (10-5), 11)}'
(10-5) is written instead of 5 only to show where the start position comes from: the target position minus the 5 flanking characters. The length 11 is 5 + 1 + 5.
Example:
% cat file.txt
>header
cgatgcgctctgtgcgtgcgtgcg
% awk 'NR==2 {print substr($0, 10, 1)}' file.txt
c
% awk 'NR==2 {print substr($0, (10-5), 11)}' file.txt
gcgctctgtgc
Use sed and cut:
sed -n '2p' file|cut -c 5-15
sed selects the 2nd line and cut prints the desired characters.
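The position and the flank width in the awk answer can be turned into parameters. A minimal sketch, assuming the sequence is always on line 2 of the file; the function name window and its argument order are made up for illustration. The start is clamped to 1 explicitly, because awk implementations are not guaranteed to agree on what substr does with a start below 1.

```shell
# Hypothetical helper: print position <pos> of line 2 of <file>
# together with up to <flank> characters on each side.
# Usage: window <file> <pos> <flank>
window() {
  awk -v pos="$2" -v flank="$3" 'NR == 2 {
    start = pos - flank
    if (start < 1) start = 1                  # clamp at the start of the line
    print substr($0, start, pos + flank - start + 1)
    exit
  }' "$1"
}

tmp=$(mktemp)
printf '>header\ncgatgcgctctgtgcgtgcgtgcg\n' > "$tmp"
middle=$(window "$tmp" 10 5)   # gcgctctgtgc
edge=$(window "$tmp" 3 5)      # cgatgcgc (window clipped on the left)
rm -f "$tmp"
printf '%s\n%s\n' "$middle" "$edge"
```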

Resources