How to get just numerical value from a string in bash - linux

I have an XML file and I want to extract just the numerical value from a string in the file. One solution I came up with is
cat file.xml |grep -i "mu "| grep -o '[0-9]'
But I get each digit on a separate line, e.g. for 100 I get 1, then a new line, then 0, and so on. The other solution I came up with is
cat file.xml |grep -i "mu "|cut -d ' ' -f 4| tr '=' ' '|cut -d ' ' -f2|tr '""' ' '|sed -e 's/^ *//g' -e 's/ *$//g'
My question: is there a simpler way to get just the numerical value from a line, without caring about fields and without using the cut or tr commands?

Use grep -E (egrep is an obsolescent alias for it):
grep -Eo '[0-9]+'
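To see why the + matters, here is a small sketch run against a single made-up line. The <mu value="100" /> element is borrowed from another answer in this thread; the real file.xml may look different. Without the +, grep -o reports every digit as its own match; with it, each run of digits is one match.

```bash
# Hypothetical sample line; the real file.xml may look different.
line='<mu value="100" />'

# One match per digit: three output lines (1, 0, 0).
per_digit=$(printf '%s\n' "$line" | grep -o '[0-9]')

# One match per run of digits: a single output line.
whole=$(printf '%s\n' "$line" | grep -Eo '[0-9]+')

printf '%s\n' "$whole"   # 100
```

If a line can contain several numbers, each one still comes out on its own line, so an extra filter (for example on the attribute name) may be needed.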

One option you have is to delete everything that is not a digit from your input
tr -cd '[:digit:]'
Or for floating numbers
tr -cd '[:digit:].'
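One thing to keep in mind with tr -cd: the newline is also "not a digit", so it is deleted as well, and digits from different places in the input end up glued together. A small sketch with made-up input (the nu attribute is invented for illustration):

```bash
# Made-up input with two separate numbers on two lines.
input=$(printf 'mu="100"\nnu="25"\n')

# Everything that is not a digit is removed, including the newlines.
joined=$(printf '%s\n' "$input" | tr -cd '[:digit:]')

printf '%s\n' "$joined"   # 10025
```

So this approach is only safe when the input is already narrowed down to a single line holding a single number.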

I would encourage avoidance of XML as a format, personally; at least for your own use. Instead of "<mu value="100" />", you could use the following:-
# Name your data file ma-me-mo-mu.txt
100+200+300+400
and then:-
while IFS='+' read -r ma me mo mu
do
echo "${ma}"
echo "${me}"
echo "${mo}"
echo "${mu}"
done < ma-me-mo-mu.txt
You don't need to name your columns inside the data file itself. They go in the file name.


How can I add a new line at the end of the output? (Linux help)

I am using this code
cut -c1 | tr -d '\n'
to take and print the first letter of every line. The problem is that I need a new line at the end, but only at the end, after the word "caroline" (this is the content of the testfile).
Cannot use AWK, basename, grep, egrep, fgrep or rgrep
Use echo, which appends a newline by itself:
echo "$(cut -c1 | tr -d '\n')"
Or run a bare echo after the pipeline:
cut -c1 | tr -d '\n'; echo
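A variant with printf, which sidesteps the differences in how echo implementations treat -e and backslashes. This is only a sketch: the list of names is a hypothetical stand-in for the test file, of which only the word "caroline" is known from the question.

```bash
# Hypothetical stand-in for the test file; only "caroline" comes from the question.
names=$(printf 'alice\nbob\ncaroline\n')

# Join the first letters of all lines, then print them with exactly one trailing newline.
first_letters=$(printf '%s\n' "$names" | cut -c1 | tr -d '\n')
printf '%s\n' "$first_letters"   # abc
```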
Try using awk utility, something like following:-
awk -F\| '$1 > 0 { print substr($1,1,1)}' testfile.txt

String split and extract the last field in bash

I have a text file FILENAME. I want to split the string at - of the first column field and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)"; alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do \
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)";
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311
Would you please try the following:
while IFS=, read -r f1 _; do # set field separator to ",", assigns f1 to the 1st field and _ to the rest
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"
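The reason a fixed cut -d- -f4 fails is that the number of "-" separated pieces is not the same on every line of the sample: the first two lines have three, the third has four. Removing everything up to the last "-" does not depend on that count. A sketch on one line from the question, using only parameter expansion:

```bash
# One line taken from the question's FILENAME.
line='TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram'

f1=${line%%,*}    # drop everything from the first "," onward
dna=${f1##*-}     # drop everything up to and including the last "-"

printf '%s\n' "$dna"   # 1195060311
```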
Well, I had to do it with two lines of code. Maybe someone has a better approach.
while read line; do \
DNA="$(echo $line| cut -d, -f1| rev)"
DNA="$(echo $DNA| cut -d- -f1 | rev)"
echo $DNA
done < ${FILENAME}
I do not know the constraints on your input file, but if what you are looking for is a 10-digit number, and there is only ever one 10-digit number per line, this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me every run of 10 or more digits in this file (use {10} instead of {10,} to match exactly 10 digits at a time)
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
A sed approach:
sed -nE 's/.*-([[:digit:]]+)\,.*/\1/p' input_file
sed options:
-n: Do not print every line back, only the ones explicitly printed with the p flag.
-E: Use extended regular expressions, so grouping parentheses and + need no backslash.
sed extended regex:
's/.*-([[:digit:]]+)\,.*/\1/p': Search, capture one or more digit in group 1, preceded by anything and a dash, followed by a comma and anything, and print only the captured group.
Using awk:
awk -F, '{ n = split($1, arr, "-"); print arr[n] }' FILENAME
Using , as a separator, take the first delimited "piece" of data and further split it into an array arr using - as the delimiter and awk's split function. split returns the number of pieces, so arr[n] is the last element, which we print.

Bash issue with floating point numbers in specific format

(Needed in bash on Linux.) I have a file with numbers like this
1.415949602
91.09582241
91.12042924
91.40270349
91.45625033
91.70150341
91.70174342
91.70660043
91.70966213
91.72597066
91.7287678315
91.7398645966
91.7542977976
91.7678146465
91.77196659
91.77299733
abcdefghij
91.7827827
91.78288651
91.7838959
91.7855
91.79080605
91.80103075
91.8050505
sed 's/^91\.//' file (working)
Any way possible I can do these 3 steps?
1st I try this
cat input | tr -d 91. > 1.txt (didnt work)
cat input | tr -d "91." > 1.txt (didnt work)
cat input | tr -d '91.' > 1.txt (didnt work)
then
grep -x '.\{10\}' (working)
then
grep "^[6-9]" (working)
Final 1 line solution
cat input.txt | sed 's/\91.//g' | grep -x '.\{10\}' | grep "^[6-9]" > output.txt
Your "final" solution:
cat input.txt |
sed 's/\91.//g' |
grep -x '.\{10\}' |
grep "^[6-9]" > output.txt
should avoid the useless cat, and also move the backslash in the sed script to the correct place (I also added a ^ anchor and removed the g flag, since you don't expect more than one match on a line anyway):
sed 's/^91\.//' input.txt |
grep -x '.\{10\}' |
grep "^[6-9]" > output.txt
You might also be able to get rid of at least one useless grep but at this point, I would switch to Awk:
awk '{ sub(/^91\./, "") } /^[6-9].{9}$/' input.txt >output.txt
The sub() does what your sed replacement did; the final condition says to print lines which match the regex.
The same can conveniently, but less readably, be written in sed:
sed -n 's/^91\.\([6-9][0-9]\{9\}\)$/\1/p' input.txt >output.txt
assuming your sed dialect supports BRE regex with repetitions like [0-9]\{9\}.
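As a worked check, here is the three-step pipeline run on a handful of lines taken from the question's sample. The sketch also shows why the tr -d attempts could not work: tr treats its argument as a set of characters, so '91.' deletes every 9, every 1 and every dot anywhere on the line, not the leading string "91.". The value 91.7991 is made up for that demonstration.

```bash
# A few lines taken from the question's sample input.
sample=$(printf '%s\n' 1.415949602 91.09582241 91.7287678315 abcdefghij 91.7855)

# 1. strip a leading "91."
# 2. keep only lines that are exactly 10 characters long
# 3. keep only lines that start with 6, 7, 8 or 9
result=$(printf '%s\n' "$sample" | sed 's/^91\.//' | grep -x '.\{10\}' | grep '^[6-9]')
printf '%s\n' "$result"   # 7287678315

# tr deletes single characters from a set, so far more than the prefix is lost.
tr_out=$(printf '%s\n' 91.7991 | tr -d '91.')
printf '%s\n' "$tr_out"   # 7
```

Note that abcdefghij survives the length check and is only dropped by the last step, since that grep looks at the first character only.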

How get value from text file in linux

I have some file xxx.conf in text format. I have some text "disablelog = 1" in this file.
When I use
grep -r "disablelog" oscam.conf
output is
disablelog = 1
But I need only the value 1.
Do you have any idea, please?
one way is to use awk to print just the value
grep -r "disablelog" oscam.conf | awk '{print $3}'
you could also use sed to replace "disablelog = " with nothing
grep -r 'disablelog' oscam.conf | sed -e 's/disablelog = //'
If you also want to get the lines with or without space before and after = use
grep -r 'disablelog' oscam.conf | sed 's/disablelog\s*=\s*//'
above command will also match
disablelog=1
Assuming you need it as a var in a script:
#!/bin/bash
DISABLELOG=$(awk -F= '/^.*disablelog/{gsub(/ /,"",$2);print $2}' /path/to/oscam.conf)
echo $DISABLELOG
When calling this script, the output should be 1.
Edit: No matter whether there is whitespace between the equals sign and the value, the above will handle it. The regex should be anchored in either case to improve performance.
Try:
grep -r "disablelog" oscam.conf | awk -F= '{print $2}'
Just for fun a solution without awk
grep disablelog oscam.conf | cut -d= -f2 | xargs
xargs is used here to trim the whitespace
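The grep and the second tool can also be folded into a single sed call. A minimal sketch, assuming the key sits at the start of the line (optionally indented) and that the file looks roughly like the made-up sample here; only the "disablelog = 1" line comes from the question.

```bash
# Hypothetical config content written to a temporary file.
conf=$(mktemp)
printf '%s\n' 'otherkey = 5' 'disablelog = 1' 'lastkey = 7' > "$conf"

# -n suppresses normal output; the p flag prints only lines where the
# substitution happened, i.e. the value with "disablelog =" stripped off.
value=$(sed -n 's/^[[:space:]]*disablelog[[:space:]]*=[[:space:]]*//p' "$conf")

printf '%s\n' "$value"   # 1
rm -f "$conf"
```

Because the pattern tolerates any amount of whitespace around the equals sign, "disablelog=1" gives the same result.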

How to extract numbers from a string?

I have a string that contains a path
string="toto.titi.12.tata.2.abc.def"
I want to extract only the numbers from this string.
To extract the first number:
tmp="${string#toto.titi.*.}"
num1="${tmp%.tata*}"
To extract the second number:
tmp="${string#toto.titi.*.tata.*.}"
num2="${tmp%.abc.def}"
So to extract a parameter I have to do it in 2 steps. How to extract a number with one step?
You can use tr to delete all of the non-digit characters, like so:
echo toto.titi.12.tata.2.abc.def | tr -d -c 0-9
To extract all the individual numbers and print one number per line, pipe through:
tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Breakdown:
Replaces all line breaks with spaces: tr '\n' ' '
Replaces all non numbers with spaces: sed -e 's/[^0-9]/ /g'
Remove leading white space: -e 's/^ *//g'
Remove trailing white space: -e 's/ *$//g'
Squeeze spaces in sequence to 1 space: tr -s ' '
Replace remaining space separators with line break: sed 's/ /\n/g'
Example:
echo -e " this 20 is 2sen\nten324ce 2 sort of" | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Will print out
20
2
324
2
Here is a short one:
string="toto.titi.12.tata.2.abc.def"
id=$(echo "$string" | grep -o -E '[0-9]+')
echo $id   # => output: 12 2
with space between the numbers.
Hope it helps...
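Building on the grep -o idea, a single number can be picked out in one step by selecting the N-th match with sed -n. This is only a sketch; nth_number is a hypothetical helper name, not something from the question.

```bash
# nth_number STRING N: print the N-th run of digits in STRING (1-based).
# Hypothetical helper, named here for illustration only.
nth_number() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+' | sed -n "${2}p"
}

string="toto.titi.12.tata.2.abc.def"
num1=$(nth_number "$string" 1)
num2=$(nth_number "$string" 2)

printf '%s / %s\n' "$num1" "$num2"   # 12 / 2
```

If the string has fewer than N numbers, the function prints nothing.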
Parameter expansion would seem to be the order of the day.
$ string="toto.titi.12.tata.2.abc.def"
$ read num1 num2 <<<${string//[^0-9]/ }
$ echo "$num1 / $num2"
12 / 2
This of course depends on the format of $string. But at least for the example you've provided, it seems to work.
This may be superior to anubhava's awk solution which requires a subshell. I also like chepner's solution, but regular expressions are "heavier" than parameter expansion (though obviously way more precise). (Note that in the expression above, [^0-9] may look like a regex atom, but it is not.)
You can read about this form of Parameter Expansion in the bash man page. Note that ${string//this/that} (as well as the <<<) is a bashism, and is not compatible with traditional Bourne or POSIX shells.
This would be easier to answer if you provided exactly the output you're looking to get. If you mean you want to get just the digits out of the string, and remove everything else, you can do this:
d@AirBox:~$ string="toto.titi.12.tata.2.abc.def"
d@AirBox:~$ echo "${string//[a-z,.]/}"
122
If you clarify a bit I may be able to help more.
You can also use sed:
echo "toto.titi.12.tata.2.abc.def" | sed 's/[0-9]*//g'
Here, sed replaces
any digits (class [0-9])
repeated any number of times (*)
with nothing (nothing between the second and third /),
and g stands for globally.
Output will be:
toto.titi..tata..abc.def
Convert your string to an array like this:
$ str="toto.titi.12.tata.2.abc.def"
$ arr=( ${str//[!0-9]/ } )
$ echo "${arr[@]}"
12 2
Use regular expression matching:
string="toto.titi.12.tata.2.abc.def"
[[ $string =~ toto\.titi\.([0-9]+)\.tata\.([0-9]+)\. ]]
# BASH_REMATCH[0] would be "toto.titi.12.tata.2.", the entire match
# Successive elements of the array correspond to the parenthesized
# subexpressions, in left-to-right order. (If there are nested parentheses,
# they are numbered in depth-first order.)
first_number=${BASH_REMATCH[1]}
second_number=${BASH_REMATCH[2]}
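When the surrounding words are not known in advance, the same =~ operator can be applied in a loop: match the first run of digits, store it, cut the string off after that match, and repeat. A bash-only sketch (it relies on arrays and BASH_REMATCH); the variable names rest and numbers are my own.

```bash
string="toto.titi.12.tata.2.abc.def"
rest=$string
numbers=()

# Each pass matches the leftmost run of digits in what is left of the string.
while [[ $rest =~ [0-9]+ ]]; do
  numbers+=("${BASH_REMATCH[0]}")
  # Drop everything up to and including the number just found.
  rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${numbers[@]}"
```

For the sample string this collects 12 and 2, without spawning any subprocess.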
Using awk:
arr=( $(echo $string | awk -F "." '{print $3, $5}') )
num1=${arr[0]}
num2=${arr[1]}
Adding yet another way to do this, using cut:
echo $string | cut -d'.' -f3,5 | tr '.' ' '
This gives you the following output:
12 2
Fixing newline issue (for mac terminal):
cat temp.txt | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed $'s/ /\\\n/g'
Assumptions:
there is no embedded white space
the string of text always has 7 period-delimited strings
the string always contains numbers in the 3rd and 5th period-delimited positions
One bash idea that does not require spawning any subprocesses:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 x3 num2 rest <<< "${string}"
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"
In a comment OP has stated they wish to extract only one number at a time; the same approach can still be used, eg:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 rest <<< "${string}"
$ typeset -p num1
declare -- num1="12"
$ IFS=. read -r x1 x2 x3 x4 num2 rest <<< "${string}"
$ typeset -p num2
declare -- num2="2"
A variation on anubhava's answer that uses parameter expansion instead of a subprocess call to awk, and still working with the same set of initial assumptions:
$ arr=( ${string//./ } )
$ num1=${arr[2]}
$ num2=${arr[4]}
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"
