I have a few strings like this:
var1="string one=3423423 and something which i don't care"
var2="another bigger string=413145 and something which i don't care"
var3="the longest string ever=23442 and something which i don't care"
These strings are the output of a python script (which i am not allowed to touch), and I need a way to extract the 1st part of the string, right after the number. Basically, my outputs should be:
"string one=3423423"
"another bigger string=413145"
"the longest string ever=23442"
As you can see, i can't use positions, or stuff like that, because the number and the string length are not always the same. I assume i would need to use a regex or something, but i don't really understand regexes. Can you please help with a command or something which can do this?
grep -oP '^.*?=\d+' inputfile
string one=3423423
another bigger string=413145
the longest string ever=23442
Here -o flag will enable grep to print only matching part and -p will enable perl regex in grep. Here \d+ means one or more digit. So, ^.*?=\d+ means print from start of the line till you find last digit (first match).
You could use parameter expansion, for example:
var1="string one=3423423 and something which i don't care"
name=${var1%%=*}
value=${var1#*=}
value=${value%%[^0-9]*}
echo "$name=$value"
# prints: string one=3423423
Explanation of ${var1%%=*}:
%% - remove the longest matching suffix
= - match =
* - match everything
Explanation of ${var1#*=}:
# - remove the shortest matching prefix
* - match everything
= - match =
Explanation of ${value%%[^0-9]*}:
%% - remove the longest matching suffix
[^0-9] - match any non-digit
* - match everything
To perform the same thing on more than one values easily,
you could wrap this logic into a function:
extract_and_print() {
local input=$1
local name=${input%%=*}
local value=${input#*=}
value=${value%%[^0-9]*}
echo "$name=$value"
}
extract_and_print "$var1"
extract_and_print "$var2"
extract_and_print "$var3"
$ shopt -s extglob
$ echo "${var1%%+([^0-9])}"
string one=3423423
$ echo "${var2%%+([^0-9])}"
another bigger string=413145
$ echo "${var3%%+([^0-9])}"
the longest string ever=23442
+([^0-9]) is an extended pattern that matches one or more non-digits.
${var%%+([^0-9])} with %%pattern will remove the longest match of that pattern from the end of the variable value.
Refs: patterns, parameter substitution
Related
I need to replace one variable with another variable in a multiple strings.
For example:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in string1 string2 string3; do
x="$(echo "$str" | sed 's/[a-zA-Z]//g')" # extracting a character between letters
sed 's/$x/$y/'$str # I tried this, but it does not work at all.
echo "$str"
done
Expecting output:
One;two
three;four
five;six
In my output, nothing changes:
One,two
three.four
five:six
You can use bash's substitution operator instead of sed. And simply replace anything that isn't a letter with $y.
#!/bin/bash
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "$string1" "$string2" "$string3"; do
x=${str//[^a-zA-Z]+/$y}
echo "$x"
done
Output is:
One;two
three;four
five;six
Note that your general approach wouldn't work if the input string has muliple delimiters, e.g. One,two,three. When you remove all the letters you get ,,, but that doesn't appear anywhere in the string.
Addressing issues with OP's current code:
referencing variables requires a leading $, preferably a pair of {}, and (usually) double quotes (eg, to insure embedded spaces are considered as part of the variable's value)
sed can take as input a) a stream of text on stdin, b) a file, c) process substitution or d) a here-document/here-string
when building a sed script that includes variable refences the sed script must be wrapped in double quotes (not single quotes)
Pulling all of this into OP's current code we get:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do # proper references of the 3x "stringX" variables
x="$(echo "$str" | sed 's/[a-zA-Z]//g')"
sed "s/$x/$y/" <<< "${str}" # feeding "str" as here-string to sed; allowing variables "x/y" to be expanded in the sed script
echo "$str"
done
This generates:
One;two # generated by the 2nd sed call
One,two # generated by the echo
;hree.four # generated by the 2nd sed call
three.four # generated by the echo
five;six # generated by the 2nd sed call
five:six # generated by the echo
OK, so we're now getting some output but there are obviously some issues:
the results of the 2nd sed call are being sent to stdout/terminal as opposed to being captured in a variable (presumably the str variable - per the follow-on echo ???)
for string2 we find that x=. which when plugged into the 2nd sed call becomes sed "s/./;/"; from here the . matches the first character it finds which in this case is the 1st t in string2, so the output becomes ;hree.four (and the . is not replaced)
dynamically building sed scripts without knowing what's in x (and y) becomes tricky without some additional coding; instead it's typically easier to use parameter substitution to perform the replacements for us
in this particular case we can replace both sed calls with a single parameter substitution (which also eliminates the expensive overhead of two subprocesses for the $(echo ... | sed ...) call)
Making a few changes to OP's current code we can try:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do
x="${str//[^a-zA-Z]/${y}}" # parameter substitution; replace everything *but* a letter with the contents of variable "y"
echo "${str} => ${x}" # display old and new strings
done
This generates:
One,two => One;two
three.four => three;four
five:six => five;six
In my shell script there is a parameter that comes from certain systems and it gives an answer similar to this one: PAR0000008.
And I need to send only the last number of this parameter to another variable, ie VAR=8.
I used the command VAR=$( echo ${PAR} | cut -c 10 ) and it worked perfectly.
The problem is when the PAR parameter returns with numbers from two decimal places like PAR0000012. I need to discard the leading zeros and send only the number 12 to the variable, but I don't know how to do the logic in the Shell to discard all the characters to the left of the number.
Edit Using grep To Handle 0 As Part Of Final Number
Since you are using POSIX shell, making use of a utility like sed or grep (or cut) makes sense. grep is quite a bit more flexible in parsing the string allowing a REGEX match to handle the job. Say your variable v=PAR0312012 and you want the result r=312012. You can use a command substitution (e.g. $(...)) to parse the value assigning the result to r, e.g.
v=PAR0312012
r=$(echo $v | grep -Eo '[1-9].*$')
echo $r
The grep expression is:
-Eo - use Extended REGEX and only return matching portion of string,
[1-9].*$ - from the first character in [1-9] return the remainder of the string.
This will work for PAR0000012 or PAR0312012 (with result 312012).
Result
For PAR0312012
312012
Another Solution Using expr
If your variable can have zeros as part of the final number portion, then you must find the index where the first [1-9] character occurs, and then assign the substring beginning at that index to your result variable.
POSIX shell provides expr which provides a set of string parsing tools that can to this. The needed commands are:
expr index string charlist
and
expr substr string start end
Where start and end are the beginning and ending indexes to extract from the string. end just has to be long enough to encompass the entire substring, so you can just use the total length of your string, e.g.
v=PAR0312012
ndx=$(expr index "$v" "123456789")
r=$(expr substr "$v" "$ndx" 10)
echo $r
Result
312012
This will handle 0 anywhere after the first [1-9].
(note: the old expr ... isn't the fastest way of handling this, but if you are only concerned with a few tens of thousands of values, it will work fine. A billion numbers and another method will likely be needed)
This can be done easily using Parameter Expension.
var='PAR0000008'
echo "${var##*0}"
//prints 8
echo "${var##*[^1-9]}"
//prints 8
var="${var##*0}"
echo "$var"
//prints 8
var='PAR0000012'
echo "${var##*0}"
//prints 12
echo "${var##*[^1-9]}"
//prints 12
var="${var##*[^1-9]}"
echo "$var"
//prints 12
I am fairly new to bash scripting and was trying to echo only lines that match a specific formatting. I have this code so far:
LINE=1
while read -r CURRENT_LINE
do
if [[ $CURRENT_LINE == ??-?-??? ]]
then
echo "$LINE: $CURRENT_LINE"
fi
((LINE++))
done < "./new-1.txt"
The text file contains number sequences on each line that match the following format: "12-3-456", but also contains sequences that are in different formats as well, such as "123-89203-9420" or "123-456-7890". I can't quite understand why the if statement inside the while loop does not result to True on lines that match the formatting. I've tried using the * as well, but using it gives me incorrect results.
Here are the contents of the text file new-1.txt. I want the script to output "Line 1: 11-1-111", but it doesn't output anything.
11-1-111
222-22-2222
333-33-3333
444-444-4444
555-555-5555
In the regex parlance, the ? makes the character or selection optional, ie , a character/selection is allowed to occur at most one time but zero occurrences are also tolerated.
However, the == operation is not the regex matching operator. It is =~.
So changing your if clause to the below would do the job.
[[ $CURRENT_LINE =~ "^[0-9]{2}-[0-9]{1}-[0-9]{3}$" ]]
Here
The ^ specifies the beginning of regex and $ the end. So we have a tight coupling of the pattern to match
[0-9] denotes a range, here any number from zero to nine.
The {n} mandates that the preceding character/selection should match exactly n number of times
Note : You can also use a more verbose [[:digit:]] instead of [0-9]
I have a string that can be one of:
1.) AA_BB-CC_xxxx-xx.y.y-xxxxxxxx-yyyyyy.tar.gz
or with prefix dropped:
2.) CC_xxxx-xx.y.y-xxxxxxxx-yyyyyy.tar.gz
where A,B,C,D are any number of letters and x and y are digits. I need to extract the following from the above:
AA_BB-CC_xxxx
CC_xxxx
Example:
standalone_version-WIN_2012-16.3.2-20180627-131137.tar.gz
WIN_2008-16.3.2-20180614-094525.tar.gz
need to extract:
standalone_version-WIN_2012
WIN-2008
I'm trying to discard everything from the end till the first dash followed by a digit is encountered. I'm using the following but it returns the whole string:
name=${image_file%%-[0-9].*}
You were almost there! Instead of
name=${image_file%%-[0-9].*}
omit the dot:
name=${image_file%%-[0-9]*}
The expressions in bash %% string trims are patterns, not regular expressions. Therefore * alone matches any number of characters, not .* as in a regex.
Example (tested in bash 4.4.12(3)-release):
$ foo='standalone_version-WIN_2012-16.3.2-20180627-131137.tar.gz'
$ bar='WIN_2008-16.3.2-20180614-094525.tar.gz'
$ echo ${foo%%-[0-9].*}
standalone_version-WIN_2012-16.3.2-20180627-131137.tar.gz
# oops
$ echo ${foo%%-[0-9]*}
standalone_version-WIN_2012
# no dot - works fine
$ echo ${bar%%-[0-9]*}
WIN_2008
# same here.
I am using Bourne Shell. Need to confirm if my understanding of following is correct?
$ echo $SHELL
/bin/bash
$ VAR="NJ:NY:PA" <-- declare an array with semicolon as separator?
$ echo ${VAR#*} <-- show entire array without separator?
NJ:NY:PA
$ echo ${VAR#*:*} <-- show array after first separator?
NY:PA
$ echo ${VAR#*:*:*} <-- show string after two separator
PA
${var#pattern} is a parameter expansion that expands to the value of $var with the shortest possible match for pattern removed from the front of the string.
Thus, ${VAR#*:} removes everything up and including to the first :; ${VAR#*:*:} removes everything up to and including the second :.
The trailing *s on the end of the expansions given in the question don't have any use, and should be avoided: There's no reason whatsoever to use ${var#*:*:*} instead of ${var#*:*:} -- since these match the smallest amount of text possible, and * is allowed to expand to 0 characters, the final * matches and removes nothing.
If what you really want is an array, you might consider using a real array instead.
# read contents of string VAR into an array of states
IFS=: read -r -a states <<<"$VAR"
echo "${states[0]}" # will echo NJ
echo "${states[1]}" # will echo NY
echo "${#states[#]}" # count states; will emit 3
...which also gives you the ability to write:
printf ' - %s\n' "${states[#]}" # put *all* state names into an argument list