Split a string and pick the uppercase substring

Consider the following example variables in bash:
PET="cat/DOG/hamster"
FOOD="soup/soup/PIZZA"
SUBJECT="MATH/physics/biology"
How can I split any of those strings by a slash, extract the part that's all uppercase and store it in a variable? For example, how would I take DOG out of the $PET variable and store it in an $OPTION variable?
I need a portable solution that works under bash and zsh specifically.

You could use tr to remove all characters that are not uppercase:
OPTION=$(tr -dc '[:upper:]' <<< "$PET")
Note that here-strings (<<< "$VARIABLE") are not POSIX, although bash, zsh and ksh all support them. In other shells you'll have to echo the variable into tr:
OPTION=$(echo "$PET" | tr -dc '[:upper:]')

It sounds like only one portion of the string is in uppercase, so you can ignore the splitting part of the question. This should work in both zsh and bash (although it is not portable in the sense of POSIX compatibility):
$ echo "${PET//[^A-Z]}"
DOG

You can try something like this:
OPTION=$(gawk -F'/' '{for (i=1;i<=NF;i++) if ($i ~ /\<[A-Z]+\>/) print $i}' <<< "$PET")
If you'd prefer a pure bash solution, you can use the following piece of code:
#!/bin/bash
PET="cat/DOG/hamster"
IFSBK=$IFS
IFS='/'
for word in $PET; do
    # anchor the regex so that only words made up entirely of uppercase letters match
    if [[ $word =~ ^[A-Z]+$ ]]; then
        OPTION="$word"
    fi
done
IFS=$IFSBK
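The loop above relies on unquoted word splitting, which zsh does not perform by default, so here is a minimal sketch that avoids it and uses only POSIX parameter expansion and case patterns. The function name pick_upper is my own (hypothetical), and I'm assuming you want the first field that consists solely of uppercase letters:

```shell
# pick_upper is a hypothetical helper name, not something from the answers above.
# It prints the first slash-separated field made up only of uppercase letters.
pick_upper() {
    rest=$1
    while [ -n "$rest" ]; do
        field=${rest%%/*}                  # text before the first slash
        case $field in
            ''|*[![:upper:]]*) ;;          # empty, or contains a non-uppercase character: skip
            *) printf '%s\n' "$field"; return 0 ;;
        esac
        case $rest in
            */*) rest=${rest#*/} ;;        # drop the first field and its slash
            *)   rest='' ;;                # that was the last field
        esac
    done
    return 1                               # no all-uppercase field found
}

PET="cat/DOG/hamster"
OPTION=$(pick_upper "$PET")
printf '%s\n' "$OPTION"
```

Because the function runs inside $(...), the rest and field variables it sets do not leak into the calling shell.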

Related

Not able to replace the file contents with sed command [duplicate]

I am using the below code for replacing a string
inside a shell script.
echo $LINE | sed -e 's/12345678/"$replace"/g'
but it's getting replaced with $replace instead of the value of that variable.
Could anybody tell what went wrong?

If you want to interpret $replace, you should not use single quotes since they prevent variable substitution.
Try:
echo $LINE | sed -e "s/12345678/${replace}/g"
Transcript:
pax> export replace=987654321
pax> echo X123456789X | sed "s/123456789/${replace}/"
X987654321X
pax> _
Just be careful to ensure that ${replace} doesn't have any characters of significance to sed (like / for instance) since it will cause confusion unless escaped. But if, as you say, you're replacing one number with another, that shouldn't be a problem.
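If the replacement can contain such characters, one option is to escape them first. This is only a sketch: the function name escape_sed_replacement is made up for illustration, it assumes / is the delimiter, and it does not handle newlines in the value:

```shell
# escape_sed_replacement is a hypothetical helper name, not part of sed.
# It backslash-escapes the characters that are special in the replacement
# half of s/.../.../ when / is the delimiter: &, / and \.
escape_sed_replacement() {
    printf '%s' "$1" | sed 's|[&/\]|\\&|g'
}

replace='a/b & c\d'
safe=$(escape_sed_replacement "$replace")
result=$(printf '%s\n' "X12345678X" | sed "s/12345678/$safe/g")
printf '%s\n' "$result"
```

The inner sed uses | as its own delimiter so that the / inside the bracket expression cannot be mistaken for the end of the pattern.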

You can also do this in the shell itself (bash/ksh), without sed:
$ var="12345678abc"
$ replace="test"
$ echo ${var//12345678/$replace}
testabc

Not specific to the question, but for folks who need the same kind of functionality expanded for clarity from previous answers:
# create some variables
str="someFileName.foo"
find=".foo"
replace=".bar"
# notice that str isn't prefixed with $ inside the braces;
# ${name//pattern/replacement} takes the variable name directly
result=${str//$find/$replace}
echo $result
# result is: someFileName.bar
str="someFileName.sally"
find=".foo"
replace=".bar"
result=${str//$find/$replace}
echo $result
# result is: someFileName.sally because ".foo" was not found
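One thing to watch with this feature in bash: the search part is treated as a glob pattern, not a literal string, unless you quote it. A small sketch with made-up values (the file name and variable names are only for illustration):

```shell
#!/usr/bin/env bash
# Illustrative values only; they are not from the answers above.
str='file[1].txt'
find='[1]'

# Unquoted: [1] is a bracket pattern that matches the single character 1.
unquoted=${str//$find/X}

# Quoted: the search text is matched literally.
quoted=${str//"$find"/X}

printf '%s\n' "$unquoted" "$quoted"
# prints:
# file[X].txt
# fileX.txt
```

So if the text to find comes from user input or a file name, quoting it inside the expansion is the safer default.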

Found a graceful solution.
echo ${LINE//12345678/$replace}

Single quotes are very strong. Once inside, there's nothing you can do to invoke variable substitution, until you leave. Use double quotes instead:
echo $LINE | sed -e "s/12345678/$replace/g"

Let me give you two examples.
Using sed:
#!/bin/bash
LINE="12345678HI"
replace="Hello"
echo $LINE | sed -e "s/12345678/$replace/g"
Without Using sed:
LINE="12345678HI"
str_to_replace="12345678"
replace_str="Hello"
result=${LINE//$str_to_replace/$replace_str}
echo $result
Hope you will find it helpful!

echo $LINE | sed -e 's/12345678/'"$replace"'/g'
You can still use single quotes, but you have to "open" them at the place where you want the variable expanded; otherwise the string is taken literally. Putting the variable in double quotes at that point keeps the shell from word-splitting its value. (As @paxdiablo correctly stated; his answer is correct as well.)

To let your shell expand the variable, you need to use double-quotes like
sed -i "s#12345678#$replace#g" file.txt
This will break if $replace contains characters that are special to sed in this position (#, & or \). But you can preprocess $replace to escape them:
replace_quoted=$(printf '%s' "$replace" | sed 's/[#&\]/\\&/g')
sed -i "s#12345678#$replace_quoted#g" file.txt

I had a similar requirement to this but my replace var contained an ampersand. Escaping the ampersand like this solved my problem:
replace="salt & pepper"
echo "pass the salt" | sed "s/salt/${replace/&/\&}/g"

Use # as the delimiter if the strings you want to replace contain characters such as /.
result=$(echo "$str" | sed "s#$oldstr#$newstr#g")
The code above replaces all occurrences of the search term.
If you want only the first occurrence to be replaced, remove the trailing g.

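Use this instead:
echo $LINE | sed -e "s/12345678/$replace/g"
This works for me: simply remove the inner double quotes around $replace and put the whole sed expression in double quotes, so that the shell expands the variable.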

I prefer to use double quotes here. Single quotes are very strong: nothing inside them is changed, so variable substitution cannot take place.
So use double quotes instead:
echo $LINE | sed -e "s/12345678/$replace/g"

Is it possible to retrieve one string between 2 special characters from text file using bash?

Let's say I have the following text file
test.txt
ABC_01:Testing-ABCDEFG
If I want to retrieve the string after colon, I will be using
awk -F ":" '/ABC_01/{print $NF}' test.txt
which will return Testing-ABCDEFG
But what should I do if I only want to retrieve the string after the colon and before the hyphen?

You are so close. That is where split() comes in, e.g.
awk -F: '/ABC_01/{ split($NF,arr,"-"); print arr[1] }' test.txt
Which will output
Testing
The GNU Awk User's Guide - String Manipulation Functions provides the details on split(). Give it a try and let me know if you have any further questions.

Using Bash's built-in extended regex engine:
#!/usr/bin/env bash
while read -r; do
    [[ $REPLY =~ :(.*)- ]] || :
    echo "${BASH_REMATCH[1]}"
done
Using standard POSIX shell IFS field separators:
#!/usr/bin/env sh
while IFS=':-' read -r _ m _; do
    echo "$m"
done
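If the line is already in a shell variable, plain POSIX parameter expansion is another way to do it. A minimal sketch, assuming you want the text between the first colon and the first hyphen after it (the variable names are mine):

```shell
line='ABC_01:Testing-ABCDEFG'

after_colon=${line#*:}         # remove everything up to and including the first colon
middle=${after_colon%%-*}      # remove everything from the first hyphen onwards

printf '%s\n' "$middle"
# prints: Testing
```

This starts no external process, which can matter inside a loop over many lines.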

Using (GNU) grep and look-around:
$ grep -oP '(?<=:)[^-]*(?=-)' file
Testing
Explained:
grep    GNU grep supports PCRE and look-around
-o      print only the matched (non-empty) parts of a matching line
-P      interpret PATTERNS as Perl-compatible regular expressions
(?<=:)  positive look-behind, i.e. preceded by a colon
[^-]*   anything but a hyphen
(?=-)   positive look-ahead, i.e. followed by a hyphen

how to print last part of a string in shell script

I have this line:
102:20620453:d=2017021012:UGRD:10 m above ground:15 hour fcst::lon=79.500000,lat=9.000000,val=-5.35
Now I want to just print the value -5.35 from this line and nothing else.
I also want this command to be able to extract the -7.04 from this line and nothing else.
102:20620453:d=2017021012:UGRD:10 m above ground:15 hour fcst::lon=280.500000,lat=11.000000,val=-7.04
I have read the other Stack Overflow questions and they did not seem to quite get at what I was looking for. I noticed that they used awk or sed. What should I do to extract just the part of the above lines after val=?

There's no need for awk, sed, or any other external tool: bash has its own built-in regular expression support, via the =~ operator to [[ ]], and the BASH_REMATCH array (populated with matched contents).
val_re='[, ]val=([^ ]+)'
line='102:20620453:d=2017021012:UGRD:10 m above ground:15 hour fcst::lon=79.500000,lat=9.000000,val=-5.35'
[[ $line =~ $val_re ]] && echo "${BASH_REMATCH[1]}"
That said, if you really want to remove everything up to and including the string val= (and thus to have your code break if other values were added to the format in the future), you could also do so like this:
val=${line##*val=} # assign everything from $line after the last instance of "val=" to val
The syntax here is parameter expansion. See also BashFAQ #100: How do I do string manipulations in bash?

You can use awk with field separator as = and print last field:
awk -F'=' '{print $NF}' <<< "$str"
-5.35

This looks for the string val= and prints everything after it:
str='102:20620453:d=2017021012:UGRD:10 m above ground:15 hour fcst::lon=79.500000,lat=9.000000,val=-5.35'
echo "$str" | grep -Po '(?<=val=).*'
This answer works with GNU grep only.
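If GNU grep isn't available, a sed substitution with a capture group does the same job. This is a sketch that assumes the value never contains a comma:

```shell
str='102:20620453:d=2017021012:UGRD:10 m above ground:15 hour fcst::lon=79.500000,lat=9.000000,val=-5.35'

# -n plus the p flag prints only lines where the substitution succeeded;
# \([^,]*\) captures everything after the last "val=" up to the next comma or end of line.
val=$(printf '%s\n' "$str" | sed -n 's/.*val=\([^,]*\).*/\1/p')

printf '%s\n' "$val"
# prints: -5.35
```

Unlike the grep version, this still prints just the number if another comma-separated field is later appended after val=.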

Bash script: find and replace uppercase character on a string

Say we have a string like:
doSomething()
and we want to obtain:
do_something()
What is the best way to do this?
I've read the documentation about string manipulation but I can't find the right combination of commands.
Update:
After a discussion with @anubhava, I found a solution by installing gnu-sed:
brew install gnu-sed
And then I can run the script this way:
s="doSomethingElse()"; gsed 's/[[:upper:]]/_\L&/g' <<< "$s"
Output: do_something_else()

Using gnu-sed you can do:
s='doSomethingElse()'
sed 's/[[:upper:]]/_\L&/g' <<< "$s"
do_something_else()
Or else with a non-GNU (BSD) sed, pipe the result to tr:
sed 's/[[:upper:]]/_&/g' <<< "$s" | tr '[:upper:]' '[:lower:]'
do_something_else()
Or using perl:
perl -pe 's/[[:upper:]]/_\L$&/g' <<< "$s"
do_something_else()
Or using gnu-awk:
awk -v RS='[[:upper:]]' -v ORS= '1; RT{printf "_%s", tolower(RT)}' <<< "$s"
do_something_else()

In bash 4:
$ s="doSomethingElse()"
$ while [[ $s =~ [[:lower:]]([[:upper:]]) ]]; do
> s=${s/[[:upper:]]/_${BASH_REMATCH[1],,}}
> done
$ echo "$s"
do_something_else()
First, the while loop tries to match a lowercase character immediately followed by an uppercase character, capturing the matched uppercase character. The parameter expansion replaces the first uppercase character in the string with an underscore and the captured uppercase character (converted to lowercase by the ,, operator). The process repeats until no more lower/upper pairs are found.
If bash allowed capture groups in patterns, something hypothetical like
s=${s//([[:lower:]])([[:upper:]])/${BASH_PATMATCH[1]}_${BASH_PATMATCH[2],,}}
could work without a loop. As is, we need the extra step of using regular expression matches one match at a time to capture the letter to be lowercased.

How should I do if the pattern in awk cmd is a bash variable and contains special character?

Description: The 1-line awk cmd is used to print all lines after the matched line in my shell script as below.
#!/bin/bash
...
awk "f;/${PATTERN}/{f=1}" ${FILE}
Since ${PATTERN} may contain special characters, the command will fail in that case.
Q1. How should I handle such kind of situation if regex is used in awk?
Q2. Is it possible to just use the raw string in this cmd instead of regex e.g. /$PATTERN/ to avoid the special character problem?

Close. It's better to pass shell variables in to awk with -v than to place them in the awk script directly.
awk -v pat="${PATTERN}" 'f; $0 ~ pat {f=1}' "${FILE}"
If ${PATTERN} is not a regex, you can use a different operator:
awk -v pat="${PATTERN}" 'f; $0 == pat {f=1}' "${FILE}"
or you can even handle non-regex substrings:
awk -v pat="${PATTERN}" 'f; index($0, pat) {f=1}' "${FILE}"
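Here is a small sketch of the index() variant with a pattern full of regex metacharacters. The sample lines are made up. One caveat I'm fairly sure of: awk processes backslash escapes in values passed with -v, so for patterns containing backslashes I'd pass the value through the environment instead (PAT is just a name I picked):

```shell
# Sample data and variable names are illustrative only.
PATTERN='1+1=2 (a.b)'

# Print every line after the first line that contains PATTERN as a plain substring.
after=$(printf '%s\n' 'header' '1+1=2 (a.b)' 'first' 'second' |
    awk -v pat="$PATTERN" 'f; index($0, pat) {f=1}')
printf '%s\n' "$after"

# For a pattern with backslashes, read it from ENVIRON so awk sees it unmodified.
PATTERN='C:\temp'
after_raw=$(printf '%s\n' 'header' 'C:\temp' 'first' 'second' |
    PAT="$PATTERN" awk 'f; index($0, ENVIRON["PAT"]) {f=1}')
printf '%s\n' "$after_raw"
```

Both commands print the lines first and second, since those are the only lines after the matching one.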
